cisticola.scraper.gettr module
- class cisticola.scraper.gettr.GettrScraper
Bases: Scraper
An implementation of a Scraper for Gettr, using the gogettr library.
- can_handle(channel)
Whether or not the scraper can scrape the specified channel.
- Parameters:
channel (Channel) – Channel to be scraped.
- Returns:
True if the scraper is capable of scraping channel, False if not.
- Return type:
bool
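As an illustration of the kind of check can_handle performs, the sketch below decides from a URL alone whether it looks like a Gettr profile. It is hypothetical: can_handle_url and GETTR_HOSTS are illustrative names, it assumes Gettr profile URLs have the form https://gettr.com/user/<username>, and it does not reflect how cisticola actually inspects a Channel.

```python
from urllib.parse import urlparse

# Hypothetical: assumes Gettr profile URLs look like
# https://gettr.com/user/<username>; cisticola's real check may differ.
GETTR_HOSTS = {"gettr.com", "www.gettr.com"}


def can_handle_url(url: str) -> bool:
    """Return True if url looks like a Gettr user profile URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Split the path into non-empty segments, e.g. ["user", "example"].
    segments = [s for s in parsed.path.split("/") if s]
    return host in GETTR_HOSTS and len(segments) >= 2 and segments[0] == "user"


print(can_handle_url("https://gettr.com/user/example"))    # True
print(can_handle_url("https://twitter.com/EliotHiggins"))  # False
```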
- get_posts(channel: Channel, since: ScraperResult | None = None) → Generator[ScraperResult, None, None]
Scrape all posts from the specified Channel.
- Parameters:
channel (Channel) – Channel to be scraped.
since (ScraperResult or None) – Most recently scraped ScraperResult from a previous scrape, or None if the scraper has not run before.
- Yields:
ScraperResult – Scraper result from a single post/comment from the specified Channel.
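The since parameter makes scraping incremental: a later run only needs the posts newer than the last result from the previous run. The sketch below shows that pattern with a generator. It is a hypothetical illustration, not cisticola's implementation: Result stands in for ScraperResult with only an id and a date, and the feed is assumed to be ordered newest to oldest.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Generator, Iterable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for ScraperResult (only the fields used here)."""
    post_id: str
    date: datetime


def iter_new_posts(
    posts: Iterable[Result], since: Optional[Result] = None
) -> Generator[Result, None, None]:
    """Yield posts until reaching one no newer than `since`.

    Assumes `posts` is ordered newest to oldest.
    """
    for post in posts:
        if since is not None and post.date <= since.date:
            break  # this post and all older ones were covered by an earlier run
        yield post


feed = [
    Result("p3", datetime(2022, 3, 3)),
    Result("p2", datetime(2022, 3, 2)),
    Result("p1", datetime(2022, 3, 1)),
]
previous = Result("p2", datetime(2022, 3, 2))
print([p.post_id for p in iter_new_posts(feed, since=previous)])  # ['p3']
print([p.post_id for p in iter_new_posts(feed)])                  # ['p3', 'p2', 'p1']
```

Because the function is a generator, a caller can stop consuming it early without the remaining posts ever being examined.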
- get_profile(channel: Channel) → RawChannelInfo
- get_username_from_url(url)
Extract a channel’s username from its URL.
- Parameters:
url (str) – URL of the channel on a given platform, e.g. "https://twitter.com/EliotHiggins"
- Returns:
username – Extracted username of the channel, e.g. "EliotHiggins"
- Return type:
str
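A minimal sketch of username extraction, assuming the username is the last non-empty path segment of the URL. That holds for the Twitter example above and, by assumption, for Gettr URLs of the form https://gettr.com/user/<username>. username_from_url is a hypothetical name, not cisticola's method.

```python
from urllib.parse import urlparse


def username_from_url(url: str) -> str:
    """Return the last non-empty path segment of a profile URL.

    Hypothetical helper: assumes the username is the final path segment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no username in URL: {url!r}")
    return segments[-1]


print(username_from_url("https://twitter.com/EliotHiggins"))  # EliotHiggins
```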
- url_to_key(url: str, content_type: str) → str
Generate a unique identifier for media from a specified post.
- Parameters:
url (str) – URL of the original post, e.g. "https://twitter.com/bellingcat/status/1503397267675533313"
content_type (str) – Content-Type of the media, e.g. "image/jpeg"
- Returns:
key – Unique identifier for a media file from the specified post, derived from the original post URL and the media’s Content-Type.
- Return type:
str
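One way to turn a post URL and a Content-Type into a deterministic, filesystem-safe key is to hash the URL and append an extension guessed from the MIME type. The sketch below is a hypothetical scheme for illustration only. media_key is an invented name, and cisticola's actual url_to_key format may be entirely different.

```python
import hashlib
import mimetypes


def media_key(url: str, content_type: str) -> str:
    """Hypothetical key scheme: SHA-256 hex digest of the URL plus an extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Drop parameters such as "; charset=..." before guessing the extension.
    mime = content_type.split(";")[0].strip().lower()
    ext = mimetypes.guess_extension(mime) or ""
    return digest + ext


key = media_key("https://twitter.com/bellingcat/status/1503397267675533313", "image/png")
print(key)  # 64 hex characters followed by ".png"
```

Hashing keeps keys a fixed length regardless of URL length and avoids characters such as "/" and "?" that are awkward in file or object-store names.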