cisticola.scraper.gettr module

class cisticola.scraper.gettr.GettrScraper

Bases: Scraper

An implementation of a Scraper for Gettr, using the gogettr library.

can_handle(channel)

Determine whether the scraper can scrape the specified channel.

Parameters:: channel (Channel) – Channel to be scraped.
Returns:: True if the scraper is capable of scraping channel, False if not.
Return type:: bool
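As a rough illustration of this check, the sketch below accepts a channel when its platform is Gettr or its URL is hosted on gettr.com. The `Channel` dataclass and its `platform` and `url` fields are hypothetical stand-ins, not cisticola's actual model, and the real method may apply a different test.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Channel:
    # Hypothetical stand-in for cisticola's Channel; only the fields used here.
    platform: str
    url: str


def can_handle(channel: Channel) -> bool:
    """Sketch: accept channels labelled "Gettr" or whose URL is on gettr.com."""
    if channel.platform == "Gettr":
        return True
    host = urlparse(channel.url).netloc.lower()
    # Match the bare domain or any subdomain such as www.gettr.com.
    return host == "gettr.com" or host.endswith(".gettr.com")
```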

get_posts(channel: Channel, since: ScraperResult | None = None) → Generator[ScraperResult, None, None]

Scrape all posts from the specified Channel.

Parameters:

channel (Channel) – Channel to be scraped.
since (ScraperResult or None) – Most recently scraped ScraperResult from a previous scrape, or None if the scraper has not run before.

Yields:

ScraperResult – Scraper result from a single post/comment from the specified Channel.
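The `since` argument allows incremental scraping. A minimal sketch, assuming posts arrive newest-first and that each result carries a `date`, stops as soon as it reaches a post no newer than `since`. The `ScraperResult` stand-in, the `raw_posts` argument, and the `id`/`created` keys are hypothetical; the real method fetches posts through gogettr and builds richer results.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Generator, Iterable, Optional


@dataclass
class ScraperResult:
    # Hypothetical minimal stand-in for cisticola's ScraperResult.
    platform_id: str
    date: datetime
    raw_data: dict = field(default_factory=dict)


def get_posts(
    raw_posts: Iterable[dict], since: Optional[ScraperResult] = None
) -> Generator[ScraperResult, None, None]:
    """Sketch: yield results newest-first, stopping at previously scraped posts."""
    for post in raw_posts:  # assumed to be ordered newest-first
        # Hypothetical keys: "id" (str) and "created" (datetime).
        if since is not None and post["created"] <= since.date:
            break  # everything from here on was covered by the previous scrape
        yield ScraperResult(platform_id=post["id"], date=post["created"], raw_data=post)
```

Using `break` rather than `continue` relies on the newest-first ordering; if posts arrived in arbitrary order, every post would have to be filtered instead.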

get_username_from_url(url)

Extract a channel’s username from its URL.

Parameters:: url (str) – URL of the channel on a given platform, e.g. "https://twitter.com/EliotHiggins"
Returns:: username – Extracted username of the channel, e.g. "EliotHiggins"
Return type:: str
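One simple approach is to take the last non-empty path segment of the URL. This sketch assumes profile URLs end with the username (as in the Twitter example above, or a path such as `/user/<name>`); it is not necessarily how `GettrScraper` implements the method.

```python
from urllib.parse import urlparse


def get_username_from_url(url: str) -> str:
    """Sketch: return the last non-empty path segment of a profile URL."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]
```

Parsing with `urllib.parse.urlparse` first keeps query strings and fragments out of the result.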

url_to_key(url: str, content_type: str) → str

Generate a unique identifier for media from a specified post.

Parameters:

url (str) – URL of the original post, e.g. "https://twitter.com/bellingcat/status/1503397267675533313"
content_type (str) – Content-Type of the media, e.g. "image/jpeg"

Returns:

key – Unique identifier for the media file from a specified post based on the original post URL and the media’s Content-Type.

Return type:

str
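One way to build such a key, sketched under assumptions, is to hash the post URL and append a file extension guessed from the Content-Type with the standard `mimetypes` module. The SHA-256 choice and the `.bin` fallback are illustrative and not taken from cisticola.

```python
import hashlib
import mimetypes


def url_to_key(url: str, content_type: str) -> str:
    """Sketch: derive a deterministic storage key from a post URL and a Content-Type."""
    # Drop parameters such as "; charset=utf-8" before guessing the extension.
    mime = content_type.split(";")[0].strip().lower()
    ext = mimetypes.guess_extension(mime) or ".bin"  # fallback for unknown types
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return f"{digest}{ext}"
```

Because this key depends only on the post URL and the Content-Type, two media files of the same type from one post would map to the same key in this sketch.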