- Change working directory to this project root
- Install the dependencies:
pip install -r requirements.txt
- To run the pipeline from the terminal:
python app.py
- To run the pipeline interactively from the experimental notebook:
jupyter notebook experimental.ipynb
- Data model for tweets:
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# JSONTrait, User, UserRef, TextLink, Place, Coordinates and Media are companion
# types assumed to be defined or imported before this class.
# The `X | None` annotation syntax requires Python 3.10+.

@dataclass
class Tweet(JSONTrait):
    id: int
    id_str: str
    url: str
    date: datetime
    user: User
    lang: str
    rawContent: str
    replyCount: int
    retweetCount: int
    likeCount: int
    quoteCount: int
    conversationId: int
    hashtags: list[str]
    cashtags: list[str]
    mentionedUsers: list[UserRef]
    links: list[TextLink]
    viewCount: int | None = None
    retweetedTweet: Optional["Tweet"] = None
    quotedTweet: Optional["Tweet"] = None
    place: Optional[Place] = None
    coordinates: Optional[Coordinates] = None
    inReplyToTweetId: int | None = None
    inReplyToUser: UserRef | None = None
    source: str | None = None
    sourceUrl: str | None = None
    sourceLabel: str | None = None
    media: Optional["Media"] = None
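The `retweetedTweet` and `quotedTweet` fields make `Tweet` self-referencing. A minimal sketch of walking that structure, assuming `retweetedTweet` holds the original tweet when the current one is a retweet; `MiniTweet` and `original_tweet` are hypothetical names used only for illustration, not part of this project:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniTweet:
    # Hypothetical stand-in for Tweet, keeping only the fields this sketch uses.
    id: int
    rawContent: str
    retweetedTweet: Optional[MiniTweet] = None
    quotedTweet: Optional[MiniTweet] = None


def original_tweet(tweet: MiniTweet) -> MiniTweet:
    """Follow retweetedTweet links until reaching a tweet that is not a retweet."""
    while tweet.retweetedTweet is not None:
        tweet = tweet.retweetedTweet
    return tweet


source = MiniTweet(id=1, rawContent="hello")
retweet = MiniTweet(id=2, rawContent="RT hello", retweetedTweet=source)
quote = MiniTweet(id=3, rawContent="look at this", quotedTweet=retweet)

print(original_tweet(retweet).rawContent)  # hello
print(original_tweet(quote).id)  # 3 (a quote is not a retweet, so it is its own original)
```

Quoted tweets are deliberately not followed here, since a quote carries its own content rather than repeating another tweet's.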
- Data model for users:
from dataclasses import dataclass, field
from datetime import datetime

# JSONTrait and TextLink are companion types assumed to be defined or imported
# before this class. The `X | None` annotation syntax requires Python 3.10+.

@dataclass
class User(JSONTrait):
    id: int
    id_str: str
    url: str
    username: str
    displayname: str
    rawDescription: str
    created: datetime
    followersCount: int
    friendsCount: int
    statusesCount: int
    favouritesCount: int
    listedCount: int
    mediaCount: int
    location: str
    profileImageUrl: str
    profileBannerUrl: str | None = None
    protected: bool | None = None
    verified: bool | None = None
    blue: bool | None = None
    blueType: str | None = None
    descriptionLinks: list[TextLink] = field(default_factory=list)
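`descriptionLinks` uses `field(default_factory=list)` rather than a plain `[]` default. A small sketch of why, using a hypothetical trimmed-down `MiniUser` (plain strings stand in for `TextLink`):

```python
from dataclasses import dataclass, field


@dataclass
class MiniUser:
    # Hypothetical stand-in for User; links are plain strings instead of TextLink.
    id: int
    username: str
    # A bare `= []` default would raise ValueError when the class is defined,
    # so each instance gets a fresh list from default_factory instead.
    descriptionLinks: list[str] = field(default_factory=list)


alice = MiniUser(id=1, username="alice")
bob = MiniUser(id=2, username="bob")
alice.descriptionLinks.append("https://example.com")

print(alice.descriptionLinks)  # ['https://example.com']
print(bob.descriptionLinks)  # []
```

Because every instance receives its own list, appending a link to one user never leaks into another.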
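- Raw data structure:
# Schematic: each value shows the type stored at that key.
profile = {
    User.id_str: {  # one entry per user, keyed by the user's id_str
        "user": User,
        "followers": list[User],
        "following": list[User],
        "tweets": list[Tweet],
    }
}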
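A minimal sketch of filling and reading this structure, assuming entries are keyed by `id_str` as shown. `MiniUser`, `MiniTweet`, `ProfileEntry`, `add_profile` and `mutual_follows` are hypothetical names for illustration only; the real pipeline may build the dictionary differently:

```python
from dataclasses import dataclass
from typing import TypedDict


@dataclass
class MiniUser:
    # Hypothetical stand-in for User with only the fields used here.
    id_str: str
    username: str


@dataclass
class MiniTweet:
    # Hypothetical stand-in for Tweet.
    id_str: str
    rawContent: str


class ProfileEntry(TypedDict):
    user: MiniUser
    followers: list[MiniUser]
    following: list[MiniUser]
    tweets: list[MiniTweet]


def add_profile(profile: dict[str, ProfileEntry], user: MiniUser) -> ProfileEntry:
    """Create (or return the existing) entry for `user`, keyed by user.id_str."""
    if user.id_str not in profile:
        profile[user.id_str] = {"user": user, "followers": [], "following": [], "tweets": []}
    return profile[user.id_str]


def mutual_follows(entry: ProfileEntry) -> set[str]:
    """id_str values of accounts that both follow and are followed by the user."""
    followers = {u.id_str for u in entry["followers"]}
    following = {u.id_str for u in entry["following"]}
    return followers & following


profile: dict[str, ProfileEntry] = {}
alice, bob, carol = MiniUser("1", "alice"), MiniUser("2", "bob"), MiniUser("3", "carol")

entry = add_profile(profile, alice)
entry["followers"].extend([bob, carol])
entry["following"].append(bob)
entry["tweets"].append(MiniTweet("10", "hello"))

print(mutual_follows(profile["1"]))  # {'2'}
```

The `TypedDict` only documents the expected keys for type checkers; at runtime each entry is an ordinary `dict`.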