Base feature: Download tweets which are missing or incomplete #144
Comments
I just tried your fork and got this error:
|
Ah, sorry. This error was recently found and fixed, but not in the correct branch. I just pushed this commit into |
Also, just wondering whether the function for retrieving the full text of any retweets in your archive is in this PR?
I think the liked tweets need re-downloading, as the only fields in the archive are |
We have several issues that can only be solved if we download some additional tweets:
I already started working on an implementation for this on Nov 22nd. I only now realized that we didn't have an issue for it yet, and an issue might be helpful to keep track of the progress, especially if multiple PRs are created / updated / closed / merged to implement this feature.
What does this do?
These are the features that are (mostly) finished in the branch downloadtweets, but not yet available on main:
known_tweets.json
What's still missing
Tweets from like.js need re-downloading for some reason. They are ignored right now.
Where is the progress?
There's PR #97, which merged my first set of commits into the branch downloadtweets in this repo, and PR #122, which tracks my current work on it in downloadtweets in my fork. None of this is currently merged into main.
The PR is already quite huge, and looks even bigger due to the many merge commits which just bring it up to date with main.
For several days, the online diff for #122 was broken, but now it works again.
Why is the PR so huge?
I underestimated the complexity of the tweet JSON format. Many properties are quite similar and contain redundant data. There are slight differences between the format from the API and the format from the archive. Many numeric properties are sometimes encoded as numbers and sometimes as strings, which makes equality checks and merging difficult.
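To illustrate the number-versus-string problem, here is a minimal sketch of one way to compare two copies of a tweet. The helper names `normalize` and `same_tweet_data` and the sample field values are made up for this example and are not taken from the PR. It assumes that any string consisting only of ASCII digits is meant as a number.

```python
def normalize(value):
    """Recursively convert digit-only strings to int, so "5" and 5 compare equal.

    Assumption for this sketch: every all-digit string is a number. That is
    lossy (a tweet whose text is just "2022" would be converted too, and
    leading zeros are dropped), so use it for comparison only, not storage.
    """
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, str) and value.isascii() and value.isdigit():
        return int(value)
    return value


def same_tweet_data(a, b):
    """Compare two tweet dicts while ignoring number/string encoding."""
    return normalize(a) == normalize(b)


# Hypothetical sample data: the same tweet as it might look in each source.
from_archive = {"id_str": "123", "favorite_count": "5", "full_text": "hello"}
from_api = {"id_str": "123", "favorite_count": 5, "full_text": "hello"}

print(from_archive == from_api)                 # plain comparison
print(same_tweet_data(from_archive, from_api))  # normalized comparison
```

The plain `==` comparison treats the two copies as different because of `favorite_count`, while the normalized comparison treats them as equal.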
Also, throwing all tweets (from the archive, and those downloaded for several different reasons) into a single dict / JSON file has some downsides. But since every tweet can be referenced in multiple different ways and can also be part of the original archive, keeping them separate is not trivial either, and may be impossible.
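As a rough sketch of the single-dict approach, tweets from all sources could be keyed by their ID, with each entry remembering where its data came from. The function `merge_tweet`, the `sources` key, and the overwrite rule are assumptions made for this example, not the behaviour of the actual branch.

```python
EMPTY_VALUES = (None, "", [], {})


def merge_tweet(known_tweets, tweet, source):
    """Merge one tweet into a dict keyed by tweet ID (hypothetical layout).

    Non-empty fields from the new copy overwrite existing ones; empty fields
    never erase data that is already known. The "sources" list records every
    origin that contributed to the entry.
    """
    tweet_id = str(tweet["id_str"])
    entry = known_tweets.setdefault(tweet_id, {"sources": []})
    for key, value in tweet.items():
        if value not in EMPTY_VALUES:
            entry[key] = value
    if source not in entry["sources"]:
        entry["sources"].append(source)
    return entry


# Hypothetical sample data: an incomplete archive copy, then a downloaded one.
known_tweets = {}
merge_tweet(known_tweets, {"id_str": "1", "full_text": "short", "lang": "en"}, "archive")
merge_tweet(known_tweets, {"id_str": "1", "full_text": "the full text", "lang": ""}, "api")
print(known_tweets["1"])
```

Recording the origin per entry is one way to soften the downside of mixing everything together: the archive tweets can still be told apart from tweets that were only downloaded.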
How will this continue?
I think we could merge this really soon. The remaining problems are not as big as they might seem:
What do you think, @timhutton? What should I do before we can merge #122 into downloadtweets? And what should be done before that result can be merged into main?