Document idempotency #147
These are good questions, and I agree that this should be documented in the README. I'll try to give a quick (but not short) answer here, and I or someone else will probably update the README later. I've highlighted the sections which are (IMHO) most important.

Regarding updates: yes, we really try to make sure that upgrading the script and re-running it on the same folder is safe and convenient. But there's no guarantee: with the quick pace of development and no automated tests, there might be problems sometimes.

The script should be idempotent, and in its current version it probably is in the broad sense, but with some catches. Say you run the script, answer its questions a certain way, and everything goes well. If you run it again and give the same answers, the result will be the same as after the first run.

The script is also (somewhat) incremental: if some resources could not be downloaded on the first run, the result is incomplete, and a second run may be able to fetch more online resources and generate a (more) complete result. This is definitely true for media resources (images and videos).

But there's the catch: some online resources are not cached or saved properly, and if at the time of the second run the online availability of certain resources, e.g. Twitter user profiles, is worse than on the first run, the resulting .md and .html files might be less complete than after the first run. My work on this script focuses on improving this, because I think it will become more important if / when Twitter's API becomes less reliable over time. Ideally, it should be safe to run this script even when Twitter is offline or returns only empty / nonsensical responses. We're definitely not there yet, and it's not trivial.

In issue #144 and in this branch / fork we're working on downloading and saving referenced tweets.
The new tweets are merged with the ones in the archive (without modifying the original file), so that the locally available data only becomes more complete over time. But there are still some bugs: the tweet cache grows a bit each time instead of settling on a "complete" state and staying there forever.

If you want to be 100% sure, copy the full output of the script before running it again. If you want to be 99.5% sure, copy the .html and .md output; the media folder will be fine anyway.
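To illustrate the kind of merge described above, here is a minimal sketch in Python, assuming tweets are held as dicts keyed by tweet id. `merge_tweets` is a hypothetical helper, not the script's actual code. Keying by id and only filling in missing fields makes the merge idempotent (merging the same download twice changes nothing), and an empty or failed response never removes data that is already there.

```python
from typing import Any

Tweet = dict[str, Any]

# Values treated as "no data yet"; a field holding one of these may be filled in.
EMPTY = (None, "", [], {})


def merge_tweets(archive: dict[str, Tweet], downloaded: dict[str, Tweet]) -> dict[str, Tweet]:
    """Return a new mapping of tweet id -> tweet; neither input is modified.

    Existing (archive) values win on conflict, and an empty / failed download
    never replaces or removes data that is already present.
    """
    # Shallow-copy every archived tweet so the caller's data stays untouched.
    merged = {tweet_id: dict(tweet) for tweet_id, tweet in archive.items()}
    for tweet_id, tweet in downloaded.items():
        if not tweet:
            continue  # failed or empty response: keep whatever we already have
        existing = merged.setdefault(tweet_id, {})
        for key, value in tweet.items():
            # Only fill in fields that are missing or empty; never overwrite.
            if existing.get(key) in EMPTY:
                existing[key] = value
    return merged
```

Because the result is keyed by id, running the merge again with the same downloaded data yields an identical mapping rather than a growing one, which is the "settling on a complete state" behaviour the comment above is aiming for.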
Thank you, I had this actual literal question and just came here to ask it. 👍🏻
OK, could this info go in the wiki or somewhere?
@timhutton where do you think would be the best place to put such documentation?
Is it safe to run this script multiple times on the same archive? Is it safe to run new versions of the script on versions of an archive that have already been processed by an old version of the script? Whether or not the script is idempotent should be documented in the README, IMO.
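One way to check this for a particular run is to fingerprint the output folder before and after re-running the script and compare. A minimal sketch in Python, assuming the output lives in a single folder; `snapshot` and `diff_snapshots` are hypothetical helpers, not part of the script:

```python
import hashlib
from pathlib import Path


def snapshot(output_dir: str) -> dict[str, str]:
    """Map each file under output_dir (as a relative POSIX path) to the SHA-256 of its contents."""
    root = Path(output_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Usage: take `before = snapshot("output")`, re-run the script with the same answers, take `after = snapshot("output")`, then inspect `diff_snapshots(before, after)`. Three empty lists mean that run was idempotent. New files under the media folder are expected when a second run manages to fetch more resources, while a changed or removed .md or .html file is the catch described above and worth a closer look.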