
[Suggestion] Creation of a single timeline-styled index.html file comprising all tweets in sorted order (most recent on top) #130

Open
Wikinaut opened this issue Nov 26, 2022 · 4 comments


@Wikinaut
Contributor

I would prefer to have a single timeline-styled index.html file comprising all tweets in sorted order (most recent on top).

This also allows a quick search using the browser's search function.

See also this example: https://github.com/Webklex/tbm (Twitter Bookmark Manager)

@lenaschimmel
Collaborator

We had a single html file with all tweets before. For users with many tweets, it was impossible to open the file with a browser, see #103.

Even with a relatively small number of 3000 tweets, it was not completely broken, but slow, as you can see in this comment. Therefore I think separate HTML files for each month are a good default.

For users with (much) fewer than 3000 tweets, maybe this would be a good option? I think it would be a nice addition, but probably less important than the other things we're currently working on.

@flauschzelle
Collaborator

What do you think would be the maximum number of tweets that can be in one file, so that it doesn't break or slow down the browser too much?

@Wikinaut
Contributor Author

Wikinaut commented Dec 17, 2022

I am working on a postprocessor.

I intentionally do not use sed's -i option, to avoid modifying the original HTML files.

The first approach creates a raw all.html file (done):

for f in *Tweet-Archive*.html; do
  sed "1,/<body/d; /<\/body/,\$d; s/<h1>Your twitter archive<\/h1>/<h3>$f<\/h3><hr>/g" "$f" > "$f.body"
done
cat $(ls -r -- *.body) > all.html   # newest month first (reverse name order)

Next steps:

  • add the HTML head and body back (they are stripped above)
  • add styling, especially for images (a fixed smaller size, scaled by the browser, because we usually do not have thumbnail images)
  • ad hoc: sed "s/<img src=\"media/<img style='width:25%' src=\"media/g" all.html > all.smallimages.html
  • add jQuery for lazy loading of the images (images in the viewport are fetched immediately, the others only when scrolled into view)
  • optionally later: add cache for thumbnail images and use them when needed
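A minimal sketch of the first two next steps (head/body and image styling), assuming the raw all.html produced above and that local images appear exactly as `<img src="media/...">`, which is what the ad hoc sed relies on. Instead of the jQuery lazy loader, it swaps in the browser's native `loading="lazy"` image attribute; the helper name `wrap_all_html` and the output name `all.full.html` are hypothetical.

```shell
#!/bin/sh
# Sketch: wrap the raw all.html in a complete HTML page with smaller,
# lazily loaded images.
# ASSUMPTIONS: images appear as <img src="media/..."> (as the ad hoc sed
# above expects); native loading="lazy" replaces the jQuery loader.

# wrap_all_html IN OUT  (hypothetical helper; OUT must differ from IN,
# because OUT is truncated before IN is read)
wrap_all_html() {
  {
    printf '%s\n' '<!DOCTYPE html>' '<html>' '<head>' \
      '<meta charset="utf-8">' \
      '<title>Twitter archive</title>' \
      '<style>img { width: 25%; height: auto; }</style>' \
      '</head>' '<body>'
    # Let the browser defer off-screen images (native lazy loading).
    sed 's/<img src="media/<img loading="lazy" src="media/g' "$1"
    printf '%s\n' '</body>' '</html>'
  } > "$2"
}
```

Usage: `wrap_all_html all.html all.full.html`. Keeping the image size in one CSS rule, rather than an inline `style` on every `<img>`, makes it easy to change later.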

Basic data for my Twitter archive:
all.html:

  • size: ~32 MB
  • number of archived tweets: 82,886 (grep -c /media/tweet.ico all.html)
  • number of local media objects (images, videos) linked in all.html: 15,166, totalling 9.2 GB (grep -o \"media/ all.html | wc -l)

It works: loading all.html, including the full-size images, takes ~130 seconds*) on my slow Intel NUC i7 (15 W TDP).

*) Including browser cold-start time, to rule out caching of the local images during this test.

@Wikinaut
Contributor Author

@flauschzelle

Here is the final code in one file:

https://gist.github.com/Wikinaut/39b2be7a5570a6cd41181f11c2577e30#file-patch-parsed-twitter-archive-sh

Excerpt:

# Patch parsed Twitter Archive Parser
# Postprocessor for files generated by https://github.com/timhutton/twitter-archive-parser
# init 18.12.2022

# Usage: in the parsed twitter-archive directory with the numerous *.html, run
# ./patch-parsed-twitter-archive.sh

# It:
# - creates one new *.html.body file for each existing monthly (or so) *.html file
# - patches and strips several HTML tags
# - adds jQuery and the lazy image loader
# - adds links to the images, so the original image can be opened in a new tab
# - concatenates all *.html.body files into a single all.html file

# TODO:
# Within each monthly block the tweets run in ascending order (most recent tweet at the end),
# whereas the monthly blocks are concatenated with the most recent month on top.
# Example: NOV 1, 2, ... 30, OCT 1, 2, ... 31

Example output with a video and an image:
[screenshot]
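One way the TODO in the script excerpt could be resolved, sketched under two assumptions that are not from the original script: every tweet block in a *.html.body file starts on a line containing tweet.ico (the same marker the tweet count above relies on), and the *.html.body file names sort chronologically, which `ls -r` in the first approach also relies on. The hypothetical `reverse_tweets` reverses the tweet blocks within one file; the hypothetical `build_all_html` prepends each month in turn, so the result is most recent first throughout.

```shell
#!/bin/sh
# Sketch: build all.html with the most recent tweet on top throughout.
# ASSUMPTIONS (not from the original script):
#   - every tweet block starts on a line containing "tweet.ico"
#   - *.html.body file names sort chronologically (e.g. contain YYYY-MM)

# Print the tweet blocks of one file in reverse order.
# Lines before the first tweet (e.g. the <h3> month header) stay on top.
reverse_tweets() {
  awk '
    BEGIN { n = 0 }                        # block 0 holds the header
    /tweet\.ico/ { n++ }                   # a new tweet block starts here
    { block[n] = block[n] $0 "\n" }        # append the line to the current block
    END {
      printf "%s", block[0]
      for (i = n; i >= 1; i--) printf "%s", block[i]   # newest tweet first
    }
  ' "$1"
}

# Prepend each month's reversed blocks to all.html. The glob expands in
# ascending name order, so the newest month ends up on top.
build_all_html() {
  out=all.html
  started=false
  for f in *.html.body; do
    [ -e "$f" ] || continue                # no match: the glob stays literal
    if [ "$started" = false ]; then : > "$out"; started=true; fi
    { reverse_tweets "$f"; cat "$out"; } > "$out.tmp" && mv "$out.tmp" "$out"
  done
}
```

Run `build_all_html` in the directory containing the *.html.body files. Caveat: any markup after the last tweet of a file travels with that tweet's block and so moves to the top of the month.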
