Feature request: Export unshortened URLs to CSV (e.g. for archiving them in the Internet Archive) #90

Open
jpluimers opened this issue Nov 20, 2022 · 6 comments

Comments

@jpluimers

Michele Weigle has a nice thread on archiving t.co links in the Internet Archive, using an Internet Archive service that archives the URLs listed in a Google Sheets spreadsheet: https://twitter.com/weiglemc/status/1593698822257102851

Her script prepares the list of t.co URLs using awk: https://gist.github.com/weiglemc/312a11356420b3bc4c8e196e8454002a

The idea from that script might be a thing you want to include in your Python script.
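
As a rough illustration of that idea in Python, here is a minimal sketch that pulls unique t.co URLs out of a block of text. It is not a port of the linked awk script; the function name `extract_tco_urls` and the regular expression are assumptions made for this example.

```python
import re

# Assumption: t.co short links are http(s)://t.co/ followed by letters and digits.
TCO_PATTERN = re.compile(r"https?://t\.co/[A-Za-z0-9]+")


def extract_tco_urls(text):
    """Return the unique t.co URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in TCO_PATTERN.findall(text):
        if match not in seen:
            seen.add(match)
            urls.append(match)
    return urls
```

Calling `extract_tco_urls()` on the full text of a tweet archive file would give a de-duplicated list that could be pasted into a spreadsheet column.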

@jpluimers
Author

Related: #42 (via #38 (comment))

@jpluimers jpluimers changed the title Expanding t.co links or archiving then in the Internet Archive Expanding t.co links for archiving them in the Internet Archive Nov 22, 2022
@timhutton
Owner

@jpluimers We do this already, in parse_tweets(). #42 is then taking the next step, which is to make remote calls to t.co (wp.me, etc.) to retrieve the expanded URLs directly for the ones we couldn't find in the archive.

@timhutton
Owner

What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the Internet Archive, though I can see that that is a useful service to humanity and something we could support.
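
For illustration, a minimal sketch of what exporting those mappings to CSV could look like. It assumes each tweet is a dict with an `entities.urls` list whose entries carry `url` (the t.co link) and `expanded_url`, as in the Twitter archive JSON. The function names `collect_url_mappings` and `write_mappings_csv` are hypothetical and are not part of the existing script.

```python
import csv


def collect_url_mappings(tweets):
    """Map each short URL to its expanded URL, keeping the first mapping seen.

    Assumption: every tweet is a dict shaped like
    {"entities": {"urls": [{"url": ..., "expanded_url": ...}]}}.
    """
    mappings = {}
    for tweet in tweets:
        for entry in tweet.get("entities", {}).get("urls", []):
            short = entry.get("url")
            expanded = entry.get("expanded_url")
            if short and expanded:
                mappings.setdefault(short, expanded)
    return mappings


def write_mappings_csv(mappings, out_file):
    """Write a header row, then one short_url,expanded_url row per mapping."""
    writer = csv.writer(out_file)
    writer.writerow(["short_url", "expanded_url"])
    for short, expanded in mappings.items():
        writer.writerow([short, expanded])
```

Passing a file opened with `open("urls.csv", "w", newline="")` as `out_file` would produce a CSV that can be imported into a spreadsheet.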

@jpluimers
Author

> What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the internet archive, though I can see that that is a useful service to humanity and something we could support.

That would be cool. What's the best way to rephrase this issue to reflect that intent better?

@cooljeanius

> What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the internet archive, though I can see that that is a useful service to humanity and something we could support.

Yeah, if there were some sort of "export to Google Sheet" option to make the step outlined in the 3rd tweet of the thread linked in the OP easier, I would find that useful: https://twitter.com/weiglemc/status/1593698828171067393

@timhutton timhutton changed the title Expanding t.co links for archiving them in the Internet Archive Feature request: Export unshortened URLs to CSV (e.g. for archiving them in the Internet Archive) Nov 29, 2022
@timhutton timhutton reopened this Nov 29, 2022
@cooljeanius

> What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the internet archive, though I can see that that is a useful service to humanity and something we could support.

> Yeah if there could be some sort of "export to Google Sheet" option to make the step outlined in the 3rd tweet of the thread linked in the OP easier, I would find that useful: https://twitter.com/weiglemc/status/1593698828171067393

...Archive link to it, in case it becomes inaccessible itself: http://web.archive.org/web/20221125001650/https://twitter.com/weiglemc/status/1593698828171067393
