Can we stop accepting unpreservable pages? #5836

Open
non-descriptive opened this issue Aug 16, 2024 · 7 comments

Comments

@non-descriptive
Contributor

Once in a while I come across articles in TWiR that are either hosted on a self-hosted site behind some kind of DDoS protection, which may block me because of my VPN or geo-IP resolution, or are behind a paywall (some Medium.com posts were like that, if I remember right). Those articles might be good, but they are likely to disappear in the long run, which would leave dangling links in TWiR. Usually I just run "Save page" on web.archive.org for those links and read the archived copy, but sometimes sites block archiving, which means that if the site goes down, all the information goes with it.
Event pages, videos and audio are understandably hard to preserve, but text is pretty easy, so could we stop accepting posts that cannot be archived? Maybe we could add a small routine that saves them before they are added to a new issue.
Most recent example:
557: Async Rust: The new billion-dollar mistake?

@oskgo

oskgo commented Aug 22, 2024

Most pages will perish in the long run, and "This Week in Rust" isn't an archive. It's a newsletter. Wouldn't requiring preservability harm that purpose?

What are your criteria for pages being "preservable"? What are examples of preservable blog posts? Can blogs do something to become preservable?

I agree with not accepting blogs that are behind a paywall, but I haven't seen that happen recently and the Readme explicitly lists things behind a pay- or register-wall as not wanted.

@non-descriptive
Contributor Author

What are your criteria for pages being "preservable"?

If you can run it through "Save Page" at https://web.archive.org/ and get at least a text snapshot, then it fits that category. Otherwise the article will not be accessible to many people, for the reasons I listed above.

@extrawurst
Contributor

extrawurst commented Aug 27, 2024

Does web.archive.org have an API for this?

That's really the only way to make this feasible, because then it can be made part of CI.

@non-descriptive
Contributor Author

Yes, they do. Wikipedia bots certainly use it to archive some external references, but I'm not sure exactly which API they use.
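As a rough illustration of the CI idea, here is a minimal sketch that asks whether the Wayback Machine already holds a snapshot of a link. It assumes the public availability endpoint `https://archive.org/wayback/available`, which is not named anywhere in this thread; the helper names `availability_url`, `closest_snapshot` and `check_link` are hypothetical, not part of any existing TWiR tooling.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the availability-API query URL for page_url (hypothetical helper)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if the API reports none."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_link(page_url: str, timeout: float = 30.0) -> Optional[str]:
    """Query the API over the network; a CI job could flag links returning None."""
    with urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

A `None` result only means no snapshot exists yet, not that the site forbids archiving. Actually requesting a new capture through "Save Page Now" would be a separate step, and it is not shown here.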

@oskgo

oskgo commented Aug 27, 2024

It seems like the example site explicitly forbids archiving and actively blocks it (check out the "reproduction" section at the bottom of the page). When I wrote my first comment, I thought the problem was accidental.

I really don't like this, but for integrity reasons rather than accessibility ones.

@non-descriptive
Contributor Author

non-descriptive commented Aug 27, 2024

It seems like the example site explicitly forbids archiving their site

I can't look at any of it because:
a) IP addresses from my country are banned by their Cloudflare front end
b) IP addresses belonging to the VPNs/proxies I managed to use are also banned by their Cloudflare front end

So the only ways I tried to read it through a proxy were Google Translate and the Internet Archive. Good articles are normally preserved and often already have several snapshots, but that particular article blocks both of them.

@bennyvasquez
Contributor

I think it's reasonable to try to do this for accessibility reasons, but I don't think it's in the best interest of our readers at large to blanket-prohibit articles from being submitted only because they might not be archivable or might not be around forever.
