Fetch multipage articles #897

Open
jonocodes opened this issue Jul 25, 2024 · 1 comment

@jonocodes

I have noticed that for multi-page articles, readability only extracts the first page. Postlight Parser does not have this limitation: it usually manages to page through to the end and capture the whole article.

Ars Technica, for example, has multi-page articles, like this one:
https://arstechnica.com/tech-policy/2024/05/how-dark-money-groups-help-private-isps-lobby-against-municipal-broadband/

This appears to be how Postlight does it:
https://github.com/postlight/parser/blob/main/src/extractors/generic/next-page-url/extractor.js

@fchasen

fchasen commented Aug 8, 2024

While that does seem like a good use case, I think we'd want to implement this in readers that use readability instead of in the library itself.

I'd imagine it would be something like calling readability on each page that the reader finds for a URL.
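
A rough sketch of what that reader-side loop might look like, under some assumptions: `fetchMultipage` and the three hooks `fetchPage`, `parsePage`, and `findNextUrl` are hypothetical names supplied by the reader, not part of readability's API, and joining page contents with a newline is just one possible way to merge them. The loop resolves relative next-page links, stops on a repeated URL, and caps the number of pages.

```javascript
// Sketch only: fetchPage, parsePage and findNextUrl are hypothetical hooks
// provided by the reader application, not functions exported by readability.
async function fetchMultipage(startUrl, { fetchPage, parsePage, findNextUrl, maxPages = 20 }) {
  const seen = new Set();   // normalized URLs already visited (guards against loops)
  const articles = [];      // one parsed article object per page
  let url = new URL(startUrl).href;

  while (url && !seen.has(url) && articles.length < maxPages) {
    seen.add(url);

    const page = await fetchPage(url);      // reader-specific: download the page
    const article = parsePage(page, url);   // reader-specific: run the extractor on it
    if (!article) break;                    // nothing extractable, stop paging
    articles.push(article);

    // The next link may be relative, so resolve it against the current page URL.
    const next = findNextUrl(page, url);
    url = next ? new URL(next, url).href : null;
  }

  return {
    title: articles.length > 0 ? articles[0].title : null,
    content: articles.map((a) => a.content).join("\n"),
    pageCount: articles.length,
  };
}
```

In a real reader, `parsePage` could build a DOM for the page and return the result of `new Readability(document).parse()`, while `findNextUrl` would hold whatever next-page heuristic the reader prefers (for instance, looking for a `rel="next"` link). Keeping those as injected hooks leaves the paging logic independent of how pages are fetched or parsed.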
