
Graceful handling of connection errors #35

Merged
merged 2 commits into from
Dec 8, 2023


@jannisborn (Owner) commented Dec 7, 2023

Close #34

When scraping biorxiv/medrxiv, connection errors occasionally occur, as described in #34. This PR handles such errors more gracefully: when a connection error occurs, the download of the same batch of papers is retried up to max_retries times.
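A minimal sketch of the "retry the same batch" idea, in Python. This is not paperscraper's actual code. `fetch_with_retries`, `iterate_batches`, and the `fetch_batch(cursor)` callable are hypothetical names, and cursor-offset pagination is an assumption about how batches are requested. The sketch retries on the built-in `ConnectionError`, which covers `http.client.RemoteDisconnected` (the error reported in #34). A `requests`-based client would pass `retry_on=(requests.exceptions.ConnectionError,)` instead, because `requests` wraps the low-level error in its own exception type.

```python
import logging
import time
from typing import Callable, Iterator, List, Tuple, Type

logger = logging.getLogger(__name__)

# Hypothetical signature: fetch_batch(cursor) returns the papers starting at
# offset `cursor`, or an empty list once everything has been fetched.
FetchBatch = Callable[[int], List[dict]]


def fetch_with_retries(
    fetch_batch: FetchBatch,
    cursor: int,
    max_retries: int = 10,
    delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> List[dict]:
    """Request the batch at `cursor`, retrying the SAME cursor on errors.

    Makes one initial attempt plus up to `max_retries` retries, then
    re-raises the last error.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return fetch_batch(cursor)
        except retry_on as error:
            if attempt == max_retries:
                raise  # retries exhausted: surface the original error
            logger.error(
                "Connection error: %s. Retrying (%d/%d)",
                error, attempt + 1, max_retries,
            )
            time.sleep(delay)  # brief pause before hitting the API again
    raise AssertionError("unreachable")


def iterate_batches(fetch_batch: FetchBatch, **retry_kwargs) -> Iterator[dict]:
    """Yield papers batch by batch; a failed batch is retried, not skipped."""
    cursor = 0
    while True:
        batch = fetch_with_retries(fetch_batch, cursor, **retry_kwargs)
        if not batch:
            break
        yield from batch
        cursor += len(batch)  # advance only after a batch has succeeded
```

The key design choice is that the cursor advances only after a successful response. A dropped connection therefore resumes at the same batch instead of silently skipping papers, and the log line format mirrors the "Retrying (1/10)" message in the output further down this thread.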

Version bump to 0.2.8

@jannisborn jannisborn added the invalid This doesn't seem right label Dec 7, 2023
@jannisborn (Owner, Author)

Still downloading, but the output now looks like this:

>>> from paperscraper.get_dumps import biorxiv, medrxiv, chemrxiv
WARNING:paperscraper.load_dumps: No dump found for biorxiv. Skipping entry.
WARNING:paperscraper.load_dumps: No dump found for chemrxiv. Skipping entry.
WARNING:paperscraper.load_dumps: No dump found for medrxiv. Skipping entry.
WARNING:paperscraper.load_dumps: No dumps found for either biorxiv or medrxiv. Consider using paperscraper.get_dumps.* to fetch the dumps.
>>> medrxiv()
5101it [03:59, 22.87it/s]ERROR:paperscraper.xrxiv.xrxiv_api:Connection error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Retrying (1/10)
26101it [24:57,  5.00it/s]

@jannisborn jannisborn merged commit db4f0c1 into master Dec 8, 2023
1 check passed
@jannisborn jannisborn deleted the grace_api branch December 8, 2023 05:19
Linked issue (may be closed by merging this pull request): Remote diconnected and didnt download files