Support compressed download in database.rcsb.fetch() #532

Open
padix-key opened this issue Feb 3, 2024 · 4 comments

Comments

@padix-key
Member

The RCSB PDB also provides all files in gzipped format. Therefore, to improve download times in database.rcsb.fetch(), one could optionally download the gzipped files and decompress the HTTP response content via Python's gzip module before writing the structure file to disk.
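
As a rough illustration of the idea, the following sketch downloads a file, decompresses the response in memory with the gzip module, and writes the plain structure file to disk. It is not Biotite's implementation: `fetch_structure`, `decompress_if_gzipped`, the `compressed` parameter and the URL pattern are assumptions made for this example, and it uses the standard library's urllib only to stay self-contained.

```python
import gzip
import urllib.request

# Assumption: download URL pattern of the RCSB file server; verify before relying on it
_BASE_URL = "https://files.rcsb.org/download"


def decompress_if_gzipped(content: bytes) -> bytes:
    """Return the decompressed payload if `content` is gzip data, else return it unchanged."""
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b
    if content[:2] == b"\x1f\x8b":
        return gzip.decompress(content)
    return content


def fetch_structure(pdb_id: str, target_path: str, compressed: bool = False) -> None:
    """Hypothetical helper (not Biotite's API): download a PDB file and store it uncompressed."""
    suffix = ".pdb.gz" if compressed else ".pdb"
    url = f"{_BASE_URL}/{pdb_id}{suffix}"
    with urllib.request.urlopen(url) as response:
        content = response.read()
    # The file on disk is always the plain structure file, whichever variant was downloaded
    with open(target_path, "wb") as file:
        file.write(decompress_if_gzipped(content))
```

A call such as `fetch_structure("1l2y", "1l2y.pdb", compressed=True)` would then leave the same file on disk as the uncompressed download, so callers would not need to care which variant was transferred.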

@Orpowell

Orpowell commented Feb 25, 2024

Hi!

I'm really keen to contribute to Biotite, so I ran a few tests on this. It seems that the speed-up from downloading gzipped files is fairly negligible once you account for the time required to unzip the file. The results were generated using repeat() from timeit with 10 runs of 100 repetitions each (1000 repetitions in total) and are in the table below. You can find the test code here.

| download type | time (s) |
|---|---|
| pdb | 5.02787 |
| gzipped pdb | 5.00965 |
| difference | 0.01822 |
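
For readers who want to reproduce this kind of measurement, a minimal sketch of a `timeit.repeat()` harness follows. It is not the linked test code: the `benchmark` helper is hypothetical, and the workload is a stand-in that decompresses an in-memory payload rather than downloading from the RCSB, so the sketch runs offline and its timings say nothing about real download speed.

```python
import gzip
import statistics
import timeit


def benchmark(func, repeat=10, number=100):
    """Time `func` with timeit.repeat(); return the total seconds of each of the `repeat` runs."""
    return timeit.repeat(func, repeat=repeat, number=number)


# Stand-in workload (assumption): synthetic PDB-like lines, compressed in memory
raw = b"ATOM      1  N   ASN A   1\n" * 10_000
packed = gzip.compress(raw)

# Small repeat/number values keep the example quick; the thread used 10 and 100
timings = benchmark(lambda: gzip.decompress(packed), repeat=3, number=10)
print(f"mean per run: {statistics.mean(timings):.5f} s, best: {min(timings):.5f} s")
```

To compare the two download types, one would pass a function that fetches the plain file and one that fetches and decompresses the gzipped file, then compare the resulting timing lists.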

There might be a way to eke out more performance, but I'm not sure how you'd do it. If you still think this is worth adding to the library, I'm happy to finish off the implementation. Let me know what you think!

Cheers,

Ollie

@padix-key
Member Author

padix-key commented Feb 26, 2024

Thanks for the benchmark. I created a modified version of your script (larger structure, omitted writing step) and found similar results: the differences are marginal, and it is not clear which variant is faster.

Still, a compressed download probably makes sense when bandwidth is the limiting factor. I just would not make it the default. So if you would still like to implement this feature, feel free to do so 👍.

@Orpowell

Orpowell commented Mar 1, 2024

Awesome, I'll start working on it 👍.

@padix-key
Member Author

Hi, are you still planning to implement this feature? If not, that is fine too, but I would then free up the issue again.
