The RCSB PDB also provides all files in gzipped format. Therefore, to improve download times in database.rcsb.fetch(), one could optionally download the gzipped files and decompress the HTTP response content via Python's gzip module before writing the structure file to disk.
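A minimal sketch of the idea, not Biotite's actual implementation: the helper names `write_decompressed` and `fetch_gzipped` are hypothetical, and the URL pattern is an assumption that should be checked against the RCSB file download documentation.

```python
import gzip
import urllib.request
from pathlib import Path

# Assumed URL pattern for gzipped PDB files; verify against the RCSB docs.
RCSB_GZ_URL = "https://files.rcsb.org/download/{pdb_id}.pdb.gz"


def write_decompressed(gz_content: bytes, target: Path) -> Path:
    """Decompress gzipped bytes in memory and write the result to disk."""
    target.write_bytes(gzip.decompress(gz_content))
    return target


def fetch_gzipped(pdb_id: str, target_dir: str = ".") -> Path:
    """Download a gzipped PDB file and store it uncompressed (hypothetical helper)."""
    url = RCSB_GZ_URL.format(pdb_id=pdb_id)
    with urllib.request.urlopen(url) as response:
        gz_content = response.read()
    return write_decompressed(gz_content, Path(target_dir) / f"{pdb_id}.pdb")
```

Calling `fetch_gzipped("1l2y")` would then leave an uncompressed `1l2y.pdb` in the current directory, so downstream code would not need to know that the transfer was compressed.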
I'm really keen to contribute to Biotite, so I ran a few tests on this. It seems that the speed-up from downloading gzipped files is fairly negligible once you account for the time required to unzip the file. The results were generated using repeat() from timeit with 10 runs and 100 repetitions (1000 repetitions in total) and are in the table below. You can find the test code here.
| download type | time (s) |
| --- | --- |
| pdb | 5.02787 |
| gzipped pdb | 5.00965 |
| difference | 0.01822 |
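For reference, a benchmark of this shape can be sketched with `timeit.repeat`. This is not the script linked above; the `benchmark` helper is hypothetical, and mapping "10 runs and 100 repetitions" to `repeat=10, number=100` is an assumption.

```python
import timeit
from statistics import mean


def benchmark(func, repeat=10, number=100):
    """Time `func` in `repeat` batches of `number` calls each.

    Returns the per-batch timings in seconds together with their mean
    and minimum, since the minimum is usually the least noisy value.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return {"timings": timings, "mean": mean(timings), "min": min(timings)}


# Example with a cheap stand-in workload; in a real comparison `func`
# would be a plain download and a gzipped download plus decompression.
result = benchmark(lambda: sum(range(1000)), repeat=3, number=10)
print(f"mean per batch: {result['mean']:.6f} s")
```

Because network latency varies between calls, comparing the minimum as well as the mean helps judge whether a difference of a few hundredths of a second is meaningful.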
There might be a way to eke out more performance, but I'm not sure how you'd do it. If you still think this is worth adding to the library - I'm happy to finish off the implementation. Let me know what you think!
Thanks for the benchmark. I created a modified version of your script (larger structure, omitted writing step) and found similar results: the differences are marginal, and it is not clear which one is faster.
Still, a compressed download probably makes sense in case the bandwidth is the limiting factor. I just would not use it as the default. So if you would still like to implement this feature, feel free to do so 👍.