Create Automated Benchmarking Suite #366
Comments
I think this might combine well with #362. If we have a tutorial series that takes a real-world example from end to end through our package, we can also use it to benchmark performance.
I think it'd be great to benchmark against all of Genbank or uniprot or pdb. Would take a server with decent hard drives, or just a lot of data per month to stream, and would actually validate that our parsers work well.
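As a rough illustration of the streaming idea, here is a minimal Go sketch that reads one gzipped GenBank division file (downloaded ahead of time from an NCBI mirror) and reports throughput. It only counts record terminators as stand-in "work"; the real benchmark would call our actual parser instead. The file name in the comment is just an example.

```go
// Sketch: stream a gzipped GenBank flat file and report throughput.
// Counting "//" terminators stands in for real parsing work.
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	f, err := os.Open(os.Args[1]) // e.g. a division file like gbbct1.seq.gz
	if err != nil {
		panic(err)
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer gz.Close()

	start := time.Now()
	records, bytesRead := 0, int64(0)
	sc := bufio.NewScanner(gz)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow long sequence lines
	for sc.Scan() {
		line := sc.Text()
		bytesRead += int64(len(line)) + 1
		if strings.HasPrefix(line, "//") { // GenBank record terminator
			records++
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	elapsed := time.Since(start)
	fmt.Printf("%d records, %.1f MB/s\n",
		records, float64(bytesRead)/1e6/elapsed.Seconds())
}
```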
This is a great idea! We could have our new CI/CD pipeline (#365) incorporate this. I don't think it'd be advisable to have it run against ALL of these massive datasets every time we merge, but we could have it pick a consistent, representative subset. It'd be nice to also have all new entries in these DBs run against the latest version of our parsers. Also, these DBs aren't that big size-wise since it's just text and not image data, right? I have no clue, this is a genuine question.
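One way to get a "consistent, representative subset" without curating it by hand could be to hash each accession ID and keep a fixed fraction. This is only a sketch of the idea, not existing code in the repo; the package and function names are made up here.

```go
// Sketch: select a deterministic subset of records by hashing accession IDs.
// The same accessions are chosen on every run, so benchmark numbers stay
// comparable between the main branch and a PR branch.
package subset

import "hash/fnv"

// Keep returns true for roughly 1 in n accessions, deterministically.
func Keep(accession string, n uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(accession))
	return h.Sum32()%n == 0
}
```

For example, `Keep("U49845.1", 1000)` would always give the same answer, so the subset never drifts between CI runs, while new database entries naturally fall into (or out of) the subset by the same rule.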
Genbank I think is a little over a terabyte, so not that bad. Uniprot is like 250 GB. SRA, on the other hand, is 33 petabytes (and the Wayback Machine is 57 petabytes), so kinda puts it into perspective. SRA there is NO WAY we could handle, but Genbank+uniprot would probably be doable.
This issue has had no activity in the past 2 months. Marking as stale.
It'd be really cool to have a benchmarking suite that we can run to see if we've unintentionally introduced any performance changes before merging into main.
The idea would be that on PR creation we'd run the benchmarks on both the main branch and the PR branch, and use the results to highlight any significant changes (positive and negative).
We can start slow with what we'd consider "problem areas" and continue out from there.
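A minimal sketch of how a per-"problem area" benchmark could look, assuming a Go codebase (the same pattern exists in most languages' benchmark tooling). The `parseGenbank` helper and the `testdata/sample.gb` fixture are placeholders, not APIs or files that exist in this repo; the real benchmark would call the package's actual parser.

```go
// Sketch: one benchmark per problem area, kept in a *_test.go file.
package bench

import (
	"os"
	"testing"
)

// parseGenbank is a stand-in for whatever parser we want to track;
// swap in the real function from the package.
func parseGenbank(data []byte) int {
	count := 0 // placeholder "work" so the benchmark compiles and runs
	for _, b := range data {
		if b == '\n' {
			count++
		}
	}
	return count
}

// BenchmarkGenbankParse measures parse time for a fixed fixture file.
func BenchmarkGenbankParse(b *testing.B) {
	data, err := os.ReadFile("testdata/sample.gb") // hypothetical fixture
	if err != nil {
		b.Skip("fixture not present:", err)
	}
	b.SetBytes(int64(len(data)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		parseGenbank(data)
	}
}
```

On PR creation, CI could run something like `go test -bench=. -count=10 ./... > new.txt` on the PR branch, do the same on main into `old.txt`, and then run `benchstat old.txt new.txt`, which summarizes both runs and flags which deltas are statistically significant, in either direction.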