Create Automated Benchmarking Suite #366

TimothyStiles · 2023-09-23T19:32:28Z

It'd be really cool to have a benchmarking suite that we can run to see if we've unintentionally introduced any performance changes before merging into main.

The idea would be that on PR creation we'd run the benchmarks on both the main branch and the PR branch, and use the results to highlight any significant changes (positive and negative).

We can start slow with what we'd consider "problem areas" and continue out from there.
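As a rough illustration of the main-vs-PR comparison, here is a minimal sketch in Go. It assumes both branches have been benchmarked with `go test -bench` and that the text output of each run has been captured. The function names (`parseBench`, `significantChanges`), the benchmark names, and the 10% threshold are all hypothetical, not anything that exists in the repo.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"sort"
	"strconv"
	"strings"
)

// parseBench pulls the ns/op figure for each benchmark out of the text
// printed by `go test -bench`. Lines that don't look like benchmark
// results (headers, PASS, ok ...) are skipped.
func parseBench(output string) map[string]float64 {
	results := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Expected shape: <name> <iterations> <value> ns/op ...
		if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") || fields[3] != "ns/op" {
			continue
		}
		nsPerOp, err := strconv.ParseFloat(fields[2], 64)
		if err != nil {
			continue
		}
		results[fields[0]] = nsPerOp
	}
	return results
}

// Change records a benchmark whose time moved by at least the threshold.
type Change struct {
	Name    string
	BaseNs  float64
	HeadNs  float64
	Percent float64 // positive means slower on the PR branch
}

// significantChanges compares base (main) against head (PR) and returns
// every benchmark present in both whose ns/op changed by at least
// thresholdPercent in either direction, sorted by name.
func significantChanges(base, head map[string]float64, thresholdPercent float64) []Change {
	var changes []Change
	for name, baseNs := range base {
		headNs, ok := head[name]
		if !ok || baseNs == 0 {
			continue
		}
		percent := (headNs - baseNs) / baseNs * 100
		if math.Abs(percent) >= thresholdPercent {
			changes = append(changes, Change{name, baseNs, headNs, percent})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Name < changes[j].Name })
	return changes
}

func main() {
	// Hypothetical captured output; real runs would come from CI.
	mainOutput := "BenchmarkParseGenbank-8 100 2000000 ns/op\nBenchmarkParseFasta-8 500 1000000 ns/op\n"
	prOutput := "BenchmarkParseGenbank-8 100 3000000 ns/op\nBenchmarkParseFasta-8 500 1010000 ns/op\n"

	for _, c := range significantChanges(parseBench(mainOutput), parseBench(prOutput), 10) {
		fmt.Printf("%s: %.0f -> %.0f ns/op (%+.1f%%)\n", c.Name, c.BaseNs, c.HeadNs, c.Percent)
	}
}
```

A single run per branch is noisy, so a fixed percentage threshold like this would probably flag false positives on shared CI runners. Repeating each benchmark several times and comparing the runs statistically (which is what the `benchstat` tool from `golang.org/x/perf` is for) would likely be the sturdier option.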

carreter · 2023-09-23T20:18:54Z

I think this might combine well with #362. If we have a tutorial series that takes a real-world example from end to end through our package, we can also use it to benchmark performance.

Koeng101 · 2023-09-24T02:30:59Z

I think it'd be great to benchmark against all of GenBank, UniProt, or PDB. It would take a server with decent hard drives, or just a lot of data per month to stream, and it would actually validate that our parsers work well.

carreter · 2023-09-24T02:54:12Z

This is a great idea! We could have our new CI/CD pipeline (#365) incorporate this.

I don't think it'd be advisable to have it run against ALL of these massive datasets every time we merge, but we could have it pick a consistent, representative subset.

It'd be nice to also have all new entries in these DBs run against the latest version of our parsers.

Also, these DBs aren't that big size-wise since it's just text and not image data, right? I have no clue, this is a genuine question.

Koeng101 · 2023-09-24T02:57:59Z

GenBank, I think, is a little over a terabyte, so not that bad. UniProt is around 250 GB. SRA, on the other hand, is 33 petabytes (and the Wayback Machine is 57 petabytes), which kind of puts it into perspective. There is NO WAY we could handle SRA, but GenBank + UniProt would probably be doable.

github-actions · 2023-11-23T18:29:48Z

This issue has had no activity in the past 2 months. Marking as stale.

TimothyStiles added this to poly development roadmap Feb 13, 2023

TimothyStiles converted this from a draft issue Sep 23, 2023

carreter added the devops (Improvements to DevOps, e.g. GitHub actions, linting, etc.), low priority (Would be nice to fix, but doesn't have to happen right now/there are more important things), and hard (A major or complex undertaking) labels Sep 23, 2023

carreter mentioned this issue Sep 24, 2023

Move pipelines to more reproducible platform with less YAML #365

Closed

github-actions bot added the stale label Nov 23, 2023
