-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RE2022-217: Workspace uploader performance #426
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #426 +/- ##
==========================================
+ Coverage 46.97% 47.00% +0.02%
==========================================
Files 75 75
Lines 6518 6612 +94
==========================================
+ Hits 3062 3108 +46
- Misses 3456 3504 +48 ☔ View full report in Codecov by Sentry. |
save_assemblies_from_fastas_integration_test.py::test_import_fasta_mass_timing Start uploading to 69141 in test_import_fasta_mass_timing fn GCA_002506415.1_ASM250641v1_genomic.fna.gz
1693955723.5434816: Shock url: https://ci.kbase.us/services/shock-api
----------- coverage: platform linux, python 3.6.3-final-0 ----------- ========================= 1 passed in 83.58s (0:01:23) ========================= |
I highlight the key time stats from the console via running
|
@MrCreosote @Tianhao-Gu
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor stuff in this review I think
Still need to review tests, but I'll wait until the code is done
I'll be really glad when we merge this PR, it takes a long time to review...
Note to self: all my comments above this point have been addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, went thought the tests finally
Addressed all comments in the tests. We are getting really close to merge this PR. Maybe after Podman test ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, looking about done. I think there are 2 things left beyond the fairly trivial comment in this review:
- Test the code with podman, including the recovery code, now that it's fixed
- @Tianhao-Gu should take another look as the PR has been basically completely rewritten since his review and I've looked at it so many times now for so long I've probably lost objectivity
@MrCreosote Tested the uploader with podman, including the recovery code. Everything looks good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good except some typos. Are we planning to refactor it and upload genome all together in the future? Currently, workspace downloader is expecting every Assembly has an associated Genome obj.
If not supplied, the token must be provided in the environment variable KB_AUTH_TOKEN. | ||
The KB_ADMIN_AUTH_TOKEN environment variable will get set by this token if the user runs as catalog admin. | ||
au_service_ver (str): The service verison of AssemblyUtilClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
au_service_ver (str): The service verison of AssemblyUtilClient | |
au_service_ver (str): The service version of AssemblyUtilClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
workers (int): The number of workers to use for multiprocessing. | ||
max_callback_server_tasks (int): The maxmium subtasks for the callback server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_callback_server_tasks (int): The maxmium subtasks for the callback server. | |
max_callback_server_tasks (int): The maximum number of subtasks for the callback server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
"--cbs_max_tasks", | ||
type=int, | ||
default=20, | ||
help="The maxmium subtasks for the callback server", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help="The maxmium subtasks for the callback server", | |
help="The maximum subtasks for the callback server", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
yep
There's no toggle to choose to d/l genomes or not? |
As far as I understand, no option to ignore genome existence. Looks to me, it doesn't really download the genome obj. It just saves the genome_upa to the meta file. |
Yeah, it needs that for exporting genome sets. There's no real need for the genome data proper in the pipeline as yet |
No description provided.