RE2022-272: Add a bulk version of genbank_to_genome in GFU #208

Merged
Xiangs18 merged 174 commits into master on May 21, 2024

Xiangs18
Contributor

No description provided.

codecov bot commented Jan 27, 2024

Codecov Report

Attention: Patch coverage is 99.42197%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.61%. Comparing base (d48a690) to head (5097ef0).

Files | Patch % | Lines
lib/GenomeFileUtil/GenomeFileUtilImpl.py | 90.47% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   79.25%   80.61%   +1.35%     
==========================================
  Files          11       11              
  Lines        2902     3007     +105     
==========================================
+ Hits         2300     2424     +124     
+ Misses        602      583      -19     


@Xiangs18 Xiangs18 requested a review from MrCreosote January 29, 2024 19:35
Resolved review threads:
GenomeFileUtil.spec (1 thread)
lib/GenomeFileUtil/core/GenbankToGenome.py (10 threads, 8 outdated)
lib/GenomeFileUtil/GenomeFileUtilImpl.py (3 threads, all outdated)
@Xiangs18 Xiangs18 requested a review from MrCreosote May 14, 2024 21:24
Member

@MrCreosote MrCreosote left a comment

Ok, this is looking pretty good now. Lesson learned: we need to be a lot more careful about making smaller changes per PR. If it looks like the changes are going to be big we should get together and try and hash out a way to split things up.

Here's the list of stuff I can find we've said we need to do in future PRs:

  • Add tests for the various input parameters for the bulk method
  • delete export_genome_features_protein_to_fasta from spec and recompile
  • validate / parse all data, both genome & assembly data, before saving anything
  • batch saving genomes vs multiple calls to save_one_genome
  • Read through the code looking for places where a lot of stuff is being loaded into memory (e.g. contigs) and be sure that it's removed from memory as soon as possible
  • Same thing for files - there are places where files are copied that might not be necessary or files can be deleted earlier
  • Handle the case where there are > 10000 inputs (workspace will reject)
  • parallelization
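
A rough sketch of how two of the items above (validate everything before saving, and handle > 10000 inputs) might fit together. Every name here (`chunked`, `validate_input`, `save_all`, `save_batch`, `MAX_OBJECTS_PER_SAVE`) is hypothetical and not part of GenomeFileUtil or the workspace client, and the required keys in the validator are made up for illustration. The 10000 limit is taken from the list above.

```python
from typing import Callable, Iterator, List, Sequence

# Assumption: per-call limit, taken from the "> 10000 inputs" note above.
MAX_OBJECTS_PER_SAVE = 10000


def chunked(items: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def validate_input(params: dict) -> dict:
    """Hypothetical validator: the required keys are illustrative only."""
    for key in ("file", "genome_name"):
        if not params.get(key):
            raise ValueError(f"missing required parameter: {key}")
    return params


def save_all(
    inputs: Sequence[dict],
    save_batch: Callable[[List[dict]], None],
    limit: int = MAX_OBJECTS_PER_SAVE,
) -> int:
    """Validate every input, then save in batches; return the number of save calls."""
    # Validate everything first so a bad input fails the run before any save.
    validated = [validate_input(p) for p in inputs]
    calls = 0
    for batch in chunked(validated, limit):
        save_batch(batch)  # stand-in for whatever actually persists a batch
        calls += 1
    return calls
```

Passing the saver in as a callable keeps the sketch independent of any real client; in practice `save_batch` would wrap whatever call persists a batch of genomes.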

@Tianhao-Gu should do the final review & approval for this PR

@Xiangs18
Contributor Author

Xiangs18 commented May 15, 2024

@MrCreosote Have we discussed export_genome_features_protein_to_fasta? It doesn't ring a bell. Is this function no longer in use?

@MrCreosote
Member

lib/GenomeFileUtil/GenomeFileUtilImpl.py
# dict with feature 'id's that have been used more than once.
self.used_twice_identifiers = {}

# related info for genome process and upload
Contributor

does it matter that 'gc_content', 'dna_size', and 'md5' attributes are absent from _Genome()?

Contributor Author

@Xiangs18 Xiangs18 May 16, 2024

It does not matter, because these attributes are assigned before they are used.
For consistency, though, I can initialize them to None in _Genome() in the next PR.

Contributor

👍

Contributor

@Tianhao-Gu Tianhao-Gu left a comment

👍

@Xiangs18
Contributor Author

Xiangs18 commented May 16, 2024

#208 (comment)

@MrCreosote This link only directs me to the kbase_coders channel.

@MrCreosote
Member

ok, try https://kbase.slack.com/archives/C4E7KUGTD/p1710806243115819

@Xiangs18 Xiangs18 merged commit e0b1dd3 into master May 21, 2024
3 checks passed
@Xiangs18 Xiangs18 mentioned this pull request Oct 31, 2024