Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batch saving genomes #212

Open
wants to merge 24 commits into
base: master
Choose a base branch
from
Open

batch saving genomes #212

wants to merge 24 commits into from

Conversation

Xiangs18
Copy link
Contributor

No description provided.

Copy link

codecov bot commented Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.06%. Comparing base (4819598) to head (01a1dc8).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
+ Coverage   80.88%   81.06%   +0.18%     
==========================================
  Files          11       11              
  Lines        2998     3011      +13     
==========================================
+ Hits         2425     2441      +16     
+ Misses        573      570       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Xiangs18
Copy link
Contributor Author

Xiangs18 commented Aug 16, 2024

self note:

a2fabf4 verifies that the refactored code works with the save_one_genome function.
330d6c2 verifies that the refactored code works with the save_genome_mass function.

@Xiangs18 Xiangs18 changed the title [WIP] add save_genomes add save_genomes Aug 17, 2024
@Xiangs18 Xiangs18 changed the title add save_genomes batch saving genomes Aug 17, 2024
@Xiangs18 Xiangs18 requested a review from MrCreosote August 17, 2024 06:22
Copy link
Member

@MrCreosote MrCreosote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at tests yet, but this is already a lot of comments

EDIT: All comments addressed

RELEASE_NOTES.md Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenbankToGenome.py Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Show resolved Hide resolved
lib/GenomeFileUtil/core/GenomeInterface.py Outdated Show resolved Hide resolved
@Xiangs18 Xiangs18 requested a review from MrCreosote September 10, 2024 23:50
ws_name_to_id_func: Callable[[str], int]
) -> Dict[str, Any]:
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
f"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I was wrong about this. From https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals

Formatted string literals cannot be used as docstrings, even if they do not include expressions.

... which is a bummer.

Returns:
Dict[str, Any]: A dictionary containing the workspace ID and the processed parameters. The dictionary
has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing
has keys {_WSID} and {_INPUTS}, where {_WSID} is the workspace ID and {_INPUTS} is a list containing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

validate_params_func: Callable[[Dict[str, Any]], None]
) -> None:
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
f"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 548 to 549
- _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- _WSID: A workspace ID, which must be present and valid.
- _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.
- {_WSID}: A workspace ID, which must be present and valid.
- {_INPUTS}: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more of these need to be done below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

lib/GenomeFileUtil/core/GenomeUtils.py Outdated Show resolved Hide resolved
lib/GenomeFileUtil/core/GenbankToGenome.py Show resolved Hide resolved
test/problematic_tests/save_genome_test.py Show resolved Hide resolved
if contains:
self.assertIn(error, str(context.exception))
else:
self.assertEqual(error, str(context.exception))

def check_save_one_genome_output(self, ret, genome_name):
def check_save_one_genome_output(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checker barely does checks anything, but I realize it was that way when you got here

Copy link
Member

@MrCreosote MrCreosote Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could've sworn you spent a bunch of time adding rigorous tests to this module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@MrCreosote MrCreosote Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to the comment below - the tests use the check_save_one_genome_output method as a helper for testing mass saves, so the mass save method isn't actually getting tested if it relies on that function

Copy link
Contributor Author

@Xiangs18 Xiangs18 Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mass saves use the check_save_one_genome_output method as a helper for testing. Why do you think the mass isn't actually being tested? In my opinion, it conducted some testing, but it wasn't tested thoroughly.

Copy link
Member

@MrCreosote MrCreosote Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because check_save_one_genome output does very little testing - all it checks is the genome name, the type, and user name. There's no testing of the object contents, the attached files, the provenance, any of the other information in the object_info, etc.

Copy link
Contributor Author

@Xiangs18 Xiangs18 Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no testing of the object contents, the attached files, the provenance, any of the other information in the object_info, etc.

I agree with you on the above statement, but I thought we decided to document it for future work unless you changed your mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, I misread this comment when I was on my phone: #212 (comment)

I thought it was saying tests needed to be added for the save_one_genome api call.

My position is that for the mass call that's been added to the API we should add tests to any code we change. Tests for single genome calls needed to be added, but for now can just be documented as needed. Where I'm confused is that for the mass call we're using the check_save_one_genome_output which hardly does anything, which therefore means we're not testing the mass call - or we're calling that function for reasons I don't understand, since it has almost no benefit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mass call is not added to the API; save_genome_mass is used internally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the mass call that was added as a part of this entire parllelization project

Comment on lines +233 to +245
def test_genomes_with_hidden(self):
self.start_test()
genome_name = 'test_genome_hidden'
inputs = [
{
'name': genome_name,
'data': self.test_genome_data,
'hidden': 1,
}
]
params = {'workspace_id': self.wsID, 'inputs': inputs}
ret = self.genome_interface.save_genome_mass(params)[0]
self.check_save_one_genome_output(ret, genome_name, warnings=[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually test that the genome is hidden

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there's nothing tin check_save_one_genome_output that checks the genome is hidden

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how that comment applies here. The function under test is save_genome_mass, the test just uses check_save_one_genome as a heler

Comment on lines +165 to +170
# check features
self.gi.check_dna_sequence_in_features(genome_obj.genome_data)

# validate genome
genome_obj.genome_data['warnings'] = self.gi.validate_genome(genome_obj.genome_data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there tests for G2G that exercise these code paths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have genbank_upload_full_test.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And there are tests that cause errors to be thown from the check / validate methods?

@Xiangs18 Xiangs18 requested a review from MrCreosote November 1, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants