
batch saving genomes #212

Open

Xiangs18 wants to merge 24 commits into master from dev-batch_genome

Contributor

Xiangs18 commented Aug 15, 2024

No description provided.


          add save_genomes function

6208a9a

Xiangs18 requested review from jsfillman, jkbaumohl and Tianhao-Gu as code owners

August 15, 2024 00:15


          fix positional arg #1 is the wrong type bug

a2fabf4

codecov bot commented Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.06%. Comparing base (4819598) to head (01a1dc8).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
+ Coverage   80.88%   81.06%   +0.18%     
==========================================
  Files          11       11              
  Lines        2998     3011      +13     
==========================================
+ Hits         2425     2441      +16     
+ Misses        573      570       -3


Contributor Author

Xiangs18 commented Aug 16, 2024

self note:

a2fabf4 verifies that the refactored code works with the save_one_genome function.
330d6c2 verifies that the refactored code works with the save_genome_mass function.

Xiangs18 added 3 commits

August 15, 2024 19:58


          use batch genome save in GenbankToGenome.py

330d6c2


          make save_genome_mass function internal

27158ec


          add tests for save_genome_mass function

04aa203

Xiangs18 changed the title ~~[WIP] add save_genomes~~ add save_genomes

Xiangs18 changed the title ~~add save_genomes~~ batch saving genomes


          fix bug

8d672d1

Xiangs18 requested a review from MrCreosote

August 17, 2024 06:22

MrCreosote reviewed

Member

MrCreosote left a comment

Haven't looked at tests yet, but this is already a lot of comments

EDIT: All comments addressed

RELEASE_NOTES.md Outdated (resolved)

lib/GenomeFileUtil/core/GenbankToGenome.py (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py (resolved)

lib/GenomeFileUtil/core/GenomeInterface.py Outdated (resolved)

Xiangs18 added 10 commits

August 26, 2024 14:42


          update release notes && make the dicts in the loop

d1dd768


          remove logging && add NOTE for workspace_datatype

e79c297


          move set_up_single_params && validate_mass_params into GenomeUtils

35030a0


          remove redundant tests

4b1b88b


          add test to cover the missing line

b56d77f


          fix params name typo

788c7ee


          add metagenome json file && cover the missing line

65467ea


          rm gff_handle_ref

66af0b7


          add features_handle_ref && protein_handle_ref before upload

fcbb510


          add boolean flag for validate_genome

32f96a0

Xiangs18 requested a review from MrCreosote

September 10, 2024 23:50

Xiangs18 added 4 commits

September 10, 2024 17:10


          test validate_genome boolean flag

41eed0b


          1. add pydoc for save_genme_mass; 2. make the dicts in the _save_geno…

71670bc

…me_mass loop; 3. make the note much more explicit


          update release notes && remove tiny files

0cbe155


          add more info and warnings checks in test

52e280d

Xiangs18 added 2 commits

September 18, 2024 21:40


          remove metagenome from test

f018f0e


          remove unused lib

54fedf0

MrCreosote reviewed


lib/GenomeFileUtil/core/GenomeUtils.py Outdated

+                  ws_name_to_id_func: Callable[[str], int]
+              ) -> Dict[str, Any]:
+                  """

Member

MrCreosote Sep 19, 2024

Suggested change

Contributor Author

Xiangs18 Oct 30, 2024

👍

Member

MrCreosote Nov 4, 2024

Unfortunately I was wrong about this. From https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals

Formatted string literals cannot be used as docstrings, even if they do not include expressions.

... which is a bummer.
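A minimal sketch of the behaviour quoted above, plus one possible workaround: format the text after the function is defined and assign it to `__doc__`. The constant values and function names are illustrative assumptions, not the code in this PR.

```python
# Hypothetical constants standing in for the module-level key names.
_WSID = "workspace_id"
_INPUTS = "inputs"


def with_fstring_docstring():
    # This f-string is evaluated as an ordinary expression and discarded;
    # it does not become the function's docstring.
    f"""Returns a dict with keys {_WSID} and {_INPUTS}."""
    return {}


def with_assigned_docstring():
    return {}


# Workaround: build the text with an f-string and assign it afterwards.
with_assigned_docstring.__doc__ = (
    f"Returns a dict with keys {_WSID} and {_INPUTS}."
)

print(with_fstring_docstring.__doc__)   # None
print(with_assigned_docstring.__doc__)
```

The assignment keeps the key names in one place, at the cost of moving the documentation away from the top of the function body.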

lib/GenomeFileUtil/core/GenomeUtils.py Outdated

+                  Returns:
+                      Dict[str, Any]: A dictionary containing the workspace ID and the processed parameters. The dictionary
+                          has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing

Member

MrCreosote Sep 19, 2024

Suggested change

      
-                        has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing
+                        has keys {_WSID} and {_INPUTS}, where {_WSID} is the workspace ID and {_INPUTS} is a list containing

Contributor Author

Xiangs18 Oct 30, 2024

👍

lib/GenomeFileUtil/core/GenomeUtils.py Outdated

+                  validate_params_func: Callable[[Dict[str, Any]], None]
+              ) -> None:
+                  """

Member

MrCreosote Sep 19, 2024

Suggested change

Contributor Author

Xiangs18 Oct 30, 2024

👍

lib/GenomeFileUtil/core/GenomeUtils.py Outdated

Comment on lines 548 to 549

                          - _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.

            

Member

MrCreosote Sep 19, 2024

Suggested change

      
-                        - _WSID: A workspace ID, which must be present and valid.
-                        - _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.
+                        - {_WSID}: A workspace ID, which must be present and valid.
+                        - {_INPUTS}: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.

Member

MrCreosote Sep 19, 2024

Some more of these need to be done below

Contributor Author

Xiangs18 Oct 30, 2024

👍
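For readers without the diff open, the contract described in the docstring above might look roughly like the sketch below. The key values (`workspace_id`, `inputs`) are taken from the test parameters in this PR. The function name, the exact validity rules, and the error messages are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict

# Assumed values; the test in this PR passes 'workspace_id' and 'inputs'.
_WSID = "workspace_id"
_INPUTS = "inputs"


def validate_mass_params_sketch(
    params: Dict[str, Any],
    validate_params_func: Callable[[Dict[str, Any]], None],
) -> None:
    """Hypothetical stand-in for the validation described in the docstring."""
    ws_id = params.get(_WSID)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(ws_id, bool) or not isinstance(ws_id, int) or ws_id < 1:
        raise ValueError(f"{_WSID} is required and must be an integer > 0")

    inputs = params.get(_INPUTS)
    if not isinstance(inputs, list) or not inputs:
        raise ValueError(f"{_INPUTS} is required and must be a non-empty list")

    for index, entry in enumerate(inputs, start=1):
        if not isinstance(entry, dict):
            raise ValueError(f"Entry #{index} in {_INPUTS} is not a mapping")
        # Per-entry rules are delegated to the caller-supplied function.
        validate_params_func(entry)


def require_name(entry: Dict[str, Any]) -> None:
    """Example per-entry validator, also hypothetical."""
    if "name" not in entry:
        raise ValueError("name is required")


validate_mass_params_sketch(
    {_WSID: 1, _INPUTS: [{"name": "test_genome"}]}, require_name
)
```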

lib/GenomeFileUtil/core/GenomeUtils.py Outdated (resolved)

lib/GenomeFileUtil/core/GenbankToGenome.py (resolved)

test/problematic_tests/save_genome_test.py (resolved)

test/problematic_tests/save_genome_test.py

                       if contains:
                           self.assertIn(error, str(context.exception))
                       else:
                           self.assertEqual(error, str(context.exception))
-                  def check_save_one_genome_output(self, ret, genome_name):
+                  def check_save_one_genome_output(

Member

MrCreosote Sep 19, 2024

This checker barely checks anything, but I realize it was that way when you got here.

Member

MrCreosote Sep 19, 2024

I could've sworn you spent a bunch of time adding rigorous tests to this module

Contributor Author

Xiangs18 Nov 1, 2024

Member

MrCreosote Nov 4, 2024

This is similar to the comment below - the tests use the check_save_one_genome_output method as a helper for testing mass saves, so the mass save method isn't actually getting tested if it relies on that function

Contributor Author

Xiangs18 Nov 4, 2024

The mass save tests use the check_save_one_genome_output method as a helper. Why do you think the mass save isn't actually being tested? In my opinion it gets some testing, just not thorough testing.

Member

MrCreosote Nov 4, 2024

Because check_save_one_genome_output does very little testing - all it checks is the genome name, the type, and the user name. There's no testing of the object contents, the attached files, the provenance, any of the other information in the object_info, etc.

Contributor Author

Xiangs18 Nov 4, 2024

There's no testing of the object contents, the attached files, the provenance, any of the other information in the object_info, etc.

I agree with you on the above statement, but I thought we decided to document it for future work unless you changed your mind.

Member

MrCreosote Nov 4, 2024

Oh, I see, I misread this comment when I was on my phone: #212 (comment)

I thought it was saying tests needed to be added for the save_one_genome api call.

My position is that for the mass call that's been added to the API we should add tests to any code we change. Tests for single genome calls still need to be added, but for now that can just be documented as needed. Where I'm confused is that for the mass call we're using check_save_one_genome_output, which hardly does anything. That means either we're not testing the mass call, or we're calling that function for reasons I don't understand, since it has almost no benefit.

Contributor Author

Xiangs18 Nov 4, 2024

The mass call is not added to the API; save_genome_mass is used internally.

Member

MrCreosote Nov 4, 2024

I'm talking about the mass call that was added as a part of this entire parallelization project
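To make the gap discussed in this thread concrete, here is a sketch of a checker that looks at more of the returned information than the name, type, and user name. It assumes the return value is a dict with `info` (a workspace object_info tuple) and `warnings`, and it assumes the tuple positions given in the constants. The function name and all sample values are made up for illustration.

```python
from typing import Any, Dict, List

# Assumed positions in the workspace object_info tuple.
NAME, TYPE, SAVED_BY, WSID, META = 1, 2, 5, 6, 10


def check_saved_genome(
    ret: Dict[str, Any],
    *,
    name: str,
    type_prefix: str,
    user: str,
    ws_id: int,
    expected_meta: Dict[str, str],
    warnings: List[str],
) -> None:
    """Hypothetical stricter checker for one saved genome."""
    info = ret["info"]
    assert info[NAME] == name
    assert info[TYPE].startswith(type_prefix)
    assert info[SAVED_BY] == user
    assert info[WSID] == ws_id
    # Compare only the metadata keys the test cares about.
    for key, value in expected_meta.items():
        assert info[META].get(key) == value, key
    assert ret["warnings"] == warnings


# Fabricated sample data, shaped like the assumed return value.
sample_ret = {
    "info": [
        1, "test_genome", "KBaseGenomes.Genome-1.0", "2024-11-04T00:00:00+0000",
        1, "someuser", 42, "some_workspace", "checksum", 100,
        {"example_key": "example_value"},
    ],
    "warnings": [],
}

check_saved_genome(
    sample_ret,
    name="test_genome",
    type_prefix="KBaseGenomes.Genome",
    user="someuser",
    ws_id=42,
    expected_meta={"example_key": "example_value"},
    warnings=[],
)
```

Object contents, attached files, and provenance are not covered here. Checking those would mean fetching the saved object from the workspace, which this sketch leaves out.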

test/problematic_tests/save_genome_test.py

Comment on lines +233 to +245

+                  def test_genomes_with_hidden(self):
+                      self.start_test()
+                      genome_name = 'test_genome_hidden'
+                      inputs = [
+                          {
+                              'name': genome_name,
+                              'data': self.test_genome_data,
+                              'hidden': 1,
+                          }
+                      ]
+                      params = {'workspace_id': self.wsID, 'inputs': inputs}
+                      ret = self.genome_interface.save_genome_mass(params)[0]
+                      self.check_save_one_genome_output(ret, genome_name, warnings=[])

Member

MrCreosote Sep 19, 2024

This doesn't actually test that the genome is hidden

Contributor Author

Xiangs18 Sep 19, 2024

why?

Member

MrCreosote Sep 22, 2024

Because there's nothing in check_save_one_genome_output that checks the genome is hidden

Contributor Author

Xiangs18 Nov 1, 2024

Member

MrCreosote Nov 4, 2024

I don't understand how that comment applies here. The function under test is save_genome_mass, the test just uses check_save_one_genome as a helper
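One way the hidden flag could be asserted, sketched under the assumption that the workspace's `list_objects` call accepts a `showHidden` flag and omits hidden objects when it is 0. The `FakeWorkspace` class is a stand-in written for this example so it runs without a server. It returns plain names rather than object_info tuples, and `assert_hidden` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


class FakeWorkspace:
    """Stand-in client that only mimics hidden-object filtering."""

    def __init__(self) -> None:
        self._objects: List[Tuple[str, bool]] = []

    def save(self, name: str, hidden: int = 0) -> None:
        self._objects.append((name, bool(hidden)))

    def list_objects(self, params: Dict[str, Any]) -> List[str]:
        show_hidden = bool(params.get("showHidden", 0))
        return [n for n, hidden in self._objects if show_hidden or not hidden]


def assert_hidden(ws: Any, ws_id: int, name: str) -> None:
    """Pass only if the object exists but is absent from the default listing."""
    visible = ws.list_objects({"ids": [ws_id], "showHidden": 0})
    everything = ws.list_objects({"ids": [ws_id], "showHidden": 1})
    assert name not in visible
    assert name in everything


ws = FakeWorkspace()
ws.save("test_genome_hidden", hidden=1)
ws.save("test_genome_visible")
assert_hidden(ws, 1, "test_genome_hidden")
```

In the real test the same two listings would come from the workspace client used by the test class, after `save_genome_mass` returns.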

lib/GenomeFileUtil/core/GenbankToGenome.py

Comment on lines +165 to +170

+                          # check features
+                          self.gi.check_dna_sequence_in_features(genome_obj.genome_data)
+                          # validate genome
+                          genome_obj.genome_data['warnings'] = self.gi.validate_genome(genome_obj.genome_data)

Member

MrCreosote Sep 19, 2024

Are there tests for G2G that exercise these code paths?

Contributor Author

Xiangs18 Oct 31, 2024

Yeah, we have genbank_upload_full_test.py.

Member

MrCreosote Nov 4, 2024

And there are tests that cause errors to be thrown from the check / validate methods?
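A sketch of what an error-path test for those two calls might look like, using `unittest.mock` so that no real genome data is needed. The wrapper `validate_and_attach_warnings` is hypothetical and only mirrors the order of the calls in the diff above. It is not the G2G code itself.

```python
import unittest
from unittest.mock import MagicMock


def validate_and_attach_warnings(gi, genome_data):
    """Hypothetical wrapper: check features first, then validate."""
    gi.check_dna_sequence_in_features(genome_data)
    genome_data["warnings"] = gi.validate_genome(genome_data)
    return genome_data


class ErrorPathTest(unittest.TestCase):
    def test_validate_error_propagates(self):
        gi = MagicMock()
        gi.validate_genome.side_effect = ValueError("bad genome")
        with self.assertRaises(ValueError) as context:
            validate_and_attach_warnings(gi, {})
        self.assertEqual("bad genome", str(context.exception))
        gi.check_dna_sequence_in_features.assert_called_once()

    def test_check_error_stops_before_validate(self):
        gi = MagicMock()
        gi.check_dna_sequence_in_features.side_effect = ValueError("bad feature")
        with self.assertRaises(ValueError):
            validate_and_attach_warnings(gi, {})
        # Validation must not run once the feature check has failed.
        gi.validate_genome.assert_not_called()
```

Mocking shows that errors propagate, but it does not show which inputs trigger them. A test with a deliberately malformed GenBank file would cover that part.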

Xiangs18 added 2 commits

October 30, 2024 16:35


          fix documentation

861a50b


          add workspace_id in GenomeFileUtil.spec

01a1dc8

Xiangs18 requested a review from MrCreosote

November 1, 2024 18:16


Reviewers

jsfillman: awaiting requested review (code owner)

jkbaumohl: awaiting requested review (code owner)

Tianhao-Gu: awaiting requested review (code owner)

MrCreosote: awaiting requested review

At least 1 approving review is required to merge this pull request.

Labels

None yet