RE2022-272: Add a bulk version of genbank_to_genome in GFU #208

Merged
merged 174 commits into from
May 21, 2024
174 commits
e3962f2
move default catalog params to GenbankToGenome.py
Xiangs18 Jan 26, 2024
ff1116f
add bulk version of genbank_to_genome
Xiangs18 Jan 27, 2024
f07f80c
update GenomeFileUtilServer.py
Xiangs18 Jan 27, 2024
1245e61
fix typos
Xiangs18 Jan 27, 2024
8d16ab5
fix validate_params fun call
Xiangs18 Jan 27, 2024
faf2c9d
use input_params
Xiangs18 Jan 27, 2024
6229ca2
add workspace_name check
Xiangs18 Jan 27, 2024
33c278c
replace workspace_name by workspace_id
Xiangs18 Jan 27, 2024
0933532
add genbank_to_genome bulk test
Xiangs18 Jan 27, 2024
34c3108
test genbank_upload_full_test.py
Xiangs18 Jan 28, 2024
473b22a
debug test_genbanks_to_genomes
Xiangs18 Jan 28, 2024
e5bcd97
run a specific test
Xiangs18 Jan 28, 2024
234c84d
run a single fun
Xiangs18 Jan 28, 2024
9220025
use mass method to upload a single genbank
Xiangs18 Jan 28, 2024
8cc8068
retest mass function
Xiangs18 Jan 28, 2024
03fcc7f
add tests to increase coverage
Xiangs18 Jan 28, 2024
b711c1b
run all tests && clean up
Xiangs18 Jan 28, 2024
12bd88d
add doc string for genbanks_to_genomes
Xiangs18 Jan 29, 2024
62d58d4
rename functions && correct typos
Xiangs18 Jan 30, 2024
090fbe3
refactor and test
Xiangs18 Jan 31, 2024
f504773
finish refactor && clean up
Xiangs18 Jan 31, 2024
9706af7
make Genome class private
Xiangs18 Jan 31, 2024
961bb93
refactor save assembly function
Xiangs18 Feb 2, 2024
3911611
fix metadata key error
Xiangs18 Feb 2, 2024
5d79e1d
update client code && test
Xiangs18 Feb 2, 2024
496aa2f
fix bug
Xiangs18 Feb 2, 2024
4c70b9d
fix objects index bug
Xiangs18 Feb 2, 2024
eb33f6a
debug index out of bounds
Xiangs18 Feb 2, 2024
58e836b
add try except in _save_genomes
Xiangs18 Feb 2, 2024
09dd0da
fix missing genome_names
Xiangs18 Feb 2, 2024
46ec762
run pass all tests && clean up
Xiangs18 Feb 2, 2024
eafa232
add default param for version
Xiangs18 Feb 3, 2024
c812cca
test genbank_assembly_ref_test.py only
Xiangs18 Feb 3, 2024
6965069
update _save_assemblies function logic
Xiangs18 Feb 3, 2024
4ffcd80
rerun genbank_assembly_ref_test
Xiangs18 Feb 3, 2024
d2be8ee
run all tests && final cleanup
Xiangs18 Feb 3, 2024
81dbadc
finish ASU refactor
Xiangs18 Feb 3, 2024
65bbcef
make validate_params private
Xiangs18 Feb 6, 2024
b491fe9
add missing self
Xiangs18 Feb 6, 2024
c21f3da
validate first before upload
Xiangs18 Feb 13, 2024
b56f03a
fix objects not specified bug
Xiangs18 Feb 13, 2024
5418499
fix fail error message
Xiangs18 Feb 13, 2024
ae8000a
add gc_content, dna_size, and md5
Xiangs18 Feb 13, 2024
dce257d
test dfu output
Xiangs18 Feb 13, 2024
59c9f01
fix metadata type
Xiangs18 Feb 13, 2024
eb8aac1
fix type check
Xiangs18 Feb 13, 2024
a3ead74
fix specs
Xiangs18 Feb 13, 2024
9cb61de
finish && cleanup
Xiangs18 Feb 13, 2024
347a791
test genome output
Xiangs18 Feb 14, 2024
ed8abdb
fix typo
Xiangs18 Feb 14, 2024
1224189
add assembly_info
Xiangs18 Feb 14, 2024
adfb3a4
add more tests
Xiangs18 Feb 14, 2024
2f4bbc8
update release notes && run all tests
Xiangs18 Feb 14, 2024
6649f13
keep ws and remove object_info def
Xiangs18 Feb 14, 2024
cf12176
update specs; fix release version; rename file_handle
Xiangs18 Feb 16, 2024
60c4e37
add valid params test
Xiangs18 Feb 16, 2024
130e958
fix error message
Xiangs18 Feb 16, 2024
8b54339
run all tests && clean up
Xiangs18 Feb 16, 2024
c4f3c95
bump version; update GenomeFileUtilImpl.py; use get_object_info3
Xiangs18 Feb 23, 2024
3347a5e
rename _get_contigs_and_validate_existing_assembly func and move down…
Xiangs18 Feb 23, 2024
a953fcc
fix ws client error
Xiangs18 Feb 23, 2024
558ed47
add usrname, missing comment; rename fun
Xiangs18 Feb 26, 2024
e2e7da9
remove out_contigs
Xiangs18 Feb 26, 2024
31b4132
add contigs_output to avoid pass in null
Xiangs18 Feb 26, 2024
1168eb2
move input_params into _Genome
Xiangs18 Mar 5, 2024
9599d70
add _check_result_object_info_fields func
Xiangs18 Mar 5, 2024
10cb15d
separate mass test function
Xiangs18 Mar 5, 2024
42f828c
add new test
Xiangs18 Mar 5, 2024
d9f6d4b
fix bug in get_object_info3 func call
Xiangs18 Mar 5, 2024
b34a5b9
check info output
Xiangs18 Mar 5, 2024
3b99e10
fix info bug
Xiangs18 Mar 5, 2024
bb51ab5
display metadata to check
Xiangs18 Mar 5, 2024
ace9c90
fix tests
Xiangs18 Mar 6, 2024
1d0f965
check error output
Xiangs18 Mar 6, 2024
a251871
test output
Xiangs18 Mar 6, 2024
784e4b4
fix match problem
Xiangs18 Mar 6, 2024
4e14017
fix bugs in tests
Xiangs18 Mar 6, 2024
f769404
debug assertion fail
Xiangs18 Mar 6, 2024
f77eab1
run pass all tests
Xiangs18 Mar 6, 2024
19d2c31
add missing metadata param
Xiangs18 Mar 6, 2024
fe703f3
add provenance
Xiangs18 Mar 6, 2024
5f14f52
fix assert error
Xiangs18 Mar 6, 2024
c08f235
more checks
Xiangs18 Mar 6, 2024
fb60e20
finish adding provenance test
Xiangs18 Mar 6, 2024
f1a133a
add TODO and remove print messages
Xiangs18 Mar 7, 2024
9f9da3e
rm duplicate error check
Xiangs18 Mar 7, 2024
c8dfda9
check mRNA missing annotations
Xiangs18 Mar 8, 2024
3775aee
add rmna test
Xiangs18 Mar 8, 2024
3272e72
remove logs
Xiangs18 Mar 8, 2024
40fb78a
remove idx and add input params
Xiangs18 Mar 8, 2024
93b8b91
run all tests
Xiangs18 Mar 8, 2024
d5bce8a
more tests
Xiangs18 Mar 9, 2024
d770130
test spoof
Xiangs18 Mar 11, 2024
d3ab203
display spoof rtn
Xiangs18 Mar 11, 2024
21ac1ef
check data
Xiangs18 Mar 11, 2024
6c066f9
add check spoof function
Xiangs18 Mar 11, 2024
47aae1b
display spoof warning
Xiangs18 Mar 11, 2024
41c2339
more check
Xiangs18 Mar 11, 2024
062c3d3
test diff genbank file
Xiangs18 Mar 11, 2024
5c9ea7c
pass filed check
Xiangs18 Mar 11, 2024
84fe0b8
display warnings
Xiangs18 Mar 11, 2024
27d3fee
run all tests
Xiangs18 Mar 12, 2024
daf54ca
run mRNA with no parent
Xiangs18 Mar 12, 2024
d2a9be1
add mRNA_with_no_parent.gbff file
Xiangs18 Mar 12, 2024
5e44f27
check curated file meta
Xiangs18 Mar 12, 2024
7aaa537
remove redundant tests
Xiangs18 Mar 13, 2024
33f56f5
run all tests
Xiangs18 Mar 13, 2024
a32bdc1
cover ontology
Xiangs18 Mar 18, 2024
0488a57
correct ontology genbank name
Xiangs18 Mar 18, 2024
0d64e5b
fix ontology
Xiangs18 Mar 18, 2024
c49f54b
cover all missing lines && run all tests
Xiangs18 Mar 18, 2024
989ddc8
display data/info/prov
Xiangs18 Mar 19, 2024
42916a3
check info, metadata, and prov
Xiangs18 Mar 20, 2024
4e860d9
fix datetime iso check
Xiangs18 Mar 20, 2024
f8a5e17
fix metadata bug
Xiangs18 Mar 20, 2024
f98dcc0
debug on metadata
Xiangs18 Mar 20, 2024
89bca42
fix prov and metadata
Xiangs18 Mar 20, 2024
3233347
add TODOs and delete print
Xiangs18 Mar 20, 2024
1359c39
test prov again
Xiangs18 Mar 20, 2024
1957748
fix prov tests
Xiangs18 Mar 20, 2024
cd84e97
remove print message && finish
Xiangs18 Mar 20, 2024
5a15f2d
add genome data check
Xiangs18 Mar 22, 2024
2e88a66
display genome data
Xiangs18 Mar 25, 2024
ca640a5
rerun genome data check
Xiangs18 Mar 25, 2024
11f8e34
compare sorted data
Xiangs18 Mar 25, 2024
01a7da6
test genome data with order
Xiangs18 Mar 26, 2024
f3dbe55
print genome data
Xiangs18 Mar 26, 2024
378e39c
rerun ordered comparison
Xiangs18 Mar 26, 2024
e971cca
rerun
Xiangs18 Mar 26, 2024
27d64d2
rerun small genome files
Xiangs18 Mar 27, 2024
02e4ec4
fix test failure
Xiangs18 Mar 28, 2024
909e12c
add fix to download json file
Xiangs18 Apr 2, 2024
660d145
fix check
Xiangs18 Apr 4, 2024
449da57
genome data check using downloaded files
Apr 5, 2024
3decca9
fix filename error
Xiangs18 Apr 6, 2024
f91072e
new data files
Apr 8, 2024
4e1d603
rerun tests
Xiangs18 Apr 8, 2024
087e440
fix bug
Xiangs18 Apr 8, 2024
7867216
format document
Xiangs18 Apr 9, 2024
0083baf
complete genome check
Xiangs18 Apr 9, 2024
c69f797
display assembly
Xiangs18 Apr 10, 2024
e5ab860
debug key error
Xiangs18 Apr 10, 2024
61571fc
add assembly check first cut
Xiangs18 Apr 10, 2024
51340d5
fix annotations
Xiangs18 Apr 10, 2024
a52c980
debug assembly data check
Xiangs18 Apr 11, 2024
6b937c2
fix assembly data check bug
Xiangs18 Apr 11, 2024
d5c74f6
add token and rerun
Xiangs18 Apr 11, 2024
758ed57
refactor code
Xiangs18 Apr 11, 2024
55b4273
rerun all tests
Xiangs18 Apr 11, 2024
a11415a
redownload genome files
Xiangs18 Apr 16, 2024
05d4ff6
upload download genome files
Apr 16, 2024
bc630b4
rerun all tests
Xiangs18 Apr 16, 2024
0aaf931
finish && cleanup
Xiangs18 Apr 16, 2024
ee5638b
1.check version; 2.validate assembly_upa; 3. remove idx arg for self.…
Xiangs18 Apr 23, 2024
0b5ad8a
fix prov and aliases
Xiangs18 Apr 23, 2024
04aa6e1
download new data
Apr 24, 2024
8939c93
fix aliases bug
Xiangs18 Apr 24, 2024
cc2a609
check assembly handle_id, blob_id, and url
Xiangs18 Apr 24, 2024
13733c2
check genome handle ref
Xiangs18 Apr 24, 2024
3302dcb
add token
Xiangs18 Apr 25, 2024
b6aeac3
check blob_info and data md5
Xiangs18 Apr 25, 2024
2b60903
fix format error
Xiangs18 Apr 25, 2024
a8835d4
check consolidated file path
Xiangs18 Apr 26, 2024
f7f7071
rerun and check if checksum is changing
Xiangs18 Apr 26, 2024
96b3cc2
add _download_file_from_blobstore and _md5sum_string functions
Xiangs18 May 3, 2024
580b8e5
fix _md5sum_string bug
Xiangs18 May 3, 2024
5edad3c
use shock to file
Xiangs18 May 9, 2024
f732bd4
fix output dir
Xiangs18 May 10, 2024
58e79d8
check genome md5sum
Xiangs18 May 10, 2024
a2e271d
fix calculate_md5sum bug
Xiangs18 May 10, 2024
8a88a00
check assembly md5sum
Xiangs18 May 10, 2024
c23e9a4
add missing rtn
Xiangs18 May 10, 2024
43c420f
finish assembly md5sum check
Xiangs18 May 10, 2024
5097ef0
add genome upa check; blobstore filename check; reduce nums of args
Xiangs18 May 14, 2024
1 change: 1 addition & 0 deletions .github/workflows/kb_sdk_test.yaml
@@ -63,4 +63,5 @@ jobs:
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: true
43 changes: 43 additions & 0 deletions GenomeFileUtil.spec
@@ -21,6 +21,8 @@ module GenomeFileUtil {

/*
genome_name - becomes the name of the object
workspace_id - the immutable, numeric ID of the target workspace. Always prefer
providing the ID over the name.
workspace_name - the name of the workspace it gets saved to.
source - Source of the file typically something like RefSeq or Ensembl
taxon_ws_name - where the reference taxons are : ReferenceTaxons
@@ -42,6 +44,7 @@ module GenomeFileUtil {
typedef structure {
File file;

int workspace_id;
string genome_name;
string workspace_name;

@@ -65,6 +68,46 @@ module GenomeFileUtil {
funcdef genbank_to_genome(GenbankToGenomeParams params)
returns (GenomeSaveResult result) authentication required;

typedef structure {
File file;
string genome_name;

string source;
string taxon_wsname;
string taxon_id;

string release;
string generate_ids_if_needed;
int genetic_code;
string scientific_name;
usermeta metadata;
boolean generate_missing_genes;
string use_existing_assembly;
} GenbankToGenomeInput;

typedef structure {
int workspace_id;
list<GenbankToGenomeInput> inputs;
} GenbanksToGenomesParams;

typedef structure {
string genome_ref;
string assembly_ref;
string assembly_path;
Workspace.object_info assembly_info;
Workspace.object_info genome_info;
} GenbankToGenomeSaveResult;

/* Results for the genbanks_to_genomes function.
results - the results of the save operation in the same order as the input.
*/
typedef structure {
list<GenbankToGenomeSaveResult> results;
} GenbanksToGenomesSaveResults;

funcdef genbanks_to_genomes(GenbanksToGenomesParams params)
returns (GenbanksToGenomesSaveResults results) authentication required;

/*
is_gtf - optional flag switching export to GTF format (default is 0,
which means GFF)
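The spec hunk above defines the shape of a `genbanks_to_genomes` call: one `workspace_id` plus a list of per-genome inputs, with results returned in input order. A minimal sketch of assembling such a parameter dict and pairing results back to inputs by position might look like the following. The helper names, the required-key set, the validation rules, and the `{"path": ...}` form of `File` are assumptions made for illustration; none of them are taken from the PR's implementation.

```python
from typing import Any, Dict, List

# Assumption: every input needs at least a file and a genome name.
REQUIRED_INPUT_KEYS = ("file", "genome_name")


def build_genbanks_to_genomes_params(
    workspace_id: int, inputs: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Assemble a GenbanksToGenomesParams-shaped dict (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(workspace_id, bool) or not isinstance(workspace_id, int) or workspace_id < 1:
        raise ValueError("workspace_id must be a positive integer")
    if not inputs:
        raise ValueError("inputs must contain at least one entry")
    for idx, entry in enumerate(inputs):
        missing = [key for key in REQUIRED_INPUT_KEYS if key not in entry]
        if missing:
            raise ValueError(f"inputs[{idx}] is missing required keys: {missing}")
    # Copy each entry so later changes by the caller do not alter the params.
    return {"workspace_id": workspace_id, "inputs": [dict(entry) for entry in inputs]}


def pair_results_with_inputs(
    params: Dict[str, Any], results: Dict[str, Any]
) -> Dict[str, str]:
    """Map genome_name to genome_ref by position (hypothetical helper).

    Relies on the spec comment that results come back in input order.
    """
    inputs = params["inputs"]
    saved = results["results"]
    if len(inputs) != len(saved):
        raise ValueError("expected exactly one result per input")
    return {inp["genome_name"]: res["genome_ref"] for inp, res in zip(inputs, saved)}


if __name__ == "__main__":
    params = build_genbanks_to_genomes_params(
        12345,
        [
            {"file": {"path": "/tmp/first.gbff"}, "genome_name": "genome_one"},
            {"file": {"path": "/tmp/second.gbff"}, "genome_name": "genome_two"},
        ],
    )
    # Made-up results standing in for a real service response.
    fake_results = {"results": [{"genome_ref": "12345/2/1"}, {"genome_ref": "12345/4/1"}]}
    print(pair_results_with_inputs(params, fake_results))
```

In real use the dict would be passed to the generated GenomeFileUtil client rather than to the stand-in results shown here.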
7 changes: 7 additions & 0 deletions RELEASE_NOTES.md
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.11.7] - 2024-02-14

### Added

- The `genbanks_to_genomes` method was added to allow users to upload multiple
genome objects at once

## [0.11.6] - 2022-01-29
Fixed performance issue with FastaGFF impacting metagenome uploads

4 changes: 2 additions & 2 deletions kbase.yml
@@ -8,7 +8,7 @@ service-language:
python

module-version:
- 0.11.6
+ 0.11.7

owners:
- [jkbaumohl, tgu2]
+ [jkbaumohl, tgu2, sijiex]
208 changes: 172 additions & 36 deletions lib/GenomeFileUtil/GenomeFileUtilImpl.py

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions lib/GenomeFileUtil/GenomeFileUtilServer.py
@@ -342,6 +342,10 @@ def __init__(self):
                             name='GenomeFileUtil.genbank_to_genome',
                             types=[dict])
        self.method_authentication['GenomeFileUtil.genbank_to_genome'] = 'required'  # noqa
        self.rpc_service.add(impl_GenomeFileUtil.genbanks_to_genomes,
                             name='GenomeFileUtil.genbanks_to_genomes',
                             types=[dict])
        self.method_authentication['GenomeFileUtil.genbanks_to_genomes'] = 'required'  # noqa
        self.rpc_service.add(impl_GenomeFileUtil.genome_to_gff,
                             name='GenomeFileUtil.genome_to_gff',
                             types=[dict])
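The server hunk above registers `genbanks_to_genomes` under its RPC name and marks it as requiring authentication. As a rough illustration of that registration pattern, the toy registry below maps method names to callables and refuses calls without a token when a method is marked `'required'`. `MiniRpcService` and the stub implementation are hypothetical stand-ins written for this sketch; the real generated server and its auth handling are more involved and are not shown in this diff.

```python
from typing import Any, Callable, Dict, Optional


class MiniRpcService:
    """Toy method registry; a stand-in, not the real KBase server class."""

    def __init__(self) -> None:
        self.methods: Dict[str, Callable[[Dict[str, Any]], Any]] = {}
        self.method_authentication: Dict[str, str] = {}

    def add(self, func: Callable[[Dict[str, Any]], Any], name: str) -> None:
        # Register a callable under its fully qualified RPC name.
        self.methods[name] = func

    def call(self, name: str, params: Dict[str, Any], token: Optional[str] = None) -> Any:
        if name not in self.methods:
            raise KeyError(f"Method not found: {name}")
        # Refuse unauthenticated calls to methods marked 'required'.
        if self.method_authentication.get(name) == "required" and not token:
            raise PermissionError(f"Authentication required for {name}")
        return self.methods[name](params)


def genbanks_to_genomes_stub(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical implementation stub: returns one empty result per input."""
    return {"results": [{} for _ in params["inputs"]]}


service = MiniRpcService()
service.add(genbanks_to_genomes_stub, name="GenomeFileUtil.genbanks_to_genomes")
service.method_authentication["GenomeFileUtil.genbanks_to_genomes"] = "required"

if __name__ == "__main__":
    out = service.call(
        "GenomeFileUtil.genbanks_to_genomes",
        {"workspace_id": 1, "inputs": [{}, {}]},
        token="fake-token",
    )
    print(len(out["results"]))
```

Keeping the authentication flag in a separate dictionary, as the diff does, lets the dispatcher check it before the implementation function is ever invoked.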