-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add actions to fetch data from BV-BRC. #202
ENH: Add actions to fetch data from BV-BRC. #202
Conversation
Thanks for working on this @VinzentRisch! When I try and install this branch I obtain the following error:
I suggest that you update / merge your master fork with the master branch of our repo. Then merge/sync your |
Hi @mikerobeson |
Never mind, I was mistakingly not running this in the developer version of QIIME 2. 🤦♂️ Testing it now. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start @VinzentRisch! I was able to run your example code via the dev version of QIIME 2.
There should be more content within the parameter_descriptions
and descriptions
. Also, it would be nice to have as part of your PR description, and the code help text / descriptions, the "use cases" for these actions. That is, what does a user do once they have these QZA files? How do they work with it? Perhaps consider constructing a RESCRIPt tutorial to eventually go along with this?
I'd also recommend changing the names of your actions so that they resemble the other actions:
- get-bv-brc-genome-features
- get-bv-brc-genomes
- get-bv-brc-metadata
- get-bv-brc-taxonomy
🤔 I wonder if it'd be simpler to combine to one action, with options for all the other downloadable items? Or combine a couple of the simple ones like metadata and taxonomy? I currently do not have an opinion on this as I've never used BV-BRC before, and do not know what people routinely use it for.... I mean I have some ideas but...
rescript/plugin_setup.py
Outdated
'genomes': 'genomes', | ||
}, | ||
name='fetch genomes', | ||
description="fetch genomes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide a detailed description of what this action does. What is the use case for this? Can you provide an example?
In all of the descriptions please define and explain what BV-BRC stands for, etc... Explain what is this tool used for?
rescript/plugin_setup.py
Outdated
output_descriptions={ | ||
'metadata': 'metadata'}, | ||
name='Fetch BV-BCR metadata.', | ||
description="Fetch BV-BCR metadata for a specific data type with an RQL " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, please provide a detailed description of what this action retrieves and what it outputs. What is the downstream use case for this? Can you provide an example?
rescript/plugin_setup.py
Outdated
|
||
|
||
plugin.methods.register_function( | ||
function=fetch_taxonomy_bv_brc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all of these new actions supposed to be used together for something? Nothing about these actions or their use case have been explained. What is the user expected to do with all of this output?
|
||
}, | ||
name='Fetch genome features from BV-BRC.', | ||
description='Fetch DNA and protein sequences of genome features from ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should explicitly describe the difference between fetch-genomes-bv-brc
and fetch-genome-features-bv-brc
. For example, a user might be confused as to why they need a genome if they have the features? Or vice versa?
Each action description should contain enough information so that the user understands why they are using a given action. Even better if the user knows how that data can be used downstream. See the other RESCRIPt code help text / examples.
Hi @mikerobeson
|
Hi @VinzentRisch, I apologize for taking a while to get to this. 😬 I really like to improvements. I was able to run everything w/o issue. Again, as I've never really used this tool, I'm not sure what the next steps are with the QZA outputs. Once I have an idea of what the "final" steps are I should be able to finalize my review. Other than that, nothing really stands out to me to fix.. at least for the moment. :-) |
Hey @mikerobeson thank you for reviewing this! This is an action with yet-unrealized potential for downstream re-use. The output types are the same as used in moshpit, so these outputs could be used, e.g., to create a reference genome catalog or to perform a comparative genomics analyses. But I think new actions and transformers are needed to really utilize these data, as the types are very new. So it's a chicken-vs-egg type problem, and @VinzentRisch has just created a chicken. Next I think we need to figure out how to lay some eggs... |
Thanks for that @nbokulich! I am okay to merge this as it is, if you think it is ready. I've nothing more to add. 👍 |
closes #200
Adds new action get_bv_brc_genomes to fetch genome sequences outputted as GenomeData[DNASequences] with corresponding taxonomy data as FeatureData[Taxonomy].
Adds new action get_bv_brc_metadata to fetch any type of metadata outputted as immutable metadata.
Adds new action get_bv_brc_genome_features to fetch genome feature sequences outputted as GenomeData[Genes] and GenomeData[Proteins]. Additionally taxonomy as FeatureData[Taxonomy] is provided and GenomeData[Loci]
All actions make use of the Data API by BV-BRC.
The data that should be downloaded can be queried with three options. First for all actions an RQL style query can be provided. Second by providing IDs/values and a corresponding data field, you can retrieve all data associated with those specific values in that data field. And as a third option a metadata column can be provided, to use metadata obtained with the action get-bv-brc-metadata as a new query.
Run it locally