Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add fetch-ncbi-taxonomy action #118

Merged
merged 26 commits into from
Jan 29, 2024
Merged

Conversation

Sann5
Copy link
Contributor

@Sann5 Sann5 commented Jan 18, 2024

About this repo

  • q2-moshpit is a qiime2 plugin that offers various analyses for shotgun metagenome data.
  • The current PR wraps adds action fetch-ncbi-taxonomy to the eggnog._dbs module, as well as test and test data for it.

What's new

  • The build-custom-diamond-db action can take an optional taxonomy input. By providing such input users can create a Diamond database with taxonomy features. Previously, to provide this optional input, users had to rely on the qiime tools import functionality. The goal of this action is to replace this import method with an action that fetches the corresponding data from the NCBI ftp server.
  • Closes ENH: Fetch NCBI Taxonomy Data (input to build_custom_diamond_db action) #107

Additional Info

Merge after

Preview Give feedback
  1. enhancement
    Sann5
  2. enhancement
    Sann5
  3. 2 of 2
    enhancement
    Sann5

Set up an environment

# For linux: 
# export MY_OS="linux"
# For mac:
export MY_OS="osx" 
wget "https://data.qiime2.org/distro/shotgun/qiime2-shotgun-2023.9-py38-"$MY_OS"-conda.yml"
conda env create -n q2-shotgun --file qiime2-shotgun-2023.9-py38-osx-conda.yml
rm "qiime2-shotgun-2023.9-py38-"$MY_OS"-conda.yml"

Run it locally

  1. Clone the repo and checkout the PR branch:
# Remove q2-types-genomics so you can install your local version.
conda activate q2-shotgun
conda remove q2-types-genomics q2-types q2-moshpit
pip install git+https://github.com/qiime2/q2-types.git
git clone [email protected]:bokulich-lab/q2-types-genomics.git
git clone [email protected]:bokulich-lab/q2-moshpit.git
cd q2-types-genomics
gh pr checkout 73
pip install .
cd ../q2-moshpit
gh pr checkout 118
pip install -e .
  1. Test it out! You need ~30 GB of free disk space to run this. In light of the size of the download, it might take some time to complete.
qiime moshpit fetch-ncbi-taxonomy --o-taxonomy erasme_later.qza --verbose

Running the tests

pytest -W ignore -vv --pyargs q2_moshpit

@Sann5 Sann5 added enhancement New feature or request stat:blocked labels Jan 18, 2024
@Sann5 Sann5 requested a review from VinzentRisch January 18, 2024 13:34
@Sann5 Sann5 self-assigned this Jan 18, 2024
@Sann5
Copy link
Contributor Author

Sann5 commented Jan 18, 2024

Hey @VinzentRisch, do you mind giving this one a review too? It is the follow up from the last one. I know it looks huge but it's way smaller than it looks (194 insertions(+), 4 deletions(-)). But it builds on top of branches that are not yet merged (moshpit PRs listed in the tasklist), so that is why it looks so big in the diff. If you want to see only the pertinent changes you can run the following in you local copy:

git diff build_eggnog_diamond_db_issue_115 fetch_ncbi_iss_107

@misialq
Copy link
Contributor

misialq commented Jan 18, 2024

Hey @Sann5, why does this depend on the two other PRs in q2-moshpit?

@Sann5
Copy link
Contributor Author

Sann5 commented Jan 18, 2024

Hey @misialq . It does not strictly depend on them but because it modifies the same files I decided to make this branch off of the same branch as those two PRs, to avoid merging conflicts. Therefore it would be nice if those other two are merged before this one :).

@misialq
Copy link
Contributor

misialq commented Jan 19, 2024

I see. Let's try to avoid that in the future though - it makes the review process much more complicated than it needs to be - now whoever is reviewing your PRs has to look at the same code changes three times. The merge conflicts you would get from what you have in those would be rather easy to solve, if any, so let's try not to focus too much on what may need resolution later.
I merged the other one so please rebase this one to remove all the unnecessary commits/changes (please also do it for the other PR which has the same changes as the one that was just merged). Thanks! 🙏

Copy link
Contributor

@VinzentRisch VinzentRisch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @Sann5, I ran the command and it runs without problems. 🙂
But there are two tests that fail for me

  1. kraken2/tests/test_classification.py::TestClassifyKraken2Reads::test_paired_end_reads_parallel <- ../q2-moshpit/q2_moshpit/kraken2/tests/test_classification.py Fatal Python error: Segmentation fault

raise Exception("Interchange failed to start")
Exception: Interchange failed to start

  1. kraken2/tests/test_classification.py::TestClassifyKraken2Contigs::test_contigs_parallel <- ../q2-moshpit/q2_moshpit/kraken2/tests/test_classification.py FAILED[ 48%

raise ValueError('ParallelConfig already loaded, cannot nest ParallelConfigs)
ValueError: ParallelConfig already loaded, cannot nest ParallelConfigs

could you look into that?

@Sann5
Copy link
Contributor Author

Sann5 commented Jan 19, 2024

@VinzentRisch

HI @Sann5, I ran the command and it runs without problems. 🙂

These are from another PR :). They are failing for me too.

Copy link
Contributor

@VinzentRisch VinzentRisch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Sann5 Sann5 force-pushed the fetch_ncbi_iss_107 branch from 8883f6d to 8233505 Compare January 19, 2024 11:32
Copy link
Contributor

@misialq misialq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm testing the action itself, please clean up this PR as pointed out below.

q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/plugin_setup.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/tests/test_dbs.py Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
q2_moshpit/plugin_setup.py Outdated Show resolved Hide resolved
@Sann5
Copy link
Contributor Author

Sann5 commented Jan 19, 2024

@misialq once #116 is merged all that diff will disappear. I think it would be easier to merge that and then review this. Sorry for the mess. In the future, I will branch every PR branch from the main branch.

@misialq
Copy link
Contributor

misialq commented Jan 19, 2024

Alright, alright 😊 my bad too, I looked at PRs in the wrong order 😅 Waiting for the changes from the other one then ;)

q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
@Sann5 Sann5 requested a review from misialq January 22, 2024 06:35
@Sann5
Copy link
Contributor Author

Sann5 commented Jan 23, 2024

q2_moshpit/_utils.py Outdated Show resolved Hide resolved
q2_moshpit/_utils.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/eggnog/_dbs.py Outdated Show resolved Hide resolved
q2_moshpit/plugin_setup.py Outdated Show resolved Hide resolved
@Sann5 Sann5 requested review from misialq January 24, 2024 16:12
Copy link

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (0668991) 97.74% compared to head (9a2de57) 97.74%.
Report is 14 commits behind head on main.

Files Patch % Lines
q2_moshpit/kraken2/classification.py 91.80% 5 Missing ⚠️
q2_moshpit/_examples.py 50.00% 2 Missing ⚠️
q2_moshpit/kaiju/classification.py 97.75% 2 Missing ⚠️
q2_moshpit/kraken2/database.py 81.81% 2 Missing ⚠️
q2_moshpit/_utils.py 93.33% 1 Missing ⚠️
q2_moshpit/eggnog/_utils.py 83.33% 1 Missing ⚠️
q2_moshpit/kaiju/tests/test_classification.py 99.02% 1 Missing ⚠️
q2_moshpit/kraken2/helpers.py 95.45% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #118    +/-   ##
========================================
  Coverage   97.74%   97.74%            
========================================
  Files          35       45    +10     
  Lines        2172     2799   +627     
========================================
+ Hits         2123     2736   +613     
- Misses         49       63    +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@misialq misialq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @Sann5! 🥇

@Sann5 Sann5 merged commit 137eff9 into bokulich-lab:main Jan 29, 2024
9 checks passed
@Sann5 Sann5 deleted the fetch_ncbi_iss_107 branch January 29, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Fetch NCBI Taxonomy Data (input to build_custom_diamond_db action)
3 participants