Completeness and contamination missing from fetch_ncbi.py #80

Open
amardeepranu opened this issue Dec 19, 2023 · 3 comments

Comments

@amardeepranu

Are we supposed to compile this data ourselves? fetch_ena.py seems to generate it, but fetch_ncbi.py does not.

@amardeepranu
Author

amardeepranu commented Dec 19, 2023

Ah, it looks like it's done here: https://github.com/EBI-Metagenomics/genomes-pipeline/blob/853487f6dda1420fd8b6b41dd4aff5c8540c7e37/subworkflows/prepare_data.nf#L27-L29

Is there a reason the CHECKM step isn't run for the ENA data as well, specifically for data that wasn't pulled using the fetch_ena.py script?

@amardeepranu
Author

https://github.com/EBI-Metagenomics/genomes-pipeline/blob/853487f6dda1420fd8b6b41dd4aff5c8540c7e37/workflows/genomes_annotation.nf#L90-L99

I also see here that all NCBI data is ignored. Can I edit this to include NCBI data without issue? Thanks.

@tgurbich
Contributor

Hi @amardeepranu,

All genomes, regardless of where they were downloaded from, should be passed to the pipeline using the --ena_genomes flag. If any of your genomes were fetched using the fetch_ncbi.py script, you need to run CheckM on them and combine all completeness and contamination results into one file for all ENA and NCBI genomes. Pass the path to this combined file to the pipeline using the --ena_genomes_checkm flag.

We will adjust this in future releases to avoid confusion.
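
For the combining step, here is a minimal sketch rather than an official tool. It assumes each completeness/contamination table is a CSV with a header row and the columns `genome`, `completeness` and `contamination`; please check that layout against the file fetch_ena.py produces before relying on it. The function name and file names are hypothetical.

```python
import csv


def combine_checkm_tables(input_paths, output_path):
    """Merge several completeness/contamination CSV files into one.

    Assumes every input has a header row with the columns
    genome, completeness, contamination (an assumption, not confirmed
    by the pipeline documentation). If a genome appears more than once,
    only its first occurrence is kept.
    """
    header = ["genome", "completeness", "contamination"]
    seen = set()
    rows = []
    for path in input_paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                genome = row["genome"]
                if genome in seen:
                    continue  # skip duplicate genome entries
                seen.add(genome)
                rows.append([genome, row["completeness"], row["contamination"]])

    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, `combine_checkm_tables(["ena_checkm.csv", "ncbi_checkm.csv"], "combined_checkm.csv")` would write a single table, and the path to `combined_checkm.csv` is what goes to `--ena_genomes_checkm`.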


2 participants