From 64f7fb04e8558df7a17db3c0dd0f58cef27297ed Mon Sep 17 00:00:00 2001 From: pchaumeil Date: Thu, 30 Mar 2023 15:19:52 +1000 Subject: [PATCH] docs(Improve documentation for GTDB-Tk): --- docs/src/changelog.rst | 4 + docs/src/commands/classify.rst | 55 ++++++++------ docs/src/commands/classify_wf.rst | 120 ++++++++++++++++++++---------- docs/src/faq.rst | 31 ++++++-- 4 files changed, 137 insertions(+), 73 deletions(-) diff --git a/docs/src/changelog.rst b/docs/src/changelog.rst index 78b07457..ae5a936e 100644 --- a/docs/src/changelog.rst +++ b/docs/src/changelog.rst @@ -19,9 +19,11 @@ Minor changes: ----- Bug Fixes: + * gtdbtk.json is now reset when the pipeline is re run and the status of ani_Screen is not 'complete' Minor changes: + * When using '--genes' , ANI steps are skipped and Warnings are raised to the user to inform them that classification is less accurate. * (`#486 `_) Environment variables can be used in GTDBTK_DATA_PATH @@ -33,9 +35,11 @@ Minor changes: ----- Bug Fixes: + * gtdbtk.json is now reset when the pipeline is re run and the status of ani_Screen is not 'complete' Minor changes: + * When using '--genes' , ANI steps are skipped and Warnings are raised to the user to inform them that classification is less accurate. * (`#486 `_) Environment variables can be used in GTDBTK_DATA_PATH diff --git a/docs/src/commands/classify.rst b/docs/src/commands/classify.rst index caeec4e3..e525beb4 100644 --- a/docs/src/commands/classify.rst +++ b/docs/src/commands/classify.rst @@ -57,7 +57,7 @@ Input .. code-block:: bash - gtdbtk classify --genome_dir genomes/ --align_dir align_output/ --out_dir classify_output --cpus 3 + gtdbtk classify --align_dir align_3lines/ --batchfile 3lines_batchfile.tsv --out_dir 3classify_ani --mash_db mash_db_dir/ --cpus 20 @@ -67,26 +67,33 @@ Output .. 
code-block:: text - [2023-02-08 12:53:42] INFO: GTDB-Tk v2.2.0 - [2023-02-08 12:53:42] INFO: gtdbtk classify --align_dir align_3lines/ --batchfile 3lines_batchfile.tsv --out_dir 3classify_ani --mash_db mash_db_dir/ --cpus 20 - [2023-02-08 12:53:42] INFO: Using GTDB-Tk reference data version r207: /path/to/gtdbtk/database/release207_v2/ - [2023-02-08 12:53:43] INFO: Loading reference genomes. - [2023-02-08 12:53:43] INFO: Using Mash version 2.2.2 - [2023-02-08 12:53:43] INFO: Loading data from existing Mash sketch file: 3classify_ani/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh - [2023-02-08 12:53:43] INFO: Loading data from existing Mash sketch file: mash_db_dir/gtdb_ref_sketch.msh - [2023-02-08 12:53:46] INFO: Calculating Mash distances. - [2023-02-08 12:53:49] INFO: Calculating ANI with FastANI v1.3. - [2023-02-08 12:53:49] INFO: Completed 12 comparisons in 0.44 seconds (27.54 comparisons/second). - [2023-02-08 12:53:49] INFO: 2 genome(s) have been classified using the ANI pre-screening step. - [2023-02-08 12:53:49] TASK: Placing 1 bacterial genomes into backbone reference tree with pplacer using 20 CPUs (be patient). - [2023-02-08 12:53:49] INFO: pplacer version: v1.1.alpha19-0-g807f6f3 - [2023-02-08 12:55:02] INFO: Calculating RED values based on reference tree. - [2023-02-08 12:55:03] INFO: 1 out of 1 have an class assignments. Those genomes will be reclassified. - [2023-02-08 12:55:03] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (1/1) with pplacer using 20 CPUs (be patient). - [2023-02-08 12:57:38] INFO: Calculating RED values based on reference tree. - [2023-02-08 12:57:40] TASK: Traversing tree to determine classification method. - [2023-02-08 12:57:40] INFO: Completed 1 genome in 0.04 seconds (23.86 genomes/second). - [2023-02-08 12:57:40] INFO: 0 genome(s) have been classified using FastANI and pplacer. - [2023-02-08 12:57:40] WARNING: 1 of 3 genome has a warning (see summary file). 
- [2023-02-08 12:57:40] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode. - [2023-02-08 12:57:40] INFO: Done. \ No newline at end of file + [2023-02-15 08:37:11] INFO: GTDB-Tk v2.2.2 + [2023-02-15 08:37:11] INFO: gtdbtk classify --align_dir align_3lines/ --batchfile 3lines_batchfile.tsv --out_dir 3classify_ani --mash_db mash_db_dir/ --cpus 20 + [2023-02-15 08:37:11] INFO: Using GTDB-Tk reference data version r207: /srv/projects/gtdbtk/test_new_features/release207_v2/ + [2023-02-15 08:37:12] INFO: Loading reference genomes. + [2023-02-15 08:37:13] INFO: Using Mash version 2.2.2 + [2023-02-15 08:37:13] INFO: Loading data from existing Mash sketch file: 3classify_ani/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh + [2023-02-15 08:37:13] INFO: Loading data from existing Mash sketch file: mash_db_dir/gtdb_ref_sketch.msh + [2023-02-15 08:37:16] INFO: Calculating Mash distances. + [2023-02-15 08:37:20] INFO: Calculating ANI with FastANI v1.3. + [2023-02-15 08:37:21] INFO: Completed 12 comparisons in 0.62 seconds (19.21 comparisons/second). + [2023-02-15 08:37:21] INFO: 1 genome(s) have been classified using the ANI pre-screening step. + [2023-02-15 08:37:21] TASK: Placing 2 bacterial genomes into backbone reference tree with pplacer using 20 CPUs (be patient). + [2023-02-15 08:37:21] INFO: pplacer version: v1.1.alpha19-0-g807f6f3 + [2023-02-15 08:39:24] INFO: Calculating RED values based on reference tree. + [2023-02-15 08:39:25] INFO: 2 out of 2 have an class assignments. Those genomes will be reclassified. + [2023-02-15 08:39:25] TASK: Placing 1 bacterial genomes into class-level reference tree 6 (1/2) with pplacer using 20 CPUs (be patient). + [2023-02-15 08:43:39] INFO: Calculating RED values based on reference tree. 
+ [2023-02-15 08:43:42] TASK: Traversing tree to determine classification method. + [2023-02-15 08:43:42] INFO: Completed 1 genome in 0.00 seconds (2,451.38 genomes/second). + [2023-02-15 08:43:42] TASK: Calculating average nucleotide identity using FastANI (v1.3). + [2023-02-15 08:43:43] INFO: Completed 34 comparisons in 0.90 seconds (37.77 comparisons/second). + [2023-02-15 08:43:43] INFO: 0 genome(s) have been classified using FastANI and pplacer. + [2023-02-15 08:43:43] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (2/2) with pplacer using 20 CPUs (be patient). + [2023-02-15 08:46:38] INFO: Calculating RED values based on reference tree. + [2023-02-15 08:46:40] TASK: Traversing tree to determine classification method. + [2023-02-15 08:46:40] INFO: Completed 1 genome in 0.05 seconds (20.80 genomes/second). + [2023-02-15 08:46:40] INFO: 0 genome(s) have been classified using FastANI and pplacer. + [2023-02-15 08:46:41] WARNING: 1 of 3 genome has a warning (see summary file). + [2023-02-15 08:46:41] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode. + [2023-02-15 08:46:41] INFO: Done. \ No newline at end of file diff --git a/docs/src/commands/classify_wf.rst b/docs/src/commands/classify_wf.rst index a360c331..96c0dad1 100644 --- a/docs/src/commands/classify_wf.rst +++ b/docs/src/commands/classify_wf.rst @@ -86,44 +86,82 @@ Output .. code-block:: text - [2022-04-11 12:48:53] INFO: GTDB-Tk v2.0.0 - [2022-04-11 12:48:53] INFO: gtdbtk classify_wf --genome_dir genomes/ --out_dir classify_wf_out --cpus 3 -x gz - [2022-04-11 12:48:53] INFO: Using GTDB-Tk reference data version r207: /srv/db/gtdbtk/official/release207 - [2022-04-11 12:48:53] INFO: Identifying markers in 3 genomes with 3 threads. - [2022-04-11 12:48:53] TASK: Running Prodigal V2.6.3 to identify genes. 
- [2022-04-11 12:49:04] INFO: Completed 3 genomes in 10.96 seconds (3.65 seconds/genome). - [2022-04-11 12:49:04] TASK: Identifying TIGRFAM protein families. - [2022-04-11 12:49:10] INFO: Completed 3 genomes in 5.88 seconds (1.96 seconds/genome). - [2022-04-11 12:49:10] TASK: Identifying Pfam protein families. - [2022-04-11 12:49:10] INFO: Completed 3 genomes in 0.41 seconds (7.30 genomes/second). - [2022-04-11 12:49:10] INFO: Annotations done using HMMER 3.1b2 (February 2015). - [2022-04-11 12:49:10] TASK: Summarising identified marker genes. - [2022-04-11 12:49:11] INFO: Completed 3 genomes in 0.07 seconds (40.18 genomes/second). - [2022-04-11 12:49:11] INFO: Done. - [2022-04-11 12:49:11] INFO: Aligning markers in 3 genomes with 3 CPUs. - [2022-04-11 12:49:11] INFO: Processing 3 genomes identified as archaeal. - [2022-04-11 12:49:11] INFO: Read concatenated alignment for 3,412 GTDB genomes. - [2022-04-11 12:49:11] TASK: Generating concatenated alignment for each marker. - [2022-04-11 12:49:11] INFO: Completed 3 genomes in 0.02 seconds (167.25 genomes/second). - [2022-04-11 12:49:11] TASK: Aligning 52 identified markers using hmmalign 3.1b2 (February 2015). - [2022-04-11 12:49:11] INFO: Completed 52 markers in 0.54 seconds (96.16 markers/second). - [2022-04-11 12:49:11] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask. - [2022-04-11 12:49:16] INFO: Completed 3,415 sequences in 4.15 seconds (822.38 sequences/second). - [2022-04-11 12:49:16] INFO: Masked archaeal alignment from 13,540 to 10,153 AAs. - [2022-04-11 12:49:16] INFO: 0 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA. - [2022-04-11 12:49:16] INFO: Creating concatenated alignment for 3,415 archaeal GTDB and user genomes. - [2022-04-11 12:49:18] INFO: Creating concatenated alignment for 3 archaeal user genomes. - [2022-04-11 12:49:18] INFO: Done. 
- [2022-04-11 12:49:18] TASK: Placing 3 archaeal genomes into reference tree with pplacer using 3 CPUs (be patient). - [2022-04-11 12:49:18] INFO: pplacer version: v1.1.alpha19-0-g807f6f3 - [2022-04-11 12:54:22] INFO: Calculating RED values based on reference tree. - [2022-04-11 12:54:23] TASK: Traversing tree to determine classification method. - [2022-04-11 12:54:23] INFO: Completed 3 genomes in 0.00 seconds (23,563.51 genomes/second). - [2022-04-11 12:54:23] TASK: Calculating average nucleotide identity using FastANI (v1.32). - [2022-04-11 12:54:25] INFO: Completed 6 comparisons in 1.96 seconds (3.06 comparisons/second). - [2022-04-11 12:54:25] INFO: 3 genome(s) have been classified using FastANI and pplacer. - [2022-04-11 12:54:25] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode. - [2022-04-11 12:54:25] INFO: Done. - [2022-04-11 12:54:25] INFO: Removing intermediate files. - [2022-04-11 12:54:25] INFO: Intermediate files removed. - [2022-04-11 12:54:25] INFO: Done. + [2023-02-22 16:10:50] INFO: GTDB-Tk v2.2.3 + [2023-02-22 16:10:50] INFO: gtdbtk classify_wf --batchfile 3lines_batchfile.tsv --out_dir classify_wf_outdir_test --keep_intermediates --cpus 20 --mash_db mash_sketch/cli/mash_db.msh + [2023-02-22 16:10:50] INFO: Using GTDB-Tk reference data version r207: /srv/projects/gtdbtk/test_new_features/release207_v2/ + [2023-02-22 16:10:50] INFO: Loading reference genomes. + [2023-02-22 16:10:51] INFO: Using Mash version 2.3 + [2023-02-22 16:10:51] INFO: Loading data from existing Mash sketch file: classify_wf_outdir_test/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh + [2023-02-22 16:10:51] INFO: Creating Mash sketch file: mash_sketch/cli/mash_db.msh + [2023-02-22 16:10:51] INFO: Calculating RED values based on reference tree. 
+ [2023-02-22 16:10:54] TASK: Traversing tree to determine classification method. + [2023-02-22 16:10:54] INFO: Completed 1 genome in 0.00 seconds (2,335.36 genomes/second). + [2023-02-22 16:10:54] TASK: Calculating average nucleotide identity using FastANI (v1.3). + [2023-02-22 16:10:57] INFO: Completed 34 comparisons in 2.27 seconds (14.95 comparisons/second). + [2023-02-22 16:10:57] INFO: 0 genome(s) have been classified using FastANI and pplacer. + [2023-02-22 16:10:57] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (2/2) with pplacer using 20 CPUs (be patient). + [2023-02-22 16:14:29] INFO: Calculating RED values based on reference tree. + [2023-02-22 16:14:31] TASK: Traversing tree to determine classification method. + [2023-02-22 16:14:31] INFO: Completed 1 genome in 0.06 seconds (16.77 genomes/second). + [2023-02-22 16:14:31] INFO: 0 genome(s) have been classified using FastANI and pplacer. + [2023-02-22 16:14:31] WARNING: 1 of 3 genome has a warning (see summary file). + [2023-02-22 16:14:31] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode. + [2023-02-22 16:14:31] INFO: Done. + [2023-02-22 16:20:06] INFO: Completed 65,703 genomes in 9.25 minutes (7,103.32 genomes/minute). + [2023-02-22 16:20:06] INFO: Calculating Mash distances. + [2023-02-22 16:20:10] INFO: Calculating ANI with FastANI v1.3. + [2023-02-22 16:20:11] INFO: Completed 12 comparisons in 0.63 seconds (18.90 comparisons/second). + [2023-02-22 16:20:11] INFO: Summary of results saved to: classify_wf_outdir_test_mash/classify/ani_screen/gtdbtk.bac120.ani_summary.tsv + [2023-02-22 16:20:11] INFO: 1 genome(s) have been classified using the ANI pre-screening step. + [2023-02-22 16:20:11] INFO: Done. + [2023-02-22 16:20:11] INFO: 1 genome(s) have been classified using the ANI pre-screening step. 
+ [2023-02-22 16:20:11] INFO: Done. + [2023-02-22 16:20:11] INFO: Identifying markers in 2 genomes with 20 threads. + [2023-02-22 16:20:11] TASK: Running Prodigal V2.6.3 to identify genes. + [2023-02-22 16:20:12] INFO: Completed 2 genomes in 0.22 seconds (9.07 genomes/second). + [2023-02-22 16:20:12] WARNING: Prodigal skipped 2 genomes due to pre-existing data, see warnings.log + [2023-02-22 16:20:12] TASK: Identifying TIGRFAM protein families. + [2023-02-22 16:20:12] INFO: Completed 2 genomes in 0.03 seconds (65.39 genomes/second). + [2023-02-22 16:20:12] WARNING: TIGRFAM skipped 2 genomes due to pre-existing data, see warnings.log + [2023-02-22 16:20:12] TASK: Identifying Pfam protein families. + [2023-02-22 16:20:12] INFO: Completed 2 genomes in 0.03 seconds (68.36 genomes/second). + [2023-02-22 16:20:12] WARNING: Pfam skipped 2 genomes due to pre-existing data, see warnings.log + [2023-02-22 16:20:12] INFO: Annotations done using HMMER 3.1b2 (February 2015). + [2023-02-22 16:20:12] TASK: Summarising identified marker genes. + [2023-02-22 16:20:12] INFO: Completed 2 genomes in 0.06 seconds (32.55 genomes/second). + [2023-02-22 16:20:12] INFO: Done. + [2023-02-22 16:20:12] INFO: Aligning markers in 2 genomes with 20 CPUs. + [2023-02-22 16:20:12] INFO: Processing 2 genomes identified as bacterial. + [2023-02-22 16:20:21] INFO: Read concatenated alignment for 62,291 GTDB genomes. + [2023-02-22 16:20:21] TASK: Generating concatenated alignment for each marker. + [2023-02-22 16:20:22] INFO: Completed 2 genomes in 0.03 seconds (79.85 genomes/second). + [2023-02-22 16:20:23] TASK: Aligning 100 identified markers using hmmalign 3.1b2 (February 2015). + [2023-02-22 16:20:25] INFO: Completed 100 markers in 1.06 seconds (93.94 markers/second). + [2023-02-22 16:20:25] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask. + [2023-02-22 16:22:21] INFO: Completed 62,293 sequences in 1.93 minutes (32,233.24 sequences/minute). 
+ [2023-02-22 16:22:21] INFO: Masked bacterial alignment from 41,084 to 5,036 AAs. + [2023-02-22 16:22:21] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA. + [2023-02-22 16:22:22] INFO: Creating concatenated alignment for 62,293 bacterial GTDB and user genomes. + [2023-02-22 16:22:46] INFO: Creating concatenated alignment for 2 bacterial user genomes. + [2023-02-22 16:22:46] INFO: Done. + [2023-02-22 16:22:47] TASK: Placing 2 bacterial genomes into backbone reference tree with pplacer using 20 CPUs (be patient). + [2023-02-22 16:22:47] INFO: pplacer version: v1.1.alpha19-0-g807f6f3 + [2023-02-22 16:25:01] INFO: Calculating RED values based on reference tree. + [2023-02-22 16:25:02] INFO: 2 out of 2 have an class assignments. Those genomes will be reclassified. + [2023-02-22 16:25:02] TASK: Placing 1 bacterial genomes into class-level reference tree 6 (1/2) with pplacer using 20 CPUs (be patient). + [2023-02-22 16:29:46] INFO: Calculating RED values based on reference tree. + [2023-02-22 16:29:48] TASK: Traversing tree to determine classification method. + [2023-02-22 16:29:48] INFO: Completed 1 genome in 0.00 seconds (2,391.28 genomes/second). + [2023-02-22 16:29:48] TASK: Calculating average nucleotide identity using FastANI (v1.3). + [2023-02-22 16:29:50] INFO: Completed 34 comparisons in 1.53 seconds (22.22 comparisons/second). + [2023-02-22 16:29:50] INFO: 0 genome(s) have been classified using FastANI and pplacer. + [2023-02-22 16:29:50] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (2/2) with pplacer using 20 CPUs (be patient). + [2023-02-22 16:33:17] INFO: Calculating RED values based on reference tree. + [2023-02-22 16:33:19] TASK: Traversing tree to determine classification method. + [2023-02-22 16:33:19] INFO: Completed 1 genome in 0.06 seconds (17.02 genomes/second). + [2023-02-22 16:33:19] INFO: 0 genome(s) have been classified using FastANI and pplacer. 
+ [2023-02-22 16:33:19] WARNING: 1 of 3 genome has a warning (see summary file). + [2023-02-22 16:33:19] INFO: 0 genome(s) have been classified using FastANI and pplacer. + [2023-02-22 16:33:19] WARNING: 1 of 3 genome has a warning (see summary file). + [2023-02-22 16:33:19] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode. + [2023-02-22 16:33:19] INFO: Done. diff --git a/docs/src/faq.rst b/docs/src/faq.rst index 91da709e..de3d2de0 100644 --- a/docs/src/faq.rst +++ b/docs/src/faq.rst @@ -3,16 +3,19 @@ FAQ === -Why is there a discrepancy in the naming system between GTDB-Tk and NCBI or Silva taxonomic names? --------------------------------------------------------------------------------------------------- +Taxonomy FAQ +------------ +Why is there a discrepancy in the naming system between GTDB-Tk and NCBI or Silva taxonomic names? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ GTDB-Tk uses the GTDB taxonomy (`https://gtdb.ecogenomic.org/ `_). This taxonomy is similar, but not identical to NCBI and Silva. In many cases the GTDB taxonomy more strictly follows the nomenclatural rules for rank suffixes which is why there is Nitrospirota instead of Nitrospirae. + Can you combine the bacterial and archaeal trees into a single tree? --------------------------------------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The bacterial and archaeal trees are inferred from different marker genes. Currently, the correct rootings of these trees remain an open area of research. @@ -20,10 +23,13 @@ GTDB-Tk does not provide a tool to merge the trees but It is possible to artific One solution would be to use (`DendroPy `_); a Python library used for phylogenetic computing. 
+GTDB-Tk FAQ +------------ + .. _faq_pplacer: GTDB-Tk reaches the memory limit / pplacer crashes --------------------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The host may report that GTDB-Tk has exceeded the memory requirements due to how ``pplacer`` is implemented. Briefly, this is only the reported value and is not true for how much memory is actually in use. @@ -66,8 +72,8 @@ memory, but the host will report 300 GB of memory in use. Using the ``--scratch_dir`` parameter and ``--pplacer_cpus 1`` may help. -Validating species assignments with average nucleotide identity ---------------------------------------------------------------- +How does GTDB-Tk validate species assignments using average nucleotide identity? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ GTDB-Tk uses `FastANI `_ to estimate the ANI between genomes. We recommend you have FastANI >= 1.32 as this version introduces a fix that makes the results deterministic. @@ -78,8 +84,8 @@ GTDB r207 strictly uses ANI to circumscribe species and GTDB-Tk follows this met The species-specific ANI circumscription radii are available from the `GTDB `_ website. -FastANI using more threads than allocated ------------------------------------------ +Why is FastANI using more threads than allocated? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If you are using FastANI version 1.33 then you may run into an issue where FastANI will use more threads than you allocate. This can be problematic if running GTDB-Tk on a HPC where you have a limited number of threads available. @@ -103,3 +109,12 @@ From GTDB-Tk v2.0.0 the conda environment will automatically have FastANI v1.3 i From GTDB-Tk v2.2.2 the Docker container will automatically have FastANI v1.32 installed. Otherwise, manually build the container from the `Dockerfile `_, making sure to specify FastANI v1.32.
+ +What is the difference between the mutually exclusive options ``--mash_db`` and ``--skip_ani_screen``? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +| Starting with GTDB-Tk v2.2, the ``classify_wf`` and ``classify`` functions require an extra parameter to run: ``--mash_db`` or ``--skip_ani_screen``. +| With this new version of GTDB-Tk, the first stage of the ``classify`` pipelines (``classify_wf`` and ``classify``) is to compare all user genomes to all reference genomes and annotate them, if possible, based on ANI matches. +| The ``--mash_db`` option gives GTDB-Tk the path of the sketched Mash database required for ANI screening. +| If no database is available (i.e. this is the first time running ``classify``), the ``--mash_db`` option will sketch a new Mash database that can be used for subsequent calls. +| The ``--skip_ani_screen`` option will skip the pre-screening step and classify all genomes as in previous versions of GTDB-Tk. \ No newline at end of file