Releases: PacificBiosciences/pb-metagenomics-tools

v1.5.0

13 Apr 18:04
dbc183d

HiFi-MAG-Pipeline:

  • Updated to require GTDB-Tk V2.0.0+, and ensured compatibility with GTDB 07-RS207 (release 207).
  • Set default max contigs per MAG to 20.
  • Updated docs, including image showing "circular-aware binning" strategy.

Taxonomic-Functional-Profiling-Protein:

  • New default behavior is to output two RMA files per sample. The {sample}_filtered.protein.{mode}.rma file results from the optimal MEGAN-LR filter for precision/recall balance, whereas the {sample}_unfiltered.protein.{mode}.rma file has no filtering (i.e., any read assigned to any taxon is reported). The filtering parameter for {sample}_filtered.protein.{mode}.rma is still controlled with the sam2rma: minSupportPercent argument in the config.yaml file; the default is 0.01.
  • Ensured compatibility with newest MEGAN mapping file: megan-map-Feb2022.db.zip.
  • Updated docs.
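
To illustrate the difference between the filtered and unfiltered outputs, here is a minimal sketch of a percent-of-assigned-reads threshold. It is not the MEGAN implementation: the function name and the example counts are hypothetical, and taxa below the threshold are simply dropped here for brevity.

```python
def filter_by_min_support(read_counts, min_support_percent=0.01):
    """Keep taxa whose share of all assigned reads meets the threshold.

    read_counts: dict mapping taxon name -> number of assigned reads.
    min_support_percent: threshold in percent (0.01 means 0.01%).
    A value of 0 disables filtering, which mirrors the unfiltered output.
    """
    total = sum(read_counts.values())
    if total == 0 or min_support_percent <= 0:
        return dict(read_counts)
    min_reads = total * min_support_percent / 100.0
    return {taxon: n for taxon, n in read_counts.items() if n >= min_reads}


# Hypothetical counts for one sample (100,000 assigned reads in total).
counts = {"taxon_A": 90_000, "taxon_B": 9_970, "taxon_C": 25, "taxon_D": 5}

filtered = filter_by_min_support(counts, 0.01)   # threshold of 10 reads
unfiltered = filter_by_min_support(counts, 0)    # no filtering at all
```

With these made-up numbers, the 0.01 setting removes only the taxon supported by 5 reads, while a value of 0 reports every taxon.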

v1.4.2

08 Feb 23:56
  • Changed default settings for MEGAN-LR in the Taxonomic-Functional-Profiling-Protein workflow based on results of new benchmarks, and made the minSupportPercent parameter accessible in the config file.
  • The MEGAN-LR default value for the minSupportPercent parameter (0.05) is too conservative and results in high precision (no false positives) but lower recall (false negatives; low abundance species are missed). The optimal value based on results from Portik et al. 2022 (https://www.biorxiv.org/content/10.1101/2022.01.31.478527v1) is 0.01, which increases recall without reducing precision. To avoid any filtering based on this threshold, a value of 0 can be set instead. This will report ALL assigned reads, which will potentially include thousands of false positives at ultra-low abundances (<0.01%), similar to results from short-read methods (e.g., Kraken2, Centrifuge, etc.). Updated docs and config file to describe these changes.
  • Fixed a bug related to taxon name parsing when making mpa and kreport files in MEGAN-RMA-summary when the name contains a double underscore character (__).
  • Increased default number of threads for gtdb-tk in HiFi-MAG-Pipeline due to reports of pplacer failing with segmentation faults (but pplacer threads should NOT be increased!).
  • Added new script to pb-metagenomics-scripts/Convert-to-kreport-mpa, called Adjust-kreport-taxonomy.py. This can be used to update a kraken-style report to the most current NCBI taxonomy.
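
The release notes do not describe how the double underscore bug was fixed. As an illustration only, the sketch below shows one way to parse mpa-style lineage fields so that a "__" inside a taxon name is preserved. The function name and the example lineage are hypothetical.

```python
def parse_mpa_lineage(lineage):
    """Split an mpa-style lineage string into (rank_prefix, taxon_name) pairs.

    Fields are separated by "|" and each field looks like "g__Name".
    """
    parsed = []
    for field in lineage.split("|"):
        # partition() splits on the FIRST "__" only, so any later "__"
        # remains part of the taxon name instead of truncating it.
        prefix, sep, name = field.partition("__")
        if not sep:
            raise ValueError(f"missing rank prefix in field: {field!r}")
        parsed.append((prefix, name))
    return parsed


# Hypothetical lineage whose genus name itself contains "__".
pairs = parse_mpa_lineage("k__Bacteria|g__Example__genus")
```

A naive `field.split("__")` would return three pieces for the second field, which is the kind of mismatch this approach avoids.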

v1.4.1

22 Nov 21:12
af3a192
  • Added bin checking step in HiFi-MAG-Pipeline to prevent low quality assemblies from reaching the DAS_Tool step. This will throw an error which will be reported in the logs/SAMPLE.CheckForBins.log file. If no bins are created with MetaBat2 but there are circular contigs present, these will move on to DAS_Tool. This can also cause an error if the circular contigs are not bacteria/archaea sequences.
  • Fixed potential database collisions during DAS_Tool step.
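
The decision made by the bin checking step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the pipeline's code: the function name, its arguments, and the return labels are hypothetical.

```python
def check_for_bins(metabat_bins, circular_contigs):
    """Decide whether a sample can proceed to the DAS_Tool step.

    metabat_bins: list of bin names produced by MetaBat2 (may be empty).
    circular_contigs: list of circular contig IDs (may be empty).
    Returns a short label describing what moves on; raises if nothing can.
    """
    if metabat_bins:
        return "bins"
    if circular_contigs:
        # No MetaBat2 bins, but circular contigs can still move on.
        return "circular-only"
    raise RuntimeError("No bins and no circular contigs: stopping before DAS_Tool.")


status_a = check_for_bins(["bin.1", "bin.2"], [])
status_b = check_for_bins([], ["contig_7"])
```

In the real workflow, the failure case is reported in the logs/SAMPLE.CheckForBins.log file rather than through a Python exception alone.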

v1.4.0

10 Nov 23:11
811f1b3
  • Major change in binning strategy for HiFi-MAG-Pipeline to include three strategies: 1) circular contigs are assigned to bins directly, 2) linear contigs are binned with metabat2, and 3) the full contig set is binned with metabat2. The bins are compared and merged using DAS_Tool. This improves overall bin/MAG yield and also prevents complete circular contigs from being mis-binned and eliminated due to contamination (e.g., presence of multiple single copy genes). This workflow design should allow other binning tools to be incorporated in future releases (e.g. maxbin2, concoct).
  • Added Compare-kreport-taxonomic-profiles to pb-metagenomics-scripts. Allows easy comparisons of taxonomic profiling kreports across samples. Exports counts tables and creates stacked barplots and heatmaps for desired rank level (species, genus, family, etc.)
  • Several small bug fixes.
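
As a rough sketch of the three binning inputs described above, the example below partitions a contig set by circularity. The function name, the dictionary layout, and the choice of one bin per circular contig are assumptions made for illustration; the actual binning and merging are done by metabat2 and DAS_Tool.

```python
def plan_binning_inputs(contigs):
    """Partition contigs into the three inputs used for binning.

    contigs: dict mapping contig ID -> True if the contig is circular.
    """
    circular = sorted(c for c, is_circ in contigs.items() if is_circ)
    linear = sorted(c for c, is_circ in contigs.items() if not is_circ)
    return {
        # Strategy 1: circular contigs go to bins directly
        # (assumed here to be one contig per bin).
        "circular_bins": {f"circular_{i}": [c] for i, c in enumerate(circular, start=1)},
        # Strategy 2: only the linear contigs are given to metabat2.
        "linear_for_metabat2": linear,
        # Strategy 3: the full contig set is given to metabat2.
        "full_for_metabat2": sorted(contigs),
    }


# Hypothetical contig set: two circular contigs and two linear ones.
plan = plan_binning_inputs({"c1": True, "c2": False, "c3": False, "c4": True})
```

The resulting bin sets would then be compared and merged in a separate step.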

v1.3.0

30 Mar 23:36

The following changes have been made in v1.3.0:

  • MEGAN-RMA-summary workflow has been added. This allows simple exports of all taxonomic and functional class counts from RMA files created using the Taxonomic-Functional-Profiling-Protein and Taxonomic-Profiling-Nucleotide workflows. MEGAN counts are also exported in kraken report (kreport) and metaphlan (mpa) format to facilitate easy comparisons to other profiling programs.
  • Genome-Binning-Pipeline has been renamed to HiFi-MAG-Pipeline.
  • HiFi-MAG-Pipeline conda environments are now specific to each rule to prevent environment conflicts.
  • pb-metagenomics-scripts folder has been added. This currently contains multiple scripts to convert the outputs of several taxonomic profiling programs to standard formats, including kraken report (kreport) and metaphlan (mpa) format. For an example, see the post here. These scripts will be broadly useful in facilitating comparisons among programs.
  • Taxonomic-Profiling-Nucleotide: Sorting of HiFi fasta files is now explicitly based on read order in alignments within the SAM files. This prevents any errors in sam2rma relating to mismatched reads and alignments.
  • Taxonomic-Profiling-Nucleotide: SAM files are filtered to remove de:f:-inf tags which cause an unresolved issue with sam2rma. See post on the MEGAN community here.
  • General conda environments updated to require Python 3.7.
  • Updated all relevant documentation.
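
For the SAM filtering change, a minimal sketch of removing a de:f:-inf tag from an alignment line is shown below. The function name and the example record are hypothetical, and the notes do not say whether the workflow drops only the tag or the whole record; this version drops only the tag.

```python
def strip_bad_de_tag(sam_line):
    """Return a SAM line with any 'de:f:-inf' optional field removed.

    Header lines (starting with '@') are returned unchanged.
    The trailing newline, if present, is stripped from the result.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    # The first 11 columns are mandatory; optional tags follow them.
    mandatory, optional = fields[:11], fields[11:]
    optional = [tag for tag in optional if tag != "de:f:-inf"]
    return "\t".join(mandatory + optional)


# Hypothetical alignment record with three optional tags.
record = "\t".join(
    ["read1", "0", "ref1", "1", "60", "4M", "*", "0", "0", "ACGT", "IIII",
     "NM:i:0", "de:f:-inf", "AS:i:8"]
)
cleaned = strip_bad_de_tag(record + "\n")
```

Only an exact de:f:-inf field is removed, so valid de:f: values on other records are left untouched.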

v1.2.2

08 Oct 20:08

Issues were addressed that were caused by differences from snakemake v4.8 to 5+:

  • Genome-Binning-Pipeline: In rule RunMetabat the output was changed from 2-metabat-bins/{sample}/completed.txt to 2-metabat-bins/{sample}.completed.txt so that it is no longer contained in the simultaneously created output directory. This eliminated a ChildIOException error. It was tested with snakemake v5.4 and v5.19.
  • Genome-Binning-Pipeline: Directories that are outputs from a rule are explicitly labeled with the directory() syntax to prevent errors in snakemake v5+. Tested with v5.19.

Documentation now explicitly states the requirement for snakemake v5+, with v5.19+ recommended.

v1.2.1

29 Sep 23:23

Fixed incorrect location for reads when making RMA in Taxonomic-Functional-Profiling-Protein snakefile.

v1.2

28 Sep 20:20

Changed repo name of Taxonomic-Functional-Profiling-Nucleotide to Taxonomic-Profiling-Nucleotide to reflect the fact that functional annotations are not created with MEGAN for nucleotide alignments. Updated all docs and ancillary files to match the new repo name. Added readme files to all pipeline repos with brief description and links to docs.

v1.1

28 Sep 19:43

Now includes a docs folder with tutorials for all three pipelines. Minor fixes to some annotations in config files, and renamed sample config files for consistency with docs.

v1.0

24 Sep 00:45

This is the initial release of pb-metagenomics-tools, which includes three Snakemake workflows: Genome-Binning-Pipeline, Taxonomic-Functional-Profiling-Nucleotide, and Taxonomic-Functional-Profiling-Protein.