Releases: PacificBiosciences/pb-metagenomics-tools
v1.5.0
`HiFi-MAG-Pipeline`:
- Updated to require GTDB-Tk V2.0.0+, and ensured compatibility with GTDB 07-RS207 (release 207).
- Set default max contigs per MAG to `20`.
- Updated docs, including image showing "circular-aware binning" strategy.
`Taxonomic-Functional-Profiling-Protein`:
- New default behavior is to output two RMA files per sample. The `{sample}_filtered.protein.{mode}.rma` file results from the optimal MEGAN-LR filter for precision/recall balance, whereas the `{sample}_unfiltered.protein.{mode}.rma` file has no filtering (i.e., any read assigned to any taxon is reported). The filtering parameter for `{sample}_filtered.protein.{mode}.rma` is still controlled with the `sam2rma: minSupportPercent` argument in the `config.yaml` file; the default is `0.01`.
- Ensured compatibility with newest MEGAN mapping file: megan-map-Feb2022.db.zip.
- Updated docs.
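To illustrate what a percent-of-assigned-reads threshold does to a profile, here is a minimal sketch. It is a flat simplification for illustration only and is not MEGAN's implementation (which works on the taxonomy tree); the function name `filter_by_min_support` and all counts are hypothetical.

```python
def filter_by_min_support(read_counts, min_support_percent):
    """Keep taxa whose share of all assigned reads meets the threshold.

    Simplified, flat illustration: it ignores the taxonomy tree and is
    NOT how MEGAN itself applies minSupportPercent.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    # Convert the percentage into a minimum number of reads.
    threshold = total * min_support_percent / 100.0
    return {taxon: n for taxon, n in read_counts.items() if n >= threshold}


# Hypothetical read counts for one sample (made-up numbers, 100,000 reads).
counts = {
    "Escherichia coli": 90000,
    "Bacteroides fragilis": 9990,
    "Rare taxon A": 8,
    "Rare taxon B": 2,
}

# Threshold of 0.01% of assigned reads, analogous to the filtered output.
filtered = filter_by_min_support(counts, 0.01)

# Threshold of 0, analogous to the unfiltered output: every taxon is kept.
unfiltered = filter_by_min_support(counts, 0)

print(sorted(filtered))
print(len(unfiltered))
```

With these made-up counts, the two rare taxa fall below the 0.01% cutoff (10 reads) and appear only in the unfiltered result.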
v1.4.2
- Changed default settings for MEGAN-LR in the `Taxonomic-Functional-Profiling-Protein` workflow based on results of new benchmarks, and made the `minSupportPercent` parameter accessible in the config file.
- The MEGAN-LR default value for the `minSupportPercent` parameter (`0.05`) is too conservative and results in high precision (no false positives) but lower recall (false negatives; low abundance species are missed). The optimal value based on results from Portik et al. 2022 (https://www.biorxiv.org/content/10.1101/2022.01.31.478527v1) is `0.01`, which increases recall without reducing precision. To avoid any filtering based on this threshold, a value of `0` can be set instead. This will report ALL assigned reads, which will potentially include thousands of false positives at ultra-low abundances (<0.01%), similar to results from short-read methods (e.g., Kraken2, Centrifuge). Updated docs and config file to describe these changes.
- Fixed a bug related to taxon name parsing when making mpa and kreport files in `MEGAN-RMA-summary` when the name contains a double underscore (`__`).
- Increased default number of threads for gtdb-tk in `HiFi-MAG-Pipeline` due to reports of pplacer failing with segmentation faults (but pplacer threads should NOT be increased!).
- Added new script to pb-metagenomics-scripts/Convert-to-kreport-mpa, called `Adjust-kreport-taxonomy.py`. This can be used to update a kraken-style report to the most current NCBI taxonomy.
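The double-underscore issue is easy to reproduce in isolation, because mpa-style lineages use `__` to separate the rank code from the taxon name. The sketch below is not the fix applied in `MEGAN-RMA-summary` (the release notes do not describe it); it only shows one way such names could be parsed safely. The helper names and the example lineage are hypothetical.

```python
def parse_mpa_field(field):
    """Split one 'rank__name' field into (rank_code, name).

    Splitting only at the FIRST '__' keeps any later double underscores
    inside the taxon name intact.
    """
    rank_code, sep, name = field.partition("__")
    if not sep:
        raise ValueError(f"missing rank prefix in {field!r}")
    return rank_code, name


def parse_mpa_lineage(lineage):
    """Parse a '|'-separated mpa lineage into a list of (rank_code, name)."""
    return [parse_mpa_field(field) for field in lineage.split("|")]


# Hypothetical lineage whose last name itself contains a double underscore.
lineage = "k__Bacteria|g__Example__genus"

# A naive split on every '__' yields three pieces for the last field.
naive = lineage.split("|")[-1].split("__")

# Splitting at the first '__' only preserves the full name.
parsed = parse_mpa_lineage(lineage)

print(naive)
print(parsed)
```

Using `str.partition` rather than an unbounded `split` means the rank code is always the text before the first separator, whatever the name contains.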
v1.4.1
- Added bin checking step in `HiFi-MAG-Pipeline` to prevent low quality assemblies from reaching the DAS_Tool step. This will throw an error which will be reported in the `logs/SAMPLE.CheckForBins.log` file. If no bins are created with MetaBat2 but there are circular contigs present, these will move on to DAS_Tool. This can also cause an error if the circular contigs are not bacteria/archaea sequences.
- Fixed potential database collisions during DAS_Tool step.
v1.4.0
- Major change in binning strategy for `HiFi-MAG-Pipeline` to include three strategies: 1) circular contigs are assigned to bins directly, 2) linear contigs are binned with metabat2, and 3) the full contig set is binned with metabat2. The bins are compared and merged using DAS_Tool. This improves overall bin/MAG yield and also prevents complete circular contigs from being mis-binned and eliminated due to contamination (e.g., presence of multiple single copy genes). This workflow design should allow other binning tools to be incorporated in future releases (e.g., maxbin2, concoct).
- Added `Compare-kreport-taxonomic-profiles` to `pb-metagenomics-scripts`. Allows easy comparisons of taxonomic profiling kreports across samples. Exports counts tables and creates stacked barplots and heatmaps for desired rank level (species, genus, family, etc.).
- Several small bug fixes.
v1.3.0
The following changes have been made in v1.3.0:
- `MEGAN-RMA-summary` workflow has been added. This allows simple exports of all taxonomic and functional class counts from RMA files created using the `Taxonomic-Functional-Profiling-Protein` and `Taxonomic-Profiling-Nucleotide` workflows. MEGAN counts are also exported in kraken report (kreport) and metaphlan (mpa) format to facilitate easy comparisons to other profiling programs.
- `Genome-Binning-Pipeline` has been renamed to `HiFi-MAG-Pipeline`.
- `HiFi-MAG-Pipeline` conda environments are now specific to each rule to prevent environment conflicts.
- `pb-metagenomics-scripts` folder has been added. This currently contains multiple scripts to convert the outputs of several taxonomic profiling programs to standard formats, including kraken report (kreport) and metaphlan (mpa) format. For an example, see the post here. These scripts will be broadly useful in facilitating comparisons among programs.
- `Taxonomic-Profiling-Nucleotide`: Sorting of HiFi fasta files is now explicitly based on read order in alignments within the SAM files. This prevents any errors in `sam2rma` relating to mismatched reads and alignments.
- `Taxonomic-Profiling-Nucleotide`: SAM files are filtered to remove `de:f:-inf` tags which cause an unresolved issue with `sam2rma`. See post on the MEGAN community here.
- General conda environments updated to require Python 3.7.
- Updated all relevant documentation.
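As a rough illustration of the `de:f:-inf` filtering mentioned above, the sketch below strips that optional field from a SAM alignment record. It is an assumption about what "filtered" could look like, not the workflow's actual code: the notes do not say whether the tag or the whole record is removed, and this version removes only the tag. The function name `strip_bad_tag` and the example record are hypothetical.

```python
def strip_bad_tag(sam_line, bad_tag="de:f:-inf"):
    """Remove one exact optional field from a SAM alignment line.

    Header lines (starting with '@') are returned unchanged. The first
    11 tab-separated columns are the mandatory SAM fields and are never
    touched; only the optional TAG:TYPE:VALUE fields are filtered.
    """
    if sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    kept = fields[:11] + [f for f in fields[11:] if f != bad_tag]
    return "\t".join(kept) + newline


# Hypothetical header line and alignment record, for illustration only.
header = "@HD\tVN:1.6\n"
record = "read1\t0\tref1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tde:f:-inf\n"

cleaned_header = strip_bad_tag(header)
cleaned_record = strip_bad_tag(record)

print(cleaned_record, end="")
```

Dropping the affected records entirely would be an equally plausible reading of the note; which approach suits a given dataset depends on how many alignments carry the tag.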
v1.2.2
Issues were addressed that were caused by differences from snakemake v4.8 to 5+:
- Genome-Binning-Pipeline: In rule `RunMetabat` the output was changed from `2-metabat-bins/{sample}/completed.txt` to `2-metabat-bins/{sample}.completed.txt` so that it is no longer contained in the simultaneously created output directory. This eliminated a ChildIOException error. It was tested with snakemake v5.4 and v5.19.
- Genome-Binning-Pipeline: Directories that are outputs from a rule are explicitly labeled with the `directory()` syntax to prevent errors in snakemake v5+. Tested with v5.19.
Documentation now explicitly states requirements for snakemake v5+, with v5.19+ recommended.
v1.2.1
v1.2
Changed repo name of `Taxonomic-Functional-Profiling-Nucleotide` to `Taxonomic-Profiling-Nucleotide` to reflect the fact that functional annotations are not created with MEGAN for nucleotide alignments. Updated all docs and ancillary files to match the new repo name. Added readme files to all pipeline repos with a brief description and links to docs.