Releases: PacificBiosciences/pb-metagenomics-tools
v3.3.0
Upgraded workflows and documentation for snakemake v8
- Workflows tested with v8.25
- snakemake environment file updated (snakemake-environment.yaml)
- Documentation updated for cluster execution modules in snakemake v8
HiFi-MAG-Pipeline
- semibin upgraded to v2.1, changed command-line syntax
- checkm2 upgraded to v1.0.2
- gtdbtk upgraded to v2.4.0, database must now be R220, changed command-line syntax and metadata gathering
- das_tool upgraded to v1.1.7
- fixed checkm2 database bug in #87
- fixed issue reported in #86
- upgraded conda envs may solve #88
- added memory resources, defaults selected based on benchmarks of typical and deep-sequencing datasets
v3.2.0
Added new feature to HiFi-MAG-Pipeline:
- Identifies "superbins" that are output from SemiBin2, which are >100 Mb in size. Including very large superbins (~1GB) will cause crashes in DAS_Tool. This new identification step moves all superbins to a superbins folder in the SemiBin2 sample output directory. These individual superbins can be inspected to determine their contents, which sometimes include interesting eukaryotic genomes.
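The superbin step described above can be sketched roughly as follows. This is an illustration only, not the workflow's actual code: the function names, the `*.fa` file pattern, and the reading of ">100 Mb" as 100,000,000 bases are all assumptions.

```python
import shutil
from pathlib import Path

# Assumption: ">100 Mb" is interpreted as more than 100,000,000 bases.
SUPERBIN_THRESHOLD_BP = 100_000_000


def fasta_total_length(fasta_path):
    """Sum the lengths of all sequences in a FASTA file."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def move_superbins(bin_dir, threshold_bp=SUPERBIN_THRESHOLD_BP, pattern="*.fa"):
    """Move bins larger than threshold_bp into bin_dir/superbins.

    Returns the file names that were moved. The pattern is a hypothetical
    default; the real bin file extension may differ.
    """
    bin_dir = Path(bin_dir)
    superbin_dir = bin_dir / "superbins"
    moved = []
    # The glob is non-recursive, so files already in superbins/ are ignored.
    for fasta in sorted(bin_dir.glob(pattern)):
        if fasta_total_length(fasta) > threshold_bp:
            superbin_dir.mkdir(exist_ok=True)
            shutil.move(str(fasta), str(superbin_dir / fasta.name))
            moved.append(fasta.name)
    return moved
```

Bins left in the original directory would then be the only ones passed on to DAS_Tool.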
v3.1.0
Added new features to HiFi-MAG-Pipeline:
- Use local version of CheckM2 database, rather than rely on automated download in workflow. Fixes issues previously raised.
- Added new mapping feature. Converts the existing BAM to PAF, then filters alignments based on unique reads, percent of read aligned, and percent identity. Runs at both contig and MAG level, and generates a table and figure of percent reads mapped.
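A minimal sketch of this kind of PAF filtering is shown below. It relies only on the standard PAF column layout. The thresholds (90% for both), the identity approximation (matches divided by alignment block length), and the reading of "unique reads" as "keep one best alignment per read" are assumptions, not the workflow's documented behavior.

```python
def parse_paf_line(line):
    """Extract the fields needed for filtering from one PAF record."""
    f = line.rstrip("\n").split("\t")
    return {
        "read": f[0],
        "read_len": int(f[1]),
        "q_start": int(f[2]),
        "q_end": int(f[3]),
        "target": f[5],
        "matches": int(f[9]),    # number of matching bases
        "block_len": int(f[10]), # alignment block length
    }


def filter_paf(lines, min_pct_aligned=90.0, min_pct_identity=90.0):
    """Return {read name: best passing alignment}.

    Thresholds are hypothetical defaults. Each read is kept at most once
    (the alignment with the most matching bases).
    """
    best = {}
    for line in lines:
        if not line.strip():
            continue
        rec = parse_paf_line(line)
        pct_aligned = 100.0 * (rec["q_end"] - rec["q_start"]) / rec["read_len"]
        pct_identity = 100.0 * rec["matches"] / rec["block_len"]
        if pct_aligned < min_pct_aligned or pct_identity < min_pct_identity:
            continue
        prev = best.get(rec["read"])
        if prev is None or rec["matches"] > prev["matches"]:
            best[rec["read"]] = rec
    return best


def percent_reads_mapped(best_hits, total_reads):
    """Percent of all reads that have a passing alignment."""
    return 100.0 * len(best_hits) / total_reads if total_reads else 0.0
```

Grouping the surviving targets by contig or by MAG would give the two levels of summary mentioned above.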
v3.0.0
Added pb-MAG-mirror workflow to compare and consolidate two MAG sets.
- Associated with preprint available here: https://doi.org/10.1101/2024.05.10.593587
v2.1.0
HiFi-MAG-Pipeline:
Bug fix. Given a reference longer than 4Gb, minimap2 is unable to see all the sequences and thus can't produce a correct SAM header. This throws an error in the MinimapToBam rule with the sorting step. This fix avoids the issue by splitting this rule into two new rules: MinimapIndex and MinimapToBam. These rules index the reference prior to alignment and run minimap2 with the --split-prefix option, respectively.
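The two-step approach can be illustrated by building the two minimap2 command lines in Python. This is a sketch under assumptions: the `map-hifi` preset, the thread count, and the file names are placeholders, and the exact options used by the MinimapIndex and MinimapToBam rules are not stated in these notes. Only the `-d` (write index) and `--split-prefix` options are taken from minimap2 itself.

```python
def minimap_index_cmd(reference_fasta, index_path, preset="map-hifi"):
    """Command that builds a minimap2 index (.mmi) for the reference.

    The preset is an assumption; the index should be built with the same
    preset that is later used for alignment.
    """
    return ["minimap2", "-x", preset, "-d", index_path, reference_fasta]


def minimap_align_cmd(index_path, reads, split_prefix, threads=4, preset="map-hifi"):
    """Command that aligns reads to the prebuilt index and emits SAM.

    --split-prefix gives minimap2 a prefix for temporary files so that
    results from a multi-part index can be merged into a single output
    with a complete SAM header.
    """
    return [
        "minimap2", "-a", "-x", preset,
        "-t", str(threads),
        "--split-prefix", split_prefix,
        index_path, reads,
    ]


# Hypothetical file names, for illustration only.
index_cmd = minimap_index_cmd("contigs.fasta", "contigs.mmi")
align_cmd = minimap_align_cmd("contigs.mmi", "reads.fastq", "tmp_split")
```

Each list could be passed to `subprocess.run(..., check=True)`, with the alignment output piped on to sorting.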
Taxonomic-Profiling-Diamond-Megan:
- Bug fix (#65). Created separate environments for rules to prevent bad env recipes.
- Feature request (#20). Added access to KEGG annotations with MEGAN-UE. A new Snakefile, Snakefile-diamond-megan-ue.smk, was added to the repo to enable this workflow. Docs updated to explain usage and file download requirements (requires binaries from MEGAN-UE and the mapping file for MEGAN-UE). Apparently, mapping KEGG annotations using the CLI tools does not require a license.
v2.0.2
HiFi-MAG-Pipeline:
Depending on the version of DAS_Tool, the name of the helper script used to generate input files changes:
<= v1.1.3 : Fasta_to_Scaffolds2Bin.sh
> v1.1.3 : Fasta_to_Contig2Bin.sh
The Snakemake workflow was designed to run with the script from <= v1.1.3, causing errors with newer versions.
This release pins DAS_Tool to v1.1.6 and uses the updated Fasta_to_Contig2Bin.sh name in the workflow.
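The version-dependent script name can be expressed as a small lookup. This is only an illustration of the rule stated above; the function names are hypothetical, and the release itself resolves the problem by pinning the version rather than by detecting it.

```python
def parse_version(version):
    """Turn a string like 'v1.1.6' or '1.1.6' into a comparable tuple."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def dastool_helper_script(version):
    """Return the DAS_Tool helper script name for a given DAS_Tool version."""
    if parse_version(version) <= (1, 1, 3):
        return "Fasta_to_Scaffolds2Bin.sh"
    return "Fasta_to_Contig2Bin.sh"
```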
v2.0.1
HiFi-MAG-Pipeline:
Made changes to environment recipes.
- Added fuzzy matching to problematic environments (dastool, semibin)
- Added full channel set to environments (some only had bioconda)
Bug fix for edge case in Filter-Complete-Contigs.py: A dataset consisting of only bins with 100% completeness raises an error when plotting the histogram due to improper bin sizes. The fix adds a conditional statement to handle this case.
v2.0.0
HiFi-MAG-Pipeline received major improvements.
The new version of HiFi-MAG-Pipeline is "completeness-aware":
- Long contigs >500kb are identified and placed in individual fasta files.
- They are then examined using CheckM2 to determine percent completeness.
- All long contigs that are >93% complete are then moved directly to the final MAG set.
- The long contigs that are <93% complete are pooled with other shorter incomplete contigs from the starting set, and this contig set is subjected to binning.
- Binning algorithms include MetaBat2 and SemiBin2 (using long read settings).
- The two bin sets are merged using DAS_Tool.
- The dereplicated bin set consists of the merged bin set from above and all long complete contigs found.
- This dereplicated bin set is examined using CheckM2, and subsequently filtered based on several qualities (defaults = >70% completeness, <10% contamination, <20 contigs).
- All bins/MAGs passing filtering undergo taxonomic assignment using GTDB-Tk.
- The final MAGs are written as a set of fasta files, several figures are produced, and a summary file of metadata is generated.
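The routing and filtering logic in the steps above can be sketched as follows. The thresholds come from the list (500 kb, 93%, and the default filters), but the function names and data structures are hypothetical. The notes also do not say how a contig at exactly 93% is handled; here it is assumed to go to binning.

```python
LONG_CONTIG_BP = 500_000   # "Long contigs >500kb"
COMPLETE_PCT = 93.0        # ">93% complete"


def route_contigs(contig_lengths, completeness):
    """Split contigs into direct-to-final MAGs and the pool sent to binning.

    contig_lengths: {contig name: length in bp}
    completeness:   {contig name: CheckM2 percent completeness}, expected
                    only for long contigs; missing values count as 0.
    """
    final_mags, to_binning = [], []
    for name, length in contig_lengths.items():
        is_long = length > LONG_CONTIG_BP
        if is_long and completeness.get(name, 0.0) > COMPLETE_PCT:
            final_mags.append(name)
        else:
            to_binning.append(name)
    return final_mags, to_binning


def passes_filters(completeness, contamination, n_contigs,
                   min_completeness=70.0, max_contamination=10.0,
                   max_contigs=20):
    """Apply the default quality filters to a dereplicated bin."""
    return (completeness > min_completeness
            and contamination < max_contamination
            and n_contigs < max_contigs)
```

For example, a 2 Mb contig at 99% completeness would skip binning entirely, while a 900 kb contig at 55% would be pooled with the shorter contigs.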
The new "completeness-aware" strategy is highly effective at preventing improper binning of complete contigs.
- It is more effective than the previous "circular-aware" binning used in v1.5 and v1.6.
- Compared to a standard binning pipeline (e.g., MetaBat2), it results in a 14-67% increase in total MAGs (average 36%) and 13-186% increase in single contig MAGs (average 87%).
- Compared to the "circular-aware" binning in v1.5, it results in a 14-39% increase in total MAGs (average 27%) and 10-28% increase in single contig MAGs (average 20%).
Beyond the "completeness-aware" strategy, there are several other important updates:
- It now uses CheckM2 instead of CheckM, and no longer requires manual download of the CheckM database.
- For binning, Concoct and MaxBin2 have been retired, and SemiBin2 is used in conjunction with MetaBat2. SemiBin2 is highly effective at binning contigs from long-read assemblies and obtains better results.
- This version also introduces checkpoints to create forked workflows depending on the properties of the sample, thereby preventing crashes when no bins pass filtering. This applies to the long contig completeness evaluation stage and the binning of incomplete contigs.
- New figures are produced as part of the long contig evaluations and final summary steps.
v1.6.1
v1.6.0
Major changes:
- Added Taxonomic-Profiling-Sourmash workflow (all credit to @bluegenes!).
- Consolidated existing profiling workflows.
  - Taxonomic-Profiling-Diamond-Megan is the combined workflow of Taxonomic-Functional-Profiling-Protein + MEGAN-RMA-Summary.
  - Taxonomic-Profiling-Minimap-Megan is the combined workflow of Taxonomic-Profiling-Nucleotide + MEGAN-RMA-Summary.
- Added new binning methods to the custom strategy in HiFi-MAG-Pipeline, including CONCOCT and MaxBin2.
- Updated all relevant documentation and images.
Minor changes:
- Taxonomic-Profiling-Minimap-Megan: Added filtered & unfiltered versions of the RMA file, using the --minSupportPercent 0.01 flag in sam2rma. Allows filtered & unfiltered taxonomic report outputs.
- HiFi-MAG-Pipeline: Set minimum contig size for binning to 50000 across all binning methods (MetaBAT2, CONCOCT, MaxBin2).
- HiFi-MAG-Pipeline: Updated GTDB-Tk requirement to v2.1.1.