Releases: metagenome-atlas/atlas
Use CheckM2
V2.13
What's Changed
- use minimap for contigs, genecatalog and genomes in #569 #577
- filter genomes ourselves in #568
The filter function is defined in the config file:

```yaml
genome_filter_criteria: "(Completeness-5*Contamination >50 ) & (Length_scaffolds >=50000) & (Ambigious_bases <1e6) & (N50 > 5*1e3) & (N_scaffolds < 1e3)"
```
The genome filtering is similar to that used in other publications in the field, e.g. GTDB. What may be a bit different is that genomes with completeness around 50% and contamination around 10% are excluded, whereas dRep with its default parameters would include them.
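The criteria string uses pandas query syntax, so its effect can be sketched on a hypothetical genome-statistics table (values invented for illustration; only the column names are taken from the criteria above):

```python
import pandas as pd

# Hypothetical genome quality table; columns match the filter criteria.
genomes = pd.DataFrame(
    {
        "Completeness": [95.0, 52.0, 80.0],
        "Contamination": [1.0, 9.0, 3.0],
        "Length_scaffolds": [3_000_000, 2_000_000, 40_000],
        "Ambigious_bases": [0, 500, 0],
        "N50": [50_000, 20_000, 8_000],
        "N_scaffolds": [120, 400, 30],
    },
    index=["MAG1", "MAG2", "MAG3"],
)

criteria = (
    "(Completeness-5*Contamination >50 ) & (Length_scaffolds >=50000) "
    "& (Ambigious_bases <1e6) & (N50 > 5*1e3) & (N_scaffolds < 1e3)"
)

# MAG2 fails the completeness/contamination term (52 - 5*9 = 7),
# MAG3 fails the scaffold-length term (40,000 < 50,000).
filtered = genomes.query(criteria)
print(filtered.index.tolist())  # ['MAG1']
```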
- use dRep again in #579
  We saw better performance using dRep, and it now also scales to ~1K samples.
- use new DRAM version 1.4 in #564
Full Changelog: v2.12.0...v2.13.0
v2.12.0
What's Changed
- GTDB-Tk requires rule `extract_gtdb` to run first, by @Waschina in #551
- use Galah instead of dRep
- use bbsplit for mapping to genomes (may move to minimap in the future)
- faster gene catalog quantification using minimap.
- Compatible with snakemake v7.15
Full Changelog: v2.11.1...v2.12.0
Fix Enormous gene catalog
Due to a bug, the gene catalog in v2.11 was created from all genes, not only the representatives.
If you have an oversized gene catalog, rerun:

```
atlas run genecatalog -R generate_orf_info
```
Small change in the DRAM environment to fix #547
Use parquet and pyfastx to handle large gene catalogs
What's Changed
- Make atlas handle large gene catalogs using parquet and pyfastx (Fix #515)
Parquet files can be opened in Python with:

```python
import pandas as pd

coverage = pd.read_parquet("working_dir/Genecatalog/counts/median_coverage.parquet")
coverage.set_index("GeneNr", inplace=True)
```
and in R it should be something like:

```r
arrow::read_parquet("working_dir/Genecatalog/counts/median_coverage.parquet")
```
Full Changelog: v2.10.0...v2.11.0
GTDB v207, low-memory profiling
New Features
- GTDB version 207
- Low-memory taxonomic annotation
Minor changes
Full Changelog: v2.9.1...v2.10.0
Go Public
What's Changed
- ✨ Start an atlas project from public data in the SRA (Docs)
- Make atlas ready for Python 3.10 #498
- Add strain profiling using inStrain. You can run:

```
atlas run genomes strains
```
Full Changelog: v2.8.2...v2.9.0
V2.8 - Toiminnot
This is a major update of metagenome-atlas. It was developed for a 3-day course in Finland, which is also why it has a Finnish release name.
What is new?
New binners
It integrates the bleeding-edge binners Vamb and SemiBin, which use co-binning based on co-abundance. Thank you @yanhui09 and @psj1997 for helping with this. Initial results show better performance with these binners than with the default.
Pathway annotations
The command `atlas run genomes` produces genome-level functional annotations and KEGG pathways and their respective modules. It uses DRAM from @shafferm with a hack to produce all available KEGG modules.
Genecatalog
The command `atlas run genecatalog` now directly produces the abundance of the different genes. See more in #276.
In the future, this part of the pipeline will include protein assembly to better handle complicated metagenomes.
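As an illustration of working with such a gene abundance table (the numbers below are invented, not atlas output), per-sample relative abundances can be computed with pandas:

```python
import pandas as pd

# Hypothetical gene-by-sample median coverage table, shaped like the
# tables under Genecatalog/counts/ (values invented for illustration).
coverage = pd.DataFrame(
    {"sample1": [10.0, 30.0, 60.0], "sample2": [5.0, 0.0, 15.0]},
    index=pd.Index([1, 2, 3], name="GeneNr"),
)

# Divide each column by its total to get relative abundance per sample.
relab = coverage.div(coverage.sum(axis=0), axis=1)
print(relab["sample1"].tolist())  # [0.1, 0.3, 0.6]
```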
Minor updates
Reports are back
See for example the QC report
Update of all underlying tools
All tools used in atlas are now up to date, from the assemblers to GTDB. The one exception is BBmap, which contains a bug and ignores the minidentity parameter.
Atlas init
Atlas init now correctly parses fastq files even if they are in subfolders and if paired-end reads are named simply Sample_1/Sample_2. @Sofie8 will be happy about this.
Atlas log uses nice colors.
Default clustering of Subspecies
The default ANI threshold for genome dereplication was set to 97.5% to include more sub-species diversity.
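To illustrate what the threshold does, here is a minimal single-linkage clustering sketch over a made-up pairwise ANI table (this is not atlas code; atlas delegates dereplication to dRep/Galah, and the genome names and ANI values are invented):

```python
# Hypothetical pairwise ANI (%) between four genomes (invented values).
names = ["MAG_A", "MAG_B", "MAG_C", "MAG_D"]
ani = {
    ("MAG_A", "MAG_B"): 98.0,
    ("MAG_A", "MAG_C"): 96.0,
    ("MAG_A", "MAG_D"): 95.0,
    ("MAG_B", "MAG_C"): 96.5,
    ("MAG_B", "MAG_D"): 95.2,
    ("MAG_C", "MAG_D"): 99.1,
}

THRESHOLD = 97.5  # default dereplication ANI in this release

# Single-linkage clustering via union-find: two genomes end up in the
# same cluster whenever a chain of pairs above the threshold links them.
parent = {n: n for n in names}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

for (a, b), value in ani.items():
    if value >= THRESHOLD:
        parent[find(a)] = find(b)

clusters = {}
for n in names:
    clusters.setdefault(find(n), []).append(n)
print(sorted(clusters.values()))  # [['MAG_A', 'MAG_B'], ['MAG_C', 'MAG_D']]
```

With a stricter threshold (e.g. 99%), MAG_A and MAG_B would stay separate, which is the sub-species diversity the 97.5% default is meant to keep as distinct representatives.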
Python 3.8, new ruamel.yaml
Bug fixes for dRep
2.6a3 Set jobs to default