Merge pull request #16 from EBI-Metagenomics/feature/long_read_qc
Addition of long-reads pre-assembly qcs
mberacochea authored Nov 14, 2024
2 parents d63ce26 + aa18aa0 commit 36f9c8e
Showing 61 changed files with 3,202 additions and 239 deletions.
16 changes: 10 additions & 6 deletions README.md
@@ -35,20 +35,24 @@ Input/output options
--library_strategy [string] Force the library_strategy value for the study / reads (accepted: metagenomic, metatranscriptomic,
genomic, transcriptomic, other)
--library_layout [string] Force the library_layout value for the study / reads (accepted: single, paired)
--platform [string] Force the sequencing_platform value for the study / reads
--spades_version [string] null [default: 3.15.5]
--megahit_version [string] null [default: 1.2.9]
--reference_genome [string] The genome to be used to clean the assembly, the genome will be taken from the Microbiome Informatics
--flye_version [string] null [default: 2.9]
--reference_genome [string] The genome to be used to clean the assembly, the genome will be taken from the Microbiome Informatics
internal directory (accepted: chicken.fna, salmon.fna, cod.fna, pig.fna, cow.fna, mouse.fna,
honeybee.fna, rainbow_trout.fna, ...)
--blast_reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal
directory.
--bwamem2_reference_genomes_folder [string] The folder with the reference genome bwa-mem2 indexes, defaults to the Microbiome Informatics internal

--reference_genomes_folder [string] The folder with reference genomes, defaults to the Microbiome Informatics internal
directory.
--remove_human_phix [boolean] Remove human and phiX reads pre assembly, and contigs matching those genomes. [default: true]
--human_phix_blast_index_name [string] Combined Human and phiX BLAST db. [default: human_phix]
--human_phix_bwamem2_index_name [string] Combined Human and phiX bwa-mem2 index. [default: human_phix]
--min_contig_length [integer] Minimum contig length filter. [default: 500]
--min_contig_length_metatranscriptomics [integer] Minimum contig length filter for metaT. [default: 200]
--short_reads_min_contig_length [integer] Minimum contig length filter. [default: 500]
--short_reads_min_contig_length_metat [integer] Minimum contig length filter for metaT. [default: 200]
--assembly_memory [integer] Default memory allocated for the assembly process. [default: 100]
--spades_only_assembler [boolean] Run SPAdes/metaSPAdes without the error correction step. [default: true]
--outdir [string] The output directory where the results will be saved. You have to use absolute paths to storage on Cloud
@@ -210,15 +214,15 @@ Runs that fail QC checks are excluded from the assembly process. These runs are
Example:
```csv
SRR6180434,filter_ratio_threshold_exceeded
SRR6180434,short_reads_filter_ratio_threshold_exceeded
```
##### Runs exclusion messages
| Exclusion Message | Description |
| --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `filter_ratio_threshold_exceeded` | The maximum fraction of reads that are allowed to be filtered out. If exceeded, it flags excessive filtering. The default value is 0.9, meaning that if more than 90% of the reads are filtered out, the threshold is considered exceeded, and the run is not assembled. |
| `low_reads_count_threshold` | The minimum number of reads required after filtering. If below, it flags a low read count, and the run is not assembled. |
| `short_reads_filter_ratio_threshold_exceeded` | The maximum fraction of reads that are allowed to be filtered out. If exceeded, it flags excessive filtering. The default value is 0.9, meaning that if more than 90% of the reads are filtered out, the threshold is considered exceeded, and the run is not assembled. |
| `short_reads_low_reads_count_threshold` | The minimum number of reads required after filtering. If below, it flags a low read count, and the run is not assembled. |
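
The two exclusion rules in the table above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the pipeline's own code: the function name, its arguments, the `min_reads` value, and the order in which the two checks run are all hypothetical. Only the 0.9 default ratio and the two message strings come from the README.

```python
from typing import Optional


def exclusion_message(
    reads_before: int,
    reads_after: int,
    max_filter_ratio: float = 0.9,  # default stated in the README
    min_reads: int = 1000,          # hypothetical value, not given in the README
) -> Optional[str]:
    """Return the exclusion message for a run, or None if it passes both checks."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")

    # Fraction of the input reads that were removed by filtering.
    filtered_fraction = (reads_before - reads_after) / reads_before

    # Assumption: the ratio check is evaluated before the read-count check.
    if filtered_fraction > max_filter_ratio:
        return "short_reads_filter_ratio_threshold_exceeded"
    if reads_after < min_reads:
        return "short_reads_low_reads_count_threshold"
    return None
```

A run that returns a message here would be listed in the QC-failed CSV with that message and skipped by the assembler; a run that returns `None` would proceed to assembly.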
#### Assembled Runs
6 changes: 6 additions & 0 deletions assets/schema_input.json
@@ -47,6 +47,9 @@
"enum": ["metagenomic", "metatranscriptomic", "genomic", "transcriptomic", "other"],
"errorMessage": "library strategy should be only value from list: 'metagenomic', 'metatranscriptomic', 'genomic', 'transcriptomic', 'other'"
},
"platform": {
"type": "string"
},
"assembler": {
"type": "string",
"enum": ["spades", "metaspades", "megahit"],
@@ -57,6 +60,9 @@
"type": "integer",
"default": null,
"description": "Default memory (in GB) allocated for the assembly process for the run."
},
"assembler_config": {
"type": "string"
}
},
"required": ["study_accession", "reads_accession", "fastq_1", "library_layout", "library_strategy"]
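
To make the effect of the two new samplesheet properties concrete, here is a minimal Python sketch of the checks implied by the schema fragments shown in this diff. It is an illustration only: the pipeline presumably relies on a JSON Schema validator, and `validate_row`, the constant names, and the choice to treat empty strings as missing are assumptions. Only the field names, the `required` list, and the two enums are taken from the hunks above.

```python
from typing import Dict, List

# Taken from the "required" list in assets/schema_input.json.
REQUIRED_FIELDS = [
    "study_accession", "reads_accession", "fastq_1",
    "library_layout", "library_strategy",
]

# Enumerations visible in the diff hunks.
ALLOWED_VALUES = {
    "library_strategy": ("metagenomic", "metatranscriptomic", "genomic",
                         "transcriptomic", "other"),
    "assembler": ("spades", "metaspades", "megahit"),
}

# New in this commit: both are plain strings with no enum.
FREE_TEXT_FIELDS = ("platform", "assembler_config")


def validate_row(row: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the row is fine."""
    errors: List[str] = []
    for field in REQUIRED_FIELDS:
        # Assumption: an empty cell counts as missing.
        if row.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    for field, allowed in ALLOWED_VALUES.items():
        value = row.get(field)
        if value not in (None, "") and value not in allowed:
            errors.append(f"{field} must be one of: {', '.join(sorted(allowed))}")
    for field in FREE_TEXT_FIELDS:
        value = row.get(field)
        if value is not None and not isinstance(value, str):
            errors.append(f"{field} must be a string")
    return errors
```

Because `platform` and `assembler_config` carry no enum, any string passes this check; in the hunks shown, the `assembler` enum still lists only the three short-read assemblers.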
1 change: 1 addition & 0 deletions conf/codon_slurm.config
@@ -1,4 +1,5 @@
params {
reference_genomes_folder = "/hps/nobackup/rdf/metagenomics/service-team/ref-dbs/bwa-mem2/"
bwamem2_reference_genomes_folder = "/hps/nobackup/rdf/metagenomics/service-team/ref-dbs/bwa-mem2/"
blast_reference_genomes_folder = "/nfs/production/rdf/metagenomics/pipelines/prod/assembly-pipeline/blast_dbs/"
human_phix_blast_index_name = "human_phix"
55 changes: 53 additions & 2 deletions conf/modules.config
@@ -20,7 +20,7 @@ process {
ext.args = params.private_study ? "--private" : ""
}

withName: 'FASTP' {
withName: 'FASTP*' {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 36.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
@@ -50,6 +50,16 @@ process {
]
}

withName: 'FASTP_LR' {
ext.args = [
'--average_qual',
'10',
'--length_required',
"${params.long_reads_min_read_length}",
'--disable_adapter_trimming'
].join(' ').trim()
}

withName: 'FASTQC' {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 36.GB * task.attempt, 'memory' ) }
@@ -89,13 +99,54 @@ process {
ext.prefix = "decontaminated"
}

withName: 'HUMAN_PHIX_DECONTAMINATION' {
withName: 'HUMAN*_DECONTAMINATION' {
memory = { check_max( 64.GB * task.attempt, 'memory' ) }
}

withName: 'HOST_DECONTAMINATION' {
memory = { check_max( 24.GB * task.attempt, 'memory' ) }
}

withName: 'CANU*' {
cpus = { check_max( 4 , 'cpus' ) }
memory = { check_max( 3.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }

ext.args = [
'-trim',
'-corrected',
'corMinCoverage=0',
'stopOnLowCoverage=0',
'minInputCoverage=0',
'maxInputCoverage=10000',
'corOutCoverage=all',
'corMhapSensitivity=high',
'corMaxEvidenceCoverageLocal=10',
'corMaxEvidenceCoverageGlobal=10',
'oeaMemory=10',
'redMemory=10',
'batMemory=10',
].join(' ').trim()
}

withName: 'CANU_ONT' {
ext.args2 = [
'correctedErrorRate=0.16',
].join(' ').trim()
}

withName: 'CANU_PACBIO' {
ext.args2 = [
'correctedErrorRate=0.105',
].join(' ').trim()
}

withName: 'PORECHOP_ONT' {
cpus = { check_max( 1 , 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
}

/* --------- */

/* Assembly */
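
Most of the resource directives in this file follow the pattern `check_max( base * task.attempt, ... )`. The sketch below shows the arithmetic that pattern implies, assuming `check_max` behaves like the nf-core template helper of the same name and simply caps the request at the configured maximum; its definition is not part of this diff. The Python names `cap_resource` and `memory_for_attempt`, and the 128 GB limit in the usage note, are hypothetical.

```python
def cap_resource(requested: float, limit: float) -> float:
    """Cap a requested resource at the configured maximum (assumed check_max behaviour)."""
    return min(requested, limit)


def memory_for_attempt(base_gb: float, attempt: int, max_memory_gb: float) -> float:
    """Memory requested on a given attempt: the base grows linearly with each retry."""
    return cap_resource(base_gb * attempt, max_memory_gb)
```

Under these assumptions, the 36 GB base used for the `FASTP*` selector would be capped at 6 GB in the test profile (`max_memory = '6.GB'`), while with a hypothetical 128 GB limit a second attempt would request 72 GB. The `CANU*` and `PORECHOP_ONT` blocks fix `cpus` at a constant, so only their memory and time grow on retry.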
15 changes: 11 additions & 4 deletions conf/test.config
@@ -18,10 +18,17 @@ profiles {
max_memory = '6.GB'
max_time = '6.h'

bwamem2_reference_genomes_folder = "tests/human_phix/bwa2mem"
blast_reference_genomes_folder = "tests/human_phix/blast"
human_phix_blast_index_name = "human_phix"
human_phix_bwamem2_index_name = "human_phix"
bwamem2_reference_genomes_folder = "${projectDir}/tests/human_phix/bwa2mem"
blast_reference_genomes_folder = "${projectDir}/tests/human_phix/blast"
reference_genomes_folder = "${projectDir}/tests/human/"

max_spades_retries = -1
max_megahit_retries = -1
}

process {
errorStrategy = 'ignore'
maxRetries = 0
}
}
}
32 changes: 31 additions & 1 deletion modules.json
@@ -32,14 +32,19 @@
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"canu": {
"branch": "master",
"git_sha": "8fc1d24c710ebe1d5de0f2447ec9439fd3d9d66a",
"installed_by": ["modules"]
},
"custom/dumpsoftwareversions": {
"branch": "master",
"git_sha": "82024cf6325d2ee194e7f056d841ecad2f6856e9",
"installed_by": ["modules"]
},
"fastp": {
"branch": "master",
"git_sha": "95cf5fe0194c7bf5cb0e3027a2eb7e7c89385080",
"git_sha": "1ceaa8ba4d0fd886dbca0e545815d905b7407de7",
"installed_by": ["modules"],
"patch": "modules/nf-core/fastp/fastp.diff"
},
@@ -49,6 +54,16 @@
"installed_by": ["modules"],
"patch": "modules/nf-core/fastqc/fastqc.diff"
},
"flye": {
"branch": "master",
"git_sha": "8fc1d24c710ebe1d5de0f2447ec9439fd3d9d66a",
"installed_by": ["modules"]
},
"medaka": {
"branch": "master",
"git_sha": "8fc1d24c710ebe1d5de0f2447ec9439fd3d9d66a",
"installed_by": ["modules"]
},
"megahit": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
@@ -60,17 +75,32 @@
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"minimap2/align": {
"branch": "master",
"git_sha": "a33ef9475558c6b8da08c5f522ddaca1ec810306",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
"git_sha": "314d742bdb357a1df5f9b88427b3b6ac78aa33f7",
"installed_by": ["modules"]
},
"porechop/abi": {
"branch": "master",
"git_sha": "870f9af2eaf0000c94d74910d762cf153752af98",
"installed_by": ["modules"]
},
"quast": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"],
"patch": "modules/nf-core/quast/quast.diff"
},
"racon": {
"branch": "master",
"git_sha": "8fc1d24c710ebe1d5de0f2447ec9439fd3d9d66a",
"installed_by": ["modules"]
},
"samtools/idxstats": {
"branch": "master",
"git_sha": "a64788f5ad388f1d2ac5bd5f1f3f8fc81476148c",
19 changes: 14 additions & 5 deletions modules/local/fetchtool_reads.nf
@@ -3,17 +3,17 @@ process FETCHTOOL_READS {

label 'process_single'

container "quay.io/microbiome-informatics/fetch-tool:v1.0.0rc"
container "quay.io/microbiome-informatics/fetch-tool:v1.0.2"

input:
tuple val(meta), val(study_accession), val(reads_accession)
path fetchtool_config

output:
tuple val(meta), path("download_folder/${study_accession}/raw/${reads_accession}*.fastq.gz"), env(library_strategy), env(library_layout), emit: reads
tuple val(meta), path("download_folder/${study_accession}/raw/${reads_accession}*.fastq.gz"), env(library_strategy), env(library_layout), env(platform), emit: reads
// The '_mqc.' is for multiQC
tuple val(meta), path("download_folder/${study_accession}/${study_accession}.txt") , emit: metadata_tsv
path "versions.yml" , emit: versions
tuple val(meta), path("download_folder/${study_accession}/${study_accession}.txt") , emit: metadata_tsv
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when
@@ -32,6 +32,15 @@ process FETCHTOOL_READS {
library_strategy=\$(echo "\$(grep ${reads_accession} download_folder/${study_accession}/${study_accession}.txt | cut -f 7)" | tr '[:upper:]' '[:lower:]')
library_layout=\$(echo "\$(grep ${reads_accession} download_folder/${study_accession}/${study_accession}.txt | cut -f 5)" | tr '[:upper:]' '[:lower:]')
export metadata_platform=\$(echo "\$(grep ${reads_accession} download_folder/${study_accession}/${study_accession}.txt | cut -f 8)" | tr '[:upper:]' '[:lower:]')
if [[ \$metadata_platform == "minion" || \$metadata_platform == "promethion" || \$metadata_platform == "gridion" ]]; then
platform="ont"
elif [[ \$metadata_platform == "pacbio rs" || \$metadata_platform == "pacbio rs ii" ]]; then
platform="pacbio"
else
platform="\$metadata_platform"
fi
cat <<-END_VERSIONS > versions.yml
"${task.process}":
fetch-tool: \$(fetch-read-tool --version)
@@ -53,4 +62,4 @@ process FETCHTOOL_READS {
fetch-tool: \$(fetch-read-tool --version)
END_VERSIONS
"""
}
}
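
The platform normalisation added to the `FETCHTOOL_READS` script above can be restated in Python for readability. This is a sketch that mirrors the bash `if`/`elif`/`else` branches shown in the diff; the function and constant names are hypothetical, and the instrument strings are copied from the script.

```python
# Instrument names as compared (after lowercasing) in the FETCHTOOL_READS script.
ONT_INSTRUMENTS = {"minion", "promethion", "gridion"}
PACBIO_INSTRUMENTS = {"pacbio rs", "pacbio rs ii"}


def normalise_platform(metadata_platform: str) -> str:
    """Map the instrument value from the study metadata to a platform label."""
    value = metadata_platform.lower()  # the script lowercases with tr
    if value in ONT_INSTRUMENTS:
        return "ont"
    if value in PACBIO_INSTRUMENTS:
        return "pacbio"
    # Anything else is passed through unchanged (lowercased).
    return value
```

Any instrument name outside the two sets, including other long-read instruments, falls through to the last branch and keeps its lowercased metadata value.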
6 changes: 6 additions & 0 deletions modules/nf-core/canu/environment.yml


50 changes: 50 additions & 0 deletions modules/nf-core/canu/main.nf

