This pipeline is based on the original YAMP repo. Modifications have been made to make use of our infrastructure more readily. If you're here for a more customizable and flexible pipeline, please consider taking a look at the original repo.
aws batch submit-job \
--job-name nf-readprofiler_20240709_DS-mNGS \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command="FischbachLab/nf-reads-profiler",\
"--project","20240709_DS-mNGS",\
"--singleEnd","false",\
"--seedfile","s3://genomics-workflow-core/Results/reads-profiler/seedfiles/20240709_DS-mNGS.seedfile.csv",\
"--outdir","s3://genomics-workflow-core/Results/reads-profiler"
- The seedfile should be a three-column CSV file with the following headers.
sampleName,R1,R2
20240614_DS037_D01_R1.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS037_D01_R1.fastq.gz_R1_001.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS037_D01_R1.fastq.gz_R2_001.fastq.gz
20240614_DS038_E01_R1.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS038_E01_R1.fastq.gz_R1_001.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS038_E01_R1.fastq.gz_R2_001.fastq.gz
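A quick way to catch a malformed seedfile before submitting is to check the header and the per-row column count locally. The sketch below is a hypothetical helper, not part of the pipeline; the `check_seedfile` name and the throwaway demo paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check (not part of the pipeline): verify the
# seedfile header and column count before calling `aws batch submit-job`.
set -euo pipefail

check_seedfile() {
  local seedfile=$1
  # Header row must be exactly: sampleName,R1,R2
  head -n1 "$seedfile" | grep -qx 'sampleName,R1,R2' \
    || { echo "bad header in $seedfile" >&2; return 1; }
  # Every data row must have exactly three comma-separated fields
  awk -F, 'NR > 1 && NF != 3 { bad = 1 } END { exit bad }' "$seedfile" \
    || { echo "wrong column count in $seedfile" >&2; return 1; }
  echo "seedfile OK"
}

# Demo on a throwaway single-sample seedfile
tmp=$(mktemp)
printf 'sampleName,R1,R2\ns1,s3://bucket/s1_R1.fastq.gz,s3://bucket/s1_R2.fastq.gz\n' > "$tmp"
check_seedfile "$tmp"   # prints: seedfile OK
rm -f "$tmp"
```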
aws batch submit-job \
--profile maf \
--job-name nf-rp-1101-2 \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=fischbachlab/nf-reads-profiler,\
"--project","TEST",\
"--prefix","branch_metaphlan4",\
"--singleEnd","false",\
"--reads1","s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R1.fastq.gz",\
"--reads2","s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R2.fastq.gz"
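When scripting many single-sample submissions like the one above, the `--reads2` path can usually be derived from the `--reads1` path rather than typed twice. The helper below is a hypothetical sketch; it assumes the mates differ only by an `_R1` / `_R2` tag, as in the example filenames.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): derive the R2 path from an
# R1 path, assuming mates differ only by the _R1/_R2 tag in the filename.
set -euo pipefail

mate_of() {
  # Replace the first _R1 occurrence with _R2 (bash parameter expansion)
  printf '%s\n' "${1/_R1/_R2}"
}

mate_of "s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R1.fastq.gz"
# → s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R2.fastq.gz
```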
- The final output is a single tab-delimited table, merged from the set of sample-specific abundance profiles (sample names, feature taxonomies, and relative abundances), written to the project folder, e.g., s3://genomics-workflow-core/Results/reads-profiler/20240709_DS-mNGS/merged_metaphlan_results/
20240614_LKV_AK12_DC_240604_C05 20240614_LKV_AK22_DC_240604_C06
UNCLASSIFIED 1.38723 5.63586
Bacteroides_fragilis 74.71778448588894 0.0
Odoribacter_splanchnicus 7.2665189180272725 2.12202312969113
Bacteroides_xylanisolvens 3.4467235358389776 0.09979951893316047
Bacteroides_thetaiotaomicron 1.8451929194769707 0.0
Ligilactobacillus_salivarius 1.4416497259818657 0.0
Alistipes_onderdonkii 1.38628851469499 0.0
Prevotella_bivia 1.2946871089871548 0.0
Alistipes_communis 1.2718188067117246 5.0323360110009645
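Because the merged table is plain tab-delimited text with the taxon name in the first column, simple `awk` one-liners cover most ad-hoc queries. The sketch below is a hypothetical post-processing example (the `extract_taxon` name is made up); it is demonstrated on a tiny inline copy of the table, but in practice you would point it at the file copied down from `merged_metaphlan_results/`.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing sketch: print the header row plus one taxon's
# row from a merged, tab-delimited MetaPhlAn table.
set -euo pipefail

extract_taxon() {  # usage: extract_taxon TAXON TABLE
  awk -F'\t' -v t="$1" 'NR == 1 || $1 == t' "$2"
}

# Demo on a tiny inline copy of the merged table
tmp=$(mktemp)
printf '\tsampleA\tsampleB\nUNCLASSIFIED\t1.38723\t5.63586\nBacteroides_fragilis\t74.71778\t0.0\n' > "$tmp"
extract_taxon Bacteroides_fragilis "$tmp"   # prints header row + B. fragilis row
rm -f "$tmp"
```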
Although the databases are stored at the /mnt/efs/databases location specified in the config file, there might come a time when they need to be updated. Here is a quick overview of how to do that.
cd /mnt/efs/databases/Biobakery/Metaphlan/v4.0
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
metaphlan \
--install \
--nproc 4 \
--bowtie2db .
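After the install finishes, it is worth confirming that the bowtie2db directory actually holds both the pickled taxonomy (`.pkl`) and the bowtie2 index files before pointing the pipeline at it. The check below is a hypothetical sketch (the `check_metaphlan_db` name is made up), not a MetaPhlAn command.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not a MetaPhlAn command): verify that a
# bowtie2db directory contains a .pkl taxonomy file and bowtie2 index files.
set -euo pipefail

check_metaphlan_db() {
  local dir=$1
  ls "$dir"/*.pkl  >/dev/null 2>&1 || { echo "no .pkl taxonomy file in $dir" >&2; return 1; }
  ls "$dir"/*.bt2* >/dev/null 2>&1 || { echo "no bowtie2 index files in $dir" >&2; return 1; }
  echo "metaphlan db looks complete"
}

# e.g. check_metaphlan_db /mnt/efs/databases/Biobakery/Metaphlan/v4.0
```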
HUMAnN requires three databases: chocophlan, uniref, and utility_mapping. Each is downloaded with its own humann_databases command, shown below.
cd /mnt/efs/databases/Biobakery/Humann/v3.6
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
humann_databases \
--download \
chocophlan full .
This will create a subdirectory chocophlan, and download and extract the database there.
cd /mnt/efs/databases/Biobakery/Humann/v3.6
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
humann_databases \
--download \
uniref uniref90_diamond .
This will create a subdirectory uniref, and download and extract the database there.
cd /mnt/efs/databases/Biobakery/Humann/v3.6
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
humann_databases \
--download \
utility_mapping full .
This will create a subdirectory utility_mapping, and download and extract the database there.
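Once all three downloads complete, the HUMAnN database root should contain one subdirectory per database. The check below is a hypothetical sketch (the `check_humann_dbs` name is made up), not a HUMAnN command.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not a HUMAnN command): confirm that the database
# root contains the three expected subdirectories after the downloads above.
set -euo pipefail

check_humann_dbs() {
  local root=$1 missing=0
  for db in chocophlan uniref utility_mapping; do
    [ -d "$root/$db" ] || { echo "missing: $db" >&2; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all HUMAnN databases present"
}

# e.g. check_humann_dbs /mnt/efs/databases/Biobakery/Humann/v3.6
```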