This pipeline is based on the original YAMP repo. Modifications have been made to make use of our infrastructure more readily. If you're here for a more customizable and flexible pipeline, please consider taking a look at the original repo.
aws batch submit-job \
--job-name nf-readprofiler_20240709_DS-mNGS \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command="FischbachLab/nf-reads-profiler",\
"--project","20240709_DS-mNGS",\
"--singleEnd","false",\
"--seedfile","s3://genomics-workflow-core/Results/reads-profiler/seedfiles/20240709_DS-mNGS.seedfile.csv",\
"--outdir","s3://genomics-workflow-core/Results/reads-profiler"
- The seedfile should be a three-column CSV file with the following headers.
sampleName,R1,R2
20240614_DS037_D01_R1.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS037_D01_R1.fastq.gz_R1_001.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS037_D01_R1.fastq.gz_R2_001.fastq.gz
20240614_DS038_E01_R1.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS038_E01_R1.fastq.gz_R1_001.fastq.gz,s3://genomics-workflow-core/Results/Basespace/NextSeq/20240709_DS-mNGS_HKT5LBGXW/20240614_DS038_E01_R1.fastq.gz_R2_001.fastq.gz
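A quick way to catch a malformed seedfile before submitting is to check the header and the per-row column count locally. The sketch below is a hypothetical helper, not part of the pipeline; the `check_seedfile` name and the throwaway demo paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check (not part of the pipeline): verify the
# seedfile header and column count before calling `aws batch submit-job`.
set -euo pipefail

check_seedfile() {
  local seedfile=$1
  # Header row must be exactly: sampleName,R1,R2
  head -n1 "$seedfile" | grep -qx 'sampleName,R1,R2' \
    || { echo "bad header in $seedfile" >&2; return 1; }
  # Every data row must have exactly three comma-separated fields
  awk -F, 'NR > 1 && NF != 3 { bad = 1 } END { exit bad }' "$seedfile" \
    || { echo "wrong column count in $seedfile" >&2; return 1; }
  echo "seedfile OK"
}

# Demo on a throwaway single-sample seedfile
tmp=$(mktemp)
printf 'sampleName,R1,R2\ns1,s3://bucket/s1_R1.fastq.gz,s3://bucket/s1_R2.fastq.gz\n' > "$tmp"
check_seedfile "$tmp"   # prints: seedfile OK
rm -f "$tmp"
```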
aws batch submit-job \
--profile maf \
--job-name nf-rp-1101-2 \
--job-queue priority-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=fischbachlab/nf-reads-profiler,\
"--project","TEST",\
"--prefix","branch_metaphlan4",\
"--singleEnd","false",\
"--reads1","s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R1.fastq.gz",\
"--reads2","s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R2.fastq.gz"
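When scripting many single-sample submissions like the one above, the `--reads2` path can usually be derived from the `--reads1` path rather than typed twice. The helper below is a hypothetical sketch; it assumes the mates differ only by an `_R1` / `_R2` tag, as in the example filenames.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): derive the R2 path from an
# R1 path, assuming mates differ only by the _R1/_R2 tag in the filename.
set -euo pipefail

mate_of() {
  # Replace the first _R1 occurrence with _R2 (bash parameter expansion)
  printf '%s\n' "${1/_R1/_R2}"
}

mate_of "s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R1.fastq.gz"
# → s3://dev-scratch/fastq/small/random_ncbi_reads_with_duplicated_and_contaminants_R2.fastq.gz
```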
- The final output is a single tab-delimited table, merged from the set of sample-specific abundance profiles (sample names, feature taxonomies, and relative abundances), written to the project folder, e.g., s3://genomics-workflow-core/Results/reads-profiler/20240709_DS-mNGS/merged_metaphlan_results/
20240614_LKV_AK12_DC_240604_C05 20240614_LKV_AK22_DC_240604_C06
UNCLASSIFIED 1.38723 5.63586
Bacteroides_fragilis 74.71778448588894 0.0
Odoribacter_splanchnicus 7.2665189180272725 2.12202312969113
Bacteroides_xylanisolvens 3.4467235358389776 0.09979951893316047
Bacteroides_thetaiotaomicron 1.8451929194769707 0.0
Ligilactobacillus_salivarius 1.4416497259818657 0.0
Alistipes_onderdonkii 1.38628851469499 0.0
Prevotella_bivia 1.2946871089871548 0.0
Alistipes_communis 1.2718188067117246 5.0323360110009645
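Because the merged table is plain tab-delimited text with the taxon name in the first column, simple `awk` one-liners cover most ad-hoc queries. The sketch below is a hypothetical post-processing example (the `extract_taxon` name is made up); it is demonstrated on a tiny inline copy of the table, but in practice you would point it at the file copied down from `merged_metaphlan_results/`.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing sketch: print the header row plus one taxon's
# row from a merged, tab-delimited MetaPhlAn table.
set -euo pipefail

extract_taxon() {  # usage: extract_taxon TAXON TABLE
  awk -F'\t' -v t="$1" 'NR == 1 || $1 == t' "$2"
}

# Demo on a tiny inline copy of the merged table
tmp=$(mktemp)
printf '\tsampleA\tsampleB\nUNCLASSIFIED\t1.38723\t5.63586\nBacteroides_fragilis\t74.71778\t0.0\n' > "$tmp"
extract_taxon Bacteroides_fragilis "$tmp"   # prints header row + B. fragilis row
rm -f "$tmp"
```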
Although the databases are stored at the /mnt/efs/databases location specified in the config file, there might come a time when they need to be updated. Here is a quick overview of how to do that.
cd /mnt/efs/databases/Biobakery/Metaphlan/v4.0
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
metaphlan \
--install \
--nproc 4 \
--bowtie2db .
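After the install finishes, it is worth confirming that the bowtie2db directory actually holds both the pickled taxonomy (`.pkl`) and the bowtie2 index files before pointing the pipeline at it. The check below is a hypothetical sketch (the `check_metaphlan_db` name is made up), not a MetaPhlAn command.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not a MetaPhlAn command): verify that a
# bowtie2db directory contains a .pkl taxonomy file and bowtie2 index files.
set -euo pipefail

check_metaphlan_db() {
  local dir=$1
  ls "$dir"/*.pkl  >/dev/null 2>&1 || { echo "no .pkl taxonomy file in $dir" >&2; return 1; }
  ls "$dir"/*.bt2* >/dev/null 2>&1 || { echo "no bowtie2 index files in $dir" >&2; return 1; }
  echo "metaphlan db looks complete"
}

# e.g. check_metaphlan_db /mnt/efs/databases/Biobakery/Metaphlan/v4.0
```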
HUMAnN requires three databases: chocophlan, uniref, and utility_mapping. Each is downloaded with its own humann_databases command, shown below.
cd /mnt/efs/databases/Biobakery/Humann/v3.6
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
humann_databases \
--download \
chocophlan full .
This will create a subdirectory chocophlan, and download and extract the database there.
cd /mnt/efs/databases/Biobakery/Humann/v3.6
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
humann_databases \
--download \
uniref uniref90_diamond .
This will create a subdirectory uniref, and download and extract the database there.
cd /mnt/efs/databases/Biobakery/Humann/v3.6
docker container run \
--volume $PWD:$PWD \
--workdir $PWD \
--rm \
458432034220.dkr.ecr.us-west-2.amazonaws.com/biobakery/workflows:maf-20221028-a1 \
humann_databases \
--download \
utility_mapping full .
This will create a subdirectory utility_mapping, and download and extract the database there.
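Once all three downloads complete, the HUMAnN database root should contain one subdirectory per database. The check below is a hypothetical sketch (the `check_humann_dbs` name is made up), not a HUMAnN command.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not a HUMAnN command): confirm that the database
# root contains the three expected subdirectories after the downloads above.
set -euo pipefail

check_humann_dbs() {
  local root=$1 missing=0
  for db in chocophlan uniref utility_mapping; do
    [ -d "$root/$db" ] || { echo "missing: $db" >&2; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all HUMAnN databases present"
}

# e.g. check_humann_dbs /mnt/efs/databases/Biobakery/Humann/v3.6
```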