Skip to content

Latest commit

 

History

History
111 lines (73 loc) · 3.29 KB

README.md

File metadata and controls

111 lines (73 loc) · 3.29 KB

Single-cell Nano CUT&Tag InDrop pipeline

This data engineering pipeline is designed to treat single-cell CUT&Tag InDrop with protocol design from the Institut Curie SingleCellOmics Platform and Vallot Lab. The dataset starts from raw BCL sequencing files to exploitable output such as count matrices, coverage file and report.

Simplifed scheme of the pipeline

The Pipeline

  • 0. Creation of config file
  • 1. Cell barcode mapping
  • 2. Trimming
  • 3. Genomic mapping
  • 4. Assignation of cell barcodes to mapped read
  • 5. Removal of Reverse Transcription (RT) & Polymerase Chain Reaction (PCR) duplicates
  • 6. Removal of reads based on window screening (if Read2 was unmapped)
  • 7. Counting (Generation of count matrix)
  • 8. Generation of coverage file (bigwig)
  • 9. Reporting
  • 10. Downstream R automatic analysis

Running the Pipeline

usage : schip_processing All -f FORWARD -r REVERSE -o OUTPUT -c CONFIG [-d] [-h] [-v]


[Sub-Commands] 

	All		 Execute the entire pipeline based on CONFIG file
	
	GetConf		 [PreRun] Complete a configuration template based on the genome assembly and the design type
	
	--version : print version



---------------

All

   -f|--forward R1_READ: forward fastq file
   -r|--reverse R2_READ: forward fastq file
   -c|--conf CONFIG: configuration file for ChIP processing
   -o|--output OUTPUT: output folder
   -n|--name NAME: name given to samples
   -s|--downstreamOutput R analysis downstream output: if present, will run downstream analysis in given dir
   -u|--override : Override defined arguments (semicolon-separated (;)) from config file (i.e: 'MIN_MAPQ=0;MIN_BAPQ=10') [optional]
   [-d|--dryrun]: dry run mode
   [-h|--help]: help
   [-v|--version]: version

   
   GetConf
   
   	-T/--template : Pipeline config template
	-C/--configFile : Config description file
	-D/--designType : Design type
	-G/--genomeAssembly : Genome assembly
	-O/--outputConfig : Output config file
	-O/--mark : Histone mark : either 'h3k27me3', 'h3k4me3' or 'unbound'. 
	-B/--targetBed : Target BED file

Creating the configuration file from the template :

Depending on your Bead type (Hifibio or LBC), your genomeAssembly (hg38, mm10), your bed target file.

Example :

cd ~/GitLab/ChIP-seq_single-cell_LBC_PAIRED_END_3.4/
ASSEMBLY=hg38
OUTPUT_CONFIG=/data/tmp/gjouault/results/CONFIG_LBC
MARK=h3k27me3
TARGET_BED=/data/users/gjouault/Annotation/bed/hg38.G5k.bed

./schip_processing.sh GetConf --template  CONFIG_TEMPLATE --configFile species_design_configs.csv --designType LBC --genomeAssembly ${ASSEMBLY} --outputConfig ${OUTPUT_CONFIG} --mark ${MARK} --targetBed ${TARGET_BED}

Running the pipeline on a single sample :

OUTPUT_DIR=/data/tmp/gjouault/results/test
DOWNSTREAM_DIR=/data/tmp/gjouault/results/
NAME=test
READ1=/data/users/gjouault/tests/A1082C1.R1.fastq.gz
READ2=/data/users/gjouault/tests/A1082C1.R2.fastq.gz

./schip_processing.sh All -f ${READ1} -r ${READ2} -c ${OUTPUT_CONFIG}  -o ${OUTPUT_DIR} --name ${NAME} -s ${DOWNSTREAM_DIR}

Authors

Authors - Gregoire Jouault ([email protected]), Pacôme Prompsy, Nicolas Servant date - 04 september 2023