Pipelines for ChIP-seq analysis, such as peak calling, differential enrichment detection, and pausing index calculation for PolII.
Here are some scripts I use for ChIP-seq analysis once the ChIP-seq data have been preprocessed. The PROJECT and DATA folders are therefore the same as in the ChIP-seq preprocess pipeline.
The pipelines currently include:
- peak calling:
  - MACS2
  - HOMER
- differential enrichment detection:
  - diffReps
- pausing index of PolII ChIP-seq.
Required software and packages:
- bedtools.
- MACS2.
- HOMER. Don't forget to install the genome annotation needed for the analysis.
- diffReps.
- region_analysis.
- samtools.
- ggplot2. An R graphics package.
- ChIPpeakAnno. A Bioconductor package used for annotation and GO analysis.
Install the software and packages listed above, and make sure the executables are in $PATH.
Put all scripts from the bin folders somewhere in $PATH, or add these folders to $PATH.
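As a quick sanity check before running anything, the following sketch adds a bin folder to $PATH and reports which tools cannot be found. The folder location and the executable names (for example `findPeaks` for HOMER and `diffReps.pl` for diffReps) are assumptions for illustration; adjust them to your installation.

```shell
# Print every command from the argument list that is not found in $PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Hypothetical location of a pipeline's bin folder; change it to your checkout.
PIPELINE_BIN="$HOME/chipseq_pipeline/bin"
export PATH="$PIPELINE_BIN:$PATH"

# Executable names below are assumptions about how each tool is installed.
missing_tools bedtools macs2 findPeaks diffReps.pl samtools Rscript \
    || echo "Install the tools listed above or add them to \$PATH" >&2
```

The R packages (ggplot2, ChIPpeakAnno) are not covered by this check, since they are loaded inside R rather than called from the shell.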
Generally, all these pipelines can be run in this way:
nohup ./A_do.sh &
All parameters and options used in a project can be edited in A_do.sh to fit your needs before running.
The location of the files in the project_script folder doesn't matter at all, but I prefer to put them under the project/script/pipeline_name folder.
For the organization of projects, I generally follow this paper: A Quick Guide to Organizing Computational Biology Projects. So $DATA is the folder that contains the *.bam alignment files, while the $RESULT folder holds the results.
Important:
- To make comparisons between two conditions work, please name the bam files in this way:
Say there are two conditions, A and B, each with 2 replicates and one DNA input per condition.
Name the files A_rep1.bam, A_rep2.bam, A_input.bam, B_rep1.bam,
B_rep2.bam, and B_input.bam. The key point is that samples from the same condition
share common letters in their names (here A or B), and input samples contain the string "input" or "Input".
If you used the preprocess pipeline in this way, then you just need to edit
the configurations in A_do.sh.
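To illustrate the naming convention, here is a minimal sketch of how file names could be split into ChIP replicates and input controls. The helper functions `chip_bams` and `input_bams` are hypothetical and only show the idea; the matching done inside the actual pipeline scripts may differ.

```shell
# Print the input (control) BAM files among the arguments:
# those whose file name contains "input" or "Input".
input_bams() {
    local f
    for f in "$@"; do
        case "$(basename "$f")" in
            *input*|*Input*) echo "$f" ;;
        esac
    done
}

# Print the remaining (ChIP replicate) BAM files among the arguments.
chip_bams() {
    local f
    for f in "$@"; do
        case "$(basename "$f")" in
            *input*|*Input*) ;;
            *) echo "$f" ;;
        esac
    done
}

# Example for condition A, using the file names from the text above.
chip_bams A_rep1.bam A_rep2.bam A_input.bam    # prints A_rep1.bam and A_rep2.bam
input_bams A_rep1.bam A_rep2.bam A_input.bam   # prints A_input.bam
```

With real data, the arguments would typically be a glob such as `"$DATA"/A_*.bam`.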
There are two peak calling methods used in this pipeline: MACS2 and HOMER.
Pay attention to $STYLE, and don't forget to set it to the right value.
If the estimation of the shiftsize fails, you can use the estimate from the PhantomPeak step of preprocessing instead.
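One way to apply a fixed estimate is to tell MACS2 to skip its own model building. The sketch below only builds and prints such a command line using MACS2's `--nomodel` and `--extsize` options; the helper name `macs2_fixed_cmd`, the file names, and the value 180 are hypothetical. It is also an assumption that the value is supplied as the full fragment length: `--extsize` expects the fragment length, whereas a "shiftsize" in older MACS versions was half of it, so check which one your estimate represents. How A_do.sh actually passes this value is not shown here.

```shell
# Build (but do not run) a MACS2 command that skips model building and
# extends reads to a fixed fragment length.
# Arguments: treatment BAM, input BAM, output name, fragment length,
#            genome size (optional, defaults to the MACS2 shortcut "mm").
macs2_fixed_cmd() {
    local treat="$1" input="$2" name="$3" fraglen="$4" gsize="${5:-mm}"
    echo "macs2 callpeak -t $treat -c $input -f BAM -g $gsize -n $name --nomodel --extsize $fraglen"
}

# Hypothetical example: 180 stands in for the PhantomPeak fragment length estimate.
macs2_fixed_cmd A_rep1.bam A_input.bam A_rep1 180
```

Printing the command first makes it easy to inspect before running it, for example by pasting it into the shell.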
Parameters can be adjusted to set the cutoff used in the calculation. The intermediate files are kept; you may remove them once processing has finished.
This is an experimental pipeline, and the annotation of the mouse genome is "borrowed" from our ngs.plot project. Users may use the genome they need, and getBEDAnoo.R can be used to generate the bed files.
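For readers unfamiliar with the pausing index, the sketch below shows one common way to compute it: read density in a promoter-proximal window divided by read density over the gene body. The input table layout, the helper name `pausing_index`, and the minimum-read cutoff are assumptions for illustration; the regions and cutoffs used by this pipeline are set in its own scripts and may differ.

```shell
# Compute a pausing index per gene from a tab-separated table on stdin:
#   gene, promoter_reads, promoter_length_bp, body_reads, body_length_bp
# Genes with fewer than min_body reads in the gene body are skipped
# (hypothetical cutoff, default 1), which also avoids division by zero.
pausing_index() {
    local min_body="${1:-1}"
    awk -F '\t' -v min_body="$min_body" '
        $4 >= min_body && $4 > 0 && $3 > 0 && $5 > 0 {
            promoter_density = $2 / $3   # reads per bp near the promoter
            body_density = $4 / $5       # reads per bp over the gene body
            printf "%s\t%.2f\n", $1, promoter_density / body_density
        }'
}

# Toy example with made-up counts; GeneB is skipped because its gene body has no reads.
printf 'GeneA\t300\t300\t500\t5000\nGeneB\t10\t300\t0\t8000\n' | pausing_index 1
```

The read counts per region would come from an earlier counting step over the bed files, which this sketch does not cover.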