Processing chromatin associated RNA-DNA interactions followed by paired-end-tag sequencing (ChRD-PET) data
This pipeline consists of three scripts.
- run.sh:
- Splitting data into DNA and RNA data.
- DNA data alignment to the genome by bwa. RNA data alignment to the genome by hisat2 and bowtie2.
- Extracting the uniq aligned reads.
- run.coverage.sh:
- Convert bam file to BigWig file.
- Generate some profile result.
- run.interaction.sh:
- Merge DNA-RNA interaction according to read ID
- Extraction of high-confidence DNA-RNA interactions using hypergeometric distribution test
-
java8
-
python
-
perl
-
R
-
samtools
-
bowtie2 (bowtie2.2 best)
-
hisat2
-
bwa
-
cutadapt
-
bedtools
-
deeptools
-
HTSeq
I converted both "Chr" and "ChR0" to "chr" and gff files to gtf files.
wget http://rice.hzau.edu.cn/rice_rs1/download_ext/MH63RS1.LNNK00000000.fsa.tar.gz
wget http://rice.hzau.edu.cn/rice_rs1/download_ext/MH63_chr.gff.tar.gz
wget http://rice.hzau.edu.cn/rice_rs1/download_ext/MH63_repeat.gff3.tar.gz
bwa index genome.fa
grep -v ">" genome.fa | perl -lane '{$_=~s/[Nn]+//g;$sum+=length($_);}END{print $sum;}' # genome length
python hisat2-2.1.0/extract_splice_sites.py genome.gtf > genome.ss
python hisat2-2.1.0/extract_exons.py genome.gtf > genome.exon
hisat2-build -p 8 --ss genome.ss --exon genome.exon genome.fa genome
bowtie2-build --threads 8 genome.fa genome
perl -F"\t" -lane 'if($F[2] eq "gene"){$gene=$1 if($F[8]=~/gene_id "(.+?)";/);$type="coding";$name=$gene;print "$F[0]\t".($F[3]-1)."\t$F[4]\t$name\t0\t$F[6]\t$gene\t$type"}' genome.gtf > genome.info
or
perl -lane 'if($F[2] eq "gene"){$tmp=(split(/;/,$F[8]))[0];$gene=(split(/=/,$tmp))[1];print "$F[0]\t".($F[3]-1)."\t$F[4]\t$gene\t.\t$F[6]"}' genome.gff3 | awk '{print $0"\t"$4"\tcoding"}' > genome.info
get rrna region:
get rrna region from MH63_repeat.gff3.tar.gz file (grep "rRNA").
bash run.sh
# mv outfile.DNA.bam outfile.DNA.old.bam && intersectBed -a outfile.DNA.old.bam -b all.rrna.region.bed -v > outfile.DNA.bam
# mv outfile.RNA.bam outfile.RNA.old.bam && intersectBed -a outfile.RNA.old.bam -b all.rrna.region.bed -v > outfile.RNA.bam
bash run.coverage.sh
bash run.interaction.sh
# If your annotation file is in gff format. Please rewrite "-i gene_id" to "-i ID" in run.interaction.sh file.
run.sh result:
*.DNA.fastq # the DNA reads
*.RNA.fastq # the RNA reads
*.DNA.bam # the DNA reads align result
*.RNA.bam # the RNA reads align result
run.coverage.sh result:
*.bw # visualization in IGV
*.RNA.sense.richheatmap.cg.pdf # Just a quick look at, a few regions that need to be eliminated with unusually high expression
run.interaction.sh result:
*.DNARNA.givenanchor.FDRfiltered.txt # interaction. visualization in IGV
java8(1.8.0_161) python(3.7.9) perl(v5.26.2) R(3.5.2)
samtools(1.9) bowtie2(2.3.5.1) hisat2(2.1.0) bwa(0.7.15-r1140)
flash(v1.2.11) cutadapt(2.3) bedtools(v2.27.1) deeptools(3.2.1)
HTSeq(0.11.2)
contact me: [email protected]/[email protected]/[email protected]