Skip to content

Commit

Permalink
Merge pull request #132 from CCRGeneticsBranch/correlation
Browse files Browse the repository at this point in the history
Add new rules visualizing CARLISLE results
  • Loading branch information
kelly-sovacool authored Jun 12, 2024
2 parents 1437a04 + f6f93e1 commit 235f74e
Show file tree
Hide file tree
Showing 15 changed files with 471 additions and 63 deletions.
7 changes: 5 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
## CARLISLE development version

- Bug fixes (#127, @epehrsson)
- Bug fixes: (#127, @epehrsson)
- Removes single-sample group check for DESeq.
- Increases memory for DESeq.
- Ensures control replicate number is an integer.
- Fixes FDR cutoff misassigned to log2FC cutoff.
- Fixes `no_dedup` variable names in library normalization scripts.
- Containerize rules that require R (`deseq`, `go_enrichment`, and `spikein_assessment`) to fix installation issues with common R library path. (#129, @kelly-sovacool)
- The `Rlib_dir` and `Rpkg_config` config options have been removed as they are no longer needed.
- New visualizations: (#132, @epehrsson)
- New rules `cov_correlation`, `homer_enrich`, `combine_homer`, `count_peaks`
- Add peak caller to MACS2 peak xls filename
- New parameters in the config file to make certain rules optional: (#133, @kelly-sovacool)
- GO enrichment is controlled by `run_go_enrichment` (default: `false`)
- rose is controlled by `run_rose` (default: `false`)
- ROSE is controlled by `run_rose` (default: `false`)
- Fig bug that added nonexistent directories to the singularity bind paths. (#135, @kelly-sovacool)

## CARLISLE v2.5.0
Expand Down
2 changes: 1 addition & 1 deletion config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -164,4 +164,4 @@ adapters: "PIPELINE_HOME/resources/other/adapters.fa"
#####################################################################################
containers:
base: "docker://nciccbr/ccbr_ubuntu_base_20.04:v6"
carlisle_r: "docker://nciccbr/carlisle_r:v1"
carlisle_r: "docker://nciccbr/carlisle_r:v2"
12 changes: 6 additions & 6 deletions docker/carlisle_r/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,13 @@ ARG REPONAME="000000"
ENV REPONAME=${REPONAME}

# install conda packages
COPY packages.txt /data2/
RUN mamba install \
--no-channel-priority \
-c bioconda -c conda-forge -c r \
--file /data2/packages.txt
ENV PATH="/opt2/conda/bin:$PATH"
COPY environment.yml /data2/
ENV CONDA_ENV=carlisle
RUN mamba env create -n ${CONDA_ENV} -f /data2/environment.yml && \
echo "conda activate ${CONDA_ENV}" > ~/.bashrc
ENV PATH="/opt2/conda/envs/${CONDA_ENV}/bin:$PATH"
ENV R_LIBS_USER=/opt2/conda/lib/R/library/

# install ELBOW manually, fails with mamba
RUN wget --no-check-certificate https://bioconductor.riken.jp/packages/3.4/bioc/src/contrib/ELBOW_1.10.0.tar.gz && \
R -e 'install.packages("ELBOW_1.10.0.tar.gz", repos = NULL, type="source", INSTALL_opts = "--no-lock")'
Expand Down
38 changes: 38 additions & 0 deletions docker/carlisle_r/environment.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
channels:
- bioconda
- conda-forge
- r
dependencies:
- bioconductor-bsgenome.hsapiens.ncbi.t2t.chm13v2.0
- bioconductor-chipenrich
- bioconductor-chipseeker
- bioconductor-ComplexHeatmap
- bioconductor-deseq2
- bioconductor-edger
- bioconductor-enhancedvolcano
- bioconductor-genomicfeatures
- bioconductor-htsfilter
- bioconductor-org.Hs.eg.db
- bioconductor-org.Mm.eg.db
- bioconductor-rtracklayer
- bioconductor-txdb.hsapiens.ucsc.hg19.knowngene
- bioconductor-TxDb.Hsapiens.UCSC.hg38.knownGene
- bioconductor-TxDb.Mmusculus.UCSC.mm10.knownGene
- deeptools
- r-argparse
- r-circlize
- r-DT
- r-ggfortify
- r-ggvenn
- r-htmltools
- r-latticeextra
- r-openxlsx
- r-pander
- r-pdp
- r-plotly
- r-plyr
- r-rcolorbrewer
- r-reshape2
- r-tidyverse
- r-xfun>=0.43
- r-yaml
2 changes: 1 addition & 1 deletion docker/carlisle_r/meta.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
dockerhub_namespace: nciccbr
image_name: carlisle_r
version: v1
version: v2
container: "$(dockerhub_namespace)/$(image_name):$(version)"
28 changes: 0 additions & 28 deletions docker/carlisle_r/packages.txt

This file was deleted.

2 changes: 2 additions & 0 deletions resources/cluster_biowulf.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,3 +78,5 @@ DESeq:
time: 00-02:00:00
threads: 2
###################################################################
cov_correlation:
threads: 4
32 changes: 31 additions & 1 deletion workflow/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,26 @@ def get_motifs(wildcards):
files.extend(anno_s)
return files

def get_enrich(wildcards):
files=[]
if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
anno_m=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","homer","enrichment.{dupstatus}.{peak_caller_type}.png"),peak_caller="macs2",qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_M),
files.extend(anno_m)
if ("gopeaks_narrow" in PEAKTYPE) or ("gopeaks_broad" in PEAKTYPE):
anno_g=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","homer","enrichment.{dupstatus}.{peak_caller_type}.png"),peak_caller="gopeaks",qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_G),
files.extend(anno_g)
if ("seacr_stringent" in PEAKTYPE) or ("seacr_relaxed" in PEAKTYPE):
anno_s=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","homer","enrichment.{dupstatus}.{peak_caller_type}.png"),peak_caller="seacr",qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_S),
files.extend(anno_s)
return files

def get_combined(wildcards):
files = []
if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
combined_m = expand(join(RESULTSDIR,"peaks","{qthresholds}","macs2","annotation","homer","{treatment_control_list}.{dupstatus}.{peak_caller_type}.annotation_qvalue.tsv"),qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_M,treatment_control_list=TREATMENT_LIST_M)
files.extend(combined_m)
return files

def get_rose(wildcards):
files=[]
if config['run_rose']:
Expand Down Expand Up @@ -200,6 +220,12 @@ rule all:
# ALIGN / deeptools_heatmap
unpack(run_deeptools_heatmap),

# ALIGN / deeptools_corr
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonCorr.tab"),dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.PCA.tab"),dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.Pearson_heatmap.png"),dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonPCA.png"),dupstatus=DUPSTATUS),

# ALIGN / create_library_norm_scales
unpack(run_library_norm),

Expand All @@ -213,6 +239,8 @@ rule all:
unpack(run_macs2),
unpack(run_seacr),
unpack(run_gopeaks),
join(RESULTSDIR,"peaks","Peak counts.xlsx"),
join(RESULTSDIR,"peaks","all.peaks.txt"),

# QC
unpack(run_qc),
Expand All @@ -222,6 +250,8 @@ rule all:

# ANNOTATION / findMotif, rose, create_contrast_peakcaller_files, go_enrichment
unpack(get_motifs),
unpack(get_enrich),
unpack(get_combined),
unpack(get_rose),
unpack(get_enrichment)

Expand Down Expand Up @@ -292,4 +322,4 @@ onerror:
expand(join(RESULTSDIR,"deeptools","clean", "{group}.{dupstatus}.metagene.mat.gz"),group=[a+b for a in TREATMENTS+["all_samples"] for b in ["", ".prot"]],dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","clean", "{group}.{dupstatus}.TSS.mat.gz"),group=[a+b for a in TREATMENTS+["all_samples"] for b in ["", ".prot"]],dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","clean", "{group}.{dupstatus}.geneinfo.bed"),group=[a+b for a in TREATMENTS+["all_samples"] for b in ["", ".prot"]],dupstatus=DUPSTATUS),
"""
"""
46 changes: 45 additions & 1 deletion workflow/rules/align.smk
Original file line number Diff line number Diff line change
Expand Up @@ -468,4 +468,48 @@ rule deeptools_heatmap:
plotHeatmap -m {input.TSSmat} -out {output.TSSheat} --colorMap 'RdBu_r' --zMin auto --zMax auto --yAxisLabel 'average RPGC' --regionsLabel 'genes' --legendLocation 'none'
plotProfile -m {input.metamat} -out {output.metaline} --plotHeight 15 --plotWidth 15 --perGroup --yAxisLabel 'average RPGC' --plotType 'se' --legendLocation upper-right
plotProfile -m {input.TSSmat} -out {output.TSSline} --plotHeight 15 --plotWidth 15 --perGroup --yAxisLabel 'average RPGC' --plotType 'se' --legendLocation upper-left
"""
"""

rule cov_correlation:
"""
Create replicate correlation plots from filtered BAM files
"""
input:
bams=expand(join(RESULTSDIR,"bam","{replicate}.{{dupstatus}}.bam"),replicate=REPLICATES),
align_table=join(RESULTSDIR,"alignment_stats","alignment_stats.tsv")
output:
counts=join(RESULTSDIR,"deeptools","all.{dupstatus}.readCounts.npz"),
pearson_corr=join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonCorr.tab"),
pearson_plot=join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonCorr.png"),
pca=join(RESULTSDIR,"deeptools","all.{dupstatus}.PCA.tab"),
hc=join(RESULTSDIR,"deeptools","all.{dupstatus}.Pearson_heatmap.png"),
pca_format=join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonPCA.png")
params:
rscript=join(SCRIPTSDIR,"_plot_correlation.R"),
dupstatus="{dupstatus}"
container: config['containers']['carlisle_r']
threads: getthreads("cov_correlation")
shell:
"""
# Calculate genome-wide coverage
multiBamSummary bins \
--bamfiles {input.bams} \
--smartLabels \
-out {output.counts} \
-p {threads}
# Plot heatmap - Pearson
plotCorrelation \
-in {output.counts} \
--corMethod pearson --skipZeros \
--whatToPlot heatmap --plotNumbers \
-o {output.pearson_plot} \
--removeOutliers \
--outFileCorMatrix {output.pearson_corr}
# Plot PCA
plotPCA -in {output.counts} --outFileNameData {output.pca}
# Plot heatmap and PCA (formatted)
Rscript {params.rscript} {output.pearson_corr} {output.pca} {input.align_table} {params.dupstatus} {output.hc} {output.pca_format}
"""
Loading

0 comments on commit 235f74e

Please sign in to comment.