
Python version of CRISPRcleanR: An R package for unsupervised identification and correction of gene independent cell responses to CRISPR-cas9 targeting

cancerit/pyCRISPRcleanR

pyCRISPRcleanR


This is a Python implementation of the CRISPRcleanR package for unsupervised identification and correction of gene-independent cell responses to CRISPR-Cas9 targeting.

Design

Uses the DNAcopy R package to perform Circular Binary Segmentation (CBS) of count data.
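DNAcopy's CBS is more involved than this. It tests circular (arc) splits and judges significance by permutation. The sketch below only illustrates the underlying idea with a plain, non-circular binary segmentation in pure Python. It recursively splits a position-ordered series of values wherever the between-segment sum of squares is largest. The fixed threshold `min_gain` is a hypothetical stand-in for CBS's significance test. This is neither DNAcopy's algorithm nor pyCRISPRcleanR's code.

```python
from statistics import mean
from typing import List, Tuple


def best_split(values: List[float], min_size: int) -> Tuple[int, float]:
    """Find the split index maximising the between-segment sum of squares.

    Returns (index, score); index is -1 if no split scores above zero.
    """
    n = len(values)
    best_idx, best_score = -1, 0.0
    # Both halves must contain at least `min_size` points.
    for i in range(min_size, n - min_size + 1):
        diff = mean(values[:i]) - mean(values[i:])
        score = i * (n - i) / n * diff * diff
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


def binary_segmentation(
    values: List[float], min_size: int = 3, min_gain: float = 1.0
) -> List[Tuple[int, int]]:
    """Recursively split `values`; return half-open (start, end) segments in order."""

    def recurse(start: int, end: int) -> List[Tuple[int, int]]:
        segment = values[start:end]
        if len(segment) < 2 * min_size:
            return [(start, end)]
        idx, score = best_split(segment, min_size)
        # Hypothetical fixed threshold in place of CBS's permutation test.
        if idx < 0 or score < min_gain:
            return [(start, end)]
        return recurse(start, start + idx) + recurse(start + idx, end)

    return recurse(0, len(values))


def segment_means(values: List[float], segments: List[Tuple[int, int]]) -> List[float]:
    """Replace each value with the mean of the segment it falls in."""
    out: List[float] = []
    for start, end in segments:
        m = mean(values[start:end])
        out.extend([m] * (end - start))
    return out
```

In CRISPRcleanR's setting, the input would be guide-level fold changes sorted by chromosome position. A contiguous run of depleted guides, such as one caused by an amplified region, then shows up as its own segment.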

Tools

pyCRISPRcleanR has multiple commands, listed with pyCRISPRcleanR --help.

pyCRISPRcleanR

Takes the input count data, library file and other associated files/parameters. The output is a set of tab-separated files containing normalised fold changes and inverse-transformed corrected treatment counts.

Various exceptions can occur for malformed input files.

inputFormat

  • gRNA counts file: tab-separated file containing the following fields:
    • sgRNA gene <control_count 1..N> <sample_count 1..N>
  • sgRNA library file format:
    • sgRNA gene chr start end
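The following is a minimal sketch of reading the two input files with Python's standard `csv` module, including basic malformed-input checks. It makes three assumptions. First, both files have a header row with the field names listed above. Second, the library file is tab-separated like the counts file. Third, the caller knows how many count columns are controls, passed as the hypothetical `n_controls` parameter. This is not pyCRISPRcleanR's own parser.

```python
import csv
import io
from typing import Dict, TextIO

LIBRARY_FIELDS = ["sgRNA", "gene", "chr", "start", "end"]


def read_counts(handle: TextIO, n_controls: int) -> Dict[str, dict]:
    """Parse a tab-separated counts file: sgRNA, gene, control counts, sample counts.

    `n_controls` is a hypothetical parameter: the column layout alone does
    not say where the control columns end and the sample columns begin.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:2] != ["sgRNA", "gene"]:
        raise ValueError(f"unexpected counts header: {header}")
    n_samples = len(header) - 2
    if not 0 < n_controls < n_samples:
        raise ValueError("need at least one control and one treatment column")
    rows: Dict[str, dict] = {}
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(fields)}")
        counts = [float(x) for x in fields[2:]]  # raises ValueError on non-numeric counts
        rows[fields[0]] = {
            "gene": fields[1],
            "controls": counts[:n_controls],
            "treatments": counts[n_controls:],
        }
    return rows


def read_library(handle: TextIO) -> Dict[str, dict]:
    """Parse a library file whose header is exactly: sgRNA gene chr start end."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != LIBRARY_FIELDS:
        raise ValueError(f"unexpected library header: {reader.fieldnames}")
    return {
        row["sgRNA"]: {
            "gene": row["gene"],
            "chr": row["chr"],
            "start": int(row["start"]),
            "end": int(row["end"]),
        }
        for row in reader
    }


if __name__ == "__main__":
    counts = read_counts(io.StringIO("sgRNA\tgene\tctrl\tt1\ng1\tGENE1\t10\t4\n"), n_controls=1)
    library = read_library(io.StringIO("sgRNA\tgene\tchr\tstart\tend\ng1\tGENE1\t1\t100\t119\n"))
    print(counts["g1"], library["g1"])
```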

outputFormat

A results.html file is generated in the user-supplied output folder. It contains a short description of, and links to, all result files and folders generated during an analysis workflow.

Tab separated output files

[Note: the numeric prefix of each file name reflects the order in which the script generates the files and helps group similar files]:

  1. 01_normalised_counts.tsv
  • sgRNA: guide RNA
  • gene: gene name as defined in the library file
  • <control sample count: normalised 1..N>: normalised count
  • <treatment sample count: normalised 1..N>: normalised count
  2. 02_normalised_fold_changes.tsv
  • sgRNA: guide RNA
  • gene: gene name as defined in the library file
  • <treatment sample fold changes: fold changes 1..N>
  • avgFC: average fold change values
  3. 03_crispr_cleanr_corrected_counts.tsv [generated only when the --crispr_cleanr flag is set]
  • sgRNA: guide RNA
  • gene: gene name as defined in the library file
  • <control sample count: corrected 1..N>: corrected count
  • <treatment sample count: corrected 1..N>: corrected count
  4. 04_crispr_cleanr_fold_changes.tsv [generated only when the --crispr_cleanr flag is set]
  • sgRNA: guide RNA
  • gene: gene name as defined in the library file
  • <treatment sample fold changes: fold changes 1..N>
  • avgFC: average fold change values
  5. 05_alldata.tsv [generated only when the --crispr_cleanr flag is set]
  • sgRNA: guide RNA
  • <control sample count: raw 1..N>: raw count
  • <treatment sample count: raw 1..N>: raw count
  • gene: gene name as defined in the library file
  • chr: chromosome name
  • start: gRNA start position
  • end: gRNA end position
  • <control sample count: normalised 1..N>: normalised count (suffixed _nc)
  • <treatment sample count: normalised 1..N>: normalised count (suffixed _nc)
  • avgFC: average fold change values
  • BP: base pair location (used for DNAcopy analysis)
  • correction: correction factor
  • correctedFC: corrected fold change values
  • <control sample count: corrected 1..N>: corrected count (suffixed _cc)
  • <treatment sample count: corrected 1..N>: corrected count (suffixed _cc)
  • <treatment sample fold changes: fold changes 1..N> (suffixed _cf)
  • avgFC_cf: average fold change values based on corrected counts
  6. mageckOut [generated only when the --run_mageck flag is set; a folder containing MAGeCK output for normalised and/or CRISPRcleanR-corrected counts]

  7. bagelOut [generated only when the --run_bagel flag is set; a folder containing BAGEL output for normalised and/or CRISPRcleanR-corrected counts]
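The README does not spell out how the normalised counts, fold changes and corrected fold changes in these tables are computed. The following minimal sketch shows one plausible pipeline, and every step is an assumption rather than pyCRISPRcleanR's documented method:

  • median-of-ratios normalisation
  • a pseudocount of 0.5
  • log2 fold change of each treatment sample against the mean of the normalised controls, with avgFC as the mean across treatments
  • correctedFC = avgFC - correction, where correction would be, for example, a segment mean from the segmentation step

```python
import math
from statistics import mean, median
from typing import Dict, List

PSEUDOCOUNT = 0.5  # assumed pseudocount added before taking logs


def size_factors(counts: Dict[str, List[float]]) -> List[float]:
    """Median-of-ratios size factor per sample (columns: controls, then treatments).

    Guides with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) <= 0:
            continue
        geo_mean = math.exp(mean(math.log(v) for v in values))
        for j, v in enumerate(values):
            ratios[j].append(v / geo_mean)
    return [median(r) for r in ratios]


def normalise(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide every count by its sample's size factor."""
    factors = size_factors(counts)
    return {g: [v / f for v, f in zip(values, factors)] for g, values in counts.items()}


def fold_changes(normalised: Dict[str, List[float]], n_controls: int) -> Dict[str, dict]:
    """log2 fold change of each treatment vs. the mean control, plus avgFC."""
    out: Dict[str, dict] = {}
    for guide, values in normalised.items():
        control = mean(values[:n_controls])
        fcs = [
            math.log2((t + PSEUDOCOUNT) / (control + PSEUDOCOUNT))
            for t in values[n_controls:]
        ]
        out[guide] = {"fc": fcs, "avgFC": mean(fcs)}
    return out


def corrected_fc(avg_fc: List[float], correction: List[float]) -> List[float]:
    """Assumed relation between the avgFC, correction and correctedFC columns."""
    if len(avg_fc) != len(correction):
        raise ValueError("avgFC and correction must have the same length")
    return [a - c for a, c in zip(avg_fc, correction)]
```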

Plotly and PDF plots

  1. Plots based on raw sgRNA counts
  • 01_raw_counts_boxplot.html
  • 01_raw_counts_histogram.html
  • 01_raw_counts_correlation_matrix.html
  2. Plots based on normalised sgRNA counts
  • 02_normalised_counts_boxplot.html
  • 02_normalised_counts_histogram.html
  • 02_normalised_counts_correlation_matrix.html
  3. Plots based on fold changes
  • 03_fold_changes_boxplot.html
  • 03_fold_changes_histogram.html
  • 03_fold_changes_correlation_matrix.html
  4. Stats plots: precision-recall and ROC curves based on a known true-positive sgRNA/gene set [generated only when the --gene_signatures flag is set]
  • 04_pr_rc_curve_sgRNA.html
  • 04_roc_curve_sgRNA.html
  • 05_pr_rc_curve_gene.html
  • 05_roc_curve_gene.html
  • 06_depletion_profile_genes.html
  5. Plots based on CRISPRcleanR-corrected counts
  • 07_CRISPRcleanR_corrected_count_boxplot.html
  • 07_CRISPRcleanR_corrected_count_histogram.html
  • 07_CRISPRcleanR_corrected_count_correlation_matrix.html
  6. Plots based on CRISPRcleanR-corrected fold changes
  • 08_CRISPRcleanR_corrected_fold_changes_boxplot.html
  • 08_CRISPRcleanR_corrected_fold_changes_histogram.html
  • 08_CRISPRcleanR_corrected_fold_changes_correlation_matrix.html
  7. 09_Raw_vs_postCRISPRcleanR_segmentation_fold_changes.pdf [generated only when the --crispr_cleanr flag is set]

  8. Other informative plots

  • 10_density_plots_pre_and_post_CRISPRcleanR.html [generated only when the --crispr_cleanr flag is set]
  • 11_impact_on_phenotype_barchart.html [generated only when the --run_mageck flag is set]
  • 11_impact_on_phenotype_piechart.html [generated only when the --run_mageck flag is set]
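Precision-recall and ROC curves against a known true-positive set can be computed by ranking guides or genes by fold change and walking down the ranking. The following is a minimal pure-Python sketch. It assumes that more negative scores (stronger depletion) are more likely to be true positives, and that every scored item outside the positive set counts as a negative. The function names are hypothetical, and the plots above may be built differently.

```python
from typing import Dict, Iterable, List, Tuple


def rank_curves(
    scores: Dict[str, float], positives: Iterable[str]
) -> List[Tuple[float, float, float]]:
    """Return (recall, precision, false_positive_rate) after each rank.

    Items are ranked by ascending score, i.e. most depleted first.
    """
    positive_set = set(positives) & set(scores)
    n_pos = len(positive_set)
    n_neg = len(scores) - n_pos
    if n_pos == 0:
        raise ValueError("none of the known positives are present in the scores")
    tp = fp = 0
    curve: List[Tuple[float, float, float]] = []
    for rank, name in enumerate(sorted(scores, key=scores.__getitem__), start=1):
        if name in positive_set:
            tp += 1
        else:
            fp += 1
        fpr = fp / n_neg if n_neg else 0.0
        curve.append((tp / n_pos, tp / rank, fpr))
    return curve


def roc_auc(curve: List[Tuple[float, float, float]]) -> float:
    """Area under the ROC curve (recall vs. FPR) by the trapezoidal rule."""
    points = [(0.0, 0.0)] + [(fpr, recall) for recall, _, fpr in curve]
    return sum(
        (x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
```

For example, with scores {"a": -3.0, "d": -2.0, "b": -1.0, "c": 0.5} and positives {"a", "c"}, the first-ranked item "a" gives recall 0.5 at precision 1.0, and the ROC AUC is 0.5. This is because one positive ranks above both negatives and the other ranks below them.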

INSTALL

Install via pip: run pip install with the path to the compiled .whl file found on the release page:

pip install pyCRISPRcleanR-X.X.X-py3-none-any.whl

Release .whl files are generated as part of the release process and can be found on the release page.

Package Dependencies

pip will install the relevant dependencies. They are listed here for convenience; please refer to requirements.txt for versions:

R packages

  • The DNAcopy R package is required to run pyCRISPRcleanR. To facilitate the install process, the script Rsupport/libInstall.R can be run to build it for you.

Alternatively you can run:

cd Rsupport
./setupR.sh path_to_install_to

Append 1 to the command to request a complete local build of R (3.3.0).

Development environment

This project uses git pre-commit hooks. As these will execute on your system, it is entirely up to you whether to activate them.

If you want tests, coverage reports and linting to execute automatically before a commit, you can activate them by running:

git config core.hooksPath git-hooks

Only a test failure will block a commit; linting is not enforced (but please consider following the guidance).

You can run the same checks manually without a commit by executing the following in the base of the clone:

./run_tests.sh

Development Dependencies

Setup VirtualEnv

cd $PROJECTROOT
hash virtualenv || pip3 install virtualenv
virtualenv -p python3 env
source env/bin/activate
python setup.py develop # so bin scripts can find module

For testing/coverage (./run_tests.sh)

source env/bin/activate # if not already in env
pip install pytest
pip install pytest-cov

Also see Package Dependencies.

Cutting a release

Make sure the version is incremented in ./setup.py.

Install via .whl (wheel)

Generate .whl

source env/bin/activate # if not already
python setup.py bdist_wheel -d dist

Install .whl

# this creates a wheel archive which can be copied to a deployment location, e.g.
scp dist/pyCRISPRcleanR-X.X.X-py3-none-any.whl user@host:~/wheels
# on host
pip install --find-links=~/wheels pyCRISPRcleanR

Reference

Iorio F, Behan FM, Gonçalves E, Bhosle SG, Chen E, Shepherd R, Beaver C, Ansari R, Pooley R, Wilkinson P, Harper S, Butler AP, Stronach EA, Saez-Rodriguez J, Yusa K, Garnett MJ. Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. BMC Genomics. 2018 Aug 13;19(1):604. doi: 10.1186/s12864-018-4989-y.
