Python package for generating copy number alteration (CNA), B-allele frequency (BAF/LOH), and gene coverage plots from NGS data.
- cnaplotr.py - for generating CNV and LOH plots
- covploter.py - for generating gene coverage plots from NGS data
Currently optimized for GRCh38/hg38. Support for GRCh37/hg19 is a work in progress.
The program requires Python v3.8 or higher. The following Python packages are required:
- pandas - to read the cnr file
- matplotlib and seaborn - to generate the CNA plots
- To plot gene coverage for an NGS panel, the program takes the BED region coverage output generated by mosdepth (https://github.com/brentp/mosdepth)
The packages can be installed using pip:
pip install pandas matplotlib seaborn
For mosdepth installation and instructions on generating coverage information, please see the mosdepth GitHub repo (https://github.com/brentp/mosdepth).
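As a rough illustration of the kind of input involved, the sketch below reads a mosdepth regions file and computes a length-weighted mean coverage per named region. It assumes mosdepth was run with `--by` on a four-column BED file, so that each output row is chrom, start, end, name, mean coverage. The function name `mean_coverage_by_gene` is hypothetical, and this is not necessarily how covploter.py parses its input.

```python
import csv
import gzip
from collections import defaultdict


def mean_coverage_by_gene(regions_path):
    """Return {region name: length-weighted mean coverage} from a mosdepth regions file."""
    covered_bases = defaultdict(float)
    total_bases = defaultdict(int)
    # Handle both gzip-compressed and plain-text files.
    opener = gzip.open if str(regions_path).endswith(".gz") else open
    with opener(regions_path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 5:
                continue  # skip rows without a name column
            _chrom, start, end, name, mean_cov = row[:5]
            length = int(end) - int(start)
            covered_bases[name] += float(mean_cov) * length
            total_bases[name] += length
    return {
        name: covered_bases[name] / total_bases[name]
        for name in total_bases
        if total_bases[name] > 0
    }
```

Weighting by region length keeps a short, deeply covered exon from dominating the average for a gene that spans several regions.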
At the time of development, the following versions of the libraries were tested:
pandas 1.4.3
matplotlib 3.5.2
seaborn 0.11.2
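To compare a local environment against these versions, a small check along the following lines may help. It is only a sketch using the standard library `importlib.metadata` module (available from Python 3.8); the names `TESTED_VERSIONS` and `installed_versions` are made up for this example and are not part of cnaplotr.

```python
from importlib import metadata

# Versions listed in this README as tested at development time.
TESTED_VERSIONS = {"pandas": "1.4.3", "matplotlib": "3.5.2", "seaborn": "0.11.2"}


def installed_versions(packages=TESTED_VERSIONS):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: installed={version}, tested={TESTED_VERSIONS[name]}")
```

Newer versions may well work; the check only reports differences and does not enforce anything.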
Installation is simple. Clone the GitHub repo and change into the project directory:
git clone https://github.com/roysomak4/cnaplotr.git
cd cnaplotr
This folder can be placed in any convenient location to execute the script. It is important that all the Python files and assets remain in the same directory.
For convenience, a Docker image of cnaplotr is available in the GitLab container registry. Use the following commands to download the image and run cnaplotr in a Docker container.
docker pull registry.gitlab.com/roysomak4/genbio_containers/cnaplotr:v0.2-bullseye
docker run --rm \
-v /path/to/cnvkit_cnr_files:/data \
registry.gitlab.com/roysomak4/genbio_containers/cnaplotr:v0.2-bullseye \
bash -c "python3 cnaplotr.py --cnr-file /data/sample1.cnr --output-path /data/output_folder --output-format png --sample-name sample1"
Please update the path to the data folder on the host machine and the appropriate sample name when running on your system.
To run cnaplotr without Docker, execute it on the command line as follows:
python3 cnaplotr.py \
--cnr-file sample1.cnr \
--output-path /path/to/result_folder \
--output-format png \
--sample-name sample1
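When several samples need plotting, the same command can be assembled once per .cnr file. The sketch below is one possible wrapper, not part of cnaplotr: `build_command` and `run_batch` are hypothetical helper names, the sample name is assumed to be the file name without its `.cnr` extension, and the script is assumed to be run from the project directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(cnr_file, output_path, output_format="png"):
    """Build the cnaplotr.py argument list for one .cnr file."""
    cnr_file = Path(cnr_file)
    return [
        sys.executable, "cnaplotr.py",
        "--cnr-file", str(cnr_file),
        "--output-path", str(output_path),
        "--output-format", output_format,
        # Assumption: the sample name is the file name without ".cnr".
        "--sample-name", cnr_file.stem,
    ]


def run_batch(cnr_dir, output_path):
    """Run cnaplotr.py once per .cnr file found in cnr_dir."""
    for cnr_file in sorted(Path(cnr_dir).glob("*.cnr")):
        # check=True stops the batch at the first failing sample.
        subprocess.run(build_command(cnr_file, output_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.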
The details of all options can be viewed using the -h or --help flag:
python3 cnaplotr.py -h
usage: cnaplotr.py [-h] -i CNR_FILE -o OUTPUT_PATH -f OUTPUT_FORMAT -s
SAMPLE_NAME
options:
-h, --help show this help message and exit
-i CNR_FILE, --cnr-file CNR_FILE
CNR file containing weighted log2 ratio info.
-o OUTPUT_PATH, --output-path OUTPUT_PATH
Output folder to save plot images. This folder must
exist. A 'plots' folder will be created inside the
output path folder.
-f OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
Output file format. Supported types: png, jpg, tiff,
pdf, svg. Default is png.
-s SAMPLE_NAME, --sample-name SAMPLE_NAME
Sample name to include in the chart title
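The help text notes that the output folder must already exist and that a 'plots' folder is created inside it. A minimal sketch of that behaviour, assuming nothing about the actual implementation in cnaplotr.py, could look like this (`prepare_plot_dir` and `SUPPORTED_FORMATS` are hypothetical names):

```python
from pathlib import Path

# Formats listed as supported in the help text.
SUPPORTED_FORMATS = {"png", "jpg", "tiff", "pdf", "svg"}


def prepare_plot_dir(output_path, output_format="png"):
    """Validate the arguments and return the 'plots' folder inside output_path."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format}")
    output_dir = Path(output_path)
    if not output_dir.is_dir():
        # The help text says the output folder must already exist.
        raise FileNotFoundError(f"Output folder does not exist: {output_dir}")
    plots_dir = output_dir / "plots"
    plots_dir.mkdir(exist_ok=True)  # reuse the folder if it is already there
    return plots_dir
```

Checking the folder up front gives a clear error before any plotting work starts.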
The program generates the following output:
- Image of a whole genome or all chromosome view plot. Depending on the targets sequenced, the plot will show only those chromosomes with sequence data. For exome sequencing or most comprehensive NGS panels, data for almost all or all chromosomes will be displayed.
Example of a whole genome CNA plot of a normal non-FFPE sample (NIST HG-003)
Example of a whole genome CNA plot for a tumor sample (FFPE)
- Per chromosome plot.
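To illustrate how a whole genome view can show only the chromosomes that have data, the sketch below loads a CNVkit .cnr table with pandas and assigns each bin a genome-wide index. It assumes the tab-separated CNVkit layout with `chromosome` and `start` columns; `load_cnr` and `genome_layout` are hypothetical names, and the actual plotting code in cnaplotr.py may lay out bins differently.

```python
import pandas as pd


def load_cnr(source):
    """Read a CNVkit .cnr table (tab-separated with a header row)."""
    return pd.read_csv(source, sep="\t", dtype={"chromosome": str})


def genome_layout(cnr):
    """Return bins sorted genome-wide with an index 'x', plus per-chromosome offsets."""
    def chrom_key(name):
        label = name[3:] if name.startswith("chr") else name
        # Numeric chromosomes first in numeric order, then X, Y, M, etc.
        return (0, int(label), "") if label.isdigit() else (1, 0, label)

    # Only chromosomes present in the data are included.
    chroms = sorted(cnr["chromosome"].unique(), key=chrom_key)
    ordered = cnr.assign(
        chromosome=pd.Categorical(cnr["chromosome"], categories=chroms, ordered=True)
    ).sort_values(["chromosome", "start"]).reset_index(drop=True)
    ordered["x"] = ordered.index
    # First bin index of each chromosome, useful for separators and tick labels.
    offsets = ordered.groupby("chromosome", observed=True)["x"].min().to_dict()
    return ordered, offsets
```

The `x` column can then serve as the horizontal axis of a scatter plot of the `log2` column, with the offsets marking chromosome boundaries.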
This is a beta release of this program. Bug reports and feature requests are most welcome. Please use GitHub issues to submit them, along with any questions.