DeepRho v2.0
DeepRho: software accompaniment for "DeepRho: Accurate Estimation of Recombination Rate from Inferred Genealogies using Deep Learning", Haotian Zhang and Yufeng Wu, manuscript, 2021.
DeepRho constructs images from population genetic data and uses convolutional neural networks (CNNs), which perform well at image classification, to estimate recombination rates. The key idea of DeepRho is to generate genetically informative images from inferred gene genealogies and linkage disequilibrium (LD) in population genetic data.
deeprho is open-source software for per-base recombination rate estimation from inferred genealogies using deep learning. deeprho makes estimates based on LD patterns and local genealogical trees inferred by RENT+.
- OS: Linux, Windows, MacOS
- Software: Conda
- Device: CUDA-enabled GPU (optional; the CPU is used by default)
- Clone from GitHub:
git clone https://github.com/haotianzh/deeprho_v2.git
or download and unzip the archive to your local directory.
- Enter the root directory:
cd deeprho_v2
- Create a virtual environment through conda:
conda create -n deeprho python=3.7 openjdk=11 msprime
- Activate conda environment:
conda activate deeprho
- Install:
pip install .
- Validate:
deeprho -v
- [Optional] See GPU Support if you want to use a GPU.
Supported input formats:
- ms-formatted input: the first line lists the SNP positions (separated by spaces), followed by the haplotype sequences; see examples/data.ms for details.
- VCF file: see examples/data.vcf.
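The ms-formatted layout can be illustrated with a small reader. This is a sketch, not deeprho's own parser: the function name read_ms_like is hypothetical, and it assumes the first line holds whitespace-separated positions and every following non-empty line is one haplotype written as a string of 0/1 characters. Check examples/data.ms for the exact layout deeprho expects.

```python
from typing import List, Tuple


def read_ms_like(text: str) -> Tuple[List[float], List[str]]:
    """Parse the simplified ms-style layout (hypothetical helper).

    Assumes: line 1 holds whitespace-separated SNP positions; every following
    non-empty line is one haplotype written as a string of 0/1 characters.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    positions = [float(tok) for tok in lines[0].split()]
    haplotypes = lines[1:]
    for i, hap in enumerate(haplotypes):
        # Each haplotype must have exactly one allele per SNP position.
        if len(hap) != len(positions):
            raise ValueError(f"haplotype {i} has {len(hap)} sites, expected {len(positions)}")
        # Only biallelic 0/1 encoding is accepted in this sketch.
        if set(hap) - {"0", "1"}:
            raise ValueError(f"haplotype {i} contains non-binary alleles")
    return positions, haplotypes


example = """\
10 250 1800 4321
0101
1100
0011
"""
positions, haplotypes = read_ms_like(example)
print(len(positions), len(haplotypes))  # 4 3
```

Validating that every haplotype has one allele per position catches the most common formatting mistake (a missing or extra site) before any estimation is attempted.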
- Save a precalculated lookup table for a user-provided demography:
deeprho maketable --demography examples/YRI_pop_sizes.csv --out YRI_pop_table
- Estimate recombination rates:
deeprho estimate --file examples/example_YRI.vcf --ploidy 2 --table YRI_pop_table --num-thread 8 --plot --verbose
- Generate a test case under a given evolutionary setting:
deeprho test --demography examples/YRI_pop_sizes.csv --rate-map examples/test_recombination_map.txt --npop 50 --ploidy 2 --out test.vcf

The demography is a .csv file that contains at least three columns: label, x (time) and y (size). label is the population name, and a single file should contain only one population; time is measured in generations. See examples/ACB_pop_sizes.csv for an example.
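The demography file requirements can be checked before running deeprho with a short loader. This is a sketch under stated assumptions: load_demography is a hypothetical helper (not part of deeprho), extra columns are simply ignored, and the sample rows below are made-up illustrative values, not real population sizes.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_demography(stream: TextIO) -> List[Tuple[float, float]]:
    """Read a single-population demography CSV (hypothetical helper).

    Requires columns label, x (time in generations) and y (population size);
    any extra columns are ignored. Returns (time, size) pairs sorted by time.
    """
    reader = csv.DictReader(stream)
    missing = {"label", "x", "y"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    # A single file must describe exactly one population.
    labels = {row["label"] for row in rows}
    if len(labels) != 1:
        raise ValueError(f"expected exactly one population, found {sorted(labels)}")
    return sorted((float(row["x"]), float(row["y"])) for row in rows)


# Illustrative values only.
sample = io.StringIO(
    "label,x,y,plot_type,plot_num\n"
    "pop1,0.0,25000.0,path,0\n"
    "pop1,1000.0,18000.0,path,0\n"
    "pop1,50000.0,12000.0,path,0\n"
)
print(load_demography(sample))
# [(0.0, 25000.0), (1000.0, 18000.0), (50000.0, 12000.0)]
```

Rejecting files with more than one label mirrors the one-population-per-file rule above.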
By default, output files are named <FILE>.rate[.txt|.png|.npy] and saved in the same directory as the input.
- .txt file: 3 columns, Start, End and Rate, separated by tabs. A simple output looks like:
# your_vcf_file_name.rate.txt
Start	End	Rate
0	8	0.0
8	1822	2.862294427352283e-08
1822	4321	2.3297465959039865e-08
4321	7125	1.6098357471351787e-08
7125	10570	4.027717518356611e-09
10570	14312	2.1394376828669226e-09
14312	17689	2.2685986706092933e-09
17689	19928	1.6854787948356243e-09
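The .txt output can be read back for downstream analysis with the standard library. This is a sketch: read_rate_txt and map_length_morgans are hypothetical helpers, and the parser assumes the file may contain '#' comment lines and a "Start End Rate" header, which it skips. Multiplying each per-base, per-generation rate by the interval width and summing gives the expected number of crossovers per generation over the region, i.e. its map length in Morgans.

```python
from typing import List, Tuple


def read_rate_txt(text: str) -> List[Tuple[int, int, float]]:
    """Parse a tab-separated Start/End/Rate table (hypothetical helper).

    Skips blank lines, '#' comment lines and the 'Start End Rate' header.
    """
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "Start":
            continue
        start, end, rate = int(fields[0]), int(fields[1]), float(fields[2])
        intervals.append((start, end, rate))
    return intervals


def map_length_morgans(intervals: List[Tuple[int, int, float]]) -> float:
    """Sum of rate * interval width: expected crossovers per generation."""
    return sum(rate * (end - start) for start, end, rate in intervals)


sample = "# example.rate.txt\nStart\tEnd\tRate\n0\t8\t0.0\n8\t1822\t2.862294427352283e-08\n"
intervals = read_rate_txt(sample)
print(intervals)  # [(0, 8, 0.0), (8, 1822, 2.862294427352283e-08)]
print(map_length_morgans(intervals))
```

In practice you would pass the contents of the generated .rate.txt file instead of the inline sample.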
- .png file: a simple plot of the estimated recombination map.
- .npy file: stores an ndarray of per-base recombination rates; the i-th element of the ndarray is the rate between base i and base i+1.
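The per-base .npy layout relates to the interval table as sketched below. This is an illustration under an assumption, not deeprho's own code: the hypothetical helper intervals_to_per_base treats each interval [start, end) as filling elements start through end - 1, consistent with element i holding the rate between base i and base i+1. The array is round-tripped through the .npy format in memory; for a real output file you would call np.load on its path.

```python
import io

import numpy as np


def intervals_to_per_base(intervals):
    """Expand (start, end, rate) intervals into a per-base rate array.

    Assumption: element i holds the rate between base i and base i + 1,
    so an interval [start, end) fills elements start .. end - 1.
    """
    length = max(end for _, end, _ in intervals)
    per_base = np.zeros(length, dtype=np.float64)
    for start, end, rate in intervals:
        per_base[start:end] = rate
    return per_base


intervals = [(0, 8, 0.0), (8, 1822, 2.862294427352283e-08), (1822, 4321, 2.3297465959039865e-08)]
rates = intervals_to_per_base(intervals)

# Round-trip through the .npy format in memory (np.save/np.load accept file objects).
buffer = io.BytesIO()
np.save(buffer, rates)
buffer.seek(0)
loaded = np.load(buffer)

# Cumulative genetic map (Morgans) at each base boundary, starting from 0.
cumulative = np.concatenate(([0.0], np.cumsum(loaded)))
print(loaded.shape, cumulative[-1])
```

A cumulative sum of the per-base array turns the rate map into a genetic map, which is often the form other tools expect.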
GPU Support
- First, check whether your graphics card is CUDA-enabled.
- Check the compatibility table to find a matching combination of Python, TensorFlow, CUDA and cuDNN versions.
- Install cudatoolkit and cudnn:
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
- (Linux only) Set the environment variable (repeat this step every time you start a new session):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
- Verify the installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
deeprho maketable [-h] [--ne NE] [--demography DEMOGRAPHY] [--npop NPOP] [--ploidy PLOIDY] [--rmin RMIN] \
    [--rmax RMAX] [--repeat REPEAT] [--draw DRAW] [--num-thread NUM_THREAD] [--verbose]

Arguments                    Descriptions
--ploidy <PLOIDY>            Ploidy (default 2)
--ne <NE>                    Effective population size (default 10^5)
--demography <DEMOGRAPHY>    Demography file if no lookup table is provided
--npop <NPOP>                Number of individuals or samples
--num-thread <NUM_THREAD>    Number of parallel workers (default 4)
--rmin <RMIN>                Minimum recombination rate per base per generation
--rmax <RMAX>                Maximum recombination rate per base per generation
--repeat <REPEAT>            Number of repeats in simulation
--draw <DRAW>                Number of repeats in simulation
--verbose                    Show logs in the console
--help, -h                   Show usage
deeprho estimate [-h] [--file FILE] [--length LENGTH] [--ne NE] [--ploidy PLOIDY] [--res RES] \
    [--threshold THRESHOLD] [--gws GWS] [--ws WS] [--ss SS] [--m1 MODEL_FINE] \
    [--m2 MODEL_LARGE] [--num-thread NUM_THREAD] [--plot] [--savenp] [--verbose]

Arguments                    Descriptions
--file <FILE>                Input file
--ploidy <PLOIDY>            Ploidy (default 1)
--ne <NE>                    Effective population size (default 10^5)
--demography <DEMOGRAPHY>    Demography file if no lookup table is provided
--gws <GWS>                  Window size for inferring genealogies (default 10^3 SNPs)
--ws <WS>                    Window size for running deeprho (fixed at 50 SNPs)
--ss <SS>                    Step size for running deeprho (default 25 SNPs)
--length <LENGTH>            Length of the chromosome
--m1 <MODEL_FINE>            Path to the fine model
--m2 <MODEL_LARGE>           Path to the large model
--threshold <THRESHOLD>      Recombination hotspot threshold (default 5x10^-8)
--savenp                     Save estimated rates as a NumPy ndarray (saved as <FILE>.out.npy)
--plot                       Plot the recombination map (saved as <FILE>.out.png)
--num-thread <NUM_THREAD>    Number of parallel workers (default 4)
--verbose                    Show logs in the console
--help, -h                   Show usage

<LENGTH> can be either specified explicitly or inferred from the input. In the latter case, <LENGTH> = S_n - S_1, where S_n is the physical position of the last SNP site and S_1 is the position of the first SNP site.

<MODEL_FINE> and <MODEL_LARGE> are two pretrained models. deeprho uses a two-stage strategy to estimate recombination rates: <MODEL_FINE> estimates rates in background regions, while <MODEL_LARGE> fine-tunes hotspot regions. Two default models trained under a constant demographic model are included in this repo; users can also train their own models as described in the following sections.

<THRESHOLD> defines the rate above which a region is regarded as a hotspot; the default is 5x10^-8.

<GWS> controls how large a region the genealogies are inferred from. In our tests, 1000 SNPs is a good choice, as it includes as much information as possible for improving local genealogical inference.
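The length inference, window/step parameters and hotspot threshold described above can be illustrated with plain Python. This is a hedged sketch, not deeprho's implementation: inferred_length, sliding_windows and flag_hotspots are hypothetical helpers, and in particular how deeprho handles trailing SNPs that do not fill a complete window is not documented here (this sketch simply drops them).

```python
from typing import List, Sequence, Tuple


def inferred_length(positions: Sequence[int]) -> int:
    """<LENGTH> = S_n - S_1: distance between the last and first SNP positions."""
    return positions[-1] - positions[0]


def sliding_windows(n_snps: int, ws: int = 50, ss: int = 25) -> List[Tuple[int, int]]:
    """SNP-index windows [i, i + ws) advanced by ss (illustrative only).

    Trailing SNPs that do not fill a complete window are dropped here.
    """
    return [(i, i + ws) for i in range(0, n_snps - ws + 1, ss)]


def flag_hotspots(rates: Sequence[float], threshold: float = 5e-8) -> List[bool]:
    """Mark rates strictly above the hotspot threshold."""
    return [rate > threshold for rate in rates]


print(inferred_length([100, 350, 2000, 10100]))  # 10000
print(sliding_windows(120))  # [(0, 50), (25, 75), (50, 100)]
print(flag_hotspots([1e-8, 6e-8, 5e-8]))  # [False, True, False]
```

With the default ws = 50 and ss = 25, consecutive windows overlap by half, so each interior SNP is covered by two windows.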
deeprho test [-h] [--ne NE] [--demography DEMOGRAPHY] [--npop NPOP] [--ploidy PLOIDY] [--rate-map RATEMAP] \
    [--recombination-rate RATE] [--sequence-length LENGTH] [--num-thread NUM_THREAD] [--verbose]

Arguments                       Descriptions
--ploidy <PLOIDY>               Ploidy (default 2)
--ne <NE>                       Effective population size (default 10^5)
--demography <DEMOGRAPHY>       Demography file if no lookup table is provided
--npop <NPOP>                   Number of individuals or samples
--sequence-length <LENGTH>      Length of the simulated genome
--recombination-rate <RRATE>    Recombination rate
--rate-map <RATEMAP>            Recombination rate map
--mutation-rate <MRATE>         Mutation rate (default 2.5x10^-8)
--help, -h                      Show usage
Demography settings: several tools infer demographic history, such as PSMC, SMC++ and MSMC. deeprho takes SMC++ output as its demography input, restricted to a single population; see the SMC++ documentation for more information about its output.
TIPS: If you are not familiar with these parameter settings, just leave them at their defaults.
Feel free to shoot us at [email protected].