This repo contains the code and experiments for the paper "Tractable and Expressive Generative Models of Genetic Variation Data".
README.md This file.
data1kg/ Datasets.
sh/ Helper scripts that generate runnable commands.
# Julia package
Project.toml Specifies the required Julia environment.
src/ The source code for the algorithm.
# runnable Julia scripts
env.jl Installs the environment.
preprocess_data.jl Preprocesses genome-format data.
learn_hclt.jl Learns an HCLT.
# plotting scripts
launch_plotting_figs_SNP_1000G.ipynb Generates plots.
- Install Julia 1.8
- Run the following command to install the required packages.
julia --project env.jl
- Run the following command to preprocess the datasets.
julia --project preprocess_data.jl
- Run the following commands to learn hidden Chow-Liu trees (HCLTs) on the 805 and 10K datasets and generate artificial genomes.
julia --project learn_hclt.jl --datasetname 805 --k_fold 1 --nosplit \
    --latents 16 --pseudocount 0.005 --softness 0 --batch_size 1024 \
    --save_circuit --dir exp/hclt/805 \
    --p1 0 --p2 0.999 --p3 0.99 --p4 0.999 \
    --n1 200 --n2 0 --n3 0 --n4 200
julia --project learn_hclt.jl --datasetname 10K --k_fold 1 --nosplit \
    --latents 16 --pseudocount 0.005 --softness 0 --batch_size 1024 \
    --save_circuit --dir exp/hclt/10K \
    --p1 0 --p2 0.999 --p3 0.99 --p4 0.999 \
    --n1 200 --n2 0 --n3 0 --n4 200
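The two learning commands differ only in the dataset name and the output directory, so they can be generated from one template. The following is a minimal sketch, assuming a POSIX shell; the function name `hclt_cmd` is hypothetical and is not one of the helper scripts under sh/. It only prints the commands and does not run Julia.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print the learn_hclt.jl
# command for one dataset name, using the same flags as the commands above.
hclt_cmd() {
    dataset="$1"
    printf 'julia --project learn_hclt.jl --datasetname %s --k_fold 1 --nosplit' "$dataset"
    printf ' --latents 16 --pseudocount 0.005 --softness 0 --batch_size 1024'
    printf ' --save_circuit --dir exp/hclt/%s' "$dataset"
    printf ' --p1 0 --p2 0.999 --p3 0.99 --p4 0.999'
    printf ' --n1 200 --n2 0 --n3 0 --n4 200\n'
}

# Print one command per dataset.
for dataset in 805 10K; do
    hclt_cmd "$dataset"
done
```

Saving the sketch to a file and piping its output to `sh` would run the two commands in sequence; printing first makes it easy to inspect them before launching a long training run.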
- Run launch_plotting_figs_SNP_1000G.ipynb to generate plots.

This notebook, the plotting utility code under src/short, and the samples generated by the RBM and GAN are adapted from the paper "Creating artificial human genomes using generative neural networks".