Training Probabilistic Circuits on Genomes Datasets

This repo contains the code and experiments for the paper "Tractable and Expressive Generative Models of Genetic Variation Data".

Files

README.md           This file.
data1kg/            Datasets.
sh/                 Helper scripts that generate runnable commands.

# Julia package
Project.toml        Specifies the required Julia environment.
src/                Source code for the algorithm.

# runnable Julia scripts
env.jl              Installs the environment.
preprocess_data.jl  Preprocesses genome-format data.
learn_hclt.jl       Learns a hidden Chow-Liu tree (HCLT).

# plotting scripts
launch_plotting_figs_SNP_1000G.ipynb   Generates plots.

Environment

  1. Install Julia 1.8.
  2. Run the following command to install the required packages:
    julia --project env.jl
    

Datasets

Run the following command to preprocess the datasets.

julia --project preprocess_data.jl

Experiments

  • Run the following commands to learn hidden Chow-Liu trees (HCLTs) on the 805 and 10K datasets and to generate artificial genomes:
    julia --project learn_hclt.jl --datasetname 805 --k_fold 1 --nosplit \
        --latents 16 --pseudocount 0.005 --softness 0 --batch_size 1024 \
        --save_circuit --dir exp/hclt/805 \
        --p1 0 --p2 0.999 --p3 0.99 --p4 0.999 \
        --n1 200 --n2 0 --n3 0 --n4 200
    
    julia --project learn_hclt.jl --datasetname 10K --k_fold 1 --nosplit \
        --latents 16 --pseudocount 0.005 --softness 0 --batch_size 1024 \
        --save_circuit --dir exp/hclt/10K \
        --p1 0 --p2 0.999 --p3 0.99 --p4 0.999 \
        --n1 200 --n2 0 --n3 0 --n4 200
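
The two commands above differ only in the dataset name and the output directory, so they can be generated from one template. The following is a minimal sketch, not one of the scripts shipped under sh/: the function name `hclt_cmd` is hypothetical, and every flag and value is copied unchanged from the commands above. It only prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print the learn_hclt.jl
# command for one dataset. Flags and values are copied from the README
# commands; only the dataset name and output directory vary.
hclt_cmd() {
    dataset="$1"
    echo "julia --project learn_hclt.jl --datasetname $dataset --k_fold 1 --nosplit" \
        "--latents 16 --pseudocount 0.005 --softness 0 --batch_size 1024" \
        "--save_circuit --dir exp/hclt/$dataset" \
        "--p1 0 --p2 0.999 --p3 0.99 --p4 0.999" \
        "--n1 200 --n2 0 --n3 0 --n4 200"
}

# Print one command per dataset used in the experiments.
for dataset in 805 10K; do
    hclt_cmd "$dataset"
done
```

To execute the printed commands instead of inspecting them, the output can be piped to `sh` from the repository root.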
    

Plot

Run launch_plotting_figs_SNP_1000G.ipynb to generate plots. This notebook, the plotting utility code under src/short, and the samples generated by the RBM and GAN are adapted from the paper "Creating artificial human genomes using generative neural networks".