This repository contains training and sampling code for the paper: Grammar Variational Autoencoder.
To create the molecule datasets, call in this directory:
unzip ../rockyou-processed.zip -d data/
python make_zinc_dataset_grammar.py
To train the models, call:
python train_zinc.py  # train the grammar model
python train_zinc.py --latent_dim=2 --epochs=50  # train a model with a 2D latent space for 50 epochs
The file molecule_vae.py can be used to encode and decode SMILES strings. For a demo, run:
python encode_decode_zinc.py
python encode_decode_zinc.py --latent_dim=2 --epochs=50 --num_samples=200  # sample 200 random strings from the model trained with a 2D latent space for 50 epochs
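The flags in the commands above (--latent_dim, --epochs, --num_samples) could be handled along the following lines. This is a minimal sketch, not the repository's actual code: the parser, the default values (copied from the example commands), and the model_tag helper are assumptions for illustration only. It shows why sampling should use the same --latent_dim and --epochs values as training, since together they select which trained model is meant.

```python
import argparse


def build_parser():
    """Parse the three flags that appear in the commands above."""
    parser = argparse.ArgumentParser(description="Illustrative Grammar VAE options")
    # Defaults are placeholders copied from the example commands,
    # not necessarily the scripts' real defaults.
    parser.add_argument("--latent_dim", type=int, default=2,
                        help="dimensionality of the latent space")
    parser.add_argument("--epochs", type=int, default=50,
                        help="number of training epochs")
    parser.add_argument("--num_samples", type=int, default=200,
                        help="number of strings to sample")
    return parser


def model_tag(args):
    """Hypothetical helper: a label identifying one trained model."""
    return "L{}_E{}".format(args.latent_dim, args.epochs)


# Training and sampling refer to the same model only if both flags agree.
train_args = build_parser().parse_args(["--latent_dim=2", "--epochs=50"])
sample_args = build_parser().parse_args(
    ["--latent_dim=2", "--epochs=50", "--num_samples=200"])
print(model_tag(train_args) == model_tag(sample_args))
```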
- The model is defined in models/model_zinc.py. To change the architecture, modify _buildEncoder and _encoderMeanVar in the same way, i.e. add the same additional layers to both functions.
- The folder results/ contains some trained models, which can be sampled using the above commands. Note that the appropriate arguments (e.g. --latent_dim, --epochs) must be passed to the scripts.
- Some samples already generated from the models are in samples/.
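One way to keep _buildEncoder and _encoderMeanVar in step, as the first point above asks, is to define the shared layers once and have both functions reuse them. The sketch below is a toy illustration in plain Python, not the code in models/model_zinc.py: the layer names, the ENCODER_LAYERS list, and the stand-in functions are all hypothetical, and no real network is built.

```python
# Hypothetical layer names; the real architecture lives in models/model_zinc.py.
ENCODER_LAYERS = ["conv_1", "conv_2", "conv_3", "dense_1"]


def apply_shared_layers(inputs):
    """Stand-in for running the shared encoder stack: records layer order."""
    return list(inputs) + ENCODER_LAYERS


def build_encoder(inputs):
    """Toy counterpart of _buildEncoder: shared stack, then an assumed sampling step."""
    return apply_shared_layers(inputs) + ["sampling"]


def encoder_mean_var(inputs):
    """Toy counterpart of _encoderMeanVar: shared stack, then assumed mean/variance outputs."""
    return apply_shared_layers(inputs) + ["z_mean", "z_log_var"]


# Adding a layer in one place changes both functions consistently.
ENCODER_LAYERS.append("dense_2")
shared = len(ENCODER_LAYERS)
print(build_encoder(["x"])[1:1 + shared] == encoder_mean_var(["x"])[1:1 + shared])
```

If the two functions instead each list their layers separately, as the note above implies, every added layer has to be repeated by hand in both.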