Molecular Autoencoder

This is the code used for the paper:

Automatic chemical design using a data-driven continuous representation of molecules

Abstract: We develop a molecular autoencoder, which converts discrete representations of molecules to and from a vector representation. This allows efficient gradient-based optimization through open-ended spaces of chemical compounds. Continuous representations also allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as interpolating between molecules.
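
The interpolation mentioned in the abstract works on latent vectors, not on SMILES strings. Below is a minimal sketch of a hypothetical helper, `interpolate`, which is not part of this repository. It assumes you already have two latent codes from the trained encoder and computes evenly spaced points on the straight line between them. Each point would then go through the decoder to produce a candidate molecule. That step is not shown here.

```python
from typing import List


def interpolate(z_start: List[float], z_end: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced latent points from z_start to z_end, both included.

    Hypothetical helper for illustration; the decoder call is not shown.
    """
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start molecule, 1.0 at the end molecule
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Toy 3-dimensional vectors; real latent codes would come from the encoder.
for z in interpolate([0.0, 1.0, 2.0], [1.0, 1.0, 0.0], steps=3):
    print(z)
```

Linear interpolation is the simplest path through the latent space. Other paths are possible, but this sketch does not cover them.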

By

bibtex file | slides

Notes

This code requires a fork of Keras branched from the development version at approximately version 0.3.2, plus Theano > 0.8.2. (For recent testing on OS X 10.12.2, we are running Theano 0.9.0 dev4.) We also want to point you to the work of Max Hodak, who re-implemented this tool based on the paper. If you are beginning your own project, you may have greater success starting there: https://github.com/maxhodak/keras-molecules

To test the weights generated in the paper (limited to 5000 test SMILES)

    python sample_autoencoder.py \
        ../data/best_vae_model.json \
        ../data/best_vae_annealed_weights.h5 \
        ../data/250k_rndm_zinc_drugs_clean.smi \
        ../data/zinc_char_list.json \
        -l5000
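
`sample_autoencoder.py` reports the set size before and after filtering to a maximum SMILES length of 120. The following is a rough sketch of that loading step, and it rests on several assumptions. `load_smiles` is a hypothetical function, not the script's own loader. The `.smi` file is assumed to hold one SMILES per line, and any extra whitespace-separated columns are ignored. Sampling before length filtering is a guess at the order the script uses.

```python
import random
from typing import List, Optional


def load_smiles(path: str, limit: Optional[int] = None,
                max_len: int = 120, seed: int = 0) -> List[str]:
    """Read SMILES, optionally draw `limit` at random, then drop strings longer than `max_len`.

    Hypothetical helper; not the repository's own loader.
    """
    with open(path) as handle:
        # Assumption: the first whitespace-separated token on each non-empty line is the SMILES.
        smiles = [line.split()[0] for line in handle if line.strip()]
    if limit is not None and limit < len(smiles):
        # A seeded generator makes the random subset reproducible.
        smiles = random.Random(seed).sample(smiles, limit)
    return [s for s in smiles if len(s) <= max_len]
```

For example, `load_smiles("../data/250k_rndm_zinc_drugs_clean.smi", limit=5000)` would roughly mimic the `-l5000` option. The fixed seed is a choice made for this sketch only.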

This should produce output close to the following (values will vary because the 5000 samples are drawn at random from the test file):

    Using Theano backend.
    ('Training set size is', 5000)
    Training set size is 5000, after filtering to max length of 120
    ('total chars:', 35)
    Loss: 0.834809958935, Accuracy: 0.948206666667
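
The `total chars: 35` line and the accuracy figure both depend on how each SMILES is encoded. A common scheme, assumed here rather than confirmed from the code, works as follows:

- Right-pad every string to the maximum length.
- One-hot encode each position against the character list (here, `zinc_char_list.json`).

The sketch also computes a character-level accuracy: the fraction of positions whose most probable predicted character matches the input. `one_hot_encode` and `char_accuracy` are hypothetical names. Using a space as the padding character is also an assumption.

```python
from typing import List, Sequence


def one_hot_encode(smiles: str, charset: Sequence[str],
                   max_len: int = 120, pad: str = " ") -> List[List[int]]:
    """Return a max_len x len(charset) one-hot matrix for a right-padded SMILES string.

    Characters missing from `charset` raise KeyError.
    """
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than {max_len} characters: {smiles!r}")
    index = {ch: i for i, ch in enumerate(charset)}
    if pad not in index:
        raise ValueError("padding character must be in the character list")
    padded = smiles.ljust(max_len, pad)  # assumption: padding goes on the right
    return [[1 if j == index[ch] else 0 for j in range(len(charset))]
            for ch in padded]


def char_accuracy(true_rows: Sequence[Sequence[float]],
                  pred_rows: Sequence[Sequence[float]]) -> float:
    """Fraction of positions where the predicted argmax equals the true character.

    Padding positions are counted too, as in an unmasked per-position accuracy.
    """
    def argmax(row: Sequence[float]) -> int:
        return max(range(len(row)), key=lambda j: row[j])

    hits = sum(argmax(t) == argmax(p) for t, p in zip(true_rows, pred_rows))
    return hits / len(true_rows)


charset = [" ", "C", "O", "c", "1"]  # toy list; the real one has 35 entries
x = one_hot_encode("CCO", charset, max_len=5)
print(len(x), len(x[0]))    # 5 positions x 5 characters
print(char_accuracy(x, x))  # perfect reconstruction gives 1.0
```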

To train a new model (limit of 5000 training SMILES)

    python train_autoencoder.py \
        ../data/250k_rndm_zinc_drugs_clean.smi \
        ../data/zinc_char_list.json \
        -l5000
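
The weights file name `best_vae_annealed_weights.h5` suggests the model was trained as a variational autoencoder with an annealed KL term. The sketch below shows that kind of objective, but it is not the training script's actual code. It has two parts:

- A per-character categorical cross-entropy for reconstruction.
- A KL divergence between the encoder's Gaussian and a standard normal prior, scaled by a weight that grows over training.

The sigmoid schedule and its `midpoint` and `slope` parameters are assumptions for illustration.

```python
import math
from typing import Sequence


def categorical_crossentropy(true_rows: Sequence[Sequence[float]],
                             pred_probs: Sequence[Sequence[float]],
                             eps: float = 1e-7) -> float:
    """Mean over positions of -sum_j y_j * log(p_j); eps guards against log(0)."""
    total = 0.0
    for y, p in zip(true_rows, pred_probs):
        total -= sum(yj * math.log(max(pj, eps)) for yj, pj in zip(y, p))
    return total / len(true_rows)


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, exp(log_var)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def kl_weight(epoch: int, midpoint: float = 10.0, slope: float = 1.0) -> float:
    """Hypothetical sigmoid annealing: near 0 early, 0.5 at `midpoint`, near 1 later."""
    return 1.0 / (1.0 + math.exp(-slope * (epoch - midpoint)))


def vae_loss(true_rows, pred_probs, mu, log_var, epoch: int) -> float:
    """Reconstruction cross-entropy plus the annealed KL penalty."""
    return (categorical_crossentropy(true_rows, pred_probs)
            + kl_weight(epoch) * kl_to_standard_normal(mu, log_var))
```

Annealing is usually motivated by letting the decoder learn to reconstruct before the KL term pulls the latent codes toward the prior. In a real training loop, `epoch` would come from that loop.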

fazhiyang/molecule-autoencoder

Folders and files

Latest commit

History

Repository files navigation

Molecular Autoencoder

Notes

To test the weights generated in the paper (limited to 5000 test SMILES)

To train a new model (limit of 5000 training SMILES)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages