Commit

minor
shiwentao00 committed Feb 27, 2021
1 parent ee4b718 commit 968cab1
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions README.md
@@ -1,5 +1,5 @@
# Molecule-RNN
Molecule-RNN is a recurrent neural network built with PyTorch to generate molecules for drug discovery. It is trained on the [ZINC](https://zinc.docking.org/) dataset. Molecules are represented as [SELFIES](https://github.com/aspuru-guzik-group/selfies) strings; the SMILES files are converted to SELFIES on the fly during training.

## Training
1. Download the SMILES files of molecules from [ZINC](https://zinc.docking.org/). Select the "SMILES(*.smi)" and "Flat" options when downloading.
@@ -14,7 +14,15 @@ python train.py
The training loss:

## Sampling
We can generate molecules by sampling the model according to the output distribution.
```
python sample.py
```
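
As a rough illustration of what sampling "according to the output distribution" means, the sketch below draws one token at a time from a categorical distribution until an end-of-sequence token appears. It is not the code in `sample.py`: the vocabulary `VOCAB`, the stand-in function `next_token_probs`, and its weights are all hypothetical, and a trained RNN would compute the probabilities from its hidden state instead.

```python
import random

# Hypothetical toy vocabulary; the real vocabulary comes from the training data.
VOCAB = ["<eos>", "[C]", "[N]", "[O]", "[=C]"]


def next_token_probs(prefix):
    """Stand-in for the RNN: return one probability per vocabulary entry."""
    # Make <eos> more likely as the sequence grows so that sampling terminates.
    eos_weight = 0.1 * (len(prefix) + 1)
    weights = [eos_weight, 1.0, 0.5, 0.5, 0.3]
    total = sum(weights)
    return [w / total for w in weights]


def sample_sequence(rng, max_len=20):
    """Draw tokens one at a time according to the output distribution."""
    tokens = []
    for _ in range(max_len):
        probs = next_token_probs(tokens)
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        if token == "<eos>":
            break
        tokens.append(token)
    return "".join(tokens)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_sequence(rng) for _ in range(3)]
print(samples)
```

Because each token is drawn at random rather than chosen greedily, repeated calls produce different strings, which is what lets a generative model propose many distinct molecules.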
The sampled output is in the SELFIES format:
```
```
Note that SELFIES behaves like an [automaton](https://en.wikipedia.org/wiki/Automata_theory): decoding terminates when there are no more chemical bonds to build, so the converted SMILES string can be shorter than the SELFIES string:
```
```
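
The early-termination behaviour can be illustrated with a toy model. The sketch below is not the SELFIES derivation rules and does not use the `selfies` library: it assumes linear chains, single bonds only, and a small hypothetical valence table `VALENCE`. It only shows why trailing symbols can be dropped once the last atom has no free bond left.

```python
# Toy bonding capacities (standard valences); not the selfies library's tables.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def toy_decode(symbols):
    """Build a linear, single-bonded chain; stop when no bond can be added."""
    chain = []
    free = None  # free bonds on the last atom; None before the first atom
    for atom in symbols:
        if free == 0:
            break  # nothing left to bond to, so the remaining symbols are ignored
        chain.append(atom)
        # One bond attaches the atom to its predecessor (none for the first atom).
        free = VALENCE[atom] - (1 if len(chain) > 1 else 0)
    return "".join(chain)


symbols = ["C", "F", "C", "C"]
smiles = toy_decode(symbols)
print(smiles)  # CF
```

Fluorine forms a single bond, which is used up by the bond to the first carbon, so the last two symbols are never consumed and the output has two atoms instead of four.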

The advantage of SELFIES is that the output is always a valid molecule:
