Molecule-RNN is a recurrent neural network built with PyTorch to generate molecules for drug discovery.
There are different ways to tokenize SMILES strings; three of them are implemented in this project:
- Character-level tokenization.
- Regular expression-based tokenization.
- SELFIES tokenization.
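
To make the difference between the first two schemes concrete, here is a minimal sketch. It is not this project's code: the function names are hypothetical, and the regular expression is a pattern commonly used for SMILES in the literature, assumed here for illustration only.

```python
import re

# Assumed pattern (not taken from this project): bracket atoms, two-letter
# halogens, ring-closure labels, and bond symbols each become one token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_char(smiles):
    """Character-level: every character is its own token."""
    return list(smiles)


def tokenize_regex(smiles):
    """Regex-based: multi-character units such as 'Cl' or '[C@H]' stay whole."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Guard against characters the pattern silently skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES string: {smiles!r}")
    return tokens


smiles = "C[C@H](Cl)Br"
print(tokenize_char(smiles))   # 'Cl' is split into 'C' and 'l'
print(tokenize_regex(smiles))  # 'Cl', 'Br' and '[C@H]' are single tokens
```

Character-level tokenization needs no chemistry knowledge but splits atoms such as `Cl` into two symbols, while the regex keeps them intact at the cost of a larger vocabulary. SELFIES tokenization additionally requires converting SMILES to SELFIES first, for example with the `selfies` package, which is not shown here.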
The model is trained on the ChEMBL 28 dataset.
- Modify the dataset path in `train.yaml` to point to your downloaded dataset by setting the value of `dataset_dir`.
- Run the training script.
python train.py
Molecules are generated by sampling tokens from the trained model according to its output distribution.
python sample.py
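
The idea behind sampling from an output distribution can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of `sample.py`: `toy_logits` is a hypothetical stand-in for the trained network, and the vocabulary, `<eos>` token, and `temperature` parameter are all invented for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(next_logits, vocab, eos="<eos>", max_len=100,
                    temperature=1.0, rng=random):
    """Draw tokens one at a time until the end token or max_len is reached."""
    tokens = []
    while len(tokens) < max_len:
        probs = softmax(next_logits(tokens), temperature)
        token = rng.choices(vocab, weights=probs, k=1)[0]
        if token == eos:
            break
        tokens.append(token)
    return "".join(tokens)


# Hypothetical stand-in for the trained RNN: one score per vocabulary entry.
VOCAB = ["C", "O", "N", "<eos>"]


def toy_logits(prefix):
    # The end token becomes more likely as the sequence grows.
    return [2.0, 0.5, 0.5, 0.2 * len(prefix)]


rng = random.Random(0)  # fixed seed so the run is repeatable
print(sample_sequence(toy_logits, VOCAB, rng=rng))
```

Because each token is drawn at random rather than chosen greedily, repeated runs with different seeds yield different strings. In a real model the logits would come from the network's forward pass on the tokens generated so far.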