DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis

Overview

The preprint for this work is posted on arXiv.

Data/process.py - script to preprocess the PaRoutes dataset and create training and evaluation partitions.
Models/Architecture.py - definitions of the Encoder, the Decoder, and the Seq2Seq module that combines them.
Models/Training.py - definition of the Lightning training class.
Models/Configure.py - definition of the model config.
Models/Generation.py - implementation of beam search using Python lists.
Models/TensorGen.py - implementation of beam search using torch.Tensors to maximize GPU efficiency. Warning: the current algorithm works properly only with batch_size=1 inputs (PRs welcome).
Utils/Dataset.py - definitions of the custom torch Datasets used for training and evaluation.
Utils/PreProcess.py - all functions related to preprocessing the PaRoutes dataset (used by Data/process.py).
Utils/PostProcess.py - all functions needed to postprocess the results of beam search and run evaluations.
Utils/Visualize.py - function that draws the synthesis tree as a PDF.
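
For readers unfamiliar with list-based beam search, the idea behind a module like Models/Generation.py can be sketched generically as follows. This is a minimal illustration and not the repository's implementation: the function `beam_search`, the callback `next_log_probs`, the toy model, and the `<bos>`/`<eos>` tokens are all hypothetical names chosen for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps a token prefix to log-probabilities of next tokens.
NextLogProbs = Callable[[List[str]], Dict[str, float]]


def beam_search(
    next_log_probs: NextLogProbs,
    bos: str,
    eos: str,
    beam_width: int = 3,
    max_len: int = 10,
) -> List[Tuple[List[str], float]]:
    """Return up to `beam_width` (sequence, log-probability) pairs, best first."""
    beams: List[Tuple[List[str], float]] = [([bos], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        if not candidates:
            break
        # Keep only the `beam_width` highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Sequences ending in `eos` are set aside; the rest stay live.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include beams cut off by `max_len`
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    # Toy distribution standing in for a trained decoder (an assumption).
    if prefix[-1] == "<bos>":
        return {"A": math.log(0.6), "B": math.log(0.4)}
    return {"<eos>": math.log(0.9), "A": math.log(0.1)}


results = beam_search(toy_model, "<bos>", "<eos>", beam_width=2, max_len=5)
best_seq, best_score = results[0]
```

A tensor-based variant keeps the same logic but stores beams and scores in batched tensors, so that the expansion and top-k selection run on the GPU rather than in Python loops.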

For training see:

train_nosm.py - without the starting material (SM) provided to the encoder
train_wsm.py - with the starting material (SM) provided to the encoder

Once everything is set up, it suffices to run python train_wsm.py.

Run bash download_ckpts.sh to download our checkpoints from the file storage.

Finally, we provide assess_single.py, which allows you to run our model on a single target compound.

Tutorials

To use the tutorials, simply move/copy them to the root directory. This is necessary because the notebooks use relative imports.

Tutorials/Basic_Usage.ipynb walks you through how to input your compounds, steps, and starting materials, and shows how to visualize routes as a PDF.
Tutorials/Route_Separation.ipynb reproduces the route separation results from the paper.
Tutorials/Pharma_Compounds.ipynb reproduces the three FDA-approved drug results from the paper.
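
For readers who would rather leave the notebooks in Tutorials/, one possible workaround is to put the repository root on the import path at the top of a notebook. This is only a sketch and is not part of the repository. It assumes the notebooks only need the repository's modules to be importable; any notebook that also opens files by relative path would still have to be run from the root. The helper names `find_repo_root` and `add_repo_root_to_path`, and the use of README.md as a marker file, are assumptions made for illustration.

```python
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = "README.md") -> Optional[Path]:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        if (directory / marker).exists():
            return directory
    return None  # no marker found anywhere above `start`


def add_repo_root_to_path(start: Path, marker: str = "README.md") -> Optional[Path]:
    """Prepend the detected root to sys.path so top-level modules can be imported."""
    root = find_repo_root(start, marker)
    if root is not None and str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In a notebook, this would run in the first cell, before any project imports.
root = add_repo_root_to_path(Path.cwd())
```

If `root` is None, no marker file was found above the current directory, and copying the notebook to the root directory as described above remains the reliable option.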

Licenses

All code is licensed under the MIT License. The content of the preprint on arXiv is licensed under CC-BY 4.0.

TODO

Bring code coverage (Codecov) to 80%+.
Revise Models/TensorGen.py so that it works with batch sizes greater than 1.
