Generating Drug-like Molecules from Gene Expression Signatures using Transformers

The chemical space of drug discovery is vast and discrete. Screening it for molecules that satisfy biological and pharmacokinetic properties such as stability, solubility, efficacy, affinity, and permeability is a complex multi-objective optimization problem. Our Transformer model, with a modified encoder architecture, is well suited to translating the information contained in high-throughput biological data into instances in the chemical space.
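As a rough illustration of how a decoder can draw on an encoded gene expression signature, the sketch below implements standard scaled dot-product cross-attention in NumPy. It is the generic Transformer operation, not the modified encoder used in this repository; the names `decoder_states` and `encoded_genes` and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_tokens, n_genes)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values, weights

# Hypothetical sizes: 4 molecule tokens decoded so far, 6 encoded gene features.
rng = np.random.default_rng(0)
n_tokens, n_genes, d_model = 4, 6, 8
decoder_states = rng.normal(size=(n_tokens, d_model))
encoded_genes = rng.normal(size=(n_genes, d_model))

context, weights = cross_attention(decoder_states, encoded_genes, encoded_genes)
print(context.shape, weights.shape)
```

Each row of `weights` shows how strongly one output position attends to each encoded gene feature; a real model would add learned projections, multiple heads, and masking.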

Key Contributions:

  • We show that attention-based sequential prediction performs better and converges faster by attending to both the previously predicted outputs and the encoded gene expression signature.
  • The model automatically learns structural and chemical characteristics during training, as is evident from visually inspecting the scaffolds shared by the generated and the actual compounds.
  • By incorporating biological information in the form of altered gene expression, our method outperforms other deep-learning-based molecular generators in validity, uniqueness, and metrics such as the Synthetic Accessibility score and Tanimoto similarity to the known compound.
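The evaluation metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the notebook's evaluation code: molecules are assumed to be SMILES strings, fingerprints are assumed to be sets of on-bit indices, and `toy_is_valid` is a hypothetical stand-in that only checks parenthesis balance. In practice RDKit would do the real work: `Chem.MolFromSmiles` returns `None` for a string it cannot parse, and `DataStructs.TanimotoSimilarity` compares RDKit fingerprints.

```python
def validity(smiles_list, is_valid):
    """Fraction of generated strings accepted by the validity check."""
    if not smiles_list:
        return 0.0
    return sum(1 for s in smiles_list if is_valid(s)) / len(smiles_list)

def uniqueness(valid_smiles):
    """Fraction of distinct molecules among the valid ones."""
    if not valid_smiles:
        return 0.0
    return len(set(valid_smiles)) / len(valid_smiles)

def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def toy_is_valid(smiles):
    # Hypothetical stand-in for a real parser: checks parenthesis balance only.
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

generated = ["CCO", "CCO", "c1ccccc1", "C(("]
valid = [s for s in generated if toy_is_valid(s)]

validity_score = validity(generated, toy_is_valid)   # 3 of 4 strings pass
uniqueness_score = uniqueness(valid)                  # 2 distinct among 3 valid
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5})        # 2 shared bits, 5 in total
print(validity_score, uniqueness_score, similarity)
```

Uniqueness is only meaningful when each molecule has a single string form, so a real pipeline would canonicalize the SMILES before counting distinct entries.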

Altogether, our method can not only accelerate the early stages of drug discovery but also aid in drug repurposing. This work, done under the guidance of Prof. Manikandan Narayanan, was accepted as a poster in the 'ML for Computation Biology' track at ISMB22.

Dependencies

  1. RDKit (see its installation instructions)
  2. Python 3.6+
  3. TensorFlow 2.1+
  4. Numba 0.52
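Before opening the notebook, it may help to confirm that the environment meets these requirements. The helper below is a hypothetical sketch, not part of the repository. It assumes each package exposes a `__version__` attribute and treats the numba version listed above as a minimum, which the list does not state explicitly.

```python
import importlib
import re
import sys

def version_tuple(version):
    """Turn '2.1.0rc1' into (2, 1, 0) by reading the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)

# Assumption: both entries are interpreted as lower bounds.
MINIMUMS = {"tensorflow": "2.1", "numba": "0.52"}

def check_environment():
    """Return a dict mapping each requirement to True or False."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = False
            continue
        report[name] = meets_minimum(getattr(module, "__version__", "0"), minimum)
    try:
        importlib.import_module("rdkit")   # no minimum version is listed for RDKit
        report["rdkit"] = True
    except ImportError:
        report["rdkit"] = False
    return report
```

Calling `check_environment()` in the first notebook cell returns the report dictionary; any `False` entry points to a package that is missing or older than the listed version.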

Using the code

A single Jupyter notebook, Modified_Transformer.ipynb, downloads the dataset and the evaluation toolkit (RDKit), then builds, trains, and evaluates the Transformer model. Its parameters are easy to modify, and the whole setup can be ported to public clouds such as GCP or AWS, or to Google Colab.

Additional resources

  1. Full-paper
  2. Poster
