
renayang2023/drug-target-emb-predict

 
 


drug-target-emb-predict

This project uses ESM2 protein embeddings and Molecular Transformer drug embeddings as features for a linear classifier that predicts drug-target pairs.
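Below is a minimal sketch of that idea, written in plain Python so it runs without extra packages. It assumes two things:

- each drug-target pair is represented by concatenating the drug vector with the protein vector;
- the linear classifier is a logistic regression fitted by stochastic gradient descent.

`pair_features`, `train_logreg` and `predict_proba` are hypothetical names. `src/dt_predict.py` may use a different model, feature construction or library.

```python
import math
import random


def pair_features(drug_vec, protein_vec):
    """Concatenate a drug embedding and a protein embedding into one feature vector."""
    return list(drug_vec) + list(protein_vec)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=200, seed=0):
    """Fit logistic-regression weights and bias with SGD on the log loss (hypothetical helper)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = predict_proba(w, b, X[i])
            g = p - y[i]  # derivative of the log loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the pair with features x is a drug-target pair."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy 2-d vectors; real ESM2 and Molecular Transformer embeddings are far larger.
    drugs = {"drugA": [1.0, 0.0], "drugB": [0.0, 1.0]}
    targets = {"protX": [1.0, 0.0], "protY": [0.0, 1.0]}
    # Labels chosen so that a linear model over concatenated vectors can fit them.
    pairs = [("drugA", "protX", 1), ("drugA", "protY", 1),
             ("drugB", "protX", 1), ("drugB", "protY", 0)]
    X = [pair_features(drugs[d], targets[t]) for d, t, _ in pairs]
    y = [label for _, _, label in pairs]
    w, b = train_logreg(X, y)
    for (d, t, label), x in zip(pairs, X):
        print(d, t, label, round(predict_proba(w, b, x), 2))
```

With plain concatenation, the logit is a drug term plus a protein term. The toy labels above can be fitted this way. A pattern such as "drugA binds only protX, drugB binds only protY" cannot. Element-wise products or other interaction features would be needed for that; this README does not say whether `dt_predict.py` adds any.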

Install

Create and activate a local virtual environment

python -m venv .venv
source .venv/bin/activate

Install the project and its requirements (editable mode)

pip install -e .

Query the Bio2RDF endpoint to retrieve drugs with their SMILES strings, targets with their protein sequences, and the set of known drug-target pairs

./get_bio2rdf_data.sh
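The contents of `get_bio2rdf_data.sh` are not shown here. Suppose the endpoint is asked for results in the standard SPARQL 1.1 Query Results JSON format. The result rows could then be turned into the three collections described above roughly as follows.

This is a sketch under stated assumptions:

- the variable names `drug`, `smiles`, `target` and `sequence` are assumptions;
- the helpers `bindings` and `collect` are hypothetical;
- the example identifier is made up.

```python
import json


def bindings(results_json):
    """Yield one {variable: value} dict per row of a SPARQL 1.1 JSON result document."""
    data = json.loads(results_json)
    for row in data["results"]["bindings"]:
        # Each cell looks like {"type": "literal", "value": "..."}; keep only the value.
        # Variables left unbound in a row are simply absent from that row.
        yield {var: cell["value"] for var, cell in row.items()}


def collect(drug_rows, target_rows, pair_rows):
    """Build drug -> SMILES, target -> sequence and the set of known (drug, target) pairs."""
    smiles = {r["drug"]: r["smiles"] for r in drug_rows}
    sequences = {r["target"]: r["sequence"] for r in target_rows}
    pairs = {(r["drug"], r["target"]) for r in pair_rows}
    return smiles, sequences, pairs


if __name__ == "__main__":
    # A hand-written result document with a hypothetical identifier, not real Bio2RDF output.
    doc = json.dumps({
        "head": {"vars": ["drug", "smiles"]},
        "results": {"bindings": [
            {"drug": {"type": "uri", "value": "ex:drugA"},
             "smiles": {"type": "literal", "value": "CCO"}},
        ]},
    })
    print(list(bindings(doc)))  # [{'drug': 'ex:drugA', 'smiles': 'CCO'}]
```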

Process the Bio2RDF data to generate the inputs for the two embedding methods

python src/prepare.py
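`src/prepare.py` is not shown in this README. Judging from the later commands, it produces two files:

- `data/download/drugbank_targets.fasta`, read by `esm-extract`;
- `data/download/drugbank_smiles.txt`, read by `embed.py`.

Here is a minimal sketch of writing files of that kind. It assumes one SMILES string per line and FASTA headers that carry only the target identifier. `write_fasta` and `write_smiles` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def write_fasta(sequences, path, width=60):
    """Write {target_id: sequence} as FASTA, wrapping sequence lines at `width` residues."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as fh:
        for target_id, seq in sequences.items():
            # esm-extract uses the header text as the sequence label and output file name,
            # so keep it a plain identifier without "/".
            fh.write(f">{target_id}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")


def write_smiles(smiles, path):
    """Write one SMILES string per line and return the drug IDs in line order."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    order = sorted(smiles)
    with path.open("w") as fh:
        for drug_id in order:
            fh.write(smiles[drug_id] + "\n")
    return order  # keep this to map each line's embedding back to its drug ID


if __name__ == "__main__":
    # Toy inputs written to a temporary directory instead of data/download/.
    out = Path(tempfile.mkdtemp())
    write_fasta({"protX": "MKTAYIAKQR"}, out / "targets.fasta")
    print(write_smiles({"drugB": "CCO", "drugA": "c1ccccc1"}, out / "smiles.txt"))  # ['drugA', 'drugB']
```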

Install the ESM library

pip install git+https://github.com/facebookresearch/esm.git

Generate the protein embeddings

esm-extract esm2_t33_650M_UR50D data/download/drugbank_targets.fasta data/vectors/drugbank_targets_esm2_l33_mean --repr_layers 33 --include mean
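With `--include mean`, `esm-extract` stores, for each sequence, the average of the layer-33 per-residue vectors. Every target, whatever its length, therefore becomes one fixed-size vector (1280 dimensions for `esm2_t33_650M_UR50D`).

In current versions, each sequence is saved as `<label>.pt` in the output directory. `torch.load(path)["mean_representations"][33]` returns the pooled vector.

Below is a pure-Python sketch of the pooling step. It assumes the token vectors include a beginning-of-sequence token and an end-of-sequence token, which are left out of the average. `mean_pool` is a hypothetical helper, not part of the ESM API.

```python
def mean_pool(token_vectors, seq_len):
    """Average per-residue vectors, skipping the BOS token at index 0 and anything
    after the last residue (EOS and padding)."""
    residues = token_vectors[1:seq_len + 1]
    if not residues:
        raise ValueError("sequence has no residues")
    dim = len(residues[0])
    return [sum(vec[d] for vec in residues) / len(residues) for d in range(dim)]


if __name__ == "__main__":
    # BOS, three residues, EOS, with 2-d toy vectors instead of 1280-d ones
    tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [-9.0, -9.0]]
    print(mean_pool(tokens, seq_len=3))  # [3.0, 4.0]
```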

Install the Molecular Transformer Embeddings

git clone https://github.com/mpcrlab/MolecularTransformerEmbeddings.git
cd MolecularTransformerEmbeddings
chmod +x download.sh
./download.sh

If running the download script fails with the error "bash: ./download.sh: /bin/bash^M: bad interpreter: No such file or directory", the script has Windows line endings; convert it with dos2unix download.sh and run ./download.sh again.
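The `^M` in that message is a carriage return. The script was saved with Windows (CRLF) line endings, so the system looks for an interpreter literally named `/bin/bash` followed by a carriage return.

If `dos2unix` is not installed, a few lines of Python can do the same conversion. This is a sketch, and `crlf_to_lf` is a hypothetical helper name. Run `crlf_to_lf("download.sh")` from inside the MolecularTransformerEmbeddings directory, then rerun `./download.sh`.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite a file in place, replacing Windows CRLF line endings with Unix LF.

    Writing to the existing file keeps its permission bits, so an earlier
    chmod +x still applies. Returns the number of line endings converted.
    """
    path = Path(path)
    data = path.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        path.write_bytes(fixed)
    return data.count(b"\r\n")
```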

Generate the drug embeddings

python embed.py --data_path=../data/download/drugbank_smiles.txt
mv embeddings/drugbank_smiles.npz ../data/vectors/
cd ..

Run the prediction tool

python src/dt_predict.py

Results are written to the results folder.


Languages

  • Python 97.0%
  • Shell 3.0%