This project uses ESM2 protein embeddings and MolecularTransformer drug embeddings to train a linear classifier that predicts drug-target interactions.
Create and activate a local virtual environment
python -m venv .venv
source .venv/bin/activate
Install requirements
pip install -e .
Query the Bio2RDF endpoint to get drugs and their SMILES strings, targets and their protein sequences, and the set of known drug-target pairs
Process the Bio2RDF data to generate the inputs needed for the two embedding methods
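For orientation, a SPARQL query can be sent to an endpoint as a plain HTTP GET request. The sketch below only builds such a request with the standard library. The endpoint URL is an assumption (the public Bio2RDF endpoint), and the query uses a made-up `ex:hasTarget` predicate as a placeholder, not the real Bio2RDF vocabulary or the query this project runs.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public Bio2RDF SPARQL endpoint; change it if the project uses another one.
ENDPOINT = "https://bio2rdf.org/sparql"

# Placeholder query. `ex:hasTarget` is a hypothetical predicate used for illustration only.
QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?drug ?target
WHERE { ?drug ex:hasTarget ?target }
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL-over-HTTP GET request that asks for CSV results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "text/csv"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url[:80])

# To actually send it (needs network access):
#   from urllib.request import urlopen
#   csv_text = urlopen(request).read().decode("utf-8")
```

Asking for `text/csv` through the `Accept` header keeps the response easy to parse with the `csv` module.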
python src/
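As an illustration of what this step produces, the sketch below writes a FASTA file for the protein embedder and a plain-text SMILES file for the drug embedder. It assumes one SMILES string per line is the expected drug input; the function names and the toy records are hypothetical and are not taken from the project's script.

```python
import tempfile
from pathlib import Path


def write_fasta(targets: dict[str, str], path: Path) -> None:
    """Write {target_id: amino_acid_sequence} as FASTA, one record per target."""
    with path.open("w") as handle:
        for target_id, sequence in targets.items():
            handle.write(f">{target_id}\n{sequence}\n")


def write_smiles(smiles: list[str], path: Path) -> None:
    """Write one SMILES string per line."""
    path.write_text("\n".join(smiles) + "\n")


# Toy data for illustration only (hypothetical target IDs and short sequences).
toy_targets = {"target_1": "MKTAYIAKQR", "target_2": "GSHMLEDPVA"}
toy_smiles = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CCO"]  # aspirin, ethanol

with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "drugbank_targets.fasta"
    smiles_path = Path(tmp) / "drugbank_smiles.txt"
    write_fasta(toy_targets, fasta_path)
    write_smiles(toy_smiles, smiles_path)
    print(fasta_path.read_text())
    print(smiles_path.read_text())
```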
Install the ESM library
pip install git+
Generate the protein embeddings
esm-extract esm2_t33_650M_UR50D data/download/drugbank_targets.fasta data/vectors/drugbank_targets_esm2_l33_mean --repr_layers 33 --include mean
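A quick sanity check after this step is to confirm that every FASTA record got an embedding. The sketch below assumes the extraction writes one `<label>.pt` file per record into the output directory, where the label is the FASTA header text; check this against your own output. The helper names are hypothetical.

```python
from pathlib import Path


def fasta_labels(fasta_path: Path) -> list[str]:
    """Return the header text (without '>') of every record in a FASTA file."""
    with fasta_path.open() as handle:
        return [line[1:].strip() for line in handle if line.startswith(">")]


def missing_embeddings(fasta_path: Path, output_dir: Path) -> list[str]:
    """List FASTA labels that have no `<label>.pt` file in output_dir."""
    return [
        label
        for label in fasta_labels(fasta_path)
        if not (output_dir / f"{label}.pt").exists()
    ]


# Paths from the command above; the check is skipped if the data is not there yet.
fasta = Path("data/download/drugbank_targets.fasta")
vectors = Path("data/vectors/drugbank_targets_esm2_l33_mean")
if fasta.exists():
    missing = missing_embeddings(fasta, vectors)
    print(f"{len(missing)} sequences without an embedding file")
```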
Install the Molecular Transformer Embeddings
git clone
cd MolecularTransformerEmbeddings
chmod +x
If you get an error (bash: ./ /bin/bash^M: bad interpreter: No such file or directory) when running the download script, the script has Windows (CRLF) line endings; convert it with dos2unix
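If dos2unix is not installed, the same conversion can be done in a few lines of Python. This is a minimal sketch: `crlf_to_lf` is a hypothetical helper, and the path in the usage comment is a placeholder for the download script.

```python
from pathlib import Path


def crlf_to_lf(path: Path) -> bool:
    """Rewrite a file with LF line endings; return True if anything changed."""
    data = path.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        path.write_bytes(fixed)
    return fixed != data


# Usage (placeholder path, replace with the actual download script):
#   crlf_to_lf(Path("path/to/download_script"))
```

Working on bytes instead of text avoids any newline translation by Python itself.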
Generate the drug embeddings
python --data_path=../data/download/drugbank_smiles.txt
mv embeddings/drugbank_smiles.npz ../data/vectors/
cd ..
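To feed a linear classifier, each drug needs a single fixed-length vector. The sketch below loads the `.npz` archive and averages each array over its first axis. It assumes the archive maps each SMILES string to a 2-D array of per-token embeddings; inspect the keys and shapes of your own file first. `mean_pool` is a hypothetical helper, not part of the embedding tool.

```python
from pathlib import Path

import numpy as np


def mean_pool(npz_path) -> dict:
    """Load an .npz archive and average every 2-D array over its first axis."""
    pooled = {}
    with np.load(npz_path, allow_pickle=False) as archive:
        for key in archive.files:
            array = archive[key]
            # Assumption: 2-D arrays are (tokens, dimensions); others are kept as-is.
            pooled[key] = array.mean(axis=0) if array.ndim == 2 else array
    return pooled


# Path from the step above; skipped if the embeddings have not been generated yet.
npz_file = Path("data/vectors/drugbank_smiles.npz")
if npz_file.exists():
    drug_vectors = mean_pool(npz_file)
    first_key = next(iter(drug_vectors))
    print(len(drug_vectors), "drugs; vector shape:", drug_vectors[first_key].shape)
```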
Run the prediction tool
python src/
The results are in the results folder
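For readers who want a feel for the overall idea, the sketch below concatenates a drug vector and a target vector for each pair and fits a linear classifier on the result. It is not the project's code: it uses a plain logistic regression trained by gradient descent as a stand-in, treats every pair that is not in the known set as a negative, and runs on tiny made-up vectors. All names are hypothetical.

```python
import numpy as np


def build_pairs(drug_vecs, target_vecs, known_pairs):
    """Concatenate drug and target vectors for every combination; known pairs get label 1."""
    features, labels = [], []
    for drug_id, drug_vec in drug_vecs.items():
        for target_id, target_vec in target_vecs.items():
            features.append(np.concatenate([drug_vec, target_vec]))
            labels.append(1.0 if (drug_id, target_id) in known_pairs else 0.0)
    return np.array(features), np.array(labels)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights and bias with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        error = p - y                            # gradient of the log loss w.r.t. the scores
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


# Toy embeddings (made up): two drugs, two targets.
drug_vecs = {"drugA": np.array([1.0, 0.0]), "drugB": np.array([0.0, 1.0])}
target_vecs = {"targetX": np.array([1.0, 0.0, 0.0]), "targetY": np.array([0.0, 0.0, 1.0])}
known_pairs = {("drugA", "targetX"), ("drugA", "targetY")}

X, y = build_pairs(drug_vecs, target_vecs, known_pairs)
w, b = train_logistic(X, y)
probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(np.round(probs, 2), y)
```

A linear model on concatenated vectors scores a pair as a drug term plus a target term, so it cannot capture patterns that depend on the specific combination of the two.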