This repo holds part of the work for my master's thesis: creating a dataset for Drug-Target-Interaction (DTI) problems.

Lanorius/dataset_creation

My Master's Thesis: Drug-Target-Interaction (DTI) Dataset Creation

The following two repos were also part of my project. All three repos are required to reproduce my steps:
Prediction Of Binding Affinity: prediction of DTI
ChemVAE Fork: encoding of the small molecules

Additionally, ProtT5 is needed:
[ProtT5](https://github.com/agemagician/ProtTrans): encoding of the proteins

How to use the Data Cleaner

  1. Clone this repo
  2. Download the MayaChemTools collection and place its folder next to the folder of this repo (or specify another location in the config file)
  3. Install CD-Hit
  4. Ensure the requirements are met. We found that the best way is to create an RDKit environment and install the other requirements into it
  5. Check that the src/config.ini file points to the correct files, including the raw data file
  6. Run by using "python main.py"
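
Step 5 can be partly automated. The sketch below is not part of this repo. It assumes src/config.ini is a standard INI file that Python's configparser can read, and that relative paths are resolved against the directory you run from. The helper name `missing_paths` is hypothetical. Because the section and key names of the config file are not documented here, it checks every value, so settings that are not paths will be listed too and can be ignored.

```python
import configparser
from pathlib import Path


def missing_paths(config_file, base_dir="."):
    """Return (section, key, value) for every config value that is not an existing path.

    Hypothetical helper: it treats every value as a candidate path, because the
    real section and key names of src/config.ini are not known here.
    """
    # Disable interpolation so that a literal '%' in a value cannot raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_file):
        raise FileNotFoundError(f"could not read config file: {config_file}")

    base = Path(base_dir)
    missing = []
    for section in parser.sections():
        for key, value in parser.items(section):
            value = value.strip()
            # An absolute value replaces base; a relative one is joined to it.
            if value and not (base / value).exists():
                missing.append((section, key, value))
    return missing


if __name__ == "__main__":
    config = Path("src/config.ini")
    if config.is_file():
        for section, key, value in missing_paths(config):
            print(f"[{section}] {key} = {value} (not found on disk)")
    else:
        print("src/config.ini not found; run this from the root of the repo")
```

An empty result only means that every value resolves to something on disk; it does not confirm that the right raw data file was chosen.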

Databases

BindingDB: Database of DTI data

RDKit

RDKit: Open-Source Cheminformatics Software. Install it by following the instructions at https://www.rdkit.org/docs/Install.html
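
After installation, a quick check like the following can confirm that RDKit is importable from the active environment. This is only a sketch and not part of this repo: the helper name `rdkit_available` and the example molecule (ethanol, SMILES `CCO`) are chosen for illustration.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the rdkit package can be found in the current environment."""
    return importlib.util.find_spec("rdkit") is not None


if rdkit_available():
    from rdkit import Chem

    # Parse a SMILES string and write it back out as a basic smoke test.
    mol = Chem.MolFromSmiles("CCO")
    print("RDKit works, parsed molecule:", Chem.MolToSmiles(mol))
else:
    print("RDKit is not importable; activate the RDKit environment first")
```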

RDKit additional scripts

MayaChemTools: collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs

CD-Hit

CD-Hit: a widely used program for clustering and comparing protein or nucleotide sequences
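
Whether CD-Hit is reachable can be checked before running the Data Cleaner. The sketch below assumes the program was installed under its usual executable name `cd-hit` and is on the `PATH`; the helper name `find_cd_hit` is hypothetical, and this repo may locate the program differently (for example through the config file).

```python
import shutil


def find_cd_hit(executable="cd-hit"):
    """Return the full path of the CD-Hit executable, or None if it is not on PATH."""
    return shutil.which(executable)


location = find_cd_hit()
if location is None:
    print("cd-hit was not found on PATH; install it or adjust PATH")
else:
    print("cd-hit found at", location)
```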
