Usage

Workflows

Please execute the following workflows depending on your preferences:

workflow_predict.sh

installs the required conda environment if necessary
generates a network
optimizes the network and predicts missing molecular properties
output contains the predictions and the graph in separate files

workflow_optimize_predictions.py

installs the required conda environment if necessary
predicts missing molecular properties using a specified machine learning script
generates a network containing the formerly made predictions
optimizes the network including the annotated predictions
output contains the predictions with optimization suggestions and the graph in separate files

Further Advices

simply execute the shell scripts provided in this directory
for the usage of Network Balance Scaling as a predictor, please try and modify workflow_predict.sh
for the usage of Network Balance Scaling to optimize existing predictions, please try and modify workflow_optimize_predictions.sh
- this workflow contains a precalculated model
- new models can be created with our inhouse predictor based on deepchem using the script predidictors/bace_predictor_train.sh
- to generate predictions with an existing model independent from the workflow, please try the script predictors/bace_predictor_predict.sh
please note, that our workflows only include the BACE data set of moleculenet due to a huge RAM requirement for larger data sets of this benchmark
- if you are interested in the other data sets of this benchmark, feel free to construct your own pipeline and try our helper scripts in the utils directory
- you can find precalculated benchmark scenarios and the original CSV files from the moleculenet benchmark in the data directory
note that all scripts should be startet from main directory in order of the path management
to evaluate the results you achieved, please checkout our script utils/evaluate_RMSD.py

Dependencies

python is needed to run the scripts
if you use an anaconda environment:
- the workflows will automatically create and activate the required dependencies in extra environments
- if the activation fails, please add the following line insinde the workflows above the activate command:
  - source ~/anaconda3/etc/profile.d/conda.sh
- or add this line to your .bashrc
if you use another python environment, please make sure to provide the below listed dependencies

Tool Dependencies

python-igraph=0.8.3=py37h340e831_2
cvxpy=1.1.10=py37h89c1867_0
cvxpy-base=1.1.10=py37hdc94413_0
rdkit=2020.09.4=py37he53b9e1_0

generate_network.py

A script for creating networks with or without existing predictions at red vertices.

usage: python3 generate_network.py INDEX.csv MOLECULES.sdf PROPERTY [-o]

Create an MMS network in .GraphML format

positional arguments:
  INDEX.csv             Indexed output from mmpdb containing the MMS
                        relations.
  INPUT.smi             Input file in smiles format containing the molecules.
                        Row format: {SMILES\t ID\t PROPERTY}.
  PROPERTY              Property of interest, annotated in the input SD file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTNAME, --outname OUTNAME
                        Specify name of output graph file.
  -p PREDICTION.smi, --prediction PREDICTION.smi
                        Predictions in smiles format of INPUT.smi

optimize_network.py

A script for optimizing networks. It can be used to either predict unknwon properties or to optimize existing predictions depending on the option -p

usage: python3 optimize_network.py FILE [-t, -s, -o, -l]

Optimizes an MMS network in GraphML format.

positional arguments:
  FILE.GraphML          Network to be optimized in GraphML format.

optional arguments:
  -h, --help            show this help message and exit
  -t TRAFO_WEIGHT, --traformations TRAFO_WEIGHT
                        Decimal number, which indicates how strongly the
                        number of transformations at an edge influence the sum
                        of squares
  -s STDDEV, --standarddeviation STDDEV
                        Decimal number, which indicates how strongly the
                        standard deviation of the transformation at an edge
                        influence the sum of squares
  -o OUTDIR, --outdir OUTDIR
                        Output directory to store optimized network.
  -l, --logarithmic     If this action is chosen, the number of
                        transformations is considered logarithmically.
  -f FORMAT, --format FORMAT
                        Type of result file containing the predictions.
                        Default is SMILES.
  -p, --prediction      Optimize existing predictions

Predictors

Integrated Predictor

The integrated predictor is based on DeepChem (https://github.com/deepchem/). Thus, the corresponding python scripts predictor/build_model.py and predictor/predictor.py additionally underly the MIT License. Our integrated predictor comprises:

a trained graph convolution regression model for the BACE data set
a trained message passing neural network model for the BACE data set
the Python script predictor/build_model.py which can be used to create a model
the Python script predictor/predictor.py which can be used to create predictions using an existing model
the bash script predictor/bace_predictor_train.sh which trains a model based on the BACE data set
the bash script predictor/bace_predictor_predict.sh which predicts properties for the BACE test set
training, validation and test set for BACE can be found in our data
for the other data sets, please use the scripts provided in utils to create your own data splits and convert files with respective formats

Integrated Predictor Dependencies

deepchem=2.3.0=py_2
rdkit=2020.09.2=py37h713bca6_0
tensorflow=1.14.0=h4531e10_0

Integrating Your Own Predictor

To incorporate existing predictions into your network, please mind the following:

the resulting predictions need to be written to a SMILES file
the SMILES file needs to follow the format SMILES\tID\tPROPERTY
using workflow_optimize_predictions.sh you need to replace the call of our default predictor based on deepchem (./predictor/bace_predictor_predict.sh)
please remember to adapt the training, validation and query set at the beginning of the bash script
note: Please remember, that all molecules need to be included into the mmpdb index. Using the network generation with existing predictions independently from workflow_optimize_predictions requires the molecules with known properties (training and validation sets) in a first SMILES file, your predictions in a second SMILES file and the index including all (training, validation and query) molecules.

File Formats

the workflows are constructed to run with SD files as input
the SD files are converted during the workflow using the scripts provided at utils/
for directly using SMILES files, please note that the required format consists of three tab-separated columns:
- SMILES\tID\tPROPERTY
the SMILES format is also a requirement for the input files used in mmpdb's MMP analysis

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Usage

Workflows

workflow_predict.sh

workflow_optimize_predictions.py

Further Advices

Dependencies

Tool Dependencies

generate_network.py

optimize_network.py

Predictors

Integrated Predictor

Integrated Predictor Dependencies

Integrating Your Own Predictor

File Formats

Files

README.md

Latest commit

History

README.md

File metadata and controls

Usage

Workflows

workflow_predict.sh

workflow_optimize_predictions.py

Further Advices

Dependencies

Tool Dependencies

generate_network.py

optimize_network.py

Predictors

Integrated Predictor

Integrated Predictor Dependencies

Integrating Your Own Predictor

File Formats