Performance of regression models as a function of experiment noise

This repository contains code and data (or link to needed data) needed to replicate the analysis carried out in the pre-print by Li G, et al, 2019 (https://arxiv.org/abs/1912.08141).

Description

The different parts of the analysis is broken down into separate folders:

MonteCarloSimulation contains the script for Monte Carlo simulations in Figure 2.
EvaluateClassicalModels contains the script to evaluate the performance of six different classical regression models on a given dataset. This script was used in the enzyme Topt prediction section (Figure 3).
EnzymeTopt contains the two fasta files for enzyme Topt: ones with raw and cleaned Topt records. It also contains the script for feature extraction (Figure 3).
DeepNN contains the architecture of deep neural networks used in the prediction of enzyme Topt (Figure 3).
Transcriptomics contains scripts for the transcriptomics data analysis (Figure 4).
Genomics contains the scripts, datasets and results for the evaluation of a random forest model on the prediction 34 quantitative phenotypes from pan genome (Figure 5). The pan genome data was obtained from https://zenodo.org/record/3407352#.Xe-gX5P0nUI. Quantitative phenotypes were obtained from Peter, J. et al. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature 556, 339–344 (2018).

Dependencies

numpy 1.15.0
pandas 0.23.4
sklearn 0.20.3
seaborn 0.9.0
keras 2.2.4
tensorflow 1.10.0
hyperopt 0.1.2

The repository was tested with Python 3.6.7.

Hardware

Scripts of MonteCarloSimulation and Transcriptomics can be performed with PC with linux or Mac OS. Other scripts need to be done with a computer cluster. All deep learning related analysis (UniRep representation and DeepNN) need GPU platform.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Performance of regression models as a function of experiment noise

Description

Dependencies

Hardware

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 90 Commits
DeepNN		DeepNN
EnzymeTopt		EnzymeTopt
EvaluateClassicalModels		EvaluateClassicalModels
Genomics		Genomics
MonteCarloSimulation		MonteCarloSimulation
Transcriptomics		Transcriptomics
LICENSE		LICENSE
README.md		README.md

License

EngqvistLab/Supplemetenary_scripts_datasets_R2LG

Folders and files

Latest commit

History

Repository files navigation

Performance of regression models as a function of experiment noise

Description

Dependencies

Hardware

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages