alinarekena/Data-Science-project-2019


The original models were built in R, and we estimated how much model performance changes when a different programming language and different libraries are used. The notebook "PROJECT_Bioconcentration_factor_06_RF_models_FINALoob_coments_added" contains our replications of the models from the original publication.

To reproduce the models presented in the Results section, run the notebooks "XGBoost_Validation", "Old_Validation_SVM_Naive" (which contains both the Linear SVM model and the Gaussian Naive Bayes model) and "Old_Validation_LightGBM". The plots were produced by transferring the resulting numbers into the "Results_OLD_v3_xgboost" notebook.

The final parameter sets were chosen after running "XGBoost_Train", "Old_Train_SVM_Naive" and "Old_Train_LightGBM" with different model parameters, varied either manually or in loops. The Linear SVM was chosen after trying Gaussian, Linear, Polynomial and Sigmoid kernels (notebooks "SVM - Gaussian Kernel", "SVM - Linear", "SVM - Polynomial Kernel" and "SVM - Sigmoid Kernel" respectively). The other SVM variants were either not effective or took too much time to run.
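For readers unfamiliar with the four kernels named above, the sketch below writes out their standard textbook definitions in plain Python. It is only an illustration, not code taken from the notebooks: the function names and the default values of `gamma`, `coef0` and `degree` are assumptions chosen for this example. The dictionary keys follow the kernel names used by scikit-learn ('linear', 'poly', 'rbf', 'sigmoid'), on the assumption that the notebooks use that library.

```python
import math


def dot(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2), also called the RBF kernel
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


# Hypothetical lookup table keyed by scikit-learn-style kernel names.
KERNELS = {
    "linear": linear_kernel,
    "poly": polynomial_kernel,
    "rbf": gaussian_kernel,
    "sigmoid": sigmoid_kernel,
}

if __name__ == "__main__":
    x, y = [1.0, 2.0], [3.0, 4.0]
    for name, kernel in KERNELS.items():
        print(name, kernel(x, y))
```

Only the linear kernel works directly with the raw dot product. The other three add parameters (`gamma`, `coef0`, `degree`) that also have to be tuned.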

All of these models were run using the 3 features selected in the paper. We also performed our own feature selection, which can be found in the "Features selection" notebook. The algorithm is based on random shuffling and splitting of the dataset, so it can give a slightly different result on every run.
Notebooks marked with "New" used the features selected by our algorithm as input, but their performance was insufficient and they were not included in the final results.
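To illustrate why a selection procedure built on random shuffling and splitting can change between runs, here is a minimal sketch. It is not the algorithm from the "Features selection" notebook, which is not described in this README. As a stand-in, it scores each feature by its absolute Pearson correlation with the target on a random training split and counts how often each feature lands in the top `n_keep`. All names (`select_features`, `make_toy_data`, `n_rounds`, `train_frac`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, target, n_keep=3, n_rounds=20, train_frac=0.8, seed=None):
    """Vote for features over repeated random shuffles and splits."""
    rng = random.Random(seed)  # seed=None gives a different shuffle each run
    n_features = len(rows[0])
    indices = list(range(len(rows)))
    votes = Counter()
    for _ in range(n_rounds):
        rng.shuffle(indices)
        train = indices[: int(train_frac * len(indices))]
        y = [target[i] for i in train]
        scores = [
            abs(pearson([rows[i][j] for i in train], y)) for j in range(n_features)
        ]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        votes.update(ranked[:n_keep])
    return [feature for feature, _ in votes.most_common(n_keep)]


def make_toy_data(n_rows=200, n_features=6, seed=0):
    """Synthetic data: the target depends on features 0 and 1 only."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    target = [2 * r[0] - 3 * r[1] + rng.gauss(0, 0.5) for r in rows]
    return rows, target


if __name__ == "__main__":
    rows, target = make_toy_data()
    print(select_features(rows, target, n_keep=2, seed=42))
```

In this sketch, passing a fixed `seed` makes the selection repeatable. With `seed=None`, features whose scores are close to each other may swap places from one run to the next.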
