#File: 0) Sourcing and integration File.R
#Source Code
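#0. Set the working directory to the folder that contains these scripts before running
#   (the source() calls below use relative paths); here::here() reports the detected project root.
#   The namespaced call is used because packages are only loaded in step 1.
here::here()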
#1. Loading necessary packages
source("1) Packages.R")
#2. Loading data and pre-processing data
source("2) Loading Data and Pre-Processing.R")
#3. Loading custom functions
source("3) Custom Functions.R")
#4. Creating base models of logistic regression and random forest (without missing values) and evaluating performance
source("4) Base Model Performance.R")
#5. Loading the simulation functions used for mode imputation and random forest imputation
source("5) Simulation Function for Mode Imputation.R")
source("6) Simulation Function for RF Imputation.R")
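#Sketch only (not part of the original pipeline): the source() calls above fail partway through
#if the working directory is wrong. Assuming the scripts sit in one folder, a hypothetical helper
#such as check_scripts_sketch() could report missing files before a long run is started.
check_scripts_sketch <- function(files, dir = getwd()) {
  #Build the full path of every expected script and keep the names that do not exist
  paths <- file.path(dir, files)
  missing <- files[!file.exists(paths)]
  if (length(missing) > 0) {
    warning("Scripts not found in ", dir, ": ", paste(missing, collapse = ", "))
  }
  #Return the missing file names invisibly so the caller can decide whether to stop
  invisible(missing)
}
#Example use (file names taken from the source() calls in this file):
#check_scripts_sketch(c("1) Packages.R", "3) Custom Functions.R"))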
#6. Running simulation on chosen models at different levels of missingness using mode imputation
#Inputs are: (number of simulations, proportion of missing values, e.g. 0.05 = 5%)
#A. 5% Missing Values
missingness_performance(1000,0.05)
#B. 10% Missing Values
missingness_performance(1000,0.10)
#C. 15% Missing Values
missingness_performance(1000,0.15)
#D. 20% Missing Values
missingness_performance(1000,0.20)
#E. 25% Missing Values
missingness_performance(1000,0.25)
#F. 30% Missing Values
missingness_performance(1000,0.30)
#G. 40% Missing Values
missingness_performance(1000,0.40)
#H. 50% Missing Values
missingness_performance(1000,0.50)
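#Sketch only: missingness_performance() is defined in "5) Simulation Function for Mode Imputation.R",
#which is not shown here, so the helpers below are NOT the author's implementation. They illustrate
#the general idea under two assumptions: values are deleted completely at random, and the column is
#a factor or character vector. Both function names are hypothetical.
make_missing_sketch <- function(x, prop) {
  #Replace a proportion `prop` of the entries with NA, chosen uniformly at random
  n_missing <- round(length(x) * prop)
  x[sample.int(length(x), n_missing)] <- NA
  x
}
impute_mode_sketch <- function(x) {
  #table() ignores NA by default, so the mode is taken over observed values only
  counts <- table(x)
  mode_value <- names(counts)[which.max(counts)]
  x[is.na(x)] <- mode_value
  x
}
#Example use:
#x <- factor(c("a", "a", "b", "a", "b"))
#impute_mode_sketch(make_missing_sketch(x, 0.20))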
#7. Running simulation on chosen models at different levels of missingness using random forest imputation
#Inputs are: (number of simulations, proportion of missing values, e.g. 0.05 = 5%)
#Select Number of Cores to Use
doParallel::registerDoParallel(cores = 4)
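#Sketch only: the core count above is hard-coded to 4. On a machine with fewer cores, a hypothetical
#helper such as pick_cores_sketch() could cap the request at the number of detected cores minus one,
#leaving one core free for the rest of the system. It does not change the registration above.
pick_cores_sketch <- function(requested = 4L) {
  available <- parallel::detectCores()
  #detectCores() can return NA on some platforms; fall back to a single core
  if (is.na(available)) available <- 1L
  max(1L, min(as.integer(requested), available - 1L))
}
#Example use:
#doParallel::registerDoParallel(cores = pick_cores_sketch(4))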
#A. 5% Missing Values
missingness_performance_rf(1000,0.05)
#B. 10% Missing Values
missingness_performance_rf(1000,0.10)
#C. 15% Missing Values
missingness_performance_rf(1000,0.15)
#D. 20% Missing Values
missingness_performance_rf(1000,0.20)
#E. 25% Missing Values
missingness_performance_rf(1000,0.25)
#F. 30% Missing Values
missingness_performance_rf(1000,0.30)
#G. 40% Missing Values
missingness_performance_rf(1000,0.40)
#H. 50% Missing Values
missingness_performance_rf(1000,0.50)
#8. Performance evaluation of simulated models
source("7) Performance Evaluation.R")
#Plotting Line Charts of Misclassification Rate of Mode and Random Forest Imputation
misclassification_plot(performance_mode,"Mode")
misclassification_plot(performance_rf,"Random Forest")
#Plotting line charts (2x2) of AUC, Precision, Sensitivity, and Specificity for Both Imputations
other_measures_plot(performance_mode,"Mode")
other_measures_plot(performance_rf,"Random Forest")
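#Sketch only: the plotted measures are computed in scripts not shown here. For reference, the
#misclassification rate, precision, sensitivity, and specificity can be derived from the four
#confusion-matrix counts as below. The function name and the `positive` argument are hypothetical.
#AUC is omitted because it needs predicted probabilities rather than class labels.
classification_measures_sketch <- function(actual, predicted, positive = 1) {
  tp <- sum(predicted == positive & actual == positive)  #true positives
  tn <- sum(predicted != positive & actual != positive)  #true negatives
  fp <- sum(predicted == positive & actual != positive)  #false positives
  fn <- sum(predicted != positive & actual == positive)  #false negatives
  c(misclassification = (fp + fn) / length(actual),
    precision = tp / (tp + fp),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}
#Example use:
#classification_measures_sketch(actual = c(1, 1, 0, 0), predicted = c(1, 0, 0, 1))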
#Plotting line charts (1x2) Comparing Performance of Imputation Methods
imputation_comparison(performance_glm,performance_randomforest, "Misclassification")
imputation_comparison(performance_glm,performance_randomforest, "AUC")
imputation_comparison(performance_glm,performance_randomforest, "Precision")
imputation_comparison(performance_glm,performance_randomforest, "Sensitivity")
imputation_comparison(performance_glm,performance_randomforest, "Specificity")
#----------------------------------END OF CODE------------------------------------------------------