STAT697-01_s19-team-3_data_analysis_by_MG.sas
*******************************************************************************;
**************** 80-character banner for column width reference ***************;
* (set window width to banner width to calibrate line length to 80 characters *;
*******************************************************************************;
* set relative file import path to current directory (using standard SAS trick);
X "cd ""%substr(%sysget(SAS_EXECFILEPATH),1,%eval(%length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILENAME))))""";
* load external file that will generate final analytic file;
%include '.\STAT697-01_s19-team-3_data_preparation.sas';
*******************************************************************************;
* Research Question Analysis Starting Point;
*******************************************************************************;
title1 justify=left
    'Research Question: Which country has the highest inequality-adjusted life expectancy index?'
;
title2 justify=left
    'Rationale: This should help identify which country has the lowest education index so that appropriate action can be taken.'
;
footnote1 justify=left
    'The country with the highest life expectancy index does not always have the highest Human Development Index.'
;
*
Note: This compares the column "Country" to the column "Inequality-adjusted
life expectancy index" from the 2018 Statistical_Annex_Table_3.
Limitations: The values in this dataset need to be numeric to compute the mean
and median.
Methodology: Use proc sort to order the data by descending life expectancy
index and find the country with the highest value, which is Albania.
Followup Steps: Clear out any missing values.
;
* output the first five rows of the sorted data to visualize the results;
proc sort
        data=country_analytic_file_raw
        out=country_analytic_file_raw_Q1
    ;
    by
        descending adjusted_life_index
        HDI
    ;
run;
* obs=5 is applied at the reporting step rather than on the proc sort input so
that every row is sorted before the top five are shown;
proc report data=country_analytic_file_raw_Q1(obs=5);
    columns
        Country
        adjusted_life_index
        HDI
    ;
run;
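* A minimal sketch of the follow-up step above (clearing out missing values),
assuming adjusted_life_index is stored so that the missing() function flags its
empty entries. The dataset name country_Q1_nomiss is hypothetical and is not
part of the original analysis;
data country_Q1_nomiss;
    set country_analytic_file_raw;
    * keep only rows where the life expectancy index is present;
    if not missing(adjusted_life_index);
run;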
* clear titles/footnotes;
title;
footnote;
*******************************************************************************;
* Research Question Analysis Starting Point;
*******************************************************************************;
title1 justify=left
    'Research Question: What is the correlation between years of schooling and income per capita?'
;
title2 justify=left
    'Rationale: This should help demonstrate whether schooling can be an important factor in national income.'
;
footnote1 justify=left
    'Based on the correlation table, there is a positive correlation between these two variables.'
;
*
Note: This compares the column "Expected years of schooling" to the column
"Estimated gross national income per capita" from the 2018
Statistical_Annex_Table_4.
Limitations: The values in this dataset need to be numeric to compute the mean
and median.
Methodology: Draw a scatter plot to visualize the trend behind the research
question. According to the graph, the longer the female education, the higher
the female gross national income.
Followup Steps: Run a linear regression analysis to determine whether there is
a strong correlation between the two variables.
;
* convert the two character columns to numeric, leaving the double-dot
placeholder as a missing value, so they can be plotted and correlated;
data work;
    set country_analytic_file_raw;
    if Year_School_Female ne '..' then
        do;
            Year_School_Female2 = input(Year_School_Female, best7.);
        end;
    if Estimated_gross_national_income_ ne '..' then
        do;
            Estimated_income_fe =
                input(Estimated_gross_national_income_, best7.)
            ;
        end;
run;
* construct a scatter plot to show the relationship between the two variables
of interest, using the numeric versions so the axes are continuous;
proc sgplot data=work;
    scatter x=Year_School_Female2 y=Estimated_income_fe;
run;
* compute the correlation between the two numeric variables;
proc corr
        data=work
        nosimple
    ;
    var
        Year_School_Female2
        Estimated_income_fe
    ;
run;
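* A minimal sketch of the follow-up regression, assuming the numeric columns
Year_School_Female2 and Estimated_income_fe created in the data step above.
Treating income as the response is an illustrative choice, not something the
analysis above settles;
proc reg data=work;
    model Estimated_income_fe = Year_School_Female2;
run;
quit;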
* clear titles/footnotes;
title;
footnote;
*******************************************************************************;
* Research Question Analysis Starting Point;
*******************************************************************************;
title1 justify=left
    'Research Question: Which country is experiencing the most severe multidimensional poverty?'
;
title2 justify=left
    'Rationale: This should help non-profit organizations allocate resources appropriately to the neediest countries.'
;
footnote1 justify=left
    'It appears that the majority of African countries are in severe multidimensional poverty.'
;
*
Note: This compares the column "Country" from sat15 to the column "Population
in severe multidimensional poverty" from the 2018 Statistical_Annex_Table_6.
Limitations: Missing values may need to be removed before fitting a general
linear model.
Methodology: Sort the selected variable in descending order and find the
country with the highest severe multidimensional poverty.
Followup Steps: Make a box plot of the data to see its distribution.
;
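* sort the data in descending order to find the maximum value, writing the
sorted copy to a separate dataset so the analytic file keeps its original
order;
proc sort
        data=country_analytic_file_raw
        out=country_analytic_file_raw_Q3
    ;
    by
        descending Population_in_severe_multidimens
    ;
run;
proc print data=country_analytic_file_raw_Q3;
run;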
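* A minimal sketch of the follow-up box plot, assuming that
Population_in_severe_multidimens is numeric (a character column would need to
be converted with input() first);
proc sgplot data=country_analytic_file_raw;
    vbox Population_in_severe_multidimens;
run;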
* clear titles/footnotes;
title;
footnote;