STAT697-01_s19-team-3_data_analysis_by_MP.sas
*******************************************************************************;
**************** 80-character banner for column width reference ***************;
* (set window width to banner width to calibrate line length to 80 characters *;
*******************************************************************************;
* set relative file import path to current directory (using standard SAS trick);
X "cd ""%substr(%sysget(SAS_EXECFILEPATH),1,%eval(%length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILENAME))))""";
* load external file that will generate final analytic file;
%include '.\STAT697-01_s19-team-3_data_preparation.sas';
*******************************************************************************;
* Research Question Analysis Starting Point;
*******************************************************************************;
title1 justify=left
'Question: Is there evidence that countries with a greater percent of Inequality of Education also suffer from greater poverty as measured by the Multidimensional Poverty Index?'
;
title2 justify=left
'Rationale: This would indicate whether there is an association between education inequality and poverty, and whether the former can be an indicator of poverty in a country.'
;
*
Note: This compares the column Inequality of Education from 2018 Statistical
Annex Table 3 to the column Multidimensional Poverty Index from 2018
Statistical Annex Table 6.
Limitations: Values denoted as ".." for the column Inequality of Education
should be excluded from analysis since they represent missing values.
Methodology: Use proc corr to perform a correlation analysis, and then use proc
sgplot to output a scatterplot to illustrate the correlation.
Followup Steps: Perform a more formal linear regression analysis to measure the
degree of association between the two variables and the type of relationship.
;
title3
'Correlation analysis for Inequality in education and Multidimensional Poverty Index'
;
footnote1 justify=left
'Assuming the variables Inequality in Education and Multidimensional Poverty Index are normally distributed, the data analysis shows that there is a strong positive correlation between percent of inequality in education and the Multidimensional Poverty Index by country.'
;
footnote2 justify=left
'The correlation is statistically significant, since the p-value is less than 0.001.'
;
footnote3 justify=left
'Countries with a greater percent of inequality in education generally have a high Multidimensional Poverty Index, which could suggest that a high percent of inequality in education is an indicator of high levels of poverty in a country.'
;
data work1;
set country_analytic_file_raw;
if Inequality_in_education='..' then delete;
if Inequality_in_education='.' then delete;
if Inequality_in_education=' ' then delete;
Inequality_in_education1 = input(Inequality_in_education, 7.);
run;
proc corr
data=work1
;
var
Inequality_in_education1
Multidimensional_Poverty_Index
;
run;
title;
footnote;
title1
'Plot illustrating the positive correlation between Inequality in education and Multidimensional Poverty Index'
;
footnote1
'In the above plot, we can see how values for percent of Inequality of education tend to increase as the values for the Multidimensional Poverty Index increase.'
;
proc sgplot data=work1;
scatter
x=Inequality_in_education1
y=Multidimensional_Poverty_Index
;
run;
title;
footnote;
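*
Followup sketch (an illustration, not part of the original analysis): one
possible way to start the linear regression named under Followup Steps for this
question. It assumes work1 still holds the numeric column
Inequality_in_education1 created above, and the choice of proc reg is an
assumption rather than the method the author settled on.
;
title1
'Followup sketch: simple linear regression of Multidimensional Poverty Index on Inequality in education'
;
proc reg data=work1;
model Multidimensional_Poverty_Index = Inequality_in_education1;
run;
quit;
title;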
*******************************************************************************;
* Research Question Analysis Starting Point;
*******************************************************************************;
title1 justify=left
'Question: Is there a significant difference in the average years of schooling by gender among the countries?'
;
title2 justify=left
'Rationale: This would show if there is an association between average years of schooling per country and gender and could possibly indicate gender disadvantages.'
;
footnote1 justify=left
'Assuming the mean years of schooling for females and males are normally distributed, there is a difference of 0.6486 in the gender means, with males having the higher mean.'
;
footnote2 justify=left
'Data analysis with a paired t-test shows strong evidence of a difference in the paired mean years of schooling between females and males with p-value of less than 0.001.'
;
footnote3 justify=left
'The significant difference in means of years of schooling between males and females could indicate gender discrimination or possibly cultural norms that encourage more male years of schooling compared to females.'
;
*
Note: This compares the columns Female and Male under Mean years of schooling
from 2018 Statistical Annex Table 4 to each other.
Limitations: Values denoted as ".." for the columns Mean years of schooling
for both Female and Male should be excluded from analysis since they represent
missing values.
Methodology: Use proc ttest to perform a paired comparison of group means between
the two genders for mean years of schooling. Use proc report to create a table
for Mean_years_schooling_f and Mean_years_schooling_m, listing characteristic
values of the data including the min, max, mean and median.
Followup Steps: Display a visual of the summary statistics such as a boxplot to
further illustrate the difference between genders.
;
data work;
set country_analytic_file_raw;
if Mean_years_of_schooling_female ne '..' then
Mean_years_of_schooling_female1 = input(Mean_years_of_schooling_female, best7.)
;
if Mean_years_of_schooling_male ne '..' then
Mean_years_of_schooling_male1 = input(Mean_years_of_schooling_male, best7.);
run;
proc ttest data=work;
paired Mean_years_of_schooling_male1*Mean_years_of_schooling_female1;
run;
title;
footnote;
title1
'Comparing the summary statistics for females and males regarding Mean years of Schooling.'
;
footnote1
'We can observe that the min, max, mean and median values for mean years of schooling per country are each higher for males than for females.'
;
proc report data=work;
column Mean_years_of_schooling_female1=minf
Mean_years_of_schooling_female1=maxf
Mean_years_of_schooling_female1=avgf
Mean_years_of_schooling_female1=medf
Mean_years_of_schooling_male1=minm
Mean_years_of_schooling_male1=maxm
Mean_years_of_schooling_male1=avgm
Mean_years_of_schooling_male1=medm;
define minf/min 'Female Min';
define maxf/max 'Female Max';
define avgf/mean 'Female Mean';
define medf/median 'Female Median';
define minm/min 'Male Min';
define maxm/max 'Male Max';
define avgm/mean 'Male Mean';
define medm/median 'Male Median';
run;
title;
footnote;
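*
Followup sketch (an illustration, not part of the original analysis): one
possible way to draw the boxplot named under Followup Steps for this question.
The dataset work_long and the variables plot_gender and plot_years are
hypothetical names introduced only for this sketch. The data step stacks the
female and male columns of work into one column so that proc sgplot can draw
one box per gender, and missing values are left out of the boxes.
;
data work_long;
set work;
length plot_gender $6;
plot_gender = 'Female';
plot_years = Mean_years_of_schooling_female1;
output;
plot_gender = 'Male';
plot_years = Mean_years_of_schooling_male1;
output;
keep plot_gender plot_years;
run;
title1
'Followup sketch: boxplots of mean years of schooling per country by gender'
;
proc sgplot data=work_long;
vbox plot_years / category=plot_gender;
run;
title;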
*******************************************************************************;
* Research Question Analysis Starting Point;
*******************************************************************************;
title1 justify=left
'Question: Can percent of Population living below the income poverty line for 2006-2017 be used to predict country HDI?'
;
title2 justify=left
'Rationale: This can help to see how poverty levels are associated with Human Development Index value, which assesses country development.'
;
*
Note: This compares the column Population living below national poverty line
from 2018 Statistical Annex Table 6 to the column HDI from 2018 Statistical
Annex Table 3.
Limitations: Missing values for the column Population living below
national poverty line should be excluded from analysis as there are 10 missing
values indicated.
Methodology: Use proc corr to perform a correlation analysis, and then use proc
sgplot to output a scatterplot to illustrate the correlation.
Followup Steps: Perform a more formal linear regression analysis to measure the
degree of association between the two variables and the type of relationship.
;
title3
'Correlation analysis for Population living below national poverty line and HDI'
;
footnote1 justify=left
'Assuming the variables Population living below national poverty line and HDI are normally distributed, the data analysis shows that there is a strong negative correlation between percent of the population living below the income poverty line and the Human Development Index by country.'
;
footnote2 justify=left
'The correlation is statistically significant, since the p-value is less than 0.001.'
;
footnote3 justify=left
'Countries with a greater percent of the population living below the national poverty line generally have a lower Human Development Index which could possibly be explained by lower standards of living in countries with greater poverty.'
;
proc corr
data=country_analytic_file_raw
;
var
Population_living_below_national
HDI
;
run;
title;
footnote;
title1
'Plot illustrating the negative correlation between Population living below the national poverty line and the Human Development Index'
;
footnote1
'In the above plot, we can see how values of percent of Population living below the national poverty line tend to decrease as values for the Human Development Index increase.'
;
proc sgplot data=country_analytic_file_raw;
scatter
x=Population_living_below_national
y=HDI
;
run;
title;
footnote;
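*
Followup sketch (an illustration, not part of the original analysis): one
possible way to start the linear regression named under Followup Steps for this
question, with HDI as the response because the research question asks whether
the poverty measure can predict HDI. The choice of proc reg is an assumption,
and rows with a missing value in either variable are dropped by the procedure.
;
title1
'Followup sketch: simple linear regression of HDI on Population living below the national poverty line'
;
proc reg data=country_analytic_file_raw;
model HDI = Population_living_below_national;
run;
quit;
title;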