\section{Swirl Review Questions}
\subsection{Lesson 1}
\begin{enumerate}
\item Which statement best describes how complete randomization differs from simple randomization?
\begin{enumerate}
\item Only part of our sample is randomly chosen
\item We randomize part of the sample but do not randomize the treatment
\item We choose how much of the sample receives treatment a priori
\item We randomize the sample, a priori
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'We choose how much of the sample receives treatment a priori'
@
\fi
\item Let variance and bias be denoted by V(x) and B(x) respectively. Then Mean Squared Error (MSE) equals:
\begin{enumerate}
\item V(x\textasciicircum{}2) + B(x)
\item V(x) + B(x\textasciicircum{}2)
\item V(x)\textasciicircum{}2 + B(x)
\item V(x) + B(x)\textasciicircum{}2
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'V(x) + B(x)^2'
@
\fi
\item The variance in a population is 5. For a sample of size 10 from that population, what is the variance of the sample mean (express your answer to the nearest 0.1)?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
0.5
@
\fi
\item In the context of confidence intervals, what is alpha?
\begin{enumerate}
\item The probability that over repeated sampling the confidence interval does not contain the true value of a parameter
\item The probability that the confidence interval contains the true value of a parameter, based on sample size
\item The probability that the confidence interval contains the true value of a parameter, regardless of sample size
\item The bias in the probability that the confidence interval contains the true value of a parameter
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'The probability that over repeated sampling the confidence interval does not contain the true value of a parameter'
@
\fi
\item In our sample of voters we find that 70\% of the participants support Obama. If we want our 95\% confidence interval for the true population proportion to be within +/- 1\% of the true value, what is the minimum number of participants we must ask?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
8068
@
\fi
\item Retrospective bias-corrected confidence intervals differ from prospective bias-corrected confidence intervals because:
\begin{enumerate}
\item Actually, both correction methods do not differ at all
\item In the prospective method the magnitude of the bias is estimated without observing the true parameter values
\item In practice the retrospective bias corrected confidence intervals have a lower coverage rate
\item In the prospective method more data points are needed in its estimation
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'In the prospective method the magnitude of the bias is estimated without observing the true parameter values'
@
\fi
\item How many degrees of freedom does a one-sample t-statistic have for a sample of size n?
\begin{enumerate}
\item n
\item n-1
\item n-2
\item sqrt(n)
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'n-1'
@
\fi
\item I flip a fair coin 27 times. What is the variance of the proportion of flips that land heads (express to the nearest 0.001)?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
0.009
@
\fi
\item Now, I flip a coin of unknown bias 10 times. It comes up heads 8 times. What is the (non-negative) margin of error, calculated from a 95\% confidence interval for the true probability the coin comes up heads (to the nearest 0.01)?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
0.25
@
\fi
\item In a sample, a coin has landed tails 50\% of the time. For 95\% confidence and a margin of error of 0.001, roughly how many flips do we need to check whether the coin is actually fair?
\begin{enumerate}
\item 100000
\item 500000
\item 1000000
\item 5000000
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'1000000'
@
\fi
\item The variance of a Bernoulli distributed random variable is given by: p(1-p). What is the value of p that maximizes the variance of such a variable?
\begin{enumerate}
\item 0
\item 1/2
\item 1/3
\item 1/4
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'1/2'
@
\fi
\end{enumerate}
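\if1\solutions
\noindent The numerical answers in this lesson can be checked with a few lines of R. The chunk below is a minimal sketch rather than part of the original lesson: it assumes the normal-approximation (Wald) interval for a proportion, and the helper names \texttt{moe} and \texttt{n\_needed} are hypothetical.
<<eval=FALSE>>=
z <- qnorm(0.975)                      # critical value for 95% confidence, about 1.96

# Variance of the sample mean: population variance divided by sample size
5 / 10                                 # 0.5

# Variance of the proportion of heads in 27 flips of a fair coin: p(1 - p) / n
round(0.5 * 0.5 / 27, 3)               # 0.009

# Margin of error of a Wald interval for a proportion
moe <- function(p_hat, n) z * sqrt(p_hat * (1 - p_hat) / n)
round(moe(0.8, 10), 2)                 # 0.25

# Smallest n whose margin of error is at most m
n_needed <- function(p_hat, m) ceiling(z^2 * p_hat * (1 - p_hat) / m^2)
n_needed(0.7, 0.01)                    # 8068
n_needed(0.5, 0.001)                   # a little under one million
@
\fi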
\subsection{Lesson 2}
\begin{enumerate}
\item Suppose you wish to test whether education has an effect on wages. What could be the null hypothesis?
\begin{enumerate}
\item Education has no effect on wages
\item Education has an effect on wages
\item Wages have no effect on education
\item Wages have an effect on education
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'Education has no effect on wages'
@
\fi
\item When hiring, large companies get tons of overqualified applicants, so they are fine with rejecting qualified applicants, though they do want to make sure selected applicants are actually qualified. Do large companies care more about Type I or Type II errors?
\begin{enumerate}
\item Type I
\item Type II
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'Type I'
@
\fi
\item You wish to conduct a study to test whether students at your university sleep more hours than the national average. What t-test should you use?
\begin{enumerate}
\item two sample, one-sided
\item two sample, two-sided
\item one sample, one-sided
\item one sample, two-sided
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'one sample, one-sided'
@
\fi
\item You have a coin which you suspect is unfair and instead shows heads with probability 0.7. So, you decide to test this by flipping it exactly once: if heads comes up, you say it is unfair, otherwise, you say it is fair. What is the power of this test (express to the nearest 0.1)?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
0.7
@
\fi
\item Flipping a coin once is not very informative, you think to yourself, so you decide to flip the coin two more times. If any heads appear, you say the coin is unfair; otherwise, you say it is fair. With these three flips in total, what is the power of the test now (express to the nearest 0.001)?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
0.973
@
\fi
\item What does the quantity (1 - power) represent?
\begin{enumerate}
\item probability of true positive
\item probability of true negative
\item probability of false positive
\item probability of false negative
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'probability of false negative'
@
\fi
\item You repeatedly run statistical tests until you get significance. This fallacy is known as:
\begin{enumerate}
\item multiple testing
\item repeated error detection
\item likelihood fallacy
\item multiple hypothesis detection
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'multiple testing'
@
\fi
\item You think a die is unfair, so you roll twice. If you get 2 sixes in a row, you deem it unfair. What is the probability of a Type I error in this test?
\begin{enumerate}
\item 1/2
\item 1/6
\item 1/12
\item 1/36
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'1/36'
@
\fi
\item You wish to test whether political sentiment of the average Texan is significantly different from that of the average New Yorker. What test would be appropriate?
\begin{enumerate}
\item one-sample t-test
\item two-sample z-test
\item two-sample t-test
\item one-sample z-test
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'two-sample t-test'
@
\fi
\item How does a t-test primarily differ from a z-test?
\begin{enumerate}
\item You do not know the population mean
\item You do not know the population standard deviation
\item You do not know the confidence level
\item You do not know if the data is normally distributed
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'You do not know the population standard deviation'
@
\fi
\end{enumerate}
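\if1\solutions
\noindent The power and error-rate questions above all reduce to the probability of rejecting the null hypothesis under a given coin or die. The chunk below is a minimal R sketch, assuming independent flips; the helper name \texttt{p\_reject\_any\_heads} is hypothetical.
<<eval=FALSE>>=
# P(at least one head in n independent flips) when P(heads) = p;
# this is the chance that the "any heads means unfair" rule rejects the null
p_reject_any_heads <- function(p, n) 1 - (1 - p)^n

p_reject_any_heads(0.7, 1)   # 0.7   power with one flip
p_reject_any_heads(0.7, 3)   # 0.973 power with three flips
p_reject_any_heads(0.5, 3)   # 0.875 Type I error rate of the same rule for a fair coin

# Fair die: probability of two sixes in a row, the Type I error of that test
(1 / 6)^2                    # 1/36, about 0.028
@
\fi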
\subsection{Lesson 3}
\begin{enumerate}
\item What is the exogeneity assumption behind linear regression?
\begin{enumerate}
\item The mean of the errors does not depend on the explanatory variables and is equal to 0
\item The variance of the errors does not depend on the explanatory variables and is equal to 0
\item The median of the errors does not depend on the explanatory variables and is equal to 0
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'The mean of the errors does not depend on the explanatory variables and is equal to 0'
@
\fi
\item You observe that students who come to class tend to score higher on exams, so you conclude that attendance positively impacts exam performance. Yet, you fail to account for the effect of internal motivation. What term best describes this problem?
\begin{enumerate}
\item omitted variable bias
\item omitted independent variable bias
\item exogeneity
\item Type I error
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'omitted variable bias'
@
\fi
\item What is homoskedasticity?
\begin{enumerate}
\item independent errors and variance that does not depend on the explanatory variables
\item independent errors and expectation that does not depend on the explanatory variables
\item expectation of errors equals their variance
\item variance of errors depends on their expectation
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'independent errors and variance that does not depend on the explanatory variables'
@
\fi
\item In R, what term should you include in a linear regression formula to leave out the intercept?
\begin{enumerate}
\item 1
\item -1
\item 0
\item -2
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'-1'
@
\fi
\item If the correlation between X and Y is 0.5, the standard deviation of X is 1 and the standard deviation of Y is 2, then what is the covariance between X and Y?
\begin{enumerate}
\item 8
\item 4
\item 2
\item 1
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'1'
@
\fi
\item For 765 observations and 8 explanatory variables in a linear regression model with an intercept, when calculating the p-values for the coefficients how many degrees of freedom will the t-statistic have?
\if1\solutions
\newline\newline \noindent{\bf Solution:}
<<eval=FALSE>>=
756
@
\fi
\item The variance of a variable can be written as V(x) = E[x\textasciicircum{}2] - E[x]\textasciicircum{}2. Using that formula and the properties of the residuals, what is a valid expression for the variance of the residuals in a linear regression (where e denotes the vector of residuals)?
\begin{enumerate}
\item E[e\textasciicircum{}2] - E[e]\textasciicircum{}2
\item E[e\textasciicircum{}2]
\item both expressions are correct
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'both expressions are correct'
@
\fi
\item When might you use adjusted R\textasciicircum{}2 over R\textasciicircum{}2, holding everything else equal?
\begin{enumerate}
\item When there are many observations
\item When there are many predictors
\item When there are many degrees of freedom
\item When the data is nonlinear
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'When there are many predictors'
@
\fi
\item True or False: In a linear regression model, if the confidence intervals of two coefficients overlap, this implies that the confidence interval for the difference between those coefficients must contain 0.
\begin{enumerate}
\item True
\item False
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'False'
@
\fi
\item Which argument of the \rfun{predict()} function sets the confidence level of the intervals it returns?
\begin{enumerate}
\item p-value
\item level
\item significance
\end{enumerate}
\if1\solutions
\noindent{\bf Solution:}
<<eval=FALSE>>=
'level'
@
\fi
\end{enumerate}
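\if1\solutions
\noindent Several answers in this lesson can be confirmed directly in R. The chunk below is an illustrative sketch that uses the built-in \texttt{cars} data set as a stand-in (an assumption, not the lesson's data); the numbers passed to the covariance formula are likewise illustrative.
<<eval=FALSE>>=
# Residual degrees of freedom: observations minus estimated coefficients
765 - (8 + 1)                              # 756

# The same count on a fitted model (50 rows, intercept + 1 slope)
fit <- lm(dist ~ speed, data = cars)
df.residual(fit)                           # 48

# Adding -1 to the formula drops the intercept
fit0 <- lm(dist ~ speed - 1, data = cars)
names(coef(fit0))                          # "speed"

# Covariance from correlation: cov(X, Y) = cor(X, Y) * sd(X) * sd(Y),
# where the correlation must lie between -1 and 1
0.5 * 1 * 2                                # 1

# 'level' sets the confidence level of the interval returned by predict()
predict(fit, newdata = data.frame(speed = 10),
        interval = "confidence", level = 0.90)
@
\fi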