uncertainty-swirl.rnw

\section{Swirl Review Questions}
  \subsection{Lesson 1}
  \begin{enumerate}             
    \item Which statement best describes how simple randomization differs from complete randomization?
    \begin{enumerate}
      \item Only part of our sample is randomly chosen
      \item We randomize part of the sample but do not randomize the treatment
      \item We choose how much of the sample receives treatment a priori
      \item We randomize the sample, a priori
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'We choose how much of the sample receives treatment a priori'
    @
    \fi
    \item Let variance and bias be denoted by V(x) and B(x) respectively. Then Mean Squared Error (MSE) equals:
    \begin{enumerate}
      \item V(x\textasciicircum{}2) + B(x)
      \item V(x) + B(x\textasciicircum{}2)
      \item V(x)\textasciicircum{}2 + B(x)
      \item V(x) + B(x)\textasciicircum{}2
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'V(x) + B(x)^2'
    @
    \fi
    \item The variance in a population is 5. For a sample of size 10 from that population, what is the variance in the sample mean (express your answer to the nearest 0.1)?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    0.5
    @
    \fi
    \item In the context of confidence intervals, what is alpha?
    \begin{enumerate}
      \item The probability that over repeated sampling the confidence interval does not contain the true value of a parameter
      \item The probability that the confidence interval contains the true value of a parameter, based on sample size
      \item The probability that the confidence interval contains the true value of a parameter, regardless of sample size
      \item The bias in the probability that the confidence interval contains the true value of a parameter
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'The probability that over repeated sampling the confidence interval does not contain the true value of a parameter'
    @
    \fi
    \item In our sample of voters we find that 70\% of the participants support Obama. If we want our 95\% confidence interval for the true population proportion to be within +/- 1\% of the true value, what is the minimum number of participants we must ask?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    8068
    @
    \fi
    \item Retrospective bias corrected confidence intervals differ from the prospective bias corrected confidence intervals because:
    \begin{enumerate}
      \item Actually, both correction methods do not differ at all
      \item In the prospective method the magnitude of the bias is estimated without observing the true parameter values
      \item In practice the retrospective bias corrected confidence intervals have a lower coverage rate
      \item In the prospective method more data points are needed in its estimation
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'In the prospective method the magnitude of the bias is estimated without observing the true parameter values'
    @
    \fi
    \item How many degrees of freedom does a t-statistic require for a sample of size N?
    \begin{enumerate}
      \item n
      \item n-1
      \item n-2
      \item sqrt(n)
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
  AnswerTests: omnitest(correctVal='n-1')
    @
    \fi
    \item I flip a fair coin 27 times. What is the variance in the expected number of heads (express to the nearest 0.001)?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    0.009
    @
    \fi
    \item Now, I flip a coin of unknown bias 10 times. It comes up heads 8 times. What is the (non-negative) margin of error, calculated from a 95\% confidence interval for the true probability the coin comes up heads (to the nearest 0.01)?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    0.25
    @
    \fi
    \item In a sample 50\% of the times a coin has landed tails. For 95\% confidence and a margin of error of 0.001, roughly how many flips should you we need to check if the coin is actually fair?
    \begin{enumerate}
      \item 100000
      \item 500000
      \item 1000000
      \item 5000000
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
  AnswerTests: omnitest(correctVal='1000000')
    @
    \fi
    \item The variance of a Bernoulli distributed random variable is given by: p(1-p). What is the value of p that maximizes the variance of such a variable?
    \begin{enumerate}
      \item 0
      \item 1/2
      \item 1/3
      \item 1/4
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
  AnswerTests: omnitest(correctVal='1/2')
    @
    \fi
  \end{enumerate}
  \subsection{Lesson 2}
  \begin{enumerate}
    \item Suppose you wish to test whether education has an effect on wages. What could be the null hypothesis?
    \begin{enumerate}
      \item Education has no effect on wages
      \item Education has an effect on wages
      \item Wages has no effect on education
      \item Wages has an effect on education
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'Education has no effect on wages'
    @
    \fi
    \item When hiring, large companies get tons of overqualified applicants, so they are fine with rejecting qualified applicants, though they do want to make sure selected applicants are actually qualified. Do large companies care more about Type I or Type II errors?
    \begin{enumerate}
      \item Type I
      \item Type II
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'Type I'
    @
    \fi
    \item You wish to conduct a study to test whether students at your university sleep more hours than the national average. What t-test should you use?
    \begin{enumerate}
      \item two sample, one-sided
      \item two sample, two-sided
      \item one sample, one-sided
      \item one sample, two-sided
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'one sample, one-sided'
    @
    \fi
    \item You have a coin which you suspect is unfair and instead shows heads with probability 0.7. So, you decide to test this by flipping it exactly once: if heads comes up, you say it is unfair, otherwise, you say it is fair. What is the power of this test (express to the nearest 0.1)?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    0.7
    @
    \fi
    \item Flipping a coin once is not super informative, you think to yourself, and so decide to flip the coin two more times. If any heads appear, you say the coin is unfair, otherwise, you say it is fair. With these three flips in total, what is the power of the test now (express to the nearest 0.001)?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    0.875
    @
    \fi
    \item What does the quantity (1 - power) represent?
    \begin{enumerate}
      \item probability of true positive
      \item probability of true negative
      \item probability of false positive
      \item probability of false negative
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'probability of false negative'
    @
    \fi
    \item You repeatedly run statistical tests until you get significance. This fallacy is known as:
    \begin{enumerate}
      \item multiple testing
      \item repeated error detection
      \item likelihood fallacy
      \item multiple hypothesis detection
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'multiple testing'
    @
    \fi
    \item You think a die is unfair, so you roll twice. If you get 2 sixes in a row, you deem it unfair. What is the probability of a Type I error in this test?
    \begin{enumerate}
      \item 1/2
      \item 1/6
      \item 1/12
      \item 1/36
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    '1/36'
    @
    \fi
    \item You wish to test whether political sentiment of the average Texan is significantly different from that of the average New Yorker. What test would be appropriate?
    \begin{enumerate}
      \item one-sample t-test
      \item two-sample z-test
      \item two-sample t-test
      \item one-sample z-test
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'two-sample t-test'
    @
    \fi
    \item How does a t-test primarily differ from a z-test?
    \begin{enumerate}
      \item You do not know the population mean
      \item You do not know the population standard deviation
      \item You do not know the confidence level
      \item You do not know if the data is normally distributed
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'You do not know the population standard deviation'
    @
    \fi
  \end{enumerate}
  \subsection{Lesson 3}
  \begin{enumerate}
    \item What is the exogeneity assumption behind linear regression?
    \begin{enumerate}
      \item The mean of the errors does not depend on the explanatory variables and is equal to 0
      \item The variance of the errors does not depend on the explanatory variables and is equal to 0
      \item The median of the errors does not depend on the explanatory variables and is equal to 0
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'The mean of the errors does not depend on the explanatory variables and is equal to 0'
    @
    \fi
    \item You observe that students who come to class tend to score higher on exams, so you conclude that attendance positively impacts exam performance. Yet, you fail to account for the effect of internal motivation. What term best describes this problem?
    \begin{enumerate}
      \item omitted variable bias
      \item omitted independent variable bias
      \item exogeneity
      \item Type I error
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'omitted variable bias'
    @
    \fi
    \item What is homoskedasticity?
    \begin{enumerate}
      \item independent errors and variance that does not depend on the explanatory variables
      \item independent errors and expectation that does not depend on the explanatory variables
      \item expectation of errors equals their variance
      \item variance of errors depends on their expectation
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'independent errors and variance that does not depend on the explanatory variables'
    @
    \fi
    \item In R, what term should you include in a linear regression formula to leave out the intercept?
    \begin{enumerate}
      \item 1
      \item -1
      \item 0
      \item -2
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    '-1'
    @
    \fi
    \item If the correlation between X and Y is 4, the standard deviation of X is 1 and the standard deviation of Y is 2, then what is the covariance between X and Y?
    \begin{enumerate}
      \item 8
      \item 4
      \item 2
      \item 1
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    '8'
    @
    \fi
    \item For 765 observations and 8 explanatory variables in a linear regression model with an intercept, when calculating the p-values for the coefficients how many degrees of freedom will the t-statistic have?
    \if1\solutions
    \newline\newline \noindent{\bf Solution:}
    <<eval=FALSE>>=
    756
    @
    \fi
    \item The variance of a variable can be written as V(x) = E[x\textasciicircum{}2] - E[x]\textasciicircum{}2. Using that formula and the properties of the residuals: What is a valid expression for the Variance of the residuals in a linear regression (where e denotes the vector of resisuals)?
    \begin{enumerate}
      \item E[e\textasciicircum{}2] - E[e]\textasciicircum{}2
      \item E[e\textasciicircum{}2]
      \item both expressions are correct
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'both expressions are correct'
    @
    \fi
    \item When might you use adjusted R\textasciicircum{}2 over R\textasciicircum{}2, holding everything else equal?
    \begin{enumerate}
      \item When there are many observations
      \item When there are many predictors
      \item When there are many degrees of freedom
      \item When the data is nonlinear
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'When there are many predictors'
    @
    \fi
    \item True or False: In a linear regression model, if two confidence intervals of two variables overlap, this implies that the confidence interval for the difference between those variables must contain 0.
    \begin{enumerate}
      \item True
      \item False
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'False'
    @
    \fi
    \item What is the relevant parameter if you wanted to set the significance level for confidence intervals within the \rfun{predict()} function (type a single word within double quotes)?
    \begin{enumerate}
      \item p-value
      \item level
      \item significance
    \end{enumerate}
    \if1\solutions
    \noindent{\bf Solution:}
    <<eval=FALSE>>=
    'level'
    @
    \fi
  \end{enumerate}