
Logistic Regression

joaovissoci edited this page Oct 4, 2012 · 1 revision

Definition

Logistic regression is a type of regression used when the outcome variable is binary ("yes" or "no", "risk" or "no risk"); extensions of the model handle ordinal outcomes. It is commonly used to predict the probability that an event occurs from several predictor variables, which may be numerical (continuous or discrete) or categorical (ordinal or nominal) (adapted from R-Bloggers http://goo.gl/FZAXA).
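As a minimal sketch of how the model turns predictors into a probability, the snippet below applies the inverse logit to a linear predictor. The coefficients `b0` and `b1` are hypothetical values chosen only for illustration; they do not come from any dataset on this page.

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -1.5                      # intercept: log-odds of the event when x = 0
b1 <- 0.8                       # change in log-odds per one-unit increase in x

x <- c(0, 1, 2, 3)
log_odds <- b0 + b1 * x         # linear predictor on the logit scale
prob <- plogis(log_odds)        # inverse logit: 1 / (1 + exp(-log_odds))
odds_ratio <- exp(b1)           # multiplicative change in the odds per unit of x

print(data.frame(x, log_odds, prob))
print(odds_ratio)
```

The probabilities always stay between 0 and 1, which is why the logit scale is used for a binary outcome instead of modelling the probability directly.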

Input

Required variables to run this test

This test requires a dataset with a dichotomous outcome variable (in R, typically coded as a two-level factor or as 0/1) and one or more vectors holding the predictors or covariates. Predictors may be numerical (continuous or discrete) or categorical.
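The layout described above might look like the following toy data frame. All names and values (`toy`, `outcome`, `age`, `visits`, `group`) are made up for illustration. When the outcome is a factor, glm treats the first level as "failure" and the other level as "success", so it helps to set the level order explicitly.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  outcome = factor(c("no", "yes", "no", "yes", "yes", "no"),
                   levels = c("no", "yes")),   # "yes" is the modelled event
  age     = c(34, 51, 29, 62, 45, 38),         # continuous predictor
  visits  = c(1L, 4L, 0L, 6L, 3L, 2L),         # discrete predictor
  group   = factor(c("a", "b", "a", "c", "b", "c"))  # nominal predictor
)

str(toy)

# Basic checks before modelling: outcome must have exactly two levels
stopifnot(is.factor(toy$outcome), nlevels(toy$outcome) == 2)
print(table(toy$outcome))
```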

The code below uses the anorexia dataset shipped with the MASS package; the Cedegren dataset reported by Manning (2007), available at http://goo.gl/KmJSD, is an alternative.

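# Install (only needed once) and load the packages used on this page
install.packages("MASS", repos = "http://cran.us.r-project.org")
install.packages("verification", repos = "http://cran.us.r-project.org")
library(MASS)          # provides the anorexia dataset
library(verification)  # provides roc.area()

# Make the columns of anorexia available by name and inspect the data
attach(anorexia)
str(anorexia)
head(anorexia)
tail(anorexia)
summary(anorexia)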

How to describe this test in your Methods section

"We used logistic regression to estimate the association between the predictors and the binary outcome, and report odds ratios with 95% confidence intervals."

Output

Annotated output from function (example from http://goo.gl/NsR20)

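#Postwt is continuous, so a binary outcome is derived first: 1 if the patient gained weight during the study, 0 otherwise (an illustrative choice).
anorexia$gain <- as.numeric(anorexia$Postwt > anorexia$Prewt)

#glm will create a model object (here called 'anorex.1') that carries all the statistics of the logistic regression.
#Arguments are: outcome=gain; predictors=Prewt,Treat; family=type of model (binomial for logistic regression; see ?glm for more details); data=dataset used for the analysis

anorex.1 <- glm(gain ~ Prewt + Treat,
                family = binomial, data = anorexia)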

#summary function will provide a summary of the logistic regression analysis containing:
#1. model formula
#2. deviance residuals
#3. coefficients with standard errors and p-values for each predictor
#4. null and residual deviances with their degrees of freedom
#5. AIC
summary(anorex.1)
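The pieces of the summary listed above can also be pulled out individually, which is handy for building tables. The sketch below assumes a binary outcome `gain` (1 if the patient gained weight, 0 otherwise) derived from the anorexia data purely for illustration; `dat`, `fit` and `s` are names introduced here.

```r
library(MASS)  # recommended package that ships with R; provides anorexia

dat <- anorexia
# Assumption for illustration: 1 if the patient gained weight, 0 otherwise
dat$gain <- as.numeric(dat$Postwt > dat$Prewt)

fit <- glm(gain ~ Prewt + Treat, family = binomial, data = dat)
s <- summary(fit)

# Coefficient table: estimate, standard error, z statistic, p-value
coef_table <- coef(s)
print(coef_table)

# Deviances, degrees of freedom and AIC
print(c(null_deviance = s$null.deviance, residual_deviance = s$deviance,
        AIC = s$aic))
print(c(df_null = s$df.null, df_residual = s$df.residual))
```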

#anova function performs an analysis of deviance: for a single model it adds the terms one at a time, starting from the null model; for two or more nested models it compares them.
anova(anorex.1,test="Chisq")
#anova(anorex.1,anorex.2,test="Chisq") --- A variation with a second glm model (anorex.2)
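A two-model comparison could be sketched as follows. The models `m1` and `m2` and the derived outcome `gain` are assumptions made for this example. The manual calculation shows what the chi-square test does: it refers the drop in deviance to a chi-square distribution with degrees of freedom equal to the number of added parameters.

```r
library(MASS)  # provides anorexia

dat <- anorexia
# Assumption for illustration: 1 if the patient gained weight, 0 otherwise
dat$gain <- as.numeric(dat$Postwt > dat$Prewt)

m1 <- glm(gain ~ Prewt, family = binomial, data = dat)          # smaller model
m2 <- glm(gain ~ Prewt + Treat, family = binomial, data = dat)  # adds Treat

cmp <- anova(m1, m2, test = "Chisq")
print(cmp)

# The same likelihood-ratio test computed by hand
lr <- deviance(m1) - deviance(m2)        # drop in deviance
df <- df.residual(m1) - df.residual(m2)  # number of added parameters
p  <- pchisq(lr, df, lower.tail = FALSE)
print(c(LR = lr, df = df, p = p))
```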

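#Assess the logistic model's discrimination with the area under the ROC curve
pred_fig1 <- as.numeric(fitted(anorex.1)) #predicted probabilities
obs_fig1 <- as.numeric(anorex.1$y) #observed outcome stored in the model object; roc.area expects 0/1 values
roc.area(obs_fig1, pred_fig1)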
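To see what the area under the ROC curve measures, it can be computed by hand in base R using the rank (Mann-Whitney) formula: it is the proportion of event/non-event pairs in which the event received the higher predicted probability. The helper `auc_rank` and the small vectors below are hypothetical, written only for this illustration.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) formula
auc_rank <- function(obs, pred) {
  n1 <- sum(obs == 1)                 # number of events
  n0 <- sum(obs == 0)                 # number of non-events
  r  <- rank(pred)                    # ties receive average ranks
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up observed outcomes and predicted probabilities
obs  <- c(0, 0, 1, 0, 1, 1)
pred <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)

print(auc_rank(obs, pred))  # 8 of the 9 event/non-event pairs are ordered correctly
```

A value of 0.5 corresponds to predictions no better than chance, and 1 to perfect separation of events from non-events.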

library(epiDisplay) #provides logistic.display (formerly part of the epicalc package)
logistic.display(anorex.1) #gives the odds ratios (OR) with their 95% CIs
residuals(anorex.1) #residuals
influence(anorex.1) #regression diagnostics
layout(matrix(c(1,2,3,4),2,2)) #splits the plotting area into a 2 x 2 grid (4 graphs/page)
plot(anorex.1) #generates the 4 diagnostic graphs
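Odds ratios and their confidence intervals can also be obtained with base R alone by exponentiating the coefficients. This is a sketch under the assumption of a derived binary outcome `gain`; it uses Wald intervals from confint.default, which may differ slightly from intervals reported by other functions.

```r
library(MASS)  # provides anorexia

dat <- anorexia
# Assumption for illustration: 1 if the patient gained weight, 0 otherwise
dat$gain <- as.numeric(dat$Postwt > dat$Prewt)

fit <- glm(gain ~ Prewt + Treat, family = binomial, data = dat)

# Coefficients are log-odds; exponentiating gives odds ratios.
# confint.default() returns Wald 95% limits on the log-odds scale.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))
```

An odds ratio above 1 indicates higher odds of the event for a one-unit increase in a numerical predictor, or relative to the reference level for a categorical one; a confidence interval that includes 1 is compatible with no association.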
