-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path2021_data_Kelvin.Rmd
205 lines (171 loc) · 7.38 KB
/
2021_data_Kelvin.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
---
title: "analysis"
author: "Kelvin Feng"
date: "2022-11-12"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
```
```{r}
# Reading in dataset
dat <- haven::read_xpt("C:/Users/kelvi/Downloads/LLCP2021XPT/LLCP2021.XPT")
```
```{r}
survey1 <- dat %>% select(`_MICHD`, ADDEPEV3, MENTHLTH, `_RFHLTH`,
PHYSHLTH,`_HCVU652`, MEDCOST1, `_RFHYPE6`,
`_RFCHOL3`, CVDSTRK3, DIABETE4, `_SEX`,
`_EDUCAG`, INCOME3, `_BMI5`, `_SMOKER3`,
AVEDRNK3, `_TOTINDA`, DRADVISE,
`_AGE_G`, `_FRTLT1A`, `_VEGLT1A`) %>%
rename(Gen_Health = `_RFHLTH`, Phys_Health = PHYSHLTH,
Ment_Health = MENTHLTH, Med_Cost = MEDCOST1,
Stroke = CVDSTRK3, Diabetes = DIABETE4,
Sex = `_SEX`, Income = INCOME3,
Avg_Drinks = AVEDRNK3, Sodium_Intake = DRADVISE,
Health_Cov = `_HCVU652`, BP = `_RFHYPE6`,
Chol = `_RFCHOL3`, Heart_Disease = `_MICHD`,
Age_Groups = `_AGE_G`, BMI = `_BMI5`,
Education = `_EDUCAG`, Smoking = `_SMOKER3`,
Fruit = `_FRTLT1A`, Veggies = `_VEGLT1A`,
Exercise = `_TOTINDA`, Depression = ADDEPEV3) %>%
mutate(Heart_Disease = recode(Heart_Disease, '2' = 0),
Depression = recode(Depression, '2' = 0, '7' = as.numeric(NA), '9' = as.numeric(NA)),
Ment_Health = recode(Ment_Health, '88' = 0, '77' = as.numeric(NA), '99' = as.numeric(NA)),
Gen_Health = recode(Gen_Health, '2' = 0, '3' = as.numeric(NA)),
Phys_Health = recode(Phys_Health, '88' = 0, '77' = as.numeric(NA), '99' = as.numeric(NA)),
Health_Cov = recode(Health_Cov, '2' = 0, '9' = as.numeric(NA)),
Med_Cost = recode(Med_Cost, '2' = 0,
'7' = as.numeric(NA), '9' = as.numeric(NA)),
BP = recode(BP, '1' = 0, '2' = 1, '9' = as.numeric(NA)),
Chol = recode(Chol, '1' = 0, '2' = 1, '9' = as.numeric(NA)),
Stroke = recode(Stroke, '1' = 1, '2' = 0,
'7' = as.numeric(NA), '9' = as.numeric(NA)),
Diabetes = recode(Diabetes, '2' = 1, '3' = 0, '4' = 2,
'1' = 1, '7' = as.numeric(NA),
'9' = as.numeric(NA)),
Education = recode(Education, '9' = as.numeric(NA)),
Income = recode(Income, '77' = as.numeric(NA), '99' = as.numeric(NA)),
BMI = BMI/100,
Smoking = recode(Smoking, '4' = 1, '3' = 2, '2' = 3, '1' = 4, '9' = as.numeric(NA)),
Avg_Drinks = recode(Avg_Drinks,'88' = 0, '77' = as.numeric(NA), '99' = as.numeric(NA)),
Exercise = recode(Exercise, '2' = 0, '9' = as.numeric(NA)),
Sodium_Intake = recode(Sodium_Intake, '2' = 0, '7' = as.numeric(NA), '9' = as.numeric(NA)),
Fruit = recode(Fruit, '2' = 0, '9' = as.numeric(NA)),
Veggies = recode(Veggies, '2' = 0, '9' = as.numeric(NA)))
```
# Variable descriptions
Heart_Disease: Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)
0 - did not report
1 - reported
NA - not asked or missing
Depression: (Ever told) you that you have a depressive disorder, including depression, major depression, dysthymia, or minor depression?
0 - no
1 - yes
NA - don't know, refused
Ment_Health: Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?
0:30 - # of days
NA - don't know, refused
Gen_Health: Adults with good or better general health
0 - fair or poor health
1 - good or better health
NA - don't know, not sure or refused/missing
Phys_Health: Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good?
0:30 - # days
NA - don't know, refused, missing
Health_Cov: Respondents aged 18-64 who have any form of health care coverage
0 - do not have
1 - have
NA - don't know, refused, missing
Med_Cost: Was there a time in the past 12 months when you needed to see a doctor but could not because of cost?
0 - no
1 - yes
NA - don't know, refused, missing
BP: Adults who have been told they have high blood pressure by a doctor, nurse, or other health professional
0 - no
1 - yes
NA - don't know, refused, missing
Chol: Adults who have had their cholesterol checked and have been told by a doctor, nurse, or other health professional that it was high
0 - no
1 - yes
NA - don't know, refused, missing
Stroke: (Ever told) you had a stroke.
0 - no
1 - yes
NA - don't know, refused
Diabetes: (Ever told) you have diabetes (If "Yes" and respondent is female, ask "Was this only when you were pregnant?". If Respondent says pre-diabetes or borderline diabetes, use response code 4.)
0 - no
1 - yes
2 - pre-diabetes or borderline
NA - don't know, refused, missing
Sex: Indicate sex of respondent.
1 - male
2 - female
Education: Level of education completed
1 - did not graduate highschool
2 - graduated highschool
3 - attended college or technical school
4 - graduated from college or technical school
NA - don't know, refused, missing
Income: Is your annual household income from all sources: (If respondent refuses at any income level, code "Refused.")
1 - <$10,000
2 - $10,000:$14,999
3 - $15,000:$19,999
4 - $20,000:$24,999
5 - $25,000:$34,999
6 - $35,000:$49,999
7 - $50,000:$74,999
8 - $75,000:$99,999
9 - $100,000:$149,999
10 - $150,000:$199,000
11 - $200,000+
NA - don't know, refused, missing
BMI: Body Mass Index (BMI)
1:99.99 - bmi
NA - don't know, refused, missing
Smoking: Four-level smoker status: Everyday smoker, Someday smoker, Former smoker, Non-smoker
1 - never smoked
2 - former smoker
3 - now smokes some days
4 - now smokes every day
NA - don't know, refused, missing
Avg_Drinks: One drink is equivalent to a 12-ounce beer, a 5-ounce glass of wine, or a drink with one shot of liquor. During the past 30 days, on the days when you drank, about how many drinks did you drink on the average? (A 40 ounce beer would count as 3 drinks, or a cocktail drink with 2 shots would count as 2 drinks.)
1:76 - # of drinks
NA - don't know, refused, missing
Exercise: Adults who reported doing physical activity or exercise during the past 30 days other than their regular job
0 - did not have physical activity or exercise
1 - had physical activity or exercise
NA - don't know, refused, missing
Sodium_Intake: Has a doctor or other health professional ever advised you to reduce sodium or salt intake?
0 - no
1 - yes
NA - don't know, refused, missing
Age_65: Age categories
1 - age 18:24
2 - age 25:34
3 - age 35:44
4 - age 45:54
5 - age 55:64
6 - age 65+
NA - don't know, refused, missing
Fruit: Consume Fruit 1 or more times per day
0 - less than one time per day
1 - one or more times per day
NA - don't know, refused, missing
Veggies: Consume Vegetables 1 or more times per day
0 - less than one time per day
1 - one or more times per day
NA - don't know, refused, missing
```{r}
# Checking for duplicates
dat$SEQNO <- as.factor(dat$SEQNO)
duplicates <- dat %>% group_by(SEQNO, `_STATE`) %>%
summarize(count = n()) %>% filter(count > 1)
dat2 <- dat %>% select(-c(`_STATE`:`_PSU`))
table(sapply(dat2, class))
missingness = as.data.frame(round(colSums(is.na(dat2))/length(dat2$`_MICHD`), 3))
colnames(missingness) <- "Value"
remove <- missingness %>% filter(Value > 0.8) %>% rownames()
dat3 <- dat2 %>% select(-all_of(remove))
```