-
Notifications
You must be signed in to change notification settings - Fork 0
/
Project2.rmd
194 lines (168 loc) · 5.45 KB
/
Project2.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
---
title: "Ptrojct 2"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Install library
I am using DBI library for database access and tidyverse library for data manipulation.
```{r}
library(DBI)
library(tidyverse)
```
## Connect to SQL Server
I will create ODBC connection object
```{r}
con <- dbConnect(odbc::odbc(),
.connection_string = "Driver={SQL Server};Server=(local);Database=tb;Trusted_Connection=Yes;")
```
## Weight Data for Domestic Cats
144 adult (over 2kg in weight) cats used for experiments with the drug digitalis had their heart and body weight recorded. 47 of the cats were female and 97 were male.
https://stat.ethz.ch/R-manual/R-devel/library/boot/html/catsM.html
This data frame contains the following columns:
Sex - A factor for the sex of the cat (levels are F and M: all cases are M in this subset).
Bwt - Body weight in kg.
Hwt - Heart weight in g.
Source
The data were obtained from Fisher, R.A. (1947) The analysis of covariance method for the relation between a part and the whole. Biometrics, 3, 65–68.
## The leading causes of death by sex and ethnicity in New York City
Cause of death is derived from the NYC death certificate which is issued for every death that occurs in New York City.
https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam
###Columns in this Dataset:
Year - The year of death.
Leading Cause - The cause of death.
Sex - The decedent's sex.
Race Ethnicity - The decedent's ethnicity.
Deaths - The number of people who died due to cause of death.
Death Rate - The death rate within the sex and Race/ethnicity category
Age Adjusted Death Rate - The age-adjusted death rate within the sex and Race/ethnicity category
## NYC Civil Service Titles
List of Civil Service title codes and title descriptions used by City of New York agencies
https://data.cityofnewyork.us/City-Government/NYC-Civil-Service-Titles/nzjr-3966
###Columns in this Dataset:
Title Code - Civil Service Title Code
Title Description - Name/Description of title
Standard Hours - Standard weekly hours for the title
Assignment Level - Assignment level within title
Union Code - Unique code for union representing title, if applicable
Union Description - Name/Description of union representing title, if applicable
Bargaining Unit Short Name - Acronym or short name for union designated as the bargaining unit for title, if applicable
Bargaining Unit Description - Name/description of union designated as the bargaining unit for title, if applicable
Minimum Salary Rate - Minimum salary range for the title
Maximum Salary Rate - Maximum salary range for the title
### Create tables for datasets:
```{r}
rs = dbSendStatement(con, "
--DROP TABLE IF EXISTS dbo.cats;
--CREATE TABLE dbo.cats(N VARCHAR(3), Sex VARCHAR(3), Bwt DECIMAL(3,1), Hwt DECIMAL(4,2));
--DROP TABLE IF EXISTS dbo.CauseOfDeath;
--CREATE TABLE dbo.CauseOfDeath (
--[Year] INT NOT NULL,
--LeadingCause VARCHAR(130) NOT NULL,
--Sex VARCHAR(6),
--RaceEthnicity VARCHAR(40),
--Deaths VARCHAR(40),
--DeathRate VARCHAR(40),
--AgeAdjustedDeathRate VARCHAR(40));
--DROP TABLE IF EXISTS dbo.ServiceTitles;
--CREATE TABLE dbo.ServiceTitles (
--TitleCode CHAR(5) NOT NULL,
--Title_Description VARCHAR(70) NOT NULL,
--StandardHours FLOAT NOT NULL,
--Assignment_Level VARCHAR(50),
--UnionCode VARCHAR(20),
--Union_Description VARCHAR(70),
--BargainingUnitShortName VARCHAR(5),
--BargainingUnitDescription VARCHAR(70),
--MinimumSalaryRate VARCHAR(20),
--MaximumSalaryRate VARCHAR(20));
"
)
dbClearResult(rs)
```
### Load data into tables
```{r}
rs = dbSendStatement(con, "
-- BULK INSERT dbo.cats FROM 'cats.csv'
-- BULK INSERT dbo.ServiceTitles FROM 'NYC_Civil_Service_Titles.csv'
-- BULK INSERT dbo.CauseOfDeath FROM 'New_York_City_Leading_Causes_of_Death.csv' WITH (FORMAT = 'CSV');
")
dbClearResult(rs)
```
## Query database into R
```{r}
rs <- dbSendQuery(con, "
SELECT N
,Sex
,Bwt
,Hwt
FROM dbo.cats
")
df_cat = data.frame(dbFetch(rs))
dbClearResult(rs)
rs <- dbSendQuery(con, "
SELECT Year
,LeadingCause
,Sex
,RaceEthnicity
,Deaths
,DeathRate
,AgeAdjustedDeathRate
FROM dbo.CauseOfDeath2
")
df_c_of_d = data.frame(dbFetch(rs))
dbClearResult(rs)
rs <- dbSendQuery(con, "
SELECT TitleCode
,Title_Description
,StandardHours
,Assignment_Level
,UnionCode
,Union_Description
,BargainingUnitShortName
,BargainingUnitDescription
,MinimumSalaryRate
,MaximumSalaryRate
FROM dbo.ServiceTitles
")
df_tc = data.frame(dbFetch(rs))
dbClearResult(rs)
```
## ## Convert R dataframe to tibble
```{r}
cats = tibble(df_cat)
CauseOfDeath = tibble(df_c_of_d)
ServiceTitles = tibble(df_tc)
```
## Preview results
### Cats
```{r}
head(cats)
```
### Cause Of Death
```{r}
head(CauseOfDeath)
```
### ServiceTitles
```{r}
head(ServiceTitles)
```
### Compute cases per year
```{r}
CauseOfDeath$Year <- as.integer(CauseOfDeath$Year)
CauseOfDeath$Deaths <- as.integer(CauseOfDeath$Deaths)
CauseOfDeath %>% count(Year, wt = Deaths)
```
## Cats pivot
```{r}
cats %>% pivot_wider(names_from = Sex, values_from = c(Bwt,Hwt))
```
## Unite Minimum Salary Rate and Maximum Salary Rate
```{r}
ServiceTitles %>% unite(SalaryRate, MinimumSalaryRate, MaximumSalaryRate)
```
## Disconnect from database
```{r}
dbDisconnect(con)
```