---
title: "SF ridehail shift start location"
author: "Greg Macfarlane"
date: "9/24/2021"
output: html_document
---
We'll do this again, using the mlogit package for R.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(mlogit)
library(modelsummary)
```
I got some data from Yufei.
```{r loadata}
(d <- read_csv("data/estimation_data.zip"))
```
In a “long” dataset, there should be an ID column that identifies the choice
maker, an alternative column that identifies which options they have, and a
chosen column that identifies which alternative they chose.
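
As a minimal sketch of that layout, the toy table below uses made-up values and base R only. The column names `ID`, `TAZ`, and `chosen` follow the ones used in this document, but the data are hypothetical. In a well-specified long dataset, each choice maker has exactly one chosen row and no alternative is repeated for the same choice maker.

```r
# Toy long-format choice data (hypothetical values, not the real dataset).
toy <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2),        # choice maker
  TAZ    = c(10, 20, 30, 10, 20, 30),  # alternative (zone)
  chosen = c(0, 1, 0, 1, 0, 0)         # 1 if this alternative was chosen
)

# Each choice maker should have exactly one chosen alternative ...
n_chosen <- tapply(toy$chosen, toy$ID, sum)

# ... and no alternative should appear twice for the same choice maker.
n_dupes <- sum(duplicated(toy[c("ID", "TAZ")]))

all(n_chosen == 1)
n_dupes == 0
```

Both checks should pass on data that is ready for estimation; the summary below runs the same kind of check on the real data.
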
```{r stats}
# number of choice makers
length(unique(d$ID))
d %>%
  group_by(ID) %>%
  summarise(
    n = n(),               # number of rows per person
    chosen = sum(chosen),  # number of choices per person
    n_alts = length(unique(TAZ))
  )
```
This dataset is not well-specified. How is it possible for an individual
driver to have chosen four different starting locations? Is it because they have
multiple shifts? Which variable identifies the shift number?
Let's take a guess that the `start` column could help us out.
```{r better-id}
d %>%
  group_by(ID, start) %>%
  summarise(
    n = n(),               # number of rows per ID-start pair
    chosen = sum(chosen),  # number of choices per ID-start pair
    n_alts = length(unique(TAZ))
  )
```
Okay, now we are cooking. There is still a problem where some agents apparently have
repeated alternatives. That's got to change; is there a compelling reason why 30 alternatives
is better than 10?
Oh well, let's move forward. We'll remove the duplicate rows and make sure we keep
the choice.
```{r idx}
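idx <- d %>%
  # work with the subset of rows whose driver ID is below 10
  filter(ID < 10) %>%
  # treat each driver-start combination as its own choice situation
  mutate(ID = str_c(ID, start, sep = "-")) %>%
  # keep only one row from the ID-TAZ pair, but make sure it's the chosen row
  group_by(ID, TAZ) %>%
  arrange(desc(chosen), .by_group = TRUE) %>%
  slice(1) %>%
  # drop the grouping before building the index
  ungroup() %>%
  dfidx(idx = c("ID", "TAZ"))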
```
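
The de-duplication rule in the chunk above (keep one row per ID-TAZ pair, preferring the chosen row) can be checked on a toy table. This is a base R sketch with hypothetical data, not the dplyr pipeline used above. Sorting by descending `chosen` before dropping duplicates plays the role of `arrange()` followed by `slice(1)`.

```r
# Hypothetical choice situation "1-a" in which TAZ 20 appears twice,
# once as a non-chosen row and once as the chosen row.
toy <- data.frame(
  ID     = c("1-a", "1-a", "1-a", "1-a"),
  TAZ    = c(10, 20, 20, 30),
  chosen = c(0, 0, 1, 0)
)

# Sort so that, within each ID-TAZ pair, the chosen row comes first.
toy <- toy[order(toy$ID, toy$TAZ, -toy$chosen), ]

# duplicated() flags every row after the first in each pair, so dropping
# the flagged rows keeps the chosen row whenever one exists.
clean <- toy[!duplicated(toy[c("ID", "TAZ")]), ]

clean
```

After cleaning, the toy choice situation has three distinct zones and the chosen row for TAZ 20 is the one that survives.
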
With that cleaning done, estimating a destination choice model is straightforward.
One thing to note is that the model cannot have any alternative-specific coefficients,
including the alternative-specific constants (ASCs, the intercepts). So we put a `-1`
in the second part of the `mlogit` formula, after the `|`, to drop the ASCs.
```{r models}
mymodels <- list(
  "Base" = mlogit(chosen ~ POP + EMP | -1, data = idx),
  "Land Use" = mlogit(chosen ~ TOTALPARK + SFDU + MFDU | -1, data = idx),
  "Both" = mlogit(chosen ~ POP + EMP + TOTALPARK + SFDU + MFDU | -1, data = idx)
)
```
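
To make the role of the generic coefficients concrete, here is a minimal sketch of how a multinomial logit turns zone attributes into choice probabilities when there are no alternative-specific constants. The coefficient values and zone attributes are made up for illustration; they are not estimates from the models above.

```r
# Hypothetical attributes for three candidate zones in one choice situation.
zones <- data.frame(
  TAZ = c(10, 20, 30),
  POP = c(1200, 300, 800),
  EMP = c(400, 2500, 900)
)

# Hypothetical generic coefficients: one value per variable, shared by
# every zone, and no intercepts.
beta <- c(POP = 0.0002, EMP = 0.0005)

# Systematic utility of each zone: V_j = beta' x_j.
V <- as.vector(as.matrix(zones[names(beta)]) %*% beta)

# Logit probabilities: P_j = exp(V_j) / sum_k exp(V_k).
# Subtracting max(V) avoids overflow and does not change the result.
P <- exp(V - max(V)) / sum(exp(V - max(V)))

cbind(zones, V = V, P = P)
```

Because the same `beta` applies to every zone, the probabilities depend only on zone attributes and not on zone identity.
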
```{r results}
modelsummary(
  mymodels,
  estimate = "{estimate} {stars}",
  statistic = "({statistic})",
  notes = "t-statistic in parentheses",
  fmt = "%.4f"
)
```