-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More question on the proximity variable #2
Comments
After creating
With these three pieces of information you could join |
@bensoltoff Divvy_m <- cbind(DivvyStation$lon, DivvyStation$lat)
CTA_m <- cbind(CTAStops$stop_lon, CTAStops$stop_lat)
distance_m <- distm(Divvy_m, CTA_m, fun = distHaversine) # a 535 by 11520 matrix for distance
distance150 <- distance_m <= 150 # check if the distance is <=150 meters or approximately 0.1 mile
proximity150 <- (rowSums(distance150) > 0)*1 # a Divvy station is <=150m from any CTA stop, 1; otherwise, 0
DivvyStation$proximity <- proximity150
index150 <- which(distance_m <= 150, arr.ind = T) # matrix of indices where the distance is <= 150
for (i in range(1, ncol(index150))){ # switching the index number to id numbers
Divvyindex <- index150[i,1]
CTAindex <- index150[i,2]
DivvyId <- DivvyStation$id[Divvyindex]
CTAId <- CTAStops$stop_id[CTAindex]
index150[i,1] <- DivvyId
index150[i,2] <- CTAId
}
colnames(index150) <- c('id', 'stop_id') # matching the column names to those in `DivvyStation` and `CTAStops`
index150 <- index150 %>% as_data_frame()
test <- left_join(DivvyStation, index150) It seems to work to some extent, although I am seeing mismatches: where the > print(test, n = 30)
# A tibble: 1,990 × 8
id name lat lon dpcapacity online_date proximity stop_id
<int> <chr> <dbl> <dbl> <int> <chr> <dbl> <int>
1 456 2112 W Peterson Ave 41.99118 -87.68359 15 5/12/2015 1 748
2 456 2112 W Peterson Ave 41.99118 -87.68359 15 5/12/2015 1 811
3 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 1318
4 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 1319
5 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 1322
6 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 10581
7 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 11110
8 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 11111
9 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 11337
10 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 11338
11 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 11392
12 101 63rd St Beach 41.78102 -87.57612 23 4/20/2015 0 11502
13 109 900 W Harrison St 41.87468 -87.65002 19 8/6/2013 1 1283
14 109 900 W Harrison St 41.87468 -87.65002 19 8/6/2013 1 9937
15 21 Aberdeen St & Jackson Blvd 41.87773 -87.65479 15 6/21/2013 1 384
16 21 Aberdeen St & Jackson Blvd 41.87773 -87.65479 15 6/21/2013 1 419
17 21 Aberdeen St & Jackson Blvd 41.87773 -87.65479 15 6/21/2013 1 9710
18 21 Aberdeen St & Jackson Blvd 41.87773 -87.65479 15 6/21/2013 1 10084
19 21 Aberdeen St & Jackson Blvd 41.87773 -87.65479 15 6/21/2013 1 10085
20 80 Aberdeen St & Monroe St 41.88046 -87.65393 19 6/26/2013 1 116
21 80 Aberdeen St & Monroe St 41.88046 -87.65393 19 6/26/2013 1 200
22 80 Aberdeen St & Monroe St 41.88046 -87.65393 19 6/26/2013 1 7876
23 346 Ada St & Washington Blvd 41.88283 -87.66121 15 10/10/2013 0 11370
24 346 Ada St & Washington Blvd 41.88283 -87.66121 15 10/10/2013 0 11371
25 346 Ada St & Washington Blvd 41.88283 -87.66121 15 10/10/2013 0 11516
26 341 Adler Planetarium 41.86610 -87.60727 19 10/9/2013 1 3979
27 341 Adler Planetarium 41.86610 -87.60727 19 10/9/2013 1 3980
28 341 Adler Planetarium 41.86610 -87.60727 19 10/9/2013 1 4033
29 444 Albany Ave & 26th St 41.84448 -87.70201 19 4/29/2015 1 1649
30 444 Albany Ave & 26th St 41.84448 -87.70201 19 4/29/2015 1 1680
# ... with 1,960 more rows I am not exactly sure where it went wrong. I will keep working on this. Thank you very much! |
I don't think you're properly isolating the proximate stop. I don't have an API token to reproduce this. Can you push the data and your code as part of this repo? If so I can clone it and test some code. |
@bensoltoff Dr. Soltoff. I have added the data and script for this issue. Please take a look at the fp-issue02.R script for reproducing what I have provided above in this issue thread. If I managed it correctly, you should not need any other script or data to make fp-issue02 to run. The code for creating an object DivvyStation_prox <- DivvyStation %>% filter(proximity == 1) Thank you very much for your help. I will be looking forward to your comment. |
##---------------------------------------------------------------##
## This script reads Divvy station and CTA stop data into R, in ##
## order to creat a tidy data frame for Divvy Station data frame ##
## with spatial variable. This is a mini script for the Issue 02 ##
##---------------------------------------------------------------##
library(tidyverse)
library(feather)
library(geosphere)
# Read the CTA stop file
CTAStops <- read_csv("data/stops.txt")
# Read the Divvy station file
DivvyStation <- read_csv("data/Divvy_Trips_2016_Q1Q2/Divvy_Stations_2016_Q1Q2.csv")
colnames(DivvyStation) <- c("id", "name", "lat", "lon", "dpcapacity", "online_date")
# adding a proximity variable to station data
Divvy_m <- cbind(DivvyStation$lon, DivvyStation$lat)
CTA_m <- cbind(CTAStops$stop_lon, CTAStops$stop_lat)
distance_m <- distm(Divvy_m, CTA_m, fun = distHaversine) # a 535 by 11520 matrix for distance
# find which CTA stations are the closest - note that for some Divvy stations,
# there may be multiple CTA stations the same minimum distance away
closest <- apply(distance_m, 1, function(x) which(x == min(x))) %>%
# convert to data frame
map_df(as_data_frame, .id = "id") %>%
rename(Divvyrow = id,
CTArow = value) %>%
# obtain distance from distance_m
mutate(Divvyrow = as.numeric(Divvyrow),
distance = map2_dbl(Divvyrow, CTArow, function(x, y) distance_m[x,y]),
close = if_else(distance <= 150, TRUE, FALSE))
# join DivvyStation and closest
## need to create a new DivvyStation column using row_number()
## to enable the join
DivvyCTA <- DivvyStation %>%
mutate(Divvyrow = row_number()) %>%
left_join(closest) %>%
# now join with CTAStops - also create row_number() columns
left_join(CTAStops %>%
mutate(CTArow = row_number()))
str(DivvyCTA)
Note the use of |
@bensoltoff Thank you for pointing out the mismatch between the original ids and the row/column ids in # adding two proximity variables to station data
Divvy_m <- cbind(DivvyStation$lon, DivvyStation$lat)
CTA_m <- cbind(CTAStops$stop_lon, CTAStops$stop_lat)
distance_m <- distm(Divvy_m, CTA_m, fun = distHaversine) # a 535 by 11520 matrix for distance
distance150 <- distance_m <= 150 # check if the distance is <=150 meters or approximately 0.1 mile
proximity150 <- (rowSums(distance150) > 0)*1 # binary; a Divvy station is <=150m from any CTA stop, 1; otherwise, 0
DivvyStation$proximity <- proximity150
proximity150_2 <- rowSums(distance150) # non-binary; number of close CTA stops
DivvyStation$prox_num <- proximity150_2
# create a data frame linking Divvy stataions and CTA stops close to each other
index150 <- which(distance150 == TRUE, arr.ind = T) # matrix of indices where the distance is <= 150
index150_df <- as_data_frame(index150)
index150_df <- index150_df[order(index150_df$row, index150_df$col),]
colnames(index150_df) <- c('Divvyindex', 'CTAindex')
DivvyStation$Divvyindex <- sequence(nrow(DivvyStation))
CTAStops$CTAindex <- sequence(nrow(CTAStops))
CTAindex <- CTAStops %>% select(stop_id, stop_name, CTAindex)
DivvyCTAProx <- DivvyStation %>%
left_join(index150_df) %>%
left_join(CTAindex) %>%
select(id, name, proximity, prox_num, stop_id, stop_name) I added above the column for the number of close CTA stops ( > print(DivvyCTAProx, n = 20)
# A tibble: 2,094 × 6
id name proximity prox_num stop_id stop_name
<int> <chr> <dbl> <dbl> <int> <chr>
1 456 2112 W Peterson Ave 1 4 11457 Rosehill Cemetery
2 456 2112 W Peterson Ave 1 4 11458 Peterson & Target (2100 W)
3 456 2112 W Peterson Ave 1 4 11483 2100 W Peterson
4 456 2112 W Peterson Ave 1 4 11484 Rosehill Cemetery
5 101 63rd St Beach 0 0 NA <NA>
6 109 900 W Harrison St 1 6 201 900 W Harrison
7 109 900 W Harrison St 1 6 207 900 W Harrison
8 109 900 W Harrison St 1 6 208 Harrison & Morgan
9 109 900 W Harrison St 1 6 30068 UIC-Halsted
10 109 900 W Harrison St 1 6 30069 UIC-Halsted
11 109 900 W Harrison St 1 6 40350 UIC-Halsted
12 21 Aberdeen St & Jackson Blvd 1 2 60 Jackson & Aberdeen
13 21 Aberdeen St & Jackson Blvd 1 2 92 Van Buren & Aberdeen
14 80 Aberdeen St & Monroe St 1 1 460 Madison & Aberdeen
15 346 Ada St & Washington Blvd 0 0 NA <NA>
16 341 Adler Planetarium 1 1 4877 Solidarity Dr & Planetarium
17 444 Albany Ave & 26th St 1 3 6317 26th Street & Albany
18 444 Albany Ave & 26th St 1 3 6318 26th Street & Sacramento
19 444 Albany Ave & 26th St 1 3 6397 26th Street & Albany
20 511 Albany Ave & Bloomingdale Ave 0 0 NA <NA>
# ... with 2,074 more rows Again, thank you very much for your help! I couldn't have found a satisfactory solution without your comments. |
Working on my final project over this weekend, I realized that I actually needed more information than simply adding a dummy variable for proximity, which I asked about in the previous issue post.
More specifically, for those stations in proximity with any CTA stop (i.e.,
proximity == 1
from the last issue post), I need to find out with which CTA stop(s) each Divvy Station is in proximity.I think I may start with getting the data for Divvy stations with
proximity == 1
, usingfilter()
and calling itDivvyStation_prox
. This looks like the following:And, of course, I have the
CTAStops
data:How can I proceed from here?
The text was updated successfully, but these errors were encountered: