Merge pull request #933 from cal-itp/sjoin-vp-nearest-stop
Find vp nearest stop
tiffanychu90 authored Oct 24, 2023
2 parents 566f455 + 27f354d commit adff709
Showing 22 changed files with 1,373 additions and 400 deletions.
6 changes: 4 additions & 2 deletions gtfs_funnel/Makefile
@@ -1,4 +1,4 @@
-download_gtfs_data_one_day:
+download_gtfs_data:
 	# make sure to update update_vars.py for dates to download
 	python download_trips.py
 	python download_stops.py
@@ -10,4 +10,6 @@ download_gtfs_data_one_day:
 preprocess:
 	python stop_times_with_direction.py
 	python vp_keep_usable.py
-	python vp_direction.py
+	python vp_direction.py
+	python cleanup.py

2 changes: 1 addition & 1 deletion gtfs_funnel/README.md
@@ -4,4 +4,4 @@ Use `update_vars` and input one or several days to download.

 1. **Schedule data**: download data for [trips](./download_trips.py), [stops](./download_stops.py), [shapes](./download_shapes.py), and [stop times](./download_stop_times.py) and cache parquets in GCS
 1. **Vehicle positions data**: download [RT vehicle positions](./download_vehicle_positions.py)
-1. Use the `Makefile` and download schedule and RT data. In terminal: `make download_gtfs_data_one_day`
+1. Use the `Makefile` and download schedule and RT data. In terminal: `make download_gtfs_data`
49 changes: 49 additions & 0 deletions gtfs_funnel/logs/find_vp_direction.log
@@ -9,3 +9,52 @@
 2023-10-12 11:21:52.344 | INFO | __main__:<module>:176 - export vp direction: 0:05:29.299659
 2023-10-12 11:23:14.557 | INFO | __main__:<module>:186 - export usable vp with direction: 0:01:22.212409
 2023-10-12 11:23:14.558 | INFO | __main__:<module>:187 - execution time: 0:06:51.512068
+2023-10-19 10:58:26.750 | INFO | __main__:<module>:184 - Analysis date: 2023-09-13
+2023-10-19 11:01:08.229 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:41.478652
+2023-10-19 11:04:05.857 | INFO | __main__:attach_prior_vp_add_direction:125 - np vectorize arrays for direction: 0:02:57.627657
+2023-10-19 11:04:25.727 | INFO | __main__:<module>:191 - export vp direction: 0:05:58.976162
+2023-10-19 11:05:26.722 | INFO | __main__:<module>:196 - export usable vp with direction: 0:01:00.995301
+2023-10-19 11:05:26.723 | INFO | __main__:<module>:197 - execution time: 0:06:59.971463
+2023-10-19 11:05:26.724 | INFO | __main__:<module>:184 - Analysis date: 2023-10-11
+2023-10-19 11:08:10.013 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:43.288529
+2023-10-19 11:10:57.486 | INFO | __main__:attach_prior_vp_add_direction:125 - np vectorize arrays for direction: 0:02:47.473068
+2023-10-19 11:11:17.743 | INFO | __main__:<module>:191 - export vp direction: 0:05:51.017843
+2023-10-19 11:12:14.833 | INFO | __main__:<module>:196 - export usable vp with direction: 0:00:57.090460
+2023-10-19 11:12:14.834 | INFO | __main__:<module>:197 - execution time: 0:06:48.108303
+2023-10-19 11:22:48.570 | INFO | __main__:<module>:185 - Analysis date: 2023-03-15
+2023-10-19 11:44:08.820 | INFO | __main__:<module>:185 - Analysis date: 2023-03-15
+2023-10-19 11:46:41.490 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:32.668923
+2023-10-19 11:49:18.408 | INFO | __main__:attach_prior_vp_add_direction:126 - np vectorize arrays for direction: 0:02:36.918447
+2023-10-19 11:49:36.829 | INFO | __main__:<module>:192 - export vp direction: 0:05:28.008511
+2023-10-19 11:50:34.563 | INFO | __main__:<module>:197 - export usable vp with direction: 0:00:57.733907
+2023-10-19 11:50:34.565 | INFO | __main__:<module>:198 - execution time: 0:06:25.742418
+2023-10-19 11:50:34.566 | INFO | __main__:<module>:185 - Analysis date: 2023-04-12
+2023-10-19 11:53:00.392 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:25.825681
+2023-10-19 11:55:43.433 | INFO | __main__:attach_prior_vp_add_direction:126 - np vectorize arrays for direction: 0:02:43.040656
+2023-10-19 11:56:02.076 | INFO | __main__:<module>:192 - export vp direction: 0:05:27.509401
+2023-10-19 11:56:58.366 | INFO | __main__:<module>:197 - export usable vp with direction: 0:00:56.290053
+2023-10-19 11:56:58.368 | INFO | __main__:<module>:198 - execution time: 0:06:23.799454
+2023-10-19 11:56:58.368 | INFO | __main__:<module>:185 - Analysis date: 2023-05-17
+2023-10-19 11:59:23.853 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:25.485009
+2023-10-19 12:02:10.887 | INFO | __main__:attach_prior_vp_add_direction:126 - np vectorize arrays for direction: 0:02:47.034093
+2023-10-19 12:02:28.048 | INFO | __main__:<module>:192 - export vp direction: 0:05:29.680081
+2023-10-19 12:03:24.619 | INFO | __main__:<module>:197 - export usable vp with direction: 0:00:56.570424
+2023-10-19 12:03:24.620 | INFO | __main__:<module>:198 - execution time: 0:06:26.250505
+2023-10-19 12:03:24.620 | INFO | __main__:<module>:185 - Analysis date: 2023-06-14
+2023-10-19 12:05:48.202 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:23.581493
+2023-10-19 12:08:28.397 | INFO | __main__:attach_prior_vp_add_direction:126 - np vectorize arrays for direction: 0:02:40.195186
+2023-10-19 12:08:45.600 | INFO | __main__:<module>:192 - export vp direction: 0:05:20.979952
+2023-10-19 12:09:41.253 | INFO | __main__:<module>:197 - export usable vp with direction: 0:00:55.653037
+2023-10-19 12:09:41.254 | INFO | __main__:<module>:198 - execution time: 0:06:16.632989
+2023-10-19 12:09:41.254 | INFO | __main__:<module>:185 - Analysis date: 2023-07-12
+2023-10-19 12:12:23.972 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:42.717672
+2023-10-19 12:15:14.864 | INFO | __main__:attach_prior_vp_add_direction:126 - np vectorize arrays for direction: 0:02:50.891639
+2023-10-19 12:15:32.063 | INFO | __main__:<module>:192 - export vp direction: 0:05:50.808333
+2023-10-19 12:16:37.518 | INFO | __main__:<module>:197 - export usable vp with direction: 0:01:05.455225
+2023-10-19 12:16:37.519 | INFO | __main__:<module>:198 - execution time: 0:06:56.263558
+2023-10-19 12:16:37.519 | INFO | __main__:<module>:185 - Analysis date: 2023-08-15
+2023-10-19 12:19:21.523 | INFO | __main__:attach_prior_vp_add_direction:89 - persist vp gddf: 0:02:44.003497
+2023-10-19 12:22:02.828 | INFO | __main__:attach_prior_vp_add_direction:126 - np vectorize arrays for direction: 0:02:41.304747
+2023-10-19 12:22:21.129 | INFO | __main__:<module>:192 - export vp direction: 0:05:43.609000
+2023-10-19 12:23:18.532 | INFO | __main__:<module>:197 - export usable vp with direction: 0:00:57.403234
+2023-10-19 12:23:18.533 | INFO | __main__:<module>:198 - execution time: 0:06:41.012234
16 changes: 16 additions & 0 deletions gtfs_funnel/logs/usable_rt_vp.log
@@ -18,3 +18,19 @@
 2023-10-13 10:38:23.748 | INFO | __main__:attach_prior_vp_add_direction:88 - persist vp gddf: 0:04:24.538800
 2023-10-13 10:39:00.233 | INFO | __main__:attach_prior_vp_add_direction:114 - np vectorize arrays for direction: 0:00:36.484908
 2023-10-13 10:39:07.270 | INFO | __main__:<module>:181 - export vp direction: 0:05:08.060546
+2023-10-19 10:21:27.377 | INFO | __main__:<module>:161 - Analysis date: 2023-09-13
+2023-10-19 10:22:59.586 | INFO | __main__:<module>:171 - pare down vp: 0:01:32.208062
+2023-10-19 10:22:59.586 | INFO | __main__:<module>:161 - Analysis date: 2023-10-11
+2023-10-19 10:24:21.940 | INFO | __main__:<module>:171 - pare down vp: 0:02:54.562217
+2023-10-19 11:34:28.126 | INFO | __main__:<module>:161 - Analysis date: 2023-03-15
+2023-10-19 11:36:28.520 | INFO | __main__:<module>:171 - pare down vp: 0:02:00.393163
+2023-10-19 11:36:28.521 | INFO | __main__:<module>:161 - Analysis date: 2023-04-12
+2023-10-19 11:37:58.174 | INFO | __main__:<module>:171 - pare down vp: 0:03:30.047838
+2023-10-19 11:37:58.177 | INFO | __main__:<module>:161 - Analysis date: 2023-05-17
+2023-10-19 11:39:35.480 | INFO | __main__:<module>:171 - pare down vp: 0:05:07.353337
+2023-10-19 11:39:35.481 | INFO | __main__:<module>:161 - Analysis date: 2023-06-14
+2023-10-19 11:41:06.197 | INFO | __main__:<module>:171 - pare down vp: 0:06:38.070240
+2023-10-19 11:41:06.197 | INFO | __main__:<module>:161 - Analysis date: 2023-07-12
+2023-10-19 11:42:34.062 | INFO | __main__:<module>:171 - pare down vp: 0:08:05.936015
+2023-10-19 11:42:34.063 | INFO | __main__:<module>:161 - Analysis date: 2023-08-15
+2023-10-19 11:43:55.229 | INFO | __main__:<module>:171 - pare down vp: 0:09:27.102851
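
A note on the timings in this log: the `pare down vp` durations climb with each analysis date (0:02:00, 0:03:30, up to 0:09:27), although consecutive timestamps are only about a minute and a half apart. That pattern suggests the start time is taken once before the loop instead of once per date. The sketch below shows per-date timing under that assumption; `process_one_date` and `timed_run` are hypothetical names for illustration, not functions from the repository.

```python
import datetime
from typing import Dict, List


def process_one_date(analysis_date: str) -> None:
    """Hypothetical placeholder for the per-date work (not from the repo)."""
    pass


def timed_run(analysis_date_list: List[str]) -> Dict[str, datetime.timedelta]:
    """Time each analysis date separately and return the elapsed durations."""
    elapsed = {}
    for analysis_date in analysis_date_list:
        # Reset the start time inside the loop so each date is timed on its own
        start = datetime.datetime.now()
        process_one_date(analysis_date)
        end = datetime.datetime.now()
        elapsed[analysis_date] = end - start
        print(f"{analysis_date} pare down vp: {end - start}")
    return elapsed


durations = timed_run(["2023-09-13", "2023-10-11"])
```

With the start time reset per iteration, each logged duration would reflect a single date rather than the running total since the script began.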
21 changes: 16 additions & 5 deletions gtfs_funnel/stop_times_with_direction.py
@@ -11,7 +11,7 @@

 from calitp_data_analysis import utils
 from shared_utils import rt_utils
-from segment_speed_utils import helpers
+from segment_speed_utils import helpers, wrangle_shapes
 from segment_speed_utils.project_vars import RT_SCHED_GCS, PROJECT_CRS


@@ -139,12 +139,25 @@ def assemble_stop_times_with_direction(analysis_date: str):

     prior_geom = other_stops.prior_geometry.compute()
     current_geom = other_stops.geometry.compute()
-
+
+    # Create a column with readable direction like westbound, eastbound, etc
     stop_direction = np.vectorize(
         rt_utils.primary_cardinal_direction)(prior_geom, current_geom)

+    # Create a column with normalized direction vector
+    # Add this because some bus can travel in southeasterly direction,
+    # but it's categorized as southbound or eastbound depending
+    # on whether south or east value is larger.
+    # Keeping the normalized x/y direction allows us to distinguish a bit better later
+    direction_vector = wrangle_shapes.get_direction_vector(prior_geom, current_geom)
+    normalized_vector = wrangle_shapes.get_normalized_vector(direction_vector)
+
     other_stops_no_geom = other_stops_no_geom.assign(
-        stop_primary_direction = stop_direction
+        stop_primary_direction = stop_direction,
+        # since we can't save tuples, let's assign x, y normalized direction vector
+        # as 2 columns
+        stop_dir_xnorm = normalized_vector[0],
+        stop_dir_ynorm = normalized_vector[1]
     )

     scheduled_stop_times_with_direction = pd.concat(
@@ -165,8 +178,6 @@ def assemble_stop_times_with_direction(analysis_date: str):
         f"stop_times_direction_{analysis_date}"
     )

-
-
     end = datetime.datetime.now()
     print(f"execution time: {end - start}")
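
The comments in this file describe keeping a normalized x/y direction vector alongside the readable cardinal label. As a rough illustration of that idea, the sketch below computes a direction vector between two points and scales it to unit length. It is a plain-Python stand-in: the function names and the zero-length handling are assumptions for illustration and are not the `wrangle_shapes` implementation, which operates on arrays of geometries.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def direction_vector(start: Point, end: Point) -> Point:
    """Displacement from start to end as (east, north) components."""
    return (end[0] - start[0], end[1] - start[1])


def normalized_vector(vector: Point) -> Point:
    """Scale a vector to unit length; a zero vector is returned unchanged.

    Returning (0.0, 0.0) for a zero vector is an assumption of this sketch.
    """
    length = math.hypot(vector[0], vector[1])
    if length == 0:
        return (0.0, 0.0)
    return (vector[0] / length, vector[1] / length)


# A vehicle moving 4 units east and 3 units south (a southeasterly heading)
x_norm, y_norm = normalized_vector(direction_vector((0.0, 0.0), (4.0, -3.0)))
```

Here the east component is larger in magnitude than the south component, so a single cardinal label would read eastbound, while the pair (`x_norm`, `y_norm`) retains how far south of east the heading actually is.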

4 changes: 3 additions & 1 deletion gtfs_funnel/update_vars.py
@@ -2,7 +2,9 @@
 from pathlib import Path
 from shared_utils import rt_dates

-months = ["sep", "oct"]
+months = [
+    "sep", "oct"
+]

 analysis_date_list = [
     rt_dates.DATES[f"{m}2023"] for m in months
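
To make the lookup above concrete, here is a minimal sketch of how the month abbreviations resolve to analysis dates. The `DATES` dictionary is a hypothetical stand-in for `rt_dates.DATES`; its two entries are assumed from the analysis dates that appear in the logs in this commit (2023-09-13 and 2023-10-11), not copied from `shared_utils`.

```python
# Hypothetical stand-in for rt_dates.DATES (assumed keys and values)
DATES = {
    "sep2023": "2023-09-13",
    "oct2023": "2023-10-11",
}

months = [
    "sep", "oct"
]

# Each month abbreviation is combined with the year to form the lookup key
analysis_date_list = [
    DATES[f"{m}2023"] for m in months
]
```

Under this pattern, processing additional months would only require adding abbreviations to `months`, provided the matching keys exist in the dates dictionary.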
55 changes: 33 additions & 22 deletions gtfs_funnel/vp_direction.py
@@ -18,11 +18,11 @@
 from loguru import logger

 from calitp_data_analysis.geography_utils import WGS84
-from segment_speed_utils import helpers, segment_calcs
+from segment_speed_utils import helpers, segment_calcs, wrangle_shapes
 from segment_speed_utils.project_vars import SEGMENT_GCS, PROJECT_CRS
 from shared_utils import rt_utils

-fs = gcsfs.GCSFileSystem()
+fs = gcsfs.GCSFileSystem()

 def attach_prior_vp_add_direction(
     analysis_date: str,
@@ -56,7 +56,7 @@ def attach_prior_vp_add_direction(
     # calculated in projected CRS
     vp_gddf = dg.from_dask_dataframe(
         vp2,
-        geometry = dg.points_from_xy(vp2, x="x", y="y", crs=WGS84)
+        geometry = dg.points_from_xy(vp2, x="x", y="y")
     ).set_crs(WGS84).to_crs(PROJECT_CRS)

     vp_ddf = vp_gddf.assign(
@@ -81,34 +81,46 @@ def attach_prior_vp_add_direction(
     ).query('prior_vp_idx >= min_vp_idx')[
         ["vp_idx", "prior_x", "prior_y", "x", "y"]
     ].reset_index(drop=True)

-    full_df = full_df.persist()
-
+    keep_cols = ["vp_idx", "prior_x", "prior_y", "x", "y"]
+    full_df = full_df[keep_cols].compute()
+
     time1 = datetime.datetime.now()
     logger.info(f"persist vp gddf: {time1 - time0}")

-    def column_into_array(df: dd.DataFrame, col: str) -> np.ndarray:
-        return df[col].compute().to_numpy()
-
-    vp_indices = column_into_array(full_df, "vp_idx")
-    prior_geom_x = column_into_array(full_df, "prior_x")
-    prior_geom_y = column_into_array(full_df, "prior_y")
-    current_geom_x = column_into_array(full_df, "x")
-    current_geom_y = column_into_array(full_df, "y")
+    vp_indices = full_df.vp_idx.to_numpy()
+    distance_east = full_df.x - full_df.prior_x
+    distance_north = full_df.y - full_df.prior_y

-    distance_east = current_geom_x - prior_geom_x
-    distance_north = current_geom_y - prior_geom_y
+    # Get the normalized direction vector split into x and y columns
+    normalized_vector = wrangle_shapes.get_normalized_vector(
+        (distance_east, distance_north)
+    )

-    direction_result = np.vectorize(
-        rt_utils.cardinal_definition_rules)(distance_east, distance_north)
-
     # Stack our results and convert to df
-    results_array = np.column_stack((vp_indices, direction_result))
+    results_array = np.column_stack((
+        vp_indices,
+        normalized_vector[0],
+        normalized_vector[1]
+    ))

     vp_direction = pd.DataFrame(
         results_array,
-        columns = ["vp_idx", "vp_primary_direction"]
-    ).astype({"vp_idx": "int64"})
+        columns = ["vp_idx", "vp_dir_xnorm", "vp_dir_ynorm"]
+    ).astype({
+        "vp_idx": "int64",
+        "vp_dir_xnorm": "float",
+        "vp_dir_ynorm": "float"
+    })
+
+    # Get a readable direction (westbound, eastbound)
+    vp_direction = vp_direction.assign(
+        vp_primary_direction = vp_direction.apply(
+            lambda x:
+            rt_utils.cardinal_definition_rules(x.vp_dir_xnorm, x.vp_dir_ynorm),
+            axis=1
+        )
+    )

     time2 = datetime.datetime.now()
     logger.info(f"np vectorize arrays for direction: {time2 - time1}")
@@ -168,7 +180,6 @@ def add_direction_to_usable_vp(
                format="{time:YYYY-MM-DD at HH:mm:ss} | {level} | {message}",
                level="INFO")

-
     for analysis_date in analysis_date_list:

         logger.info(f"Analysis date: {analysis_date}")
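
In this file the readable direction is now derived from the normalized x/y components rather than from the raw east/north distances. A rule that compares the magnitudes and signs of the two components gives the same label either way, because normalizing divides both components by the same positive length. The sketch below illustrates such a rule; `cardinal_direction` and its labels are assumptions for illustration and may differ from `rt_utils.cardinal_definition_rules`.

```python
def cardinal_direction(x_norm: float, y_norm: float) -> str:
    """Label a heading by its dominant axis (hypothetical rule for illustration)."""
    # No movement between the two points, so no direction can be assigned
    if x_norm == 0 and y_norm == 0:
        return "Unknown"
    # East/west dominates when the x component has the larger magnitude
    if abs(x_norm) > abs(y_norm):
        return "Eastbound" if x_norm > 0 else "Westbound"
    # Otherwise north/south dominates (ties fall here in this sketch)
    return "Northbound" if y_norm > 0 else "Southbound"


# Two southeasterly headings that receive different labels
mostly_east = cardinal_direction(0.8, -0.6)
mostly_south = cardinal_direction(0.6, -0.8)
```

The two example headings differ only slightly in angle yet land in different categories, which is the ambiguity the stored `vp_dir_xnorm` and `vp_dir_ynorm` columns are meant to help resolve later.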