Add fare metrics from NTD using the new transit class #8

Open
wants to merge 14 commits into
base: master
7 changes: 7 additions & 0 deletions .gitignore
@@ -6,3 +6,10 @@ package-lock.json
setup/
miniconda.sh
test_js*
shapefile
*.png
*.pdf
data/
*chloropleth_map.html
*.gif
denver/
15,211 changes: 15,211 additions & 0 deletions deprecated/headway.ipynb

Large diffs are not rendered by default.

16,125 changes: 13,616 additions & 2,509 deletions scripts/ntd.ipynb

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions scripts/otp/Makefile
@@ -0,0 +1,41 @@
town := denver


.PHONY: otp


otp:
# create directory for data and config
-mkdir $(town)
cp build-config.jsonc $(town)/build-config.json
# download OSM
if [ ! -f $(town)/osm.pbf ]; then \
curl -C - -L -k https://download.geofabrik.de/north-america/us/colorado-latest.osm.pbf -o $(town)/osm.pbf; \
fi
#
# past logic for manual GTFS downloading.
# this is superseded by the scrape ipynb
#
# download GTFS
# this is the latest gtfs file
#curl -L -k https://www.rtd-denver.com/files/gtfs/google_transit.zip -o $(town)/denver-rtd-gtfs.zip
# 2022
#curl -C - -L -k -H "Accept: application/zip" https://web.archive.org/web/20220308061530/https://www.rtd-denver.com/files/gtfs/google_transit.zip -o $(town)/denver-2022-rtd-gtfs.zip
#curl -L -k https://web.archive.org/web/20220308061530if_/https://www.rtd-denver.com/files/gtfs/google_transit.zip -o $(town)/denver-2022-rtd-gtfs.zip
#
#
#
# build graph and save it onto the host system via the volume
@echo "Go to Docker Desktop and change resource to maximum memory and swap."
docker run --rm \
-e JAVA_TOOL_OPTIONS='-Xmx32g' \
-v ./$(town):/var/opentripplanner docker.io/opentripplanner/opentripplanner:latest --build --save
# load and serve graph
docker run -it --rm -p 9999:8080 \
-e JAVA_TOOL_OPTIONS='-Xmx32g' \
-v ./$(town):/var/opentripplanner \
docker.io/opentripplanner/opentripplanner:latest --load --serve

manual:
java -Xmx2G -jar otp-2.6.0-shaded.jar --build --serve /home/username/otp
# osmium extract --bbox=-123.043,45.246,-122.276,45.652 -o portland.pbf oregon-latest.osm.pbf
116 changes: 116 additions & 0 deletions scripts/otp/README.md
@@ -0,0 +1,116 @@
# Time and Cost Solution

Launch the Docker container according to the README in
https://github.com/e-mission/em-public-dashboard

Here is an overview of the objective.

![Overview diagram](costtime.svg)

## How can I achieve choice model analysis with historical data?

The most crucial step in the process is finding and retrieving the
data. Quality data that is complete (spanning many years)
and accurate will allow for a more informed model.

## Time

We want to know how long it takes an individual to reach a destination
using a particular mode.

### Transit

Transit is nuanced since there are many modes such as bus, rail, and tram.

#### Data Sources

GTFS (General Transit Feed Specification) feeds are available from several sources, some of which require payment.

| **Website Name** | **Dates** | **URL** | **Paid** | **Notes** |
|---------------------------------------------|---------------------|-------------------------------------------------------------------------|-----------------------------|----------------------------|
| Transitland | 2016-Present | [https://www.transit.land/feeds/f-9xj-rtd#versions](https://www.transit.land/feeds/f-9xj-rtd#versions) | Yes Unless Edu Academic Plan | Does not have historical data without paid plan |
| transitfeeds AKA OpenMobilityData | 2015-Dec 2023 | [https://transitfeeds.com/p/rtd-denver/188?p=26](https://transitfeeds.com/p/rtd-denver/188?p=26) | No | API Deprecated |
| Scrape of transitfeeds AKA OpenMobilityData | Up to March 2021 | [https://raw.githubusercontent.com/interline-io/scrape-of-transitfeeds/refs/heads/master/json-scrape/feedVersionS3Urls.csv](https://raw.githubusercontent.com/interline-io/scrape-of-transitfeeds/refs/heads/master/json-scrape/feedVersionS3Urls.csv) | No | Does not have recent data |
| Mobility Database | Feb 2024-Present | [https://mobilitydatabase.org/feeds/mdb-178](https://mobilitydatabase.org/feeds/mdb-178) | No | Does not have historical data |
| Wayback Machine | Sporadic | [http://web.archive.org/web/20240000000000*/http://go-rts.com/gtfs/google_transit.zip](http://web.archive.org/web/20240000000000*/http://go-rts.com/gtfs/google_transit.zip) | No | Can be used to fill in gaps between the other sources |

The CanBikeCO dataset does not extend into 2024, so GTFS data through the end of 2023 is sufficient.

Thus, the easiest option was to scrape transitfeeds. An earlier scrape by someone else exists,
but it is out of date (it stops at March 2021). Therefore, `scrape.ipynb` automates the retrieval
of all GTFS feed versions for an agency, for example RTD Denver.

With the `scrape.ipynb` notebook, we can retrieve the full GTFS history that transitfeeds holds, from 2015 to the end of 2023. Feeds were sometimes released multiple times on a single
day, so it seems that transitfeeds recorded every change the agencies made.
Even a small change, such as relocating a single stop, reissues the entire feed, including all the data that did not change, which introduces redundancy. A form of version control, if adopted as a standard, would avoid this.
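
The redundancy between feed versions can be made visible by hashing the files inside two GTFS zips and listing only the ones that differ. The following is a minimal sketch, not part of `scrape.ipynb`: the helper names and the tiny in-memory feeds are hypothetical and exist only so the example runs without downloads.

```python
import hashlib
import io
import zipfile


def member_hashes(zip_bytes: bytes) -> dict:
    """Map each file inside a GTFS zip to the SHA-256 of its contents."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
        }


def changed_members(old_zip: bytes, new_zip: bytes) -> list:
    """List files added, removed, or modified between two feed versions."""
    old, new = member_hashes(old_zip), member_hashes(new_zip)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )


def make_feed(files: dict) -> bytes:
    """Build a small in-memory zip (hypothetical stand-in for a real feed)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Two hypothetical feed versions: only one stop moved between them.
v1 = make_feed({
    "stops.txt": "stop_id,stop_lat,stop_lon\nA,39.75,-104.99\n",
    "routes.txt": "route_id\n15\n",
})
v2 = make_feed({
    "stops.txt": "stop_id,stop_lat,stop_lon\nA,39.76,-104.99\n",
    "routes.txt": "route_id\n15\n",
})

print(changed_members(v1, v2))  # ['stops.txt']
```

Here `routes.txt` is reissued unchanged, which is the kind of duplication described above.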

#### GTFS Combination

GTFS data is not needed for every single day, and it would be unrealistic to process
that much data. We used only one GTFS file per quarter
of the year. To distill the dataset, use the Jupyter cell marked `# Distill GTFS` in
`scrape.ipynb`.
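
The idea of keeping one feed per quarter can be sketched as follows. This is an illustration under assumptions, not the code in the `# Distill GTFS` cell: it assumes each feed version has a known release date, it keeps the earliest feed in each quarter, and the file names are hypothetical.

```python
from datetime import date


def quarter_of(day: date) -> tuple:
    """Return (year, quarter) for a date, with quarters numbered 1 to 4."""
    return day.year, (day.month - 1) // 3 + 1


def distill_by_quarter(feeds: dict) -> dict:
    """Keep the earliest feed in each (year, quarter).

    `feeds` maps a release date to a feed file path.
    """
    chosen = {}
    for day in sorted(feeds):
        # setdefault keeps the first (earliest) feed seen for each quarter.
        chosen.setdefault(quarter_of(day), feeds[day])
    return chosen


# Hypothetical feed versions for one agency.
feeds = {
    date(2022, 1, 9): "denver/2022-01-09.zip",
    date(2022, 1, 23): "denver/2022-01-23.zip",
    date(2022, 5, 15): "denver/2022-05-15.zip",
    date(2022, 12, 1): "denver/2022-12-01.zip",
}

for quarter, path in distill_by_quarter(feeds).items():
    print(quarter, path)
```

In this sample no feed was released in the third quarter of 2022, so that quarter is simply absent from the result; real data can have similar gaps.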

After distilling, download the merge tool from
https://github.com/OneBusAway/onebusaway-gtfs-modules/blob/master/docs/onebusaway-gtfs-merge-cli.md
and change the version in the jar filename in the following command to match your download.

```bash
java -jar -Xmx16G ~/Downloads/onebusaway-gtfs-merge-cli-3.2.4.jar denver/*.zip outputCombined.zip
```

#### OpenTripPlanner Launch

Ensure that Docker is installed. This setup has not been tested on Windows, so we assume you
have `make` available to run the Makefile. OTP does not work on Windows anyway.

The Makefile downloads the Geofabrik `.pbf` extract for Colorado. The region could be
parametrized, and the `scrape.ipynb` notebook could likewise be refactored into a Python class or functions.

```bash
make
```

#### Transit Time Python API

Once OTP is loaded, verify that it works by opening
`localhost:9999` in a browser.

Routing should work for dates several years in the past, e.g. 2022,
but the data-distilling cell skipped some years, such as 2015.

Navigate to the Jupyter cell prefaced with `# GraphQL Interface` to have
a Python API for getting transit times.
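
A request for transit time against the running OTP server might look roughly like the sketch below. It is not the code in the `# GraphQL Interface` cell. It assumes the OTP 2.x GTFS GraphQL endpoint at `/otp/gtfs/v1` and its `plan` query, which may differ between OTP versions; the helper names and the coordinates (approximate points in the Denver area) are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint: OTP 2.x GTFS GraphQL API on the port published by the Makefile.
OTP_URL = "http://localhost:9999/otp/gtfs/v1"

# Assumed query shape; check it against the schema of the OTP version in use.
PLAN_QUERY = """
query ($fromLat: Float!, $fromLon: Float!, $toLat: Float!, $toLon: Float!,
       $date: String!, $time: String!) {
  plan(
    from: {lat: $fromLat, lon: $fromLon}
    to: {lat: $toLat, lon: $toLon}
    date: $date
    time: $time
    transportModes: [{mode: TRANSIT}, {mode: WALK}]
  ) {
    itineraries { duration }
  }
}
"""


def build_payload(origin: tuple, destination: tuple, day: str, time: str) -> dict:
    """Build the GraphQL request body; coordinates are (latitude, longitude)."""
    return {
        "query": PLAN_QUERY,
        "variables": {
            "fromLat": origin[0], "fromLon": origin[1],
            "toLat": destination[0], "toLon": destination[1],
            "date": day, "time": time,
        },
    }


def query_otp(payload: dict, url: str = OTP_URL) -> dict:
    """POST the payload to OTP and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def shortest_duration(response: dict) -> Optional[int]:
    """Return the shortest itinerary duration in seconds, or None if no route."""
    itineraries = response["data"]["plan"]["itineraries"]
    return min((it["duration"] for it in itineraries), default=None)


# Build a request for a historical date; nothing is sent until query_otp is called.
payload = build_payload((39.753, -105.000), (39.856, -104.674), "2022-03-08", "08:00")

# Hand-written sample response, used so the example runs without a server.
sample = {"data": {"plan": {"itineraries": [{"duration": 3000}, {"duration": 2400}]}}}
print(shortest_duration(sample))  # 2400
```

With the server running, `shortest_duration(query_otp(payload))` would return the fastest transit time for that date, or `None` when the graph has no service for it.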


## Cost

### Transit

GTFS can carry fare information, but agencies tend not to provide it.
NTD (the National Transit Database), however, has fare information for all agencies.

Fare data for transit is therefore taken from NTD. The average fare for each agency
is calculated across all riders, that is, for an arbitrary person. Take, for instance, Gainesville, Florida.
Most riders are students who attend the University of Florida, and all of their
rides are free (the same applies to faculty and staff).

Some individuals still pay a fare, so Gainesville's average is $0.07
according to NTD.

To calculate average fare, use the e-mission-common function:

```python
# `emcmft` is assumed to be the e-mission-common transit module, already imported in the notebook.
# Top-level `await` works in a Jupyter cell; elsewhere, call this from an async function.
# Coordinates are for the Gainesville, FL bus system (hence mode 'MB').
# Order is [longitude, latitude]; Google Maps uses latitude, then longitude.
# Swapping them puts the point in Antarctica.
intensities = await emcmft.get_transit_intensities(2022, [-82.328132, 29.626142], modes=['MB'])

# Access and print the average fare
average_fare = intensities[0]['average_fare']
print("Average Fare:", average_fare)
# 0.07352028470412668, i.e. about $0.07 (seven cents)
```
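
To see why free rides pull the average so low, here is a small worked sketch. It assumes the average fare is total fare revenue divided by total trips, which is one plausible reading and not necessarily how e-mission-common or NTD computes it. The function name and all numbers are hypothetical, not NTD figures.

```python
def average_fare_per_trip(fare_revenue_usd: float, total_trips: int) -> float:
    """Average fare per trip, assuming revenue is spread over all trips."""
    if total_trips <= 0:
        raise ValueError("total_trips must be positive")
    return fare_revenue_usd / total_trips


# Hypothetical agency: 1,000,000 trips, of which 900,000 are free
# and 100,000 are paid at $1.50 each.
paid_trips = 100_000
total_trips = 1_000_000
fare_revenue = paid_trips * 1.50

print(round(average_fare_per_trip(fare_revenue, total_trips), 2))  # 0.15
```

Even though paying riders are charged $1.50, the average over all riders is only $0.15, which mirrors how a mostly fare-free system can end up with an average of a few cents.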
