Add fare metrics from NTD using the new transit class #8

Open
wants to merge 13 commits into
base: master

Conversation

jpfleischer
Contributor

This PR is made in response to a comment about finding an accurate $/PMT value for transit. e-mission/em-public-dashboard#31 (comment)

NTD data is taken from https://data.transportation.gov/Public-Transit/2022-NTD-Annual-Data-Metrics/ekg5-frzt/explore and contains columns for Average Passenger Fare, Total Passenger Miles, and number of Passenger Trips (among others).

The notebook in this PR shows that:

| Mode Category | Total Fare Revenue | Total Passenger Miles | Cost per PMT ($/PMT) | Weighted Avg Fare Revenues per Trip ($) | Non-weighted Avg Fare Revenues per Trip ($) |
|---|---|---|---|---|---|
| Bus | $2,768,242,207.96 | 10,111,540,221 | 0.2738 | 1.0029 | 0.9456 |
| Train | $4,983,602,512.69 | 17,094,346,380 | 0.2915 | 1.7882 | 3.5837 |
| Subway | $3,134,105,767.72 | 9,812,801,701 | 0.3194 | 1.3805 | 1.5633 |

"Weighted" means that each service's fare is weighted by the number of passengers who use it, so high-ridership services contribute more and the result leans towards the average across all NTD riders. The non-weighted figure is a simple mean over services.

However, this relies on an assumed mapping between NTD modes and OpenPath's modes:
https://github.com/jpfleischer/e-mission-common/blob/74a5684458d2cac66f4a6bc14c3250509be1464c/src/emcommon/metrics/transit/transit.py#L30-L37
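
To make the table above concrete, here is a minimal sketch of the two kinds of calculation. The Bus totals come from the table; the per-service figures in `services` are made-up numbers for illustration only, and the function names are hypothetical. It also assumes the weighting is by passenger trips, which may differ from what the notebook does.

```python
def cost_per_pmt(total_fare_revenue, total_passenger_miles):
    """Fare revenue per passenger mile travelled ($/PMT)."""
    return total_fare_revenue / total_passenger_miles


def weighted_avg_fare(services):
    """Average fare per trip, weighted by each service's trip count."""
    total_revenue = sum(s["fare_revenue"] for s in services)
    total_trips = sum(s["trips"] for s in services)
    return total_revenue / total_trips


def unweighted_avg_fare(services):
    """Simple mean of each service's own fare per trip."""
    per_service = [s["fare_revenue"] / s["trips"] for s in services]
    return sum(per_service) / len(per_service)


# Bus totals taken from the table above.
bus_cost = cost_per_pmt(2_768_242_207.96, 10_111_540_221)
print(round(bus_cost, 4))  # 0.2738

# Hypothetical services (NOT NTD data): one large cheap system, one small expensive one.
services = [
    {"fare_revenue": 1_000_000.0, "trips": 1_000_000},  # $1.00 per trip
    {"fare_revenue": 30_000.0, "trips": 10_000},        # $3.00 per trip
]
weighted = weighted_avg_fare(services)
unweighted = unweighted_avg_fare(services)
print(weighted < unweighted)  # True: the large system dominates the weighted figure
```

With these example numbers the small, expensive service pulls the simple mean up to $2.00, while the weighted figure stays close to the $1.00 that most riders pay.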

It is possible to further differentiate the $/PMT by metropolitan area. However, doing so assumes that some individuals avoid transit because it is too expensive. In reality, individuals also avoid transit for other reasons.

We must relocate the files if we do not want them compiled to JavaScript, but for now let's assess whether the code is suitable to incorporate into the analysis.

@shankari @JGreenlee @Abby-Wheelis

@jpfleischer
Contributor Author

jpfleischer commented Sep 16, 2024

Unfortunately, the NTD values do not reflect what the passenger actually paid.

The average fare metric you’re looking at is actually fares per unlinked passenger trip or “Fare per Trip”. This metric is based on the agency's total fares divided by their Unlinked Passenger Trips. Total fares include both Passenger Paid fares (the prices listed) and Organization Paid fares, which can result from agreements between the reporter and another agency or organization and wouldn’t appear in their fare prices.

There does not seem to be a way to distill the passenger paid fare from this dataset, so we turn to the GTFS feed and the fare info provided in that standard.

Two providers of GTFS information were tested: Transit.land and MobilityDatabase.

Transit.land and MobilityDatabase comparison

Take Gwinnett County Transit on Transit.land:
https://www.transit.land/operators/o-dnh0-gwinnettcountytransit
The feed URL listed by Transit.land is
https://realtimegwinnett.availtec.com/InfoPoint/gtfs-zip.ashx
which does not include fare data.

However, MobilityDatabase returns the link
https://files.mobilitydatabase.org/mdb-369/mdb-369-202406071617/mdb-369-202406071617.zip
which does include fare data, and the fares match the agency's published prices: https://www.gwinnettcounty.com/departments/transportation/gwinnettcountytransit/passesandtickets

MobilityDatabase appears to be the better source because it has no rate limit and higher-quality data.
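
How "has fare" was checked is not spelled out here. One plausible check, sketched below, is to look inside the GTFS zip for the fare files defined by the GTFS Schedule reference (`fare_attributes.txt` and `fare_rules.txt`). The function names are hypothetical, and the sketch only covers these two files, not the newer Fares v2 files.

```python
import io
import zipfile

# Fare files from the GTFS Schedule reference (original fares model only).
FARE_FILES = {"fare_attributes.txt", "fare_rules.txt"}


def gtfs_fare_files(feed):
    """Return the fare-related file names found in a GTFS zip.

    `feed` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(feed) as zf:
        # Some feeds nest their files in a folder, so compare base names only.
        names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return sorted(names & FARE_FILES)


def make_feed(file_names):
    """Build a tiny in-memory zip for demonstration (not a valid GTFS feed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in file_names:
            zf.writestr(name, "")
    buf.seek(0)
    return buf


with_fares = gtfs_fare_files(make_feed(["stops.txt", "fare_attributes.txt"]))
without_fares = gtfs_fare_files(make_feed(["stops.txt", "routes.txt"]))
print(with_fares)     # ['fare_attributes.txt']
print(without_fares)  # []
```

A feed that contains the file can still have it empty or out of date, so a check like this only says whether fare data is present, not whether it is accurate.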

Number of agencies with fare data in their GTFS source

| State | Transit.land | MobilityDatabase |
|---|---|---|
| GA | 1 | 2 |
| FL | 10 | 21 |
| MA | 12 | 24 |

There are a few exceptions: for instance, Transit.land accurately shows a fare of 0 for Athens-Clarke County https://www.accgov.com/1770/Fare-free-Transit whereas MobilityDatabase does not contain that agency.

@jpfleischer
Contributor Author

jpfleischer commented Sep 17, 2024

The NTD has identified a data source that shows Passenger Paid Fare, separate from Organization Paid Fare, at https://www.transit.dot.gov/ntd/data-product/2022-annual-database-fare-revenues

If anything, we would join the two datasets, GTFS and NTD, to get route stops and timings from the former and the true average price paid from the latter. However, I will not do this immediately, as I am prioritizing the coordinates-to-fare program logic.

NTD Sanity Check

Take the Passenger Paid Fare for Gainesville's MB (bus) mode, which is $316,285.
Take the total number of trips for Gainesville's MB (bus) mode, which is 4,302,010.

Dividing 316,285 by 4,302,010 gives an average fare of about $0.07 for an arbitrary rider. That is reasonable, because many people ride free!

Now take RTD, the Denver Regional Transportation District in Colorado, and its Directly Operated bus system. Passenger Paid Fare is $16,716,726 and the total number of trips is 25,317,651. Note that Fare Revenues Earned in the Metrics dataset, which we used initially, was the wrong input because it includes Organization Paid Fares.

RTD average bus ride fare: $0.66, likely because riders who buy an extended pass pay less per ride than they would paying the base fare every time.
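
The arithmetic in this sanity check can be reproduced with a few lines of Python. The figures are the ones quoted above; the function name is just for illustration.

```python
def average_fare_per_trip(passenger_paid_fares, trips):
    """Passenger-paid fare revenue divided by number of trips, or None if no trips."""
    if trips <= 0:
        return None
    return passenger_paid_fares / trips


gainesville_mb = average_fare_per_trip(316_285, 4_302_010)
rtd_bus = average_fare_per_trip(16_716_726, 25_317_651)

print(f"Gainesville MB: ${gainesville_mb:.2f}")  # Gainesville MB: $0.07
print(f"RTD bus (DO):   ${rtd_bus:.2f}")         # RTD bus (DO):   $0.66
```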

https://data.transportation.gov/Public-Transit/2022-NTD-Annual-Data-Metrics/ekg5-frzt/explore

Several Dynamics

  • to join the two NTD datasets to get the UACE from the Metrics
  • to join the NTD and MobilityDatabase

@JGreenlee
Owner

> Take the Passenger Paid Fare for Gainesville's MB mode (bus) which is 316,285
> Take the Total Number of Trips taken for Gainesville's MB mode (bus) which is 4,302,010
>
> Divide 316,285 by 4,302,010 and you get the average fare for any arbitrary rider is $0.07. Reasonable, because many people ride free!

So it sounds like you want to do:

Average Fare Per Passenger Trip = Passenger Paid Fares / Unlinked Passenger Trips

Sourcing Passenger Paid Fares from the "Fare Revenues" spreadsheet and Unlinked Passenger Trips from the "Metrics" spreadsheet

That's good news because the existing notebook https://github.com/JGreenlee/e-mission-common/blob/master/scripts/ntd.ipynb already gets Unlinked Passenger Trips from another spreadsheet, so you would only need to tack on the "Fare Revenues" spreadsheet to create your new column.
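
A rough sketch of that join, assuming both spreadsheets have already been loaded as lists of row dictionaries. The join key (`NTD ID`, `Mode`, `Type of Service`) and the value column names are assumptions based on the NTD terms used in this thread and may not match the actual spreadsheet headers. `add_average_fare` is a hypothetical helper, not something that exists in ntd.ipynb, and the `NTD ID` value in the demo is a placeholder.

```python
KEY_COLUMNS = ("NTD ID", "Mode", "Type of Service")  # assumed join key


def add_average_fare(metrics_rows, fare_rows, key_columns=KEY_COLUMNS):
    """Return copies of metrics_rows with an 'Average Fare' column added.

    Average Fare = Passenger Paid Fares / Unlinked Passenger Trips.
    Rows with no matching fare record, or with zero trips, get None.
    """
    fares_by_key = {
        tuple(row[c] for c in key_columns): row["Passenger Paid Fares"]
        for row in fare_rows
    }
    joined = []
    for row in metrics_rows:
        key = tuple(row[c] for c in key_columns)
        fares = fares_by_key.get(key)
        trips = row.get("Unlinked Passenger Trips")
        new_row = dict(row)
        new_row["Average Fare"] = (
            fares / trips if fares is not None and trips else None
        )
        joined.append(new_row)
    return joined


# Demo rows: the Gainesville figures are from this thread, the IDs are placeholders.
metrics = [
    {"NTD ID": "XXXXX", "Mode": "MB", "Type of Service": "DO",
     "Unlinked Passenger Trips": 4_302_010},
    {"NTD ID": "YYYYY", "Mode": "MB", "Type of Service": "DO",
     "Unlinked Passenger Trips": 1_000},
]
fare_revenues = [
    {"NTD ID": "XXXXX", "Mode": "MB", "Type of Service": "DO",
     "Passenger Paid Fares": 316_285},
]
joined = add_average_fare(metrics, fare_revenues)
print(round(joined[0]["Average Fare"], 2))  # 0.07
print(joined[1]["Average Fare"])            # None (no fare record to join)
```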

@jpfleischer
Contributor Author

Hi @JGreenlee, this is great. I see how I can add my logic right into that same ntd.ipynb notebook.
I have already started working on a class here, so that we may separate data cleaning / retrieval from the Jupyter plotting.

Should I add on to your ntd.ipynb, or move my notebook to the scripts dir?

@shankari

Add to ntd.ipynb. I don't want to have multiple copies of the same functionality.

@jpfleischer
Contributor Author

jpfleischer commented Sep 20, 2024

A bug has been identified and fixed: for service year 2022, the column is named Actual Vehicles Passenger Car Miles rather than Actual Vehicle/Passenger Car Miles. The data are also slightly different now, possibly due to an NTD update. a526d66

As seen in this csv, the column with a forward slash / does not exist.
https://data.transportation.gov/api/views/4fir-qbim/rows.csv?date=20231102&accessType=DOWNLOAD&bom=true&format=true
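
A defensive lookup is one way to guard against this kind of header variation. The sketch below is hypothetical and is not the fix made in a526d66. It tries each known spelling of the column name in turn and fails loudly if none is present. The sample CSV is made up for the demonstration.

```python
import csv
import io

# Known spellings of the column, in order of preference.
MILES_COLUMN_CANDIDATES = (
    "Actual Vehicles Passenger Car Miles",
    "Actual Vehicle/Passenger Car Miles",
)


def find_column(fieldnames, candidates):
    """Return the first candidate that appears in the CSV header."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise KeyError(f"none of {candidates} found in CSV header")


# Made-up sample standing in for the downloaded NTD CSV.
sample_csv = 'Agency,Actual Vehicles Passenger Car Miles\nExample Transit,"1,234"\n'
reader = csv.DictReader(io.StringIO(sample_csv))
miles_column = find_column(reader.fieldnames, MILES_COLUMN_CANDIDATES)
rows = list(reader)
print(miles_column)           # Actual Vehicles Passenger Car Miles
print(rows[0][miles_column])  # 1,234
```

Raising on a missing column, rather than silently skipping it, would surface a renamed header the next time the script runs.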

What Happened?

It seems that NTD has updated their data, as we have very slight variations in the calculations, such as changes of 20-30 in the carbon metrics. I know this because I did not change any calculation logic.

@JGreenlee
Owner

It's also very possible that I just made a typo with "Actual Vehicle/Passenger Car Miles" and/or forgot to re-create the output files the last time the script was adjusted.
Either way, thanks for finding it.

JGreenlee added a commit that referenced this pull request Sep 25, 2024
Issue found thanks to #8
That PR will fix this, but in the meantime I am putting the fix on master to get a release out
@jpfleischer
Contributor Author

I have gone ahead and added the following:

    total_upt = 0
    total_fare = 0
    total_records = 0
    fare_records = 0  # records that actually report an 'Average Fare'
    agency_mode_fueltypes = []
    for entry in intensities_data['records']:
        # skip entries that don't match the requested modes or UACE
        if (modes and entry["Mode"] not in modes) or (uace and entry["UACE Code"] != uace):
            continue
        total_records += 1
        # only records with a fare count towards the average fare
        if 'Average Fare' in entry:
            total_fare += entry['Average Fare']
            fare_records += 1
    ...
    # unweighted mean over matching records that report a fare
    intensities['average_fare'] = total_fare / fare_records if fare_records > 0 else None

Now get_transit_intensities also returns an average fare for the given coordinates.

@JGreenlee
Owner

That is good for now. We may want to split it up later, depending on what can be extracted or generalized.
I will take a look when I have time

@jpfleischer
Contributor Author

This PR now has logic to launch an OpenTripPlanner (OTP) instance, and to retrieve GTFS data from OpenMobilityData to give to that OTP instance.

We provide a Python interface to query the transit times from OTP.
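
The Python interface itself is not shown in this thread. As a rough sketch of what such a query can look like, the code below builds a request URL for the REST trip-planning endpoint that OTP 1.x serves under `/otp/routers/default/plan` and reads itinerary durations from a response. The host, port, router name, coordinates, and function names are all assumptions, and newer OTP releases favour a GraphQL API instead, so this may not match the PR's implementation.

```python
from urllib.parse import urlencode


def build_plan_url(from_latlon, to_latlon, date, time,
                   base_url="http://localhost:8080/otp/routers/default"):
    """Build a trip-planning URL (assumes the OTP 1.x REST 'plan' endpoint)."""
    params = {
        "fromPlace": f"{from_latlon[0]},{from_latlon[1]}",
        "toPlace": f"{to_latlon[0]},{to_latlon[1]}",
        "mode": "TRANSIT,WALK",
        "date": date,  # passed through unchanged
        "time": time,  # passed through unchanged
    }
    return f"{base_url}/plan?{urlencode(params)}"


def itinerary_durations(plan_response):
    """Extract itinerary durations (seconds) from a decoded plan response."""
    itineraries = plan_response.get("plan", {}).get("itineraries", [])
    return [itinerary["duration"] for itinerary in itineraries]


# Arbitrary example coordinates; no request is actually sent here.
url = build_plan_url((33.95, -84.07), (33.96, -84.02), "2024-09-20", "08:00")
print(url)

# Hand-written stand-in for a decoded JSON response.
fake_response = {"plan": {"itineraries": [{"duration": 1800}, {"duration": 2400}]}}
print(itinerary_durations(fake_response))  # [1800, 2400]
```

Keeping URL construction and response parsing separate from the HTTP call lets both be tested without a running OTP instance.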
