Middle-ground solution for accuracy #14
Thanks for the tag on this, @nerik. I've done a very brief exploration of data availability and options (I can look into this further, but wanted to jot down some additional sources to explore).
The PDF they are referring to, with the collected data, can be found here. It has a lot of the conversion rates, aircraft/carrier info, etc. that we would need for in-house calculations, but again, it's in PDF form 🥲, and we'd really have to dig deeper into the licensing of this data.
Thanks @nerik and @kathrynberger! We took a look at the flight data sources. In addition to the source Kathryn found, the two sources below seem to have more complete live and historical flight data, but we still need to review the data, coverage, and formats in detail.
Some additional questions, as to whether we are looking for historical or live data, could be:
In either case 2 or 3, the data team is available to work on searching for datasets. However, as Kathryn mentioned, we will probably run into access limitations for some of them, or not find all the information we are looking for.
Thanks @nerik, @kathrynberger and @karitotp for all of this! My thought is to first try open (free) historical data and see if that satisfies our needs. If not, we could then consider "upgrading" to an updated/live data source once we know that this tool is viable. Our MVP would need routing information. If it's up to date, that's better, and if it has aircraft types, that's fantastic, but the minimum is to see the route that a user would take from airport A to B.
If these two ☝️ are accurate, then for most cases it wouldn't matter whether we use data from 2014 or 2023, because most main airport pairs would have similar or identical routing. I'm sure there would be edge cases (e.g., direct seasonal flights between secondary cities that operated before 2014 but no longer do), but I'd suggest we treat those as edge cases (<5%) unless we have evidence pointing otherwise. If all of this is wrong, we could allow users to add their own routing information. That's probably less user-friendly, but it could also solve another use case: a direct flight exists, but in reality people take indirect (and often cheaper) options.
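If we go the open historical data route, one candidate (an assumption on my part, not something evaluated in this thread) is OpenFlights' `routes.dat`, an open CSV of airline routes whose data dates from 2014. A minimal sketch of loading that 9-column layout into a routing graph follows; `load_routes` is a hypothetical helper and the sample rows are illustrative, not real data.

```python
import csv
from collections import defaultdict
from typing import DefaultDict, Iterable, Set

RouteGraph = DefaultDict[str, DefaultDict[str, Set[str]]]

def load_routes(lines: Iterable[str], direct_only: bool = True) -> RouteGraph:
    """Parse OpenFlights-style routes.dat rows into {origin: {destination: {airline codes}}}.

    Assumed column order: airline, airline ID, source airport, source airport ID,
    destination airport, destination airport ID, codeshare, stops, equipment.
    """
    graph: RouteGraph = defaultdict(lambda: defaultdict(set))
    for row in csv.reader(lines):
        if len(row) != 9:
            continue  # skip malformed rows
        airline, _, src, _, dst, _, _codeshare, stops, _equipment = row
        if direct_only and stops != "0":
            continue  # keep only nonstop legs as graph edges
        graph[src][dst].add(airline)
    return graph

# Illustrative rows in that format (\N marks a missing value in OpenFlights files).
sample = [
    "BA,\\N,SEA,\\N,LHR,\\N,,0,744",
    "EI,\\N,LHR,\\N,DUB,\\N,Y,0,320",
    "ZZ,\\N,SEA,\\N,DUB,\\N,,1,",
]
routes = load_routes(sample)
```

Keeping airline codes on each edge leaves room to map a leg to a carrier later, though these files carry no flight numbers.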
@karitotp @kathrynberger @wrynearson Thank you so much for this work 🙏 (and sorry for replying eons later) The main thing with historical data is to check how it plays with Google's travel impact API, because a call to their endpoint needs an IATA carrier code + flight number. Will it return results for scheduled flights prior to 2014? @karitotp I've tried the AviationStack API. The pricing could maybe be workable if we got some funding at some point? I think there's tremendous value in creating an open source "good enough" model built on open data and/or free APIs.
I have some bandwidth this week to work on this project, but I think my time is better spent implementing the UI (or parts of the UI) that @LanesGood designed, plus writing a blog post. @wrynearson Where do you think we are headed for the backend/data science part of this project?
Thanks for this clarifying comment, @nerik! This sounds like a good plan. If we had to choose between the two, I personally see the UI as more important than adding layover/routing info, but if someone like @kathrynberger has capacity to look into the data analysis/science part, that'd be fantastic. We have some budget left this quarter for both, but I didn't do a good job of thinking ahead to get @kathrynberger or someone else involved earlier.
@wrynearson I'm certainly keen and should have some bandwidth to support it. What time frame are you looking at? @nerik perhaps we can connect at some point to discuss the exact requirements for this work. Thanks! 👍
I've jumped back into this and explored the options outlined above a bit further, with frustrating results so far. Some updates:
Open data:
Paid or subscription services that serve flight number information are:
I'll keep looking into this, but wanted to post my findings here. cc @nerik @wrynearson
Nice, @kathrynberger! Thanks for (double-)checking this. If flight numbers are difficult to get, is there an approach to calculate 0-2 stop connection options between airport pairs without specific flight numbers? E.g., assuming a SEA to DUB trip has one stop in LHR? I guess that would be a very large dataset, wouldn't be as accurate, and would involve many routing assumptions...
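A minimal sketch of that idea, assuming we already have a list of direct (origin, destination) airport pairs from some open dataset: build an adjacency map and enumerate every loop-free path with at most two stops. `build_graph` and `connections` are hypothetical names, and the toy routes are illustrative only, not real schedule data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def build_graph(routes: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Adjacency map {origin: {destinations}} from direct (origin, destination) IATA pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for origin, destination in routes:
        graph[origin].add(destination)
    return graph

def connections(graph: Dict[str, Set[str]], origin: str, destination: str,
                max_stops: int = 2) -> List[Tuple[str, ...]]:
    """All loop-free airport sequences from origin to destination with at most
    max_stops layovers, fewest stops first."""
    results: List[Tuple[str, ...]] = []

    def walk(path: List[str]) -> None:
        current = path[-1]
        if current == destination:
            results.append(tuple(path))
            return
        # A path of n airports has n - 1 legs; only extend it if another leg is still allowed.
        if len(path) > max_stops + 1:
            return
        for nxt in sorted(graph.get(current, ())):
            if nxt not in path:  # never revisit an airport
                walk(path + [nxt])

    walk([origin])
    # Stable sort: within each stop count, paths keep their lexicographic DFS order.
    return sorted(results, key=len)

routes = [("SEA", "DUB"), ("SEA", "LHR"), ("LHR", "DUB"),
          ("SEA", "JFK"), ("JFK", "LHR"), ("JFK", "DUB")]  # toy data, not real schedules
graph = build_graph(routes)
print(connections(graph, "SEA", "DUB"))
```

Capping the depth at `max_stops + 1` legs keeps each query bounded, but results can still multiply around hubs, so some detour filter (e.g., dropping paths much longer than the direct distance) would likely be needed.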
So I've just got off a call with a representative from OAG (after I requested a sample dataset). Going through OAG directly is overkill and too expensive, but I was pointed in the direction of RapidAPI (another way to access their datasets). It appears that the TimeTable Lookup API has the information we need (i.e., flight numbers) and is well documented here. @erik there is a Basic (free) plan, but it has a hard limit of 100 API requests/month; beyond that, it goes to US$125/month (with a hard limit of 5,000 requests/month). How does this compare with your investigation of AviationEdge? Would it be worth pursuing?
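To get a rough feel for the numbers, the tiers quoted above translate into a simple cost function. The limits and price come from this comment; `requests_per_search` is a hypothetical parameter, since how many calls one itinerary lookup needs depends on how the API ends up being used.

```python
from typing import Optional

FREE_LIMIT = 100       # Basic (free) plan: hard limit of 100 requests/month
PAID_LIMIT = 5_000     # paid plan: hard limit of 5,000 requests/month
PAID_PRICE_USD = 125   # paid plan: US$125/month

def monthly_cost_usd(requests: int) -> Optional[int]:
    """Monthly cost for a given request volume, or None if it exceeds the paid plan's hard limit."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    if requests <= FREE_LIMIT:
        return 0
    if requests <= PAID_LIMIT:
        return PAID_PRICE_USD
    return None

def searches_per_month(limit: int, requests_per_search: int) -> int:
    """Number of complete user searches that fit under a monthly request limit."""
    if requests_per_search <= 0:
        raise ValueError("requests_per_search must be positive")
    return limit // requests_per_search

# Hypothetical: a 2-stop itinerary looked up one leg at a time needs 3 requests.
print(searches_per_month(FREE_LIMIT, 3), searches_per_month(PAID_LIMIT, 3))
```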
This is a great analysis! Some thoughts on the limitations of the tool: being very transparent about limitations goes a long way. Showing that one has thought about various kinds of limitations, has good reasons for not overcoming them, and still gets useful results helps with:
So for a public tool, the insights from this ticket should be very visible and easy to find.
I just came across this Methodology Document from the ICAO planner |
@yellowcap Great find! Yes, that is the PDF referenced above ☝️ in the thread. It has a lot of helpful data, and the methodology is quite clear. However, the licensing of the actual data in that PDF (i.e., is it considered "open" data?) is another question. Additionally, it's missing key
I agree with your comments above about sharing limitations and deflecting criticism before it arises. While other, larger organizations may be able to purchase/afford this data to perfect their own tooling or products, we may be able to offer this as a more accessible option to a wider range of organizations, using only open data, with all caveats and disclaimers included.
Looking at @kathrynberger's work, it looks like there are insurmountable bottlenecks with the open data + Google Travel Impact API approach:
Additionally, there is a larger philosophical issue with this approach: we can't claim it's open source, since the Google API is free as in free beer, but closed source. So here's another proposal:
Plan B: historical routes data + in-house routing + in-house CO2eq estimates
I would like to explore an alternative method, along these lines:
If this works well enough, the great benefit of this strategy would be a fully autonomous, open-source solution that anyone can reuse independently of any third-party (proprietary) API. @kathrynberger Do you have some bandwidth/interest in looking at : and assessing whether this approach could work? I'm not entirely sure to what extent you can do that without us implementing and benchmarking the full thing, but I'd love to at least get your general feeling on this.
Absolutely fantastic summary, @nerik. I'm definitely interested and would like to support this work!
We are currently operating on a simple relationship between flight distances and CO2eq estimated emissions (based on OWID data).
This is unsatisfying for two main reasons:
1. Estimates are not only much larger than Google Flights estimates, but, more importantly, the discrepancy varies widely (see #9).
2. Layovers are not taken into account. This is arguably a showstopper for a lot of flight planning scenarios. It also has the potential to strongly affect the final CO2eq estimate (surprisingly, in favor of flights with more layovers).
These issues could be addressed by:
1. Using a more complex model for calculating GHG emissions that is not solely based on distance
See typical methodologies:
- https://www.myclimate.org/fileadmin/user_upload/myclimate_-_home/01_Information/01_About_myclimate/09_Calculation_principles/Documents/myclimate-flight-calculator-documentation_EN.pdf
- https://www.goclimate.com/blog/wp-content/uploads/2019/04/Calculations-in-GoClimateNeutral-Flight-Footprint-API.pdf
2. Getting data about flight routes (i.e., "What are the possible flights from airport A to B, including layovers?")
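For comparison, the current distance-only baseline described at the top of this issue can be sketched as: great-circle distance per leg, summed, times one emission factor. The `kg_per_km` parameter is a placeholder (the OWID-derived value isn't reproduced here) and the function names are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0     # mean Earth radius
Coord = Tuple[float, float]  # (latitude, longitude) in degrees

def great_circle_km(a: Coord, b: Coord) -> float:
    """Haversine great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def distance_only_co2eq_kg(legs: List[Tuple[Coord, Coord]], kg_per_km: float) -> float:
    """Sum of great-circle leg distances times a single per-passenger emission factor.
    kg_per_km is a placeholder, not the value currently used in the tool."""
    return sum(great_circle_km(a, b) for a, b in legs) * kg_per_km
```

Because every kilometre costs the same here, a connection can never come out lower than the direct route (the triangle inequality guarantees the legs are at least as long), which is why a model that is not solely distance-based is needed to capture the layover effect in reason 2.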
With a given flight code, the Google Travel Impact API can return a CO2 eq estimate for each travel class.
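As an illustration of what each call needs, here is a sketch that builds a request body for Google's Travel Impact Model `flights:computeFlightEmissions` endpoint and converts the per-class response into kilograms. The field names (`operatingCarrierCode`, `flightNumber`, `departureDate`, `emissionsGramsPerPax`) follow my understanding of the public API reference and should be double-checked there; the helper names are hypothetical, and nothing is actually sent (that requires an API key).

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

TIM_ENDPOINT = "https://travelimpactmodel.googleapis.com/v1/flights:computeFlightEmissions"

# (origin IATA, destination IATA, operating carrier IATA code, flight number, departure date)
FlightKey = Tuple[str, str, str, int, date]

def tim_request_body(flights: Iterable[FlightKey]) -> Dict:
    """Build the JSON body for a computeFlightEmissions request."""
    return {
        "flights": [
            {
                "origin": origin,
                "destination": destination,
                "operatingCarrierCode": carrier,
                "flightNumber": int(number),
                "departureDate": {"year": d.year, "month": d.month, "day": d.day},
            }
            for origin, destination, carrier, number, d in flights
        ]
    }

def per_class_kg(response: Dict) -> List[Dict[str, float]]:
    """Convert each flight's emissionsGramsPerPax into kg per travel class.
    Entries without that field (assumed: no estimate available) become empty dicts."""
    return [
        {cabin: grams / 1000 for cabin, grams in fe.get("emissionsGramsPerPax", {}).items()}
        for fe in response.get("flightEmissions", [])
    ]

# Example values only; not a verified schedule entry.
body = tim_request_body([("ZRH", "BOS", "LX", 52, date(2024, 6, 1))])
```

The body would be serialized with `json.dumps` and POSTed to `TIM_ENDPOINT` with the API key as a `key` query parameter. Note that every leg needs a carrier code and flight number, which is exactly the data that is hard to get openly.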
From there, there are three potential scenarios IMHO:
1. Commercial routing API + Google Travel Impact API
Tried in this Observable notebook: Calculate GHG emissions from airport to airport.
The pricing of commercial routing APIs is likely to make this infeasible, but that could use further research.
2. In-house solution for routing + Google Travel Impact API
Build a basic routing engine based on open-source datasets. Airports are relatively easy; flights, not so sure.
Then call Google's API, which is free of charge and under CC-BY-SA.
3. In-house solution for routing + in-house CO2eq estimation algo
Develop our own estimation algorithm based on widely documented methodologies and not-so-widely available datasets.
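A minimal sketch of what scenario 3 could look like, loosely following the general shape of polynomial fuel-burn methodologies such as the myclimate document linked above: fuel as a quadratic function of distance (plus a detour correction), allocated to passengers via seats, load factor, cargo share and a cabin-class weight. This is not a transcription of that document. The parameter names are hypothetical and no coefficient values are given, since those would have to come from a documented, suitably licensed dataset.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class AircraftParams:
    """Per-aircraft-class inputs; all values are placeholders to be taken from a documented source."""
    a: float            # quadratic fuel term (kg fuel / km^2)
    b: float            # linear fuel term (kg fuel / km)
    c: float            # fixed fuel term, e.g. landing/takeoff cycle (kg fuel)
    seats: float        # average seats per aircraft
    load_factor: float  # passenger load factor, 0-1
    cargo_share: float  # share of fuel allocated to cargo, 0-1

def fuel_based_co2eq_kg(distance_km: float, p: AircraftParams, ef_kg_per_kg_fuel: float,
                        class_weight: float = 1.0, detour_km: float = 0.0,
                        non_co2_multiplier: float = 1.0) -> float:
    """Per-passenger CO2eq for one leg: polynomial fuel burn, allocated to passengers and cabin class."""
    x = distance_km + detour_km                 # flown distance incl. detour correction
    fuel_kg = p.a * x ** 2 + p.b * x + p.c      # fuel for the whole aircraft
    pax_fuel_kg = fuel_kg * (1 - p.cargo_share) / (p.seats * p.load_factor)
    return pax_fuel_kg * class_weight * ef_kg_per_kg_fuel * non_co2_multiplier

def itinerary_kg(leg_distances_km: Iterable[float], p: AircraftParams,
                 ef_kg_per_kg_fuel: float, **kwargs) -> float:
    """Sum the per-leg estimates; every leg pays the fixed term c again."""
    return sum(fuel_based_co2eq_kg(d, p, ef_kg_per_kg_fuel, **kwargs) for d in leg_distances_km)
```

With this shape and the same aircraft parameters on every leg, splitting a leg of length D into two halves (ignoring detours) changes fuel from aD² + bD + c to aD²/2 + bD + 2c, so the layover version comes out lower whenever aD²/2 > c. That is one possible mechanism behind the layover effect mentioned at the top of this issue.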
Ping @kathrynberger @developmentseed/data-team