Middle-ground solution for accuracy #14

Open
nerik opened this issue May 16, 2023 · 15 comments

Comments

@nerik
Contributor

nerik commented May 16, 2023

We are currently operating on a simple relationship between flight distances and CO2eq estimated emissions (based on OWID data).

This is unsatisfying for two main reasons:

1. Estimates are not only much larger than Google Flights estimates, but more importantly the range of discrepancy is very high - see #9

2. Layovers are not taken into account. This is arguably a showstopper for a lot of flight planning scenarios. It also has the potential to highly impact the final CO2eq estimate (surprisingly, in favor of flights with more layovers)


These could be addressed by:

1. Using a more complex model for calculating GHG emissions that is not solely based on distance

See typical methodologies:

- https://www.myclimate.org/fileadmin/user_upload/myclimate_-_home/01_Information/01_About_myclimate/09_Calculation_principles/Documents/myclimate-flight-calculator-documentation_EN.pdf

- https://www.goclimate.com/blog/wp-content/uploads/2019/04/Calculations-in-GoClimateNeutral-Flight-Footprint-API.pdf

2. Getting data about flight routes (i.e., "What are the possible flights from airport A to B, including layovers?")

With a given flight code, the Google Travel Impact API can return a CO2 eq estimate for each travel class.

From there, there are three potential scenarios IMHO:

1. Commercial routing API + Google Travel Impact API

Tried in this Observable notebook: Calculate GHG emissions from airport to airport.
Commercial routing API pricing is likely to make this infeasible, but that could use further research.

2. In-house solution for routing + Google Travel Impact API

Build a basic routing engine based on open source datasets. Airports are relatively easy; flights, not so sure.
Then call Google's API, which is free of charge and under CC-BY-SA.

3. In-house solution for routing + in-house CO2eq estimation algo

Develop our own estimation algorithm based on widely documented methodologies and not-so-widely available datasets.
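For scenarios 1 and 2, a call to the Travel Impact API for a single flight leg could look roughly like the sketch below. The endpoint and field names follow the public Travel Impact Model documentation as I understand it, so treat them as assumptions and check the current docs. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint of the Travel Impact Model API; verify against the current docs.
ENDPOINT = "https://travelimpactmodel.googleapis.com/v1/flights:computeFlightEmissions"


def build_request_body(origin, destination, carrier, flight_number, year, month, day):
    """Build the JSON body describing one flight leg (hypothetical helper)."""
    return {
        "flights": [
            {
                "origin": origin,
                "destination": destination,
                "operatingCarrierCode": carrier,
                "flightNumber": flight_number,
                "departureDate": {"year": year, "month": month, "day": day},
            }
        ]
    }


def economy_grams_per_pax(response):
    """Return economy-class grams of CO2eq per passenger for each flight, None if missing."""
    results = []
    for item in response.get("flightEmissions", []):
        per_pax = item.get("emissionsGramsPerPax") or {}
        results.append(per_pax.get("economy"))
    return results


def fetch_emissions(body, api_key):
    """POST the body to the API. Needs network access and a valid API key."""
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Note that the request needs a carrier code, a flight number and a departure date, which is why the routing data has to provide flight numbers for this approach to work.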

Ping @kathrynberger @developmentseed/data-team

@kathrynberger

kathrynberger commented May 17, 2023

Thanks for the tag @nerik on this. I've done a very brief exploration of data availability and options (and can look into this further, wanted to jot down some additional sources to explore).

  1. Commercial routing
    You're right, access to this data looks pricey.
  • ICAO's DATA+ may have some of the data we need, but their pricing grid looks infeasible.
  • Cirium looks to have more extensive datasets, including API access to emissions data (origin/destination IATA codes, aircraft data, carbon emissions, and route information). However, I cannot easily find subscription costs: one needs to contact them with an inquiry first, which raises a red flag on price for me...
  2. In-house solution for routing + Google Travel Impact API
    My personal hunch is that this might be our best bet - especially if we can find reasonable flight data.
    I've come across these resources for Open Aviation Data (you might have already spotted) which would be very useful.
  • The OpenSky Network live API might be your best bet for live flight data around the world and is free, without registration. Might be worth giving a look?
  • A follow-up question might be: does the flight information have to be "live", or can we set general meeting guidelines based on historical data? If the latter, OpenSky's historical data might also be a reasonable data source for our needs.
  • Both of the above data sources provide data for non-commercial use only, so we may have to follow up re: licensing.
  • Additionally, there's also the python traffic package for air traffic data processing which might be helpful.
  • Additionally, I've looked into the more complex models you linked above to also check their data sources (I really liked the simplicity and transparency of the GoClimateNeutral Flight Footprint API, though I still don't understand where their flight data comes from). In their section 1.2, Exact Calculations of Future Emissions, they state:
Up to date data about individual aircrafts engine type and their real world fuel
consumption across distances are not publically available for all aircrafts. The European
Environment Agency (EEA) has generated estimations of fuel consumption (1), but these
are not made for comparing individual flight models. The UN-agency International Civil
Aviation Organization (ICAO) has collected the data but have not released it other then
in pdf-format with unclear licenses for public use (2)

The pdf they are referring to with the collected data can be found here, and it has a lot of the conversion rates, aircraft/carrier info, etc. that we would need for in-house calculations. But again, it's in pdf form 🥲 and we'd really have to dig deeper into the licensing of this data.

  3. In-house solution for routing + in-house CO2eq estimation algo
    I'd really have to dig deeper into what additional datasets are available out there for development of in-house algorithms - but know that they will be harder to access, and likely incomplete - especially given the methods papers that you've linked above. Certainly happy to discuss this further and tag the @developmentseed/data-team for their thoughts as well.

@karitotp

karitotp commented May 18, 2023

Thanks @nerik and @kathrynberger!

We took a look at the flight data sources. In addition to the sources found by Kathryn, the two sources below seem to have more complete live and historical flight data, but we still need to review the data, the coverage, and the formats in detail.

  • AviationStack: It provides Flight Data APIs for live and historical flight data and also provides information about airlines, airports, and aircraft. To access the data, it has pricing plans depending on the number of requests and the type of data to be accessed.

  • OpenFlights: It provides historical data on airports, airlines and routes under the Open Database License. Most of the data is in CSV format. The airport data is sourced from https://ourairports.com/data/ (which you already have), but this page could be useful for airlines and routes (the historical route data only goes up to June 2014).

Some additional questions, depending on whether we are looking for historical or live data:

  • What type of flight information is more relevant?
  • Are we interested in any particular data format?
  • If we can use historical data, what time span would be useful for the study?

In either case 2 or 3, the data team is available to work on the dataset search. However, as Kathryn mentioned, we will probably come across limitations in accessing some of them, or not find all the information we are looking for.

@wrynearson
Contributor

wrynearson commented May 22, 2023

Thanks @nerik, @kathrynberger and @karitotp for all of this!

My thoughts are to first try with open (free), historical data and see if that satisfies our needs. If not, we could then consider "upgrading" to an updated / live data source, once we know that this tool is viable.

Our MVP would need routing information. If it's up to date, that's better, and if it has aircraft types, that's fantastic, but the minimum is to see the route that a user would take from airport A to B.

  1. I assume common routes between primary cities (London-DC) are constant (there will always be a direct option between London and DC).
  2. I also guess that ancillary routes are constant, feeding into the main primary city connections (with Frankfurt-DC, there's always a connection from Frankfurt to London, then there's always one from London to DC).

If these two ☝️ are accurate, then it wouldn't matter for most cases if we use data from 2014 or 2023, because there would be similar/same routing for most main airport pairs. I'm sure there would be edge cases (e.g., direct seasonal flights between secondary cities that used to operate prior to 2014 but no longer do), but I'd suggest we assume that those are edge cases (<5%) unless we have evidence pointing otherwise.

If all of this is wrong, we could allow the user to add their own routing information. That's probably less user friendly, but that could solve another use case where a direct flight exists but in reality, people are taking indirect (and often cheaper) options.

@nerik
Contributor Author

nerik commented Jul 3, 2023

@karitotp @kathrynberger @wrynearson Thank you so much for this work 🙏 (and sorry for replying eons later)

The main thing with historical data is to check how it plays with Google's travel impact API, because a call to their endpoint needs an IATA carrier code + flight number. Will it return results for scheduled flights prior to 2014?

@karitotp I've tried the AviationStack API. The pricing could maybe be workable if we got some funding at some point?

I think there's tremendous value in creating an open source "good enough" model built on open data and/or free APIs.
Clearing the roadblock mentioned above (and also assessing potential licensing challenges), I can see a prototype within reach:

  • build basic routing logic along the lines of what @wrynearson describes, generate candidate segments/flights with layovers;
  • from these segments, infer IATA flight codes from either historical data or OpenSky live data as @kathrynberger suggests;
  • use these codes to call Google's travel impact API.
  • bundle that as a python or node package?

I have some bandwidth this week to work on this project, but I think my time is better spent implementing the UI or parts of the UI that @LanesGood designed + writing a blogpost. @wrynearson Where do you think we are headed for the backend/data science part of this project?

@wrynearson
Contributor

Thanks for this clarifying comment @nerik! This sounds like a good plan. I personally see the UI to be more important than adding in layover/routing info if we had to choose between the two, but if someone like @kathrynberger has capacity to look into the data analysis/science part, that'd be fantastic. We have some budget left this quarter for both, but I didn't do a good job thinking ahead to get @kathrynberger or someone else involved earlier.

@kathrynberger

@wrynearson I'm certainly keen and should have some bandwidth to support it. What time frame are you looking for? @nerik perhaps we can connect at some point to discuss the exact requirements for this work. Thanks! 👍

@kathrynberger

kathrynberger commented Jul 4, 2023

I've jumped back into this and explored the options outlined above a bit further. Frustratingly, finding flight numbers is a difficult task.

Some updates:

  • It appears that OpenSky Network live API does not actually provide this information.
  • From OpenFlights I have assembled a table of carrier codes (e.g., BA) that correspond to each route, and have the equipment number, but haven't yet found a lookup table or historical data for this. Additionally, as @karitotp mentions, OpenFlights has not been updated since 2014.
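As a rough illustration of the carrier-code table, here is a minimal sketch that groups carriers and equipment codes by route. It assumes the headerless column layout of OpenFlights' `routes.dat`. The sample rows are made up in that layout and are not real data, and `carriers_by_route` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Column order of OpenFlights' routes.dat (the file has no header row).
COLUMNS = [
    "airline", "airline_id", "src", "src_id",
    "dst", "dst_id", "codeshare", "stops", "equipment",
]

# Made-up sample rows in the same layout, for illustration only.
SAMPLE = """BA,1,LHR,10,IAD,20,,0,777 744
UA,2,LHR,10,IAD,20,,0,763
EI,3,DUB,30,LHR,10,,0,320
"""


def carriers_by_route(lines):
    """Map (source, destination) to the carriers and equipment codes seen on that route."""
    table = defaultdict(lambda: {"carriers": set(), "equipment": set()})
    for row in csv.DictReader(lines, fieldnames=COLUMNS):
        key = (row["src"], row["dst"])
        table[key]["carriers"].add(row["airline"])
        # Equipment is a space-separated list of aircraft type codes.
        table[key]["equipment"].update(row["equipment"].split())
    return dict(table)


table = carriers_by_route(io.StringIO(SAMPLE))
print(table[("LHR", "IAD")]["carriers"])
```

This gives carrier and equipment per route, but not the flight numbers the Travel Impact API needs.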

Open data:

Paid or subscription services that serve flight number information are:

  • Amadeus, although I'm not entirely convinced flight numbers are available here
  • Aviation Edge
  • OAG - I've requested a sample dataset
  • FlightStats by Cirium
  • Grepsr: not exactly sure of the data source, but it appears to have the data we are looking for, at an unknown price 🤷 🤔

I'll keep looking into this, but wanted to share an update on findings here. cc @nerik @wrynearson

@wrynearson
Contributor

Nice @kathrynberger! Thanks for (double) checking this. If flight numbers are difficult, is there an approach to calculate 0-2 stop connection options between airport pairs without specific flight numbers? E.g., assuming SEA has one stop in LHR to go to DUB? I guess that would be a very large dataset, wouldn't be as accurate, and would have many assumptions in routing...
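A minimal sketch of that idea, assuming all we have is a list of (source, destination) airport pairs: build a directed graph and enumerate every path with at most two layovers. The routes below are illustrative, not taken from a real dataset, and the function names are hypothetical.

```python
from collections import defaultdict


def build_graph(routes):
    """Turn (source, destination) pairs into an adjacency map."""
    graph = defaultdict(set)
    for src, dst in routes:
        graph[src].add(dst)
    return graph


def itineraries(graph, origin, destination, max_stops=2):
    """List airport sequences from origin to destination with at most max_stops layovers."""
    found = []

    def extend(path):
        last = path[-1]
        if last == destination:
            found.append(path)
            return
        # A path with max_stops layovers has max_stops + 2 airports in total.
        if len(path) == max_stops + 2:
            return
        for nxt in sorted(graph.get(last, ())):
            if nxt not in path:  # never revisit an airport
                extend(path + [nxt])

    extend([origin])
    found.sort(key=len)  # fewest layovers first
    return found


# Illustrative routes only.
ROUTES = [
    ("SEA", "LHR"), ("LHR", "DUB"),
    ("SEA", "KEF"), ("KEF", "DUB"),
    ("SEA", "JFK"), ("JFK", "LHR"),
]
for path in itineraries(build_graph(ROUTES), "SEA", "DUB"):
    print(" -> ".join(path))
```

This only says which connections exist in the data. It ignores schedules, so some enumerated itineraries may not be bookable in practice, which is one of the routing assumptions mentioned above.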

@kathrynberger

So I've just got off a call with a representative from OAG (after I requested a sample dataset). Going through them (OAG) is overkill and too expensive, but I was pointed in the direction of RAPID API (another way to access their datasets). It appears that the TimeTable Lookup API has the information (i.e., flight number) that we need (and is well documented here).

@nerik there is a Basic (free) plan, but it has a hard limit of 100 API requests/month; after that it goes to $125 USD/mo (with a hard limit of 5,000/mo). How does this stand against your investigation with AviationEdge? Would it be worth pursuing?

@yellowcap
Member

This is a great analysis! Some thoughts on the limitations of the tool: being very transparent about limitations goes a long way.

Showing that one has thought about various kinds of limitations, has good reasons for not overcoming them, and still has useful results helps with:

  1. Ensuring people use the tool critically, without assuming it's always going to provide the best answer
  2. Deflecting criticisms before they arise

So for a public tool, the insights from this ticket should be very visible and easy to find.

@yellowcap
Member

I just came across this Methodology Document from the ICAO planner

@kathrynberger

@yellowcap Great find, yes that is the pdf referenced above ☝️ in the thread - it has a lot of helpful data and methodology that is quite clear. However, licensing of the actual data in that pdf (i.e., is it considered "open" data?) is another question. Additionally, it's missing key flight number data that we need. They (ICAO) have it, but it is not accessible to us. Getting this flight number data is extremely expensive. Like @wrynearson said, we could use historical data (last updated in 2014) as a next-best option, but that's now almost 10 years old. I have spoken with the OAG rep about their flight schedules data: it is expensive, and even their RAPID API would prove costly if the app took off.

I agree with your comments above on sharing limitations and deflecting criticisms before they arise. While other, larger organizations may be able to purchase/afford this data to perfect their own tooling or products - it could be that we are able to offer this as a more accessible option to a wider number of organizations using only open data, all caveats and disclaimers included.

@nerik
Contributor Author

nerik commented Aug 11, 2023

Looking at @kathrynberger's work, it looks like there are insurmountable bottlenecks with the open data + Google travel impact API approach:

  • No reliable/open/cheap enough way to find flight numbers that are needed by this API;
  • This API is designed for planning, which means for future flights. It is unclear whether it works with historical data anyway.

Additionally, there is a larger philosophical issue with this approach: we can't claim it's open source, since Google's API is free as in free beer but closed source. So here's another proposal:

Plan B: historical routes data + in-house routing + in-house CO2eq estimates

I would like to explore an alternative method, along those lines:

  1. Build a graph of airports worldwide, using OpenFlights' historical route dataset;
  2. Travelling from airport A to airport B, run a simple graph traversal algo on the graph to get a list of possible/"reasonable" routes, including layovers;
  3. Calculate the great circle flight distances for each of the routes and each of the segments;
  4. Apply weights:
    • mode: domestic/short-haul/long-haul as in the existing prototype (also see definitions on Wikipedia)
    • number of take-offs/landings
    • aircraft type. (this information is present in the OpenFlights DB)
    • aircraft load factor (depends on aircraft type/mode)
    • seating class?
    • freight? (depends on aircraft type/mode)
      I believe we can start with good estimates for these weights by following the methodology described at length in the following document: 2019 UK Government greenhouse gas conversion factors for company reporting, starting from p.73. (this was referenced in the OWID piece I used for the prototype)
  5. Get a resulting CO2eq value for each of those routes, and either average, or provide a range (which could be visible in the UI)
  6. Benchmark the results against commercial offerings, following the structure that @wrynearson set up: Accuracy Assessment #9. It should be possible to automate this step using a combination of the AviationStack + Google APIs as I did in the Observable notebook. Depending on the results, go back to step 4, fiddle with weights, and run benchmarks again (in that regard, we are kinda reverse-engineering commercial algorithms)
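Steps 3 to 5 could be sketched as follows. This is a deliberately simplified model with only two weights, a per-kilometre factor and a fixed per-take-off term. The mode, aircraft type, load factor, seating class and freight weights from step 4 are left out. The factor values and airport coordinates below are placeholders, not real emission factors or authoritative positions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def route_co2eq_kg(airports, coords, per_km_kg, per_takeoff_kg):
    """Sum a distance term and a fixed take-off term over every segment of a route."""
    total = 0.0
    for src, dst in zip(airports, airports[1:]):
        total += great_circle_km(coords[src], coords[dst]) * per_km_kg + per_takeoff_kg
    return total


def co2eq_range(routes, coords, per_km_kg, per_takeoff_kg):
    """Return (min, mean, max) CO2eq in kg over the candidate routes."""
    values = [route_co2eq_kg(r, coords, per_km_kg, per_takeoff_kg) for r in routes]
    return min(values), sum(values) / len(values), max(values)


# Approximate coordinates, for illustration only.
COORDS = {"SEA": (47.45, -122.31), "LHR": (51.47, -0.45), "DUB": (53.42, -6.27)}
CANDIDATES = [["SEA", "DUB"], ["SEA", "LHR", "DUB"]]

# Placeholder weights, NOT real emission factors.
low, mean, high = co2eq_range(CANDIDATES, COORDS, per_km_kg=0.1, per_takeoff_kg=50.0)
print(f"{low:.0f} to {high:.0f} kg CO2eq (mean {mean:.0f})")
```

Returning a range rather than a single number matches the option in step 5 of showing the spread in the UI.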

If this works reasonably well, the great benefit of this strategy would be to have a fully autonomous, open-source solution that anyone can reuse independently of any third-party (proprietary) API.
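For the benchmarking loop in step 6, one simple way to track progress while fiddling with weights is to summarise the ratio between our estimate and a reference estimate over a set of routes. This is only a sketch: `discrepancy_summary` is a hypothetical helper and the numbers in the usage line are made up.

```python
from statistics import median


def discrepancy_summary(pairs):
    """Summarise our/reference ratios for (our_estimate, reference_estimate) pairs."""
    ratios = [ours / ref for ours, ref in pairs if ref > 0]
    if not ratios:
        raise ValueError("no usable reference estimates")
    return {"min": min(ratios), "median": median(ratios), "max": max(ratios)}


# Made-up estimates in kg CO2eq: (ours, reference).
print(discrepancy_summary([(200, 100), (150, 100), (300, 100)]))
```

A median close to 1 with a narrow min/max spread would suggest the weights are in the right ballpark. A wide spread is the kind of discrepancy range described in #9.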

@kathrynberger Do you have some bandwidth/interest in looking at :

and assessing whether this approach could work? I'm not entirely sure to which extent you can do that without having us implementing and benchmarking the full thing, but I'd love to have at least your general feeling on this.

@kathrynberger

Absolutely fantastic summary @nerik. I definitely have interest and would like to support this work!
I've read the ICAO methodology previously, so would just need to look into the UK paper and spreadsheets. I'm spread pretty thin on a variety of work as of late, but I'll try to set aside time this week to explore the above and get a general feeling on the approach.

@wrynearson
Contributor

Fantastic @nerik! I think this is a good next step. However, I would push for us to release the tool once we fix #20, and then implement this later if/when we have time.
