This Python project automatically fetches data on airline carriers and airports from the IATA (International Air Transport Association) publication pages. It generates every two-letter carrier code and three-letter airport code, then retrieves the information published on the IATA website for each code. The fetched data includes various attributes of the carriers and airports and is saved to .jsonl files for further analysis or use.
- Generates all possible combinations of two-letter and three-letter codes.
- Fetches data using the generated codes from specific IATA publication URLs.
- Parses HTML responses to extract relevant table data.
- Saves the extracted data in a JSON Lines format for easy consumption by downstream systems.
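The first and third steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the names `generate_codes`, `TableRowParser`, and `parse_table` are hypothetical, the alphabet is assumed to be A-Z, and the real fetcher may use a different HTML library and a different table layout.

```python
import itertools
import string
from html.parser import HTMLParser


def generate_codes(length, alphabet=string.ascii_uppercase):
    """Yield every code of the given length, in lexicographic order."""
    for letters in itertools.product(alphabet, repeat=length):
        yield "".join(letters)


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the first row's cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

With a 26-letter alphabet, `generate_codes(2)` yields 676 codes and `generate_codes(3)` yields 17,576, which gives a rough sense of how many requests a full run involves.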
This project is managed with Poetry to handle dependencies and environments. You will need:
- Python 3.8 or higher
- Poetry for dependency management
- Install Poetry: Follow the instructions on the official Poetry website.
- Clone the Repository: Clone this repository to your local machine.
- Install Dependencies: Navigate to the project directory and run the following command to install the necessary dependencies:
poetry install
To start data extraction, use the following commands:
cd iata_code_fetcher # Navigate to the project directory
poetry run python iata_code_fetcher/fetcher.py
The script will begin processing and will save the data to .jsonl files named carrier_data_full.jsonl and airport_data_full.jsonl for carrier and airport data, respectively.
The script includes rate limiting to prevent excessive requests to the IATA server, with a 1-second sleep interval between requests.
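A fixed pause between requests can be sketched as follows. This is an assumption about the general shape of such a loop, not a copy of the script: `fetch_all` and its `fetch` callback are hypothetical names, and `sleep` is injectable only so the behavior can be checked without actually waiting.

```python
import time


def fetch_all(codes, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(code) for each code, pausing `delay` seconds between calls."""
    results = []
    for index, code in enumerate(codes):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        results.append((code, fetch(code)))
    return results
```

With a 1-second delay, the total running time is at least one second per generated code, so a full run over all three-letter codes takes several hours.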
Output files will be generated in the iata_code_fetcher directory:
- carrier_data_full.jsonl: Contains data fetched for airline carriers.
- airport_data_full.jsonl: Contains data fetched for airports.
Each line in a .jsonl file is a JSON object containing data for one carrier or airport.
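For downstream use, the JSON Lines format can be written and read with the standard `json` module. The helpers below are a hypothetical sketch (the names `write_jsonl` and `read_jsonl` are not part of this project), and the fields inside each record depend on what the IATA pages return.

```python
import json


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because every line is an independent JSON document, large files can also be processed one line at a time instead of being loaded whole.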
Note that these files may contain duplicate entries. To remove duplicates, run the following commands:
poetry run python iata_code_fetcher/process.py air airport_data_full.jsonl
poetry run python iata_code_fetcher/process.py carrier carrier_data_full.jsonl
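How process.py identifies duplicates is not described here. As a rough illustration of the idea only, the sketch below treats two records as duplicates when all of their fields match and keeps the first occurrence; `dedupe_records` is a hypothetical name, and the actual script may compare on specific keys instead.

```python
import json


def dedupe_records(records):
    """Drop exact duplicate records, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        # Serialize with sorted keys so key order does not affect equality.
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```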
- Make sure to comply with IATA's terms of service regarding the use of data fetched from their site.
- Codes marked with an asterisk (*) are "Controlled Duplicates": two carriers share the same code but operate different types of non-overlapping services. Example: "BB*"
- See also the List of IATA-indexed railway stations.