Tools to read data from the Eurostat API.
- Read Eurostat data and metadata as a list of tuples or as a pandas dataframe.
- Use the new SDMX 2.1 Eurostat web services.
- Download data from Eurostat, COMEXT, DG COMP, DG ENV, DG GROW.
- Available from both pip and conda.
- Optionally cache data with joblib.Memory, to avoid downloading large unchanged datasets multiple times.
- MIT license.
From version 1.0.0, this package implements MAJOR CHANGES.
The previous official Eurostat API, used by older releases of this Python package, is due to be decommissioned in January 2023. It has been replaced by a new dissemination API, which forced an almost complete rewrite of this package.
I have done my best to keep the new Eurostat Python package compatible with the old releases. Nevertheless, you may see some differences, including in the output format.
The main differences are in the SDMX functions. In particular, OBS_STATUS is no longer provided by the API; it is replaced by a flag with different symbols and meanings. From version 1.0.0, the SDMX functions of the Eurostat Python package are deprecated but temporarily kept available. They will be removed in version 2.0.0. They always print an alert message, which can be muted by setting the argument noalert=True.
Requires Python 3.5+
pip install eurostat
eurostat.get_toc([dataset='all'], [lang='en'])
Read the table of contents and return a list of tuples. The first element of the list contains the header line. Dates are represented as strings.
lang selects the language of the table of contents: 'en'=English, 'fr'=French, 'de'=German, when provided by Eurostat. Default is English.
If you want to get only the metadata of one dataset, set dataset=code, e.g. dataset='MET_10R_3EMP'.
>>> import eurostat
>>> toc = eurostat.get_toc()
>>> toc[0]
('title', 'code', 'type', 'last update of data', 'last table structure change', 'data start', 'data end')
>>> toc[12:15]
[('Employment by NACE Rev. 2 activity and metropolitan typology', 'MET_10R_3EMP', 'dataset', '2022-05-06T23:00:00+0200', '2022-05-06T23:00:00+0200', '1995', '2020'),
('Gross domestic product (GDP) at current market prices by metropolitan regions', 'MET_10R_3GDP', 'dataset', '2022-04-21T23:00:00+0200', '2022-04-21T23:00:00+0200', '2000', '2020'),
('Gross value added at basic prices by metropolitan regions', 'MET_10R_3GVA', 'dataset', '2022-04-21T23:00:00+0200', '2022-04-21T23:00:00+0200', '1995', '2020')]
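Because the first tuple is the header line, the result can be turned into a list of dictionaries keyed by column name. The following is a minimal sketch, not part of the package: toc_as_dicts is a hypothetical helper, and the sample rows are copied from the output above so that the snippet runs without a network connection.

```python
# Sample rows copied from the output above; in practice: toc = eurostat.get_toc()
toc = [
    ('title', 'code', 'type', 'last update of data',
     'last table structure change', 'data start', 'data end'),
    ('Employment by NACE Rev. 2 activity and metropolitan typology',
     'MET_10R_3EMP', 'dataset', '2022-05-06T23:00:00+0200',
     '2022-05-06T23:00:00+0200', '1995', '2020'),
    ('Gross domestic product (GDP) at current market prices by metropolitan regions',
     'MET_10R_3GDP', 'dataset', '2022-04-21T23:00:00+0200',
     '2022-04-21T23:00:00+0200', '2000', '2020'),
]


def toc_as_dicts(toc):
    """Hypothetical helper: pair every row with the header line."""
    header, *rows = toc
    return [dict(zip(header, row)) for row in rows]


records = toc_as_dicts(toc)
# Index the records by dataset code for direct lookup.
by_code = {rec['code']: rec for rec in records}
print(by_code['MET_10R_3GDP']['data start'])  # 2000
```

Note that the dates and periods stay strings, as in the original tuples.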
eurostat.get_toc_df([dataset='all'], [lang='en'])
Read the table of contents of the main database and return a dataframe.
lang selects the language of the table of contents: "en"=English, "fr"=French, "de"=German, when provided by Eurostat. Default is English.
If you want to get only the metadata of one dataset, set dataset=code, e.g. dataset='MET_10R_3EMP'.
>>> import eurostat
>>> toc_df = eurostat.get_toc_df()
>>> toc_df
title ... data end
0 Road equipment: number of road vehicles by cat... ... 2018
1 Road equipment: number of road vehicles by age ... 2018
2 Road equipment: load capacity of lorries ... 2015
3 Road equipment: new registrations by categories ... 2015
4 Road traffic: road freight transport in volume ... 2018
... ... ...
7225 Tropical wood imports to the EU from chapter 4... ... 2020-12
7226 Candidate countries and potential candidates: ... ... 2019
7227 Candidate countries and potential candidates: ... ... 2014
7228 Candidate countries and potential candidates: ... ... 2015
7229 Candidate countries and potential candidates: ... ... 2014
You may also want to extract the datasets that pertain to a topic. In that case, you can use:
eurostat.subset_toc_df(toc_df, keyword)
Extract from toc_df the rows where the column title contains keyword (case-insensitive).
>>> f = eurostat.subset_toc_df(toc_df, 'fleet')
>>> f
title ... data end
4873 Fishing fleet, total tonnage ... 2021
4895 Fishing Fleet, Number of Vessels ... 2021
6169 Commercial aircraft fleet by age of aircraft a... ... 2020
6172 Commercial aircraft fleet by age of aircraft a... ... 2020
6175 Commercial aircraft fleet by aircraft category... ... 2020
6178 Commercial aircraft fleet by aircraft category... ... 2020
6576 Commercial aircraft fleet by type of aircraft ... 2020
7120 Fishing fleet by age, length and gross tonnage ... 2021
7121 Fishing fleet by type of gear and engine power ... 2021
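A comparable case-insensitive filter can be applied to the list-of-tuples output of get_toc(). This is only a sketch: subset_toc is a hypothetical helper, not a package function, and the rows are copied from the get_toc() example earlier in this document.

```python
# Rows copied from the get_toc() example; in practice: toc = eurostat.get_toc()
toc = [
    ('title', 'code', 'type', 'last update of data',
     'last table structure change', 'data start', 'data end'),
    ('Employment by NACE Rev. 2 activity and metropolitan typology',
     'MET_10R_3EMP', 'dataset', '2022-05-06T23:00:00+0200',
     '2022-05-06T23:00:00+0200', '1995', '2020'),
    ('Gross domestic product (GDP) at current market prices by metropolitan regions',
     'MET_10R_3GDP', 'dataset', '2022-04-21T23:00:00+0200',
     '2022-04-21T23:00:00+0200', '2000', '2020'),
    ('Gross value added at basic prices by metropolitan regions',
     'MET_10R_3GVA', 'dataset', '2022-04-21T23:00:00+0200',
     '2022-04-21T23:00:00+0200', '1995', '2020'),
]


def subset_toc(toc, keyword):
    """Hypothetical helper: keep the header plus rows whose title contains keyword."""
    header, *rows = toc
    title_idx = header.index('title')
    needle = keyword.lower()
    # Compare lower-cased strings so the match is case-insensitive.
    return [header] + [row for row in rows if needle in row[title_idx].lower()]


gross = subset_toc(toc, 'GROSS')
print([row[1] for row in gross[1:]])  # ['MET_10R_3GDP', 'MET_10R_3GVA']
```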
eurostat.get_pars(code)
Read the parameter names that can be filtered for a given dataset code and return them as a list.
>>> import eurostat
>>> pars = eurostat.get_pars('demo_r_d2jan')
>>> pars
['freq', 'unit', 'sex', 'age', 'geo']
As the example shows, code is generally case-insensitive.
To get the parameter values for filtering, you can use:
eurostat.get_par_values(code, par)
Read the values of a given parameter par that can be found in a given dataset code.
>>> import eurostat
>>> par_values = eurostat.get_par_values('demo_r_d2jan', 'sex')
>>> par_values
['T', 'M', 'F']
eurostat.get_dic(code, [par=None], [full=True], [frmt="list"], [lang="en"])
Read the dictionary with the descriptions of the parameters (dimensions) of a dataset if par=None, or the descriptions of the values of a given parameter par. The result is returned as a list of tuples, a dataframe, or a dictionary.
Set full=True to get the full list of possible values of par; full=False returns only the values present in the given dataset, i.e. those returned by get_par_values(). Default is full=True.
frmt="list" makes the function return a list of tuples without header. For the dictionary of the parameters, the first element of each tuple is the parameter code, the second is its name, and the third is its description (when provided). For the dictionary of the parameter values, the first element of each tuple is the code value and the second one is its description. Set frmt="df" to get a dataframe as output. If frmt="dict" it returns a dictionary. Default is frmt="list".
lang selects the language of the dictionary: "en"=English, "fr"=French, "de"=German, when provided by Eurostat. Default is English.
>>> import eurostat
>>> dic = eurostat.get_dic('demo_r_d2jan')
>>> dic
[('freq', 'Time frequency', 'This code list contains the periodicity that refers to the frequency.'),
('unit', 'Unit of measure', None),
('sex', 'Sex', 'This code list provides information about the state of being male or female and refers ...'),
('age', 'Age class', 'This code list contains periods of time, i.e. the length of time that a person or ...'),
('geo', 'Geopolitical entity (reporting)', 'This code list defines the reporting geopolitical entities.')]
>>> import eurostat
>>> dic = eurostat.get_dic('demo_r_d2jan', 'sex', frmt='df')
>>> dic
val descr
0 T Total
1 M Males
2 F Females
3 DIFF Absolute difference between males and females
4 NAP Not applicable
5 NRP No response
6 UNK Unknown
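With frmt="list", the dictionary of parameter values is described above as a list of (code, description) tuples, which makes it easy to build a lookup for decoding data rows. The snippet below is a sketch under that assumption: describe is a hypothetical helper, and the sample values are copied from the dataframe output above.

```python
# Sample in the frmt="list" shape; in practice:
# sex_dic = eurostat.get_dic('demo_r_d2jan', 'sex')
sex_dic = [
    ('T', 'Total'), ('M', 'Males'), ('F', 'Females'),
    ('DIFF', 'Absolute difference between males and females'),
    ('NAP', 'Not applicable'), ('NRP', 'No response'), ('UNK', 'Unknown'),
]
labels = dict(sex_dic)


def describe(code_value, labels):
    """Hypothetical helper: return the description, or the code itself if unknown."""
    return labels.get(code_value, code_value)


print([describe(c, labels) for c in ['T', 'M', 'F', 'X']])
# ['Total', 'Males', 'Females', 'X']
```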
eurostat.get_data(code, [flags=False], [filter_pars=dict()], [verbose=False], [reverse_time=False])
Read a Eurostat dataset and return it as a list of tuples. The first element of the list ("the first row") is the data header.
To get a subset, set filter_pars (a dictionary whose keys are parameter names and whose values are the wanted items).
To see a rough progress status, set verbose=True.
flags=True also downloads the flags associated with the data. Note that the data format differs depending on whether flags is True. Flag meanings can be found here.
reverse_time=True reverses the order of the time columns, for compatibility with 0.x.x versions.
>>> import eurostat
>>> data = eurostat.get_data('GOV_10DD_SLGD')
>>> data[0]
('freq', 'na_item', 'sector', 'maturity', 'unit', 'geo\\TIME_PERIOD', 2018, 2019, 2020, 2021)
>>> data[90:95]
[('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'BE', None, None, 23.8, 39.3),
('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'ES', None, None, 130.0, 122.2),
('A', 'F3', 'S11', 'TOTAL', 'MIO_NAC', 'AT', None, None, 0.0, 0.0),
('A', 'F3', 'S11', 'TOTAL', 'MIO_NAC', 'BE', None, None, 23.8, 39.3),
('A', 'F3', 'S11', 'TOTAL', 'MIO_NAC', 'ES', None, None, 130.0, 122.2)]
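The output is in wide format, with one column per time period. If one observation per row is easier to work with, the rows can be reshaped with plain Python. This is a sketch only: to_long is a hypothetical helper, and it assumes that the last dimension column is the one whose name contains 'TIME_PERIOD', as in the header shown above.

```python
# Header and rows copied from the output above; in practice:
# data = eurostat.get_data('GOV_10DD_SLGD'); header, rows = data[0], data[1:]
header = ('freq', 'na_item', 'sector', 'maturity', 'unit', 'geo\\TIME_PERIOD',
          2018, 2019, 2020, 2021)
rows = [
    ('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'BE', None, None, 23.8, 39.3),
    ('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'ES', None, None, 130.0, 122.2),
]


def to_long(header, rows):
    """Hypothetical helper: one (dimensions..., period, value) tuple per observation."""
    # Dimension columns end with the one whose name contains 'TIME_PERIOD'.
    n_dims = next(i for i, name in enumerate(header)
                  if isinstance(name, str) and 'TIME_PERIOD' in name) + 1
    periods = header[n_dims:]
    long_rows = []
    for row in rows:
        dims, values = row[:n_dims], row[n_dims:]
        for period, value in zip(periods, values):
            if value is not None:  # skip missing observations
                long_rows.append(dims + (period, value))
    return long_rows


long_rows = to_long(header, rows)
print(long_rows[0])
# ('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'BE', 2020, 23.8)
```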
>>> import eurostat
>>> data = eurostat.get_data('GOV_10DD_SLGD', True)
>>> data[0]
('freq', 'na_item', 'sector', 'maturity', 'unit', 'geo\\TIME_PERIOD', '2018_value', '2018_flag', '2019_value', '2019_flag', '2020_value', '2020_flag', '2021_value', '2021_flag')
>>> data[90:95]
[('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'BE', None, ':', None, ':', 23.8, '', 39.3, ''),
('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'ES', None, ':', None, ':', 130.0, '', 122.2, ''),
('A', 'F3', 'S11', 'TOTAL', 'MIO_NAC', 'AT', None, ':', None, ':', 0.0, '', 0.0, ''),
('A', 'F3', 'S11', 'TOTAL', 'MIO_NAC', 'BE', None, ':', None, ':', 23.8, '', 39.3, ''),
('A', 'F3', 'S11', 'TOTAL', 'MIO_NAC', 'ES', None, ':', None, ':', 130.0, '', 122.2, '')]
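With flags enabled, each period contributes two consecutive columns, named '<period>_value' and '<period>_flag' in the header above. The sketch below pairs them up for one row. split_flags is a hypothetical helper, and the default of six dimension columns is an assumption that holds for this dataset only.

```python
# Header and row copied from the output above; in practice:
# data = eurostat.get_data('GOV_10DD_SLGD', True)
header = ('freq', 'na_item', 'sector', 'maturity', 'unit', 'geo\\TIME_PERIOD',
          '2018_value', '2018_flag', '2019_value', '2019_flag',
          '2020_value', '2020_flag', '2021_value', '2021_flag')
row = ('A', 'F3', 'S11', 'TOTAL', 'MIO_EUR', 'BE',
       None, ':', None, ':', 23.8, '', 39.3, '')


def split_flags(header, row, n_dims=6):
    """Hypothetical helper: map each period to its (value, flag) pair."""
    names, cells = header[n_dims:], row[n_dims:]
    result = {}
    # Walk the columns two at a time: value first, then flag.
    for i in range(0, len(names), 2):
        period = names[i].rsplit('_', 1)[0]  # '2018_value' -> '2018'
        result[period] = (cells[i], cells[i + 1])
    return result


obs = split_flags(header, row)
print(obs['2018'])  # (None, ':')
print(obs['2020'])  # (23.8, '')
```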
To download a subset, you need to set the filter_pars dictionary. Its keys can be 'startPeriod', 'endPeriod' and any parameter you get with eurostat.get_pars(). Values can be a number, a string or a list, and are generally taken from the output of eurostat.get_par_values().
>>> import eurostat
>>> code = 'GOV_10DD_SLGD'
>>> pars = eurostat.get_pars(code)
>>> pars
['freq', 'na_item', 'sector', 'maturity', 'unit', 'geo']
>>> par_values = eurostat.get_par_values(code, 'geo')
>>> par_values
['BE', 'DE', 'ES', 'AT']
>>> my_filter_pars = {'startPeriod': 2019, 'geo': ['AT','BE']}
>>> data = eurostat.get_data(code, filter_pars=my_filter_pars)
>>> data[0]
('freq', 'na_item', 'sector', 'maturity', 'unit', 'geo\\TIME_PERIOD', 2019, 2020, 2021)
>>> data[445:447]
[('A', 'GD', 'S1_S2', 'Y_LT1', 'PC_TOT', 'AT', None, 1.0, 0.9),
('A', 'F22', 'S1_S2', 'TOTAL', 'MIO_EUR', 'BE', None, 0.0, 0.0)]
>>> import eurostat
>>> code = 'GOV_10DD_SLGD'
>>> pars = eurostat.get_pars(code)
>>> pars
['freq', 'na_item', 'sector', 'maturity', 'unit', 'geo']
>>> par_values = eurostat.get_par_values(code, 'geo')
>>> par_values
['BE', 'DE', 'ES', 'AT']
>>> my_filter_pars = {'startPeriod': 2019, 'geo': ['AT','BE']}
>>> data = eurostat.get_data(code, True, filter_pars=my_filter_pars)
>>> data[0]
('freq', 'na_item', 'sector', 'maturity', 'unit', 'geo\\TIME_PERIOD', '2019_value', '2019_flag', '2020_value', '2020_flag', '2021_value', '2021_flag')
>>> data[446:448]
[('A', 'GD', 'S1_S2', 'Y_LT1', 'PC_TOT', 'AT', None, ':', 1.0, '', 0.9, ''),
('A', 'F22', 'S1_S2', 'TOTAL', 'MIO_EUR', 'BE', None, ':', 0.0, '', 0.0, '')]
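Before sending a request, it can help to check a filter dictionary against the parameter names and values reported for the dataset. The following sketch does this offline: check_filter_pars is a hypothetical helper, not a package function, and the sample lists are copied from the examples above.

```python
def check_filter_pars(filter_pars, pars, par_values):
    """Hypothetical helper: return a list of problems found in filter_pars.

    pars: parameter names, as returned by eurostat.get_pars(code).
    par_values: dict mapping a parameter name to its allowed values,
        e.g. built from eurostat.get_par_values(code, par).
    """
    problems = []
    for key, wanted in filter_pars.items():
        if key in ('startPeriod', 'endPeriod'):
            continue  # period bounds are not dataset parameters
        if key not in pars:
            problems.append('unknown parameter: %s' % key)
            continue
        allowed = par_values.get(key)
        if allowed is None:
            continue  # no value list supplied for this parameter
        # Accept either a single value or a list of values.
        items = wanted if isinstance(wanted, list) else [wanted]
        for item in items:
            if item not in allowed:
                problems.append('unknown value for %s: %s' % (key, item))
    return problems


pars = ['freq', 'na_item', 'sector', 'maturity', 'unit', 'geo']
par_values = {'geo': ['BE', 'DE', 'ES', 'AT']}
ok = check_filter_pars({'startPeriod': 2019, 'geo': ['AT', 'BE']}, pars, par_values)
bad = check_filter_pars({'geo': ['AT', 'FR'], 'sex': 'T'}, pars, par_values)
print(ok)   # []
print(bad)  # ['unknown value for geo: FR', 'unknown parameter: sex']
```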
eurostat.get_data_df(code, [flags=False], [filter_pars=None], [verbose=False], [reverse_time=False])
Read a Eurostat dataset and return it as a pandas dataframe.
To get a subset, set filter_pars (a dictionary whose keys are parameter names and whose values are the wanted items).
To see a rough progress status, set verbose=True.
flags=True also downloads the flags associated with the data. Note that the data format differs depending on whether flags is True. Flag meanings can be found here.
reverse_time=True reverses the order of the time columns, for compatibility with 0.x.x versions.
>>> import eurostat
>>> data = eurostat.get_data_df('GOV_10DD_SLGD')
>>> data
freq na_item sector maturity ... 2018 2019 2020 2021
0 A F22 S1_S2 TOTAL ... NaN NaN 0.0 0.0
1 A F22 S1_S2 TOTAL ... NaN NaN 0.0 0.0
2 A F22 S1_S2 TOTAL ... 0.0 0.0 0.0 0.0
3 A F22 S1_S2 TOTAL ... NaN NaN 0.0 0.0
4 A F22 S1_S2 TOTAL ... NaN NaN 0.0 0.0
... ... ... ... ... ... ... ... ...
1181 A GD S1_S2 Y_LT1 ... NaN NaN 2849.0 2408.2
1182 A GD S1_S2 Y_LT1 ... NaN NaN 1.0 0.9
1183 A GD S1_S2 Y_LT1 ... NaN NaN 6.9 5.4
1184 A GD S1_S2 Y_LT1 ... 4.9 6.0 5.9 5.5
1185 A GD S1_S2 Y_LT1 ... NaN NaN 0.9 0.8
>>> import eurostat
>>> data = eurostat.get_data_df('GOV_10DD_SLGD', True)
>>> data
freq na_item sector maturity ... 2020_value 2020_flag 2021_value 2021_flag
0 A F22 S1_S2 TOTAL ... 0.0 : 0.0
1 A F22 S1_S2 TOTAL ... 0.0 0.0
2 A F22 S1_S2 TOTAL ... 0.0 : 0.0
3 A F22 S1_S2 TOTAL ... 0.0 0.0
4 A F22 S1_S2 TOTAL ... 0.0 0.0
... ... ... ... ... ... ... ... ...
1182 A GD S1_S2 Y_LT1 ... 1.0 0.9
1183 A GD S1_S2 Y_LT1 ... 6.9 5.4
1184 A GD S1_S2 Y_LT1 ... 5.9 5.5
1185 A GD S1_S2 Y_LT1 ... 0.9 : 0.8
1186 A GD S1_S2 Y_LT1 ... 0.9 0.8
To download a subset, you need to set the filter_pars dictionary. Its keys can be 'startPeriod', 'endPeriod' and any parameter you get with eurostat.get_pars(). Values can be a number, a string or a list, and are generally taken from the output of eurostat.get_par_values().
>>> import eurostat
>>> code = 'GOV_10DD_SLGD'
>>> pars = eurostat.get_pars(code)
>>> pars
['freq', 'na_item', 'sector', 'maturity', 'unit', 'geo']
>>> par_values = eurostat.get_par_values(code, 'geo')
>>> par_values
['BE', 'DE', 'ES', 'AT']
>>> my_filter_pars = {'endPeriod': 2020, 'geo': ['AT','BE']}
>>> data = eurostat.get_data_df(code, filter_pars=my_filter_pars)
>>> data
freq na_item sector maturity unit geo\TIME_PERIOD 2018 2019 2020
0 A F22 S1_S2 TOTAL MIO_EUR AT NaN NaN 0.0
1 A F22 S1_S2 TOTAL MIO_NAC AT NaN NaN 0.0
2 A F22 S1_S2 Y_LT1 MIO_EUR AT NaN NaN 0.0
3 A F22 S1_S2 Y_LT1 MIO_NAC AT NaN NaN 0.0
4 A F29 S1_S2 TOTAL MIO_EUR AT NaN NaN 0.0
.. ... ... ... ... ... ... ... ... ...
633 A GD S1_S2 Y_GT1 MIO_NAC BE NaN NaN 72660.0
634 A GD S1_S2 Y_GT1 PC_TOT BE NaN NaN 93.1
635 A GD S1_S2 Y_LT1 MIO_EUR BE NaN NaN 5387.0
636 A GD S1_S2 Y_LT1 MIO_NAC BE NaN NaN 5387.0
637 A GD S1_S2 Y_LT1 PC_TOT BE NaN NaN 6.9
>>> import eurostat
>>> code = 'GOV_10DD_SLGD'
>>> pars = eurostat.get_pars(code)
>>> pars
['freq', 'na_item', 'sector', 'maturity', 'unit', 'geo']
>>> par_values = eurostat.get_par_values(code, 'geo')
>>> par_values
['BE', 'DE', 'ES', 'AT']
>>> my_filter_pars = {'endPeriod': 2020, 'geo': ['AT','BE']}
>>> data = eurostat.get_data_df(code, True, filter_pars=my_filter_pars)
>>> data
freq na_item sector maturity ... 2019_value 2019_flag 2020_value 2020_flag
0 A F22 S1_S2 TOTAL ... NaN : 0.0
1 A F22 S1_S2 TOTAL ... NaN : 0.0
2 A F22 S1_S2 Y_LT1 ... NaN : 0.0
3 A F22 S1_S2 Y_LT1 ... NaN : 0.0
4 A F29 S1_S2 TOTAL ... NaN : 0.0
.. ... ... ... ... ... ... ... ... ...
635 A GD S1_S2 Y_GT1 ... NaN : 93.1
636 A GD S1_S2 Y_LT1 ... NaN : 5387.0
637 A GD S1_S2 Y_LT1 ... NaN : 5387.0
638 A GD S1_S2 Y_LT1 ... NaN : 6.9
639 A GD S1_S2 Y_LT1 ... NaN : 6.9
You can configure the https proxy with setproxy, or with set_requests_args (the latter is described in the next section).
eurostat.setproxy(proxyinfo)
It takes proxyinfo as input: a dictionary with one key ('https') whose value is a list containing the connection parameters (username, password and proxy address, in that order).
If authentication is not needed, set username and password to None.
It overwrites any proxy setting made by a previous call to set_requests_args.
For the Eurostat API, only the https proxy is used.
>>> import eurostat
>>> proxyinfo = {'https': ['myuser', 'mypassword', 'http://url:port']}
>>> eurostat.setproxy(proxyinfo)
It always returns None. To see your settings, use the function get_requests_args.
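The proxyinfo dictionary can also be assembled by a small function, which keeps the element order in one place and covers the case without authentication. This is a sketch only: make_proxyinfo is a hypothetical helper, and the element order follows the example above.

```python
def make_proxyinfo(url, username=None, password=None):
    """Hypothetical helper: build the dictionary expected by eurostat.setproxy."""
    # Order as in the example above: username, password, proxy address.
    return {'https': [username, password, url]}


with_auth = make_proxyinfo('http://url:port', 'myuser', 'mypassword')
no_auth = make_proxyinfo('http://url:port')
print(with_auth)  # {'https': ['myuser', 'mypassword', 'http://url:port']}
print(no_auth)    # {'https': [None, None, 'http://url:port']}
# In practice: eurostat.setproxy(with_auth)
```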
You may need to modify the default download settings. The Eurostat package uses the requests package and lets you set some of its arguments:
- timeout: how long to wait for the server before raising an error, in seconds. Default is 120 seconds.
- proxies: sets the proxies. It overwrites any proxy setting made by a previous call to setproxy. For the Eurostat API, only the https proxy is used. Default is None (the optional argument is not passed to the request).
- verify: whether to verify the server's TLS certificate, or the CA bundle to use. Default is None (the optional argument is not passed to the request).
- cert: whether to use an SSL client certificate file. Default is None (the optional argument is not passed to the request).
eurostat.set_requests_args([timeout=120.], [proxies=None], [verify=None], [cert=None])
It returns None.
For detailed information, please refer to the documentation of the package requests.
>>> import eurostat
>>> mytimeout = 240.
>>> myproxy = {'https': 'http://myuser:[email protected]:1234'}
>>> eurostat.set_requests_args(timeout=mytimeout, proxies=myproxy)
To check the settings:
eurostat.get_requests_args()
It returns a dictionary with the argument names and their respective values, exactly as they are passed to the request.
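When the proxies argument carries credentials inside the proxy URL, characters such as '@' or ':' in the username or password usually need to be percent-encoded. The sketch below builds such a dictionary with the standard library. make_https_proxy is a hypothetical helper, and the host name, port and credentials are placeholders.

```python
from urllib.parse import quote


def make_https_proxy(host, port, username=None, password=None):
    """Hypothetical helper: build a requests-style proxies dictionary."""
    auth = ''
    if username is not None:
        # Percent-encode credentials so that '@' and ':' cannot break the URL.
        auth = quote(username, safe='')
        if password is not None:
            auth += ':' + quote(password, safe='')
        auth += '@'
    return {'https': 'http://%s%s:%d' % (auth, host, port)}


# Placeholder values, for illustration only.
proxies = make_https_proxy('proxy.example.com', 1234, 'myuser', 'p@ss:word')
print(proxies)
# {'https': 'http://myuser:p%40ss%3Aword@proxy.example.com:1234'}
# In practice: eurostat.set_requests_args(proxies=proxies)
```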
Note that a caching call checks the last update date, so your data will be downloaded again whenever Eurostat updates the dataset. This also means that reproducibility is not guaranteed (but you can get past results with joblib if you haven't cleared your cache).
For a temporary cache (your coding session), you can use:
import tempfile
import eurostat
from joblib import Memory
tempdir = tempfile.mkdtemp()
memory = Memory(tempdir)
eurostat.get_data(code, cache=memory.cache)
For a permanent cache (on your hard disk), you can use:
import eurostat
from joblib import Memory
memory = Memory(your_joblib_cache_directory)
eurostat.get_data(code, cache=memory.cache)
Please open an issue or send a message to noemi.cazzaniga [at] polimi.it.
Download and usage of Eurostat data is subject to Eurostat's general copyright notice and licence policy (see Policies). Please also be aware of the European Commission's general conditions.
- Eurostat database: online catalog.
- Eurostat classifications: Access to classifications.
- Eurostat Interactive Data Browser: Data Browser.
- Eurostat Interactive Tool for Comext Data: Easy Comext.
- Eurostat PRODCOM website: PRODCOM.
- Eurostat acronyms: Symbols and abbreviations.
- Eurostat web services description: Web services.
- Python package pandas: Python Data Analysis Library.
- Python package requests: HTTP Library for Python.
- R package eurostat: R Tools for Eurostat Open Data.
- Bug fix: async download.
- set_requests_args, get_requests_args added.
- get_dic gives also dims dictionary.
- Added dataframe output from get_dic.
- get_toc of single dataset.
- Enhanced localisation.
- Changed input of setproxy to read the full address.
- Bug fix: proxy setting.
- Bug fix: proxy setting. Deprecated.
- Bug fix (wheel for conda venv).
- Adapted to the new Eurostat API (SDMX 2.1).
- pandasdmx not required anymore.
- Error messages improved.
- Compiled also for conda install.
- Multilingual dictionary.
- Internal bug fix.
- Bug fix (sdmx non-annual data). Deprecated.
- Bug fix (pandasdmx 0.9).
- Improved SDMX download capability in case of slow internet connections.
- Bug fix (proxy info).
- get_avail_sdmx, get_avail_sdmx_df, subset_avail_sdmx_df added.
- Added support to proxy.
- Bug fix (non-annual data headers).
- Added possibility of downloading flags.
- get_toc_df, subset_toc_df added.
- First official release.
- joblib.Memory data caching implemented by Nicolas Graves (CIRED, [email protected])