📊 famines: add new wpf dataset (#3894)
1 parent d7608da, commit b20be74
Showing 21 changed files with 1,651 additions and 26 deletions.
61 changes: 61 additions & 0 deletions
etl/steps/data/garden/wpf/2025-01-17/famines.countries.json
@@ -0,0 +1,61 @@
{
  "Armenia": "Armenia",
  "Bangladesh": "Bangladesh",
  "Brazil": "Brazil",
  "CAR": "Central African Republic",
  "Cambodia": "Cambodia",
  "China": "China",
  "Cuba": "Cuba",
  "East Timor": "East Timor",
  "Ethiopia": "Ethiopia",
  "Germany": "Germany",
  "India": "India",
  "Mozambique": "Mozambique",
  "Nigeria": "Nigeria",
  "North Korea": "North Korea",
  "Philippines": "Philippines",
  "Poland": "Poland",
  "Russia": "Russia",
  "Somalia": "Somalia",
  "South Sudan": "South Sudan",
  "Spain": "Spain",
  "Sudan": "Sudan",
  "Syria": "Syria",
  "Uganda": "Uganda",
  "Vietnam": "Vietnam",
  "Yemen": "Yemen",
  "Austria-Hungary (Poland)": "Poland",
  "Somaliland, African Red Sea Region": "Somaliland, African Red Sea Region",
  "Congo Free State (Democratic Republic of Congo)": "Democratic Republic of Congo",
  "DRC": "Democratic Republic of Congo",
  "East Africa (Kenya, Uganda, Tanzania)": "Kenya, Uganda, Tanzania",
  "Eastern Europe": "Eastern Europe",
  "German East Africa (Tanzania, Mozambique, Rwanda, Burundi)": "Tanzania, Mozambique, Rwanda, Burundi",
  "Germany/USSR": "Germany, USSR",
  "Greater Syria": "Syria, Lebanon, Israel",
  "India (India, West Bengal, Bangladesh)": "India, Bangladesh",
  "Nigeria (Biafra)": "Nigeria",
  "Ottoman Empire (Turkey)": "Turkey",
  "Ottoman Empire (Turkey, Armenians)": "Turkey, Armenians",
  "Ottoman Empire (Turkey, Iraq, Iran, Syria)": "Turkey, Iraq, Iran, Syria",
  "Persia": "Iran",
  "Persia (Iran)": "Iran",
  "Sahel (Mauritania, Mali, Niger)": "Mauritania, Mali, Niger",
  "Sahel (Upper Senegal and Niger (contemporary Burkina Faso and Mali), the Military Territory of Niger, and Chad)": "Senegal, Burkina Faso, Mali, Niger, Chad",
  "Serbia and the Balkans": "Serbia, Albania, Bosnia and Herzegovina, Bulgaria, Greece, Kosovo, Montenegro, North Macedonia, Romania, Croatia, Slovenia",
  "Sudan\n(South Sudan)": "South Sudan",
  "Sudan\n(including South Sudan)": "Sudan",
  "Tanganyika (German East Africa, Tanzania)": "Tanzania",
  "USSR (Moldova, Ukraine, Russia, Belarus)": "Moldova, Ukraine, Russia, Belarus",
  "USSR (Russia and Western Soviet States)": "Russia, Western Soviet States",
  "USSR (Russia)": "Russia",
  "USSR (Kazakhstan)": "Kazakhstan",
  "USSR (Ukraine)": "Ukraine",
  "USSR (Southern Russia)": "Russia",
  "USSR (southern Russia & Ukraine)": "Russia, Ukraine",
  "USSR": "USSR",
  "Indonesia": "Indonesia",
  "Greece": "Greece",
  "East Asia": "Japan",
  "German occupied USSR ": "USSR"
}
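The mapping above pairs each place name as written in the source with a harmonized name. The following is a minimal sketch of how such a mapping can be applied, using only the standard library. It is not the implementation of `geo.harmonize_countries`; the `harmonize` helper and the `"Atlantis"` input are hypothetical and exist only for illustration.

```python
import json

# Excerpt of the mapping above: keys are names as written in the source,
# values are the harmonized names. The "\\n" becomes a JSON "\n" escape.
MAPPING_JSON = """
{
  "CAR": "Central African Republic",
  "DRC": "Democratic Republic of Congo",
  "Persia (Iran)": "Iran",
  "Sudan\\n(South Sudan)": "South Sudan"
}
"""


def harmonize(names, mapping):
    """Hypothetical helper: map names and collect those missing from the mapping."""
    harmonized, unknown = [], []
    for name in names:
        if name in mapping:
            harmonized.append(mapping[name])
        else:
            # Keep the original name so the gap is visible downstream.
            harmonized.append(name)
            unknown.append(name)
    return harmonized, unknown


mapping = json.loads(MAPPING_JSON)
result, unknown = harmonize(
    ["CAR", "Persia (Iran)", "Sudan\n(South Sudan)", "Atlantis"], mapping
)
print(result)
print(unknown)
```

Keys are matched exactly, so embedded newlines (as in the two `Sudan\n(...)` keys) and trailing spaces (as in `"German occupied USSR "`) must match the source strings character for character.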
@@ -0,0 +1,45 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
  common:
    description_short: Famines that are estimated to have killed 100,000 people or more.
    presentation:
      topic_tags:
        - Famines


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
  update_period_days: 365
  title: Famine deaths by region


tables:
  famines:
    variables:
      region:
        title: Region
        unit: ''
        description_short: Region where the famine occurred.

      wpf_authoritative_mortality_estimate:
        title: Deaths from famines
        unit: 'deaths'
        description_short: Deaths in famines that are estimated to have killed 100,000 people or more.
        description_key:
          - WPF defines a famine as mass mortality due to mass starvation, with mass starvation being the "destruction, deprivation or loss of objects and activities required for survival".
        description_from_producer: |-
          Famines are assessed based on severity, magnitude, and duration. Magnitude, measured as the total number of excess deaths, was used to determine inclusion in the catalogue. A threshold of 100,000 deaths was applied due to limited demographic research on proportional death rate increases.
        display:
          numDecimalPlaces: 0

      principal_cause:
        title: Principal cause
        unit: ''
        description_key:
          - Famines were classified into four main triggers - adverse climate, government policies, armed conflict, or genocide - though in reality, multiple factors, especially human decisions, almost always play a significant role in their development and severity.
          - Historical examples demonstrate this complexity, such as when El Niño-related famines in the late 19th century were made worse by imperial conquest, and when the 1984-85 Sudan famine, initially triggered by drought, was intensified by exploitative politics.
          - The Ukrainian Holodomor (1931-1933) is a subject of some controversy, with interpretations divided between Stalin’s genocidal intent and Soviet claims of unintentional policy failures; however, most scholars now classify it as genocide.
          - The Darfur crisis (2003-2005) also faced initial controversy before achieving scholarly consensus as genocide.
          - More recently, the Tigray famine (2020-2022) has been categorized as "armed conflict," though ongoing research may shift its classification.
          - The classification system continues to evolve as new research emerges and experts provide additional insights.
@@ -0,0 +1,119 @@
"""Load a meadow dataset and create a garden dataset."""

from owid.catalog import Dataset, Table

from etl.data_helpers import geo
from etl.helpers import PathFinder, create_dataset

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)

# Regions for which aggregates will be created.
REGIONS = ["North America", "South America", "Europe", "Africa", "Asia", "Oceania"]

# Custom regions for specific places where famines occurred
CUSTOM_REGION_DICT = {
    "Persia": "Asia",
    "Congo Free State": "Africa",
    "Sudan, Ethiopia": "Africa",
    "Ottoman Empire": "Asia",
    "East Africa": "Africa",
    "Somaliland": "Africa",
    "African Red Sea Region": "Africa",
    "Sahel": "Africa",
    "German East Africa": "Africa",
    "Serbia, Balkans": "Europe",
    "Greater Syria": "Asia",
    "Russia, Ukraine": "Asia",
    "USSR (Southern Russia & Ukraine)": "Asia",
    "Russia, Kazakhstan": "Asia",
    "Germany, USSR": "Asia",
    "East Asia": "Asia",
    "India, Bangladesh": "Asia",
    "Eastern Europe": "Europe",
    "USSR": "Asia",
    "Somaliland, African Red Sea Region": "Africa",
    "USSR (Kazakhstan)": "Asia",
    "USSR (Southern Russia)": "Asia",
    "German occupied USSR ": "Asia",
    "Poland (ghettos and concentration camps)": "Europe",
}


def run(dest_dir: str) -> None:
    #
    # Load inputs.
    #
    # Load meadow dataset.
    ds_meadow = paths.load_dataset("famines")

    # Read regions
    ds_regions = paths.load_dataset("regions")

    # Read table from meadow dataset.
    tb = ds_meadow.read("famines")

    #
    # Process data.
    #
    tb = geo.harmonize_countries(df=tb, countries_file=paths.country_mapping_path)

    # Add regions to the table.
    tb = add_regions(tb, ds_regions)

    # Ensure there are no NaNs in the 'region' column
    assert not tb["region"].isna().any(), "There are NaN values in the 'region' column"

    # Split and convert the 'date' column to lists of integers
    tb["date"] = tb["date"].astype(str)
    tb["date_list"] = tb["date"].apply(lambda x: [int(year) for year in x.split(",")])

    # Create a new column 'date_range' with the minimum and maximum years
    tb["date_range"] = tb["date_list"].apply(lambda x: f"{min(x)}" if min(x) == max(x) else f"{min(x)}-{max(x)}")
    tb["simplified_place"] = tb["simplified_place"].astype(str)

    # Create a new column with famine names that combines dates and simplified places
    tb["famine_name"] = tb["simplified_place"] + " " + tb["date_range"]

    # Rename the cause from "Natural calamity" to "Adverse climate" (as suggested by the source)
    tb["principal_cause"] = tb["principal_cause"].str.replace("Natural calamity", "Adverse climate")

    # Add origins metadata to new columns.
    for col in [
        "wpf_authoritative_mortality_estimate",
        "famine_name",
    ]:
        tb[col].metadata.origins = tb["simplified_place"].metadata.origins

    # Drop columns that are not needed.
    tb = tb.drop(columns=["date_list", "date_range", "simplified_place"])
    tb = tb.format(["famine_name", "date"])

    #
    # Save outputs.
    #
    # Create a new garden dataset with the same metadata as the meadow dataset.
    ds_garden = create_dataset(
        dest_dir, tables=[tb], check_variables_metadata=True, default_metadata=ds_meadow.metadata
    )

    # Save changes in the new garden dataset.
    ds_garden.save()


def add_regions(tb: Table, ds_regions: Dataset) -> Table:
    """
    Add regions to the famine data table.
    """
    # First assign custom regions
    tb["region"] = tb["simplified_place"].map(CUSTOM_REGION_DICT)

    # Add the rest as usual
    for region in REGIONS:
        # List of countries in region.
        countries_in_region = geo.list_members_of_region(region=region, ds_regions=ds_regions)

        # Add region to the table.
        tb.loc[tb["simplified_place"].isin(countries_in_region), "region"] = region

    return tb
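The two transformations in this step, building `famine_name` from a comma-separated `date` string and assigning a region with custom entries first and standard region membership second, can be sketched in plain Python without the ETL dependencies. This is an illustration under stated assumptions, not the step's code: the helper names and the sample places, dates, and region members are hypothetical.

```python
def date_range(date: str) -> str:
    """Collapse a comma-separated list of years into 'YYYY' or 'YYYY-YYYY'."""
    years = [int(year) for year in date.split(",")]
    first, last = min(years), max(years)
    return f"{first}" if first == last else f"{first}-{last}"


def famine_name(place: str, date: str) -> str:
    """Combine a place and its year range, mirroring the 'famine_name' column."""
    return f"{place} {date_range(date)}"


def assign_region(place, custom_regions, members_by_region):
    """Custom mapping first; standard region membership then overrides it."""
    region = custom_regions.get(place)
    for name, members in members_by_region.items():
        if place in members:
            region = name
    return region


# Hypothetical sample data, for illustration only.
custom = {"Persia": "Asia", "Sahel": "Africa"}
members = {"Asia": {"India", "China"}, "Africa": {"Ethiopia"}}

print(famine_name("India", "1876,1877,1878,1879"))
print(assign_region("Persia", custom, members))
print(assign_region("Atlantis", custom, members))
```

A place found in neither the custom mapping nor any standard region yields `None` here, which corresponds to the NaN case that the assertion on the `region` column in `run` guards against.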
32 changes: 32 additions & 0 deletions
etl/steps/data/garden/wpf/2025-01-17/famines_by_place.meta.yml
@@ -0,0 +1,32 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
  common:
    display:
      numDecimalPlaces: 0
    description_short: Deaths in famines that are estimated to have killed 100,000 people or more.
    description_key:
      - WPF defines a famine as mass mortality due to mass starvation, with mass starvation being the "destruction, deprivation or loss of objects and activities required for survival".
    description_processing: The deaths were assumed to be evenly distributed over the duration of each famine, except for the famine in China between 1958 and 1962, where the source provides a year-by-year breakdown of mortality.
    description_from_producer: |-
      Famines are assessed based on severity, magnitude, and duration. Magnitude, measured as the total number of excess deaths, was used to determine inclusion in the catalogue. A threshold of 100,000 deaths was applied due to limited demographic research on proportional death rate increases.
    presentation:
      topic_tags:
        - Famines



# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
  update_period_days: 365
  title: Deaths from famines by top countries and by decade

tables:
  famines_by_place:
    variables:
      decadal_famine_deaths:
        title: Deaths from famines by top countries by decade
        unit: 'deaths'
        presentation:
          grapher_config:
            note: Decadal figures represent data averaged over each ten-year period (e.g., 1990–1999 for the 1990s). The 2020s figure is provisional and includes data only up to and including 2023.
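The `description_processing` field says deaths are assumed to be spread evenly over each famine's duration. The code for the `famines_by_place` step is not shown here, so the following is only a sketch of that assumption with hypothetical helper names and made-up numbers. It sums the per-year shares into decades, whereas the chart note describes the decadal figures as averaged, so the actual step may aggregate differently. It also does not handle the China 1958-1962 exception, for which the source gives a year-by-year breakdown.

```python
from collections import defaultdict


def spread_evenly(total_deaths, years):
    """Assign an equal share of a famine's deaths to each of its years."""
    per_year = total_deaths / len(years)
    return {year: per_year for year in years}


def deaths_by_decade(famines):
    """Sum the evenly spread yearly deaths into decades (1980, 1990, ...)."""
    totals = defaultdict(float)
    for total_deaths, years in famines:
        for year, deaths in spread_evenly(total_deaths, years).items():
            decade = (year // 10) * 10
            totals[decade] += deaths
    return dict(totals)


# Hypothetical famine: 300,000 deaths spread over 1988-1990.
example = [(300_000, [1988, 1989, 1990])]
print(deaths_by_decade(example))
```

With this allocation, a famine that straddles a decade boundary contributes to both decades in proportion to the number of years it spent in each.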