📊 famines: add new wpf dataset (#3894)
veronikasamborska1994 authored Jan 29, 2025
1 parent d7608da commit b20be74
Showing 21 changed files with 1,651 additions and 26 deletions.
33 changes: 33 additions & 0 deletions dag/archive/main.yml
@@ -344,6 +344,39 @@ steps:
- data-private://meadow/antibiotics/2024-12-02/total_pathogen_bloodstream_amr
data-private://grapher/antibiotics/2024-12-02/total_pathogen_bloodstream_amr:
- data-private://garden/antibiotics/2024-12-02/total_pathogen_bloodstream_amr

# World Peace Foundation - Famines
data://meadow/wpf/2024-10-03/famines:
- snapshot://wpf/2024-10-03/famines.xlsx
data://garden/wpf/2024-10-03/famines:
- data://meadow/wpf/2024-10-03/famines
- data://garden/regions/2023-01-01/regions

data://grapher/wpf/2024-10-03/famines:
- data://garden/wpf/2024-10-03/famines
data://garden/wpf/2024-10-03/total_famines_by_year_decade:
- data://garden/wpf/2024-10-03/famines
- data://garden/demography/2024-07-15/population
data://grapher/wpf/2024-10-03/total_famines_by_year_decade:
- data://garden/wpf/2024-10-03/total_famines_by_year_decade

data://garden/wpf/2024-10-03/famines_by_regime_gdp:
- data://garden/wpf/2024-10-03/famines
- data://garden/democracy/2024-03-07/vdem
- data://garden/ggdc/2024-04-26/maddison_project_database
data://grapher/wpf/2024-10-03/famines_by_regime_gdp:
- data://garden/wpf/2024-10-03/famines_by_regime_gdp

data://garden/wpf/2024-10-03/famines_by_factor:
- data://garden/wpf/2024-10-03/famines
data://grapher/wpf/2024-10-03/famines_by_factor:
- data://garden/wpf/2024-10-03/famines_by_factor

data://garden/wpf/2024-10-03/famines_by_place:
- data://garden/wpf/2024-10-03/famines
data://grapher/wpf/2024-10-03/famines_by_place:
- data://garden/wpf/2024-10-03/famines_by_place

include:
# Include all active steps plus all archive steps.
- dag/main.yml
52 changes: 26 additions & 26 deletions dag/main.yml
@@ -681,37 +681,37 @@ steps:
data://grapher/un/2024-10-21/census_dates:
- data://garden/un/2024-10-21/census_dates

# World Peace Foundation - Famines
data://meadow/wpf/2024-10-03/famines:
- snapshot://wpf/2024-10-03/famines.xlsx
data://garden/wpf/2024-10-03/famines:
- data://meadow/wpf/2024-10-03/famines
# World Peace Foundation - Famines (2025)
data://meadow/wpf/2025-01-17/famines:
- snapshot://wpf/2025-01-17/famines.xlsx
data://garden/wpf/2025-01-17/famines:
- data://meadow/wpf/2025-01-17/famines
- data://garden/regions/2023-01-01/regions

data://grapher/wpf/2024-10-03/famines:
- data://garden/wpf/2024-10-03/famines
data://garden/wpf/2024-10-03/total_famines_by_year_decade:
- data://garden/wpf/2024-10-03/famines
data://grapher/wpf/2025-01-17/famines:
- data://garden/wpf/2025-01-17/famines
data://garden/wpf/2025-01-17/total_famines_by_year_decade:
- data://garden/wpf/2025-01-17/famines
- data://garden/demography/2024-07-15/population
data://grapher/wpf/2024-10-03/total_famines_by_year_decade:
- data://garden/wpf/2024-10-03/total_famines_by_year_decade
data://grapher/wpf/2025-01-17/total_famines_by_year_decade:
- data://garden/wpf/2025-01-17/total_famines_by_year_decade

data://garden/wpf/2024-10-03/famines_by_regime_gdp:
- data://garden/wpf/2024-10-03/famines
data://garden/wpf/2025-01-17/famines_by_regime_gdp_population:
- data://garden/wpf/2025-01-17/famines
- data://garden/democracy/2024-03-07/vdem
- data://garden/ggdc/2024-04-26/maddison_project_database
data://grapher/wpf/2024-10-03/famines_by_regime_gdp:
- data://garden/wpf/2024-10-03/famines_by_regime_gdp

data://garden/wpf/2024-10-03/famines_by_factor:
- data://garden/wpf/2024-10-03/famines
data://grapher/wpf/2024-10-03/famines_by_factor:
- data://garden/wpf/2024-10-03/famines_by_factor

data://garden/wpf/2024-10-03/famines_by_place:
- data://garden/wpf/2024-10-03/famines
data://grapher/wpf/2024-10-03/famines_by_place:
- data://garden/wpf/2024-10-03/famines_by_place
- data://garden/demography/2024-07-15/population
data://grapher/wpf/2025-01-17/famines_by_regime_gdp_population:
- data://garden/wpf/2025-01-17/famines_by_regime_gdp_population

data://garden/wpf/2025-01-17/famines_by_place:
- data://garden/wpf/2025-01-17/famines
data://grapher/wpf/2025-01-17/famines_by_place:
- data://garden/wpf/2025-01-17/famines_by_place

data://garden/wpf/2025-01-17/famines_by_trigger:
- data://garden/wpf/2025-01-17/famines
data://grapher/wpf/2025-01-17/famines_by_trigger:
- data://garden/wpf/2025-01-17/famines_by_trigger

data-private://meadow/owid/latest/ig_countries:
- snapshot-private://owid/latest/ig_countries.csv
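The step entries above form a dependency graph: each key is a step, and each list item under it is a step it depends on. As a rough illustration only (this is not the ETL's own scheduler), the sketch below copies a few of the 2025-01-17 entries by hand into a plain dictionary, called `DAG` here purely for illustration, and uses the standard library's `graphlib.TopologicalSorter` to derive one valid execution order. The helper name `execution_order` is hypothetical.

```python
from graphlib import TopologicalSorter

# A hand-copied subset of the steps listed in dag/main.yml (assumption: the
# real file is YAML and contains many more entries).
# Keys are steps; values are the steps they depend on.
DAG = {
    "data://meadow/wpf/2025-01-17/famines": [
        "snapshot://wpf/2025-01-17/famines.xlsx",
    ],
    "data://garden/wpf/2025-01-17/famines": [
        "data://meadow/wpf/2025-01-17/famines",
        "data://garden/regions/2023-01-01/regions",
    ],
    "data://grapher/wpf/2025-01-17/famines": [
        "data://garden/wpf/2025-01-17/famines",
    ],
    "data://garden/wpf/2025-01-17/famines_by_place": [
        "data://garden/wpf/2025-01-17/famines",
    ],
    "data://grapher/wpf/2025-01-17/famines_by_place": [
        "data://garden/wpf/2025-01-17/famines_by_place",
    ],
}


def execution_order(dag):
    """Return the steps in an order where every dependency comes first."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(DAG)
for step in order:
    print(step)
```

Any order in which the snapshot precedes the meadow step, the meadow step precedes the garden step, and each garden step precedes its grapher step is valid; the exact order among independent steps is not fixed.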
61 changes: 61 additions & 0 deletions etl/steps/data/garden/wpf/2025-01-17/famines.countries.json
@@ -0,0 +1,61 @@
{
"Armenia": "Armenia",
"Bangladesh": "Bangladesh",
"Brazil": "Brazil",
"CAR": "Central African Republic",
"Cambodia": "Cambodia",
"China": "China",
"Cuba": "Cuba",
"East Timor": "East Timor",
"Ethiopia": "Ethiopia",
"Germany": "Germany",
"India": "India",
"Mozambique": "Mozambique",
"Nigeria": "Nigeria",
"North Korea": "North Korea",
"Philippines": "Philippines",
"Poland": "Poland",
"Russia": "Russia",
"Somalia": "Somalia",
"South Sudan": "South Sudan",
"Spain": "Spain",
"Sudan": "Sudan",
"Syria": "Syria",
"Uganda": "Uganda",
"Vietnam": "Vietnam",
"Yemen": "Yemen",
"Austria-Hungary (Poland)": "Poland",
"Somaliland, African Red Sea Region": "Somaliland, African Red Sea Region",
"Congo Free State (Democratic Republic of Congo)": "Democratic Republic of Congo",
"DRC": "Democratic Republic of Congo",
"East Africa (Kenya, Uganda, Tanzania)": "Kenya, Uganda, Tanzania",
"Eastern Europe": "Eastern Europe",
"German East Africa (Tanzania, Mozambique, Rwanda, Burundi)": "Tanzania, Mozambique, Rwanda, Burundi",
"Germany/USSR": "Germany, USSR",
"Greater Syria": "Syria, Lebanon, Israel",
"India (India, West Bengal, Bangladesh)": "India, Bangladesh",
"Nigeria (Biafra)": "Nigeria",
"Ottoman Empire (Turkey)": "Turkey",
"Ottoman Empire (Turkey, Armenians)": "Turkey, Armenians",
"Ottoman Empire (Turkey, Iraq, Iran, Syria)": "Turkey, Iraq, Iran, Syria",
"Persia": "Iran",
"Persia (Iran)": "Iran",
"Sahel (Mauritania, Mali, Niger)": "Mauritania, Mali, Niger",
"Sahel (Upper Senegal and Niger (contemporary Burkina Faso and Mali), the Military Territory of Niger, and Chad)": "Senegal, Burkina Faso, Mali, Niger, Chad",
"Serbia and the Balkans": "Serbia, Albania, Bosnia and Herzegovina, Bulgaria, Greece, Kosovo, Montenegro, North Macedonia, Romania, Croatia, Slovenia",
"Sudan\n(South Sudan)": "South Sudan",
"Sudan\n(including South Sudan)": "Sudan",
"Tanganyika (German East Africa, Tanzania)": "Tanzania",
"USSR (Moldova, Ukraine, Russia, Belarus)": "Moldova, Ukraine, Russia, Belarus",
"USSR (Russia and Western Soviet States)": "Russia, Western Soviet States",
"USSR (Russia)": "Russia",
"USSR (Kazakhstan)": "Kazakhstan",
"USSR (Ukraine)": "Ukraine",
"USSR (Southern Russia)": "Russia",
"USSR (southern Russia & Ukraine)": "Russia, Ukraine",
"USSR": "USSR",
"Indonesia": "Indonesia",
"Greece": "Greece",
"East Asia": "Japan",
"German occupied USSR ": "USSR"
}
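A mapping file like the one above pairs each place name used by the source with a standardized name. The sketch below shows the general idea with three entries copied from the file. It is an assumption-laden illustration, not the implementation of `geo.harmonize_countries`: the helper `harmonize` is hypothetical, and "Atlantis" is a made-up name included only to show what happens to an unmapped entry.

```python
import json

# Three entries copied from famines.countries.json; the full file has more.
MAPPING_JSON = """
{
  "CAR": "Central African Republic",
  "Persia (Iran)": "Iran",
  "USSR (Ukraine)": "Ukraine"
}
"""


def harmonize(names, mapping):
    """Map source names to standardized names and collect unmapped ones.

    Hypothetical helper: unmapped names are passed through unchanged and
    also returned separately so they can be reviewed.
    """
    harmonized, unmapped = [], []
    for name in names:
        if name in mapping:
            harmonized.append(mapping[name])
        else:
            harmonized.append(name)
            unmapped.append(name)
    return harmonized, unmapped


mapping = json.loads(MAPPING_JSON)
harmonized, unmapped = harmonize(["CAR", "Persia (Iran)", "Atlantis"], mapping)
print(harmonized)
print(unmapped)
```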
45 changes: 45 additions & 0 deletions etl/steps/data/garden/wpf/2025-01-17/famines.meta.yml
@@ -0,0 +1,45 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
  common:
    description_short: Famines that are estimated to have killed 100,000 people or more.
    presentation:
      topic_tags:
        - Famines


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
  update_period_days: 365
  title: Famine deaths by region


tables:
  famines:
    variables:
      region:
        title: Region
        unit: ''
        description_short: Region where the famine occurred.

      wpf_authoritative_mortality_estimate:
        title: Deaths from famines
        unit: 'deaths'
        description_short: Deaths in famines that are estimated to have killed 100,000 people or more.
        description_key:
          - WPF defines a famine as mass mortality due to mass starvation, with mass starvation being the "destruction, deprivation or loss of objects and activities required for survival".
        description_from_producer: |-
          Famines are assessed based on severity, magnitude, and duration. Magnitude, measured as the total number of excess deaths, was used to determine inclusion in the catalogue. A threshold of 100,000 deaths was applied due to limited demographic research on proportional death rate increases.
        display:
          numDecimalPlaces: 0

      principal_cause:
        title: Principal cause
        unit: ''
        description_key:
          - Famines were classified into four main triggers - adverse climate, government policies, armed conflict, or genocide - though in reality, multiple factors, especially human decisions, almost always play a significant role in their development and severity.
          - Historical examples demonstrate this complexity, such as when El Niño-related famines in the late 19th century were made worse by imperial conquest, and when the 1984-85 Sudan famine, initially triggered by drought, was intensified by exploitative politics.
          - The Ukrainian Holodomor (1931-1933) is a subject of some controversy, with interpretations divided between Stalin’s genocidal intent and Soviet claims of unintentional policy failures; however, most scholars now classify it as genocide.
          - The Darfur crisis (2003-2005) also faced initial controversy before achieving scholarly consensus as genocide.
          - More recently, the Tigray famine (2020-2022) has been categorized as "armed conflict," though ongoing research may shift its classification.
          - The classification system continues to evolve as new research emerges and experts provide additional insights.
119 changes: 119 additions & 0 deletions etl/steps/data/garden/wpf/2025-01-17/famines.py
@@ -0,0 +1,119 @@
"""Load a meadow dataset and create a garden dataset."""

from owid.catalog import Dataset, Table

from etl.data_helpers import geo
from etl.helpers import PathFinder, create_dataset

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)

# Regions for which aggregates will be created.
REGIONS = ["North America", "South America", "Europe", "Africa", "Asia", "Oceania"]

# Custom regions for specific places where famines occurred
CUSTOM_REGION_DICT = {
    "Persia": "Asia",
    "Congo Free State": "Africa",
    "Sudan, Ethiopia": "Africa",
    "Ottoman Empire": "Asia",
    "East Africa": "Africa",
    "Somaliland": "Africa",
    "African Red Sea Region": "Africa",
    "Sahel": "Africa",
    "German East Africa": "Africa",
    "Serbia, Balkans": "Europe",
    "Greater Syria": "Asia",
    "Russia, Ukraine": "Asia",
    "USSR (Southern Russia & Ukraine)": "Asia",
    "Russia, Kazakhstan": "Asia",
    "Germany, USSR": "Asia",
    "East Asia": "Asia",
    "India, Bangladesh": "Asia",
    "Eastern Europe": "Europe",
    "USSR": "Asia",
    "Somaliland, African Red Sea Region": "Africa",
    "USSR (Kazakhstan)": "Asia",
    "USSR (Southern Russia)": "Asia",
    "German occupied USSR ": "Asia",
    "Poland (ghettos and concentration camps)": "Europe",
}


def run(dest_dir: str) -> None:
    #
    # Load inputs.
    #
    # Load meadow dataset.
    ds_meadow = paths.load_dataset("famines")

    # Read regions
    ds_regions = paths.load_dataset("regions")

    # Read table from meadow dataset.
    tb = ds_meadow.read("famines")

    #
    # Process data.
    #
    tb = geo.harmonize_countries(df=tb, countries_file=paths.country_mapping_path)

    # Add regions to the table.
    tb = add_regions(tb, ds_regions)

    # Ensure there are no NaNs in the 'region' column
    assert not tb["region"].isna().any(), "There are NaN values in the 'region' column"

    # Split and convert the 'date' column to lists of integers
    tb["date"] = tb["date"].astype(str)
    tb["date_list"] = tb["date"].apply(lambda x: [int(year) for year in x.split(",")])

    # Create a new column 'date_range' with the minimum and maximum years
    tb["date_range"] = tb["date_list"].apply(lambda x: f"{min(x)}" if min(x) == max(x) else f"{min(x)}-{max(x)}")
    tb["simplified_place"] = tb["simplified_place"].astype(str)

    # Create a new column with famine names that combines dates and simplified places
    tb["famine_name"] = tb["simplified_place"] + " " + tb["date_range"]

    # Rename the cause "Natural calamity" to "Adverse climate" (as suggested by the source)
    tb["principal_cause"] = tb["principal_cause"].str.replace("Natural calamity", "Adverse climate")

    # Add origins metadata to new columns.
    for col in [
        "wpf_authoritative_mortality_estimate",
        "famine_name",
    ]:
        tb[col].metadata.origins = tb["simplified_place"].metadata.origins

    # Drop columns that are not needed.
    tb = tb.drop(columns=["date_list", "date_range", "simplified_place"])
    tb = tb.format(["famine_name", "date"])

    #
    # Save outputs.
    #
    # Create a new garden dataset with the same metadata as the meadow dataset.
    ds_garden = create_dataset(
        dest_dir, tables=[tb], check_variables_metadata=True, default_metadata=ds_meadow.metadata
    )

    # Save changes in the new garden dataset.
    ds_garden.save()


def add_regions(tb: Table, ds_regions: Dataset) -> Table:
    """
    Add regions to the famine data table.
    """
    # First assign custom regions
    tb["region"] = tb["simplified_place"].map(CUSTOM_REGION_DICT)

    # Add the rest as usual
    for region in REGIONS:
        # List of countries in region.
        countries_in_region = geo.list_members_of_region(region=region, ds_regions=ds_regions)

        # Add region to the table.
        tb.loc[tb["simplified_place"].isin(countries_in_region), "region"] = region

    return tb
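Two pieces of logic in this step may be easier to follow in isolation: how a comma-separated `date` string becomes a famine name, and how the custom region mapping interacts with the regular region membership lists. The sketch below restates both in plain Python without the table machinery. It is a simplified illustration under stated assumptions: `date_range`, `famine_name`, and `assign_region` are hypothetical helpers, and `MEMBERS_BY_REGION` is a toy stand-in for the regions dataset, not its real contents.

```python
def date_range(date: str) -> str:
    """Collapse a comma-separated list of years into 'YYYY' or 'YYYY-YYYY'."""
    years = [int(year) for year in date.split(",")]
    first, last = min(years), max(years)
    return f"{first}" if first == last else f"{first}-{last}"


def famine_name(place: str, date: str) -> str:
    """Combine a simplified place name with its date range."""
    return f"{place} {date_range(date)}"


# Two entries copied from CUSTOM_REGION_DICT in the step.
CUSTOM_REGIONS = {"Persia": "Asia", "Sahel": "Africa"}

# Toy membership sets (assumption: the real lists come from the regions dataset).
MEMBERS_BY_REGION = {"Asia": {"China", "India"}, "Africa": {"Sudan"}}


def assign_region(place, custom_regions, members_by_region):
    """Apply the custom mapping first, then let region membership override it.

    Returns None when neither source knows the place, which corresponds to
    the NaN case that the step's assertion guards against.
    """
    region = custom_regions.get(place)
    for name, members in members_by_region.items():
        if place in members:
            region = name
    return region


print(famine_name("China", "1958,1959,1960,1961,1962"))
print(assign_region("Persia", CUSTOM_REGIONS, MEMBERS_BY_REGION))
```

As in the step itself, the membership loop runs after the custom mapping, so a place that appears in both would end up with the region from the membership list.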
32 changes: 32 additions & 0 deletions etl/steps/data/garden/wpf/2025-01-17/famines_by_place.meta.yml
@@ -0,0 +1,32 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
  common:
    display:
      numDecimalPlaces: 0
    description_short: Deaths in famines that are estimated to have killed 100,000 people or more.
    description_key:
      - WPF defines a famine as mass mortality due to mass starvation, with mass starvation being the "destruction, deprivation or loss of objects and activities required for survival".
    description_processing: The deaths were assumed to be evenly distributed over the duration of each famine, except for the famine in China between 1958 and 1962, where the source provides a year-by-year breakdown of mortality.
    description_from_producer: |-
      Famines are assessed based on severity, magnitude, and duration. Magnitude, measured as the total number of excess deaths, was used to determine inclusion in the catalogue. A threshold of 100,000 deaths was applied due to limited demographic research on proportional death rate increases.
    presentation:
      topic_tags:
        - Famines



# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
  update_period_days: 365
  title: Deaths from famines by top countries and by decade

tables:
  famines_by_place:
    variables:
      decadal_famine_deaths:
        title: Deaths from famines by top countries by decade
        unit: 'deaths'
        presentation:
          grapher_config:
            note: Decadal figures represent data averaged over each ten-year period (e.g., 1990–1999 for the 1990s). The 2020s figure is provisional and includes data only up to and including 2023.
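The processing note above says deaths are spread evenly over each famine's duration, and the chart note says decadal figures are averages over each ten-year period. A minimal sketch of that arithmetic follows. It is not the code from this commit: `spread_evenly` and `decadal_average` are hypothetical helpers, the 300,000-death famine spanning 1998 to 2000 is an invented example, and dividing each decade's total by ten is one assumed reading of "averaged" (the handling of the partial 2020s is not modeled).

```python
from collections import defaultdict


def spread_evenly(total_deaths: float, years: list[int]) -> dict[int, float]:
    """Assign an equal share of the total deaths to each year of the famine."""
    share = total_deaths / len(years)
    return {year: share for year in years}


def decadal_average(deaths_by_year: dict[int, float]) -> dict[int, float]:
    """Sum deaths per decade, then divide by ten (assumed averaging rule)."""
    totals = defaultdict(float)
    for year, deaths in deaths_by_year.items():
        decade = year // 10 * 10  # e.g. 1998 -> 1990
        totals[decade] += deaths
    return {decade: total / 10 for decade, total in sorted(totals.items())}


# Invented example: 300,000 deaths spread over 1998, 1999 and 2000.
yearly = spread_evenly(300_000, [1998, 1999, 2000])
by_decade = decadal_average(yearly)
print(yearly)
print(by_decade)
```

In this invented example two of the three years fall in the 1990s and one in the 2000s, so the famine contributes to both decades in proportion to the years it covers.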