Merge pull request #167 from edwardchalstrey1/clio-notebook
Add Cliopatria viewer notebook
edwardchalstrey1 authored Jun 18, 2024
2 parents 830f81c + 828f000 commit c129011
Showing 6 changed files with 343 additions and 3 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -219,5 +219,4 @@ seshat/staticfiles
pulumi/logs
pulumi/Pulumi.seshat.yaml
scripts
-.DS_Store
-*.ipynb
+.DS_Store
21 changes: 21 additions & 0 deletions notebooks/README.md
@@ -0,0 +1,21 @@
# Visualise Cliopatria shape dataset

Cliopatria is the shape dataset used by the Seshat Global History Databank website. You can also explore it in a Jupyter notebook on your local machine by following these instructions.

1. Ensure you have a working installation of Python 3 and Conda. If not, [download Anaconda](https://docs.anaconda.com/free/anaconda/install/index.html), which should give you both.
- Note: you can use a different tool to create a Python virtual environment (e.g. venv) instead of Conda if you prefer.

2. Set up the required virtual environment, install the packages into it and create a Jupyter kernel.
- Conda example:
```
conda create --name cliopatria python=3.11
conda activate cliopatria
pip install -r requirements.txt
python -m ipykernel install --user --name=cliopatria --display-name="Python (cliopatria)"
```
- Note: This will install GeoPandas 0.13.2. If you [install from source](https://geopandas.org/en/stable/getting_started/install.html#installing-from-source) instead, version 1.0.0 (unreleased on pip as of 18th June 2024) is much faster.
3. Open the `cliopatria.ipynb` notebook with Jupyter (or another application that can run notebooks, such as VS Code).
- `jupyter lab` (or `jupyter notebook`)
- Note: make sure the notebook's Python kernel is using the virtual environment you created (use the kernel selector at the top right).
4. Follow the instructions in the notebook.
144 changes: 144 additions & 0 deletions notebooks/cliopatria.ipynb
@@ -0,0 +1,144 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cliopatria viewer\n",
"\n",
"1. To get started, download a copy of the Cliopatria dataset from here: `[INSERT LINK]`\n",
"2. Move the downloaded dataset to an appropriate location on your machine, set the two paths in the code cell below and run it\n",
"3. Run the subsequent cells of the notebook\n",
"4. Play around with both the GeoDataFrame (gdf) and the rendered map\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"cliopatria_geojson_path = \"../data/cliopatria_composite_unique_nonsimplified.geojson_06052024/cliopatria_composite_unique_nonsimplified.geojson\"\n",
"cliopatria_json_path = \"../data/cliopatria_composite_unique_nonsimplified.geojson_06052024/cliopatria_composite_unique_nonsimplified_name_years.json\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from map_functions import cliopatria_gdf, display_map"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Load the Cliopatria data to a GeoDataFrame including end years for each shape\n",
"gdf = cliopatria_gdf(cliopatria_geojson_path, cliopatria_json_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Play with the data on the map\n",
"\n",
"**Notes**\n",
"- The slider is a bit buggy; the best way to change year is to enter a year in the box and press Enter. Use negative numbers for BCE.\n",
"- The map is also displayed three times for some reason!\n",
"- Initial attempts to implement a play button similar to the website code failed, but that may not be needed here.\n",
"- Click the shapes to reveal the polity display names, which use the same logic as the website code (see `map_functions.py`)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a95aced3593446ceb228a171178f978b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"IntText(value=0, description='Year:')"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "80c96982f4a34628b3026e9f853a6af9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"IntSlider(value=0, description='Year:', max=2024, min=-3400)"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "44078fdd8e91499bad99d7fd38b76a65",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/echalstrey/.pyenv/versions/3.11.4/lib/python3.11/site-packages/geopandas/geodataframe.py:1538: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" super().__setitem__(key, value)\n"
]
}
],
"source": [
"display_year = 0\n",
"display_map(gdf, display_year)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (cliopatria1)",
"language": "python",
"name": "cliopatria1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
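The notebook suggests playing with `gdf` directly. As a rough illustration of the year filter that `create_map` applies, the sketch below uses plain dictionaries in place of GeoDataFrame rows. The row values and the helper name `shapes_in_year` are hypothetical; only the `Year`, `EndYear` and `DisplayName` column names and the negative-years-for-BCE convention come from the code in this commit.

```python
# Hypothetical stand-in rows: the real gdf has one row per shape, with
# 'Year' (start year) and 'EndYear' columns. Negative years are BCE.
shapes = [
    {"DisplayName": "Polity A", "Year": -500, "EndYear": -301},
    {"DisplayName": "Polity A", "Year": -300, "EndYear": -100},
    {"DisplayName": "Polity B", "Year": -50, "EndYear": 120},
]


def shapes_in_year(rows, selected_year):
    """Return the rows whose [Year, EndYear] interval contains selected_year."""
    return [r for r in rows if r["Year"] <= selected_year <= r["EndYear"]]


# 300 BCE falls inside the second Polity A shape only
active = shapes_in_year(shapes, -300)
print([r["DisplayName"] for r in active])
```

On the real GeoDataFrame the equivalent selection is the boolean mask used in `map_functions.py`: `gdf[(gdf['Year'] <= year) & (gdf['EndYear'] >= year)]`.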
171 changes: 171 additions & 0 deletions notebooks/map_functions.py
@@ -0,0 +1,171 @@
import geopandas as gpd
import json
import folium
import ipywidgets as widgets
from IPython.display import display, clear_output


def convert_name(gdf, i):
    """
    Convert the polity name of a shape in the Cliopatria dataset to what we want to display on the Seshat world map.
    Where gdf is the geodataframe, i is the index of the row/shape of interest.
    Returns the name to display on the map.
    Returns None if we don't want to display the shape (see comments below for details).
    """
    polity_name = gdf.loc[i, 'Name'].replace('(', '').replace(')', '')  # Remove brackets from name
    # If a shape has components (is a composite) we'll load the components instead
    # ... unless the components have their own components, then load the top level shape
    # ... or the shape is in a personal union, then load the personal union shape instead
    try:
        if gdf.loc[i, 'Components']:  # If the shape has components
            if ';' not in gdf.loc[i, 'SeshatID']:  # If the shape is not a personal union
                if len(gdf.loc[i, 'Components']) > 0 and '(' not in gdf.loc[i, 'Components']:  # If the components don't have components
                    polity_name = None
    except KeyError:  # If the shape has no components, don't modify the name
        pass
    return polity_name


def cliopatria_gdf(cliopatria_geojson_path, cliopatria_json_path):
    """
    Load the Cliopatria shape dataset with GeoPandas and add the EndYear column to the geodataframe.
    """
    # Load the geojson and json files
    gdf = gpd.read_file(cliopatria_geojson_path)
    with open(cliopatria_json_path, 'r') as f:
        name_years = json.load(f)

    # Create new columns in the geodataframe
    gdf['EndYear'] = None
    gdf['DisplayName'] = None

    # Loop through the geodataframe
    for i in range(len(gdf)):

        # Get the raw name of the current row and the name to display
        polity_name_raw = gdf.loc[i, 'Name']
        polity_name = convert_name(gdf, i)

        if polity_name:  # convert_name returns None if we don't want to display the shape
            if gdf.loc[i, 'Type'] != 'POLITY':  # Add the type to the name if it's not a polity
                polity_name = gdf.loc[i, 'Type'] + ': ' + polity_name

            # Get the start year of the current row
            start_year = gdf.loc[i, 'Year']

            # Get a sorted list of the years for that name from the geodataframe
            this_polity_years = sorted(gdf[gdf['Name'] == polity_name_raw]['Year'].unique())

            # Get the end year for a shape
            # Most of the time, the shape end year is the year of the next shape
            # Some polities have a gap in their active years
            # For a shape year at the start of a gap, set the end year to be the shape year, so it doesn't cover the inactive period
            start_end_years = name_years[polity_name_raw]
            end_years = [x[1] for x in start_end_years]

            polity_start_year = start_end_years[0][0]
            polity_end_year = end_years[-1]

            # Raise an error if the shape year is not the start year of the polity
            if this_polity_years[0] != polity_start_year:
                raise ValueError(f'First shape year for {polity_name} is not the start year of the polity')

            # Find the closest higher value from end_years to the shape year
            next_end_year = min(end_years, key=lambda x: x if x >= start_year else float('inf'))

            if start_year in end_years:  # If the shape year is in the list of polity end years, the start year is the end year
                end_year = start_year
            else:
                this_year_index = this_polity_years.index(start_year)
                try:  # Try to use the next shape year minus one as the end year if possible, unless it's higher than the next_end_year
                    next_shape_year_minus_one = this_polity_years[this_year_index + 1] - 1
                    end_year = next_shape_year_minus_one if next_shape_year_minus_one < next_end_year else next_end_year
                except IndexError:  # Otherwise assume the end year of the shape is the end year of the polity
                    end_year = polity_end_year

            # Set the EndYear column to the end year
            gdf.loc[i, 'EndYear'] = end_year

            # Set the DisplayName column to the name to display
            gdf.loc[i, 'DisplayName'] = polity_name

    return gdf


def create_map(selected_year, gdf, map_output):
    global m
    m = folium.Map(location=[0, 0], zoom_start=2, tiles='https://a.basemaps.cartocdn.com/rastertiles/voyager_nolabels/{z}/{x}/{y}.png', attr='CartoDB')

    # Filter the gdf for shapes that overlap with the selected_year
    # Take a copy so that setting the Color column below does not raise a SettingWithCopyWarning
    filtered_gdf = gdf[(gdf['Year'] <= selected_year) & (gdf['EndYear'] >= selected_year)].copy()

    # Remove '0x' and add '#' to the start of the color strings
    filtered_gdf['Color'] = '#' + filtered_gdf['Color'].str.replace('0x', '')

    # Transform the CRS of the GeoDataFrame to WGS84 (EPSG:4326)
    filtered_gdf = filtered_gdf.to_crs(epsg=4326)

    # Define a function for the style_function parameter
    def style_function(feature, color):
        return {
            'fillColor': color,
            'color': color,
            'weight': 2,
            'fillOpacity': 0.5
        }

    # Add the polygons to the map
    for _, row in filtered_gdf.iterrows():
        # Convert the geometry to GeoJSON
        geojson = folium.GeoJson(
            row.geometry,
            style_function=lambda feature, color=row['Color']: style_function(feature, color)
        )

        # Add a popup to the GeoJSON
        folium.Popup(row['DisplayName']).add_to(geojson)

        # Add the GeoJSON to the map
        geojson.add_to(m)

    # Display the map
    with map_output:
        clear_output(wait=True)
        display(m)


def display_map(gdf, display_year):

    # Create a text box for input
    year_input = widgets.IntText(
        value=display_year,
        description='Year:',
    )

    # Define a function to be called when the value of the text box changes
    def on_value_change(change):
        create_map(change['new'], gdf, map_output)

    # Create a slider for input
    year_slider = widgets.IntSlider(
        value=display_year,
        min=gdf['Year'].min(),
        max=gdf['EndYear'].max(),
        description='Year:',
    )

    # Link the text box and the slider
    widgets.jslink((year_input, 'value'), (year_slider, 'value'))

    # Create an output widget
    map_output = widgets.Output()

    # Attach the function to the text box
    year_input.observe(on_value_change, names='value')

    # Display the widgets
    display(year_input, year_slider, map_output)

    # Call create_map initially to display the map
    create_map(display_year, gdf, map_output)
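The end-year rule in `cliopatria_gdf` is easier to follow on a small worked case. The sketch below restates that rule as a standalone function, without GeoPandas. The function name `shape_end_year` and the example polity (two active periods separated by a gap) are hypothetical and not part of the commit; the branching mirrors the code above.

```python
def shape_end_year(start_year, shape_years, active_periods):
    """
    Restate the EndYear rule for one shape.

    start_year:     year of the shape of interest
    shape_years:    sorted years at which the polity has a shape
    active_periods: [[start, end], ...] pairs, as in the name_years JSON
    """
    end_years = [period[1] for period in active_periods]
    polity_end_year = end_years[-1]

    # A shape dated at the end of an active period covers that year only
    if start_year in end_years:
        return start_year

    # Closest period end at or after the shape year (same expression as the original)
    next_end_year = min(end_years, key=lambda y: y if y >= start_year else float("inf"))

    index = shape_years.index(start_year)
    if index + 1 < len(shape_years):
        # Normally the shape lasts until the year before the next shape,
        # but it must not run past the end of the current active period
        return min(shape_years[index + 1] - 1, next_end_year)

    # Last shape: assume it lasts until the polity ends
    return polity_end_year


# Hypothetical polity, active 500-300 BCE and 200-100 BCE, with three shapes
periods = [[-500, -300], [-200, -100]]
years = [-500, -350, -200]
for y in years:
    print(y, "->", shape_end_year(y, years, periods))
```

In this example the shape dated -350 ends at -300 rather than -201, so it does not cover the inactive gap between the two periods.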
5 changes: 5 additions & 0 deletions notebooks/requirements.txt
@@ -0,0 +1,5 @@
jupyter==1.0.0
ipykernel==6.29.3
geopandas==0.13.2
contextily==1.6.0
folium==0.16.0
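To confirm that the active kernel uses the environment with these pins, one option is to compare installed versions against them from inside the notebook. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `check_pins` is hypothetical, and the pins are copied from the file above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from notebooks/requirements.txt
PINNED = {
    "jupyter": "1.0.0",
    "ipykernel": "6.29.3",
    "geopandas": "0.13.2",
    "contextily": "1.6.0",
    "folium": "0.16.0",
}


def check_pins(pinned):
    """Map each package name to (installed version or None, matches pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_pins(PINNED).items():
    status = "ok" if ok else f"expected {PINNED[name]}"
    print(f"{name}: {installed or 'not installed'} ({status})")
```

A mismatch is not necessarily a problem: if GeoPandas was installed from source, as the README suggests for speed, it will be reported as differing from the pin.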
2 changes: 1 addition & 1 deletion seshat/apps/core/management/commands/populate_videodata.py
@@ -148,7 +148,7 @@ def handle(self, *args, **options):
polity=polity_colour_key,
wikipedia_name=properties['Wikipedia'],
seshat_id=properties['SeshatID'],
-area=properties['Area_km2'],
+area=properties['Area'],
start_year=properties['Year'],
end_year=end_year,
polity_start_year=polity_start_year,
