test data moved
PennyHow committed Nov 5, 2024
1 parent dd691e8 commit 63545fb
Showing 49 changed files with 899 additions and 64 deletions.
61 changes: 61 additions & 0 deletions docs/tutorial-data.md
@@ -0,0 +1,61 @@
# Dataset tutorials

The GrIML package is used to produce the Greenland ice marginal lake inventory series, which is freely available through the [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N). This dataset is a series of annual inventories, mapping the extent and presence of lakes across Greenland that share a margin with the Greenland Ice Sheet and/or the surrounding ice caps and peripheral glaciers.

Here, we will look at how to load and handle the dataset, and provide details on its contents.

## Dataset contents

The annual inventories provide a comprehensive record of all identified ice marginal lakes, which have been detected using three independent remote sensing techniques:

- DEM sink detection using the ArcticDEM (mosaic version 3)
- SAR backscatter classification from Sentinel-1 imagery
- Multi-spectral indices classification from Sentinel-2 imagery

All data were compiled and filtered in a semi-automated approach, using a modified version of the [MEaSUREs GIMP ice mask](https://nsidc.org/data/NSIDC-0714/versions/1) to clip the dataset to within 1 km of the ice margin. Each detected lake was then verified manually. The methodology is open-source and provided in the associated [GitHub repository](https://github.com/GEUS-Glaciology-and-Climate/GrIML) for full reproducibility.

The inventory series was created to better understand the impact of ice marginal lake change on the future sea level budget and the terrestrial and marine landscapes of Greenland, such as its ecosystems and human activities. The dataset is a complete inventory series of Greenland, with no absent data.

### Data format

The detected lakes are presented as polygon vector features in shapefile format (.shp), with coordinates provided in the WGS 84 / NSIDC Sea Ice Polar Stereographic North (EPSG:3413) projected coordinate system.

### Metadata

Each inventory in the series contains the following metadata fields:

| Variable name | Description | Format |
|---------------------|---------------------|---------|
| `row_id` | Index identifying number for each polygon | Integer |
| `lake_id` | Identifying number for each unique lake | Integer |
| `lake_name`| Lake placename, as defined by the [Oqaasileriffik (Language Secretariat of Greenland)](https://oqaasileriffik.gl) placename database which is distributed with [QGreenland](https://qgreenland.org/) | String |
| `margin` | Type of margin that the lake is adjacent to (`ICE_SHEET`, `ICE_CAP`) | String |
| `region` | Region in which the lake is located, as defined by Mouginot and Rignot (2019) (`NW`, `NO`, `NE`, `CE`, `SE`, `SW`, `CW`) | String |
| `area_sqkm` | Areal extent of polygon/s in square kilometres | Float |
| `length_km` | Length of polygon/s in kilometres | Float |
| `temp_aver` | Average lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `temp_min` | Minimum pixel lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `temp_max` | Maximum pixel lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `temp_stdev` | Standard deviation of the lake surface temperature estimate, derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `method` | Method of classification (`DEM`, `SAR`, `VIS`) | String |
| `source` | Image source of classification (`ARCTICDEM`, `S1`, `S2`) | String |
| `all_src` | List of all sources that successfully classified the lake (i.e. all classifications with the same `lake_name` value) | String |
| `num_src` | Number of sources that successfully classified the lake (`1`, `2`, `3`) | String |
| `certainty` | Certainty of classification, which is calculated from `all_src` as a score between `0` and `1` | Float |
| `start_date` | Start date for classification image filtering | String |
| `end_date` | End date for classification image filtering | String |
| `verified` | Flag to denote if the lake has been manually verified (`Yes`, `No`) | String |
| `verif_by` | Author of verification | String |
| `edited` | Flag to denote if polygon has been manually edited (`Yes`, `No`) | String |
| `edited_by` | Author of manual editing | String |
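
The metadata fields above can be used to subset an inventory once it is loaded as a table. The following is a minimal sketch, assuming the attribute table is held in a pandas or GeoPandas data frame; the function name `select_lakes`, the `0.5` threshold, and the example rows are illustrative and not part of the GrIML package or the dataset.

```python
import pandas as pd


def select_lakes(inventory, margin="ICE_SHEET", min_certainty=0.5, verified_only=True):
    """Return rows that match a margin type and a minimum certainty score."""
    mask = (inventory["margin"] == margin) & (inventory["certainty"] >= min_certainty)
    if verified_only:
        # The `verified` flag is stored as the strings "Yes" / "No"
        mask &= inventory["verified"] == "Yes"
    return inventory[mask]


# Made-up rows for illustration only; real values come from the inventory files
example = pd.DataFrame({
    "lake_id": [1, 2, 3],
    "margin": ["ICE_SHEET", "ICE_CAP", "ICE_SHEET"],
    "certainty": [0.9, 0.9, 0.3],
    "verified": ["Yes", "Yes", "Yes"],
})
selected = select_lakes(example)
print(selected)
```

The same call works on a `geopandas.GeoDataFrame`, in which case the geometry column is carried through with the selected rows.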

## Getting started

The dataset is available for download from the [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N).

Quicklook plotting of the dataset
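
A quicklook plot could be sketched as follows, assuming the inventory is already loaded as a GeoDataFrame. The function name `quicklook` and its defaults are illustrative only; the plot colours each polygon by a categorical metadata column such as `method`.

```python
import matplotlib.pyplot as plt


def quicklook(gdf, column="method"):
    """Plot lake polygons coloured by a categorical metadata column."""
    fig, ax = plt.subplots(figsize=(6, 8))
    gdf.plot(column=column, categorical=True, legend=True, ax=ax)
    ax.set_xlabel("Easting (m)")
    ax.set_ylabel("Northing (m)")
    ax.set_title(f"Ice marginal lakes by {column}")
    return ax


# Usage, given a GeoDataFrame called `lakes`:
# quicklook(lakes, column="region")
# plt.show()
```

Individual lakes are small relative to Greenland, so zooming to a region of interest or plotting `lakes.geometry.centroid` may give a more legible overview.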


## Generating statistics

Extracting statistics
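
One way to extract per-region statistics is a grouped aggregation over the attribute table. This is a sketch under stated assumptions: the function name `region_summary` and the example rows are made up, and summing `area_sqkm` over rows assumes that polygons sharing a `lake_id` do not overlap.

```python
import pandas as pd


def region_summary(inventory):
    """Count unique lakes and sum mapped area for each region."""
    return (
        inventory.groupby("region")
        .agg(
            n_lakes=("lake_id", "nunique"),        # unique lakes, not polygons
            total_area_sqkm=("area_sqkm", "sum"),  # summed over all polygons
        )
        .sort_values("total_area_sqkm", ascending=False)
    )


# Made-up rows for illustration only; lake 1 is represented by two polygons
example = pd.DataFrame({
    "region": ["SW", "SW", "NE"],
    "lake_id": [1, 1, 2],
    "area_sqkm": [1.0, 0.5, 2.0],
})
summary = region_summary(example)
print(summary)
```

Counting with `nunique` on `lake_id` avoids counting a lake more than once when it is represented by several polygons.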
2 changes: 1 addition & 1 deletion src/griml/metadata/assign_id.py
@@ -28,7 +28,7 @@ def assign_id(gdf, col_name='unique_id'):
     n, ids = connected_components(overlap_matrix)
     ids=ids+1

-    # Assign ids and realign geoedataframe index
+    # Assign ids and realign geodataframe index
     gdf[col_name]=ids
     gdf = gdf.sort_values(col_name)
     gdf.reset_index(inplace=True, drop=True)
33 changes: 22 additions & 11 deletions src/griml/metadata/assign_names.py
@@ -12,7 +12,7 @@
 from shapely.geometry import Point, LineString, Polygon
 from griml.load import load

-def assign_names(gdf, gdf_names):
+def assign_names(gdf, gdf_names, distance=1000.0):
     '''Assign placenames to geodataframe geometries based on names in another
     geodataframe point geometries
@@ -39,13 +39,17 @@ def assign_names(gdf, gdf_names):
     names = _compile_names(gdf2)
     placenames = gpd.GeoDataFrame({"geometry": list(gdf2['geometry']),
                                    "placename": names})
+
+    # Remove invalid geometries
+    gdf1 = _check_geometries(gdf1)
+
     # Assign names based on proximity
-    a = _get_nearest_point(gdf1, placenames)
+    a = _get_nearest_point(gdf1, placenames, distance)

     return a


-def _get_nearest_point(gdA, gdB, distance=500.0):
+def _get_nearest_point(gdA, gdB, distance=1000.0):
     '''Return properties of nearest point in Y to geometry in X'''
     nA = np.array(list(gdA.geometry.centroid.apply(lambda x: (x.x, x.y))))
     nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
@@ -70,18 +74,25 @@ def _get_indices(mylist, value):
     return[i for i, x in enumerate(mylist) if x==value]


+def _check_geometries(gdf):
+    '''Check that all geometries within a geodataframe are valid'''
+    return gdf.drop(gdf[gdf.geometry==None].index)
+
 def _compile_names(gdf):
     '''Get preferred placenames from placename geodatabase'''
     placenames=[]
     for i,v in gdf.iterrows():
-        if v['Ny_grønla'] != None:
-            placenames.append(v['Ny_grønla'])
+        if v['New Greenl'] != None:
+            placenames.append(v['New Greenl'])
         else:
-            if v['Dansk'] != None:
-                placenames.append(v['Dansk'])
+            if v['Old Greenl'] != None:
+                placenames.append(v['Old Greenl'])
             else:
-                if v['Alternativ'] != None:
-                    placenames.append(v['Alternativ'])
-                else:
-                    placenames.append(None)
+                if v['Danish'] != None:
+                    placenames.append(v['Danish'])
+                else:
+                    if v['Alternativ'] != None:
+                        placenames.append(v['Alternativ'])
+                    else:
+                        placenames.append(None)
     return placenames
58 changes: 27 additions & 31 deletions src/griml/metadata/assign_sources.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-

-def assign_sources(gdf, col_names=['unique_id', 'source']):
+def assign_sources(gdf, col_names=['lake_id', 'source']):
     '''Assign source metadata to geodataframe, based on unique lake id and
     individual source information
@@ -17,38 +17,34 @@ def assign_sources(gdf, col_names=['unique_id', 'source']):
     gdf : geopandas.GeoDataFrame
         Vectors with assigned sources
     '''
-    ids = gdf[col_names[0]].tolist()
-    source = gdf[col_names[1]].tolist()
-    satellites=[]
-
-    # Construct source list
-    for x in range(len(ids)):
-        indx = _get_indices(ids, x)
-        if len(indx) != 0:
-            res = []
-            if len(indx) == 1:
-                res.append(source[indx[0]].split('/')[-1])
-            else:
-                unid=[]
-                for dx in indx:
-                    unid.append(source[dx].split('/')[-1])
-                res.append(list(set(unid)))
-            for z in range(len(indx)):
-                if len(indx) == 1:
-                    satellites.append(res)
-                else:
-                    satellites.append(res[0])
-
-    # Compile lists for appending
-    satellites_names = [', '.join(i) for i in satellites]
-    number = [len(i) for i in satellites]
-
-    # Return updated geodataframe
-    gdf['all_src']=satellites_names
-    gdf['num_src']=number
+    all_src=[]
+    num_src=[]
+    for idx, i in gdf.iterrows():
+        idl = i[col_names[0]]
+        g = gdf[gdf[col_names[0]] == idl]
+        source = list(set(list(gdf[col_names[1]])))
+        satellites=''
+        if len(source)==1:
+            satellites = satellites.join(source)
+            num = 1
+        elif len(source)==2:
+            satellites = satellites.join(source[0]+', '+source[1])
+            num = 2
+        elif len(source)==3:
+            satellites = satellites.join(source[0]+', '+source[1]+', '+source[2])
+            num = 3
+        else:
+            print('Unknown number of sources detected')
+            print(source)
+            satellites=None
+            num=None
+        all_src.append(satellites)
+        num_src.append(num)
+        satellites
+    gdf['all_src']=all_src
+    gdf['num_src']=num_src
     return gdf


 def _get_indices(mylist, value):
     '''Get indices for value in list'''
     return[i for i, x in enumerate(mylist) if x==value]
41 changes: 41 additions & 0 deletions src/griml/metadata/iml_abundancy_error_estimate.py
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Sep 26 16:09:20 2024
@author: pho
"""
import geopandas as gpd
import glob
import numpy as np
import pandas as pd
from pathlib import Path
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

# Map inventory file locations
gdf_files = '/home/pho/Desktop/python_workspace/GrIML/other/iml_2016-2023/final/checked/*IML-fv1.shp'

# Load inventory point file with lake_id, region, basin-type and placename info
gdf2 = gpd.read_file('/home/pho/Desktop/python_workspace/GrIML/other/iml_2016-2023/manual_validation/iml_manual_validation_with_names.shp')
gdf2_corr = gdf2.drop(gdf2[gdf2.geometry==None].index)


# Iterate across inventory series files
gdfs=[]
for g in list(sorted(glob.glob(gdf_files))):
    print(g)
    gdf = gpd.read_file(g)
    gdf = gdf.dissolve(by='lake_id')
    print(len(gdf['geometry']))
    gdfs.append(gdf)

dfs = pd.concat(gdfs)
dfs = dfs.dissolve(by='lake_id')
dfs['area_sqkm']=[g.area/10**6 for g in list(dfs['geometry'])]
dfs['length_km']=[g.length/1000 for g in list(dfs['geometry'])]


print('Average lake size: ' + str(dfs.area_sqkm.mean()))

dfs.to_file('/home/pho/Desktop/python_workspace/GrIML/other/iml_2016-2023/final/checked/'+'ALL-ESA-GRIML-IML-MERGED-fv1.shp')