ETL Pipe for download and ingest EEZ [MARXAN-1618] #1121

Draft
wants to merge 6 commits into
base: develop
Changes from 3 commits
3 changes: 3 additions & 0 deletions Makefile
@@ -114,6 +114,9 @@ generate-geo-test-data: extract-geo-test-data
# Don't forget to run make clean-slate && make start-api before repopulating the whole db
# This will delete all existing data and create tables/views/etc. through the migrations that
# run when starting up the API service.
# To test or run an individual pipe, run the following (seed-eez is used as the example):
# docker-compose -p marxan-cloud -f ./data/docker-compose-data_management.yml up --no-start --build marxan-seed-data
# docker-compose -p marxan-cloud -f ./data/docker-compose-data_management.yml run marxan-seed-data make seed-eez
# Also, be sure to create a user before importing the geodata, otherwise it will fail with an
# unrelated error message
seed-geodb-data:
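The Makefile comment above describes running a single pipe with two docker-compose commands. A minimal sketch of a wrapper for those two commands follows; the helper names `run_cmd` and `run_pipe` and the `DRY_RUN` switch are hypothetical and not part of this PR, while the project name, compose file, and service name are taken from the comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): wraps the two docker-compose
# commands from the Makefile comment so any single seed target can be run.
# Set DRY_RUN=1 to print the commands instead of executing them.

PIPE_COMPOSE_FILE="./data/docker-compose-data_management.yml"
PIPE_PROJECT="marxan-cloud"
PIPE_SERVICE="marxan-seed-data"

# Either echo the command (dry run) or execute it as given.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the seed container, then run one make target inside it.
run_pipe() {
  target="${1:?usage: run_pipe <make-target>, e.g. seed-eez}"
  run_cmd docker-compose -p "$PIPE_PROJECT" -f "$PIPE_COMPOSE_FILE" \
    up --no-start --build "$PIPE_SERVICE" &&
  run_cmd docker-compose -p "$PIPE_PROJECT" -f "$PIPE_COMPOSE_FILE" \
    run "$PIPE_SERVICE" make "$target"
}
```

Calling `run_pipe seed-eez` would then run only the EEZ pipe; with `DRY_RUN=1` it only prints the two commands, which is handy for checking them before touching the database.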
7 changes: 6 additions & 1 deletion data/data_download/Makefile
@@ -1,11 +1,16 @@
.DEFAULT_GOAL := seed-data


seed-data: seed-gadm seed-wdpa seed-demo-features-species seed-demo-features-bioregion seed-ecosystems
seed-data: seed-gadm seed-eez seed-wdpa seed-demo-features-species seed-demo-features-bioregion seed-ecosystems

seed-gadm:
@echo "Starting seeding gadm data... "
@time $(MAKE) -C ./gadm_3.6 import

seed-eez:
@echo "Starting seeding eez data... "
aagm marked this conversation as resolved.
@time $(MAKE) -C ./eez import

seed-wdpa:
@echo "Starting seeding wdpa data... "
@time $(MAKE) -C ./wdpa import
49 changes: 49 additions & 0 deletions data/data_download/eez/Makefile
@@ -0,0 +1,49 @@
.PHONY: import clean
# MAKEFLAGS := --jobs=$(shell nproc)
# MAKEFLAGS += --output-sync=target

MarxanUser:=$(shell psql -X -A -t "postgresql://$$API_POSTGRES_USER:$$API_POSTGRES_PASSWORD@$$API_POSTGRES_HOST:$$API_POSTGRES_PORT/$$API_POSTGRES_DB" -c "select id from users limit 1")
import: data/eez/eez_v11_simp.geojson
ogr2ogr -makevalid \
-update -append \
-geomfield the_geom \
--config OGR_TRUNCATE NO \
-nln admin_regions -nlt PROMOTE_TO_MULTI \
-t_srs EPSG:4326 -a_srs EPSG:4326 \
-f PostgreSQL PG:"dbname=$$GEO_POSTGRES_DB host=$$GEO_POSTGRES_HOST \
port=$$GEO_POSTGRES_PORT user=$$GEO_POSTGRES_USER password=$$GEO_POSTGRES_PASSWORD" $< \
-sql "select *,'$(MarxanUser)' as created_by from \"$$(basename -s .geojson "$<")\"";

data/eez/eez_v11_simp.geojson: data/eez/World_EEZ_v11_20191118/eez_v11.shp
mapshaper-xl -i $< snap \
-simplify 25% planar keep-shapes \
-filter-islands min-vertices=3 min-area=10000m2 remove-empty \
-filter-slivers min-area=10000m2 remove-empty \
-clean rewind \
-each 'level= "eez"' \
-rename-fields NAME_0=GEONAME,ISO3=ISO_SOV1 \
-drop fields=MRGID,MRGID_TER1,MRGID_SOV1,TERRITORY1,ISO_TER1,SOVEREIGN1,MRGID_TER2,MRGID_SOV2,TERRITORY2,ISO_TER2,SOVEREIGN2,MRGID_TER3,MRGID_SOV3,TERRITORY3,ISO_TER3,SOVEREIGN3,X_1,Y_1,AREA_KM2,ISO_SOV2,ISO_SOV3,UN_SOV1,UN_SOV2,UN_SOV3,UN_TER1,UN_TER2,UN_TER3 \
-o $@ format=geojson force ndjson && \
rm -rf data/eez/World_EEZ_v11_20191118

data/eez/World_EEZ_v11_20191118/eez_v11.shp: data/eez/World_EEZ_v11_20191118.zip
unzip -u $< -d data/eez

data/eez/World_EEZ_v11_20191118.zip: data/eez
cd $< && \
curl --location --request POST 'https://www.marineregions.org/download_file.php?name=World_EEZ_v11_20191118.zip' \
--header 'Cookie: PHPSESSID=870e305efbc0519d59b361427dbd8336; vliz_webc=vliz_webc1' \
--form 'name="Jen"' \
--form 'organisation="TNC"' \
--form 'email="[email protected]"' \
--form 'country="EEUU"' \
--form 'user_category="academia"' \
--form 'purpose_category="Conservation"' \
--form 'agree="1"' \
--output './World_EEZ_v11_20191118.zip'
Comment on lines +34 to +43
Member

Sorry, I missed parts of this PR when I reviewed it; my tab probably wasn't refreshed when you pushed the later changes. I recommend handling this bit differently, without hardcoding the cookie and most of the form data (name, organisation, email, country, user_category, purpose_category).

Member

As for how to do this: my first thought would be to move the whole curl command into a simple shell script that is committed to the repo with placeholders. That script would need to be copied to a specific file (which should be added to .gitignore, similar to env.default vs .env), mounted in the eez ingestion container, and executed from the eez Makefile.
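A minimal sketch of what such a template script might look like, offered as one possible reading of the suggestion rather than an agreed design. The `EEZ_FORM_*` and `EEZ_COOKIE` variable names, the `CHANGE_ME` placeholder, and the function names are assumptions for illustration; the URL and form field names come from the curl command in the diff. Whether the download works without a session cookie is not known, so the cookie header is left optional. `--fail` and `--form-string` are standard curl options that the original command does not use: the first makes HTTP errors fail the rule, the second sends values literally.

```shell
#!/bin/sh
# Hypothetical template: committed with placeholders, then copied to a
# git-ignored file where the real values are filled in (or exported).

EEZ_ZIP="World_EEZ_v11_20191118.zip"
EEZ_URL="https://www.marineregions.org/download_file.php?name=${EEZ_ZIP}"

# Placeholders; each keeps an already exported value if there is one.
: "${EEZ_FORM_NAME:=CHANGE_ME}"
: "${EEZ_FORM_ORGANISATION:=CHANGE_ME}"
: "${EEZ_FORM_EMAIL:=CHANGE_ME}"
: "${EEZ_FORM_COUNTRY:=CHANGE_ME}"
: "${EEZ_FORM_USER_CATEGORY:=CHANGE_ME}"
: "${EEZ_FORM_PURPOSE_CATEGORY:=CHANGE_ME}"

# Refuse to run while any placeholder is still unset.
check_placeholders() {
  for value in "$EEZ_FORM_NAME" "$EEZ_FORM_ORGANISATION" "$EEZ_FORM_EMAIL" \
    "$EEZ_FORM_COUNTRY" "$EEZ_FORM_USER_CATEGORY" "$EEZ_FORM_PURPOSE_CATEGORY"; do
    if [ -z "$value" ] || [ "$value" = "CHANGE_ME" ]; then
      echo "download-eez: fill in all form placeholders first" >&2
      return 1
    fi
  done
}

# Download the zip to the path given as $1 (default: current directory).
download_eez() {
  check_placeholders || return 1
  out="${1:-./${EEZ_ZIP}}"
  # Build the curl arguments in "$@" so the cookie header stays optional.
  set -- --fail --location --request POST "$EEZ_URL"
  if [ -n "${EEZ_COOKIE:-}" ]; then
    set -- "$@" --header "Cookie: ${EEZ_COOKIE}"
  fi
  set -- "$@" \
    --form-string "name=${EEZ_FORM_NAME}" \
    --form-string "organisation=${EEZ_FORM_ORGANISATION}" \
    --form-string "email=${EEZ_FORM_EMAIL}" \
    --form-string "country=${EEZ_FORM_COUNTRY}" \
    --form-string "user_category=${EEZ_FORM_USER_CATEGORY}" \
    --form-string "purpose_category=${EEZ_FORM_PURPOSE_CATEGORY}" \
    --form-string "agree=1" \
    --output "$out"
  curl "$@"
}
```

In this sketch the git-ignored copy would end with a call to `download_eez`, and the zip rule in the eez Makefile would invoke that copy instead of the inline curl command.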


data/eez:
mkdir -p $@

clean:
rm -rf data/eez/
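The `import` rule above appends the simplified EEZ features to `admin_regions`, with `level` set to "eez" by the mapshaper step. A minimal sketch of a post-import sanity check follows. It is not part of this PR: the function names are hypothetical, and it assumes that `admin_regions` has a `level` column and that the `GEO_POSTGRES_*` variables used by the import rule are set. The psql flags mirror those used for `MarxanUser`.

```shell
#!/bin/sh
# Hypothetical post-import check (not part of this PR).

# Build a connection URI from the GEO_POSTGRES_* variables.
# Note: a password with special characters would need URL-encoding.
geo_db_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s' \
    "$GEO_POSTGRES_USER" "$GEO_POSTGRES_PASSWORD" \
    "$GEO_POSTGRES_HOST" "$GEO_POSTGRES_PORT" "$GEO_POSTGRES_DB"
}

# Count the rows tagged as EEZ; assumes admin_regions has a "level" column.
count_eez_rows() {
  psql -X -A -t "$(geo_db_uri)" \
    -c "select count(*) from admin_regions where level = 'eez'"
}

# Succeed only if at least one EEZ row is present.
check_eez_import() {
  count="$(count_eez_rows)" || return 1
  if [ "${count:-0}" -gt 0 ]; then
    echo "eez import ok: ${count} rows"
  else
    echo "eez import produced no rows" >&2
    return 1
  fi
}
```

Running `check_eez_import` after `make seed-eez` would give a quick signal that the append reached the database, without inspecting the table by hand.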
1 change: 1 addition & 0 deletions data/docker-compose-data_management.yml
@@ -6,6 +6,7 @@ services:
dockerfile: Dockerfile
volumes:
- './data/seed/gadm_3.6/:/gadm_3.6/data/'
- './data/seed/eez/:/eez/data/'
- './data/seed/wdpa/:/wdpa/data/'
- './data/seed/iucn/:/iucn/data/'
- './data/seed/world_terrestrial_ecosystems/:/world_terrestrial_ecosystems/data/'