-
Notifications
You must be signed in to change notification settings - Fork 12
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: add new custom_sql_filter parameter (#180)
* feat: add new custom_sql_filter parameter * fix: add None check for custom filters * fix: change file hash generation * chore: add tests for custom sql filtering * fix: add missing custom sql filters to a prefiltering step * chore: add new test scenario * chore: update progress bar logic during multiprocessing startup * feat: add custom sql filter example notebook * chore: add changelog entry
- Loading branch information
Showing
8 changed files
with
363 additions
and
24 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,202 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Custom SQL filter\n", | ||
"\n", | ||
"**QuackOSM** enables advanced users to filter data using SQL filters that will be used by DuckDB during processing.\n", | ||
"\n", | ||
"The filter will be loaded alongside with [OSM tags filters](../osm_tags_filter/) and features IDs filters. \n", | ||
"\n", | ||
"SQL filter clause will can be passed both in Python API (as `custom_sql_filter` parameter) and the CLI (as `--custom-sql-filter` option).\n", | ||
"\n", | ||
"Two columns available to users are: `id` (type `BIGINT`) and `tags` (type: `MAP(VARCHAR, VARCHAR)`).\n", | ||
"\n", | ||
"You can look for available functions into a [DuckDB documentation](https://duckdb.org/docs/sql/functions/overview).\n", | ||
"\n", | ||
"Below are few examples on how to use the custom SQL filters." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Features with exactly 10 tags\n", | ||
"\n", | ||
"Here we will use `cardinality` function dedicated to the `MAP` type.\n", | ||
"\n", | ||
"More `MAP` functions are available [here](https://duckdb.org/docs/sql/functions/map)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import quackosm as qosm\n", | ||
"\n", | ||
"data = qosm.convert_geometry_to_geodataframe(\n", | ||
" geometry_filter=qosm.geocode_to_geometry(\"Greater London\"),\n", | ||
" osm_extract_source=\"Geofabrik\",\n", | ||
" custom_sql_filter=\"cardinality(tags) = 10\",\n", | ||
")\n", | ||
"data[\"tags\"].head(10).values" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(\"All have exactly 10 tags:\", (data[\"tags\"].str.len() == 10).all())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Features with ID divisible by 13 and starting wit a number 6\n", | ||
"\n", | ||
"Here we will operate on the `ID` column.\n", | ||
"\n", | ||
"More `NUMERIC` functions are available [here](https://duckdb.org/docs/sql/functions/numeric).\n", | ||
"\n", | ||
"More `STRING` functions are available [here](https://duckdb.org/docs/sql/functions/char)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"data = qosm.convert_geometry_to_geodataframe(\n", | ||
" geometry_filter=qosm.geocode_to_geometry(\"Greater London\"),\n", | ||
" osm_extract_source=\"Geofabrik\",\n", | ||
" custom_sql_filter=\"id % 13 = 0 AND starts_with(id::STRING, '6')\",\n", | ||
")\n", | ||
"data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(\"All starting with digit 6:\", data.index.map(lambda x: x.split(\"/\")[1].startswith(\"6\")).all())\n", | ||
"print(\"All divisible by 13:\", data.index.map(lambda x: (int(x.split(\"/\")[1]) % 13) == 0).all())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Find features that have all selected tags present\n", | ||
"\n", | ||
"When using `osm_tags_filter` with value `{ \"building\": True, \"historic\": True, \"name\": True }`, the result will contain every feature that have at least one of those tags.\n", | ||
"\n", | ||
"Positive tags filters are combined using an `OR` operator. You can read more about it [here](../osm_tags_filter/).\n", | ||
"\n", | ||
"To get filters with `AND` operator, the `custom_sql_filter` parameter has to be used.\n", | ||
"\n", | ||
"To match a list of keys against given values we have to use list-related functions.\n", | ||
"\n", | ||
"More `LIST` functions are available [here](https://duckdb.org/docs/sql/functions/list)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"data = qosm.convert_geometry_to_geodataframe(\n", | ||
" geometry_filter=qosm.geocode_to_geometry(\"Greater London\"),\n", | ||
" osm_extract_source=\"Geofabrik\",\n", | ||
" custom_sql_filter=\"list_has_all(map_keys(tags), ['building', 'historic', 'name'])\",\n", | ||
")\n", | ||
"data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"tags_names = [\"name\", \"building\", \"historic\"]\n", | ||
"for tag_name in tags_names:\n", | ||
" data[tag_name] = data[\"tags\"].apply(lambda x, tag_name=tag_name: x.get(tag_name))\n", | ||
"data[[*tags_names, \"geometry\"]].explore(tiles=\"CartoDB DarkMatter\", color=\"orange\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Regex search to find streets starting with word New or Old\n", | ||
"\n", | ||
"*(If you really need to)* You can utilize regular expressions on a tag value (or key) to find some specific examples.\n", | ||
"\n", | ||
"More `REGEX` functions are available [here](https://duckdb.org/docs/sql/functions/regular_expressions)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"data = qosm.convert_geometry_to_geodataframe(\n", | ||
" geometry_filter=qosm.geocode_to_geometry(\"Greater London\"),\n", | ||
" osm_extract_source=\"Geofabrik\",\n", | ||
" custom_sql_filter=\"\"\"\n", | ||
" list_has_all(map_keys(tags), ['highway', 'name'])\n", | ||
" AND regexp_matches(tags['name'][1], '^(New|Old)\\s\\w+')\n", | ||
" \"\"\",\n", | ||
")\n", | ||
"data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"ways_only = data[data.index.str.startswith(\"way/\")]\n", | ||
"ways_only[\"name\"] = ways_only[\"tags\"].apply(lambda x: x[\"name\"])\n", | ||
"ways_only[\"prefix\"] = ways_only[\"name\"].apply(lambda x: x.split()[0])\n", | ||
"ways_only[[\"name\", \"prefix\", \"geometry\"]].explore(\n", | ||
" tiles=\"CartoDB DarkMatter\", column=\"prefix\", cmap=[\"orange\", \"royalblue\"]\n", | ||
")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.