Add aggregators (#35)

* fix: better syntax for sort
* feat: upgrade postgrest and allow aggregates
* feat: add aggregators
* refactor: loop for parameters descriptions
* feat: add tests for aggregators
* fix: restore previous behaviour
* fix: lint
* docs: update changelog
* docs: update readme
* docs: add missing hint types
* refactor: remove default __id side sort to allow sort with aggregation
* refactor: return 400 if argument could not be parsed, stricter than before
* refactor: adapt tests
* fix: lint
datagouv · Nov 27, 2024 · 1360f83
1 parent f5afe16

Showing 8 changed files with 263 additions and 114 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,7 +2,7 @@
 
 ## Current (in progress)
 
-- Nothing yet
+- Handle queries with aggregators [#35](https://github.com/datagouv/api-tabular/pull/35)
 
 ## 0.2.1 (2024-11-21)
 

diff --git a/README.md b/README.md
@@ -112,7 +112,7 @@ curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/da
 }
 ```
 
-This endpoint can be queried with the following operators as query string (replacing `column_name` with the name of an actual column):
+This endpoint can be queried with the following operators as query string (replacing `column_name` with the name of an actual column), if the column type allows it (see the swagger for each column's allowed parameter):
 
 ```
 # sort by column
@@ -142,7 +142,26 @@ column_name__strictly_less=value
 
 # strictly greater
 column_name__strictly_greater=value
+
+# group by values
+column_name__groupby
+
+# count values
+column_name__count
+
+# mean / average
+column_name__avg
+
+# minimum
+column_name__min
+
+# maximum
+column_name__max
+
+# sum
+column_name__sum
 ```
+> NB : passing an aggregation operator (`count`, `avg`, `min`, `max`, `sum`) returns a column that is named `<column_name>__<operator>` (for instance: `?birth__groupby&score__sum` will return a list of dicts with the keys `birth` and `score__sum`).
 
 For instance:
 ```shell
@@ -185,6 +204,31 @@ returns
 }
 ```
 
+With filters and aggregators (filtering is always done **before** aggregation, no matter the order in the parameters):
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?decompte__groupby&birth__less=1996&score__avg
+```
+i.e. `decompte` and average of `score` for all rows where `birth<="1996"`, grouped by `decompte`, returns
+```json
+{
+    "data": [
+        {
+            "decompte": 55,
+            "score__avg": 0.7123333333333334
+        },
+        {
+            "decompte": 27,
+            "score__avg": 0.6068888888888889
+        },
+        {
+            "decompte": 23,
+            "score__avg": 0.4603333333333334
+        },
+        ...
+    ]
+}
+```
+
 Pagination is made through queries with `page` and `page_size`:
 ```shell
 curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=30
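
The README changes above describe aggregators as bare query-string flags (`column_name__groupby`, `column_name__avg`, ...) whose results come back under the key `<column_name>__<operator>`. A minimal Python sketch of building such a URL and predicting the returned keys could look like the following. The helper names `build_query` and `expected_keys` are hypothetical and not part of api-tabular; the operator list is limited to the aggregators named in the README hunk.

```python
from urllib.parse import quote

# Aggregation operators listed in the README hunk above.
AGGREGATORS = {"count", "avg", "min", "max", "sum"}


def build_query(base_url, groupby=(), aggregates=(), filters=()):
    """Hypothetical helper: assemble a data URL.

    groupby:    iterable of column names
    aggregates: iterable of (column, operator) pairs
    filters:    iterable of (column, comparator, value) triples
    """
    # Group-by and aggregation operators are passed as bare flags (no "=value").
    parts = [f"{quote(col)}__groupby" for col in groupby]
    for col, op in aggregates:
        if op not in AGGREGATORS:
            raise ValueError(f"unknown aggregator: {op}")
        parts.append(f"{quote(col)}__{op}")
    # Filters keep the usual column__comparator=value form.
    for col, comparator, value in filters:
        parts.append(f"{quote(col)}__{comparator}={quote(str(value))}")
    return f"{base_url}?{'&'.join(parts)}"


def expected_keys(groupby=(), aggregates=()):
    """Keys expected in each returned row, following the README naming rule."""
    return list(groupby) + [f"{col}__{op}" for col, op in aggregates]


url = build_query(
    "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/",
    groupby=["decompte"],
    aggregates=[("score", "avg")],
    filters=[("birth", "less", 1996)],
)
print(url)
print(expected_keys(["decompte"], [("score", "avg")]))
```

The parameter order differs from the README example, which is fine since the README states that filtering is always applied before aggregation regardless of order. When pasting such a URL into a shell command, wrap it in quotes so that `&` is not interpreted by the shell.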

diff --git a/api_tabular/app.py b/api_tabular/app.py
@@ -96,8 +96,8 @@ async def resource_data(request):
 
     try:
         sql_query = build_sql_query_string(query_string, page_size, offset)
-    except ValueError:
-        raise QueryException(400, None, "Invalid query string", "Malformed query")
+    except ValueError as e:
+        raise QueryException(400, None, "Invalid query string", f"Malformed query: {e}")
 
     resource = await get_resource(request.app["csession"], resource_id, ["parsing_table"])
     response, total = await get_resource_data(request.app["csession"], resource, sql_query)
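
The `app.py` hunk above forwards the `ValueError` message into the 400 response, which matches the commit note about returning 400 when an argument cannot be parsed. The pattern can be sketched in isolation as below. Everything here is an assumption made for illustration: `QueryException` is a stand-in that only mirrors the four positional arguments visible in the diff, `parse_argument` and `parse_query_string` are hypothetical, and `ALLOWED_OPERATORS` contains only the operators visible in this diff, not the full list supported by the project.

```python
class QueryException(Exception):
    """Stand-in exception; only mirrors the arguments seen in the diff."""

    def __init__(self, status, error_code, title, detail):
        super().__init__(detail)
        self.status = status
        self.error_code = error_code
        self.title = title
        self.detail = detail


# Partial list: only the operators that appear in this diff.
ALLOWED_OPERATORS = {
    "less", "strictly_less", "strictly_greater",
    "groupby", "count", "avg", "min", "max", "sum",
}


def parse_argument(arg):
    """Hypothetical parser: split 'column__operator[=value]' into its parts."""
    name = arg.partition("=")[0]
    column, sep, operator = name.rpartition("__")
    if not sep or operator not in ALLOWED_OPERATORS:
        # The message raised here is what ends up in the 400 response detail.
        raise ValueError(f"argument '{arg}' could not be parsed")
    return column, operator


def parse_query_string(query_string):
    """Parse every argument, converting any ValueError into a 400 error."""
    try:
        return [parse_argument(a) for a in query_string.split("&") if a]
    except ValueError as e:
        raise QueryException(400, None, "Invalid query string", f"Malformed query: {e}")
```

With this sketch, `parse_query_string("decompte__groupby&score__avg")` yields the parsed pairs, while an unknown operator such as `score__median` raises a `QueryException` whose `detail` names the offending argument.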