Custom media types + conversion for `Content-Type` #2826

steve-chavez · 2023-06-15T07:12:07Z

Problem

Currently we allow handling raw input for application/json, text/xml, text/plain, application/octet-stream on functions(ref).

However it's inflexible, no other custom media types can be handled. It also doesn't work for tables/views.

Solution

Use a domain-based media type as a function unnamed parameter to do custom conversion. Unlike ref this can work for both functions and tables/views.

Functions interface

CREATE DOMAIN "application/json" as json;
CREATE DOMAIN "application/octet-stream" as bytea;

CREATE FUNCTION mult_them("application/json") RETURNS int AS $$
  SELECT ($1->>'x')::int * ($1->>'y')::int
$$ LANGUAGE SQL;

CREATE FUNCTION upload_binary("application/octet-stream") RETURNS void AS $$
  INSERT INTO files(blob) VALUES ($1);
$$ LANGUAGE SQL;

Tables/views interface

CREATE FUNCTION convert_projects("application/json") RETURNS projects AS $$
-- ...
$$ LANGUAGE SQL;

CREATE FUNCTION convert_projects("text/plain") RETURNS projects AS $$
-- ...
$$ LANGUAGE SQL;


CREATE FUNCTION convert_projects("text/xml") RETURNS projects AS $$
-- ...
$$ LANGUAGE SQL;

e.g. the first function will be used when doing:

POST /projects 
Content-Type: application/json

This means we'll use the unnamed parameter + return type to apply a conversion function for a particular table/view.

Inside the function a json_to_recordset can be used (this is what we do internally). With this interface the user can override our default behavior.

Thanks to overloaded functions the same function name can be used to handle different media types.

Notes

Consider dropping Functions with a single unnamed parameter after solving this. Advantages:
- This new interface is more intuitive and less magical.
- Simplify error messages.
- Simplify searching the rpc function.
- It will also be consistent with SQL handlers for custom media types #2825.
- The forward migration is relatively simple - changing a parameter type.
The domain-based media types idea comes from Add pgrst.accept setting to RPC #1582 and Data Representations #2523. (long threads) .

I did not have the time, yet, to follow along what exactly has been implemented for Accept already. Am I right that currently it's only possible to say RETURNS "media/type" for functions using a domain media/type? So we don't support any of the CAST idea, yet, but all the pieces are in place to be able to extend that with casts later on?

Assuming the above is correct, then:

The functions interface suggested in this issue seems like exactly the same thing, just for input via Content-Type. It will be extensible the same way with CASTs. 👍
However, the proposed table/view interface looks different. In which schema are those functions defined? If they are defined in the exposed schema, then what's the difference between those functions and a regular RPC ("function interface style")? If they are not in the exposed schema, how do we detect them?

The answer to those questions in the second paragraph is: Casts. Those functions are effectively implementing a cast, so we should create proper casts for them and detect them via casts. Then they can be located anywhere, don't conflict with RPCs, etc.

I would suggest to do the function interface first, and then extend later on.

steve-chavez · 2023-11-16T15:08:42Z

Am I right that currently it's only possible to say RETURNS "media/type" for functions using a domain media/type?
So we don't support any of the CAST idea, yet, but all the pieces are in place to be able to extend that with casts later on?

Yes (for aggregates too (#2825 (comment))) and yes.

I would suggest to do the function interface first, and then extend later on.

Agree, that sounds good.

steve-chavez · 2024-01-11T17:23:26Z

The answer to those questions in the second paragraph is: Casts. Those functions are effectively implementing a cast, so we should create proper casts for them and detect them via casts. Then they can be located anywhere, don't conflict with RPCs, etc.

Oh, thinking more about the above. Wouldn't the interface for media type handlers be simpler too if it's defined via CASTs?

So the example on https://postgrest.org/en/stable/references/api/media_type_handlers.html#handlers-for-tables-views would be:

create or replace function twkb_handler_transition (state bytea, next lines)
returns bytea as $$
  select state || st_astwkb(next.geom);
$$ language sql;

create or replace aggregate twkb_agg (lines) (
  initcond = ''
, stype = bytea
, sfunc = twkb_handler_transition
);

CREATE OR REPLACE FUNCTION twkb(lines) RETURNS "application/vnd.twkb" AS $$
  select twkb_agg($1);
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (lines AS "application/vnd.twkb") WITH FUNCTION twkb(lines) AS IMPLICIT;

That way the aggregate is an internal detail and the media type is not at all involved in its definition. I believe it's much more clear despite having to define one more function too.

@wolfgangwalther WDYT? This would be a breaking change on current media type handlers, but not for functions only for tables (which doesn't seem to be used that much yet).

steve-chavez · 2024-01-12T18:34:51Z

Damn, the above makes sense but the current query doesn't work:

WITH pgrst_source AS ( SELECT "test"."lines".* FROM "test"."lines")
SELECT
    null::bigint AS total_result_set,
    pg_catalog.count(_postgrest_t) AS page_total,
    twkb(_postgrest_t::"test"."lines") AS body,
    nullif(current_setting('response.headers', true), '') AS response_headers,
    nullif(current_setting('response.status', true), '') AS response_status,
    '' AS response_inserted
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

ERROR:  column "_postgrest_t.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 5:     twkb(_postgrest_t::"test"."lines") AS body,

And an aggregate cannot be used as cast function:

CREATE CAST (lines AS "application/vnd.twkb") WITH FUNCTION twkb_agg(lines) AS IMPLICIT;

ERROR:  cast function must be a normal function

steve-chavez · 2024-01-12T19:01:27Z

The only way I see to make the CAST abstraction work is to scan the body of the CREATE FUNCTION twkb and apply the internal function ourselves. Too bad that pg can't see that there's an aggregate inside the SQL function.

steve-chavez · 2024-01-12T20:17:02Z

and apply the internal function ourselves

Maybe that's not too bad, we can apply the function body directly like:

WITH pgrst_source AS ( SELECT "test"."lines".* FROM "test"."lines")
SELECT
    null::bigint AS total_result_set,
    pg_catalog.count(_postgrest_t) AS page_total,
    (select twkb_agg(_postgrest_t::"test"."lines")) AS body,
    nullif(current_setting('response.headers', true), '') AS response_headers,
    nullif(current_setting('response.status', true), '') AS response_status,
    '' AS response_inserted
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

That also lets apply functions on top of the aggregation, which is not possible with the current interface. This makes it possible to do select coalesce(json_agg($1),[])->0 to override application/vnd.pgrst.object.

The any handler interface would have to change though, since the aggregate final function would no longer be used for setting the response headers.

wolfgangwalther · 2024-01-15T17:04:13Z

The "correct" way to do this, would be to use array_agg(...) all the time, once you hit a CAST, so something like this:

-- note the argument type is now an array
CREATE OR REPLACE FUNCTION twkb(lines[]) RETURNS "application/vnd.twkb" AS $$
  -- whatever you need to do to the array
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (lines[] AS "application/vnd.twkb") WITH FUNCTION twkb(lines[]) AS IMPLICIT;

WITH pgrst_source AS ( SELECT "test"."lines".* FROM "test"."lines")
SELECT
    null::bigint AS total_result_set,
    pg_catalog.count(_postgrest_t) AS page_total,
    twkb(array_agg(_postgrest_t::"test"."lines"))) AS body,
    nullif(current_setting('response.headers', true), '') AS response_headers,
    nullif(current_setting('response.status', true), '') AS response_status,
    '' AS response_inserted
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

That means the user can either provide custom aggregates, probably mostly useful when implemented in C for performance (i.e. pass through the full response once instead of twice with the cast). Or they can provide the cast, in which case we still do the aggregation for them beforehand, but in a type-preserving way via array_agg(). This way you can still create different casts for different endpoints.

steve-chavez · 2024-01-15T21:26:28Z

The "correct" way to do this, would be to use array_agg(...) all the time, once you hit a CAST, so something like this:

How would that work? I'm guessing like:

CREATE OR REPLACE FUNCTION twkb(lines[]) RETURNS "application/vnd.twkb" AS $$
  select twkb_agg(x) from unnest($1) x;
$$ LANGUAGE SQL IMMUTABLE;

Wouldn't it be wasteful to aggregate the array (twkb(array_agg(_postgrest_t::"test"."lines"))) and unnest it again?

AFAICT, operations over arrays on pg always require unnesting, there's no array.map() function (ref).

That means the user can either provide custom aggregates, probably mostly useful when implemented in C for performance. Or they can provide the cast

Hm, ideally handling both content-type and accept would use the same interface with CASTs. Otherwise this complicates the docs and the implementation.

Currently it doesn't look good how we mix the media type with the aggregate definition.

wolfgangwalther · 2024-01-16T07:28:48Z

Wouldn't it be wasteful to aggregate the array (twkb(array_agg(_postgrest_t::"test"."lines"))) and unnest it again?

AFAICT, operations over arrays on pg always require unnesting, there's no array.map() function (ref).

What makes you think that unnest/agg is less efficient than a map function if that was implemented? I have no idea whether it's inefficient, but unnest/agg just seems to be idiomatic SQL code. So certainly not bad. Also note that this is a special case for a cast function written in SQL. For example the function given in #3160 (comment) would be much more efficient with array_agg. Currently it uses a poor-man's aggregation via json/jsonb, but that's just because that's the only way to do it right now. An aggregation via array_agg would be much better - and then the python code doesn't care how it receives the data.

Hm, ideally handling both content-type and accept would use the same interface with CASTs.

Agreed. But the example in #2826 (comment) is insufficient:

CREATE FUNCTION convert_projects("application/json") RETURNS projects AS $$
-- ...
$$ LANGUAGE SQL;

This would only ever allow a single project to be returned. What if the json document contained multiple projects? I'm quite sure that a cast can't return a SET OF. So the only thing you could return would be projects[] - and then PostgREST would have to unnest that to insert etc.

This is exactly the opposite of array_agg.

So basically:

Casts should work on arrays. Or rather: If a cast takes or returns an array, we automagically add array_agg / unnest respectively. In the future, we could add chains of casts, too, as discussed earlier in connection with data reps. Those would not need to be on arrays.
The function interface can return SET OF. The corresponding output (i.e. "how to transform a set into a scalar?") are.. aggregates.

Very symmetric.

steve-chavez added the idea Needs of discussion to become an enhancement, not ready for implementation label Jun 15, 2023

steve-chavez mentioned this issue Jun 15, 2023

Add pgrst.accept setting to RPC #1582

Closed

steve-chavez mentioned this issue Aug 14, 2023

CSV bulk insert performance #1206

Open

steve-chavez mentioned this issue Oct 25, 2023

SQL handlers for custom media types #2825

Merged

steve-chavez added enhancement a feature, ready for implementation and removed idea Needs of discussion to become an enhancement, not ready for implementation labels Oct 26, 2023

steve-chavez self-assigned this Oct 26, 2023

steve-chavez added idea Needs of discussion to become an enhancement, not ready for implementation and removed enhancement a feature, ready for implementation labels Nov 16, 2023

steve-chavez mentioned this issue Dec 6, 2023

Support for ordering by distance (using <-> operator) #2989

Open

wolfgangwalther mentioned this issue Jan 15, 2024

Returning mimetypes via aggregates only works for select=* #3160

Closed

steve-chavez mentioned this issue Jan 16, 2024

Prefer:resolution/missing options should be inside a Content-Type #3164

Open

steve-chavez mentioned this issue Feb 14, 2024

Partial document update based on JSON Patch #3166

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom media types + conversion for `Content-Type` #2826

Custom media types + conversion for `Content-Type` #2826

steve-chavez commented Jun 15, 2023 •

edited

Loading

steve-chavez commented Nov 16, 2023

wolfgangwalther commented Nov 16, 2023

steve-chavez commented Nov 16, 2023

steve-chavez commented Jan 11, 2024

steve-chavez commented Jan 12, 2024

steve-chavez commented Jan 12, 2024

steve-chavez commented Jan 12, 2024 •

edited

Loading

wolfgangwalther commented Jan 15, 2024

steve-chavez commented Jan 15, 2024

wolfgangwalther commented Jan 16, 2024

Custom media types + conversion for Content-Type #2826

Custom media types + conversion for Content-Type #2826

Comments

steve-chavez commented Jun 15, 2023 • edited Loading

Problem

Solution

Functions interface

Tables/views interface

Notes

Related

steve-chavez commented Nov 16, 2023

wolfgangwalther commented Nov 16, 2023

steve-chavez commented Nov 16, 2023

steve-chavez commented Jan 11, 2024

steve-chavez commented Jan 12, 2024

steve-chavez commented Jan 12, 2024

steve-chavez commented Jan 12, 2024 • edited Loading

wolfgangwalther commented Jan 15, 2024

steve-chavez commented Jan 15, 2024

wolfgangwalther commented Jan 16, 2024

Custom media types + conversion for `Content-Type` #2826

Custom media types + conversion for `Content-Type` #2826

steve-chavez commented Jun 15, 2023 •

edited

Loading

steve-chavez commented Jan 12, 2024 •

edited

Loading