
Computed relationships - First-class computed/virtual columns - a generalization of foreign-key-based resource embedding #2144

Closed
wolfgangwalther opened this issue Feb 1, 2022 · 17 comments · Fixed by #2419
Labels
embedding resource embedding enhancement a feature, ready for implementation

Comments

@wolfgangwalther (Member) commented Feb 1, 2022

Problem

This builds on some ideas we had earlier:

@steve-chavez understood resource embedding as functions:

So, currently we support embedding - or joining - with the embed(col1,col2) syntax. A join is an operator (according to relational algebra) and thus: a function.

And I gave a practical example, showing the similarities:

create function clients (projects) returns clients
stable language sql as $$
  select * from clients where id = $1.client_id
$$;

Because this function has only 1 argument of type projects, it can be selected as a virtual column from the projects table. The URL would look like this: GET /projects?select=*,clients. The result is the same as the embedding we provide: GET /projects?select=*,clients(*).

@steve-chavez then suggested this as the way forward to do custom embeddings.

However, the current support for virtual columns is only implicit through field notation. This has several drawbacks for embedding, i.e. the following can only be done through fk-based embedding right now:

  • ability to select only a subset of columns
  • nesting
  • performance: inlining of o2m and m2m queries (doesn't work with a real function, because the aggregation would prevent it)

Additionally it's currently impossible to use a virtual column that has the same name as a real column of the same relation - field notation will always choose the latter.
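To illustrate that name conflict, here is a minimal hypothetical example (table and function names are made up, not from any fixture):

```sql
-- Hypothetical: "label" exists both as a real column and as a function.
create table items (id int primary key, label text);

create function label(items) returns text
stable language sql as $$
  select upper($1.label)
$$;

-- Field notation resolves to the real column, so the function
-- can never be reached this way:
select items.label from items;  -- real column, not the function
```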


Basic implementation

To solve these issues, I propose to introduce first-class support for virtual columns: When creating the schema cache, we should parse function definitions to detect virtual columns and then make them callable the same way as resource embedding is currently done.

The good thing: This is actually much simpler than it sounds. We need to add relationships based on function definitions in:

let rels = addO2MRels . addM2MRels $ addViewM2ORels srcCols m2oRels

Those relationships would have a cardinality of M2O or O2M depending on whether the function returns multiple rows or not. This would generate the right query for aggregation here:

getJoinsSelects :: ReadRequest -> ([SQL.Snippet], [SQL.Snippet]) -> ([SQL.Snippet], [SQL.Snippet])
getJoinsSelects rr@(Node (_, (name, Just Relationship{relCardinality=card,relTable=Table{tableName=table}}, alias, _, joinType, _)) _) (joins,selects) =
  let subquery = readRequestToQuery rr in
  case card of
    M2O _ ->
      let aliasOrName = fromMaybe name alias
          localTableName = pgFmtIdent $ table <> "_" <> aliasOrName
          sel = SQL.sql ("row_to_json(" <> localTableName <> ".*) AS " <> pgFmtIdent aliasOrName)
          joi = (if joinType == Just JTInner then " INNER" else " LEFT")
            <> " JOIN LATERAL( " <> subquery <> " ) AS " <> SQL.sql localTableName <> " ON TRUE " in
      (joi:joins, sel:selects)
    _ -> case joinType of
      Just JTInner ->
        let aliasOrName = fromMaybe name alias
            locTblName = table <> "_" <> aliasOrName
            localTableName = pgFmtIdent locTblName
            internalTableName = pgFmtIdent $ "_" <> locTblName
            sel = SQL.sql $ localTableName <> "." <> internalTableName <> " AS " <> pgFmtIdent aliasOrName
            joi = "INNER JOIN LATERAL(" <>
              "SELECT json_agg(" <> SQL.sql internalTableName <> ") AS " <> SQL.sql internalTableName <>
              "FROM (" <> subquery <> " ) AS " <> SQL.sql internalTableName <>
              ") AS " <> SQL.sql localTableName <> " ON " <> SQL.sql localTableName <> "IS NOT NULL" in
        (joi:joins, sel:selects)
      _ ->
        let sel = "COALESCE (("
              <> "SELECT json_agg(" <> SQL.sql (pgFmtIdent table) <> ".*) "
              <> "FROM (" <> subquery <> ") " <> SQL.sql (pgFmtIdent table) <> " "
              <> "), '[]') AS " <> SQL.sql (pgFmtIdent (fromMaybe name alias)) in
        (joins, sel:selects)
getJoinsSelects (Node (_, (_, Nothing, _, _, _, _)) _) _ = ([], [])

We would then extend the Cardinality type to hold either the FKConstraint or a QualifiedIdentifier pointing to the virtual column's function. This would allow addRels to add the type of embedding (virtual vs fk-based) to the ReadRequest in:

let newFrom r = if qiName tbl == nodeName then tableQi (relForeignTable r) else tbl
    newReadNode = (\r -> (query{from=newFrom r}, (nodeName, Just r, alias, hint, joinType, depth))) <$> rel
    rel = findRel schema allRels (qiName parentNodeQi) nodeName hint
in
  Node <$> newReadNode <*> (updateForest . hush $ Node <$> newReadNode <*> pure forest)

And based on this, we can then make sure that the actual SELECT from the embedded table is changed to call the function instead:

readRequestToQuery :: ReadRequest -> SQL.Snippet
readRequestToQuery (Node (Select colSelects mainQi tblAlias implJoins logicForest joinConditions_ ordts range, _) forest) =
  "SELECT " <>
  intercalateSnippet ", " ((pgFmtSelectItem qi <$> colSelects) ++ selects) <>
  "FROM " <> SQL.sql (BS.intercalate ", " (tabl : implJs)) <> " " <>
  intercalateSnippet " " joins <> " " <>
  (if null logicForest && null joinConditions_
     then mempty
     else "WHERE " <> intercalateSnippet " AND " (map (pgFmtLogicTree qi) logicForest ++ map pgFmtJoinCondition joinConditions_))

This currently creates a subquery like the following:

SELECT "test"."clients".* FROM "test"."clients"  WHERE "test"."projects"."client_id" = "test"."clients"."id" 

and we'd need to make this:

SELECT "virtual_column".* FROM "test"."virtual_column"("projects") 

The result would be embedding based on virtual columns, with the full support of features we currently have: Nesting, selecting subfields, inlining for performance - there is basically no difference, afaict. It should even be possible to use hints, if we fill the Relationship fields in a smart way.


Consequences for Embedding in general

This will solve #591, #1179 and #1230 immediately and would provide a workaround solution for #1643, #2566, #1907 and #1984 - basically any issue with disambiguation, because you can always roll your own in a more explicit way.

Furthermore, we could easily remove some of the auto-detected fk-based relationships that have recently posed disambiguation problems. We could restrict M2M relationships to simple junction tables with only FK columns, and we might decide to remove self-references entirely, because those create a lot of challenges of their own. All of those could be replaced with explicit virtual-column embedding.

Taking this one step further, we could add a config option db-fk-embedding, defaulting to true. Setting it to false would disable fk-based embedding entirely. This would remove our dependency on constraint names, at least partially, getting us one step closer to true schema isolation. We'd still expose those names in error messages for failed constraints, I think, but at least our query API wouldn't depend on them. This would also allow users to be very explicit about which embeddings they want to allow - and would guarantee that further improvements in our view parsing won't break existing queries when updating PostgREST, where new embedding possibilities would otherwise result in a "can't disambiguate" response. This would solve #2070.

Further generalization for function calls

At this stage, ?select=*,func(*) is essentially a function call, which takes the current row as the only argument and allows selecting partial output.

Scalars

The next step would be to allow ?select=*,func() to be used to call functions which return a scalar instead of a composite type. This would solve the problem mentioned above, regarding real and virtual columns with the same name on a given table.
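A hedged sketch of what such a scalar computed column could look like (the function name and body are hypothetical, not from the fixtures):

```sql
-- Hypothetical scalar computed column on projects.
create function name_length(test.projects) returns int
stable language sql as $$
  select length($1.name)
$$;

-- With the proposed syntax, GET /projects?select=*,name_length()
-- would call the function explicitly instead of relying on field notation,
-- sidestepping any clash with a real column of the same name.
```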

Aggregates

Given that those virtual columns have a lot of similarities with the field notation call syntax, we can think of the following two as the same (pseudo-code):

GET /projects?select=*,clients(*)
GET /projects?select=*,projects.clients

We don't need the table prefix, because we're calling the function in a better way, but we can take a similar-looking approach to pass columns instead of rows as the only argument to a function:

GET /employees?select=salary.sum()

This would be the same as SELECT sum(employees.salary) FROM employees, essentially.

This is very similar to this suggestion on how to call aggregate functions:

GET /employees?select=total_salary&total_salary=sum.salary

However, the new proposal has less boilerplate, less indirection and is more consistent with how we treat functions as virtual columns. It also avoids potential name conflicts with column filters on the operator etc.

It would still be possible to implement the FILTER or HAVING clauses:

GET /users?select=males:count(),non_betas:count()&males?gender=eq.male&non_betas?beta=is.true&non_betas?active=is.false
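Assuming that pseudo-syntax maps onto SQL aggregate FILTER clauses, the generated query might look roughly like this (a sketch mirroring the filters above, not actual PostgREST output):

```sql
SELECT
  count(*) FILTER (WHERE gender = 'male')                  AS males,
  count(*) FILTER (WHERE beta IS TRUE AND active IS FALSE) AS non_betas
FROM users;
```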

Even more

It should be possible to extend these concepts to:

@wolfgangwalther wolfgangwalther added enhancement a feature, ready for implementation embedding resource embedding labels Feb 1, 2022
@steve-chavez (Member)

First-class computed/virtual columns are a sure win!

Taking this one step further, we could add a config option db-fk-embedding, which defaults to true. Setting this to false would disable fk-based embedding entirely

Likely better to discuss this in a different issue, but would that db-fk-embedding=false disable automatic resource embedding entirely? Would all embeds require a function?

@wolfgangwalther (Member, Author)

Likely better to discuss this in a different issue, but would that db-fk-embedding=false disable automatic resource embedding entirely? Would all embeds require a function?

Yes, yes and yes.

@steve-chavez (Member) commented Jun 10, 2022

Another thing that's good about this one is that it could be used to override our detected relationships as well, so it provides a way forward from breaking changes in embedding. For example, if we start detecting one-to-one relationships users could go back to redefining as a one-to-many.

It's much better than doing some PostgREST-specific stuff (like in this proposal) to override relationships, because it reduces user cognitive load.

Those relationships would have a cardinality of M2O or O2M depending on whether the function returns multiple rows or not

So a function that RETURNS SETOF tbl vs just RETURNS tbl. Cool, I'll give this a try.

It should even be possible to use hints, if we fill the Relationship fields in a smart way.

IIUC, this would not need hints because the virtual column would be unambiguous as it's explicitly defining the join with the table and columns.


We would then extend the Cardinality type to hold either the FKConstraint or a QualifiedIdentifier pointing to the virtual column's function.

data Cardinality
  = O2M {relConnection :: Either FKConstraint ComputedColumn}
  -- ^ one-to-many
  | M2O {relConnection :: Either FKConstraint ComputedColumn}
  -- ^ many-to-one
  | M2M Junction
  -- ^ many-to-many
  deriving (Eq, Ord, Generic, JSON.ToJSON)

type ComputedColumn = QualifiedIdentifier
newtype FKConstraint = FKConstraint Text [(FieldName, FieldName)]
-- | Relationship between two tables.
data Relationship = Relationship
  { relTable        :: QualifiedIdentifier
  , relForeignTable :: QualifiedIdentifier
  , relIsSelf       :: Bool -- ^ Whether is a self relationship
  , relCardinality  :: Cardinality
  }
  deriving (Eq, Ord, Generic, JSON.ToJSON)

The target name, the relTable and relForeignTable can be inferred from the function as:

create function target_name(relTable) returns relForeignTable

Kinda makes me think that targetName should be an attribute of Relationship instead of computing it dynamically in findRel.

@steve-chavez steve-chavez changed the title First-class computed/virtual columns - a generalization of foreign-key-based resource embedding Computed relationships - First-class computed/virtual columns - a generalization of foreign-key-based resource embedding Aug 4, 2022
@steve-chavez (Member) commented Aug 5, 2022

Came up with this query:

-- Adding this on the fixtures:
-- CREATE FUNCTION manual_client_rel(test.projects) RETURNS SETOF test.clients AS $$
--  SELECT * FROM test.clients WHERE id = $1.client_id;
--$$ LANGUAGE sql STABLE ROWS 1;

with
all_relations as (
  select reltype
  from pg_class
  where relkind in ('v','r','m','f','p')
)
select
  p.pronamespace::regnamespace as schema,
  p.proname as name,
  p.proargtypes[0]::regtype as rel_table,
  p.prorettype::regtype as rel_ftable,
  p.prorows = 1 as single_row
FROM pg_proc p
WHERE
  p.pronargs = 1
  and p.proargtypes[0] in (select reltype from all_relations)
  and p.prorettype in (select reltype from all_relations);

-[ RECORD 1 ]-+------------------
schema        | test
name          | manual_client_rel
rel_table     | test.projects
rel_ftable    | test.clients
single_row    | t

@steve-chavez (Member)

The result would be embedding based on virtual columns, with the full support of features we currently have: Nesting, selecting subfields, inlining for performance

Perf is great as mentioned above, I get a cheaper plan when calling the computed relationship:

-- our generated query
explain analyze WITH pgrst_source AS (
  SELECT "test"."projects".*, row_to_json("projects_clients".*) AS "clients" FROM "test"."projects"
  LEFT JOIN LATERAL (
    SELECT "test"."clients".*
    FROM "test"."clients"  WHERE "test"."projects"."client_id" = "test"."clients"."id"   ) AS "projects_clients" ON TRUE)
SELECT
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

 Aggregate  (cost=75.74..75.76 rows=1 width=112) (actual time=0.114..0.117 rows=1 loops=1)
   ->  Hash Left Join  (cost=38.58..63.74 rows=1200 width=72) (actual time=0.045..0.054 rows=5 loops=1)
         Hash Cond: (projects.client_id = clients.id)
         ->  Seq Scan on projects  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.013..0.015 rows=5 loops=1)
         ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.019..0.020 rows=2 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on clients  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.010..0.013 rows=2 loops=1)
 Planning Time: 0.416 ms
 Execution Time: 0.198 ms

-- using a computed relationship
CREATE FUNCTION manual_client_rel(test.projects) RETURNS test.clients AS $$
  SELECT * FROM test.clients WHERE id = $1.client_id;
$$ LANGUAGE sql STABLE;

explain analyze WITH pgrst_source AS (
  SELECT x.*, row_to_json("projects_clients".*) AS "clients" FROM "test"."projects" x
  LEFT JOIN LATERAL (
    SELECT "manual_client_rel".*
    FROM "manual_client_rel"(x)) AS "projects_clients" ON TRUE)
SELECT
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

 Aggregate  (cost=58.25..58.28 rows=1 width=112) (actual time=0.359..0.361 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.25..46.25 rows=1200 width=72) (actual time=0.209..0.290 rows=5 loops=1)
         ->  Seq Scan on projects x  (cost=0.00..22.00 rows=1200 width=104) (actual time=0.021..0.029 rows=5 loops=1)
         ->  Function Scan on manual_client_rel  (cost=0.25..0.26 rows=1 width=32) (actual time=0.050..0.050 rows=1 loops=5)
 Planning Time: 0.426 ms
 Execution Time: 0.437 ms

Of course the row samples are too small (and somehow the execution time is always higher for the computed relationship), but it still shows the direction is good.

@wolfgangwalther (Member, Author) commented Aug 6, 2022

Perf is great as mentioned above, I get a cheaper plan when calling the computed relationship:

Wait a moment, you're mixing a few things up. What ultimately matters for performance is execution time. And that is clearly worse with the computed relationship. Costs are always only an estimation. Performance is worse, because the query does not inline the function body at the moment. You can tell this from the explain for the second query - it has a "function scan on manual_client_rel" in it. As long as you can see the function name in the explain output, it's not inlined, afaik. So it's not surprising that the cost is estimated lower here, because PG does not look inside the function at all and just assumes the default cost for a function.

According to the inlining requirements, the reason why it's currently not doing that is:

A table function call will be inlined if all of the following conditions are met (Source Code):
...

  • the function is declared RETURNS SETOF or RETURNS TABLE

Currently your helper function only RETURNS test.clients. But it needs to RETURNS SETOF test.clients.
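The inlinable variant would presumably look like this (same body as the fixture above, only the return type changed to SETOF):

```sql
-- Sketch: SETOF return type to satisfy the inlining requirement.
CREATE FUNCTION manual_client_rel(test.projects) RETURNS SETOF test.clients AS $$
  SELECT * FROM test.clients WHERE id = $1.client_id;
$$ LANGUAGE sql STABLE;
```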

What do the query plans look like with that change?

@steve-chavez (Member) commented Aug 6, 2022

the function is declared RETURNS SETOF or RETURNS TABLE

Ah, I didn't know that was an inlining requirement. Hm, so that means we cannot use the absence of SETOF to denote a many-to-one relationship. And RETURNS TABLE already implies a SETOF.

Those relationships would have a cardinality of M2O or O2M depending whether the function returns multiple rows or not

Did you think of using SETOF for multiple rows? Or is there another way?

What do the query plans look like with that change?

explain analyze WITH pgrst_source AS (
  SELECT x.*, row_to_json("projects_clients".*) AS "clients" FROM "test"."projects" x
  LEFT JOIN LATERAL (
    SELECT "manual_client_rel".*
    FROM "manual_client_rel"(x)) AS "projects_clients" ON TRUE)
SELECT
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=75.74..75.75 rows=1 width=40) (actual time=0.105..0.108 rows=1 loops=1)
   ->  Hash Left Join  (cost=38.58..63.74 rows=1200 width=72) (actual time=0.043..0.052 rows=5 loops=1)
         Hash Cond: (x.client_id = clients.id)
         ->  Seq Scan on projects x  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.012..0.014 rows=5 loops=1)
         ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.018..0.019 rows=2 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on clients  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.010..0.013 rows=2 loops=1)
 Planning Time: 0.480 ms
 Execution Time: 0.182 ms

Execution time now looks about the same as the query without the computed relationship. "Function scan" is also gone.

@steve-chavez (Member) commented Aug 6, 2022

Did you think of using SETOF for multiple rows? Or is there another way?

There's the ROWS parameter. Maybe we can use a ROWS 1 for this case.

ROWS result_rows
A positive number giving the estimated number of rows that the planner should expect the function to return. This is only allowed when the function is declared to return a set. The default assumption is 1000 rows.

Edit: That's it. It works fine with inlining too. I should correct the pg_proc query above to use prorows instead of proretset.

@steve-chavez (Member) commented Aug 11, 2022

So far this has been working great. I mean, it even works for RPC:

CREATE FUNCTION test.clients(test.projects) RETURNS SETOF test.clients AS $$
  SELECT * FROM test.clients WHERE id = $1.client_id;
$$ LANGUAGE sql STABLE ROWS 1;
curl 'localhost:3000/rpc/getallprojects?select=*,clients(id,name)'

[{"id":1,"name":"Windows 7","client_id":1,"clients":{"id":1,"name":"Microsoft"}},
 {"id":2,"name":"Windows 10","client_id":1,"clients":{"id":1,"name":"Microsoft"}},
 {"id":3,"name":"IOS","client_id":2,"clients":{"id":2,"name":"Apple"}},
 {"id":4,"name":"OSX","client_id":2,"clients":{"id":2,"name":"Apple"}},
Generated query:
WITH pgrst_source AS (
  SELECT "getallprojects".*
  FROM "test"."getallprojects"())
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
  SELECT
    "projects".*,
    row_to_json("projects_clients".*) AS "clients"
  FROM "pgrst_source" AS "projects"
  LEFT JOIN LATERAL (
    SELECT "clients"."id", "clients"."name" FROM "test"."clients"("projects")
  ) AS "projects_clients" ON TRUE
) _postgrest_t;
 

Another thing I noticed: thanks to function overloading, we can use the same function name to define relationships for many tables:

CREATE FUNCTION test.clients(test.projects) RETURNS SETOF test.clients
-- curl 'localhost:3000/projects?select=*,clients(id,name)'
CREATE FUNCTION test.clients(test.companies) RETURNS SETOF test.clients
-- curl 'localhost:3000/companies?select=*,clients(id,name)'
CREATE FUNCTION test.clients(test.others) RETURNS SETOF test.clients
-- curl 'localhost:3000/others?select=*,clients(id,name)'

@steve-chavez (Member)

There is one problem though. It's not working for mutations.

curl 'localhost:3000/projects?select=id,name,clients(id,name)' -H "Prefer: return=representation" -H "Content-Type: application/json" -d @- <<JSON
{"id":6,"name":"New Project","client_id":2}
JSON

{"code":"42846","details":"Input has too few columns.","hint":null,"message":"cannot cast type record to projects"}
WITH pgrst_source AS (
  WITH
  pgrst_payload AS (SELECT '{"id":6,"name":"New Project","client_id":2}'::json AS json_data),
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
  INSERT INTO "test"."projects"("client_id", "id", "name")
  SELECT "client_id", "id", "name" FROM json_populate_recordset (null::"test"."projects", (SELECT val FROM pgrst_body)) _
  RETURNING "test"."projects"."id", "test"."projects"."name")
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
  SELECT
    "projects"."id",
    "projects"."name",
    row_to_json("projects_clients".*) AS "clients"
  FROM "pgrst_source" AS "projects"
  LEFT JOIN LATERAL (
    SELECT "clients"."id", "clients"."name"
    FROM "test"."clients"("projects")
  ) AS "projects_clients" ON TRUE
) _postgrest_t;


ERROR:  cannot cast type record to test.projects
LINE 18:     FROM "test"."clients"("projects")
                                   ^
DETAIL:  Input has too few columns.

At first I thought it was because of the CTE but the RPC query above proves it's not.

It's actually because the RETURNING "test"."projects"."id", "test"."projects"."name" makes the query lose its resulting type. If that's changed to RETURNING * it works. I'd need to check if it's possible to modify our queries to always use RETURNING * and select the columns in the next SELECT query.
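A sketch of how that RETURNING * variant could look, adapted from the failing query above (the column projection stays in the outer SELECT, so the CTE rows keep the test.projects type):

```sql
-- Sketch adapted from the failing query; payload inlined for brevity.
WITH pgrst_source AS (
  INSERT INTO "test"."projects"("client_id", "id", "name")
  SELECT "client_id", "id", "name"
  FROM json_populate_recordset(null::"test"."projects",
       '[{"id":6,"name":"New Project","client_id":2}]'::json) _
  RETURNING *  -- keeps the returned rows typed as test.projects
)
SELECT
  "projects"."id",
  "projects"."name",
  row_to_json("projects_clients".*) AS "clients"
FROM "pgrst_source" AS "projects"
LEFT JOIN LATERAL (
  SELECT "clients"."id", "clients"."name"
  FROM "test"."clients"("projects")
) AS "projects_clients" ON TRUE;
```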


Found this trick that allows casting like FROM "test"."clients"("projects"::text::test.projects), but that doesn't help because row_to_json needs the exact columns.

@wolfgangwalther (Member, Author)

I'd need to check if it's possible to modify our queries to always use RETURNING * and select the columns on the next SELECT query.

You already have that in the outer SELECT "projects"."id", "projects"."name", right?

So it's just a matter of making it RETURNING * in that case?

I suggest only making it that way if a computed relationship is used... otherwise, we'd lose the ability to do mutations with select privileges on only a subset of columns, right?

@steve-chavez (Member)

I suggest to only make it that way if any computed relationship is used... otherwise, we'd lose the ability to do mutations with select privileges on only a subset of columns, right?

Yeah, though we'd lose the same ability for computed relationships. Say you want to insert a row into "projects" but don't have the privilege on a new column "cost", and you do POST /projects?select=id,name,clients(id,name); since RETURNING * is used, you'd get an error for "cost".

I found another way to fix the type, but it would require the schema cache; basically we need to fill the missing columns in the RETURNING with null::<type>.

WITH pgrst_source AS (
  WITH
  pgrst_payload AS (SELECT '{"id":6,"name":"New Project","client_id":2}'::json AS json_data),
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
  INSERT INTO "test"."projects"("client_id", "id", "name")
  SELECT "client_id", "id", "name" FROM json_populate_recordset (null::"test"."projects", (SELECT val FROM pgrst_body)) _
  RETURNING "test"."projects"."id", "test"."projects"."name", null::int)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
  SELECT
    "projects"."id",
    "projects"."name",
    row_to_json("projects_clients".*) AS "clients"
  FROM "pgrst_source" AS "projects"
  LEFT JOIN LATERAL (
    SELECT "clients"."id", "clients"."name"
    FROM "test"."clients"("projects")
  ) AS "projects_clients" ON TRUE
) _postgrest_t;

@steve-chavez (Member)

An alternative that doesn't require the schema cache could also be:

WITH pgrst_source AS (
  WITH
  pgrst_payload AS (SELECT '{"id":6,"name":"New Project","client_id":2}'::json AS json_data),
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
  INSERT INTO "test"."projects"("client_id", "id", "name")
  SELECT "client_id", "id", "name" FROM json_populate_recordset (null::"test"."projects", (SELECT val FROM pgrst_body)) _
  RETURNING json_build_object('id', "test"."projects"."id", 'name', "test"."projects"."name"))
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
  SELECT
    "projects"."id",
    "projects"."name",
    row_to_json("projects_clients".*) AS "clients"
  FROM json_populate_record(null::test.projects, (SELECT * FROM "pgrst_source")) AS "projects"
  LEFT JOIN LATERAL (
    SELECT "clients"."id", "clients"."name"
    FROM "test"."clients"("projects")
  ) AS "projects_clients" ON TRUE
) _postgrest_t;

But it doesn't feel right to use this for all RETURNING clauses; the JSON casting could mess up some types. Using JSON functions on the body (defined by Content-Type) and the response (defined by Accept) is fine, but not for the generic RETURNING.

@steve-chavez (Member) commented Aug 13, 2022

Besides the type casting, there's another problem with mutations. For our detected relationships, we enhance the RETURNING with the foreign keys so an embedding can succeed:

returningCols :: ReadRequest -> [FieldName] -> [FieldName]
returningCols rr@(Node _ forest) pkCols
  -- if * is part of the select, we must not add pk or fk columns manually -
  -- otherwise those would be selected and output twice
  | "*" `elem` fldNames = ["*"]
  | otherwise = returnings
  where
    fldNames = fstFieldNames rr
    -- Without fkCols, when a mutateRequest to
    -- /projects?select=name,clients(name) occurs, the RETURNING SQL part would
    -- be `RETURNING name`(see QueryBuilder). This would make the embedding
    -- fail because the following JOIN would need the "client_id" column from
    -- projects. So this adds the foreign key columns to ensure the embedding
    -- succeeds, result would be `RETURNING name, client_id`.
    fkCols = concat $ mapMaybe (\case
      Node (_, (_, Just Relationship{relCardinality=O2M _ cols}, _, _, _, _)) _ -> Just $ fst <$> cols
      Node (_, (_, Just Relationship{relCardinality=M2O _ cols}, _, _, _, _)) _ -> Just $ fst <$> cols
      Node (_, (_, Just Relationship{relCardinality=M2M Junction{junColumns1, junColumns2}}, _, _, _, _)) _ -> Just $ (fst <$> junColumns1) ++ (fst <$> junColumns2)
      _ -> Nothing
      ) forest
    -- However if the "client_id" is present, e.g. mutateRequest to
    -- /projects?select=client_id,name,clients(name) we would get `RETURNING
    -- client_id, name, client_id` and then we would produce the "column
    -- reference \"client_id\" is ambiguous" error from PostgreSQL. So we
    -- deduplicate with Set: We are adding the primary key columns as well to
    -- make sure, that a proper location header can always be built for
    -- INSERT/POST
    returnings = S.toList . S.fromList $ fldNames ++ fkCols ++ pkCols

However, with computed relationships the join conditions are basically a private detail; we cannot know for sure which conditions (FKs included) we need for the embedding to succeed.

That leaves the RETURNING * as the only way forward.


It should be noted that computed relationships do work on mutations when specifying * in select:

curl 'localhost:3000/projects?select=*,clients(id,name)' -H "Prefer: return=representation" -H "Content-Type: application/json" -d @- <<JSON
{"id":6,"name":"New Project","client_id":2}
JSON

[{"id":6,"name":"nom","client_id":2,"clients":{"id":2,"name":"Apple"}}]

@steve-chavez (Member)

Since STABLE/IMMUTABLE is a condition for inlining, I think we should enforce that computed relationships are marked STABLE, or otherwise they won't be detected. (IMMUTABLE doesn't make sense here, since that's for pure functions.)
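If we did enforce that, the pg_proc detection query above could presumably filter on provolatile (a sketch; in pg_proc, 'i' marks IMMUTABLE, 's' STABLE, 'v' VOLATILE):

```sql
-- Sketch: restrict the earlier detection query to STABLE functions.
SELECT p.proname
FROM pg_proc p
WHERE p.provolatile = 's'
  AND p.pronargs = 1;
```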

@wolfgangwalther (Member, Author)

However with computed relationships the join conditions are basically a private detail, we cannot know for sure which conditions(FKs included) we need for the embedding to succeed.

That leaves the RETURNING * as the only way forward.

Yes, came to the same conclusion.

Since STABLE/IMMUTABLE is a condition for inlining, I think we should enforce that computed relationships are marked as STABLE or otherwise they'll not be detected. (IMMUTABLE doesn't make sense here since that's for pure functions)

Please don't. I can easily see computed relationships that could be marked VOLATILE or IMMUTABLE on purpose. For example, an "embed a random row" relationship could be VOLATILE. Not being able to inline might not be a show-stopper in this case.

@steve-chavez (Member)

Note: computed relationships will not work for "relational inserts" later, but we have another way on #818 (comment).
