Computed relationships - First-class computed/virtual columns - a generalization of foreign-key-based resource embedding #2144
Comments
First-class computed/virtual columns are a sure win!
Likely better to discuss this on a different issue, but would that |
Yes, yes and yes. |
Another thing that's good about this one is that it could be used to override our detected relationships as well, so it provides a way forward from breaking changes in embedding. For example, if we start detecting one-to-one relationships, users could go back to redefining them as one-to-many. It's much better than doing some PostgREST-specific stuff (like in this proposal) to override relationships, because it reduces user cognitive load.
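A minimal sketch of what such an override could look like, assuming the function-based computed relationships discussed below and hypothetical `users`/`addresses` tables:

```sql
-- Sketch (hypothetical tables and names): even if PostgREST were to start
-- auto-detecting users->addresses as one-to-one, defining the relationship
-- explicitly as a SETOF function would let users keep it as one-to-many.
CREATE FUNCTION test.addresses(test.users) RETURNS SETOF test.addresses AS $$
  SELECT * FROM test.addresses WHERE user_id = $1.id;
$$ LANGUAGE sql STABLE;
```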
So a function that
IIUC, this would not need hints because the virtual column would be unambiguous, as it's explicitly defining the join with the table and columns.
data Cardinality
= O2M {relConnection :: Either FKConstraint ComputedColumn}
-- ^ one-to-many
| M2O {relConnection :: Either FKConstraint ComputedColumn}
-- ^ many-to-one
| M2M Junction
-- ^ many-to-many
deriving (Eq, Ord, Generic, JSON.ToJSON)
type ComputedColumn = QualifiedIdentifier
data FKConstraint = FKConstraint Text [(FieldName, FieldName)]

-- | Relationship between two tables.
data Relationship = Relationship
{ relTable :: QualifiedIdentifier
, relForeignTable :: QualifiedIdentifier
  , relIsSelf :: Bool -- ^ Whether it is a self-relationship
, relCardinality :: Cardinality
}
deriving (Eq, Ord, Generic, JSON.ToJSON)
The target name, the relTable and relForeignTable can be inferred from the function as: create function target_name(relTable) returns relForeignTable. Kinda makes me think that |
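For example, taking a definition that appears later in this thread, the inference would read each piece off the signature (the mapping shown is my reading of the proposal, not settled behavior):

```sql
CREATE FUNCTION test.clients(test.projects) RETURNS SETOF test.clients AS $$
  SELECT * FROM test.clients WHERE id = $1.client_id;
$$ LANGUAGE sql STABLE;
-- target name     -> "clients"      (usable as ?select=*,clients(*))
-- relTable        -> test.projects  (the argument type)
-- relForeignTable -> test.clients   (the return type)
```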
Came up with this query:

-- Adding this on the fixtures:
-- CREATE FUNCTION manual_client_rel(test.projects) RETURNS SETOF test.clients AS $$
--   SELECT * FROM test.clients WHERE id = $1.client_id;
-- $$ LANGUAGE sql STABLE ROWS 1;
with
all_relations as (
select reltype
from pg_class
where relkind in ('v','r','m','f','p')
)
select
p.pronamespace::regnamespace as schema,
p.proname as name,
p.proargtypes[0]::regtype as rel_table,
p.prorettype::regtype as rel_ftable,
p.prorows = 1 as single_row
FROM pg_proc p
WHERE
p.pronargs = 1
and p.proargtypes[0] in (select reltype from all_relations)
and p.prorettype in (select reltype from all_relations);
-[ RECORD 1 ]-+------------------
schema | test
name | manual_client_rel
rel_table | test.projects
rel_ftable | test.clients
single_row | t |
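For the one-to-many side, a hedged sketch (hypothetical function name): a SETOF function without ROWS 1 keeps pg_proc.prorows at its default estimate of 1000, so the query above would report single_row = false for it:

```sql
-- Sketch: the one-to-many counterpart of the fixture above.
CREATE FUNCTION manual_projects_rel(test.clients) RETURNS SETOF test.projects AS $$
  SELECT * FROM test.projects WHERE client_id = $1.id;
$$ LANGUAGE sql STABLE;  -- no ROWS 1: prorows stays at the 1000 default
```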
Perf is great as mentioned above; I get a cheaper plan when calling the computed relationship:

-- our generated query
explain analyze WITH pgrst_source AS (
SELECT "test"."projects".*, row_to_json("projects_clients".*) AS "clients" FROM "test"."projects"
LEFT JOIN LATERAL (
SELECT "test"."clients".*
FROM "test"."clients" WHERE "test"."projects"."client_id" = "test"."clients"."id" ) AS "projects_clients" ON TRUE)
SELECT
pg_catalog.count(_postgrest_t) AS page_total,
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
Aggregate (cost=75.74..75.76 rows=1 width=112) (actual time=0.114..0.117 rows=1 loops=1)
-> Hash Left Join (cost=38.58..63.74 rows=1200 width=72) (actual time=0.045..0.054 rows=5 loops=1)
Hash Cond: (projects.client_id = clients.id)
-> Seq Scan on projects (cost=0.00..22.00 rows=1200 width=40) (actual time=0.013..0.015 rows=5 loops=1)
-> Hash (cost=22.70..22.70 rows=1270 width=36) (actual time=0.019..0.020 rows=2 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 17kB
-> Seq Scan on clients (cost=0.00..22.70 rows=1270 width=36) (actual time=0.010..0.013 rows=2 loops=1)
Planning Time: 0.416 ms
Execution Time: 0.198 ms
-- using a computed relationship
CREATE FUNCTION manual_client_rel(test.projects) RETURNS test.clients AS $$
SELECT * FROM test.clients WHERE id = $1.client_id;
$$ LANGUAGE sql STABLE;
explain analyze WITH pgrst_source AS (
SELECT x.*, row_to_json("projects_clients".*) AS "clients" FROM "test"."projects" x
LEFT JOIN LATERAL (
SELECT "manual_client_rel".*
FROM "manual_client_rel"(x)) AS "projects_clients" ON TRUE)
SELECT
pg_catalog.count(_postgrest_t) AS page_total,
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
Aggregate (cost=58.25..58.28 rows=1 width=112) (actual time=0.359..0.361 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.25..46.25 rows=1200 width=72) (actual time=0.209..0.290 rows=5 loops=1)
-> Seq Scan on projects x (cost=0.00..22.00 rows=1200 width=104) (actual time=0.021..0.029 rows=5 loops=1)
-> Function Scan on manual_client_rel (cost=0.25..0.26 rows=1 width=32) (actual time=0.050..0.050 rows=1 loops=5)
Planning Time: 0.426 ms
Execution Time: 0.437 ms

Of course the row samples are too small (and somehow the execution time is always higher for the computed relationships), but it still shows the direction is good. |
Wait a moment, you're mixing a few things up. What ultimately matters for performance is execution time. And that is clearly worse with the computed relationship. Costs are always only an estimation.

Performance is worse because the query does not inline the function body at the moment. You can tell this from the explain for the second query - it has a "function scan on manual_client_rel" in it. As long as you can see the function name in the explain output, it's not inlined, afaik. So it's not surprising that the cost is estimated lower here, because PG does not look inside the function at all and just assumes the default cost for a function.

According to the inlining requirements, the reason why it's currently not doing that is:
Currently your helper function only

What do the query plans look like with that change? |
Ah, didn't know that was an inlining requirement. Hm, so that means we cannot use
Did you think of using SETOF for multiple rows? Or is there another way?
explain analyze WITH pgrst_source AS (
SELECT x.*, row_to_json("projects_clients".*) AS "clients" FROM "test"."projects" x
LEFT JOIN LATERAL (
SELECT "manual_client_rel".*
FROM "manual_client_rel"(x)) AS "projects_clients" ON TRUE)
SELECT
pg_catalog.count(_postgrest_t) AS page_total,
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=75.74..75.75 rows=1 width=40) (actual time=0.105..0.108 rows=1 loops=1)
-> Hash Left Join (cost=38.58..63.74 rows=1200 width=72) (actual time=0.043..0.052 rows=5 loops=1)
Hash Cond: (x.client_id = clients.id)
-> Seq Scan on projects x (cost=0.00..22.00 rows=1200 width=40) (actual time=0.012..0.014 rows=5 loops=1)
-> Hash (cost=22.70..22.70 rows=1270 width=36) (actual time=0.018..0.019 rows=2 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 17kB
-> Seq Scan on clients (cost=0.00..22.70 rows=1270 width=36) (actual time=0.010..0.013 rows=2 loops=1)
Planning Time: 0.480 ms
Execution Time: 0.182 ms

Execution time now looks about the same as the query without the computed relationship. "Function scan" is also gone. |
There's the ROWS parameter. Maybe we can use a
Edit: That's it. It works fine with inlining too. I should correct the pg_proc query above to use |
So far this has been working great. I mean it even works for

CREATE FUNCTION test.clients(test.projects) RETURNS SETOF test.clients AS $$
  SELECT * FROM test.clients WHERE id = $1.client_id;
$$ LANGUAGE sql STABLE ROWS 1;

curl 'localhost:3000/rpc/getallprojects?select=*,clients(id,name)'

[{"id":1,"name":"Windows 7","client_id":1,"clients":{"id":1,"name":"Microsoft"}},
{"id":2,"name":"Windows 10","client_id":1,"clients":{"id":1,"name":"Microsoft"}},
{"id":3,"name":"IOS","client_id":2,"clients":{"id":2,"name":"Apple"}},
{"id":4,"name":"OSX","client_id":2,"clients":{"id":2,"name":"Apple"}},

Generated query:

WITH pgrst_source AS (
  SELECT "getallprojects".* FROM "test"."getallprojects"())
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
  SELECT "projects".*, row_to_json("projects_clients".*) AS "clients"
  FROM "pgrst_source" AS "projects"
  LEFT JOIN LATERAL (
    SELECT "clients"."id", "clients"."name"
    FROM "test"."clients"("projects")
  ) AS "projects_clients" ON TRUE
) _postgrest_t;

Other thing I noticed is that, thanks to overloaded functions, we can use the same function name to define relationships for many tables:

CREATE FUNCTION test.clients(test.projects) RETURNS SETOF test.clients
-- curl 'localhost:3000/projects?select=*,clients(id,name)'
CREATE FUNCTION test.clients(test.companies) RETURNS SETOF test.clients
-- curl 'localhost:3000/companies?select=*,clients(id,name)'
CREATE FUNCTION test.clients(test.others) RETURNS SETOF test.clients
-- curl 'localhost:3000/others?select=*,clients(id,name)' |
There is one problem though. It's not working for mutations.

curl 'localhost:3000/projects?select=id,name,clients(id,name)' -H "Prefer: return=representation" -H "Content-Type: application/json" -d @- <<JSON
{"id":6,"name":"New Project","client_id":2}
JSON
{"code":"42846","details":"Input has too few columns.","hint":null,"message":"cannot cast type record to projects"}

WITH pgrst_source AS (
WITH
pgrst_payload AS (SELECT '{"id":6,"name":"New Project","client_id":2}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
INSERT INTO "test"."projects"("client_id", "id", "name")
SELECT "client_id", "id", "name" FROM json_populate_recordset (null::"test"."projects", (SELECT val FROM pgrst_body)) _
RETURNING "test"."projects"."id", "test"."projects"."name")
SELECT
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
SELECT
"projects"."id",
"projects"."name",
row_to_json("projects_clients".*) AS "clients"
FROM "pgrst_source" AS "projects"
LEFT JOIN LATERAL (
SELECT "clients"."id", "clients"."name"
FROM "test"."clients"("projects")
) AS "projects_clients" ON TRUE
) _postgrest_t;
ERROR: cannot cast type record to test.projects
LINE 18: FROM "test"."clients"("projects")
^
DETAIL: Input has too few columns.

At first I thought it was because of the CTE, but the RPC query above proves it's not. It's actually because the
Found this trick that allows casting like |
You already have that in the outer
So it's just a matter of making it
I suggest only making it that way if any computed relationship is used... otherwise, we'd lose the ability to do mutations with select privileges on only a subset of columns, right? |
Yeah, though we'd lose the same ability for computed relationships. Say if you want to insert a row of "projects" but don't have privilege on a new column "cost" and you'd do

I found another way to fix the type, but it would require the schema cache; basically we need to fill the missing columns in the RETURNING with

WITH pgrst_source AS (
WITH
pgrst_payload AS (SELECT '{"id":6,"name":"New Project","client_id":2}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
INSERT INTO "test"."projects"("client_id", "id", "name")
SELECT "client_id", "id", "name" FROM json_populate_recordset (null::"test"."projects", (SELECT val FROM pgrst_body)) _
RETURNING "test"."projects"."id", "test"."projects"."name", null::int)
SELECT
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
SELECT
"projects"."id",
"projects"."name",
row_to_json("projects_clients".*) AS "clients"
FROM "pgrst_source" AS "projects"
LEFT JOIN LATERAL (
SELECT "clients"."id", "clients"."name"
FROM "test"."clients"("projects")
) AS "projects_clients" ON TRUE
) _postgrest_t; |
An alternative that doesn't require the schema cache could also be:

WITH pgrst_source AS (
WITH
pgrst_payload AS (SELECT '{"id":6,"name":"New Project","client_id":2}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
INSERT INTO "test"."projects"("client_id", "id", "name")
SELECT "client_id", "id", "name" FROM json_populate_recordset (null::"test"."projects", (SELECT val FROM pgrst_body)) _
RETURNING json_build_object('id', "test"."projects"."id", 'name', "test"."projects"."name"))
SELECT
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM (
SELECT
"projects"."id",
"projects"."name",
row_to_json("projects_clients".*) AS "clients"
FROM json_populate_record(null::test.projects, (SELECT * FROM "pgrst_source")) AS "projects"
LEFT JOIN LATERAL (
SELECT "clients"."id", "clients"."name"
FROM "test"."clients"("projects")
) AS "projects_clients" ON TRUE
) _postgrest_t;

But it doesn't feel right to use it for all returnings; the json casting could mess up some types. Using json functions on the body (defined by Content-Type) and the response (defined by Accept) is fine, but not for the generic RETURNING. |
Besides the type casting, there's another problem with mutations. For our detected relationships, we enhance the RETURNING with the foreign keys so an embedding can succeed:

postgrest/src/PostgREST/Request/DbRequestBuilder.hs, lines 350 to 377 in 3950b36
However, with computed relationships the join conditions are basically a private detail; we cannot know for sure which conditions (FKs included) we need for the embedding to succeed. That leaves the

It should be noted that computed relationships do work on mutations when specifying

curl 'localhost:3000/projects?select=*,clients(id,name)' -H "Prefer: return=representation" -H "Content-Type: application/json" -d @- <<JSON
{"id":6,"name":"New Project","client_id":2}
JSON
[{"id":6,"name":"nom","client_id":2,"clients":{"id":2,"name":"Apple"}}] |
Since |
Yes, came to the same conclusion.
Please don't. I can easily see computed relationships that could be marked |
Note: computed relationships will not work for "relational inserts" later, but we have another way on #818 (comment). |
Problem
This builds on some ideas we had earlier:
@steve-chavez understood resource embedding as functions:
And I gave a practical example, showing the similarities:
@steve-chavez then suggested this as the way forward to do custom embeddings.
However, the current support for virtual columns is only implicit through field notation. This has several drawbacks for embedding, i.e. the following can only be done through fk-based embedding right now:
Additionally, it's currently impossible to use a virtual column that has the same name as a real column of the same relation - field notation will always choose the latter.
Basic implementation
To solve these issues, I propose to introduce first-class support for virtual columns: When creating the schema cache, we should parse function definitions to detect virtual columns and then make them callable the same way as resource embedding is currently done.
The good thing: This is actually much simpler than it sounds. We need to add relationships based on function definitions in:
postgrest/src/PostgREST/DbStructure.hs
Line 96 in 8934190
Those relationships would have a cardinality of M2O or O2M, depending on whether the function returns multiple rows or not. This would generate the right query for aggregation here:

postgrest/src/PostgREST/Query/QueryBuilder.hs, lines 52 to 81 in 8934190
We would then extend the Cardinality type to hold either the FKConstraint or a QualifiedIdentifier pointing to the virtual column's function. This would allow addRels to add the type of embedding (virtual vs fk-based) to the ReadRequest in:

postgrest/src/PostgREST/Request/DbRequestBuilder.hs, lines 133 to 137 in 8934190
And based on this, we can then make sure that the actual SELECT from the embedded table is changed to call the function instead:

postgrest/src/PostgREST/Query/QueryBuilder.hs, lines 36 to 42 in 8934190
This currently creates a subquery like the following:
and we'd need to make this:
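The two snippets were lost here; as a sketch reconstructed from the generated queries shown earlier in this thread, the change would be roughly:

```sql
-- current, fk-based subquery:
LEFT JOIN LATERAL (
  SELECT "test"."clients".*
  FROM "test"."clients"
  WHERE "test"."projects"."client_id" = "test"."clients"."id"
) AS "projects_clients" ON TRUE

-- proposed, virtual-column subquery: call the function instead
LEFT JOIN LATERAL (
  SELECT "clients".*
  FROM "test"."clients"("projects")
) AS "projects_clients" ON TRUE
```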
The result would be embedding based on virtual columns, with full support for the features we currently have: nesting, selecting subfields, inlining for performance; there is basically no difference, afaict. It should even be possible to use hints, if we fill the Relationship fields in a smart way.

Consequences for Embedding in general
This will solve #591, #1179 and #1230 immediately and would provide a workaround solution for #1643, #2566, #1907 and #1984 - basically any issue with disambiguation, because you can always roll your own in a more explicit way.
Furthermore, we could easily remove some of the auto-detected fk-based relationships that have recently posed problems with disambiguation in general. We could restrict M2M relationships to simple junction tables with only FK columns, and we might decide to remove self-references completely, because those create a lot of challenges on their own. Those can all be replaced with explicit virtual column embedding.

Taking this one step further, we could add a config option db-fk-embedding, which defaults to true. Setting this to false would disable fk-based embedding entirely. This would remove our dependency on constraint names, at least partially, getting us one step closer to true schema isolation. We'd still expose those in error messages for failed constraints, I think, but at least our query API wouldn't depend on them. This would also allow users to be very explicit about which embeddings they want to allow, and would also guarantee that further improvements in our view-parsing will not break existing queries when updating PostgREST due to new embedding possibilities resulting in a "can't disambiguate" response. This would solve #2070.

Further generalization for function calls
At this stage, ?select=*,func(*) is essentially a function call, which takes the current row as the only argument and allows selecting partial output.

Scalars
The next step would be to allow ?select=*,func() to be used to call functions which return a scalar instead of a composite type. This would solve the problem mentioned above, regarding real and virtual columns with the same name on a given table.

Aggregates
Given that we have a lot of similarities with those virtual columns and the field notation call syntax, we can think of both of the following to be the same (pseudo-code):
We don't need the table prefix, because we're calling the function in a better way, but we can take a similar-looking approach to pass columns instead of rows as the only argument to a function:
This would be the same as SELECT sum(employees.salary) FROM employees, essentially.

This is very similar to this suggestion on how to call aggregate functions:
However, the new proposal has less boilerplate, less indirection and is more consistent with how we treat functions as virtual columns. It also avoids potential name conflicts with column filters on the operator etc.
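As an illustration (the request syntax below is my guess at the proposal's lost pseudo-code, not an implemented feature), the mapping would be roughly:

```sql
-- Hypothetical request: GET /employees?select=salary.sum()
-- would translate to, essentially:
SELECT sum(employees.salary) FROM employees;
```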
It would still be possible to implement the FILTER or HAVING clauses:

Even more
It should be possible to extend these concepts to ?select=*,col.func1().func2(), which looks like a syntax that could be used for "Multiple hops for embed (Unnest/flatten embeds)" #1233, too.