Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computed relationship changes the shape of main query when selecting it as a computed column #2480

Open
laurenceisla opened this issue Sep 21, 2022 · 3 comments
Labels
idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@laurenceisla
Copy link
Member

laurenceisla commented Sep 21, 2022

create schema api;

create table api.premieres (
  id text primary key,
  film_id text -- references api.films(id)
);
insert into api.premieres values
  ('P1', 'F1'),
  ('P2', 'F1'),
  ('P3', 'F2'),
  ('P4', 'F3')
 ;

create table api.films(id text primary key);
insert into api.films values('F1'), ('F2'), ('F4');
create or replace function api.computed_films_m2o(api.premieres) returns setof api.films as $$
  select * from api.films where id = $1.film_id
$$ stable language sql rows 1;

I see that taking a set-returning function as a computed column is acceptable, but the result is still unexpected (to me). Intuitively (and conventionally with PostgREST), a query with computed columns returns the same number of rows as the main query, as in a left join. That is, if we accept taking the computed_films_m2o as a computed column, it should return

GET /premieres?select=*,computed_films_m2o

[{"id":"P1","film_id":"F1","computed_films_m2o":{"id":"F1"}},
 {"id":"P2","film_id":"F1","computed_films_m2o":{"id":"F1"}},
 {"id":"P3","film_id":"F2","computed_films_m2o":{"id":"F2"}},
 {"id":"P4","film_id":"F3","computed_films_m2o":null}]

instead. Otherwise, we have to break the current behevior of that a computed column never change the shape of the main query.

But the presented inner join is not a bug, but a postgresql feature, because PostgreSQL's behavior for a set-returning function in a query's select list is almost exactly the same as if the set-returning function had been written in a LATERAL FROM-clause item instead, i.e.,

select *,  computed_films_m2o(premieres) from premieres

is almost the same as

select premieres.*,  films from premieres, lateral computed_films_m2o(premieres) as films

Regarding that, we could either exploit it as a postgrest feature that a set-returning function as computed column is for inner-join and a rowtype-returning scalar function as computed column is for left-join, or simply deny a set-returning function taken as a computed column for less headache.

Personally, I think the former option is very interesting and makes postgrest more flexible, although it requires explicit documentation and more tests and experiments.

Originally posted by @Iced-Sun in #2475 (comment)

@laurenceisla
Copy link
Member Author

laurenceisla commented Sep 21, 2022

This was my take of the issue:

From my POV I see it more using a PostgreSQL perspective. Also, this was allowed even before the computed relationships were implemented. I just tested GET /premieres?select=*,computed_films_m2o on PostgREST v9.0.1 and got the same result, returning a different number of rows than the main query (it may work the same way for older versions), so changing this behavior could be considered a breaking change. Still, it needs consensus, I think.

@laurenceisla laurenceisla added question idea Needs of discussion to become an enhancement, not ready for implementation labels Sep 21, 2022
@steve-chavez
Copy link
Member

So since computed relationships are computed columns they can be used on select and thus filter the same way as an !inner.

I think we should keep the convention of select not filtering the rows. !inner broke this but it was a mistake that we can correct with the exists filter(#2340)

or simply deny a set-returning function taken as a computed column for less headache.

I would prefer that option. This would require considerable work though because we don't have computed columns in the schema cache.

@wolfgangwalther
Copy link
Member

or simply deny a set-returning function taken as a computed column for less headache.

I would prefer that option. This would require considerable work though because we don't have computed columns in the schema cache.

I think we should make sure to make the computed relationships features able to call all kinds of computed columns (i.e. scalar types with empty parens select=*,computed() and inline support for all types) and then deprecate the "implicit" way of calling computed columns without brackets. A next step could be to check all column names against the real columns of the queried relation and to reject calling computed columns without parens - i.e. disabling the deprecated feature.

This would give us a way out of the "select affects the number of rows returned" mess we have right now.


Summarizing this and the other issue:

We currently have two problems:

  • In some situations we call set-returning functions in the SELECT part, which modifies the number of rows through select. This is the case with the old computed column syntax.
  • In other cases we call non-set-returning functions in the FROM part, which makes the function not inlineable. This is the case with the new syntax.

We can fix both by making the new syntax cover all cases and then remove the old syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Needs of discussion to become an enhancement, not ready for implementation
Development

No branches or pull requests

3 participants