Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order parent by child's column (for to-one mapping) #1414

Closed
Iced-Sun opened this issue Nov 16, 2019 · 26 comments · Fixed by #2511
Closed

Order parent by child's column (for to-one mapping) #1414

Iced-Sun opened this issue Nov 16, 2019 · 26 comments · Fixed by #2511
Labels
embedding resource embedding enhancement a feature, ready for implementation

Comments

@Iced-Sun
Copy link

Environment

  • PostgreSQL version: not relevant
  • PostgREST version: 6.0.2
  • Operating system: not relevant

Description of issue

CREATE TABLE address AS (
  id uuid PRIMARY KEY,
  name text
);

CREATE TABLE site AS (
  id uuid PRIMARY KEY,
  address_id uuid NOT NULL REFERENCES address(id)
); 

I want something like GET /site?select=*,address(*)&order=address.name. Is that possible?

@steve-chavez
Copy link
Member

For ordering an embedded resource, the syntax is like:

GET /site?select=*,address(*)&address.order=name

Does that do what you want?

@Iced-Sun
Copy link
Author

I am sorry that I didn't make my intention clear.

By ordering parent, I mean "ordering parent", not ordering the embedded resource.

As one site can refer one and only one address, sorting the addresses belong to one site is meaningless.

Explicitly, I want

SELECT site.* FROM site JOIN address ON site.address_id = address.id
ORDER BY address.name

rather than

SELECT
  site.*
, SELECT json_agg(address ORDER BY address.name) FROM address WHERE address.id = site.address_id
FROM site;

@steve-chavez
Copy link
Member

I see what you mean. Perhaps this could be arranged for many-to-one embeds(we currently don't detect one-to-one's). Not sure about the query string syntax but this sql could be generated:

-- projects >- clients (many-to-one)

-- enhacing the generated query from:
-- curl localhost:3000/projects?select=*,clients(*)

SELECT
  "test"."projects".*,
  row_to_json("projects_clients".*) AS "clients"
FROM "test"."projects"
LEFT JOIN LATERAL(
  SELECT "test"."clients".*
  FROM "test"."clients"
  WHERE "test"."projects"."client_id" = "test"."clients"."id") AS "projects_clients" ON TRUE
ORDER BY projects_clients.name;

@Iced-Sun
Copy link
Author

IMHO, the query string could be
curl localhost:3000/projects?select=*,clients(*)&order=clients.name
which seems quite consistent with the current syntax.

@christiaanwesterbeek
Copy link

This is something that I need as well. I would like to sort the parent by some column of the 1-to-1 embedded resource. I would use this query syntax, because it the same as the select-syntax and it allows ascending en descending order and is consistent with the fact that asc/desc as first thing after the dot.

curl localhost:3000/projects?select=*,clients(*)&order=clients(name).asc

When I tried the above, Postgrest 6.0.2 returns:

{
    "details": "unexpected '(' expecting letter, digit, \"-\", \"->>\", \"->\", delimiter (.), \",\" or end of input",
    "message": "\"failed to parse order (clients(name).asc)\" (line 1, column 9)"
}

@steve-chavez
Copy link
Member

IMHO, the query string could be
curl localhost:3000/projects?select=,clients()&order=clients.name

I wonder if the syntax should stay the same as always:

GET /projects?select=*,clients(*)&clients.order=name

Because since projects-clients is many-to-one, ordering the embedded clients as we do now is useless since they will always contain one element.

@steve-chavez
Copy link
Member

curl localhost:3000/projects?select=,clients()&order=clients(name).asc

Parentheses would be better(since . is used as a separator for us), but I think it should be like:

GET /projects?select=*,clients(*)&order=clients(name.asc,rank.desc)

Since that would allow ordering on multiple columns of the embedded table.

@Iced-Sun
Copy link
Author

Iced-Sun commented Dec 3, 2021

IMHO, the query string could be
curl localhost:3000/projects?select=,clients()&order=clients.name

I wonder if the syntax should stay the same as always:

GET /projects?select=*,clients(*)&clients.order=name

Because since projects-clients is many-to-one, ordering the embedded clients as we do now is useless since they will always contain one element.

By order=clients.name I mean to order parent (projects) by clients.name; to order the embedded clients, we stay clients.order=name.

Parentheses would be better(since . is used as a separator for us), but I think it should be like:

GET /projects?select=*,clients(*)&order=clients(name.asc,rank.desc)

Since that would allow ordering on multiple columns of the embedded table.

I wonder if parentheses work well with some cases, say, how do we order parent by a mixture of columns from different levels?

select ... from ...
order by projects_clients.name, project.name, projects_clients.rank desc

I think it could be expressed easily as order=clients.name,name,clients.rank.desc, but using parentheses seems to be a challenge, unless the form in #1414 (comment) is taken: order=clients(name),name,clients(rank).desc.

@wolfgangwalther
Copy link
Member

I know this is breaking, but instead of wrapping everything in parentheses, I'd like something like this more:

GET /projects?select=*,clients(*)&order=clients.name,name,clients.rank!desc

@steve-chavez
Copy link
Member

order=clients(name),name,clients(rank).desc

@Iced-Sun I think it should be like:

order=clients(name),name,clients(rank.desc)

That's more consistent and seems similar to the way we do or. We don't have a syntax for doing (<any>).<any> right now.

In general, I've been avoiding using PostgreSQL dot notation(table.column) because it conflicts with how we use dot(.) as a separator.

@steve-chavez
Copy link
Member

Using pg dot notation opens up a door for having a very hard to read(and type) or operation: https://postgrest.org/en/v9.0/api.html#logical-operators.

@Iced-Sun
Copy link
Author

Iced-Sun commented Dec 4, 2021

I see the point.

As far as I am aware, the dot notation has several usages in Postgrest:

  1. a separator between identifiers/expressions and keyswords:
    • name=like.Chris*
    • not.or=(age.lt.10,age.gt.20)
    • order=rank.desc
  2. a qualification separator:
    • clients.rank=gt.1
    • clients.order=rank.desc.nullsfirst
    • clients.or=(name.like.Chris*,rank.gt.1)

The parentheses have different semantics, either:

  1. resembling the PG syntax (partially):
    • IN list: age=in.(2,3,5) -> age IN (2,3,5)
    • range: range=ov.(1,10) -> range && (1,10)
    • operation order grouping: or=(age.lt.10,age.gt.20) -> (age < 10 OR age > 20)
    • function parameter: my_tsv=fts(french).amusant -> my_tsv @@ to_tsquery('french', 'amusant')
  2. indicating an embedding of resources:
    • select=clients(*)

It seems to me that both order=clients(name),name,clients(rank.desc) and order=clients.name,name,clients.rank.desc have some violations with the current convention:

  • order=clients(name),name,clients(rank.desc) introduces a new semantic of parentheses, it is neither a pg syntax, nor a resource embedding
  • order=clients.name,name,clients.rank.desc : although . has a usage for qualification, they appear only in the leftside of = for now

That said, I think both notations are acceptable. After all, they are not that different from an end-user's view.

@Iced-Sun
Copy link
Author

Iced-Sun commented Dec 5, 2021

Using pg dot notation opens up a door for having a very hard to read(and type) or operation: https://postgrest.org/en/v9.0/api.html#logical-operators.

I didn't think of that.

Do you mean that if order=clients.name,name,clients.rank were used, then we should also consider to support or=(clients.name.like.Chris*,and(name.eq.TopProject,clients.rank.gt.1)?

It is indeed hard to read, and or=(clients(name).like.Chris*,and(name.eq.TopProject,clients(rank).gt.1) seems slightly better.


Regarding the implementation for ordering/filtering parents by child columns, considering the following request

GET /projects?select=name,clients(name)&and=(clients(validity).ov.[2021-12-01,2021-12-05),name.eq.TopProject,clients(rank).gt.1)&order=clients(rank2).desc,name

obviously we need the child columns of clients.validity, clients.rank and clients.rank2 for ordering/filtering, which are not in the target list.

To support the above functionality, a query may be generated as

SELECT "projects"."name",
  "projects_clients"."clients"
FROM "projects"
LEFT JOIN LATERAL (
  SELECT
    -- the result column
    json_build_object('name', "clients"."name") as "clients",
    -- the filtering columns
    "clients"."validity", "clients"."rank",
    -- the ordering column
    "clients"."rank2"
  FROM "clients"
  WHERE "clients"."project_id" = "projects"."id"
) AS "projects_clients" ON TRUE
WHERE (
  "projects_clients"."validity" && '[2021-12-01,2021-12-05)'::unknown
  AND "projects"."name" = 'TopProject'::unknown
  AND "projects_clients"."rank" > '1'::unknown
) ORDER BY "projects_clients"."rank" DESC, "projects"."name";

There is one concern that I am aware of now. Considering the following requests:

  1. projects?select=*,clients(*)&clients.name=like.Chris*&clients.rank=gt.1
  2. projects?select=*,clients(*)&and=(clients(name).like.Chris*,clients(rank).gt.1)
  3. projects?select=*,clients(*)&clients.and=(name.like.Chris*,rank.gt.1)

1 & 3 are filtering in the child level, and returning all parents independ on the child filters; while 2 are filtering in the parent level, and returing only parents that filtered by the child filters. One can add a filter of clients=is.not.null to 1 & 3 to make all the three requests the same.

A bit confusing, right?

@steve-chavez
Copy link
Member

The parentheses have different semantics, either:

That's true, parentheses are not consistent.

range:

Note that this isn't really "our" syntax, as it is pg notation for ranges. It could also be [1,5]. The operand(col=op.operand) is really a free-form value.

IN list:

IIRC, the idea of using parentheses was temporary here, but it stayed like that. Now we're trying to provide an any operator instead and that wouldn't need the parens: #1943 (comment)

function parameter: my_tsv=fts(french).amusant -> my_tsv @@ to_tsquery('french', 'amusant')

I think that was a mistake, we could have done it like fts!french.amusant. Similar to how we specify hints for embedding. On #915 (comment), we were thinking that ! should be used for functions' arguments.

indicating an embedding of resources
operation order grouping

Yes, grouping is what parens should be used for. In a way /projects?select=*,clients(*) is really /projects?select=projects(*,clients(*)) so I think the idea of grouping still applies for embedding.

order=clients(name),name,clients(rank.desc)

So from the above, this would still be somewhat consistent to grouping since a regular /projects?order=id.asc,name.desc really is /projects?order=projects(id.asc,name.desc).


I was also thinking we could do pg dot notation with a different operator, like ->., but that would look worse:

order=clients->.name,name,clients->.rank.desc

Related to the syntax proposed for composite type filtering #1543 (comment)

@wolfgangwalther
Copy link
Member

In a way /projects?select=*,clients(*) is really /projects?select=projects(*,clients(*))

Yes.

since a regular /projects?order=id.asc,name.desc really is /projects?order=projects(id.asc,name.desc).

No. It's more like /projects?projects.order=id.asc,name.desc.


When using a child's column at the parent level for either filtering or ordering, it's not about "grouping", but about "accessing" child members. There are multiple different ways in PostgreSQL to access different kind of nested fields, of which only some are implemented right now in PostgREST:

  • For json/jsonb using -> and ->>: implemented.
  • Subscripting with [...] for arrays, jsonb and hstore: not implemented
  • . for relations: implemented only for computed colummns. Not applicable otherwise, because the query we are constructing does just not have multiple tables in a single from clause, but uses row_to_json and json_agg instead.
  • . for composite types: not implemented.

Taking a step back for a second: Once we agree on a syntax for how to pass the child column in the parent level order param in the query string - what will we have to do with it in the query? Since we're only dealing with aggregated json documents from embedding, we'd have to implement it as a ORDER BY clients->>'id' ASC, clients->>'name' DESC, correct?

Why don't we just use -> and ->> in the order param of the querystring right away? And even more so in general for all kinds of field accessors?

I suggest the following:

  • Use -> and ->> in ?order= like so: /projects?select=*,clients(*)&order=clients->name,name,clients->rank.desc.
  • Use -> and ->> in filtering the parent based on child values like so: /projects?select=*,clients(*)&clients->>name=eq.test+client
  • Wrap all other usages of col->field and col->>field into a to_json(col)->'field' construct. When using those on a regular json column, this will be a no-op, since it's already json. But when using those on composite or array columns, this would essentially solve Composite type filtering  #1543 and Access information inside an array data type #2075 in one go.

Since we're serving JSON documents in the end, we can as well use operators that rely on that. We'd only have one syntax for accessing fields in -> and ->>. This would remove all the ambiguity challenges we currently have with the dot operator in that regard.

@steve-chavez
Copy link
Member

steve-chavez commented Feb 2, 2022

we'd have to implement it as a ORDER BY clients->>'id' ASC, clients->>'name' DESC, correct?
Use -> and ->> in ?order= like so: /projects?select=,clients()&order=clients->name,name,clients->rank.desc

I haven't yet tried the queries, but if it'd be done with the arrows then I fully agree, that keeps things consistent.

Use -> and ->> in filtering the parent based on child values like so: /projects?select=,clients()&clients->>name=eq.test+client

Wait, is this one an alternative to the !inner join? Both for syntax and generated query?

@wolfgangwalther
Copy link
Member

Use -> and ->> in filtering the parent based on child values like so: /projects?select=,clients()&clients->>name=eq.test+client

Wait, is this one an alternative to the !inner join? Both for syntax and generated query?

Yes, you could:

  • use client->>col=not.is.null for a NOT NULL column
  • use client->col=not.is.null for any column

But, in both cases the column needs to be selected in the clients(...) definition. So it's not possible to do an "empty inner join". This is not possible with !inner either - but it might be possible with the exists operator we discussed elsewhere?

I think the !inner / exists approach is also likely to perform better.

@steve-chavez
Copy link
Member

steve-chavez commented Sep 23, 2022

Since we're only dealing with aggregated json documents from embedding, we'd have to implement it as a ORDER BY clients->>'id' ASC, clients->>'name' DESC, correct?

Just to clarify, the above ORDER would not work because the JSON is not available at that level but it can be done as mentioned above.

order by projects_clients.name, project.name, projects_clients.rank desc

use client->>col=not.is.null for a NOT NULL column
use client->col=not.is.null for any column
But, in both cases the column needs to be selected in the clients(...) definition.

An option would be having an empty embed might as suggested on #2340 (comment). Overall the arrow looks better than exists for bringing an embed column to the top-level, underlyingly we can use the same query as !inner.

GET /projects?select=*,clients()&clients->name=eq.nombre&order=clients->name

It would also save the need to reinvent the hint syntax on an exists filter.

@steve-chavez
Copy link
Member

steve-chavez commented Oct 22, 2022

This is more or less working on #2511 but I think there could be trouble with the operator since it's ambiguous with the arrow json operators.

In theory this special arrow operator could be extended to go down multiple levels(if all the embeds are to-one relationships)

/grandchild_entities?select=*,child_entities(*,entities(*))&order=child_entities->entities->id

Looking at the order param, entities could also be a json column of child_entities.

Considering this special arrow operator has multiple uses:

  • the order proposed here
  • replacement for inner(ref)
  • "bringing up" conditionals of a child table to a parent table(ref)

It would be good to have a unambiguous syntax for it, this will also simplify the implementations.

How about:

  • -->?
/grandchild_entities?select=*,child_entities(*,entities(*))&order=child_entities-->entities-->id
  • -.>?
/grandchild_entities?select=*,child_entities(*,entities(*))&order=child_entities-.>entities-.>id

@wolfgangwalther
Copy link
Member

Hm, both --> and -.> are very similar to -> and ->> - but not very similar to our embedding syntax. This could be confusing, I think.

What about a syntax that is closer to embedding?

/grandchild_entities?select=*,child_entities(*,entities(*))&order=child_entities(entities(id))

@steve-chavez
Copy link
Member

steve-chavez commented Oct 24, 2022

Yeah, I think that's the best option. Iced-Sun also proposed the same above for the or/and filters.

For inner it would be like /projects?client(col)=not.is.null.

I'm having trouble coming up with a name for this feature. We already have "embedded filters", what should we call these?

I think the previous "embedded filters" section should have been called "filtering embedded resources" - only filters embedded resources and not the top/parent resource. If I rename it then

  • the order proposed here

Could be called "embedded order"

  • replacement for inner

"Embedded filters"

  • "bringing up" conditionals of a child table to a parent

"Embedded conditionals"

@wolfgangwalther
Copy link
Member

order=clients(name),name,clients(rank).desc

@Iced-Sun I think it should be like:

order=clients(name),name,clients(rank.desc)

I think we should keep the .desc on the outside. Should also make the parsers simpler, because we can use the nested parens parser in different places?

@wolfgangwalther
Copy link
Member

I'm having trouble coming up with a name for this feature. We already have "embedded filters", what should we call these?

I think the previous "embedded filters" section should have been called "filtering embedded resources" - only filters embedded resources and not the top/parent resource. If I rename it then

Instead of 3 names for 3 features, how about only 1 feature - accessing embedded values in a "higher" context? How about calling that something like "lifting", "hoisting", "elevating" or "promoting"? And then give examples how to do that in the 3 different contexts.

@steve-chavez
Copy link
Member

How about calling that something like "lifting", "hoisting", "elevating" or "promoting"? And then give examples how to do that in the 3 different contexts.

I'm thinking of calling these features "related orders"/"related filters"/"related conditions". "related" seems more apt and consistent. Not seeing a better name for now.

@wolfgangwalther
Copy link
Member

In terms of "embedding relations", we now have "embedded filters" and "related filters", "embedded orders" and "related orders" and so on. I actually like it. "embedded filter" implies that the filter itself is embedded ("inside"), too. "related filter" implies that the filter is not only working on the current table's data, but also on other, related tables. Both have a good connection to "embedding relations".

Only that "embedding relations" is currently called "embedding resources" or "resource embedding". Hm.

Anyway I like the "related".

@steve-chavez
Copy link
Member

Only that "embedding relations" is currently called "embedding resources" or "resource embedding". Hm.

I also had some some thoughts about this while thinking of refactoring the ApiRequest type.

Right now we have two type of resources: relations(collections of resources) and functions(which are a REST resource, as discussed on #1086 (comment))

If we allowed embedding a relation with a function(which should be possible in theory), then we could call the feature "resource embedding" but truly we only have "relation embedding". So to rename "resource embedding" to "relation embedding" I think we'd have to clarify our REST design on the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embedding resource embedding enhancement a feature, ready for implementation
Development

Successfully merging a pull request may close this issue.

4 participants