Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: whether to add new query syntaxes (syntax sugars) #10268

Closed
xxchan opened this issue Jun 9, 2023 · 18 comments
Closed

Discussion: whether to add new query syntaxes (syntax sugars) #10268

xxchan opened this issue Jun 9, 2023 · 18 comments
Milestone

Comments

@xxchan
Copy link
Member

xxchan commented Jun 9, 2023

New syntaxes have been proposed several times

IIRC until now we haven’t introduced any new query syntaxes/sugars (only some new statements). To avoid bikeshedding, it's time to decide our general attitude/policy about whether and when to add syntax sugars.

Arguments against new syntaxes

  • We have claimed to be PG-compatible. Users might feel confused if they found we have convenient new syntaxes. We might at least have a doc listing our new syntaxes if we are going to add some.

  • Our users mainly run long-running MVs instead of ad-hoc queries (unlike Clickhouse/Bigquery/Snowflakes/...), so they can write the SQLs in an editor and it's not that inconvenient and painful.
    Besides,

    I think it appears to be some dbt-level stuff, like a macro and users write that and get translated automatically, but still send the original to our Frontend to process. No changes need to be made on our side.

    Originally posted by @lmatz in feat: SELECT * EXCEPT [columnA,columnB,...]  #9092 (comment)

  • A technical concern is that adding new syntaxes may break other things unexpectedly, as SQL is complex and not well-designed.

Arguments for new syntaxes

  • The new syntax is really convenient in some use cases...

  • Just feel that the ultimate goal is letting people use RW smoothly,
    and being compatible with PG(PG's syntax works in RW) is a proxy goal because if DuckDB suddenly becomes the most popular DB and PG becomes the least popular, it makes sense for us to be compatible with DuckDB.
    And providing something extra does not prevent PG users from using their favorite syntax.

    Originally posted by @lmatz in Enable having clause to refer to an alias #7506 (comment)

Other points

  • If we do want to add, what's the criteria to add? (Maybe just case by case arguments)

  • How much is the proxy goal PG-compatibility valid?

    One real problem is that many PG driver libraries don't work with RisingWave now, mostly due to the complexity of system tables. So we don't benefit that much from the PG-ecosystem (at least for now). And IIRC we decided to develop some custom drivers for some libraries.

    But I think a familiar (or popular) syntax of course lowers the user's learning burden.

@github-actions github-actions bot added this to the release-0.20 milestone Jun 9, 2023
@st1page
Copy link
Contributor

st1page commented Jun 9, 2023

I think PG-compatible means that "RisingWave can run the pg SQL without high migration cost" and not the reverse. I do not think there are users who want to copy RisingWave's SQL to a PG or other PG-compatible databases. And also, we have supported for many things PG can not do such as TUMBLE, HOP or CREATE SINK. They do not "break" the PG's syntax.

@TennyZhuang
Copy link
Contributor

TennyZhuang commented Jun 9, 2023

We have claimed to be PG-compatible.

Pg also claimed to be standard-compatible, but they add many new syntaxes.

We might at least have a doc listing our new syntaxes if we are going to add some.

risingwave.dev can do this, but I don't think it's necessary.

Redshift also claimed to be pg-compatible, but they don't list every details that are incompatible, e.g. https://docs.aws.amazon.com/redshift/latest/dg/r_CONNECT_BY_clause.html is only available in Redshift.

Users might feel confused if they found we have convenient new syntaxes.

IMO no user will really port some SQL directly from Pg, most of them port their SQL from big data system. Pg-compatibility just make them easy to write SQLs. So the key goal is ease-to-use.

they can write the SQLs in an editor

They also need to debug their SQLs interactively.


BTW, we've already add our new syntaxes, e.g. SHOW families.

@xxchan
Copy link
Member Author

xxchan commented Jun 9, 2023

I do not think there are users who want to copy RisingWave's SQL to a PG or other PG-compatible databases

I'm imagine there's a cognitive problem: If I see a DB claims to be PG-compatible, and then found a new syntax, I may begin to doubt whether the claim is true. BTW, I think this already happened when users meet problems of client libraries..

And also, we have supported for many things PG can not do such as TUMBLE, HOP or CREATE SINK.

They are necessary new features. I want to discuss mainly syntax sugars here, which seems both reasonable and not that necessary at the same time.

@st1page
Copy link
Contributor

st1page commented Jun 9, 2023

Pg also claimed to be standard-compatible, but they add many new syntaxes.

FYI: https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard

@TennyZhuang
Copy link
Contributor

TennyZhuang commented Jun 9, 2023

They are necessary new features. I want to discuss mainly syntax sugars here, which seems both reasonable and not that necessary at the same time.

At least SHOW TABLES and SHOW CREATE TABLE is not necessary :)

@st1page
Copy link
Contributor

st1page commented Jun 9, 2023

BTW, I think this already happened when users meet problems of client libraries

I think it is just what I say. Users only are confused when they found something is not supported rather than they found some new syntax.

@TennyZhuang
Copy link
Contributor

I'd like to propose more syntax sugars I want to do.

  1. CREATE MVIEW as an alias to CREATE MATERIALIZED VIEW.
  2. COMMENT over a field declaration directly instead of COMMENT ON statement (It's very welcome and almost supported by most databases).
  3. trailing comma everywhere.

@xxchan
Copy link
Member Author

xxchan commented Jun 9, 2023

We might at least have a doc listing our new syntaxes if we are going to add some.

risingwave.dev can do this, but I don't think it's necessary.

Redshift also claimed to be pg-compatible, but they don't list every details that are incompatible, e.g. https://docs.aws.amazon.com/redshift/latest/dg/r_CONNECT_BY_clause.html is only available in Redshift.

I didn't find they official claim to be PG-compatible. They call themselves "Amazon Redshift SQL".

I found such a doc "Amazon Redshift and PostgreSQL" which seems good enough.

@TennyZhuang
Copy link
Contributor

TennyZhuang commented Jun 9, 2023

Another RW feature which is already documented and not supported by pg :(

CREATE TABLE IF NOT EXISTS taxi_trips(
    id VARCHAR,
    distance DOUBLE PRECISION,
    duration DOUBLE PRECISION,
    fare STRUCT<
      initial_charge DOUBLE PRECISION, 
      subsequent_charge DOUBLE PRECISION, 
      surcharge DOUBLE PRECISION, 
      tolls DOUBLE PRECISION>);

struct<> is not a pg type (it's from FlinkSQL), we should use CREATE TYPE in pg way.

@xxchan
Copy link
Member Author

xxchan commented Jun 9, 2023

CREATE MVIEW as an alias to CREATE MATERIALIZED VIEW.

(bikeshedding) BTW, psql can auto-complete MATERIALIZED VIEW by TAB.

@xxchan
Copy link
Member Author

xxchan commented Jun 9, 2023

At least SHOW TABLES and SHOW CREATE TABLE is not necessary :)

struct<> is not a pg type

Yes I know we have new statements and new data types. (we also have int256) I mentioned that at the beginning.

I'm not sure whether it's reasonable, but I do have a feeling that query-level syntax sugars are of different nature.

@TennyZhuang
Copy link
Contributor

https://materialize.com/docs/sql/select/#using-query-hints Options clause in materialize.

@xxchan
Copy link
Member Author

xxchan commented Jun 9, 2023

@TennyZhuang
Copy link
Contributor

DEFAULT INDEX in materialize https://materialize.com/docs/sql/create-index/#syntax

@neverchanje
Copy link
Contributor

neverchanje commented Jun 9, 2023

Thank @xxchan for bringing up this discussion.

My personal workflow is to introduce new features based on:

  1. what other DB products provide
  2. what our real use cases have strongly requested

Typically, Features with the lowest priority are those that aren't urgent and are likely to be changed in the future. For example, I used to suggest that we don't need to introduce uint256 if int256 is enough for crypto use cases.

In my opinion, following Postgres's syntax helps us avoid pitfalls because we trust in the stability of PG's syntaxes. For example, regarding the design of the OVER window, I suggest following PG's syntax instead of inventing something new. However, if necessary, I am open to introducing new syntax to address problems that are uncommon in PG's scenarios.

Therefore, if you want to introduce anything new, please explain the necessities, such as:

  1. At least 2 similar products have supported this syntax.
  2. A real user has given us the feedback that this sytnax would bring significant benefits. Put it simply, we need to know that users like it.

Furthermore, such a feature should not break PG drivers' compatibility. Because we don't plan to develop our own custom drivers.

@lmatz
Copy link
Contributor

lmatz commented Jun 11, 2023

Two previous issues and a meta-discussion issue with 15 replies (now 16 and continue) already say a lot about the nature of the discussion, i.e.:

  1. it just all boils down to personal preference, which is not wrong
  2. who has been bothered by the "insufficient" syntax sugar(which could be wrong due to the wrong way of expressing certain SQL), and who just doesn't feel the pain because he does not run into these cases on a daily basis. This could be asymmetric: the affected part is always small with respect to the entire SQL syntax and is likely to only affect a small percentage of people who needs to deal with that part constantly.

I support

just case by case arguments but (concrete arguments, not taste)

and a simple vote (voters specify the reason or agree to others' arguments) that can be triggered at most every XXX period with new arguments and the vote last for XXX period.

  1. Cases vary a lot in terms of the potential implication they can have. Over-generalization can prevent any conclusion from being made.
  2. Following other DBs or encouraging new inventions are both just strategies, which means they work sometimes and do not work sometimes. Someone/some product is going to make the first move after all.
  3. Avoid whataboutism, which again can prevent any conclusion from being made.
  4. Developers of Risingwave are also real users. They can either argue from their hands-on experience and/or refer to third-party feedback.

The above are all with respect to the topic of this issue add new query syntaxes (syntax sugars).
add is not invalidating any existing syntax. This can be "guaranteed" by adding new test cases. (Proposers of new syntaxes can be responsible for this).
We remark that matter of fact, nobody knows how many incompatibles RW has with regard to PostgreSQL right now. The compatibility can also only be "guaranteed" by adding new test cases progressively too.

A vote with concrete arguments from both sides leads to a decision and a second-time motion allows people to change their minds.
But voting is also just a strategy that tries to have a balance, so feel free to challenge it too.

@fuyufjh
Copy link
Member

fuyufjh commented Jun 12, 2023

For myself, I am usually willing to play the role of the opponent. A common problem for engineers (especially the smart ones) is trying too hard to avoid "repeat by yourself" (DRY). But when you look at the big picture, this syntactic sugar may have reduced the workload for 5% of users, but created an average of 15s of confusion for the remaining 95%. If that's the case, it's just not worth it. What's more, those 5% of users often don't mind "repeat by yourself". :)

Also, super +1 for case by case arguments/concrete arguments. Some cases are common enough to introduce a syntactic sugar. Welcome for any discussion on a particular case.

@xxchan
Copy link
Member Author

xxchan commented Jun 14, 2023

To summarize, the consensus is that

  • No one completely rejects adding syntax sugars. We are open to new syntaxes.
  • PG-compatibility should not be harmed. i.e., new syntax should not conflict with PG syntax.
  • Case by case arguments are required, as cases vary a lot in terms of the potential implications. There are different strategies proposed:
    • Follow other DBs
    • Wait for users' feedbacks
    • Voting

But people have different attitudes (or philosophy?) towards the necessity of syntax sugars (and maybe also usability features like #10151).

@TennyZhuang is the most aggressive. He thinks micro inventions/improvements are also worth it. We can do it if it can make our users or ourselves even a little happier.

@fuyufjh is the most conservative, He tends to avoid new features unless there are significant benefits. "Any code can be problematic." (So less is better.)

But when you look at the big picture, this syntactic sugar may have reduced the workload for 5% of users, but created an average of 15s of confusion for the remaining 95%. If that's the case, it's just not worth it. What's more, those 5% of users often don't mind "repeat by yourself". :)

This also confirms to @lmatz 's words

This could be asymmetric: the affected part is always small with respect to the entire SQL syntax and is likely to only affect a small percentage of people who needs to deal with that part constantly.

I can imagine further debate like this:

But maybe these small percentage of people are really painful? We can't really know it. So @neverchanje 's idea of waiting for users' feedbacks sound the most practical.

But this prevents micro inventions. And user feedback is a lagging indicator, maybe users don't tell and just walk away if they are unhappy.

But syntax sugars (or some other minor UX improvements) aren't a decisive point that decides whether to use RisingWave or not.

But usability is our target, and we can have a higher standard as aspiring engineers.

I think such debate can hardly come to an end, but practically I guess this is our future:

  • Of course these are not top-priority, so we should focus on other priorities if there are any.
  • When there are user feedbacks, we would begin to consider it.
  • @TennyZhuang got tired of debating, so we don't (or seldom) do micro inventions ahead of time.

I will close this discussion now, but feel free to add some more comments or correct me if I'm wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants