feat!: new partition grammar - parser part #3347

waynexia · 2024-02-21T09:18:26Z

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

This is the first patch for the new partition rule. It only changes the parser and grammar. Other logic is left unchanged or blank. I want to merge this first because this part may trigger a lot of discussion.

A brief overview of the new grammar: it contains two parts. The first is a list of columns, which serves as an allow-list of the columns that may appear in the partition rules. The second is the initial list of rules. The grammar looks like the following:

CREATE TABLE (...)
PARTITION ON COLUMNS (A, B, C) (
    <RULE LIST>
)

An example:

CREATE TABLE (
    A INT,
    B INT,
    C INT,
    TS TIMESTAMP TIME INDEX,
    PRIMARY KEY (A, B, C)
)
PARTITION ON COLUMNS (A, B, C) (
  A < 10,
  A > 10 AND A < 20,
  A > 20 AND B < 100,
  A > 20 AND B >= 100
);
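
To make the semantics of the rule list concrete, here is a minimal Rust sketch that evaluates the four example rules against a row. It is not GreptimeDB's implementation (this PR only touches the parser). The `Row` struct and the `matching_rule` function are hypothetical names, and the sketch assumes that a row belongs to the first rule whose expression is true.

```rust
/// Hypothetical row holding only the partition columns used by the example rules.
#[derive(Debug, Clone, Copy)]
struct Row {
    a: i64,
    b: i64,
}

/// Returns the index of the first example rule that `row` satisfies,
/// or `None` if no rule matches. Assumption: first-match semantics.
fn matching_rule(row: Row) -> Option<usize> {
    // The four rules from the example, written as plain Rust predicates.
    let rules: [fn(Row) -> bool; 4] = [
        |r: Row| r.a < 10,
        |r: Row| r.a > 10 && r.a < 20,
        |r: Row| r.a > 20 && r.b < 100,
        |r: Row| r.a > 20 && r.b >= 100,
    ];
    rules.iter().position(|rule| rule(row))
}
```

Under this reading, rows with `A = 10` or `A = 20` satisfy none of the example rules, so a real rule list would presumably use `>=` on one side of each boundary. How uncovered values are handled is outside the scope of this parser-only patch.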

Checklist

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

Signed-off-by: Ruihang Xia <[email protected]>

codecov · 2024-02-21T10:10:12Z

Codecov Report

Attention: Patch coverage is 97.10983%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 84.46%. Comparing base (450dfe3) to head (2661dda).
Report is 43 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3347      +/-   ##
==========================================
- Coverage   85.26%   84.46%   -0.80%     
==========================================
  Files         881      893      +12     
  Lines      143843   147284    +3441     
==========================================
+ Hits       122641   124406    +1765     
- Misses      21202    22878    +1676


killme2008 · 2024-02-21T16:36:00Z

Looks cool.

I have a question: assuming a user wants to partition data by date, for example, one partition per day, is this syntax currently supported?

waynexia · 2024-02-22T03:18:07Z

I have a question: assuming a user wants to partition data by date, for example, one partition per day, is this syntax currently supported?

This is not supported here (one could enumerate the partitions manually, but that isn't what we want).

Maybe we can implement some special rule for TIME INDEX in the future? Like

PARTITION ON COLUMNS (..., ts) (
  ...,
  ts WITHIN '1d'
)

But is it good practice to include TIME INDEX in the partition columns?

fengjiachun · 2024-02-22T03:24:53Z

Looks cool.

I have a question: assuming a user wants to partition data by date, for example, one partition per day, is this syntax currently supported?

It seems that this mode is not supported for now, and supporting it would make the number of regions dynamic. That would also affect the structure of the routing info. There may also be inactive regions, that is, regions that no longer receive new writes.
I am a little concerned about this, and somewhat doubtful that this mode is really necessary, given that our DB naturally organizes data by timestamp.
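
As a rough illustration of why one partition per day makes the region count dynamic, the sketch below buckets millisecond timestamps into fixed-size windows. Nothing here exists in GreptimeDB: the `WITHIN '1d'` rule is only a proposal in this thread, and `window_index` and `distinct_windows` are hypothetical helpers.

```rust
use std::collections::BTreeSet;

/// Length of one day in milliseconds.
const DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Maps a timestamp (milliseconds since the Unix epoch) to the index of
/// the fixed-size window that contains it. `div_euclid` keeps pre-epoch
/// timestamps in the correct window instead of rounding toward zero.
/// Assumption: `window_ms` is positive.
fn window_index(ts_ms: i64, window_ms: i64) -> i64 {
    ts_ms.div_euclid(window_ms)
}

/// Counts how many distinct windows (and therefore, under the proposal,
/// how many partitions) a set of timestamps would touch.
fn distinct_windows(timestamps: &[i64], window_ms: i64) -> usize {
    timestamps
        .iter()
        .map(|&ts| window_index(ts, window_ms))
        .collect::<BTreeSet<_>>()
        .len()
}
```

Every new day of data adds a window, and older windows stop receiving writes, which matches the growing region count and the inactive regions described above.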

killme2008 · 2024-02-22T03:25:03Z

Maybe we can implement some special rule for TIME INDEX in the future? Like
PARTITION ON COLUMNS (..., ts) (
  ...,
  ts WITHIN '1d'
)

That would be great!

But is it good practice to include TIME INDEX in the partition columns?

Yes, in general it's not good practice to partition data by TIME INDEX. But in some use cases there is not much data, and users want to organize and manage it by date.

WenyXu · 2024-02-22T04:37:36Z

BTW, do we need to add aliases for <, >, and >=, such as lt or less than for <? (Maybe in another PR.)

MichaelScofield · 2024-02-22T05:32:17Z

Partitioning by date like that requires creating regions on demand, which is not trivial work. However, once we have dynamic partitioning, it should be easy to implement this static partitioning. I suggest we consider starting from there.
Looking at the old partition grammar, I find the new grammar a little less interesting, more like "syntactic sugar". For example, the old:

CREATE TABLE dist_table(
    ts TIMESTAMP DEFAULT current_timestamp(),
    n INT,
    row_id INT,
    PRIMARY KEY(n),
    TIME INDEX (ts)
)
PARTITION BY RANGE COLUMNS (n) (
    PARTITION r0 VALUES LESS THAN (5),
    PARTITION r1 VALUES LESS THAN (9),
    PARTITION r2 VALUES LESS THAN (MAXVALUE),
)
engine=mito;

versus the new:

CREATE TABLE dist_table(
    ts TIMESTAMP DEFAULT current_timestamp(),
    n INT,
    row_id INT,
    PRIMARY KEY(n),
    TIME INDEX (ts)
)
PARTITION ON COLUMNS (n) (
    n < 5,
    n < 10,
    n >= 10,
)
engine=mito;

It's of course good to have this simplification, and I'm OK with the new grammar. However, I'm afraid that if the partition rules grow, or if partition methods like list or hash are supported, the new grammar will end up just as obscure as the old one.
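
To compare the two forms, the following sketch translates a list of old-style `VALUES LESS THAN` upper bounds into expression rules of the new style. It is an illustration only, not code from this PR. `range_bounds_to_rules` is a hypothetical helper, it handles a single integer column, and it assumes the final `MAXVALUE` partition is implied rather than passed in.

```rust
/// Translates ascending `VALUES LESS THAN` upper bounds for one column
/// into non-overlapping expression rules. A final catch-all rule plays
/// the role of `LESS THAN (MAXVALUE)`.
/// Assumption: `bounds` is strictly ascending. An empty slice yields no rules.
fn range_bounds_to_rules(column: &str, bounds: &[i64]) -> Vec<String> {
    let mut rules = Vec::with_capacity(bounds.len() + 1);
    let mut lower: Option<i64> = None;
    for &upper in bounds {
        let rule = match lower {
            // First partition: only an upper bound.
            None => format!("{column} < {upper}"),
            // Middle partitions: inclusive lower bound, exclusive upper bound.
            Some(lo) => format!("{column} >= {lo} AND {column} < {upper}"),
        };
        rules.push(rule);
        lower = Some(upper);
    }
    // Catch-all partition for everything at or above the last bound.
    if let Some(lo) = lower {
        rules.push(format!("{column} >= {lo}"));
    }
    rules
}
```

For the old example above, `range_bounds_to_rules("n", &[5, 9])` yields `n < 5`, `n >= 5 AND n < 9`, and `n >= 9`, with explicit lower bounds so that each value matches exactly one rule.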

Partitioning by time is natural to humans and common in TSDBs. At least TimescaleDB and InfluxDB both decided to do it from the start. Originally we were concerned about a hotspot issue, but now that the partition grammar is being redesigned, I think it's time to reconsider.

waynexia · 2024-02-22T06:56:40Z

Yes, in general it's not good practice to partition data by TIME INDEX. But in some use cases there is not much data, and users want to organize and manage it by date.

Partitioning by time is natural to humans and common in TSDBs. At least TimescaleDB and InfluxDB both decided to do it from the start. Originally we were concerned about a hotspot issue, but now that the partition grammar is being redesigned, I think it's time to reconsider.

As a reminder, our database already has the capability that hypertables provide in TimescaleDB. Or rather, as a non-PG-based storage system, we simply do not have the issue that makes a "hypertable" necessary.

Data is organized into appropriate time windows automatically during compaction. What additional benefit would we get from encouraging users to set a time span?

BTW, do we need to add aliases for <, >, and >=, such as lt or less than for <? (Maybe in another PR.)

This uses the same parser as the WHERE clause, so aliases such as <> for != are already available. lt and less than could be added if we want more than WHERE offers.

However, I'm afraid that if the partition rules grow, or if partition methods like list or hash are supported, the new grammar will end up just as obscure as the old one.

That's a good point. There should be a grammar for defining repeated rules (like the time partition). I added a task to the tracker.

MichaelScofield · 2024-02-22T07:29:10Z

The old grammar comes from MySQL; it's complex but complete. If the new grammar ends up looking just like the old one, I'm afraid we are wasting time reinventing it. So I'm neutral on this change. I say let's design a whole new syntax for GreptimeDB's unique dynamic partitioning feature!

killme2008

Keep pushing!

killme2008 · 2024-02-26T15:21:46Z

The integration test always fails @waynexia
https://github.com/GreptimeTeam/greptimedb/actions/runs/8050032205/job/21984750684


waynexia added 4 commits February 21, 2024 16:04

parser part

4c45228

Signed-off-by: Ruihang Xia <[email protected]>

fix test in sql

9130adf

Signed-off-by: Ruihang Xia <[email protected]>

comment out and ignore some logic

9815ebe

Signed-off-by: Ruihang Xia <[email protected]>

update sqlness cases

61ef729

Signed-off-by: Ruihang Xia <[email protected]>

github-actions bot added the Invalid PR Title and docs-required (This change requires docs update.) labels Feb 21, 2024

sunng87 mentioned this pull request Feb 21, 2024

Update docs for New partition grammar GreptimeTeam/docs#811

Closed

waynexia changed the title ~~New partition grammar~~ feat: new partition grammar - parser part Feb 21, 2024

github-actions bot removed the Invalid PR Title label Feb 21, 2024

waynexia changed the title ~~feat: new partition grammar - parser part~~ feat!: new partition grammar - parser part Feb 21, 2024

github-actions bot added the breaking-change (This pull request contains breaking changes.) label Feb 21, 2024

waynexia added 2 commits February 21, 2024 17:45

update region migration test

70ae6c5

Signed-off-by: Ruihang Xia <[email protected]>

temporary disable region migration test

5dafc6d

Signed-off-by: Ruihang Xia <[email protected]>

allow dead code

44ebe81

Signed-off-by: Ruihang Xia <[email protected]>

waynexia mentioned this pull request Feb 21, 2024

Tracking issue for new region partition rule #3351

Closed

5 tasks

fengjiachun requested review from MichaelScofield and WenyXu February 22, 2024 03:00

WenyXu approved these changes Feb 26, 2024


killme2008 approved these changes Feb 26, 2024


killme2008 added this pull request to the merge queue Feb 26, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2024

waynexia added this pull request to the merge queue Feb 26, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2024

killme2008 added this pull request to the merge queue Feb 26, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2024

waynexia added this pull request to the merge queue Feb 27, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 27, 2024

waynexia force-pushed the new-partition-grammar branch from eee8f2b to b20921d Compare February 27, 2024 02:03

waynexia added this pull request to the merge queue Feb 27, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 27, 2024

waynexia force-pushed the new-partition-grammar branch from b20921d to 100f19b Compare February 27, 2024 02:38

waynexia added this pull request to the merge queue Feb 27, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 27, 2024

update integration test

2661dda

Signed-off-by: Ruihang Xia <[email protected]>

waynexia force-pushed the new-partition-grammar branch from 100f19b to 2661dda Compare February 27, 2024 06:55

waynexia added this pull request to the merge queue Feb 27, 2024

Merged via the queue into GreptimeTeam:main with commit 3544c93 Feb 27, 2024
16 checks passed

waynexia deleted the new-partition-grammar branch February 27, 2024 07:31

tisonkun mentioned this pull request Mar 12, 2024

fix: avoid pushing invalid addr args GreptimeTeam/gtctl#195

Merged
