-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: new partition grammar - parser part #3347
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3347 +/- ##
==========================================
- Coverage 85.26% 84.46% -0.80%
==========================================
Files 881 893 +12
Lines 143843 147284 +3441
==========================================
+ Hits 122641 124406 +1765
- Misses 21202 22878 +1676 |
Signed-off-by: Ruihang Xia <[email protected]>
Looks cool. I have a question: assuming a user wants to partition data by date, for example, one partition per day, is this syntax currently supported? |
This is not supported in this place (or one can manually enumerate the partition, but this isn't what we want). Maybe we can implement some special rule for PARTITION ON COLUMNS (..., ts) (
...,
ts WITHIN '1d'
) But is it a good practice to include |
It seems that this mode is not supported now, and it will make the number of regions dynamic. This also has a certain impact on the structure of routing info. There may also be some inactive regions, which means that no new data is being written. |
That would be great!
Yes, it's not a good practice to partition data by |
BTW, Do we need to add some alias for each |
CREATE TABLE dist_table(
ts TIMESTAMP DEFAULT current_timestamp(),
n INT,
row_id INT,
PRIMARY KEY(n),
TIME INDEX (ts)
)
PARTITION BY RANGE COLUMNS (n) (
PARTITION r0 VALUES LESS THAN (5),
PARTITION r1 VALUES LESS THAN (9),
PARTITION r2 VALUES LESS THAN (MAXVALUE),
)
engine=mito; versus the new: CREATE TABLE dist_table(
ts TIMESTAMP DEFAULT current_timestamp(),
n INT,
row_id INT,
PRIMARY KEY(n),
TIME INDEX (ts)
)
PARTITION ON (n) (
n < 5,
n < 10,
n >= 10,
)
engine=mito; It's of course good to have this simplification, and I'm ok with this new grammar. However, I'm afraid if the partition rules grow, or the partition methods like list or hash being supported, in the end the new grammar would have became just as obscure as the old.
|
To remind, our database has the same capability corresponding to Data would be organized to appropriate time windows during compaction, automatically. What can benefit more from encouraging users to set a time span?
Here uses the same parser with
That's a good point. There should be a grammar to define repeated rules (like the time partition). I added a task to the tracker |
The old grammar comes from mysql, it's complex but complete. If the new grammar looks just like the old, I'm afraid we are wasting time reinventing it. So I'm neutral to this change. I say let's design a whole new syntax for the unique dynamic partitioning feat for GreptimeDB! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep pushing!
The integration test always fails @waynexia |
eee8f2b
to
b20921d
Compare
b20921d
to
100f19b
Compare
Signed-off-by: Ruihang Xia <[email protected]>
100f19b
to
2661dda
Compare
I hereby agree to the terms of the GreptimeDB CLA
What's changed and what's your intention?
This is the first patch to the new partition rule. It only changes the parser and grammar. Other logics are left unchanged or blank. I wanna to merge this first because this part may trigger a lot of discussion.
A brief take out of the new grammar. It contains two parts, one is a list of columns, which is used as an allow-list of which columns can be present in the partition rule. And the second is a set of initial rule list. The grammar looks like the following
An example:
Checklist
Refer to a related PR or issue link (optional)