-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: optimize count(*)
#3845
perf: optimize count(*)
#3845
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3845 +/- ##
==========================================
- Coverage 85.60% 85.27% -0.33%
==========================================
Files 954 955 +1
Lines 163325 163426 +101
==========================================
- Hits 139808 139369 -439
- Misses 23517 24057 +540 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This patch has some corner cases that don't handle properly, for example: Create two tables:
Insert some rows: insert into "HelloWorld" values ("a", 1) ,("b", 2);
insert into test values ("c", 1) ;
mysql> select count(*) from (select count(*) from test where a = 'a');
ERROR 1815 (HY000): DataFusion error: No field named test.b. Valid fields are "COUNT(*)". The above examples work before this patch. @waynexia @fengjiachun @evenyag |
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
TL;DR: 5x
Before
After
This patch adds a new optimizer rule that converts
count(*)
tocount(<TIME INDEX>)
. This optimization is based on the fact that our underlying storage engine scans faster on time index column than primary key column. Reading time index column does not need decoding phase like in primary key column. This rule is extended from the one from datafusion and overrides it.This patch also changes some logic in range plan, making it able to handle aggr expr with alias.
Checklist