Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Projection pushdown regression in 0.5.0-nightly #2730

Closed
evenyag opened this issue Nov 10, 2023 · 5 comments
Closed

Projection pushdown regression in 0.5.0-nightly #2730

evenyag opened this issue Nov 10, 2023 · 5 comments
Labels
C-bug Category Bugs

Comments

@evenyag
Copy link
Contributor

evenyag commented Nov 10, 2023

What type of bug is this?

Performance issue

What subsystems are affected?

Standalone mode, Datanode

What happened?

The following SQL has performance regression in 0.5.0-nightly

SELECT mean(usage_user), date_trunc('hour', ts) from cpu where ts >= '2023-06-12T23:35:31Z' and ts < '2023-06-13T11:35:31Z' group by date_trunc('hour', ts),hostname;

What operating system did you use?

Unrelated

Relevant log output and stack trace

2023-11-10T09:10:46.524076Z DEBUG mito2::read::scan_region: Seq scan region 4398046511104(1024, 0), request: ScanRequest { sequence: None, projection: Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]), filters: [Expr { df_expr: BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "ts" }), op: GtEq, right: Literal(TimestampMillisecond(1686612931000, None)) }) }, Expr { df_expr: BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "ts" }), op: Lt, right: Literal(TimestampMillisecond(1686656131000, None)) }) }], output_ordering: None, limit: None }, memtables: 0, ssts_to_read: 1, total_ssts: 5

How can we reproduce the bug?

Creates this table

CREATE TABLE IF NOT EXISTS `cpu` (
  `hostname` STRING NULL,
  `region` STRING NULL,
  `datacenter` STRING NULL,
  `rack` STRING NULL,
  `os` STRING NULL,
  `arch` STRING NULL,
  `team` STRING NULL,
  `service` STRING NULL,
  `service_version` STRING NULL,
  `service_environment` STRING NULL,
  `usage_user` BIGINT NULL,
  `usage_system` BIGINT NULL,
  `usage_idle` BIGINT NULL,
  `usage_nice` BIGINT NULL,
  `usage_iowait` BIGINT NULL,
  `usage_irq` BIGINT NULL,
  `usage_softirq` BIGINT NULL,
  `usage_steal` BIGINT NULL,
  `usage_guest` BIGINT NULL,
  `usage_guest_nice` BIGINT NULL,
  `ts` TIMESTAMP(3) NOT NULL,
  TIME INDEX (`ts`),
  PRIMARY KEY (`hostname`, `region`, `datacenter`, `rack`, `os`, `arch`, `team`, `service`, `service_version`, `service_environment`)
)

Inserts some data and run

SELECT mean(usage_user), date_trunc('hour', ts) from cpu where ts >= '2023-06-12T23:35:31Z' and ts < '2023-06-13T11:35:31Z' group by date_trunc('hour', ts),hostname;

You can see all columns are selected in 0.5.0-nightly. But in 0.4.2, we only select necessary columns.

@waynexia
Copy link
Member

Upstream fix: apache/datafusion#8119

@killme2008
Copy link
Contributor

Looks like apache/datafusion#8119 already merged, can we upgrade the datafusion to fix this issue? @waynexia

@killme2008
Copy link
Contributor

Upstream fix: apache/arrow-datafusion#8119

Please don't forget to upgrade the datafusion.

@tisonkun
Copy link
Collaborator

@waynexia did we upgrade datafusion? Or it'll be fixed by #3554.

@waynexia
Copy link
Member

This is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category Bugs
Projects
None yet
Development

No branches or pull requests

4 participants