Investigate what is the actual bottleneck in hash agg processing for dirty groups #18748

Closed
kwannoel opened this issue Sep 28, 2024 · 2 comments

Comments

@kwannoel
Contributor

kwannoel commented Sep 28, 2024

It doesn't seem to be a heap or CPU bottleneck. So what is the actual bottleneck? Is it IO cost due to lookups? If so, we need a metric for it.

Or is it skew? In some scenarios the workload peaks at 1600% CPU, even though we have 32 cores.
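
As a rough sanity check on the skew hypothesis, the arithmetic can be sketched as below. This assumes the 1600% figure follows the usual convention of 100% per fully busy core. The function names and any per-actor busy fractions passed in are hypothetical and for illustration only; they are not measurements from this issue.

```python
from statistics import mean


def cores_in_use(cpu_percent: float) -> float:
    """Convert a CPU percentage (assumed: 100% == one busy core) to cores."""
    return cpu_percent / 100.0


def utilization(cpu_percent: float, total_cores: int) -> float:
    """Fraction of the machine's cores that are busy on average."""
    return cores_in_use(cpu_percent) / total_cores


def skew_ratio(busy_fractions: list[float]) -> float:
    """Busiest worker's load divided by the mean load.

    1.0 means perfectly balanced; larger values mean a few workers
    carry most of the load. The inputs are hypothetical per-actor
    busy fractions in [0, 1].
    """
    avg = mean(busy_fractions)
    return max(busy_fractions) / avg if avg > 0 else 1.0
```

Under that convention, 1600% on 32 cores is 16 busy cores, or 50% average utilization. That is consistent with skew (a subset of actors saturated while the rest sit idle), but it is equally consistent with actors waiting on IO, so it does not settle the question by itself.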

Needs further investigation.

@github-actions github-actions bot added this to the release-2.1 milestone Sep 28, 2024
@kwannoel kwannoel self-assigned this Sep 28, 2024
@kwannoel
Contributor Author

kwannoel commented Oct 1, 2024

Some workloads to test:

  1. What happens when a large number of existing agg groups get updated?
  2. What happens when a large number of new agg groups are created?
  3. Does the behavior change with different cache configurations?
  4. Test the first_value agg.
  5. Make sure to use the MinIO rate limit configuration to simulate the latency of fetching from AWS S3.
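
The first two workloads differ mainly in how often an event hits an existing group versus creating a new one. A minimal sketch of a synthetic event generator with that knob is below. The function name, parameters, and the (group_key, value) event shape are all assumptions made for illustration; this is not the project's actual benchmark tooling.

```python
import random
from typing import Iterator, Tuple


def gen_events(
    n_events: int,
    existing_groups: int,
    new_group_ratio: float,
    seed: int = 0,
) -> Iterator[Tuple[int, int]]:
    """Yield (group_key, value) pairs for a synthetic agg workload.

    Keys in [0, existing_groups) stand for groups that already have
    state. With probability new_group_ratio an event instead gets a
    fresh, never-seen key. All names here are hypothetical.
    """
    if not 0.0 <= new_group_ratio <= 1.0:
        raise ValueError("new_group_ratio must be in [0, 1]")
    if existing_groups <= 0 and new_group_ratio < 1.0:
        raise ValueError("need existing groups unless every event is new")

    rng = random.Random(seed)  # seeded so runs are repeatable
    next_new_key = max(existing_groups, 0)
    for _ in range(n_events):
        if rng.random() < new_group_ratio:
            key = next_new_key  # workload 2: create a new group
            next_new_key += 1
        else:
            key = rng.randrange(existing_groups)  # workload 1: update
        yield key, rng.randint(1, 100)
```

Setting new_group_ratio to 0.0 gives the pure update case and 1.0 gives the pure new-group case; values in between mix the two.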

Measurements:

  1. CPU usage.
  2. Heap usage.
  3. Cache misses.
  4. Actor idle time.
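
To compare runs, the last two measurements are easier to read as ratios than as raw counters. A minimal sketch of that derivation is below, assuming counters sampled over a fixed window; the field names are hypothetical and do not correspond to any existing metric names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counters over one sampling window (hypothetical field names)."""
    cache_lookups: int
    cache_misses: int
    actor_busy_secs: float
    wall_secs: float


def miss_ratio(s: Sample) -> float:
    """Fraction of cache lookups that missed."""
    return s.cache_misses / s.cache_lookups if s.cache_lookups else 0.0


def idle_fraction(s: Sample) -> float:
    """Fraction of wall-clock time the actor was not busy."""
    if s.wall_secs <= 0:
        return 0.0
    return 1.0 - s.actor_busy_secs / s.wall_secs
```

A high miss ratio together with a high idle fraction would point toward waiting on lookups rather than CPU or heap, which is the IO hypothesis raised above.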

@kwannoel
Contributor Author

kwannoel commented Dec 4, 2024

RC optimized group top n. This fixed the issue.

@kwannoel kwannoel closed this as completed Dec 4, 2024