
feat(stream): change approx_percentile to accept an array of quantiles rather than a single one #18069

Closed: wants to merge 6 commits.


kwannoel (Contributor)

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

We can share state across approx_percentile calls to save space. To achieve this, we change approx_percentile to approx_percentile(quantiles double[], relative_error double) within group (order by COLUMN), so that a single state serves all requested quantiles.

In a future release, we can support the single-quantile form approx_percentile(quantile double, relative_error double) again by having the optimizer rewrite it into the array form plus a project, which makes it more user-friendly.
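To make the state-sharing idea concrete, here is a minimal sketch in Rust (the language RisingWave is written in) of one exponential-bucket state answering several quantiles at once. It assumes a DDSketch-style bucket mapping with gamma = (1 + relative_error) / (1 - relative_error) and handles positive inputs only; `ApproxPercentileState`, `insert`, and `quantiles` are hypothetical names for illustration, not RisingWave's actual implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical approx-percentile state shared by several quantiles.
/// Assumes a DDSketch-style mapping: bucket i holds values in
/// (gamma^(i-1), gamma^i], with gamma = (1 + e) / (1 - e). Positive values only.
struct ApproxPercentileState {
    gamma: f64,
    buckets: BTreeMap<i32, u64>, // bucket index -> number of values in it
    count: u64,
}

impl ApproxPercentileState {
    fn new(relative_error: f64) -> Self {
        let gamma = (1.0 + relative_error) / (1.0 - relative_error);
        Self { gamma, buckets: BTreeMap::new(), count: 0 }
    }

    fn insert(&mut self, v: f64) {
        assert!(v > 0.0, "this sketch only handles positive values");
        let index = (v.ln() / self.gamma.ln()).ceil() as i32;
        *self.buckets.entry(index).or_insert(0) += 1;
        self.count += 1;
    }

    /// Answers every requested quantile from the same buckets.
    /// Each quantile is a linear walk over the sorted bucket map.
    fn quantiles(&self, qs: &[f64]) -> Vec<Option<f64>> {
        qs.iter()
            .map(|&q| {
                if self.count == 0 {
                    return None;
                }
                // 0-based rank of the target value among all inserted values.
                let rank = (q * (self.count - 1) as f64).floor() as u64;
                let mut seen = 0;
                for (&index, &c) in &self.buckets {
                    seen += c;
                    if seen > rank {
                        // 2 * gamma^i / (gamma + 1) is within relative_error
                        // of every value that falls into bucket i.
                        return Some(2.0 * self.gamma.powi(index) / (self.gamma + 1.0));
                    }
                }
                None
            })
            .collect()
    }
}

fn main() {
    let mut state = ApproxPercentileState::new(0.01);
    for v in 1..=100 {
        state.insert(v as f64);
    }
    // One state, three quantiles: no per-quantile copy of the buckets.
    println!("{:?}", state.quantiles(&[0.5, 0.9, 0.99]));
}
```

Note that each quantile walks the bucket map linearly, which is the kind of iteration the later discussion identifies as expensive once a state holds around 100K buckets.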

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

kwannoel (Contributor, Author)

Probably won't merge.

kwannoel (Contributor, Author) commented Aug 16, 2024

My thoughts from internal discussions:

I’ve thought about it further, and I don’t think we need to reuse state: the bottleneck is typically not on the storage side but in the cache implementation. When a user has, for instance, 100K buckets, iterating over them becomes expensive. So the real fix is to optimize the cache.

It doesn’t seem likely that users will complain about the lack of state reuse. We have other aggregates with large, unshared state as well, e.g. min/max and approx count.

Meanwhile, state sharing makes the approx_percentile interface more complicated for users: they have to declare an array of quantiles rather than a single quantile, and do the projection manually. We could let the optimizer rewrite separate approx_percentile calls into a project over a single array-based approx_percentile aggregation, but that adds complexity to the frontend (FE).
Down the road, if space really becomes a concern for users, we can introduce approx_percentile_array_agg at that point so they can opt into state reuse.

To summarize, there are two drawbacks to making approx_percentile support state sharing:

  • Either we make the user interface complicated (approx_percentile takes an array of percentiles)
  • Or we make the optimizer complicated (it has to find approx_percentile calls that can share state, merge them, and rewrite the plan with a project)

The only benefit is saving the space occupied by exponential buckets, which in most cases is not much. Approx percentile computation is not bottlenecked by the lack of state sharing.
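For a rough sense of how much space sharing could save, the sketch below computes an upper bound on the buckets needed to cover a value range. It again assumes a DDSketch-style mapping, index(v) = ceil(ln v / ln gamma) with gamma = (1 + relative_error) / (1 - relative_error); `bucket_count` is a hypothetical helper, not RisingWave code. Sharing one state across k quantiles on the same column saves roughly (k - 1) times this many bucket entries.

```rust
/// Hypothetical helper: upper bound on the number of exponential buckets
/// needed to hold every positive value in [min, max], assuming the
/// DDSketch-style mapping index(v) = ceil(ln(v) / ln(gamma)),
/// gamma = (1 + e) / (1 - e).
fn bucket_count(min: f64, max: f64, relative_error: f64) -> u64 {
    assert!(0.0 < min && min <= max, "values must be positive");
    assert!(0.0 < relative_error && relative_error < 1.0);
    let gamma = (1.0 + relative_error) / (1.0 - relative_error);
    let index = |v: f64| (v.ln() / gamma.ln()).ceil() as i64;
    // Upper bound: every index between the lowest and highest may be occupied.
    (index(max) - index(min) + 1) as u64
}

fn main() {
    for &e in &[0.01, 0.001] {
        let n = bucket_count(1.0, 1e6, e);
        // One shared state holds n buckets; three unshared states hold 3 * n.
        println!("relative_error={e}: {n} buckets shared, {} for 3 unshared states", 3 * n);
    }
}
```

Under these assumptions, covering values from 1 to 1,000,000 at 1% relative error takes 692 buckets per state, so the saving from sharing grows with the number of quantiles and with tighter error bounds.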

This trade-off doesn’t seem worth it to me.

Contributor

This PR has been open for 60 days with no activity.

If it's blocked by code review, feel free to ping a reviewer or ask someone else to review it.

If you think it is still relevant today, and have time to work on it in the near future, you can comment to update the status, or just manually remove the no-pr-activity label.

You can also confidently close this PR to keep our backlog clean. (If no further action taken, the PR will be automatically closed after 7 days. Sorry! 🙏)
Don't worry if you think the PR is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

Contributor

Close this PR as there's no further actions taken after it is marked as stale for 7 days. Sorry! 🙏

You can reopen it when you have time to continue working on it.

github-actions bot closed this Nov 24, 2024