feat: reduce the size of stored streaming metadata to avoid too large request to etcd #11072

Closed
yezizp2012 opened this issue Jul 19, 2023 · 7 comments

@yezizp2012
Member

yezizp2012 commented Jul 19, 2023

Is your feature request related to a problem? Please describe.

Related #7728

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-1.1 milestone Jul 19, 2023
@yezizp2012 yezizp2012 self-assigned this Jul 19, 2023
@BugenZhao
Member

BugenZhao commented Jul 20, 2023

It looks like the large request is mainly caused by duplication of the NodeBody for a fragment with too many parallelisms (actors). I guess this can be resolved by introducing a (de)compression middleware for the protobuf values actually stored.


Can a Protobuf message begin with a gzip magic number?
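
A minimal sketch of the (de)compression idea above, assuming the stored values are protobuf-encoded bytes and gzip is the codec. The function names and the `threshold` parameter are hypothetical and are not taken from the RisingWave code base. The read path tells the two encodings apart by the gzip magic number `0x1f 0x8b`. As far as the protobuf wire format goes, a non-empty message starts with a tag byte whose low three bits are the wire type, and `0x1f` would mean wire type 7, which is not defined, so a valid message should not collide with the magic number.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def encode_value(raw: bytes, threshold: int = 1024) -> bytes:
    """Compress a value before it is written to the store.

    Small values, and values that do not shrink, are stored unchanged.
    `threshold` is an illustrative cut-off, not a tuned number.
    """
    if len(raw) < threshold:
        return raw
    packed = gzip.compress(raw)
    return packed if len(packed) < len(raw) else raw


def decode_value(stored: bytes) -> bytes:
    """Return the original bytes, decompressing only gzip-framed values.

    Assumption: uncompressed values are valid protobuf messages, which
    cannot begin with the gzip magic number.
    """
    if stored[:2] == GZIP_MAGIC:
        return gzip.decompress(stored)
    return stored
```

Because the decoder accepts both forms, values written before such a middleware was introduced would still be readable without a migration step.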

@yezizp2012
Member Author

It looks like the large request is mainly caused by duplication of the NodeBody for a fragment with too many parallelisms (actors). I guess this can be resolved by introducing a (de)compression middleware for the protobuf values actually stored.

Yes, exactly. I was previously thinking of introducing a new field in TableFragment that stores only one copy of the NodeBody data plus the necessary distribution info, from which the actual running StreamActor can be generated. By adding some transform logic in meta, the duplicated part of NodeBody can be eliminated. Also, unrelated to this issue, this will pay off when switching to the SQL backend later on. WDYT?
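
The transform described above can be sketched roughly as follows. This is an illustration only: `Actor`, `CompactFragment`, `compact`, and `expand` are hypothetical names, the node body is modelled as opaque bytes, and a per-actor vnode bitmap is assumed to stand in for the "distribution info". The real `TableFragment` and `StreamActor` messages carry more fields.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Actor:
    """Hypothetical stand-in for a running StreamActor."""
    actor_id: int
    vnode_bitmap: bytes  # assumed per-actor distribution info
    node_body: bytes     # identical for every actor of a fragment


@dataclass(frozen=True)
class CompactFragment:
    """Stored form: one shared node body plus per-actor distribution."""
    node_body: bytes
    distribution: Tuple[Tuple[int, bytes], ...]  # (actor_id, vnode_bitmap)


def compact(actors: List[Actor]) -> CompactFragment:
    """Collapse actors that share one node body into the stored form."""
    bodies = {a.node_body for a in actors}
    if len(bodies) != 1:
        raise ValueError("actors must share exactly one node body")
    return CompactFragment(
        node_body=bodies.pop(),
        distribution=tuple((a.actor_id, a.vnode_bitmap) for a in actors),
    )


def expand(fragment: CompactFragment) -> List[Actor]:
    """Regenerate the per-actor view from the stored form."""
    return [
        Actor(actor_id, bitmap, fragment.node_body)
        for actor_id, bitmap in fragment.distribution
    ]
```

With this layout the stored size grows with the size of the distribution info per actor rather than with a full copy of the node body per actor.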

@yezizp2012
Member Author

This issue will no longer exist once the SQL-backend meta store is in place. Keeping it open for the record.

@yezizp2012 yezizp2012 removed this from the release-1.4 milestone Nov 8, 2023
@yezizp2012 yezizp2012 assigned shanicky and unassigned yezizp2012 Nov 23, 2023
@yezizp2012
Member Author

It looks like the large request is mainly caused by duplication of the NodeBody for a fragment with too many parallelisms (actors). I guess this can be resolved by introducing a (de)compression middleware for the protobuf values actually stored.

Yes, exactly. I was previously thinking of introducing a new field in TableFragment that stores only one copy of the NodeBody data plus the necessary distribution info, from which the actual running StreamActor can be generated. By adding some transform logic in meta, the duplicated part of NodeBody can be eliminated. Also, unrelated to this issue, this will pay off when switching to the SQL backend later on. WDYT?

@shanicky is working on integrating the transform logic into the current TableFragments implementation, which was introduced for the SQL backend. The PR is on the way.

@shanicky
Contributor

shanicky commented Dec 4, 2023

I submitted a simple PR #13598 to implement compression for PbTableFragments to reduce redundant stream nodes in RPC processes and etcd storage.

@yezizp2012
Member Author

Done by #17315 ?

@BugenZhao
Member

Done by #17315 ?

It's a quick solution, but I believe it should be enough for the goal in this issue. 😄 Feel free to close it.
