feat: reduce the size of stored streaming metadata to avoid too large request to etcd #11072

yezizp2012 · 2023-07-19T11:38:33Z

Is your feature request related to a problem? Please describe.

Related #7728

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

BugenZhao · 2023-07-20T03:52:52Z

It looks like the large request is mainly caused by duplication of the NodeBody for a fragment with too many parallelisms (actors). I guess this can be resolved by introducing a (de)compression middleware for the protobuf values actually stored.

Can a Protobuf message begin with a gzip magic number?
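A minimal sketch of the middleware idea, not the actual implementation in any PR: compress the encoded bytes only when they exceed a threshold, and on read use the gzip magic number (`0x1f 0x8b`) to tell compressed values from plain ones. The names `encode_value` and `decode_value` and the 64 KiB threshold are assumptions made for illustration. On the magic-number question: a non-empty Protobuf message starts with a field tag whose low three bits are the wire type, and `0x1f` would mean wire type 7, which the wire format does not define, so a valid message should not begin with that byte.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
# Hypothetical threshold for illustration; not taken from the issue.
COMPRESS_THRESHOLD = 64 * 1024


def encode_value(raw: bytes, threshold: int = COMPRESS_THRESHOLD) -> bytes:
    """Gzip the already-encoded protobuf bytes only if they are large."""
    if len(raw) > threshold:
        return gzip.compress(raw)
    return raw


def decode_value(stored: bytes) -> bytes:
    """Decompress if the value starts with the gzip magic number."""
    if stored[:2] == GZIP_MAGIC:
        return gzip.decompress(stored)
    return stored


def wire_type_of_first_byte(first_byte: int) -> int:
    """Low three bits of a protobuf tag byte hold the wire type."""
    return first_byte & 0x07
```

Because small values pass through unchanged, entries written before such a middleware existed would still decode without migration.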

yezizp2012 · 2023-07-20T04:00:08Z

> It looks like the large request is mainly caused by duplication of the NodeBody for a fragment with too many parallelisms (actors). I guess this can be resolved by introducing a (de)compression middleware for the protobuf values actually stored.

Yes, exactly. Previously I was thinking of introducing a new field in TableFragment that stores only one copy of the NodeBody data plus the necessary distribution info, which can be used to generate the real running StreamActor. By simply adding some transform logic in meta, the duplicated part of NodeBody can be eliminated. Also, unrelated to this issue, this will pay off when switching to the SQL backend later on. WDYT?
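The transform described here can be sketched as follows, under the assumption that all actors of one fragment share an identical NodeBody and differ only in per-actor distribution info. The types and field names (`StreamActor`, `CompactFragment`, `vnode_bitmap`) are simplified stand-ins chosen for illustration, not the real protobuf definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StreamActor:
    actor_id: int
    vnode_bitmap: bytes  # hypothetical per-actor distribution info
    node_body: bytes     # encoded plan, assumed identical across actors


@dataclass
class CompactFragment:
    node_body: bytes                 # stored once per fragment
    actors: List[Tuple[int, bytes]]  # (actor_id, vnode_bitmap) pairs


def compact(actors: List[StreamActor]) -> CompactFragment:
    """Keep a single NodeBody copy plus per-actor distribution info."""
    if not actors:
        raise ValueError("a fragment needs at least one actor")
    body = actors[0].node_body
    if any(a.node_body != body for a in actors):
        raise ValueError("actors are assumed to share one NodeBody")
    return CompactFragment(body, [(a.actor_id, a.vnode_bitmap) for a in actors])


def expand(fragment: CompactFragment) -> List[StreamActor]:
    """Regenerate the running actors from the compact form."""
    return [StreamActor(i, bm, fragment.node_body) for i, bm in fragment.actors]
```

With this layout the stored size grows with the parallelism only through the small per-actor entries, rather than through one full NodeBody per actor.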

yezizp2012 · 2023-10-10T05:37:30Z

This issue will no longer exist once the SQL-backend meta store is in place. Keeping it open for the record.

yezizp2012 · 2023-11-23T06:34:48Z

> > It looks like the large request is mainly caused by duplication of the NodeBody for a fragment with too many parallelisms (actors). I guess this can be resolved by introducing a (de)compression middleware for the protobuf values actually stored.
>
> Yes, exactly. Previously I was thinking of introducing a new field in TableFragment that stores only one copy of the NodeBody data plus the necessary distribution info, which can be used to generate the real running StreamActor. By simply adding some transform logic in meta, the duplicated part of NodeBody can be eliminated. Also, unrelated to this issue, this will pay off when switching to the SQL backend later on. WDYT?

@shanicky is working on integrating the transform logic, which was introduced for the SQL backend, into the current TableFragments implementation. The PR is on the way.

shanicky · 2023-12-04T08:15:18Z

I submitted a simple PR #13598 to implement compression for PbTableFragments to reduce redundant stream nodes in RPC processes and etcd storage.

yezizp2012 · 2024-07-05T08:45:05Z

Done by #17315 ?

BugenZhao · 2024-07-05T08:52:33Z

> Done by #17315 ?

It's a quick solution, but I believe it should be enough for the goal in this issue. 😄 Feel free to close it.

yezizp2012 added the type/feature label Jul 19, 2023

github-actions bot added this to the release-1.1 milestone Jul 19, 2023

yezizp2012 self-assigned this Jul 19, 2023

yezizp2012 added the component/meta and priority/high labels Jul 19, 2023

lmatz mentioned this issue Jul 28, 2023

Create ch benchmark q5 MV failed #11293

Closed

yezizp2012 modified the milestones: release-1.1, release-1.2 Aug 3, 2023

yezizp2012 modified the milestones: release-1.2, release-1.3 Sep 11, 2023

yezizp2012 modified the milestones: release-1.3, release-1.4 Oct 10, 2023

yezizp2012 removed this from the release-1.4 milestone Nov 8, 2023

yezizp2012 assigned shanicky and unassigned yezizp2012 Nov 23, 2023

shanicky added this to the release-1.6 milestone Dec 4, 2023

BugenZhao mentioned this issue Dec 4, 2023

feat: Add compression capabilities for StreamNode in PbTableFragments. #13598

Open

3 tasks

yezizp2012 modified the milestones: release-1.6, release-1.7 Jan 9, 2024

yezizp2012 modified the milestones: release-1.7, release-1.8 Mar 6, 2024

shanicky modified the milestones: release-1.8, future-release-1.9 Apr 8, 2024

shanicky modified the milestones: release-1.9, future-release-1.10 May 8, 2024

BugenZhao mentioned this issue Jun 18, 2024

feat(meta): compress the encoded model if it's too large in kv meta store #17315

Merged

4 tasks

yezizp2012 closed this as completed Jul 5, 2024
