Support separate replicate executor from arrangement backfill #13207

Closed

kwannoel opened this issue Nov 2, 2023 · 7 comments

Comments

@kwannoel
Contributor

kwannoel commented Nov 2, 2023

Background of Replication

Replication is required for arrangement backfill. See https://www.notion.so/risingwave-labs/Arrangement-Backfill-31d4d0044726451c9cc77dc84a0f8bb2#5fc32752d4934aaaab6f739519169a42

  1. We could lose data that was flushed to the shared buffer during a non-checkpoint barrier if it is not replicated.
  2. Consider that backfill does snapshot reads on committed data.
  3. But the upstream's shared buffer also contains some uncommitted data.
  4. That uncommitted data won't be included in backfill's snapshot read.
  5. So backfill MUST process it from the upstream updates.
  6. Consider the case where the snapshot read starts from pos = 3 onwards. This means pos < 3 has already been backfilled.
  7. If the uncommitted data is at pos >= 3, it will not have been processed from updates, since updates are only applied for pos < 3, i.e. updates on already-backfilled data. Backfill expects to read this uncommitted data via snapshot read instead.
  8. This means the uncommitted data at pos >= 3 is lost (see the sketch after this list).
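
Below is a minimal, self-contained sketch of the scenario in the list above. It is not RisingWave code; the positions, buffers, and filtering are hypothetical stand-ins that only illustrate why an uncommitted row at pos >= backfill_pos falls through both the snapshot read and the forwarded updates unless it is replicated.

```rust
fn main() {
    // Committed data visible to backfill's snapshot read.
    let committed: Vec<i64> = vec![0, 1, 2, 3, 4];
    // Uncommitted data that only lives in the upstream's shared buffer
    // (flushed on a non-checkpoint barrier, not yet committed).
    let uncommitted: Vec<i64> = vec![2, 6];
    // Backfill has already processed everything below this position.
    let backfill_pos: i64 = 3;

    // 1. Snapshot read: only committed rows at or above the backfill position.
    let from_snapshot: Vec<i64> = committed
        .iter()
        .copied()
        .filter(|pos| *pos >= backfill_pos)
        .collect();

    // 2. Upstream updates: backfill only applies updates for rows it has
    //    already backfilled, i.e. pos < backfill_pos.
    let from_updates: Vec<i64> = uncommitted
        .iter()
        .copied()
        .filter(|pos| *pos < backfill_pos)
        .collect();

    // 3. Anything uncommitted at pos >= backfill_pos falls through both paths:
    //    it is not in the committed snapshot, and it is filtered out of updates.
    let lost: Vec<i64> = uncommitted
        .iter()
        .copied()
        .filter(|pos| *pos >= backfill_pos)
        .collect();

    println!("snapshot read yields:     {from_snapshot:?}"); // [3, 4]
    println!("updates applied:          {from_updates:?}");  // [2]
    println!("lost without replication: {lost:?}");          // [6]
}
```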

Motivations

Currently the replication logic is coupled with arrangement backfill.

Decoupling it also makes arrangement backfill much more maintainable: it could potentially just reuse the no_shuffle_backfill executor, and only the dispatcher logic would need to change to handle scaling.

If we decouple it, we could potentially share the state table associated with the replicate executor with other executors that need non-checkpoint reads (the precise use case is not clear yet).

Issues

Stopping Replication when backfill complete

Previously, the key issue was that the replicate executor needs to stop replication once backfill is complete, to avoid consuming memory. However, once meta sees that backfill is complete, it can schedule a config change that stops the corresponding replicate executor from replicating, provided nothing else depends on it.

After that, the replicate executor should just forward updates downstream.
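
A minimal sketch, assuming hypothetical types (ReplicateMode, ReplicateExecutor are stand-ins, not the actual executor API), of how the replicate executor could switch from replicating to pure pass-through once it is told that backfill has finished:

```rust
#[derive(Debug)]
enum ReplicateMode {
    /// Write incoming chunks into the replicated state, then forward them.
    Replicating,
    /// Backfill is done: stop writing, just forward chunks downstream.
    PassThrough,
}

struct ReplicateExecutor {
    mode: ReplicateMode,
    replicated_rows: Vec<String>, // stand-in for the replicated state table
}

impl ReplicateExecutor {
    fn on_chunk(&mut self, chunk: String) -> String {
        if let ReplicateMode::Replicating = self.mode {
            // Only pay the memory cost while backfill still needs the replica.
            self.replicated_rows.push(chunk.clone());
        }
        // In both modes the chunk is forwarded downstream unchanged.
        chunk
    }

    /// Called when meta (or a backfill manager) signals that backfill finished.
    fn stop_replication(&mut self) {
        self.mode = ReplicateMode::PassThrough;
        self.replicated_rows.clear(); // release the replicated data
    }
}

fn main() {
    let mut exec = ReplicateExecutor {
        mode: ReplicateMode::Replicating,
        replicated_rows: vec![],
    };
    exec.on_chunk("chunk-1".into());
    exec.stop_replication();
    exec.on_chunk("chunk-2".into());
    assert!(exec.replicated_rows.is_empty());
}
```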

No Shuffle Backfill

No-shuffle backfill needs to support vnode-level state to be compatible with arrangement backfill. We can keep them as separate executors first, since that logic is already implemented in arrangement backfill.
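
A minimal sketch of what vnode-level backfill progress could look like, using simplified stand-in types (a u16 vnode and a single integer as the pk position) rather than the actual RisingWave types:

```rust
use std::collections::HashMap;

type VirtualNode = u16;
type Pk = i64; // stand-in for the primary-key position within a vnode

#[derive(Default)]
struct BackfillProgress {
    /// `None` means this vnode has not started backfilling yet;
    /// `Some(pk)` means everything up to and including `pk` has been backfilled.
    current_pos: HashMap<VirtualNode, Option<Pk>>,
}

impl BackfillProgress {
    fn update(&mut self, vnode: VirtualNode, pk: Pk) {
        self.current_pos.insert(vnode, Some(pk));
    }

    /// An upstream update is forwarded only if its vnode has already been
    /// backfilled past the row's position.
    fn should_forward(&self, vnode: VirtualNode, pk: Pk) -> bool {
        matches!(self.current_pos.get(&vnode), Some(Some(pos)) if pk <= *pos)
    }
}

fn main() {
    let mut progress = BackfillProgress::default();
    progress.update(0, 10);
    assert!(progress.should_forward(0, 5));  // vnode 0 already covers pk 5
    assert!(!progress.should_forward(1, 5)); // vnode 1 has not started yet
}
```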

Compatibility with the table being replicated (target table)

This is actually independent, since the same issue affects the replication logic whether or not it is part of arrangement backfill. I'm just including it here for clarity.

The replicate executor will maintain its shared buffer separately from the target table.
For correctness, the data in the shared buffer needs to follow the same schema as the table being replicated, so that the replicated data (from the replicated table) and the committed data (from the target table) can be merged.

The backfilling stream job could select only a subset of columns. In that case, the replicate executor will need to fill in NULL for the missing columns (see the sketch below).
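
A minimal sketch of that NULL padding, with simplified stand-in types (an Option<String> in place of the real Datum) and a hypothetical column mapping; this is not the actual projection code:

```rust
type Datum = Option<String>; // `None` plays the role of NULL

/// `target_to_subset[i]` gives, for target-table column `i`, the index of the
/// column in the subset row selected by the backfilling job, or `None` if the
/// job did not select that column and it must be filled with NULL.
fn pad_to_target_schema(subset_row: &[Datum], target_to_subset: &[Option<usize>]) -> Vec<Datum> {
    target_to_subset
        .iter()
        .map(|idx| match idx {
            Some(i) => subset_row[*i].clone(),
            None => None, // fill NULL for columns the job did not select
        })
        .collect()
}

fn main() {
    let subset_row = vec![Some("a".to_string()), Some("b".to_string())];
    // Target schema is (col0, <unselected col>, col1).
    let padded = pad_to_target_schema(&subset_row, &[Some(0), None, Some(1)]);
    assert_eq!(padded, vec![Some("a".into()), None, Some("b".into())]);
}
```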

@github-actions github-actions bot added this to the release-1.5 milestone Nov 2, 2023
@chenzl25
Contributor

chenzl25 commented Nov 2, 2023

If we separate the replicate executor from the arrangement backfill, we have a chance to share the no-shuffle backfill executor between these two scenarios. That is the benefit. My concern is that if we separate them, we will also separate the ReplicatedStateTable write path and read path: the write path will be in the replicate executor and the read path will be in the backfill executor. It will need some development effort, but I talked with @wenym1 and it is not difficult. BTW, for the "Stopping Replication when backfill complete" part, we can use a BackfillManager to let the backfill executor tell the replicate executor to stop as soon as backfill is finished.

@kwannoel
Contributor Author

kwannoel commented Nov 2, 2023

> The write path will be in the replicate executor and the read path will be in the backfill executor. It will need some development effort

What development effort would be required? That's a good point, I didn't think of it.

> BTW, for the "Stopping Replication when backfill complete" part, we can use a BackfillManager to let the backfill executor tell the replicate executor to stop as soon as backfill is finished.

Oh I see, so the replicate executor subscribes to the backfill manager, and when it receives the notification to stop, it stops?

+1 for it; not coupling it with the barrier is best.

@chenzl25
Contributor

chenzl25 commented Nov 2, 2023

> What development effort would be required? That's a good point, I didn't think of it.

I think we need to provide a way to let different executors refer to the same state table so that they can write and read the same data.

> Oh I see, so the replicate executor subscribes to the backfill manager, and when it receives the notification to stop, it stops?
> +1 for it; not coupling it with the barrier is best.

Yes, not relying on the barrier is best. Using a channel is possible, since these executors are in the same CN.

@kwannoel
Contributor Author

kwannoel commented Nov 2, 2023

> I think we need to provide a way to let different executors refer to the same state table so that they can write and read the same data.

Is that not currently possible? 🤔 I'm thinking the replicated state table can already be used as-is.
It's more for safety: perhaps we should only allow reads on the state table within the backfill executor, and never allow it to write to the replicated state table.

> Yes, not relying on the barrier is best. Using a channel is possible, since these executors are in the same CN.

Oh, so this backfill manager is local to the CN, and will manage all the backfill jobs on it?

> Using a channel is possible, since these executors are in the same CN.

You're right, a backfill manager may not even be necessary here. I'm wondering if there's some way to directly create a channel between the executors. I suppose the channel should be instantiated when the CN builds the actors.

@kwannoel
Contributor Author

kwannoel commented Nov 2, 2023

> I think we need to provide a way to let different executors refer to the same state table so that they can write and read the same data.

> Is that not currently possible? 🤔 I'm thinking the replicated state table can already be used as-is. It's more for safety: perhaps we should only allow reads on the state table within the backfill executor, and never allow it to write to the replicated state table.

Oh, I guess there could be some in-memory state that is not shared.

Separate read/write paths should not be an issue; LocalStateStore uses an r/w lock for its read version.
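
As a rough illustration of that point, here is a minimal sketch of two executors sharing one in-memory read version behind an Arc<RwLock<..>>: the replicate executor holds the write side and the backfill executor the read side. The types are hypothetical stand-ins, not the actual LocalStateStore API.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

// Stand-in for the shared buffer / read version: key -> value.
type ReadVersion = BTreeMap<i64, String>;

struct ReplicateWriter {
    version: Arc<RwLock<ReadVersion>>,
}

struct BackfillReader {
    version: Arc<RwLock<ReadVersion>>,
}

impl ReplicateWriter {
    // Replicate executor writes uncommitted data into the shared version.
    fn write(&self, key: i64, value: String) {
        self.version.write().unwrap().insert(key, value);
    }
}

impl BackfillReader {
    // Backfill executor only ever reads from the shared version.
    fn read(&self, key: i64) -> Option<String> {
        self.version.read().unwrap().get(&key).cloned()
    }
}

fn main() {
    let shared = Arc::new(RwLock::new(ReadVersion::new()));
    let writer = ReplicateWriter { version: shared.clone() };
    let reader = BackfillReader { version: shared };

    writer.write(1, "uncommitted".to_string());
    assert_eq!(reader.read(1).as_deref(), Some("uncommitted"));
}
```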

@chenzl25
Contributor

chenzl25 commented Nov 3, 2023

I mean we can use a BackfillManager to manage these two executors; it can hand them the channel when they register with the manager.
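
A minimal sketch of that registration flow, using std::sync::mpsc in place of the real actor runtime; BackfillManager, TableId, and the registration methods are hypothetical names, not existing APIs:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type TableId = u32;

#[derive(Default)]
struct BackfillManager {
    // Receiver halves parked here, keyed by the backfilled table, until the
    // matching replicate executor registers.
    pending_receivers: HashMap<TableId, Receiver<()>>,
}

impl BackfillManager {
    /// Called when the backfill executor for `table_id` is built on this CN.
    /// The backfill executor keeps the sender to signal "backfill finished".
    fn register_backfill(&mut self, table_id: TableId) -> Sender<()> {
        let (tx, rx) = channel();
        self.pending_receivers.insert(table_id, rx);
        tx
    }

    /// Called when the matching replicate executor is built on this CN.
    fn register_replicate(&mut self, table_id: TableId) -> Receiver<()> {
        self.pending_receivers
            .remove(&table_id)
            .expect("backfill executor registers first in this sketch")
    }
}

fn main() {
    let mut manager = BackfillManager::default();
    let stop_tx = manager.register_backfill(42);
    let stop_rx = manager.register_replicate(42);

    // Backfill finishes and signals the replicate executor to stop replicating.
    stop_tx.send(()).unwrap();
    assert!(stop_rx.try_recv().is_ok());
}
```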

@kwannoel kwannoel self-assigned this Nov 9, 2023
@kwannoel kwannoel removed this from the release-1.5 milestone Dec 4, 2023
Contributor

github-actions bot commented Jul 3, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

@kwannoel kwannoel closed this as not planned Oct 24, 2024