Disable checkpointing #2124

abcpro1 · 2023-10-04T19:40:10Z

Disable checkpointing for now until we make it use less disk space.

Persisting pipeline processors to disk currently uses a lot of disk space, especially when the checkpointing frequency is high, for example when `max_num_records_before_persist` is set to a low value.

Jesse-Bakker · 2023-10-05T06:55:53Z

dozer-core/src/forwarder.rs

@@ -182,7 +182,8 @@ impl SourceChannelManager {

    fn should_participate_in_commit(&self) -> bool {
        self.num_uncommitted_ops >= self.commit_sz
-            || self
+            || self.num_uncommitted_ops > 0


We can tweak this already with a self.commit_sz > 0

Here I targeted the else branch, which is time based.
Basically, when the time since the last commit exceeds max_duration_between_commits, we commit, but only if there is something to commit.
It's a minor fix, but I had it locally, and I thought I should push it.
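To make the intent concrete, here is a minimal sketch of the condition described above. It is not the code in dozer-core/src/forwarder.rs: the `CommitPolicy` type, the `should_commit` name, and passing the elapsed time as an argument are all assumptions made for illustration. Only the field names `commit_sz` and `max_duration_between_commits` come from this thread.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the commit settings of a source.
struct CommitPolicy {
    /// Commit once this many ops are uncommitted (size-based branch).
    commit_sz: u64,
    /// Commit once this much time has passed since the last commit (time-based branch).
    max_duration_between_commits: Duration,
}

impl CommitPolicy {
    /// Returns true when the size threshold is reached, or when the time
    /// threshold is exceeded and there is at least one uncommitted op.
    fn should_commit(&self, num_uncommitted_ops: u64, since_last_commit: Duration) -> bool {
        let size_reached = num_uncommitted_ops >= self.commit_sz;
        let timed_out = since_last_commit > self.max_duration_between_commits;
        size_reached || (timed_out && num_uncommitted_ops > 0)
    }
}
```

With a policy of `commit_sz: 100` and a 50 ms limit, `should_commit(0, 51 ms)` is false under this sketch, because the timeout alone is not enough when nothing is pending.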

Ah, I see what it does now. Makes sense 👍

This is really subtle. Let me try to explain.

To generate a Commit message, all the sources have to agree on the system's "state", which is represented with the SourceStates type.
The EpochManager is the synchronization point where all sources communicate. Every source calls EpochManager::wait_for_epoch_close with its own state, and the EpochManager aggregates all source states. It combines the termination requests (second parameter) with a logical AND and the commit requests (third parameter) with a logical OR, meaning that a Terminate message is sent iff all sources want to terminate, and a Commit message is sent iff any source wants to commit.
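The aggregation rule can be sketched roughly as follows. This is an illustration only, not the EpochManager implementation: `SourceRequest`, `EpochOutcome`, and `aggregate` are hypothetical names, and the real SourceStates argument is left out entirely.

```rust
/// What one source asks for when it reaches the synchronization point.
/// Hypothetical type; the two flags stand in for the second and third parameters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SourceRequest {
    terminate: bool,
    commit: bool,
}

/// The combined decision for the whole epoch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EpochOutcome {
    terminate: bool,
    commit: bool,
}

/// AND over the termination requests, OR over the commit requests.
fn aggregate(requests: &[SourceRequest]) -> EpochOutcome {
    EpochOutcome {
        // An empty set of sources is treated as "do not terminate" in this sketch.
        terminate: !requests.is_empty() && requests.iter().all(|r| r.terminate),
        commit: requests.iter().any(|r| r.commit),
    }
}
```

So a single source with pending ops is enough to trigger a commit for everyone, while a single source that is still running is enough to prevent termination.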

The EpochManager being a synchronization point implies that every source must call EpochManager::wait_for_epoch_close every now and then to unblock the progress of other sources. Imagine that we have two sources, A and B. A hasn't received any new ops, and B has received some. If A doesn't call EpochManager::wait_for_epoch_close, B will call it and then wait forever for A to do the same. The pipeline won't be able to make any progress.
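The A and B scenario can be modelled with a plain barrier. The sketch below is a simplified, single-epoch stand-in built on `std::sync::Barrier`, assuming each source runs on its own thread. `EpochSync`, `wait_for_close`, and `run_one_epoch` are hypothetical names and do not reflect the real EpochManager signature.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Single-use synchronization point for one epoch (hypothetical type).
struct EpochSync {
    barrier: Barrier,
    any_commit: AtomicBool,
}

impl EpochSync {
    fn new(num_sources: usize) -> Self {
        EpochSync {
            barrier: Barrier::new(num_sources),
            any_commit: AtomicBool::new(false),
        }
    }

    /// Blocks until every source has called this method, then reports
    /// whether any source asked for a commit.
    fn wait_for_close(&self, wants_commit: bool) -> bool {
        if wants_commit {
            self.any_commit.store(true, Ordering::SeqCst);
        }
        // If one source never reaches this line, all the others block here forever.
        self.barrier.wait();
        self.any_commit.load(Ordering::SeqCst)
    }
}

/// Runs one epoch with one thread per source. Each source participates
/// even when it has zero uncommitted ops, and passes `ops > 0` as its commit request.
fn run_one_epoch(uncommitted_ops: Vec<u64>) -> Vec<bool> {
    let sync = Arc::new(EpochSync::new(uncommitted_ops.len()));
    let handles: Vec<_> = uncommitted_ops
        .into_iter()
        .map(|ops| {
            let sync = Arc::clone(&sync);
            thread::spawn(move || sync.wait_for_close(ops > 0))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

In this model, `run_one_epoch(vec![0, 3])` lets both threads return with a commit decision of true: the idle source still shows up at the barrier, and the "nothing to commit" information travels in the flag rather than in a decision to skip the call.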

So we shouldn't add this line. The information is not lost though. self.num_uncommitted_ops > 0 is passed as the third parameter of EpochManager::wait_for_epoch_close.

Jesse-Bakker · 2023-10-05T06:59:38Z

dozer-core/src/checkpoint/mod.rs

+            return Err(ExecutionError::CannotRestart);
+        }
+
+        #[cfg(FIXME_CHECKPOINTING)]


We might still write the checkpoints, which means this will fail to restart every time.

Yes, we need to fix the restart issue separately. Once we settle on a solution, I will open a pull request for it.

I think I'd prefer to solve the restart issue in this pull request as well.

Jesse-Bakker · 2023-10-05T07:42:59Z

dozer-core/src/forwarder.rs

@@ -182,7 +182,8 @@ impl SourceChannelManager {

    fn should_participate_in_commit(&self) -> bool {
        self.num_uncommitted_ops >= self.commit_sz
-            || self
+            || self.num_uncommitted_ops > 0


Ah, I see what it does now. Makes sense 👍

chubei · 2023-10-18T13:31:39Z

Not needed anymore

abcpro1 requested a review from Jesse-Bakker October 4, 2023 19:40

abcpro1 force-pushed the disable-checkpointing branch from 2a8004a to eaa320a on October 4, 2023 22:03

abcpro1 added 2 commits October 4, 2023 22:12

fix: disable checkpointing

5851b45

Persisting pipeline processors to disk currently uses a lot of disk space, especially when checkpointing frequency is high; for example when `max_num_records_before_persist` is set to a low value.

fix: only commit when there is something to commit

5f3ccf0

abcpro1 force-pushed the disable-checkpointing branch from eaa320a to 5f3ccf0 on October 4, 2023 22:12

Jesse-Bakker reviewed Oct 5, 2023

Jesse-Bakker approved these changes Oct 5, 2023

chubei closed this Oct 18, 2023
