Background
Hummock manager may cancel a compact task in two situations:
The compactor HB reports the task, but the task makes no progress; it is canceled after expire_time.
The task is not in the compactor HB list; the background thread cancels it after expire_time.
The current default expire_time is 1 min. In other words, when an object store operation takes more than 1 min, the task expires. That is unreasonable, and it can trap the task in a cycle of execution and cancellation.
Improve
After #10584, MonitoredObjectStore ensures that an operation's duration does not exceed the timeout set in the config. Consider inferring the HB timeout from the operation timeout config.
The most obvious cause of the problem described in this issue is that we don't distinguish between process timeouts and heartbeat timeouts.
process_timeout: The compactor reports the task in its HB, but the task has made no progress.
heartbeat_timeout: The compactor no longer reports the task in its HB.
Once we distinguish between the two concepts, we can decouple the two cancellation behaviors:
The compactor HB only cancels tasks that belong to it and have made no progress.
The background thread only cancels tasks that are no longer reported in the HB.
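To make the split concrete, here is a minimal sketch of the two cancellation paths, not the actual Hummock manager code. `TaskState`, `on_heartbeat`, and `background_check` are hypothetical names, and times are plain seconds for simplicity.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping kept by the manager."""
    last_heartbeat_s: float  # last time a compactor HB included this task
    last_progress_s: float   # last time the task's progress counter advanced


def on_heartbeat(task: TaskState, now_s: float, made_progress: bool,
                 process_timeout_s: float) -> bool:
    """HB path: return True if the task should be canceled for lack of progress."""
    task.last_heartbeat_s = now_s
    if made_progress:
        task.last_progress_s = now_s
        return False
    return now_s - task.last_progress_s > process_timeout_s


def background_check(task: TaskState, now_s: float,
                     heartbeat_timeout_s: float) -> bool:
    """Background path: return True only if the task is no longer reported in the HB."""
    return now_s - task.last_heartbeat_s > heartbeat_timeout_s
```

With this shape, a task that is still reported but stalled is judged only against `process_timeout_s`, while a task that has dropped out of the HB is judged only against `heartbeat_timeout_s`.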
After an offline discussion with Zheng, we think we can use an additional HB timeout config to achieve the above. We can then calculate a reasonable process timeout interval from the object store timeout config.
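One possible way to derive the process timeout is sketched below. The formula is an assumption for illustration only: the thread does not specify one, and `max_retries` and `margin_s` are hypothetical parameters, not existing config keys.

```python
def derive_process_timeout_s(object_store_timeout_s: float,
                             max_retries: int = 0,
                             margin_s: float = 10.0) -> float:
    """Assumed rule: allow every attempt of one operation to run to its
    timeout, plus a fixed safety margin, before declaring no progress."""
    if object_store_timeout_s <= 0 or max_retries < 0 or margin_s < 0:
        raise ValueError("timeouts and retry counts must be non-negative")
    attempts = max_retries + 1
    return object_store_timeout_s * attempts + margin_s
```

The point of the sketch is only that the process timeout should never be shorter than the longest time a single object store operation is allowed to take.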
Besides, since SkipWatermarkIterator was introduced, batch-skipping keys that do not satisfy the watermark may make num_process_key inaccurate, so we will use num_io to identify the progress of the compact task.
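A minimal sketch of counter-based progress detection, assuming num_io is a monotonically increasing counter that the compactor reports with each HB. `ProgressTracker` is a hypothetical helper, and treating the first report as progress is an assumption of this sketch.

```python
class ProgressTracker:
    """Remembers the last num_io seen per task and flags whether it advanced."""

    def __init__(self) -> None:
        self._last_num_io: dict[int, int] = {}

    def report(self, task_id: int, num_io: int) -> bool:
        """Record the counter from a HB; return True if the task made progress."""
        previous = self._last_num_io.get(task_id)
        self._last_num_io[task_id] = num_io
        # The first report for a task counts as progress (assumption).
        return previous is None or num_io > previous
```

Any counter can be plugged into the same comparison, so the choice between num_io and a corrected num_process_key does not change the tracking logic.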
I think the problem here is not the use of num_process_key but that num_process_key is not counted correctly when a skip operation happens. By switching to num_io, do you mean tracking the real S3 I/O? If so, is it possible that read I/O stays unchanged for a while due to prefetch, and write I/O stays unchanged for a while due to skip operations?