Separate task lifecycle from kubernetes/location lifecycle #15133
Conversation
...verlord-extensions/src/test/java/org/apache/druid/k8s/overlord/KubernetesTaskRunnerTest.java
return tasks.computeIfAbsent(task.getId(), k -> new KubernetesWorkItem(task, exec.submit(() -> runTask(task))))
    .getResult();
return tasks.computeIfAbsent(task.getId(), k -> {
  ListenableFuture<TaskStatus> unused = exec.submit(() -> runTask(task));
Check notice (Code scanning / CodeQL): Unread local variable
return tasks.computeIfAbsent(task.getId(), k -> new KubernetesWorkItem(task, exec.submit(() -> joinTask(task))))
    .getResult();
return tasks.computeIfAbsent(task.getId(), k -> {
  ListenableFuture<TaskStatus> unused = exec.submit(() -> joinTask(task));
Check notice (Code scanning / CodeQL): Unread local variable
...es-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java
...overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesPeonLifecycle.java
...etes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesWorkItem.java
+1 the approach makes sense to me. Some small code suggestions.
With this change, can you test a few scenarios where a peon is killed directly from k8s while it is still processing data:
...ing-service/src/main/java/org/apache/druid/indexing/common/actions/UpdateLocationAction.java
}
shutdown();
Can shutdown() throw an exception, since it is making a call to the Kubernetes client? If so, stopTask should probably be in its own finally block.
I decided to group saveLogs and shutdown together since they are both k8s lifecycle cleanup actions (it is okay if one fails), and then moved stopTask to a finally block because it has to happen.
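A minimal sketch of that cleanup ordering, using stand-in interfaces for the k8s client and task bookkeeping; only the method names saveLogs, shutdown, and stopTask come from this thread, everything else is illustrative:

public class PeonCleanupSketch
{
  private final KubernetesClientStub k8sClient;   // stand-in for the real Kubernetes client
  private final TaskLifecycleStub taskLifecycle;  // stand-in for task-lifecycle bookkeeping

  public PeonCleanupSketch(KubernetesClientStub k8sClient, TaskLifecycleStub taskLifecycle)
  {
    this.k8sClient = k8sClient;
    this.taskLifecycle = taskLifecycle;
  }

  public void cleanup(String taskId)
  {
    try {
      // k8s lifecycle cleanup, grouped because it is okay if these fail
      k8sClient.saveLogs(taskId);
      k8sClient.shutdown(taskId);
    }
    catch (RuntimeException e) {
      // tolerated: the k8s-side cleanup is best-effort (a real implementation would log this)
    }
    finally {
      // the task lifecycle cleanup must always run, even if the k8s client calls throw
      taskLifecycle.stopTask(taskId);
    }
  }

  public interface KubernetesClientStub
  {
    void saveLogs(String taskId);
    void shutdown(String taskId);
  }

  public interface TaskLifecycleStub
  {
    void stopTask(String taskId);
  }
}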
@@ -172,10 +176,12 @@ private TaskStatus joinTask(Task task)
  @VisibleForTesting
  protected TaskStatus doTask(Task task, boolean run)
  {
    TaskStatus taskStatus = TaskStatus.failure(task.getId(), "Task execution never started");
A bit weird to initialize the status with a failure one; I don't think we need it.
for (Map.Entry<String, KubernetesWorkItem> entry : tasks.entrySet()) {
  if (entry.getValue().isRunning()) {
    TaskRunnerUtils.notifyLocationChanged(
FYI, I don't see the listener from the supervisor doing much work.
https://github.com/apache/druid/blob/28.0.0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L1663
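For context, a minimal sketch of the notification pattern in the snippet above, using stub listener and location types rather than the real TaskRunnerListener, TaskLocation, and TaskRunnerUtils classes:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Stub listener type for illustration; the real interface is Druid's TaskRunnerListener.
interface LocationListenerStub
{
  void locationChanged(String taskId, String newLocation);
}

public class LocationNotificationSketch
{
  private final Map<String, String> runningTaskLocations = new ConcurrentHashMap<>();
  private final List<LocationListenerStub> listeners = new CopyOnWriteArrayList<>();

  public void registerListener(LocationListenerStub listener)
  {
    listeners.add(listener);
  }

  // Mirrors the loop in the snippet above: every running task's current location is pushed
  // to registered listeners (such as a streaming supervisor), so location changes driven by
  // the overlord-side lifecycle remain visible to listeners.
  public void notifyAllRunningTasks()
  {
    for (Map.Entry<String, String> entry : runningTaskLocations.entrySet()) {
      for (LocationListenerStub listener : listeners) {
        listener.locationChanged(entry.getKey(), entry.getValue());
      }
    }
  }
}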
I tested these situations and confirmed that in the batch/compact case the subtasks/parent task fail as expected and no segments are written. For streaming, the supervisor starts a new task with the original offset.
👍
With this patch, task duration is being reported as -1 for successful tasks.
…pache#15133)" This reverts commit dc0b163.
…pache#15133)" (apache#15346) This reverts commit dc0b163.
Description
When running higher volumes of ingestion on the KubernetesTaskRunner (especially streaming), there are some issues caused by the difference between the Kubernetes lifecycle (pod startup and completion) and the Druid task lifecycle (when a peon JVM has spun up and is ready to serve requests, and when it has shut down).
During streaming task startup, in AbstractTask.setup, the chat handler (via getChatHandlerProvider) gets registered after the UpdateLocation action submission. This can cause issues if there is a lot of load on the overlord, because the task will get stuck retrying these /action submissions even though its chat handler has not been registered and the supervisor can't actually communicate with the task yet.
Similarly, the UpdateLocation action during AbstractTask.cleanUp also frequently causes issues during streaming task cleanup when there is a lot of load on the overlord. The cleanUp method is called after the chat handler provider is deregistered, so when the task gets stuck doing cleanup, there is a risk of the supervisor trying to chat with the task while it is in the process of exiting.
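For illustration, a minimal sketch of the problematic ordering described in the two paragraphs above; the interfaces and parameter shapes here are stand-ins, not the actual Druid AbstractTask or ChatHandlerProvider APIs:

// Stand-in types for illustration only; not the real Druid interfaces.
interface ChatHandlerProviderStub
{
  void register(String taskId, Object handler);
}

interface OverlordActionClientStub
{
  // Stand-in for submitting UpdateLocationAction to the overlord's /action endpoint (may retry under load).
  void submitUpdateLocation(String taskId, String location);
}

public class LifecycleOrderingSketch
{
  private final ChatHandlerProviderStub chatHandlerProvider;
  private final OverlordActionClientStub actionClient;

  public LifecycleOrderingSketch(ChatHandlerProviderStub chatHandlerProvider, OverlordActionClientStub actionClient)
  {
    this.chatHandlerProvider = chatHandlerProvider;
    this.actionClient = actionClient;
  }

  public void setup(String taskId, Object chatHandler, String location)
  {
    // The location update goes out first and can block on retries under overlord load...
    actionClient.submitUpdateLocation(taskId, location);
    // ...and only afterwards is the chat handler registered, so the supervisor cannot
    // reach the task during that window.
    chatHandlerProvider.register(taskId, chatHandler);
  }

  public void cleanUp(String taskId)
  {
    // By the time cleanUp runs, the chat handler has already been unregistered, so if this
    // location update gets stuck retrying, the supervisor may try to chat with a task that
    // is in the process of exiting.
    actionClient.submitUpdateLocation(taskId, "unknown");
  }
}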
On larger Kubernetes clusters, it can take a while for K8s to report that a pod has successfully exited, meaning there can be a significant lag between when a peon JVM exits and when the KubernetesTaskRunner can report a task as completed. In general this slows down how quickly tasks can be reported as successful, and it can also cause issues similar to the UpdateLocation problems above with streaming tasks.
There is a tradeoff between having the peon hit the /action endpoint on the overlord with UpdateStatusAction and UpdateLocationAction, which gives the K8s task runner a more accurate picture of where the peon is in the task lifecycle, versus the time and chance of failure that these requests add.
My overall approach was to let the KubernetesTaskRunner/KubernetesPeonLifecycle (the components running on the overlord) handle the Kubernetes/TaskLocation lifecycle, but have the peon be directly responsible for the task lifecycle by using the UpdateStatusAction as a way to mark the task future as complete.
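To make that split concrete, here is a minimal sketch of the idea, with simplified names and a plain String standing in for TaskStatus (the real classes are KubernetesTaskRunner and KubernetesWorkItem): the overlord-side runner keeps a settable future per task and completes it when the peon reports a final status via UpdateStatusAction, rather than waiting for Kubernetes to report that the pod has exited.

import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.SettableFuture;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a simplified stand-in for the overlord-side runner; a String stands in for TaskStatus.
public class StatusDrivenRunnerSketch
{
  // One settable future per task id, completed by the peon's status report.
  private final Map<String, SettableFuture<String>> taskFutures = new ConcurrentHashMap<>();

  // Called when the runner starts (or re-joins) a task; the returned future resolves when the
  // peon reports completion, not when the k8s pod is observed to have exited.
  public ListenableFuture<String> run(String taskId)
  {
    return taskFutures.computeIfAbsent(taskId, id -> SettableFuture.create());
  }

  // Called when the peon submits an UpdateStatusAction with a completed status.
  public void updateStatus(String taskId, String finalStatus)
  {
    SettableFuture<String> future = taskFutures.get(taskId);
    if (future != null) {
      future.set(finalStatus);
    }
  }
}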
Following this approach, I made two significant changes
I also made a few small cleanup changes
Release note
Key changed/added classes in this PR
KubernetesTaskRunner
KubernetesPeonLifecycle
AbstractTask
KubernetesWorkItem
This PR has: