Separate task lifecycle from kubernetes/location lifecycle #15133
Conversation
...verlord-extensions/src/test/java/org/apache/druid/k8s/overlord/KubernetesTaskRunnerTest.java
return tasks.computeIfAbsent(task.getId(), k -> new KubernetesWorkItem(task, exec.submit(() -> runTask(task))))
    .getResult();
return tasks.computeIfAbsent(task.getId(), k -> {
  ListenableFuture<TaskStatus> unused = exec.submit(() -> runTask(task));
Check notice (Code scanning / CodeQL): Unread local variable
return tasks.computeIfAbsent(task.getId(), k -> new KubernetesWorkItem(task, exec.submit(() -> joinTask(task))))
    .getResult();
return tasks.computeIfAbsent(task.getId(), k -> {
  ListenableFuture<TaskStatus> unused = exec.submit(() -> joinTask(task));
Check notice (Code scanning / CodeQL): Unread local variable
...es-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java
...overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesPeonLifecycle.java
...etes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesWorkItem.java
+1 the approach makes sense to me. Some small code suggestions.
With this change, can you test a few scenarios where a peon is killed directly from k8s while it is still processing data:
...ing-service/src/main/java/org/apache/druid/indexing/common/actions/UpdateLocationAction.java
}
shutdown();
Can shutdown() throw an exception, since it is making a call to the Kubernetes client? If so, stopTask should probably be in its own finally block.
I decided to group saveLogs and shutdown together since they are both k8s lifecycle cleanup actions (it is okay if one fails), and then moved stopTask to a finally block because it has to happen.
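A minimal sketch of that cleanup ordering, using stand-in interfaces for the k8s client and task bookkeeping; only the method names saveLogs, shutdown, and stopTask come from this thread, everything else is illustrative:

public class PeonCleanupSketch
{
  private final KubernetesClientStub k8sClient;   // stand-in for the real Kubernetes client
  private final TaskLifecycleStub taskLifecycle;  // stand-in for task-lifecycle bookkeeping

  public PeonCleanupSketch(KubernetesClientStub k8sClient, TaskLifecycleStub taskLifecycle)
  {
    this.k8sClient = k8sClient;
    this.taskLifecycle = taskLifecycle;
  }

  public void cleanup(String taskId)
  {
    try {
      // k8s lifecycle cleanup, grouped because it is okay if these fail
      k8sClient.saveLogs(taskId);
      k8sClient.shutdown(taskId);
    }
    catch (RuntimeException e) {
      // tolerated: the k8s-side cleanup is best-effort (a real implementation would log this)
    }
    finally {
      // the task lifecycle cleanup must always run, even if the k8s client calls throw
      taskLifecycle.stopTask(taskId);
    }
  }

  public interface KubernetesClientStub
  {
    void saveLogs(String taskId);
    void shutdown(String taskId);
  }

  public interface TaskLifecycleStub
  {
    void stopTask(String taskId);
  }
}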
@@ -172,10 +176,12 @@ private TaskStatus joinTask(Task task)
  @VisibleForTesting
  protected TaskStatus doTask(Task task, boolean run)
  {
    TaskStatus taskStatus = TaskStatus.failure(task.getId(), "Task execution never started");
A bit weird to initialize the status with a failure one; I don't think we need it.
for (Map.Entry<String, KubernetesWorkItem> entry : tasks.entrySet()) {
  if (entry.getValue().isRunning()) {
    TaskRunnerUtils.notifyLocationChanged(
FYI, I don't see the listener from the supervisor doing much work.
https://github.com/apache/druid/blob/28.0.0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L1663
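For context, a minimal sketch of the notification pattern in the snippet above, using stub listener and location types rather than the real TaskRunnerListener, TaskLocation, and TaskRunnerUtils classes:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Stub listener type for illustration; the real interface is Druid's TaskRunnerListener.
interface LocationListenerStub
{
  void locationChanged(String taskId, String newLocation);
}

public class LocationNotificationSketch
{
  private final Map<String, String> runningTaskLocations = new ConcurrentHashMap<>();
  private final List<LocationListenerStub> listeners = new CopyOnWriteArrayList<>();

  public void registerListener(LocationListenerStub listener)
  {
    listeners.add(listener);
  }

  // Mirrors the loop in the snippet above: every running task's current location is pushed
  // to registered listeners (such as a streaming supervisor), so location changes driven by
  // the overlord-side lifecycle remain visible to listeners.
  public void notifyAllRunningTasks()
  {
    for (Map.Entry<String, String> entry : runningTaskLocations.entrySet()) {
      for (LocationListenerStub listener : listeners) {
        listener.locationChanged(entry.getKey(), entry.getValue());
      }
    }
  }
}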
I tested these situations and confirmed that in the batch/compact case the subtasks/parent task fail as expected and no segments are written. For streaming, the supervisor starts a new task with the original offset.
👍
With this patch, task duration is being reported as -1 for successful tasks.
…pache#15133)" This reverts commit dc0b163.
…pache#15133)" (apache#15346) This reverts commit dc0b163.
Description
When running higher volumes of ingestion on the KubernetesTaskRunner (especially streaming), there are some issues caused by the difference between the Kubernetes lifecycle (pod startup and completion) and the Druid task lifecycle (when a peon JVM has spun up and is ready to serve requests, and when it has shut down).
During streaming task startup, in AbstractTask.setup, the chat handler (via getChatHandlerProvider) gets registered after the UpdateLocation action submission. This can cause issues if there is a lot of load on the overlord, because the task will get stuck retrying these /action submissions even though its chat handler has not been registered and the supervisor can't actually communicate with the task yet.
Similarly, the UpdateLocation action during AbstractTask.cleanUp also frequently causes issues during streaming task cleanup when there is a lot of load on the overlord. The cleanUp method is called after the chat handler provider is deregistered, so when the task gets stuck doing cleanup, there is a risk of the supervisor trying to chat with the task while it is in the process of exiting.
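For illustration, a minimal sketch of the problematic ordering described in the two paragraphs above; the interfaces and parameter shapes here are stand-ins, not the actual Druid AbstractTask or ChatHandlerProvider APIs:

// Stand-in types for illustration only; not the real Druid interfaces.
interface ChatHandlerProviderStub
{
  void register(String taskId, Object handler);
}

interface OverlordActionClientStub
{
  // Stand-in for submitting UpdateLocationAction to the overlord's /action endpoint (may retry under load).
  void submitUpdateLocation(String taskId, String location);
}

public class LifecycleOrderingSketch
{
  private final ChatHandlerProviderStub chatHandlerProvider;
  private final OverlordActionClientStub actionClient;

  public LifecycleOrderingSketch(ChatHandlerProviderStub chatHandlerProvider, OverlordActionClientStub actionClient)
  {
    this.chatHandlerProvider = chatHandlerProvider;
    this.actionClient = actionClient;
  }

  public void setup(String taskId, Object chatHandler, String location)
  {
    // The location update goes out first and can block on retries under overlord load...
    actionClient.submitUpdateLocation(taskId, location);
    // ...and only afterwards is the chat handler registered, so the supervisor cannot
    // reach the task during that window.
    chatHandlerProvider.register(taskId, chatHandler);
  }

  public void cleanUp(String taskId)
  {
    // By the time cleanUp runs, the chat handler has already been unregistered, so if this
    // location update gets stuck retrying, the supervisor may try to chat with a task that
    // is in the process of exiting.
    actionClient.submitUpdateLocation(taskId, "unknown");
  }
}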
On larger Kubernetes clusters, it can take a while for K8s to report that a pod has successfully exited, meaning there can be a significant lag between when a peon JVM exits and when the KubernetesTaskRunner can report a task as completed. In general this slows down how quickly tasks can be reported as successful, and it can also cause issues similar to the UpdateLocation problems above with streaming tasks.
There is a tradeoff between having the peon hit the /action endpoint on the overlord with UpdateStatusAction and UpdateLocationAction, which gives the K8s task runner a more accurate picture of where the peon is in the task lifecycle, versus the time and chance of failure that these requests add.
My overall approach was to let the KubernetesTaskRunner/KubernetesPeonLifecycle (the components running on the overlord) handle the Kubernetes/TaskLocation lifecycle, but have the peon be directly responsible for the task lifecycle by using the UpdateStatusAction as a way to mark the task future as complete.
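To make that split concrete, here is a minimal sketch of the idea, with simplified names and a plain String standing in for TaskStatus (the real classes are KubernetesTaskRunner and KubernetesWorkItem): the overlord-side runner keeps a settable future per task and completes it when the peon reports a final status via UpdateStatusAction, rather than waiting for Kubernetes to report that the pod has exited.

import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.SettableFuture;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a simplified stand-in for the overlord-side runner; a String stands in for TaskStatus.
public class StatusDrivenRunnerSketch
{
  // One settable future per task id, completed by the peon's status report.
  private final Map<String, SettableFuture<String>> taskFutures = new ConcurrentHashMap<>();

  // Called when the runner starts (or re-joins) a task; the returned future resolves when the
  // peon reports completion, not when the k8s pod is observed to have exited.
  public ListenableFuture<String> run(String taskId)
  {
    return taskFutures.computeIfAbsent(taskId, id -> SettableFuture.create());
  }

  // Called when the peon submits an UpdateStatusAction with a completed status.
  public void updateStatus(String taskId, String finalStatus)
  {
    SettableFuture<String> future = taskFutures.get(taskId);
    if (future != null) {
      future.set(finalStatus);
    }
  }
}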
Following this approach, I made two significant changes
I also made a few small cleanup changes
Release note
Key changed/added classes in this PR
KubernetesTaskRunner
KubernetesPeonLifecycle
AbstractTask
KubernetesWorkItem
This PR has: