Add driver and executor pod template fields to SparkJob #450
Conversation
Signed-off-by: Andrew Dye <[email protected]>
Codecov Report: All modified lines are covered by tests ✅

@@            Coverage Diff             @@
##           master     #450      +/-   ##
==========================================
+ Coverage   75.92%   78.48%   +2.55%
==========================================
  Files          18       18
  Lines        1458     1250     -208
==========================================
- Hits         1107      981     -126
+ Misses        294      212      -82
  Partials       57       57
oneof executorPodValue {
    core.K8sPod executorPod = 12;
}

// Reference to an existing PodTemplate k8s resource to be used as the base configuration when creating the
// executor Pods for this task. If this value is set, the specified PodTemplate will be used instead of, but applied
// identically as, the default PodTemplate configured in FlytePropeller.
// +optional
string executorPodTemplateName = 13;
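For context, flytekit already exposes a task-level pod_template_name with the same "reference an existing PodTemplate resource by name" semantics; a minimal sketch of that existing usage, assuming the executor-scoped field above behaves analogously (the template name below is made up):

from flytekit import task

# Existing task-level analogue: point the task at a PodTemplate k8s resource by name.
# executorPodTemplateName would presumably play the same role, scoped to the executor pods only.
@task(pod_template_name="spark-executor-base")
def hello() -> str:
    return "hello"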
Is there any reason these can't just be the PodTemplate / PodTemplateName from the TaskExecutionContext? IIRC this is how we built some of the other plugins (e.g., Dask, the KF operator).
Do you mean use the same pair for both executor and driver, use the existing pair from TaskExecutionContext for one of these and plumb the other here, or something else?
Either or. Both of these are available in the task decorator in flytekit, so if we added them both at the spark config level -- ex:

@task(
    task_config=Spark(
        executor_pod_template=...,
        driver_pod_template=...,
    ),
    pod_template=PodTemplate(
        primary_container_name="primary",
        labels={"lKeyA": "lValA", "lKeyB": "lValB"},
        annotations={"aKeyA": "aValA", "aKeyB": "aValB"},
        pod_spec=V1PodSpec(
            containers=[],
            volumes=[V1Volume(name="volume")],
            tolerations=[
                V1Toleration(
                    key="num-gpus",
                    operator="Equal",
                    value="1",
                    effect="NoSchedule",
                ),
            ],
        ),
    ),
)
def my_spark_task():
    ...
It doesn't make sense to me intuitively how they would be applied. For things like resources, etc. that are already available in the task decorator, we implemented (in other plugins) the rule that values set at the task level apply to all replicas (i.e., executor and driver) unless there is a role-specific override. So IMO we should apply the task-level settings to both the executor and driver by default and add an override on the driver only.
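As a rough sketch of that rule (the helper and names below are hypothetical, not an existing flytekit/flyteplugins API): the task-level value is the default for every replica, and a role-specific value, when set, wins for that role only.

from typing import Optional, TypeVar

T = TypeVar("T")

# Hypothetical helper illustrating the override rule described above.
def resolve_for_role(task_level: Optional[T], role_override: Optional[T]) -> Optional[T]:
    # A role-specific override (e.g. driver-only) takes precedence;
    # otherwise the task-level value applies to both driver and executor.
    return role_override if role_override is not None else task_level

# driver_value = resolve_for_role(task_pod_template, driver_pod_template)
# executor_value = resolve_for_role(task_pod_template, None)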
I waffled on this a bit since the Spark on k8s operator itself has moved to an abstraction with explicit configs for each of driver and executor, but yeah, I tend to agree that using the top-level TaskTemplate.PodTemplate as the primary override for both, plus a driver-specific PodTemplate in the SparkJob, is more consistent with flytekit and the other plugins.
We'll need to clarify the order of precedence w.r.t. the top-level config (i.e., TaskTemplate.Container), the top-level pod template (i.e., TaskTemplate.PodTemplate.PrimaryContainerName), and the driver-specific config (i.e., SparkJob.PodTemplate.PrimaryContainerName). The same clarifications are needed for other settings (i.e., memory/cores, which can also be specified as spark config -- spark.executor.requests.memory).
So the proposed python devx would be:

@task(
    task_config=Spark(
        spark_conf={
            "spark.driver.memory": "4G",
            "spark.executor.memory": "8G",
        },
        # driver-only pod template overrides (takes precedence over the top-level pod_template)
        driver_pod_template=...,
    ),
    # default pod template overrides for BOTH driver and executor
    pod_template=PodTemplate(
        primary_container_name="primary",
        labels={"lKeyA": "lValA", "lKeyB": "lValB"},
        annotations={"aKeyA": "aValA", "aKeyB": "aValB"},
        pod_spec=V1PodSpec(
            containers=[],
            volumes=[V1Volume(name="volume")],
            tolerations=[
                V1Toleration(
                    key="num-gpus",
                    operator="Equal",
                    value="1",
                    effect="NoSchedule",
                ),
            ],
        ),
    ),
)
def my_spark_task():
    ...
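To make the precedence question concrete, here is one possible resolution order for executor memory, purely as an illustration (the order, and the helper below, are not decided or defined anywhere in this PR):

from typing import Dict, Optional

# Illustrative only: most specific source wins. The real order is exactly
# what the comment above says still needs to be clarified.
def resolve_executor_memory(
    spark_conf: Dict[str, str],
    executor_pod_template_memory: Optional[str],
    task_container_memory: Optional[str],
) -> Optional[str]:
    return (
        spark_conf.get("spark.executor.memory")
        or executor_pod_template_memory
        or task_container_memory
    )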
@pingsutw @wild-endeavor any thoughts?
I agree it's very confusing to have podTemplate in both @task and the spark config, but I still think we need worker_pod_template in the spark config. People may only want to override the worker_pod_template, for example to add a GPU selector or something else.
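A sketch of that executor-only (worker) override under the proposal being discussed (executor_pod_template / worker_pod_template as a Spark kwarg does not exist today; the GPU node selector label is just an example):

from flytekit import PodTemplate
from kubernetes.client import V1Container, V1PodSpec

# Executor-only override: schedule executor pods onto GPU nodes
# without touching the driver or the task-level pod_template.
executor_gpu_template = PodTemplate(
    primary_container_name="primary",
    pod_spec=V1PodSpec(
        containers=[V1Container(name="primary")],
        node_selector={"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"},
    ),
)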
I'd personally rather have the original approach where both pod templates are in the spark config, and then just error if the pod-level config is specified. But I'm okay either way. I just feel like one could make equally convincing arguments that the normal top-level config should be for executors or for drivers.
Thanks for the thoughtful feedback. After weighing this + the suggestions in the migrated PR, I'd like to propose
- A shared RoleSpec definition (could be used by other plugins in the future)
- Inclusion of driver and executor role specs in SparkJob
- Error in spark plugin if TaskTemplate.PodTemplate is set
- Use this pattern as a deprecation path for TaskTemplate.PodTemplate for Dask, MPI, PyTorch, and Ray should we choose to have per-role PodTemplates/overrides (i.e., start erroring if both TaskTemplate.PodTemplate and a worker RoleSpec are specified)

Will summarize fully in flyteorg/flyte#4179 and try to move the remainder of the communication there.
Closing in favor of flyteorg/flyte#4179
TL;DR
Add driver and executor pod template fields to SparkJob
Complete description
This change adds optional fields for pod and pod template name for the driver and executor pods. This mirrors the interface we have for container tasks in TaskTemplate and TaskMetadata.
Tracking Issue
flyteorg/flyte#4105