From 12bd353aa08bc8d24b1d79d6598e8859f761002a Mon Sep 17 00:00:00 2001
From: Daniel Sola <40698988+dansola@users.noreply.github.com>
Date: Tue, 25 Jun 2024 14:56:12 -0700
Subject: [PATCH] Fix some nits in workflow lifecycle docs. (#5389)

Signed-off-by: Daniel Sola
---
 docs/concepts/workflow_lifecycle.rst | 32 +++++++++++++---------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/docs/concepts/workflow_lifecycle.rst b/docs/concepts/workflow_lifecycle.rst
index f549493e83..1019f7c124 100644
--- a/docs/concepts/workflow_lifecycle.rst
+++ b/docs/concepts/workflow_lifecycle.rst
@@ -135,7 +135,7 @@ used by Flytepropeller (Kubernetes Operator) to know “how” to execute this
 task.

 ``Interface`` contains information about what are the inputs and outputs
-of our task. Flyte uses this interface to check if tasks are composible.
+of our task. Flyte uses this interface to check if tasks are composable.

 ``Custom`` is a collection of arbitrary Key/Values, think of it as a
 Json dict that any plugin can define as it wishes. In this case the
@@ -179,23 +179,23 @@ codebases.
 3. Once user has packaged workflows and tasks then a registration step
    is needed. During registration Flyte adds these protocolbuffer files to its
    database, essentially making these tasks and workflows runnable for
-   the user. Registration is done via `Flytectl ` __
+   the user. Registration is done via `Flytectl `__.

-4. At somepoint a Flyte user will trigger a Workflow run. The workflow
+4. At some point a Flyte user will trigger a Workflow run. The workflow
    run will start running the defined DAG. Eventually our Spark task
-   will need to run,. This is where the second step of a plugin kicks
+   will need to run. This is where the second step of a plugin kicks
    in. Flytepropeller (Kubernetes Operator) will realize that this is a
    Task of type ``Spark`` and it will handle it differently.

-  - FlytePropeller knows a task is of type Spark, because our ``TaskTemplate`` defined it so ``Type: Spark``
+  - FlytePropeller knows a task is of type Spark, because our ``TaskTemplate`` defined it so ``Type: Spark``.
   - Flyte has a ``PluginRegistry`` which has a dictionary from ``Task Type`` to ``Plugin Handlers``.
   - At run time Flytepropeller will run our task, Flytepropeller will figure out it is a Spark task, and then call the method ``BuildResource`` in Spark's plugin implementation. ``BuildResource`` is a method that each plugin has to implement.
-  - `Plugin `__ is a Golang interface providing an important method ``BuildResource``
+  - `Plugin `__ is a Golang interface providing an important method ``BuildResource``.

-  - Spark has its own Plugin defined `here in Flyteplugins repo `__
+  - Spark has its own Plugin defined `here in the Flyteplugins repo `__.

 Inside Spark’s
 `BuildResource `__
@@ -212,19 +212,19 @@ method is where magic happens.

 At task runtime:

 5. A pod with entrypoint to ``pyflyte-execute`` execute starts running
    (Spark App).

-  - ``pyflyte-execute`` provides all the plumbing magic that is needed. In this particular case, It will create a SparkSession and injects it somewhere so that it is ready for when the user defined python’s code starts running. Be aware that this is part of the SDK code (Flytekit).
+  - ``pyflyte-execute`` provides all the plumbing magic that is needed. In this particular case, it will create a SparkSession and injects it somewhere so that it is ready for when the user defined python’s code starts running. Be aware that this is part of the SDK code (Flytekit).
   - ``pyflyte-execute`` points to `execute_task_cmd `__. This entrypoint does a lot of things:

-  - Resolves the function that the user wants to run. i.e: where is the needed package where this function lives? . this is what ``"flytekit.core.python_auto_container.default_task_resolver"`` does
+  - Resolves the function that the user wants to run. i.e: where is the needed package where this function lives? This is what ``"flytekit.core.python_auto_container.default_task_resolver"`` does.

-  - Downloads needed inputs and do a transformation if need be. I.e: is this a Dataframe? if so we need to transform it into a Pandas DF from parquet.
+  - Downloads needed inputs and does a transformation if need be. i.e: is this a Dataframe? If so, we need to transform it into a Pandas DF from parquet.

-  - Calls `dispatch_execute `__ . This trigger the execution of our spark task.
+  - Calls `dispatch_execute `__. This triggers the execution of our spark task.

-  - `PysparkFunctionTask `__. defines what gets run just before the user's task code gets executed. It essentially creatse a spark session and then run the user function (The actual code we want to run!).
+  - `PysparkFunctionTask `__ defines what gets run just before the user's task code gets executed. It essentially creates a spark session and then runs the user function (The actual code we want to run!).

 ------------

@@ -232,14 +232,12 @@ Recap
 -----

 - Flyte requires coordination between multiple pieces of code. In this
-  case the SDK and FlytePropeller (K8s operator)
-- `Flyte IDL (Interface Language Definition) `__ provides some primitives
-  for services to talk with each other. Flyte uses Procolbuffer
-  representations of these primitives
+  case the SDK and FlytePropeller (K8s operator).
+- `Flyte IDL (Interface Language Definition) `__ provides some primitives for services to talk with each other. Flyte uses Procolbuffer representations of these primitives.
 - Three important primitives are : ``Container``, ``K8sPod``, ``Sql``.
   At the end of the day all tasks boil down to one of those three.
 - github.com/flyteorg/FlytePlugins repository contains all code for plugins:
-  Spark, AWS Athena, BigQuery…
+  Spark, AWS Athena, BigQuery, etc.
 - Flyte entrypoints are the ones carrying out the heavy lifting: making
   sure that inputs are downloaded and/or transformed as needed.
 - When running workflows on Flyte, if we want to use Flyte underlying plumbing then
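To make the lifecycle described in the patched document concrete, here is a minimal sketch of the user-facing side, assuming the ``flytekitplugins-spark`` package is installed. The ``Spark`` task config, its ``spark_conf`` parameter, and the ``flytekit.current_context().spark_session`` accessor follow the plugin's public API, while the task and workflow names and the config values are made up for illustration. Serializing such a task is what produces the ``TaskTemplate`` (``Type``, ``Interface``, ``Custom``) that the lifecycle steps above refer to.

.. code-block:: python

   # Illustrative sketch only, assuming flytekit and flytekitplugins-spark
   # are installed; names and config values are placeholders.
   import flytekit
   from flytekit import task, workflow
   from flytekitplugins.spark import Spark

   @task(
       task_config=Spark(
           # Arbitrary Spark settings; at serialization time they are carried
           # in the TaskTemplate's Custom field, and the task's Type marks it
           # as a Spark task for the backend plugin.
           spark_conf={"spark.driver.memory": "1g"},
       )
   )
   def estimate_count(n: int) -> float:
       # At run time, pyflyte-execute and the Spark plugin make a SparkSession
       # available through the execution context (see step 5 above).
       sess = flytekit.current_context().spark_session
       return float(sess.sparkContext.parallelize(range(1, n)).count())

   @workflow
   def wf(n: int = 10) -> float:
       return estimate_count(n=n)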
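The ``PluginRegistry`` dispatch described in step 4 can also be illustrated with a small, self-contained toy model. This is not Flyte source code: the ``TaskTemplate`` dataclass, the ``register`` decorator, and the ``handle`` function below are hypothetical stand-ins that only mirror the idea of mapping a task type to a handler whose job, like ``BuildResource``, is to produce the Kubernetes resource to run.

.. code-block:: python

   # Toy model (not Flyte source code) of task-type -> plugin-handler dispatch.
   from dataclasses import dataclass, field
   from typing import Any, Callable, Dict

   @dataclass
   class TaskTemplate:
       type: str                              # e.g. "spark" or "container"
       custom: Dict[str, Any] = field(default_factory=dict)

   # Maps a task type to a "build resource" callable, loosely mirroring the
   # PluginRegistry -> Plugin.BuildResource flow described above.
   PLUGIN_REGISTRY: Dict[str, Callable[[TaskTemplate], Dict[str, Any]]] = {}

   def register(task_type: str):
       def wrap(fn: Callable[[TaskTemplate], Dict[str, Any]]):
           PLUGIN_REGISTRY[task_type] = fn
           return fn
       return wrap

   @register("spark")
   def build_spark_application(tmpl: TaskTemplate) -> Dict[str, Any]:
       # A Spark plugin would emit a SparkApplication resource here, using the
       # settings carried in the template's Custom field.
       return {"kind": "SparkApplication", "sparkConf": tmpl.custom}

   @register("container")
   def build_pod(tmpl: TaskTemplate) -> Dict[str, Any]:
       # A plain container task would become an ordinary Pod.
       return {"kind": "Pod", "spec": tmpl.custom}

   def handle(tmpl: TaskTemplate) -> Dict[str, Any]:
       # Propeller-like dispatch: pick the handler by the task's type.
       return PLUGIN_REGISTRY[tmpl.type](tmpl)

   print(handle(TaskTemplate(type="spark", custom={"spark.driver.memory": "1g"})))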