Refactor docs for async mode execution (#1241)
Following up on the documentation added in PRs #1224 and #1230, this PR
refactors the documentation for Async Execution mode, particularly the
limitations section.

It also addresses a couple of un-rendered items in the scheduling.rst
file, caused by missing blank lines after the code-block directive.
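
As a minimal illustration of that rendering issue (a sketch based on standard reStructuredText directive parsing, not copied from the changed file), a directive's content needs a blank line after the directive line and must be indented. The ``[example]`` section and ``key`` below are hypothetical placeholders:

```rst
.. Not rendered as intended: the body directly follows the directive line,
.. so it is not treated as the content of the code block.

.. code-block::
    [example]
    key = value

.. Rendered as a literal block: blank line first, then indented content.

.. code-block::

    [example]
    key = value
```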
pankajkoti authored Oct 3, 2024
1 parent 111d430 commit afa635d
Showing 2 changed files with 12 additions and 9 deletions.
3 changes: 3 additions & 0 deletions docs/configuration/scheduling.rst
@@ -20,6 +20,7 @@ To schedule a dbt project on a time-based schedule, you can use Airflow's schedu
schedule="@daily",
)
+.. _data-aware-scheduling:

Data-Aware Scheduling
---------------------
@@ -77,6 +78,7 @@ If using cosmos with an Airflow 2.9 or below, users will experience the followin
Example of scheduler logs:

.. code-block::
+
    scheduler | [2023-09-08T10:18:34.252+0100] {scheduler_job_runner.py:1742} INFO - Orphaning unreferenced dataset 'postgres://0.0.0.0:5432/postgres.public.stg_customers'
    scheduler | [2023-09-08T10:18:34.252+0100] {scheduler_job_runner.py:1742} INFO - Orphaning unreferenced dataset 'postgres://0.0.0.0:5432/postgres.public.stg_payments'
    scheduler | [2023-09-08T10:18:34.252+0100] {scheduler_job_runner.py:1742} INFO - Orphaning unreferenced dataset 'postgres://0.0.0.0:5432/postgres.public.stg_orders'
@@ -105,5 +107,6 @@ For users to overcome this limitation in local tests, until the Airflow communit
they can set this configuration to ``False``. It can also be set in the ``airflow.cfg`` file:

.. code-block::
+
    [cosmos]
    enable_dataset_alias = False
18 changes: 9 additions & 9 deletions docs/getting_started/execution-modes.rst
@@ -267,17 +267,17 @@ machine it took approximately 25 seconds for the task to compile & upload the co
however, it is still a win as it is one-time overhead and the subsequent tasks run asynchronously utilising the Airflow's
deferrable operators and supplying to them those compiled SQLs.

-Note that currently, the ``airflow_async`` execution mode has the following limitations and is released as Experimental:
+Note that currently, the ``airflow_async`` execution mode has the following limitations and is released as **Experimental**:

-1. This feature only works when using Airflow 2.8 and above
-2. Only supports the ``dbt resource type`` models to be run asynchronously using Airflow deferrable operators. All other resources are executed synchronously using dbt commands as they are in the ``local`` execution mode.
-3. Only supports BigQuery as the target database. If a profile target other than BigQuery is specified, Cosmos will error out saying that the target database is not supported with this execution mode.
-4. Only works for ``full_refresh`` models. There is pending work to support other modes.
-5. Only Support for the Bigquery profile type
-6. Users need to provide ProfileMapping parameter in ProfileConfig
-7. It does not support dataset
+1. **Airflow 2.8 or higher required**: This mode relies on Airflow's `Object Storage <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/objectstorage.html>`__ feature, introduced in Airflow 2.8, to store and retrieve compiled SQLs.
+2. **Limited to dbt models**: Only dbt resource type models are run asynchronously using Airflow deferrable operators. Other resource types are executed synchronously, similar to the local execution mode.
+3. **BigQuery support only**: This mode only supports BigQuery as the target database. If a different target is specified, Cosmos will throw an error indicating the target database is unsupported in this mode.
+4. **ProfileMapping parameter required**: You need to specify the ``ProfileMapping`` parameter in the ``ProfileConfig`` for your DAG. Refer to the example DAG below for details on setting this parameter.
+5. **Supports only full_refresh models**: Currently, only ``full_refresh`` models are supported. To enable this, pass ``full_refresh=True`` in the ``operator_args`` of the ``DbtDag`` or ``DbtTaskGroup``. Refer to the example DAG below for details on setting this parameter.
+6. **location parameter required**: You must specify the location of the BigQuery dataset in the ``operator_args`` of the ``DbtDag`` or ``DbtTaskGroup``. The example DAG below provides guidance on this.
+7. **No dataset emission**: The async run operators do not currently emit datasets, meaning that :ref:`data-aware-scheduling` is not supported at this time. Future releases will address this limitation.

-You can leverage async operator support by installing an additional dependency
+To start leveraging async execution mode that is currently supported for the BigQuery profile type targets you need to install Cosmos with the below additional dependencies:

.. code:: bash