Merge pull request #162 from Teradata/airflow
Created documentation on how to use Airflow
satish-chinthanippu authored Feb 8, 2024
2 parents 4e67df4 + 83a1b6d commit 3ebf490
Showing 4 changed files with 136 additions and 0 deletions.
1 change: 1 addition & 0 deletions modules/ROOT/nav.adoc
@@ -20,6 +20,7 @@
* Manage data
** xref::nos.adoc[]
** xref::select-the-right-data-ingestion-tools-for-teradata-vantage.adoc[]
** xref::airflow.adoc[]
** xref::dbt.adoc[]
** xref::advanced-dbt.adoc[]
** xref:modelops:using-feast-feature-store-with-teradata-vantage.adoc[]
135 changes: 135 additions & 0 deletions modules/ROOT/pages/airflow.adoc
@@ -0,0 +1,135 @@
= Airflow with Teradata Vantage
:experimental:
:page-author: Satish Chinthanippu
:page-email: [email protected]
:page-revdate: February 6th, 2024
:description: Use Airflow with Teradata Vantage.
:keywords: data warehouses, compute storage separation, teradata, vantage, cloud data platform, object storage, business intelligence, enterprise analytics, elt, airflow, workflow.
:tabs:
:dir: airflow

== Overview

This tutorial demonstrates how to use Apache Airflow with Teradata Vantage. Airflow will be installed on an Ubuntu system.

== Prerequisites

* Ubuntu 22.x
* Access to a Teradata Vantage instance.
+
include::ROOT:partial$vantage_clearscape_analytics.adoc[]
* Python *3.8*, *3.9*, *3.10* or *3.11* installed.

== Install Apache Airflow

1. Set the Airflow home directory. Airflow requires a home directory and uses `~/airflow` by default, but you can set a different location if you prefer. The `AIRFLOW_HOME` environment variable informs Airflow of the desired location.
+
[source, bash]
----
export AIRFLOW_HOME=~/airflow
----
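+
If you want this setting to persist across shell sessions, you can also append it to your shell profile (an optional convenience step; assumes a bash shell):
+
[source, bash]
----
echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc
----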
2. Install `apache-airflow` stable version 2.8.1 from the PyPI repository:
+
[source, bash]
----
AIRFLOW_VERSION=2.8.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
----
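+
To confirm the installation succeeded, you can print the installed version; it should report 2.8.1:
+
[source, bash]
----
airflow version
----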
3. Install the Apache Airflow Teradata provider, stable version 1.0.0, from the PyPI repository:
+
[source, bash]
----
pip install "apache-airflow-providers-teradata==1.0.0"
----
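+
To verify the provider is registered, you can list the installed providers; `apache-airflow-providers-teradata` should appear in the output:
+
[source, bash]
----
airflow providers list
----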

== Start Airflow Standalone

1. Run Airflow in standalone mode. The `standalone` command initializes the database, creates a user, and starts all components:
+
[source, bash]
----
airflow standalone
----
2. Access the Airflow UI. Visit `localhost:8080` in a browser and log in with the admin account details shown in the terminal.
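+
If the credentials scroll out of view, the generated admin password is typically also written to a file under the Airflow home directory (an assumption based on standalone mode's default behavior):
+
[source, bash]
----
cat "$AIRFLOW_HOME/standalone_admin_password.txt"
----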

== Define Teradata Connection in Airflow UI

1. Open the Admin -> Connections section of the UI. Click the Create link to create a new connection.
+
image::{dir}/airflow-connection.png[Airflow admin dropdown, width=75%]
2. Fill in the following details on the New Connection page.
+
image::{dir}/airflow-newconnection.png[Airflow New Connection, width=75%]
* Connection Id: Unique ID of the Teradata connection.
* Connection Type: Type of the system. Select Teradata.
* Database Server URL (required): Teradata instance hostname to connect to.
* Database (optional): Specify the name of the database to connect to.
* Login (required): Specify the user name to connect.
* Password (required): Specify the password to connect.
* Click Test and, once the connection test succeeds, click Save.
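As an alternative to the UI, the same connection can be created from the command line with the Airflow CLI (a sketch; the host, credential, and database values are placeholders to replace with your own):

[source, bash]
----
airflow connections add 'Teradata_TestConn' \
    --conn-type 'teradata' \
    --conn-host '<teradata-instance-hostname>' \
    --conn-login '<username>' \
    --conn-password '<password>' \
    --conn-schema '<database-name>'
----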

== Define a DAG in Airflow

1. In Airflow, DAGs are defined as Python code.
2. Create a DAG as a Python file, for example `sample.py`, under the DAG_FOLDER - the `$AIRFLOW_HOME/files/dags` directory.
+
[source, python]
----
from datetime import datetime

from airflow import DAG
from airflow.providers.teradata.operators.teradata import TeradataOperator

# ID of the Teradata connection defined in the Airflow UI
CONN_ID = "Teradata_TestConn"

with DAG(
    dag_id="example_teradata_operator",
    max_active_runs=1,
    max_active_tasks=3,
    catchup=False,
    start_date=datetime(2023, 1, 1),
) as dag:
    # Create the my_users table with an auto-generated identity column
    create = TeradataOperator(
        task_id="table_create",
        conn_id=CONN_ID,
        sql="""
            CREATE TABLE my_users,
            FALLBACK (
                user_id decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY (
                    START WITH 1
                    INCREMENT BY 1
                    MINVALUE 1
                    MAXVALUE 2147483647
                    NO CYCLE),
                user_name VARCHAR(30)
            ) PRIMARY INDEX (user_id);
        """,
    )
----
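+
Before Airflow picks the file up, you can check that it parses without import errors by running it directly with the Python interpreter (a common sanity check; the path assumes the file name used above):
+
[source, bash]
----
python $AIRFLOW_HOME/files/dags/sample.py
----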

== Load DAG

Airflow loads DAGs from Python source files, which it looks for inside its configured DAG_FOLDER - the `$AIRFLOW_HOME/files/dags` directory.
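Once the scheduler has scanned the folder, you can confirm the DAG was loaded by listing the registered DAGs; `example_teradata_operator` should appear in the output:

[source, bash]
----
airflow dags list
----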

== Run DAG
DAGs run in one of two ways:

1. When they are triggered either manually or via the API (see the command-line sketch at the end of this section).
2. On a defined schedule, which is specified as part of the DAG.

`example_teradata_operator` is defined to be triggered manually. To define a schedule, pass any valid link:https://en.wikipedia.org/wiki/Cron[Crontab, window="_blank"] schedule value to the `schedule` argument:
[source, python]
----
with DAG(
    dag_id="my_daily_dag",
    schedule="0 0 * * *"
) as dag:
----
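To trigger the example DAG manually from the command line, equivalent to triggering it from the UI:

[source, bash]
----
airflow dags trigger example_teradata_operator
----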

== Summary

This tutorial demonstrated how to use Apache Airflow and the Airflow Teradata provider with a Teradata Vantage instance. The example DAG creates the `my_users` table in the Teradata Vantage instance defined in the Airflow connection.

== Further reading
* link:https://airflow.apache.org/docs/apache-airflow/stable/start.html[Airflow documentation]
* link:https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Airflow DAGs]


include::ROOT:partial$community_link.adoc[]
