Merge pull request #162 from Teradata/airflow
Created documentation on How To use airflow
= Airflow with Teradata Vantage
:experimental:
:page-author: Satish Chinthanippu
:page-email: [email protected]
:page-revdate: February 06th, 2024
:description: Use Airflow with Teradata Vantage.
:keywords: data warehouses, compute storage separation, teradata, vantage, cloud data platform, object storage, business intelligence, enterprise analytics, elt, airflow, workflow.
:tabs:
:dir: airflow

== Overview

This tutorial demonstrates how to use Apache Airflow with Teradata Vantage. Airflow will be installed on an Ubuntu system.

== Prerequisites

* Ubuntu 22.x
* Access to a Teradata Vantage instance.
+
include::ROOT:partial$vantage_clearscape_analytics.adoc[]
* Python *3.8*, *3.9*, *3.10* or *3.11* installed.

== Install Apache Airflow

1. Set the Airflow home. Airflow requires a home directory and uses `~/airflow` by default, but you can set a different location if you prefer. The `AIRFLOW_HOME` environment variable tells Airflow the desired location.
+
[source, bash]
----
export AIRFLOW_HOME=~/airflow
----
2. Install the stable version 2.8.1 of `apache-airflow` from the PyPI repository:
+
[source, bash]
----
AIRFLOW_VERSION=2.8.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
----
3. Install the stable version 1.0.0 of the Airflow Teradata provider from the PyPI repository:
+
[source, bash]
----
pip install "apache-airflow-providers-teradata==1.0.0"
----
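As a quick sanity check (assuming the same Python environment as in the steps above is active), you can verify that Airflow and the Teradata provider were installed:

[source, bash]
----
# Print the installed Airflow version
airflow version

# Confirm the Teradata provider is registered
airflow providers list | grep -i teradata
----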

== Start Airflow Standalone

1. Run Airflow standalone:
+
[source, bash]
----
airflow standalone
----
2. Access the Airflow UI. Visit `localhost:8080` in a browser and log in with the admin account details shown in the terminal.
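Optionally, you can confirm from a second terminal that the webserver is up. The webserver's `/health` endpoint reports the status of the metadatabase and scheduler (this assumes the standalone instance from step 1 is still running):

[source, bash]
----
# Returns JSON with "metadatabase" and "scheduler" status fields
curl http://localhost:8080/health
----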

== Define Teradata Connection in Airflow UI

1. Open the Admin -> Connections section of the UI. Click the Create link to create a new connection.
+
image::{dir}/airflow-connection.png[Airflow admin dropdown, width=75%]
2. Fill in the following details on the New Connection page:
+
image::{dir}/airflow-newconnection.png[Airflow New Connection, width=75%]
* Connection Id: A unique ID for the Teradata connection.
* Connection Type: The type of the system. Select Teradata.
* Database Server URL (required): The hostname of the Teradata instance to connect to.
* Database (optional): The name of the database to connect to.
* Login (required): The user name to connect with.
* Password (required): The password to connect with.
* Click Test, then Save.
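Connections can also be supplied outside the UI: Airflow reads environment variables named `AIRFLOW_CONN_<CONN_ID>` (upper-cased), whose value is a URI with the connection type as the scheme. Below is a minimal sketch of building such a URI for the Teradata connection type, using hypothetical credentials:

[source, python]
----
from urllib.parse import quote

# Hypothetical credentials -- replace with your own
user = "dbc"
password = "p@ss/word"   # special characters must be percent-encoded
host = "myvantage.example.com"
database = "mydb"

# The URI scheme matches the connection type ("teradata")
uri = f"teradata://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{database}"

# Airflow would pick this up as a connection with id "teradata_testconn"
print(f"export AIRFLOW_CONN_TERADATA_TESTCONN='{uri}'")
----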

== Define a DAG in Airflow

1. In Airflow, DAGs are defined as Python code.
2. Create a DAG as a Python file, for example `sample.py`, under the DAG_FOLDER - $AIRFLOW_HOME/files/dags directory.
+
[source, python]
----
from datetime import datetime

from airflow import DAG
from airflow.providers.teradata.operators.teradata import TeradataOperator

# ID of the Teradata connection defined in the Airflow UI
CONN_ID = "Teradata_TestConn"

with DAG(
    dag_id="example_teradata_operator",
    max_active_runs=1,
    max_active_tasks=3,
    catchup=False,
    start_date=datetime(2023, 1, 1),
) as dag:
    create = TeradataOperator(
        task_id="table_create",
        conn_id=CONN_ID,
        sql="""
            CREATE TABLE my_users,
            FALLBACK (
                user_id decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY (
                    START WITH 1
                    INCREMENT BY 1
                    MINVALUE 1
                    MAXVALUE 2147483647
                    NO CYCLE),
                user_name VARCHAR(30)
            ) PRIMARY INDEX (user_id);
        """,
    )
----

== Load DAG

Airflow loads DAGs from Python source files, which it looks for inside its configured DAG_FOLDER - $AIRFLOW_HOME/files/dags directory.
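Once the file is in place, you can check from the command line that the DAG was picked up without import errors (this assumes the Airflow environment from the installation steps):

[source, bash]
----
# A plain import catches syntax errors before Airflow loads the file
python $AIRFLOW_HOME/files/dags/sample.py

# The DAG should appear once the scheduler has scanned the folder
airflow dags list | grep example_teradata_operator
----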

== Run DAG

DAGs run in one of two ways:

1. When they are triggered manually or via the API.
2. On a defined schedule, which is specified as part of the DAG.

`example_teradata_operator` is defined to be triggered manually. To define a schedule, pass any valid link:https://en.wikipedia.org/wiki/Cron[Crontab, window="_blank"] schedule value to the `schedule` argument:

[source, python]
----
with DAG(
    dag_id="my_daily_dag",
    schedule="0 0 * * *",
) as dag:
    ...
----
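A manual run of the example DAG above can also be started from the command line (assuming the Airflow environment from the earlier steps):

[source, bash]
----
# Trigger a single manual run of the example DAG
airflow dags trigger example_teradata_operator
----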

== Summary

This tutorial demonstrated how to use Airflow and the Airflow Teradata provider with a Teradata Vantage instance. The example DAG creates the `my_users` table in the Teradata instance defined in the Airflow UI connection.

== Further reading
* link:https://airflow.apache.org/docs/apache-airflow/stable/start.html[Airflow documentation]
* link:https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Airflow DAGs]

include::ROOT:partial$community_link.adoc[]