Apache_Airflow_labs

Building data processing pipelines in Apache Airflow/ELT pattern

This repository showcases the results of the labs I completed during my first semester at Igor Sikorsky Kyiv Polytechnic Institute, where I pursued a Master's degree in Informatics and Software Engineering 🎓 - Link to the faculty

The labs primarily focus on Apache Airflow and demonstrate data processing pipelines built using the ELT pattern. Through this repository, I aim to share my practical experiences and learnings from these labs with others interested in data engineering and workflow automation using Apache Airflow.
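To make the ELT pattern concrete, here is a minimal sketch that uses only the Python standard library. It is not taken from the labs: the sample rows, the table names (raw_readings, avg_by_city) and the function names are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL. The point it illustrates is the ordering: raw data is loaded first and transformed afterwards, inside the database.

```python
import sqlite3

# Hypothetical raw records standing in for an extracted source.
RAW_ROWS = [
    ("2024-01-01", "kyiv", "12.5"),
    ("2024-01-01", "lviv", "9.0"),
    ("2024-01-02", "kyiv", "14.0"),
]


def extract():
    """Extract: return the source records untouched."""
    return list(RAW_ROWS)


def load(conn, rows):
    """Load: copy the raw records into a staging table without reshaping them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_readings (day TEXT, city TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO raw_readings VALUES (?, ?, ?)", rows)
    conn.commit()


def transform(conn):
    """Transform: build the cleaned table with SQL, inside the database."""
    conn.execute("DROP TABLE IF EXISTS avg_by_city")
    conn.execute(
        "CREATE TABLE avg_by_city AS "
        "SELECT city, AVG(CAST(value AS REAL)) AS avg_value "
        "FROM raw_readings GROUP BY city"
    )
    conn.commit()
    return conn.execute(
        "SELECT city, avg_value FROM avg_by_city ORDER BY city"
    ).fetchall()


def run_elt():
    """Run the three steps in ELT order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract())
        return transform(conn)
    finally:
        conn.close()
```

In an Airflow DAG, each of the three functions would typically become its own task, with the dependencies extract, then load, then transform.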

Installation

  1. Apache Airflow must be installed on your PC; the steps below cover the installation on Ubuntu. If you are using a Windows system, you can use the Ubuntu subsystem available at https://www.microsoft.com/en-us/p/ubuntu/9nblggh4msv6. Make sure to enable developer mode in Windows Developer Settings and activate the Windows Subsystem for Linux component in Windows Features.

Ubuntu Subsystem

  1. Install the required packages, start PostgreSQL, and open pg_hba.conf for editing by running the following commands:
sudo apt-get update
sudo apt-get install libmysqlclient-dev
sudo apt-get install libkrb5-dev
sudo apt-get install libsasl2-dev
sudo apt-get install postgresql postgresql-contrib
sudo service postgresql start
sudo nano /etc/postgresql/*/main/pg_hba.conf

PostgreSQL Configuration

sudo service postgresql restart
sudo apt install python3-pip
pip install apache-airflow

Install Apache Airflow

sudo pip install apache-airflow
airflow db init

Initialize Airflow Database

# build-dep is an apt-get subcommand, not a package; it needs deb-src entries enabled in the apt sources
sudo apt-get build-dep python3-psycopg2
pip install psycopg2-binary

Install psycopg2-binary
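After the installation steps, a quick check that the packages can be imported may save some debugging later. The sketch below is an illustration rather than part of the labs; the helper name check_modules is hypothetical, and it only accepts top-level module names.

```python
from importlib.util import find_spec


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}
```

For this setup you would call check_modules(["airflow", "psycopg2"]) with the same Python interpreter that pip installed into; a False value suggests the package went into a different environment.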

Setting up DAGs

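  1. Place your DAGs in the dags folder of your Airflow home directory. Inside the Ubuntu subsystem this is /home/vic/airflow/dags (replace vic with your own user name); seen from Windows, the same folder is: C:/Users/vicwa/AppData/Local/Packages/CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc/LocalState/rootfs/home/vic/airflow/dags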

DAGs Path

Creating the database

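  1. Open a psql session against the airflow database as your PostgreSQL user (vic in this example). Note that this command connects to an existing database; it does not create one: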
psql -h 127.0.0.1 -d airflow -U vic

Create Database
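For Airflow to store its metadata in this PostgreSQL database, it needs a SQLAlchemy connection URI. This README does not show which configuration was used, so the sketch below is an assumption: it builds a URI from the host, database and user in the psql command above, with a placeholder password. The function name build_sqlalchemy_uri is hypothetical.

```python
from urllib.parse import quote


def build_sqlalchemy_uri(user, password, host="127.0.0.1", port=5432, database="airflow"):
    """Build a PostgreSQL URI for SQLAlchemy using the psycopg2 driver."""
    # Percent-encode the credentials so characters such as "@" or "/" stay valid.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

The resulting string can be supplied as sql_alchemy_conn in airflow.cfg or through the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN environment variable (the [database] section applies to Airflow 2.3 and later; older versions keep this setting under [core]).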

Running Airflow

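  1. Run the following commands in the Ubuntu console. The webserver and the scheduler each run in the foreground and occupy their terminal, so start them in separate console windows: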
sudo service postgresql restart
airflow db init
airflow webserver -p 8080
airflow scheduler
sudo service postgresql restart

Start Airflow Services
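Once both services are running, the Airflow 2.x webserver exposes a /health endpoint that reports the status of the metadata database and the scheduler. The sketch below is a hedged illustration, not part of the labs: it assumes the webserver listens on port 8080 as in the command above, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:8080", timeout=5):
    """Fetch and decode the JSON document served at /health."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_health(payload):
    """Reduce the health payload to one status string per component."""
    return {
        name: payload.get(name, {}).get("status", "unknown")
        for name in ("metadatabase", "scheduler")
    }


def all_healthy(payload):
    """Return True only if every tracked component reports 'healthy'."""
    return all(status == "healthy" for status in summarize_health(payload).values())
```

Calling all_healthy(fetch_health()) from a third console gives a quick yes/no answer; a scheduler status other than healthy usually means the airflow scheduler command is not running.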

Lab1 Results

  1. Results for Lab1:

Result 1

Result 2

Result 3