BI-DAGs is a component of the monitoring project developed for the RCS-SIS group. It plays an important role in tracking key performance indicators (KPIs) to monitor progress and analyze the current situation.
BI-DAGs operates as an Apache Airflow instance dedicated to managing data harvesting from various sources including CDS, ILS, and others. The harvested data is then processed and pushed into a PostgreSQL database. Subsequently, Apache Superset retrieves this data to present it in the desired format for analysis and reporting.
This README provides a step-by-step guide on setting up your environment for running BI-DAGs with Airflow.
Before you begin, ensure you have pyenv installed on your system. If you don't have pyenv installed, please follow the instructions here.
First, we'll set up a Python environment using pyenv.
# Define the desired Python version
export PYTHON_VERSION=3.10.11
# Install the specified Python version using pyenv
pyenv install $PYTHON_VERSION
# Set the global Python version to the installed one
pyenv global $PYTHON_VERSION
# Create a virtual environment named 'bi-dags'
pyenv virtualenv $PYTHON_VERSION bi-dags
# Activate the virtual environment
pyenv activate bi-dags
Change your current working directory to bi-dags.
cd bi-dags
With your virtual environment activated, install the necessary dependencies.
pip install -r requirements.txt
Configure the Airflow home environment variable.
export AIRFLOW_HOME=$PWD
Initialize and start Airflow using the standalone command.
airflow standalone
If you're using Docker to manage your Postgres database, start the service.
docker-compose -f docker-compose.standalone.yaml up
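Before adding the connection in the next step, it can help to confirm that the Postgres container is accepting connections. The following is a minimal sketch using only the Python standard library. It assumes the database is exposed on localhost:5432, as in the connection details used in this README; the helper name `port_is_open` is hypothetical and not part of BI-DAGs.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed host and port; adjust if your compose file maps Postgres elsewhere
    print("Postgres reachable:", port_is_open())
```

This only checks that the TCP port is open. It does not verify credentials or that the database itself is ready to serve queries.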
Lastly, add the necessary Airflow connections through the UI.
- Navigate to Admin -> Connections in the Airflow UI.
- Click on "Add" and fill in the details:
  - Connection Id: superset_qa
  - Login: airflow
  - Database: airflow
  - Password: airflow
  - Host: localhost
  - Port: 5432
  - Connection Type: postgres
More information on how to manage database connections can be found here.
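As an alternative to the UI, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The sketch below assembles such a URI from the values listed above and prints the matching `export` line. The helper `build_conn_uri` is hypothetical, and whether this project prefers environment variables over the UI is an assumption, not something this README states.

```python
from urllib.parse import quote


def build_conn_uri(conn_type: str, login: str, password: str,
                   host: str, port: int, database: str) -> str:
    """Assemble an Airflow-style connection URI, percent-encoding the credentials."""
    return (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Values taken from the connection details in this README
uri = build_conn_uri("postgres", "airflow", "airflow", "localhost", 5432, "airflow")

# Airflow looks up the connection id in upper case after the AIRFLOW_CONN_ prefix
print(f"export AIRFLOW_CONN_SUPERSET_QA='{uri}'")
```

Percent-encoding matters once the password contains characters such as `@` or `/`, which would otherwise break URI parsing.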
After completing these steps, your environment should be set up and ready for running BI-DAGs with Airflow.
- To start the services using Docker Compose, simply run:
docker-compose up
All the required environment variables are already configured in the docker-compose.yml file.
Before logging into the Airflow UI, you need to create a user. Follow these steps to create a user in the Airflow web container from the command line:
- Ensure the Airflow services are running.
- Access the Airflow web container by running:
docker-compose exec airflow-web bash
- Create a new Airflow user with the following command (replace <username>, <password>, <firstname>, <lastname>, and <email> with your desired values):
airflow users create --username <username> --password <password> --firstname <firstname> --lastname <lastname> --role Admin --email <email>
Example:
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
After creating the user, you can log in to the Airflow UI with the credentials you specified.
Database migrations are performed by running the migrations DAG. Following the steps below lets you manage and track database migrations within your Airflow environment. To create a new migration:
- Navigate to the Migrations Folder: Open your terminal and change to the migrations directory by running:
cd $AIRFLOW_HOME/dags/migrations
- Create a New Migration Revision: Use the Alembic command to create a new revision. For example:
alembic revision -m "My Database Revision"
This command generates a new migration script.
- Edit the Migration Script: Modify the newly created migration script to include your desired upgrade and downgrade actions.
- Apply the Migration: To execute the migration, trigger the migrations DAG with the necessary parameters (for example, for revision 64ac526a078b):
{"command": "upgrade", "revision": "64ac526a078b"}
This can be done through the API by passing the parameters, or via the UI by initiating the DAG with these settings.
- Push the Version File: Make sure to commit and push the updated version file to the main branch so that the migrations are applied in the QA or PRODUCTION environments.
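For the API route mentioned in the "Apply the Migration" step, the parameters are passed as the `conf` of a new DAG run. The sketch below builds, but does not send, such a request against the Airflow 2 stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). The base URL `http://localhost:8080`, the DAG id `migrations`, and the helper name are assumptions for illustration. Authentication is deliberately left out because it depends on how your Airflow instance is configured.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, dag_id: str,
                          command: str, revision: str) -> urllib.request.Request:
    """Build a POST request that would trigger a DAG run with the migration parameters."""
    # The DAG parameters go under the "conf" key of the request body
    payload = {"conf": {"command": command, "revision": revision}}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed base URL and DAG id; the revision is the example used in this README
request = build_trigger_request(
    "http://localhost:8080", "migrations", "upgrade", "64ac526a078b"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually send it, add the authentication header your deployment expects and pass the request to `urllib.request.urlopen`. Check the DAG id in the Airflow UI first, since the one used here is a guess.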