The purpose of this ETL pipeline is to pull the following macroeconomic indicators:
- Wilshire 5000 on a nightly basis.
- Federal interest rates on a nightly basis.
- GDP on a quarterly basis. The quarterly series will be interpolated to a daily frequency (every calendar day, not only weekdays) using a variety of methods (linear interpolation, k-nearest neighbors, spline, ensemble methods).
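
As a rough illustration of the quarterly-to-daily step, the sketch below applies only the linear interpolation method from the list above, using the Python standard library. The function name `interpolate_daily` and the sample values are hypothetical and are not taken from the pipeline's code; the numbers are illustrative, not real GDP data.

```python
from datetime import date, timedelta


def interpolate_daily(observations: dict[date, float]) -> dict[date, float]:
    """Linearly interpolate sparse dated observations to every calendar day."""
    points = sorted(observations.items())
    if not points:
        return {}
    daily: dict[date, float] = {}
    # Walk over each pair of consecutive observations and fill the days between.
    for (start, start_value), (end, end_value) in zip(points, points[1:]):
        span_days = (end - start).days
        for offset in range(span_days):
            fraction = offset / span_days
            daily[start + timedelta(days=offset)] = (
                start_value + fraction * (end_value - start_value)
            )
    # The loop stops one day short of the final observation, so add it here.
    last_day, last_value = points[-1]
    daily[last_day] = last_value
    return daily


# Hypothetical quarterly values (illustrative only).
gdp_quarterly = {
    date(2020, 1, 1): 100.0,
    date(2020, 4, 1): 109.1,
    date(2020, 7, 1): 118.2,
}
gdp_daily = interpolate_daily(gdp_quarterly)
```

Because the result is keyed by calendar day, weekends such as `date(2020, 1, 4)` receive a value too, which matches the "daily, not only weekdays" requirement. The other listed methods (k-nearest neighbors, spline, ensemble) would need additional libraries and are not shown.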
Each rate will be pulled in such a fashion:
- Deploy on AWS.
- Calculate the Buffett Indicator as Wilshire 5000 / daily interpolated GDP.
- Estimation using fbprophet
- Docker
- Docker compose
- Run `start.sh` to execute the docker-compose (you can do this in command prompt by typing `./start.sh`).
- Optional: In `/airflow-docker-hdfs-spark-example/mnt/airflow/airflow.cfg`, fill in `smtp_host`, `smtp_user`, `smtp_password`, and `smtp_mail_from` under the `smtp` tag.
- Type `docker ps` in command line to view running containers. Copy the Container ID for Airflow.
- You can access the Airflow UI by going to http://localhost:8080. Turn the DAG on, and trigger it.
- In command prompt, type `docker exec -it <your Airflow Container ID> /bin/bash`.
- You are now in your Airflow container. Type `cd usr/local/airflow/dags/airflow_local_connections` in command prompt, and start your connections by typing in `python3 airflow_connections.py`.
- Exit your Airflow container by pressing Ctrl + D.
- Run `stop.sh` (type `./stop.sh`).
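
The Buffett Indicator item listed earlier (Wilshire 5000 divided by daily interpolated GDP) could be computed along these lines once both series are available. This is a minimal sketch, not the pipeline's implementation: the function name `buffett_indicator`, the dictionary-based inputs, and the sample values are all hypothetical, and it assumes both series are already expressed in comparable units.

```python
from datetime import date


def buffett_indicator(
    wilshire: dict[date, float], gdp_daily: dict[date, float]
) -> dict[date, float]:
    """Return Wilshire 5000 / GDP for every date present in both series."""
    shared_days = sorted(wilshire.keys() & gdp_daily.keys())
    return {
        day: wilshire[day] / gdp_daily[day]
        for day in shared_days
        if gdp_daily[day] != 0  # skip days where the ratio is undefined
    }


# Hypothetical inputs (illustrative numbers, not real market or GDP data).
wilshire_close = {
    date(2020, 1, 2): 30000.0,
    date(2020, 1, 3): 32000.0,
    date(2020, 1, 6): 31000.0,  # no matching GDP value below, so it is dropped
}
gdp_by_day = {
    date(2020, 1, 1): 20000.0,
    date(2020, 1, 2): 20000.0,
    date(2020, 1, 3): 20000.0,
}
ratios = buffett_indicator(wilshire_close, gdp_by_day)
```

Only dates found in both inputs are kept, so market holidays and any days outside the interpolated GDP range fall out of the result instead of raising an error.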