This repository contains a solution to the challenge required to apply to the Alkemy Data Analytics - Python acceleration program.
The project consumes three datasets from the datos.gob.ar CKAN API containing information on libraries, museums, and cinemas in Argentina, saves them as CSV files, normalizes them, and populates a PostgreSQL database with three tables (see the sketch after this list):
- Sitios: a merge of the three datasets, keeping selected, previously normalized columns.
- Totales: record counts per unique value of selected columns from the original data: category, source, and the combination of province and category.
- Cines: the total number of cinemas, screens, seats, and registered INCAA spaces, aggregated by province.
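A minimal, illustrative sketch of this kind of pipeline is shown below. It is not the repository's actual implementation: it assumes `requests`, `pandas`, and `SQLAlchemy` are available, and the dataset URL, file name, and normalization step are placeholders.

```python
import requests
import pandas as pd
from sqlalchemy import create_engine

# Placeholder URL: the real CSV resources are published through the
# datos.gob.ar CKAN API.
MUSEOS_CSV_URL = "https://example.org/museos.csv"

# Download the raw dataset and save it locally as a CSV file.
response = requests.get(MUSEOS_CSV_URL)
response.raise_for_status()
with open("museos.csv", "wb") as f:
    f.write(response.content)

# Load the CSV and apply a simple normalization step (lower-cased,
# stripped column names is one typical example).
df = pd.read_csv("museos.csv")
df.columns = [c.strip().lower() for c in df.columns]

# Write the normalized data into a PostgreSQL table via SQLAlchemy
# (requires a PostgreSQL driver such as psycopg2).
engine = create_engine("postgresql://postgres:admin@localhost:5432/cultura")
df.to_sql("sitios", engine, if_exists="replace", index=False)
```

The repository's script presumably repeats this kind of flow for the three datasets and builds the Sitios, Totales, and Cines tables described above.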
- Python 3 (tested on Python 3.8.5)
- Git (tested on Git 2.31.1.windows.1)
- PostgreSQL (tested on PostgreSQL 14.1)
- PgAdmin 4 (optional)
Open CMD and execute the following commands:
- Clone this repository into the desired directory:
  `git clone https://github.com/facundofacio/alkemy-python-challenge-solution`
- Go to the project directory:
  `cd alkemy-python-challenge-solution`
- Create a virtual environment:
  `python -m venv env`
- Activate the virtual environment:
  `env\Scripts\activate`
- Install the dependencies:
  `pip install -r requirements.txt`
- Configure the SQLAlchemy database settings. Edit lines 17-21 of the `\components\settings.ini` file as needed:

      USER = postgres
      PASS = admin
      DB_HOST = localhost
      PORT = 5432
      DB_NAME = cultura
- USER: PostgreSQL user
- PASS: PostgreSQL password
- DB_HOST: database host IP (default: localhost)
- PORT: database port (default: 5432)
- DB_NAME: name of the database to be created and populated with the downloaded data
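How these values are consumed depends on the script, but the sketch below shows one way they could be read with the standard `configparser` module and turned into a SQLAlchemy connection URL. The `[DATABASE]` section name is an assumption; the real `settings.ini` may organize its contents differently.

```python
from configparser import ConfigParser
from sqlalchemy import create_engine

# Assumption: the connection keys live under a [DATABASE] section; the
# actual settings.ini in the repository may use another section name.
config = ConfigParser()
config.read("components/settings.ini")
db = config["DATABASE"]

# Build the SQLAlchemy connection URL from the configured values.
url = (
    f"postgresql://{db['USER']}:{db['PASS']}"
    f"@{db['DB_HOST']}:{db['PORT']}/{db['DB_NAME']}"
)
engine = create_engine(url)
```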
In CMD, go to the project directory and execute `script.py`:
`python script.py`
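After the run, the populated tables can be inspected from PgAdmin or from Python. The snippet below is a hypothetical verification step, not part of the repository; it assumes `pandas` and SQLAlchemy, reuses the connection values from `settings.ini`, and uses lowercase table names, which may not match the actual schema exactly.

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection values must match those configured in settings.ini.
engine = create_engine("postgresql://postgres:admin@localhost:5432/cultura")

# Table names follow the description above; the names actually created
# by the script may differ in casing or naming convention.
for table in ("sitios", "totales", "cines"):
    df = pd.read_sql_table(table, engine)
    print(table, df.shape)
```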