Merge pull request #65 from sfu-discourse-lab/V7.0: "Update code base to V7.0"

Showing 89 changed files with 3,415 additions and 1,835 deletions.
# APIs for public-facing dashboards

This section hosts the code for the backend APIs that serve our public-facing dashboards for our partner organization, Informed Opinions.

We have two APIs, one each serving the English and French dashboards (for the Gender Gap Tracker and the Radar de Parité, respectively).

## Dashboards
* English: https://gendergaptracker.informedopinions.org
* French: https://radardeparite.femmesexpertes.ca

### Front end code

For a clearer separation of roles and responsibilities, the front end code base is hosted elsewhere, in private repos. Access to these repos is restricted, so please reach out to [email protected] to request access to the code, if required.

## Setup

Both APIs are written using [FastAPI](https://fastapi.tiangolo.com/), a high-performance web framework for building APIs in Python.

This code base has been tested on Python 3.9, but higher Python versions should work with few, if any, changes.

Install the required dependencies via `requirements.txt` as follows.

Create a new virtual environment if one does not already exist:
```sh
$ python3.9 -m venv api_venv
```

Activate the virtual environment, then install the dependencies into it:

```sh
$ source api_venv/bin/activate
$ python -m pip install -r requirements.txt
```
# Gender Gap Tracker: API

This section contains the code for the API that serves the [Gender Gap Tracker public dashboard](https://gendergaptracker.informedopinions.org/). The dashboard itself is hosted externally, and its front end code is hosted on this [GitLab repo](https://gitlab.com/client-transfer-group/gender-gap-tracker).
## API docs

The docs can be accessed in one of two ways:

* Swagger: https://gendergaptracker.informedopinions.org/docs
  * Useful for testing the API interactively in the browser
* Redoc: https://gendergaptracker.informedopinions.org/redoc
  * A clean, modern UI that displays the API structure in a responsive format
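Once the server is up, the underlying endpoints can also be called directly. A small sketch that builds a request URL for the `info_by_date` endpoint (the `/expertWomen` prefix and the `begin`/`end` query parameters are taken from the router code in this commit; the date values are illustrative):

```python
import urllib.parse

BASE_URL = "https://gendergaptracker.informedopinions.org"
# begin/end are the yyyy-mm-dd date bounds expected by the endpoint
params = {"begin": "2021-01-01", "end": "2021-03-31"}
url = f"{BASE_URL}/expertWomen/info_by_date?" + urllib.parse.urlencode(params)
# The URL can then be fetched with any HTTP client, e.g. requests.get(url).json()
print(url)
```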
## Extensibility

The code base has been written so that future developers can add endpoints for other functionality, which can potentially serve other dashboards.

* `db`: Contains MongoDB-specific code (config and queries) that interacts with the Gender Gap Tracker data in our MongoDB database
* `endpoints`: Add new functionality here to process and serve results via RESTful API endpoints
* `schemas`: Performs response data validation so that the JSON results from each endpoint are formatted properly in the docs
* `utils`: Add utility functions here that support data manipulation within the routers
* `gunicorn_conf.py`: Contains deployment-specific instructions for the web server, explained below
## Deployment

We perform a standard deployment of FastAPI in production, following the best practices [shown in this blog post](https://www.vultr.com/docs/how-to-deploy-fastapi-applications-with-gunicorn-and-nginx-on-ubuntu-20-04/).

* `uvicorn` serves as an async web server (compatible with the `gunicorn` web server for production apps)
* `gunicorn` works as a process manager that starts multiple `uvicorn` processes via the `uvicorn.workers.UvicornWorker` class
* `nginx` is used as a reverse proxy

The deployment and maintenance of the web server are carried out by SFU's Research Computing Group (RCG).
The database configuration, with placeholder credentials:

```python
host = ["mongo0", "mongo1", "mongo2"]
# host = "localhost"
is_direct_connection = host == "localhost"

config = {
    "MONGO_HOST": host,
    "MONGO_PORT": 27017,
    "MONGO_ARGS": {
        "authSource": "admin",
        "readPreference": "primaryPreferred",
        "username": "username",
        "password": "password",
        "directConnection": is_direct_connection,
    },
    "DB_NAME": "mediaTracker",
    "LOGS_DIR": "logs/",
}
```
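Note that the `directConnection` flag is derived from the host setting, so switching the commented-out host lines flips it automatically: a single localhost target connects directly, while a list of replica-set members uses server discovery. A self-contained check of that logic (the helper function is illustrative, not part of the code base):

```python
def direct_connection(host) -> bool:
    # Mirrors the config logic above: only a single "localhost" string
    # selects a direct connection; a list of replica-set hosts does not
    return host == "localhost"

assert direct_connection("localhost") is True
assert direct_connection(["mongo0", "mongo1", "mongo2"]) is False
```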
The MongoDB aggregation pipelines used by the outlet statistics endpoints:

```python
def agg_total_per_outlet(begin_date: str, end_date: str):
    """Aggregate total article and gender counts per outlet between two dates."""
    query = [
        {"$match": {"publishedAt": {"$gte": begin_date, "$lte": end_date}}},
        {
            "$group": {
                "_id": "$outlet",
                "totalArticles": {"$sum": "$totalArticles"},
                "totalFemales": {"$sum": "$totalFemales"},
                "totalMales": {"$sum": "$totalMales"},
                "totalUnknowns": {"$sum": "$totalUnknowns"},
            }
        },
    ]
    return query


def agg_total_by_week(begin_date: str, end_date: str):
    """Aggregate gender counts per outlet, per week, between two dates."""
    query = [
        {"$match": {"publishedAt": {"$gte": begin_date, "$lte": end_date}}},
        {
            "$group": {
                "_id": {
                    "outlet": "$outlet",
                    "week": {"$week": "$publishedAt"},
                    "year": {"$year": "$publishedAt"},
                },
                "totalFemales": {"$sum": "$totalFemales"},
                "totalMales": {"$sum": "$totalMales"},
                "totalUnknowns": {"$sum": "$totalUnknowns"},
            }
        },
    ]
    return query
```
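New pipeline builders can follow the same pattern. For example, a hypothetical monthly variant (illustrative, not part of the code base) would change only the `$group` key, using MongoDB's `$month` operator in place of `$week`:

```python
def agg_total_by_month(begin_date: str, end_date: str):
    """Hypothetical example: aggregate gender counts per outlet, per calendar month."""
    return [
        {"$match": {"publishedAt": {"$gte": begin_date, "$lte": end_date}}},
        {
            "$group": {
                "_id": {
                    "outlet": "$outlet",
                    "month": {"$month": "$publishedAt"},
                    "year": {"$year": "$publishedAt"},
                },
                "totalFemales": {"$sum": "$totalFemales"},
                "totalMales": {"$sum": "$totalMales"},
                "totalUnknowns": {"$sum": "$totalUnknowns"},
            }
        },
    ]

# The builders are pure functions, so the pipeline can be inspected
# without a database connection
pipeline = agg_total_by_month("2021-01-01", "2021-12-31")
```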
The endpoint code that serves outlet statistics:

```python
import pandas as pd
import utils.dateutils as dateutils
from db.mongoqueries import agg_total_by_week, agg_total_per_outlet
from fastapi import APIRouter, HTTPException, Query, Request
from schemas.stats_by_date import TotalStatsByDate
from schemas.stats_weekly import TotalStatsByWeek

outlet_router = APIRouter()
COLLECTION_NAME = "mediaDaily"
LOWER_BOUND_START_DATE = "2018-10-01"
ID_MAPPING = {"Huffington Post": "HuffPost Canada"}


@outlet_router.get(
    "/info_by_date",
    response_model=TotalStatsByDate,
    response_description="Get total and per outlet gender statistics for English outlets between two dates",
)
def expertwomen_info_by_date(
    request: Request,
    begin: str = Query(description="Start date in yyyy-mm-dd format"),
    end: str = Query(description="End date in yyyy-mm-dd format"),
) -> TotalStatsByDate:
    if not dateutils.is_valid_date_range(begin, end, LOWER_BOUND_START_DATE):
        raise HTTPException(
            status_code=416,
            detail=f"Date range error: Should be between {LOWER_BOUND_START_DATE} and tomorrow's date",
        )
    begin = dateutils.convert_date(begin)
    end = dateutils.convert_date(end)

    query = agg_total_per_outlet(begin, end)
    response = request.app.connection[COLLECTION_NAME].aggregate(query)
    # Work with the data in pandas
    source_stats = list(response)
    df = pd.DataFrame.from_dict(source_stats)
    df["totalGenders"] = df["totalFemales"] + df["totalMales"] + df["totalUnknowns"]
    # Replace outlet names if necessary
    df["_id"] = df["_id"].replace(ID_MAPPING)
    # Take sums of total males, females, unknowns and articles and convert to dict
    result = df.drop("_id", axis=1).sum().to_dict()
    # Compute per-outlet stats
    df["perFemales"] = df["totalFemales"] / df["totalGenders"]
    df["perMales"] = df["totalMales"] / df["totalGenders"]
    df["perUnknowns"] = df["totalUnknowns"] / df["totalGenders"]
    df["perArticles"] = df["totalArticles"] / result["totalArticles"]
    # Convert the DataFrame to a dict prior to JSON serialization
    result["sources"] = df.to_dict("records")
    result["perFemales"] = result["totalFemales"] / result["totalGenders"]
    result["perMales"] = result["totalMales"] / result["totalGenders"]
    result["perUnknowns"] = result["totalUnknowns"] / result["totalGenders"]
    return result


@outlet_router.get(
    "/weekly_info",
    response_model=TotalStatsByWeek,
    response_description="Get gender statistics per English outlet aggregated WEEKLY between two dates",
)
def expertwomen_weekly_info(
    request: Request,
    begin: str = Query(description="Start date in yyyy-mm-dd format"),
    end: str = Query(description="End date in yyyy-mm-dd format"),
) -> TotalStatsByWeek:
    if not dateutils.is_valid_date_range(begin, end, LOWER_BOUND_START_DATE):
        raise HTTPException(
            status_code=416,
            detail=f"Date range error: Should be between {LOWER_BOUND_START_DATE} and tomorrow's date",
        )
    begin = dateutils.convert_date(begin)
    end = dateutils.convert_date(end)

    query = agg_total_by_week(begin, end)
    response = request.app.connection[COLLECTION_NAME].aggregate(query)
    # Work with the data in pandas
    df = (
        pd.json_normalize(list(response), max_level=1)
        .sort_values(by="_id.outlet")
        .reset_index(drop=True)
    )
    df.rename(
        columns={
            "_id.outlet": "outlet",
            "_id.week": "week",
            "_id.year": "year",
        },
        inplace=True,
    )
    # Replace outlet names if necessary
    df["outlet"] = df["outlet"].replace(ID_MAPPING)
    # Compute week begin/end dates and handle them as datetimes for summing by week
    df["w_begin"] = df.apply(lambda row: dateutils.get_week_bound(row["year"], row["week"], 0), axis=1)
    df["w_end"] = df.apply(lambda row: dateutils.get_week_bound(row["year"], row["week"], 6), axis=1)
    df["w_begin"] = pd.to_datetime(df["w_begin"])
    df["w_end"] = pd.to_datetime(df["w_end"])
    df = (
        df.drop(columns=["week", "year"])
        .sort_values(by=["outlet", "w_begin"])
    )
    # In earlier versions, a bug caused weekly information for the same week begin
    # date to be returned twice. This occurred only when the last week of one year
    # spanned into the next (a partial week across a year boundary). To address it,
    # we sum the stats by week so that duplicate week begin dates are never passed
    # to the front end.
    df = df.groupby(["outlet", "w_begin", "w_end"]).sum().reset_index()
    df["totalGenders"] = df["totalFemales"] + df["totalMales"] + df["totalUnknowns"]
    df["perFemales"] = df["totalFemales"] / df["totalGenders"]
    df["perMales"] = df["totalMales"] / df["totalGenders"]
    df["perUnknowns"] = df["totalUnknowns"] / df["totalGenders"]
    # Convert datetimes back to strings for JSON serialization
    df["w_begin"] = df["w_begin"].dt.strftime("%Y-%m-%d")
    df["w_end"] = df["w_end"].dt.strftime("%Y-%m-%d")
    df = df.drop(columns=["totalGenders", "totalFemales", "totalMales", "totalUnknowns"])

    # Convert the DataFrame to a dict prior to JSON serialization
    weekly_data = dict()
    for outlet in df["outlet"].unique():
        per_outlet_data = df[df["outlet"] == outlet].to_dict(orient="records")
        # Remove the redundant outlet key from each record
        for item in per_outlet_data:
            item.pop("outlet")
        weekly_data[outlet] = per_outlet_data
    return {"outlets": weekly_data}
```
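The year-boundary deduplication in the weekly endpoint can be illustrated in isolation. A toy pandas sketch (made-up outlet name and counts) showing how two partial rows for the same week collapse under `groupby(...).sum()`:

```python
import pandas as pd

# Toy rows: a week that spans a year boundary produces two partial rows with
# the same week bounds for the same outlet, one per calendar year
df = pd.DataFrame({
    "outlet": ["Example Outlet", "Example Outlet"],
    "w_begin": ["2020-12-27", "2020-12-27"],
    "w_end": ["2021-01-02", "2021-01-02"],
    "totalFemales": [3, 2],
    "totalMales": [5, 4],
})
# Summing by (outlet, w_begin, w_end) merges the partial rows into one
deduped = df.groupby(["outlet", "w_begin", "w_end"]).sum().reset_index()
assert len(deduped) == 1
assert deduped.loc[0, "totalFemales"] == 5
assert deduped.loc[0, "totalMales"] == 9
```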
The gunicorn configuration (`gunicorn_conf.py`):

```python
# gunicorn_conf.py: point gunicorn to the uvicorn workers
from multiprocessing import cpu_count

# Socket path
bind = 'unix:/path_to_code/GenderGapTracker/api/english/g-tracker.sock'

# Worker options
workers = cpu_count() + 1
worker_class = 'uvicorn.workers.UvicornWorker'

# Logging options
loglevel = 'debug'
accesslog = '/path_to_code/GenderGapTracker/api/english/access_log'
errorlog = '/path_to_code/GenderGapTracker/api/english/error_log'
```
The gunicorn logging configuration, in the stdlib `logging.config` file format:

```ini
[loggers]
keys=root, gunicorn.error, gunicorn.access

[handlers]
keys=console, error_file, access_file

[formatters]
keys=generic, access

[logger_root]
level=INFO
handlers=console

[logger_gunicorn.error]
level=INFO
handlers=error_file
propagate=1
qualname=gunicorn.error

[logger_gunicorn.access]
level=INFO
handlers=access_file
propagate=0
qualname=gunicorn.access

[handler_console]
class=StreamHandler
formatter=generic
args=(sys.stdout, )

[handler_error_file]
class=logging.FileHandler
formatter=generic
args=('/var/log/gunicorn/error.log',)

[handler_access_file]
class=logging.FileHandler
formatter=access
args=('/var/log/gunicorn/access.log',)

[formatter_generic]
format=%(asctime)s [%(process)d] [%(levelname)s] %(message)s
datefmt=%Y-%m-%d %H:%M:%S
class=logging.Formatter

[formatter_access]
format=%(message)s
class=logging.Formatter
```
The main FastAPI application:

```python
from pathlib import Path

from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from pymongo import MongoClient

from db.config import config
from endpoints.outlet_stats import outlet_router

# Constants
HOST = config["MONGO_HOST"]
PORT = config["MONGO_PORT"]
MONGO_ARGS = config["MONGO_ARGS"]
DB = config["DB_NAME"]
STATIC_PATH = "gender-gap-tracker"
STATIC_HTML = "tracker.html"

app = FastAPI(
    title="Gender Gap Tracker",
    description="RESTful API for the Gender Gap Tracker public-facing dashboard",
    version="1.0.0",
)


@app.get("/", include_in_schema=False)
async def root() -> HTMLResponse:
    with open(Path(STATIC_PATH) / STATIC_HTML, "r") as f:
        html_content = f.read()
    return HTMLResponse(content=html_content, media_type="text/html")


@app.on_event("startup")
def startup_db_client() -> None:
    app.mongodb_client = MongoClient(HOST, PORT, **MONGO_ARGS)
    app.connection = app.mongodb_client[DB]
    print("Successfully connected to MongoDB!")


@app.on_event("shutdown")
def shutdown_db_client() -> None:
    app.mongodb_client.close()


# Attach routes
app.include_router(outlet_router, prefix="/expertWomen", tags=["info"])
# Add additional routers here for future endpoints

# Serve static files for the front end from the directory specified as STATIC_PATH
app.mount("/", StaticFiles(directory=STATIC_PATH), name="static")


if __name__ == "__main__":
    import uvicorn

    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
```