Contributors: Nat Hillard, Steve Leibman, Chris Pimlott, Thad Batt, Michelle Park, Janell Schafer, Kelly Taylor
This repository was created by the Colorado Digital Service to automatically consolidate analytics for the Exposure Notifications system. Exposure Notifications is a service created by Apple and Google to notify people who may have been exposed to someone who tested positive for COVID-19. To inform and improve the service, this repository provides a general purpose framework for states and other organizations to automatically fetch Exposure Notifications metrics for their jurisdiction across a set of data sources, aggregate these metrics into a single database, and write consolidated metrics into a viewable Google sheet. Please note that we are currently not accepting pull requests.
The code here is written in Python and uses AWS Step Functions to orchestrate work on a scheduled basis with predefined triggers and dependencies. It was initially designed to pull data from the following sources:
- An API for an Association of Public Health Laboratories (APHL) key server, with statistics for the number of Exposure Notification (EN) App "codes issued" and "codes claimed".
- A Google Cloud storage bucket with information from Google Play on Android EN App adoption
- A Google Cloud storage bucket with information on iOS adoption of EN functionality
After applying filters and transformations, the initial implementation pushed results to:
- A centralized Google BigQuery database, and
- A Google Sheets spreadsheet
In the actual deployment for the initial use case, the approach was updated to bypass this toolset for a subset of the above sources and destinations, in cases where native tools were available. The steps moving data from Google Cloud Storage to BigQuery can be accomplished using BigQuery's Data Transfers feature, and pulling data from BigQuery into Google Sheets can be done using Google Sheets Data Connectors (note that this feature is only available to Sheets users inside a GSuite organization, and only if the org has the tool enabled). As of mid-April 2021, the initial users continue to use the code provided in this repository for the task of pulling from third-party APIs and pushing to BigQuery.
The application is structured as a collection of smaller functions that can be invoked as independent standalone steps (for development and testing) or orchestrated as an AWS Step Functions State Machine, using AWS SAM.
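As a rough sketch of this pattern (the handler and field names below are hypothetical illustrations, not taken from this repository), each step can be written as a plain Lambda-style handler that can also be invoked standalone for development:

```python
import json
import sys


def handler(event, context=None):
    """Example step: pass the input through with a status field added.

    Written so it can run as an AWS Lambda function inside a Step
    Functions state machine, or be called directly for local testing.
    """
    result = dict(event)      # Step Functions passes state as a JSON object
    result["status"] = "ok"   # hypothetical field consumed by the next step
    return result


if __name__ == "__main__":
    # Standalone invocation for development: pipe JSON in, get JSON out.
    print(json.dumps(handler(json.load(sys.stdin))))
```

Because each step takes and returns plain JSON-serializable data, the same function can be chained by the state machine or exercised in isolation by tests.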
Current steps consist of the following:
This code uses the Google-provided Exposure Notifications Verification Server stats API (documented here) to retrieve statistics about issued and claimed codes. It currently uses the realm statistics, defined above as:

> Daily statistics for the realm, including codes issued, codes claimed, and daily active users (if enabled).
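A minimal sketch of such a pull, under the assumption that the server exposes a `realm.json` stats endpoint authenticated with an `x-api-key` header and returns a `statistics` array of daily entries; verify the endpoint URL, header name, and field names against the linked documentation before relying on them:

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint; confirm against the stats API documentation.
ENCV_STATS_URL = "https://adminapi.encv.org/api/stats/realm.json"


def fetch_realm_stats(api_key):
    """Fetch daily realm statistics from the key server (network call)."""
    req = Request(ENCV_STATS_URL, headers={"x-api-key": api_key,
                                           "content-type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def summarize(stats):
    """Reduce a stats payload to {date: (codes_issued, codes_claimed)}.

    Assumes each entry in "statistics" carries a "date" plus a nested
    "data" object holding the per-day counters.
    """
    return {
        day["date"]: (day["data"]["codes_issued"], day["data"]["codes_claimed"])
        for day in stats["statistics"]
    }
```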
- Install `pip` if you have not already:

  ```shell
  curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  python get-pip.py
  ```
- Install the AWS SAM CLI using the instructions at https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html . On Mac OSX, if you already have Homebrew, this would be:

  ```shell
  brew tap aws/tap
  brew install aws-sam-cli
  ```

  Otherwise:

  ```shell
  wget https://github.com/aws/aws-sam-cli/releases/latest/download/aws-sam-cli-linux-x86_64.zip
  unzip aws-sam-cli-linux-x86_64.zip -d sam-installation
  sudo ./sam-installation/install
  sam --version
  ```
- Install Poetry using the instructions at https://python-poetry.org/docs/ . On Mac OSX, this will be:

  ```shell
  curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
  source $HOME/.poetry/env
  ```

  Otherwise:

  ```shell
  pip3 install poetry
  ```
- Install Docker. See: https://docs.docker.com/get-docker/ . On Amazon Linux, for example:

  ```shell
  sudo yum update -y
  sudo amazon-linux-extras install docker
  sudo service docker start
  sudo usermod -a -G docker ec2-user
  docker ps
  ```
- Build requirements files and containers:

  ```shell
  cd <directory_with_this_README>
  make
  ```
Instructions for building the software, testing locally, doing interactive local runs, and deploying to production can be found in functions/README.md
- Either obtain an existing API key for pulling data from the APHL server at encv.org, or create a new API key on the encv site here:

  `User Menu -> API keys -> plus`

- Select type `Stats (can view statistics)`.

  Note: Admin type keys are not able to pull stats.

- You'll need a file named `.env` at the top level of this directory with the following variables set, or an environment that has them set:

  ```
  ENCV_API_KEY=xxxxxxxxxxx
  LOGLEVEL=xxxxxxxxxxx
  ```
  - `ENCV_API_KEY`: The API key for encv.org
  - `LOGLEVEL`: [optional] one of `ERROR`, `INFO`, or `DEBUG` (default is `INFO`)
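One way the application might consume these variables (a sketch of the convention described above, not the repository's actual loader):

```python
import logging
import os

# Levels permitted by the LOGLEVEL variable described above.
VALID_LEVELS = ("ERROR", "INFO", "DEBUG")


def configure_from_env():
    """Read ENCV_API_KEY (required) and LOGLEVEL (optional, default INFO)."""
    api_key = os.environ["ENCV_API_KEY"]  # raises KeyError if missing
    level_name = os.environ.get("LOGLEVEL", "INFO")
    if level_name not in VALID_LEVELS:
        raise ValueError(f"Unsupported LOGLEVEL: {level_name}")
    logging.basicConfig(level=getattr(logging, level_name))
    return api_key, level_name
```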
- Additionally, for the tests that update Google sheets, you can add a file named `.env` in the `<top_level>/functions/json_to_sheets/tests` directory with the variable `EXTRA_USER` set. This can be necessary when debugging sheets-related tests, because by default sheets created by a service account are not visible to other accounts. If you set `EXTRA_USER`, this user will also be granted read permissions and can view the sheet created by the test by going to https://drive.google.com/drive/u/0/shared-with-me .

- A file named `service.json` needs to be saved as `<top_level>/functions/json_to_sheets/service.json`. This should contain the credentials for the service account associated with the drive/sheets API, and can be downloaded from the Google console by:

  1. Getting added as an `owner` to the associated Google Developer Console project.
  2. Going to IAM Admin in the Google console at https://console.developers.google.com/iam-admin/serviceaccounts , selecting your project, and going to `"…" -> "Create key"`, or by using an already-generated key (you can only generate a given key once).
- Rate limiting: We've seen the permission-setting portion of the test code cause issues with rate limiting. If you get a message like the following:

  ```
  googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/xxx/permissions?alt=json returned "Rate limit exceeded. User message: "Sorry, you have exceeded your sharing quota."". Details: "Rate limit exceeded. User message: "Sorry, you have exceeded your sharing quota."">
  ```

  you can remove the `EXTRA_USER` entry from `tests/.env`, and the tests should run without trying to set permissions.

- Date formats: The date format on your target sheet may trip up the tests. If you run into problems, double-check `source_date_format` and `destination_date_format` and make sure they match expectations.
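For example, if the source emits dates as `2021-04-15` but the destination sheet renders `04/15/2021`, the conversion only succeeds when both format strings match the actual data (the format strings below are illustrative, not the repository's configured values):

```python
from datetime import datetime

source_date_format = "%Y-%m-%d"       # e.g. "2021-04-15"
destination_date_format = "%m/%d/%Y"  # e.g. "04/15/2021"


def convert_date(value):
    """Re-render a date string from the source to the destination format.

    Raises ValueError if `value` does not match source_date_format --
    the typical symptom of a format mismatch in the tests.
    """
    parsed = datetime.strptime(value, source_date_format)
    return parsed.strftime(destination_date_format)
```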