1. Configure dfe-analytics
Follow the instructions in the README and set it up: https://github.com/DFE-Digital/dfe-analytics

Ask in Slack on the #twd_data_insights channel for someone to help you procure a BigQuery instance in the digital.education.gov.uk Google Cloud Organisation.

Ask for your @digital.education.gov.uk Google account to be set up as an owner via the IAM and Admin settings. Add other team members as necessary.
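If you already have sufficient access yourself, the equivalent owner grant can be made from the command line; a minimal sketch, assuming the `gcloud` CLI is authenticated and using the example project ID that appears later in this guide (the email address is a hypothetical example):

```sh
# Grant a Google account the Owner role on the project
# (project ID and email are examples; substitute your own)
gcloud projects add-iam-policy-binding apply-for-qts-in-england \
  --member="user:first.last@digital.education.gov.uk" \
  --role="roles/owner"
```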
You also need to set up your BigQuery instance with paid billing, because dfe-analytics uses streaming, and streaming isn't allowed in the free tier:

accessDenied: Access Denied: BigQuery BigQuery: Streaming insert is not allowed in the free tier
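To confirm that billing is enabled on the project, one option (assuming the `gcloud` CLI and the example project ID used later in this guide) is:

```sh
# Prints billingAccountName and billingEnabled for the project
gcloud billing projects describe apply-for-qts-in-england
```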
You should create separate data sets for each environment (dev/preprod/prod):

- Select the BigQuery instance
- Go to the Analysis -> SQL Workspace section
- Click on the 3 dots next to the project name, then "Create data set"
- Name it `events_ENVIRONMENT`, e.g. `events_local` for local development testing, and set the location to `europe-west2 (London)`
- Select your new `events_local` data set and create a table
- Name it `events`
- Set the schema to match the one below (including the nested fields inside `request_query` and `data`)
- Set Partitioning to `occurred_at`
- Set Partitioning type to "By day"
- Set Clustering order to `event_type`
- Click on "Create table"

(A `bq` CLI alternative to these console steps is sketched after the schema below.)
Tip: You can copy this empty table between environments to save time, rather than repeating the last few steps for each one.
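For example, a minimal sketch of copying the empty table with the `bq` CLI, assuming datasets named per the `events_ENVIRONMENT` convention above:

```sh
# Copy the (empty) events table from the local dataset to preprod
bq cp events_local.events events_preprod.events
```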
| Field name | Type | Mode |
| --- | --- | --- |
| occurred_at | TIMESTAMP | REQUIRED |
| event_type | STRING | REQUIRED |
| environment | STRING | REQUIRED |
| namespace | STRING | NULLABLE |
| user_id | STRING | NULLABLE |
| request_uuid | STRING | NULLABLE |
| request_method | STRING | NULLABLE |
| request_path | STRING | NULLABLE |
| request_user_agent | STRING | NULLABLE |
| request_referer | STRING | NULLABLE |
| request_query | RECORD | REPEATED |
| request_query.key | STRING | REQUIRED |
| request_query.value | STRING | REPEATED |
| response_content_type | STRING | NULLABLE |
| response_status | STRING | NULLABLE |
| data | RECORD | REPEATED |
| data.key | STRING | REQUIRED |
| data.value | STRING | REPEATED |
| entity_table_name | STRING | NULLABLE |
| event_tags | STRING | REPEATED |
| anonymised_user_agent_and_ip | STRING | NULLABLE |
If you edit as text, you can paste this:
[
{
"name": "occurred_at",
"type": "TIMESTAMP",
"mode": "REQUIRED"
},
{
"name": "event_type",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "environment",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "namespace",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "user_id",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "request_uuid",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "request_method",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "request_path",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "request_user_agent",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "request_referer",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "request_query",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "key",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "value",
"type": "STRING",
"mode": "REPEATED"
}
]
},
{
"name": "response_content_type",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "response_status",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "key",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "value",
"type": "STRING",
"mode": "REPEATED"
}
]
},
{
"name": "entity_table_name",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "event_tags",
"type": "STRING",
"mode": "REPEATED"
},
{
"name": "anonymised_user_agent_and_ip",
"type": "STRING",
"mode": "NULLABLE"
}
]
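If you prefer the command line to the console steps above, the same data set and table can be created with the `bq` CLI; a sketch, assuming the schema above is saved as `schema.json` and using the example project ID from this guide:

```sh
# Create the data set in the London region
bq --location=europe-west2 mk --dataset apply-for-qts-in-england:events_local

# Create the events table, partitioned by day on occurred_at
# and clustered on event_type
bq mk --table \
  --schema=schema.json \
  --time_partitioning_field=occurred_at \
  --time_partitioning_type=DAY \
  --clustering_fields=event_type \
  apply-for-qts-in-england:events_local.events

# Verify the schema was applied as expected
bq show --schema --format=prettyjson apply-for-qts-in-england:events_local.events
```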
- Go to IAM and Admin settings > Roles
- Click on "+ Create role"
- Create the 3 roles outlined below (a `gcloud` CLI sketch of these role and service account steps follows the key creation step further down)
| Field | Value |
| --- | --- |
| Title | BigQuery Analyst Custom |
| Description | Assigned to accounts used by performance analysts. |
| ID | bigquery_analyst_custom |
| Role launch stage | General Availability |
| + Add permissions | See below |

Permissions for bigquery_analyst_custom:

bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.datasets.updateTag
bigquery.jobs.create
bigquery.jobs.get
bigquery.jobs.list
bigquery.jobs.listAll
bigquery.models.export
bigquery.models.getData
bigquery.models.getMetadata
bigquery.models.list
bigquery.routines.get
bigquery.routines.list
bigquery.savedqueries.create
bigquery.savedqueries.delete
bigquery.savedqueries.get
bigquery.savedqueries.list
bigquery.savedqueries.update
bigquery.tables.createSnapshot
bigquery.tables.export
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.getIamPolicy
bigquery.tables.list
bigquery.tables.restoreSnapshot
resourcemanager.projects.get

| Field | Value |
| --- | --- |
| Title | BigQuery Developer Custom |
| Description | Assigned to accounts used by developers. |
| ID | bigquery_developer_custom |
| Role launch stage | General Availability |
| + Add permissions | See below |

Permissions for bigquery_developer_custom:

bigquery.connections.create
bigquery.connections.delete
bigquery.connections.get
bigquery.connections.getIamPolicy
bigquery.connections.list
bigquery.connections.update
bigquery.connections.updateTag
bigquery.connections.use
bigquery.datasets.create
bigquery.datasets.delete
bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.datasets.update
bigquery.datasets.updateTag
bigquery.jobs.create
bigquery.jobs.delete
bigquery.jobs.get
bigquery.jobs.list
bigquery.jobs.listAll
bigquery.jobs.update
bigquery.models.create
bigquery.models.delete
bigquery.models.export
bigquery.models.getData
bigquery.models.getMetadata
bigquery.models.list
bigquery.models.updateData
bigquery.models.updateMetadata
bigquery.models.updateTag
bigquery.routines.create
bigquery.routines.delete
bigquery.routines.get
bigquery.routines.list
bigquery.routines.update
bigquery.routines.updateTag
bigquery.savedqueries.create
bigquery.savedqueries.delete
bigquery.savedqueries.get
bigquery.savedqueries.list
bigquery.savedqueries.update
bigquery.tables.create
bigquery.tables.createSnapshot
bigquery.tables.delete
bigquery.tables.deleteSnapshot
bigquery.tables.export
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.getIamPolicy
bigquery.tables.list
bigquery.tables.restoreSnapshot
bigquery.tables.setCategory
bigquery.tables.update
bigquery.tables.updateData
bigquery.tables.updateTag
resourcemanager.projects.get

| Field | Value |
| --- | --- |
| Title | BigQuery Appender Custom |
| Description | Assigned to accounts used by appenders (apps and scripts). |
| ID | bigquery_appender_custom |
| Role launch stage | General Availability |
| + Add permissions | See below |

Permissions for bigquery_appender_custom:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.updateData

- Go to IAM and Admin settings > Create service account
- Name it like "Appender NAME_OF_SERVICE ENVIRONMENT", e.g. "Appender Apply Local"
- Add a description, like "Used when developing locally."
- Grant the service account access to the project, using the "BigQuery Appender Custom" role you set up earlier
- Access the service account you previously set up
- Go to the keys tab, click on "Add key > Create new key"
- Create a JSON private key
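The role and service account steps above can also be scripted with the `gcloud` CLI; a sketch for the appender role (the analyst and developer roles follow the same pattern with their own permission lists; the service account name is a hypothetical example):

```sh
PROJECT=apply-for-qts-in-england

# Create the custom appender role with the three permissions listed above
gcloud iam roles create bigquery_appender_custom \
  --project="$PROJECT" \
  --title="BigQuery Appender Custom" \
  --description="Assigned to accounts used by appenders (apps and scripts)." \
  --stage=GA \
  --permissions=bigquery.datasets.get,bigquery.tables.get,bigquery.tables.updateData

# Create the service account (name is an example)
gcloud iam service-accounts create appender-apply-local \
  --project="$PROJECT" \
  --display-name="Appender Apply Local" \
  --description="Used when developing locally."

# Grant the service account the custom appender role on the project
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:appender-apply-local@${PROJECT}.iam.gserviceaccount.com" \
  --role="projects/${PROJECT}/roles/bigquery_appender_custom"

# Create and download a JSON private key for the service account
gcloud iam service-accounts keys create key.json \
  --iam-account="appender-apply-local@${PROJECT}.iam.gserviceaccount.com"
```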
The full contents of this JSON key file are your `BIGQUERY_API_JSON_KEY`.
Putting the previous steps together, to finish setting up dfe-analytics you need these environment variables:
BIGQUERY_TABLE_NAME=events
BIGQUERY_PROJECT_ID=apply-for-qts-in-england
BIGQUERY_DATASET=events_local
BIGQUERY_API_JSON_KEY=<contents of the JSON, make sure to strip or escape newlines>
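One way to produce a single-line value for `BIGQUERY_API_JSON_KEY` is to compact the downloaded key file with `jq` (assuming the key was saved as `key.json`):

```sh
# jq -c prints the JSON on one line; newlines inside the private key
# string stay escaped as \n, which is what the variable needs
BIGQUERY_API_JSON_KEY="$(jq -c . key.json)"
```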
Finally, follow the instructions in the dfe-analytics-dataform README and set it up: https://github.com/DFE-Digital/dfe-analytics-dataform#how-to-install