The site is being developed for Bradford 2025 by Open Innovations, and this guide is primarily aimed at OI developers who are adding new features or updating existing ones.
Source data should not be checked in to this repository, if at all possible. The primary source of the data is a private GitHub repository in the Bradford 2025 organisation. This is currently only visible to approved users, as there is a possibility of sensitive data being leaked during development. This risk extends to data imported into this repository, hence the principle.
During development of the visualisation, we should also not check in derived data. Prior to formal publication approval, data may be redacted, which would mean removing it from the repository history. It's easier not to commit it in the first place.
Python dependencies are documented in a `pipenv` `Pipfile`. To set up an environment, first ensure that `pipenv` is installed on your development machine. Once this is done, run `pipenv install` to set up the environment. Periodically, particularly if others have added new dependencies, you should run `pipenv sync` to keep your environment up to spec. To start a shell in the environment, run `pipenv shell`.
The data provisioning requires you to have a working GitHub SSH configuration, and to be added to the relevant Bradford 2025 organisation.
To add SSH keys, follow the guidance in "Adding a new SSH key to your GitHub account" in the GitHub documentation. Note that if you develop in WSL on Windows, you will need to set up the key so that it is accessible within that subsystem rather than on the main Windows host.
It might also be helpful to set up an SSH config file (typically `~/.ssh/config`) which associates the key with GitHub. A simple config file to achieve this would look like this:

```
Host github.com
  IdentityFile _Path_To_SSH_Private_Key_
```
Data is provisioned using DVC commands. The source data has been imported from the Bradford 2025 data pipelines repository using two `dvc import` commands. It should not be necessary to add further imports, as each of the current ones mirrors an entire tree, into `data/published` and `data/metadata` respectively.
This stores two `.dvc` files in the `data` directory, each containing a reference to a specific commit in the data repository.
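To see what that reference looks like: a `.dvc` file written by `dvc import` records the source repository under `deps[].repo.url`, the pinned commit SHA under `deps[].repo.rev_lock`, and `rev` if a branch or tag was requested. The sketch below pulls those fields out of the file text. It is a hypothetical helper (`readImportPin` is not part of this repository), it assumes one imported dependency per file, and it is a naive line scan rather than a YAML parser. The sample contents are placeholders, not this repository's files.

```typescript
// Hypothetical helper (not part of this repository): reads the import pin
// from the text of a .dvc file created by `dvc import`. A naive line scan,
// not a YAML parser; assumes one imported dependency per file.
interface ImportPin {
  url?: string; // deps[0].repo.url: the source repository
  rev?: string; // deps[0].repo.rev: branch or tag, only present if one was given
  revLock?: string; // deps[0].repo.rev_lock: the pinned commit SHA
}

function readImportPin(dvcText: string): ImportPin {
  const pin: ImportPin = {};
  for (const rawLine of dvcText.split(/\r?\n/)) {
    // Match "key: value" lines for the keys of interest, ignoring indentation.
    const match = rawLine.trim().match(/^(url|rev|rev_lock):\s*(\S+)$/);
    if (!match) continue;
    const [, key, value] = match;
    if (key === "url") pin.url = value;
    else if (key === "rev") pin.rev = value;
    else pin.revLock = value;
  }
  return pin;
}

// Illustrative file contents with placeholder values (not this repository's data).
const sample = [
  "md5: 0123456789abcdef0123456789abcdef",
  "frozen: true",
  "deps:",
  "- path: published",
  "  repo:",
  "    url: git@github.com:example-org/example-data.git",
  "    rev_lock: 1111111111111111111111111111111111111111",
  "outs:",
  "- md5: fedcba9876543210fedcba9876543210.dir",
  "  path: published",
].join("\n");

console.log(readImportPin(sample).revLock);
// prints 1111111111111111111111111111111111111111
```

In Deno, the same function could be applied to `await Deno.readTextFile("data/published.dvc")`. The `rev_lock` value is what `dvc update` advances when newer data is available.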
To get the currently referenced data, run `dvc pull`. This will not get the latest data from the remote repository; instead it fetches the data at the commit recorded in each `.dvc` file (its `rev_lock` field).

```sh
dvc pull data/published.dvc data/metadata.dvc
```
If you wish to update to the latest available data, run `dvc update`. This also updates the `.dvc` files with the latest available commit consistent with the `rev` (i.e. branch name) specified in each file, or the default branch if none is specified.

```sh
dvc update data/published.dvc data/metadata.dvc
```
These commands need to be run in an environment where `dvc` is available. The easiest way to do this is to start a `pipenv shell`.
These commands have been encapsulated as Deno tasks in `deno.json`, and can be run as follows:

```sh
# Get the currently referenced data
deno task data:pull

# Update to the latest available data
deno task data:update
```
Once the data is downloaded, you'll need to run some pipelines to update the data used for visualisations. Again, this is managed via `dvc`:

```sh
dvc repro pipelines/dvc.yaml
```

This runs any pipelines whose dependencies (data) have changed. It is also encapsulated in the `data:pipeline` Deno task (`deno task data:pipeline`).
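The "changed dependencies" rule can be sketched as follows. After a run, DVC records the hashes of each stage's dependencies in `dvc.lock`, and `dvc repro` compares them with the current files. The model below is a simplified, hypothetical illustration of that idea, not DVC's implementation: the stage names, paths and hashes are invented, stages are assumed to be listed in dependency order, and a stage that will rerun is conservatively assumed to change its outputs (real DVC re-hashes outputs after running, and also reacts to other changes such as an edited command).

```typescript
// Simplified, hypothetical model of the "rerun only what changed" rule.
// Not DVC's implementation: it ignores other triggers (e.g. a changed
// command) and conservatively assumes a rerun stage's outputs will change.
interface Stage {
  name: string;
  deps: string[]; // paths the stage reads
  outs: string[]; // paths the stage writes
}

type Hashes = Record<string, string>; // path -> content hash

function stagesToRerun(
  stages: Stage[], // assumed to be listed in dependency order
  recorded: Record<string, Hashes>, // dep hashes at each stage's last run
  current: Hashes, // dep hashes as they are now
): string[] {
  const rerun: string[] = [];
  const pendingOutputs = new Set<string>(); // outputs about to be regenerated
  for (const stage of stages) {
    const last = recorded[stage.name] ?? {};
    const changed = stage.deps.some(
      (dep) => pendingOutputs.has(dep) || last[dep] !== current[dep],
    );
    if (changed) {
      rerun.push(stage.name);
      for (const out of stage.outs) pendingOutputs.add(out);
    }
  }
  return rerun;
}

// Hypothetical pipeline: only the volunteers source file has changed.
const stages: Stage[] = [
  { name: "volunteers", deps: ["data/published/volunteers.csv"], outs: ["out/volunteers.json"] },
  { name: "events", deps: ["data/published/events.csv"], outs: ["out/events.json"] },
  { name: "overview", deps: ["out/volunteers.json", "out/events.json"], outs: ["out/overview.json"] },
];
const recorded: Record<string, Hashes> = {
  volunteers: { "data/published/volunteers.csv": "a1" },
  events: { "data/published/events.csv": "b1" },
  overview: { "out/volunteers.json": "c1", "out/events.json": "d1" },
};
const current: Hashes = {
  "data/published/volunteers.csv": "a2", // changed since the last run
  "data/published/events.csv": "b1",
  "out/volunteers.json": "c1",
  "out/events.json": "d1",
};

console.log(stagesToRerun(stages, recorded, current));
// returns ["volunteers", "overview"]
```

Here `events` is skipped because its input is unchanged, while `overview` reruns because it consumes an output of `volunteers`.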
Development should be performed against the `dev` branch, and only merged into `main` once it has been confirmed.
When working on new visualisations, it may be helpful to use a separate branch named something like `feature/<theme>` (e.g. `feature/volunteers`), particularly if there are other people working on the code. These branches should be merged back into `dev` once ready for review with the client.
Care needs to be taken when building new pages to protect against data leakage. Working against the `dev` branch means that the published site does not include data before it has been approved. The following guidelines will help ensure that no leakage takes place.
- Any new pages should include `draft: true` data, either in front matter or in a `_data.*` file. This prevents the page from being generated unless the `LUME_DRAFTS=true` environment variable is set. Running `deno task dev` sets this environment variable.
- Until the data release is approved, no data derived from the source data (via pipelines) should be published. The easiest way to ensure this is to create a `.gitignore` file in the same folder as the page with a general exclusion of `/_data/*`. Exceptions can be added if needed. Once the data is approved for publication, this file can be removed.
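Both guidelines can be checked mechanically, for example from a pre-commit hook or CI job. The sketch below is hypothetical (no such check is described for this repository): `hasDraftFlag` looks for `draft: true` in a page's YAML front matter, and `ignoresDerivedData` confirms that a `.gitignore` contains the `/_data/*` exclusion. It uses simple line matching rather than real YAML or `.gitignore` parsing, and it does not detect `draft: true` set via a `_data.*` file.

```typescript
// Hypothetical pre-commit style checks for the two leakage guidelines.
// Simple line matching only; not a full YAML or .gitignore parser.

// True if the page's YAML front matter (between the opening and closing
// "---" lines) contains `draft: true`.
function hasDraftFlag(pageSource: string): boolean {
  const lines = pageSource.split(/\r?\n/);
  if (lines[0].trim() !== "---") return false; // no front matter block
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "---") break; // end of front matter
    if (/^draft:\s*true$/.test(line)) return true;
  }
  return false;
}

// True if the .gitignore excludes everything in the local _data folder.
function ignoresDerivedData(gitignore: string): boolean {
  return gitignore
    .split(/\r?\n/)
    .map((line) => line.trim())
    .some((line) => line === "/_data/*");
}

// Example page source (hypothetical content).
const page = ["---", "title: Volunteers", "draft: true", "---", "Page content here."].join("\n");

console.log(hasDraftFlag(page)); // true
console.log(ignoresDerivedData("/_data/*\n!/_data/labels.yml\n")); // true
```

The `!/_data/labels.yml` line shows how an exception might be added using standard `.gitignore` negation; `labels.yml` is a hypothetical file name.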