
Update documentation for new developers #256

Merged
merged 23 commits into from
Jan 30, 2024
18 changes: 15 additions & 3 deletions README.md
@@ -12,6 +12,10 @@ To get access to the GAE, [see the documentation on Slab](https://uclh.slab.com/

## Services

### [PIXL core](./pixl_core/README.md)

The `core` module contains the functionality shared by the other PIXL modules.

### [PIXL CLI](./cli/README.md)

Primary interface to the PIXL system.
@@ -20,13 +24,20 @@ Primary interface to the PIXL system.

HTTP API to securely hash an identifier using a key stored in Azure Key Vault.

### [Orthanc](./orthanc/README.md)

#### [Orthanc Raw](./orthanc/orthanc-raw/README.md)

A DICOM node which receives images from the upstream hospital systems and acts as a cache for PIXL.

#### [Orthanc Anon](./orthanc/orthanc-anon/README.md)

A DICOM node which wraps our de-identification process and uploads the images to their final
destination.

#### [PIXL DICOM de-identifier](./pixl_dcmd/README.md)

Wraps our de-identification and cloud transfer components and provides helper functions for
de-identifying DICOM data.

### PostgreSQL

@@ -214,6 +225,7 @@ select count(*) from emap_data.ehr_anon where xray_report is not null;
## Develop

See each service's README for individual development and testing instructions.
Most modules require [`docker`](https://docs.docker.com/desktop/) and `docker-compose` to be installed to run tests.
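A quick way to confirm both are available (output will vary by install):

```bash
docker --version
docker compose version   # or `docker-compose --version` for the standalone binary
```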

For Python development we use [ruff](https://docs.astral.sh/ruff/) alongside [pytest](https://www.pytest.org/).
There is support (sometimes through plugins) for these tools in most IDEs & editors.
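For example, a typical local check before pushing, assuming both tools are installed in the active
environment:

```bash
ruff check .   # lint the project
pytest         # run the test suite
```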
67 changes: 60 additions & 7 deletions cli/README.md
@@ -6,22 +6,33 @@ EMAP star database and the PACS image system. Once a set of queues are
populated the consumers can be started, updated and the system extractions
stopped cleanly.

## Prerequisites

`PIXL CLI` requires Python version 3.10.

The CLI requires a `pixl_config.yml` file in the current working directory. A [sample
file](../pixl_config.yml.sample) is provided in the root of the repository. For local testing, we
recommend running `pixl` from the [`./tests/`](./tests/) directory, which contains a mock
`pixl_config.yml` file.
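For example, from the `cli/` directory (a sketch; both options follow the paths above):

```bash
# option 1: run `pixl` from the tests directory, which ships a mock config
cd tests
# option 2: copy the sample config from the repository root and edit it
cp ../pixl_config.yml.sample ./pixl_config.yml
```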

Running the tests requires [docker](https://docs.docker.com/get-docker/) to be installed.

## Installation

We recommend installing in a project-specific virtual environment created using an environment
management tool such as [conda](https://docs.conda.io/en/latest/) or [virtualenv](https://virtualenv.pypa.io/en/latest/).
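For example, with conda (a sketch; any tool that provides Python 3.10 works):

```bash
conda create -n pixl python=3.10
conda activate pixl
```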

Then install in editable mode by running

```bash
pip install -e ../pixl_core/ .
```

## Usage

> **Note**
> The `rabbitmq`, `ehr-api` and `pacs-api` services must be started prior to using the CLI.
> This is typically done by spinning up the necessary Docker containers through `docker compose`.
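For example (a sketch; the service names follow the note above, but the compose file to use
depends on your deployment):

```bash
docker compose up -d rabbitmq ehr-api pacs-api
```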

See the commands and subcommands with

@@ -47,7 +58,7 @@ parquet_dir
└── PROCEDURE_OCCURRENCE.parquet
```

Start the imaging extraction

```bash
pixl start --queues pacs
@@ -66,3 +77,45 @@ Stop PACS and EHR database extraction
```bash
pixl stop
```

## Development

The CLI is created using [click](https://click.palletsprojects.com/en/8.0.x/), and currently provides
the following commands:

```sh
$ pixl --help
Usage: pixl [OPTIONS] COMMAND [ARGS]...

PIXL command line interface

Options:
--debug / --no-debug
--help Show this message and exit.

Commands:
az-copy-ehr Copy the EHR data to azure
extract-radiology-reports Export processed radiology reports to...
kill Stop all the PIXL services
populate Populate a (set of) queue(s) from a parquet...
start Start consumers for a set of queues
status Get the status of the PIXL consumers
stop Stop extracting images and/or EHR data.
update Update one or a list of consumers with a...
```

Install locally in editable mode with the development and testing dependencies by running

```bash
pip install -e ../pixl_core/[test] .[test]
```

### Running tests

The CLI tests require a running instance of the `rabbitmq` service, for which we provide a
`docker-compose` [file](./tests/docker-compose.yml). Spinning up the service and running `pytest`
can be done by running

```bash
./tests/run-tests.sh
```
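Under the hood this is roughly equivalent to the following (a sketch; the exact steps taken by
`run-tests.sh` are an assumption):

```bash
# bring up the rabbitmq dependency, run the tests, then tear it down
docker compose -f ./tests/docker-compose.yml up -d
pytest
docker compose -f ./tests/docker-compose.yml down
```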
2 changes: 1 addition & 1 deletion cli/pyproject.toml
@@ -6,7 +6,7 @@ authors = [
]
description = "PIXL command line interface"
readme = "README.md"
requires-python = ">=3.10"
requires-python = "<3.11"
classifiers = [
"Programming Language :: Python :: 3"
]
Expand Down
100 changes: 64 additions & 36 deletions hasher/README.md
@@ -1,20 +1,64 @@
# Hasher API

_The secure hashing service_.

This package provides a _FastAPI_ service that can be used to generate secure hashes.
It is used by the [PIXL EHR API](../pixl_ehr/README.md) (for EHR anonymisation) and [PIXL Orthanc
Anon](../orthanc_anon/README.md) (for DICOM image anonymisation) services.

## Local development

### Dependencies

It is assumed you have a Python virtual environment configured using a tool like Conda or pyenv.
Install the dependencies from inside the _PIXL/hasher/src_ directory:

```bash
pip install -e .
```

### Setup

Create a _local.env_ file in _PIXL/hasher/src/hasher_ from _local.env.sample_ in the same location.
Use the credentials stored in the `Hasher API dev secrets` note in LastPass to populate the
environment variables.
Set `LOG_ROOT_DIR` to anywhere convenient.
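One way to do this, from _PIXL/hasher/src/hasher_:

```bash
# copy the sample and then fill in the values from LastPass
cp local.env.sample local.env
```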

### Run

From the _PIXL/hasher/src_ directory:

```bash
uvicorn hasher.main:app --host=0.0.0.0 --port=8000 --reload
```
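Once the server is up, FastAPI's auto-generated interactive documentation can be used to
smoke-test the service (assuming the default docs route has not been disabled):

```bash
# open http://localhost:8000/docs in a browser, or fetch it directly
curl -s http://localhost:8000/docs
```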

### Test

From this directory:

```bash
pytest
```

To skip linting and run only the last failed test, pytest's `--last-failed` flag should work (see
the sketch below).
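A sketch, assuming lint checks run as pytest items so that previously passing checks are not
re-run:

```bash
# re-run only the tests that failed on the previous run
pytest --last-failed
```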

----

<details><summary>Azure setup</summary>

## Azure setup

_This has been done for the _UCLH_DIF_ `dev` tenancy and will need to be done once in the _UCLHAZ_ `prod` tenancy when ready to deploy to production._

An Azure Key Vault is required to hold the secret key used in the
hashing process. This Key Vault and secret must persist across any infrastructure changes, so
they should be separate from disposable infrastructure services.
A ServicePrincipal is required to connect to the Key Vault.

The application uses the ServicePrincipal and password to authenticate with Azure via
environment variables. See [here](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python) for more info.

The Key Vault and ServicePrincipal have already been created for the `dev` environment and details are stored in
the `Hasher API dev secrets` note in the shared FlowEHR folder on LastPass.

The process for doing so using the `az` CLI tool is described below.
@@ -23,17 +67,23 @@ This can be converted into a Terraform template but given that we need a single,
This process must be repeated for `staging` & `prod` environments.

### Step 1

Create the Azure Key Vault in an appropriate resource group:

```bash
az keyvault create --resource-group <resource-group-name> --name <key-vault-name> --location "UKSouth"
```

### Step 2

Create the Service Principal and grant access:

```bash
az ad sp create-for-rbac -n hasher-api --skip-assignment
```

This will produce the following output

```json
{
"appId": "<generated-app-ID>",
@@ -46,26 +96,34 @@
```

### Step 3

Assign correct permissions to the newly created ServicePrincipal

```bash
az keyvault set-policy --name <key-vault-name> --spn <generated-app-ID> --secret-permissions backup delete get list set
```

### Step 4

Create a secret and store it in the Key Vault.

Use Python to create a secret:

```python
import secrets
secrets.token_urlsafe(32)
```

Copy the secret and paste it as `<secret-value>` below:

```bash
az keyvault secret set --vault-name "<key-vault-name>" --name "<secret-name>" --value "<secret-value>"
```
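Optionally, confirm the secret was stored (a standard `az` check):

```bash
az keyvault secret show --vault-name "<key-vault-name>" --name "<secret-name>"
```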

### Step 5

Save credentials in `.env` and a LastPass `Hasher API <environment> secrets` note.

```
HASHER_API_AZ_CLIENT_ID=<generated-app-ID>
HASHER_API_AZ_CLIENT_PASSWORD=<generated-password>
@@ -74,34 +132,4 @@ HASHER_API_AZ_KEY_VAULT_NAME=<key-vault-name>
HASHER_API_AZ_KEY_VAULT_SECRET_NAME=<secret-name>
```


</details>
11 changes: 11 additions & 0 deletions orthanc/README.md
@@ -0,0 +1,11 @@
# ORTHANC instances

PIXL defines 2 types of ORTHANC instances:

- `orthanc-raw`: This instance is used to store the raw DICOM files, acting as a cache before
  transferring the images to the `orthanc-anon` instance
- `orthanc-anon`: This instance is used to de-identify the DICOM images and upload them to their
final destination

For both instances we define a plugin in `orthanc-*/plugin/pixl.py` that implements the custom
functionality.
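For illustration, a minimal sketch of such a plugin using the Orthanc Python plugin API (the
callback and log message here are hypothetical; the real PIXL plugins live in
`orthanc-*/plugin/pixl.py`):

```python
import orthanc

def on_change(change_type, level, resource_id):
    # react once a study has fully arrived and stabilised in the instance
    if change_type == orthanc.ChangeType.STABLE_STUDY:
        orthanc.LogInfo(f"Study {resource_id} is stable")

orthanc.RegisterOnChangeCallback(on_change)
```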