Commit

Update core README
milanmlft committed Jan 29, 2024
1 parent 50f96a5 commit d9a4866
Showing 1 changed file with 39 additions and 27 deletions.
66 changes: 39 additions & 27 deletions pixl_core/README.md
@@ -1,60 +1,72 @@
# Core

This directory contains a Python module with core PIXL functionality utilised by both the EHR and PACS APIs to
interact with RabbitMQ and ensure suitable rate limiting of requests to the upstream services.
This module contains the core PIXL functionality utilised by both the EHR and Imaging APIs to
interact with the RabbitMQ messaging queues and ensure suitable rate limiting of requests to the
upstream services.

Specifically, it defines:

- The [Token buffer](#token-buffer) for rate limiting requests to the upstream services
- The [RabbitMQ queue](#patient-queue) implementation shared by the EHR and Imaging APIs
- The PIXL `postgres` internal database for storing exported images and extracts from the messages
processed by the CLI driver
- The [`ParquetExport`](./src/core/exports.py) class for exporting OMOP and EMAP extracts to parquet files
- Handling of [uploads over FTPS](./src/core/upload.py), used to transfer images and parquet files
to the DSH (Data Safe Haven)

## Installation

### Install
```bash
pip install -e .
```

### Test
## Testing

```bash
pip install -e .[test]
pytest
```

## Token buffer

The token buffer is needed to limit the download rate for images from PAX/VNA. Current specification suggests that a
rate limit of five images per second should be sufficient, however that may have to be altered dynamically through
command line interaction.

The current implementation of the token buffer uses the
[token bucket implementation from Falconry](https://github.com/falconry/token-bucket/). Furthermore, the token buffer is
not set up as a service as it is only needed for the image download rate.
The token buffer is needed to limit the download rate for images from PAX/VNA. The current
specification suggests that a rate limit of five images per second should be sufficient; however,
this may have to be altered dynamically through command line interaction.

The current implementation of the token buffer uses the [token bucket implementation from
Falconry](https://github.com/falconry/token-bucket/). Furthermore, the token buffer is not set up as
a service as it is only needed for the image download rate.
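
As an illustration of the rate-limiting idea, here is a minimal, standard-library-only sketch of a
token bucket. It is not the Falconry library and not PIXL's actual code: the `TokenBucket` class,
its parameters and the injectable `clock` are assumptions made for this example only.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket (hypothetical; not the Falconry implementation)."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self._clock = clock         # injectable so the bucket can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def consume(self, num_tokens: int = 1) -> bool:
        """Take `num_tokens` if available; return False if the caller must wait."""
        now = self._clock()
        # Refill in proportion to the elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= num_tokens:
            self._tokens -= num_tokens
            return True
        return False


# Five images per second, as suggested by the current specification.
image_limiter = TokenBucket(rate=5, capacity=5)
```

A download loop would call `image_limiter.consume()` before each image request and back off
whenever it returns `False`.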

## Patient queue

Mechanism that allows driver to populate queues that can then be consumed by different services, e.g. patient data
or image download.
We use [RabbitMQ](https://www.rabbitmq.com/) as a message broker to transfer messages between the
different PIXL services. Currently, we define two queues:

Two queues are currently planned:
1. for download and de-identification of image data (default "pacs")
2. for download and de-identification of EHR demographic data (default "ehr")
1. `pacs` for downloading and de-identifying images
2. `ehr` for downloading and de-identifying the EHR data

The image anonymisation will be triggered automatically once the image has been downloaded to the raw Orthanc server.

### RabbitMQ


The client of choice for RabbitMQ at this point in time is [pika](https://pika.readthedocs.io/en/stable/), which provides both a synchronous and
asynchronous way of transferring messages. The former is geared towards high data throughput whereas the latter is geared towards stability.
The asynchronous mode of transferring messages is a lot more complex as it is based on the
[asyncio event loop](https://docs.python.org/3/library/asyncio-eventloop.html).
RabbitMQ is used for the queue implementation.

The client of choice for RabbitMQ at this point in time is
[pika](https://pika.readthedocs.io/en/stable/), which provides both a synchronous and asynchronous
way of transferring messages. The former is geared towards high data throughput whereas the latter
is geared towards stability. The asynchronous mode of transferring messages is a lot more complex
as it is based on the [asyncio event loop](https://docs.python.org/3/library/asyncio-eventloop.html).
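
A minimal sketch of synchronous publishing to the `pacs` and `ehr` queues might look as follows.
It relies only on the `queue_declare` and `basic_publish` methods of a pika channel. The helper
name `publish_message`, the JSON encoding, the `durable=True` setting and the message fields are
assumptions for illustration, not PIXL's actual implementation.

```python
import json
from typing import Any

# Queue names taken from the section above.
QUEUE_NAMES = ("pacs", "ehr")


def publish_message(channel: Any, queue_name: str, message: dict) -> None:
    """Publish `message` as JSON to one of the known queues (hypothetical helper)."""
    if queue_name not in QUEUE_NAMES:
        raise ValueError(f"Unknown queue: {queue_name!r}")
    # Declaring the queue is idempotent; durability is an assumption here.
    channel.queue_declare(queue=queue_name, durable=True)
    # The default exchange ("") routes by queue name.
    channel.basic_publish(
        exchange="",
        routing_key=queue_name,
        body=json.dumps(message).encode("utf-8"),
    )


# Usage with a real broker (requires pika and a running RabbitMQ instance;
# host and message content are placeholders):
#
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
#   publish_message(connection.channel(), "pacs", {"example_field": "value"})
#   connection.close()
```

Passing the channel in as an argument keeps the helper independent of how the connection is
created, which also makes it easy to exercise with a stand-in channel in tests.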

### OMOP ES files

Public parquet exports from OMOP ES that should be transferred outside the hospital are copied to the `exports` directory at the repository base.
Public parquet exports from OMOP ES that should be transferred outside the hospital are copied to
the `exports` directory at the repository base.

Within this directory each project has a directory, with all extracts stored in `all_extracts` and the `latest` directory
contains a symlink to the most recent extract. This symlinking means that during the export stage it is clear which export should be sent.
Within this directory each project has its own directory. All extracts are stored in
`all_extracts`, and the `latest` directory contains a symlink to the most recent extract. This
symlinking makes it clear during the export stage which export should be sent.
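
The symlinking described above could be sketched roughly as follows. This is an illustration
under assumptions, not the `ParquetExport` implementation: the function name `publish_extract`
and the link name `extract` inside `latest` are hypothetical.

```python
import tempfile
from pathlib import Path


def publish_extract(project_dir: Path, extract_name: str) -> Path:
    """Create all_extracts/<extract_name> and repoint the link in latest/ at it."""
    extract_dir = project_dir / "all_extracts" / extract_name
    extract_dir.mkdir(parents=True, exist_ok=True)

    latest_dir = project_dir / "latest"
    latest_dir.mkdir(exist_ok=True)
    link = latest_dir / "extract"  # hypothetical link name
    if link.is_symlink():
        link.unlink()  # drop the link to the previous extract
    # A relative target keeps the link valid if the project directory is moved.
    link.symlink_to(
        Path("..") / "all_extracts" / extract_name,
        target_is_directory=True,
    )
    return link


# Example run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project-1"
    publish_extract(project, "2020-06-10t18-00-00")
    latest = publish_extract(project, "2020-07-10t18-00-00")  # made-up later extract
    print(latest.resolve().name)
```

Earlier extracts stay in `all_extracts`; only the link is replaced.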

```
```sh
└── project-1
├── all_extracts
│ ├── 2020-06-10t18-00-00