Update core README

SAFEHR-data · Jan 29, 2024 · d9a4866
1 parent 50f96a5
commit d9a4866
Showing 1 changed file with 39 additions and 27 deletions.
diff --git a/pixl_core/README.md b/pixl_core/README.md
@@ -1,60 +1,72 @@
 # Core
 
-This directory contains a Python module with core PIXL functionality utilised by both the EHR and PACS APIs to
-interact with RabbitMQ and ensure suitable rate limiting of requests to the upstream services.
+This module contains the core PIXL functionality utilised by both the EHR and Imaging APIs to
+interact with the RabbitMQ messaging queues and ensure suitable rate limiting of requests to the
+upstream services.
+
+Specifically, it defines:
+
+- The [Token buffer](#token-buffer) for rate limiting requests to the upstream services
+- The [RabbitMQ queue](#patient-queue) implementation shared by the EHR and Imaging APIs
+- The PIXL `postgres` internal database for storing exported images and extracts from the messages
+  processed by the CLI driver
+- The [`ParquetExport`](./src/core/exports.py) class for exporting OMOP and EMAP extracts to parquet files
+- Handling of [uploads over FTPS](./src/core/upload.py), used to transfer images and parquet files
+  to the DSH (Data Safe Haven)
+
+## Installation
 
-### Install
 ```bash
 pip install -e .
 ```
 
-### Test
+## Testing
 
 ```bash
 pip install -e .[test]
-pytest 
+pytest
 ```
 
 ## Token buffer
 
-The token buffer is needed to limit the download rate for images from PAX/VNA. Current specification suggests that a 
-rate limit of five images per second should be sufficient, however that may have to be altered dynamically through 
-command line interaction. 
-
-The current implementation of the token buffer uses the 
-[token bucket implementation from Falconry](https://github.com/falconry/token-bucket/). Furthermore, the token buffer is
-not set up as a service as it is only needed for the image download rate. 
+The token buffer limits the download rate for images from PAX/VNA. The current specification
+suggests that a rate limit of five images per second should be sufficient; however, that limit may
+have to be altered dynamically through command line interaction.
 
+The current implementation of the token buffer uses the [token bucket implementation from
+Falconry](https://github.com/falconry/token-bucket/). The token buffer is not set up as a separate
+service because it is only needed for the image download rate.
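
For readers unfamiliar with the technique, the following is a minimal, standard-library sketch of how a token bucket limits a request rate. It is not the Falconry implementation and not the PIXL code: the `TokenBucket` class, its `consume` method, and the injectable `clock` parameter are hypothetical names chosen for illustration, and the rate of five tokens per second simply mirrors the figure quoted above.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical, not the PIXL or Falconry code).

    Tokens refill continuously at `rate` tokens per second, up to `capacity`.
    """

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._tokens = float(capacity)  # start with a full bucket
        self._last_refill = clock()

    def consume(self, num_tokens: int = 1) -> bool:
        """Take `num_tokens` if available and return True; otherwise return False."""
        now = self._clock()
        elapsed = now - self._last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last_refill = now
        if self._tokens >= num_tokens:
            self._tokens -= num_tokens
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=5)
    # Try to start ten downloads at once; only those with a token may proceed.
    allowed = sum(bucket.consume() for _ in range(10))
    print(f"{allowed} of 10 immediate requests were allowed")
```

A caller that receives `False` would typically wait or requeue the request rather than drop it. Changing `rate` at runtime is one way the limit could be adjusted dynamically, although how PIXL exposes this is not shown here.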
 
 ## Patient queue
 
-Mechanism that allows driver to populate queues that can then be consumed by different services, e.g. patient data
-or image download.
+We use [RabbitMQ](https://www.rabbitmq.com/) as a message broker to transfer messages between the
+different PIXL services. Currently, we define two queues:
 
-Two queues are currently planned: 
-1. for download and de-identification of image data (default "pacs")
-2. for download and de-identification of EHR demographic data (default "ehr")
+1. `pacs` for downloading and de-identifying images
+2. `ehr` for downloading and de-identifying the EHR data
 
 The image anonymisation will be triggered automatically once the image has been downloaded to the raw Orthanc server.
 
 ### RabbitMQ
 
-RabbitMQ is used for the queue implementation. 
-
-The client of choice for RabbitMQ at this point in time is [pika](https://pika.readthedocs.io/en/stable/), which provides both a synchronous and 
-asynchronous way of transferring messages. The former is geared towards high data throughput whereas the latter is geared towards stability. 
-The asynchronous mode of transferring messages is a lot more complex as it is based on the 
-[asyncio event loop](https://docs.python.org/3/library/asyncio-eventloop.html).
+RabbitMQ is used for the queue implementation.
 
+The current client of choice for RabbitMQ is
+[pika](https://pika.readthedocs.io/en/stable/), which provides both a synchronous and an asynchronous
+way of transferring messages. The former is geared towards high data throughput whereas the latter
+is geared towards stability. The asynchronous mode of transferring messages is considerably more
+complex as it is based on the [asyncio event loop](https://docs.python.org/3/library/asyncio-eventloop.html).
 
 ### OMOP ES files
 
-Public parquet exports from OMOP ES that should be transferred outside the hospital are copied to the `exports` directory at the repository base.
+Public parquet exports from OMOP ES that should be transferred outside the hospital are copied to
+the `exports` directory at the repository base.
 
-Within this directory each project has a directory, with all extracts stored in `all_extracts` and the `latest` directory
-contains a symlink to the most recent extract. This symlinking means that during the export stage it is clear which export should be sent.
+Within this directory each project has its own directory: all extracts are stored in
+`all_extracts`, and the `latest` directory contains a symlink to the most recent extract. This
+symlinking makes it clear which export should be sent during the export stage.
 
-```
+```sh
 └── project-1
     ├── all_extracts
     │     ├── 2020-06-10t18-00-00