Skip to content

Commit

Permalink
Get rid of pixl_config.yml (#311)
Browse files Browse the repository at this point in the history
* Rename `config_from_log_file` -> `project_info` + add docstring

Avoid overuse of the word `config`

* Better variable name for resources path

The path doesn't contain only parquet files and we might use something else in the future.

* Move `APIConfig` class to `cli._config`

Keep the config stuff together

* Get queue API config values form `.env` instead of `pixl_config.yaml`

* Better variable name

* Get service settings from env variables instead of `pixl_config.yml`

Also use better naming: `cli_config` -> `SERVICE_SETTINGS`

* Update sample env file

* Replace more refrences to `cli_config`

* Remove `pixl_config.yml` files

* Build config objects in place

* Add missing env variable

* docs: Remove references to `pixl_config.yml`, describe config with env vars

* Fix `API_CONFIGS` dict keys

* Make API_CONFIG dict in one go

* Remove default values for env variables and update tests

* Rate should be a float, also update docstrings

* Add missing `.env` vars to sample and remove another default

* Add default for `PIXL_QUERY_TIMEOUT` back in

* Make sure `.env` is loaded prior to setting API configs

* Use `decouple` to load local `.env` file

* Load local .env file only if it exists

Avoids import errors during tests

* Revert default query timeout for tests back to 10
  • Loading branch information
milanmlft authored Feb 26, 2024
1 parent 2db6331 commit 00ca23e
Show file tree
Hide file tree
Showing 13 changed files with 161 additions and 175 deletions.
14 changes: 13 additions & 1 deletion .env.sample
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
ENV=dev
DEBUG=True
PIXL_DICOM_TRANSFER_TIMEOUT=120
PIXL_QUERY_TIMEOUT=120

# PIXL PostgreSQL instance
PIXL_DB_HOST=postgres
PIXL_DB_NAME=pixl
PIXL_DB_USER=pixl
PIXL_DB_PASSWORD=
SKIP_ALEMBIC=false

# EMAP UDS
EMAP_UDS_HOST=
Expand All @@ -26,6 +29,15 @@ PIXL_EHR_API_PORT=
RABBITMQ_PORT=
RABBITMQ_ADMIN_PORT=
PIXL_IMAGING_API_PORT=
FTP_PORT=

# PIXL EHR API
PIXL_EHR_API_HOST=localhost
PIXL_EHR_API_RATE=1

# PIXL Imaging API
PIXL_IMAGING_API_HOST=localhost
PIXL_IMAGING_API_RATE=1

# Hasher API
HASHER_API_AZ_CLIENT_ID=
Expand Down Expand Up @@ -77,6 +89,7 @@ PIXL_EHR_API_AZ_STORAGE_CONTAINER_NAME=
PIXL_EHR_COGSTACK_REDACT_URL=

# RABBIT MQ queue. UI available at localhost:$RABBITMQ_ADMIN_PORT
RABBITMQ_HOST=localhost
RABBITMQ_USERNAME=
RABBITMQ_PASSWORD=

Expand All @@ -88,4 +101,3 @@ PIXL_QUERY_TIMEOUT=10
FTP_HOST=
FTP_USER_NAME=
FTP_USER_PASSWORD=
FTP_PORT=
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -146,10 +146,10 @@ This is one of `dev|test|staging|prod` and referred to as `<environment>` in the

### 2. Initialise environment configuration

Create a local `.env` and `pixl_config.yml` file in the _PIXL_ directory:
Create a local `.env` file in the _PIXL_ directory:

```bash
cp .env.sample .env && cp pixl_config.yml.sample pixl_config.yml
cp .env.sample .env
```

Add the missing configuration values to the new files:
Expand Down
45 changes: 36 additions & 9 deletions cli/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,6 @@ stopped cleanly.

`PIXL CLI` requires Python version 3.10.

The CLI requires a `pixl_config.yml` file in the current working directory. A
[sample file](../pixl_config.yml.sample) is provided in the root of the repository. If you want to
run locally during development, we recommend running `pixl` from the [`./tests/`](./tests/)
directory, which contains a mock `pixl_config.yml` file.

Running the tests requires [docker](https://docs.docker.com/get-docker/) to be installed.

## Installation
Expand All @@ -40,6 +35,39 @@ See the commands and subcommands with
```bash
pixl --help
```
### Configuration

The `rabbitmq` and `postgress` services are configured by setting the following environment variables
(default values shown):

```sh
RABBITMQ_HOST=localhost
RABBITMQ_PORT=7008
RABBITMQ_USERNAME=rabbitmq_username
RABBITMQ_PASSWORD=rabbitmq_password

POSTGRES_HOST=localhost
POSTGRES_PORT=7001
PIXL_DB_USER=pixl_db_username
PIXL_DB_PASSWORD=pixl_db_password
PIXL_DB_NAME=pixl
```

The `rabbitmq` queues for the `ehr` and `imaging` APIs are configured by setting:

```sh
PIXL_EHR_API_HOST=localhost
PIXL_EHR_API_PORT=7006
PIXL_EHR_API_RATE=1

PIXL_IMAGING_API_HOST=localhost
PIXL_IMAGING_API_PORT=7007
PIXL_IMAGING_API_RATE=1
```

where the `*_RATE` variables set the default querying rate for the message queues.

### Running the pipeline

Populate queue for Imaging and EHR extraction

Expand Down Expand Up @@ -113,10 +141,9 @@ pip install -e ../pixl_core/ -e .[test]

### Running tests

The CLI tests require a running instance of the `rabbitmq` service, for which we provide a
`docker-compose` [file](./tests/docker-compose.yml). Spinning up the service and running `pytest`
can be done by running
Tests can be run with `pytest` from the `tests` directory.

```bash
./tests/run-tests.sh
cd tests
pytest
```
71 changes: 60 additions & 11 deletions cli/src/pixl_cli/_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,21 +12,70 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""Configuration of CLI from config file."""
"""Configuration of CLI from environment variables."""

from pathlib import Path

import yaml
from decouple import Config, RepositoryEmpty, RepositoryEnv

env_file = Path.cwd() / ".env"
config = Config(RepositoryEnv(env_file)) if env_file.exists() else Config(RepositoryEmpty())

SERVICE_SETTINGS = {
"rabbitmq": {
"host": config("RABBITMQ_HOST"),
"port": int(config("RABBITMQ_PORT")),
"username": config("RABBITMQ_USERNAME"),
"password": config("RABBITMQ_PASSWORD"),
},
"postgres": {
"host": config("POSTGRES_HOST"),
"port": int(config("POSTGRES_PORT")),
"username": config("PIXL_DB_USER"),
"password": config("PIXL_DB_PASSWORD"),
"database": config("PIXL_DB_NAME"),
},
} # type: dict


class APIConfig:
"""API Configuration"""

def __init__(self, host: str, port: int, default_rate: float = 1) -> None:
"""Initialise the APIConfig class"""
self.host = host
self.port = port
self.default_rate = default_rate

@property
def base_url(self) -> str:
"""Return the base url for the API"""
return f"http://{self.host}:{self.port}"


API_CONFIGS = {
"ehr_api": APIConfig(
host=config("PIXL_EHR_API_HOST"),
port=int(config("PIXL_EHR_API_PORT")),
default_rate=float(config("PIXL_EHR_API_RATE", default=1)),
),
"imaging_api": APIConfig(
host=config("PIXL_IMAGING_API_HOST"),
port=int(config("PIXL_IMAGING_API_PORT")),
default_rate=float(config("PIXL_IMAGING_API_RATE", default=1)),
),
}

def _load_config(filename: str = "pixl_config.yml") -> dict:
"""CLI configuration generated from a .yaml file"""
if not Path(filename).exists():
msg = f"Failed to find {filename}. It must be present in the current working directory"
raise FileNotFoundError(msg)

with Path(filename).open() as config_file:
config_dict = yaml.safe_load(config_file)
return dict(config_dict)
def api_config_for_queue(queue_name: str) -> APIConfig:
"""Configuration for an API associated with a queue"""
api_name = f"{queue_name}_api"

if api_name not in API_CONFIGS:
msg = (
f"Cannot update the rate for {queue_name}. {api_name} was"
f" not specified in the configuration"
)
raise ValueError(msg)

cli_config = _load_config()
return API_CONFIGS[api_name]
4 changes: 2 additions & 2 deletions cli/src/pixl_cli/_database.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,9 @@
from sqlalchemy import URL, create_engine
from sqlalchemy.orm import Session, sessionmaker

from pixl_cli._config import cli_config
from pixl_cli._config import SERVICE_SETTINGS

connection_config = cli_config["postgres"]
connection_config = SERVICE_SETTINGS["postgres"]

url = URL.create(
drivername="postgresql+psycopg2",
Expand Down
14 changes: 9 additions & 5 deletions cli/src/pixl_cli/_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,19 +47,23 @@ def messages_from_state_file(filepath: Path) -> list[Message]:
return [deserialise(line) for line in filepath.open().readlines() if string_is_non_empty(line)]


def config_from_log_file(parquet_path: Path) -> tuple[str, datetime]:
log_file = parquet_path / "extract_summary.json"
def project_info(resources_path: Path) -> tuple[str, datetime]:
"""
Get the project name and extract timestamp from the extract summary log file.
:param resources_path: path to the input resources
"""
log_file = resources_path / "extract_summary.json"
logs = json.load(log_file.open())
project_name = logs["settings"]["cdm_source_name"]
omop_es_timestamp = datetime.fromisoformat(logs["datetime"])
return project_name, omop_es_timestamp


def copy_parquet_return_logfile_fields(parquet_path: Path) -> tuple[str, datetime]:
def copy_parquet_return_logfile_fields(resources_path: Path) -> tuple[str, datetime]:
"""Copy public parquet file to extracts directory, and return fields from logfile"""
project_name, omop_es_timestamp = config_from_log_file(parquet_path)
project_name, omop_es_timestamp = project_info(resources_path)
extract = ParquetExport(project_name, omop_es_timestamp, HOST_EXPORT_ROOT_DIR)
project_name_slug = extract.copy_to_exports(parquet_path)
project_name_slug = extract.copy_to_exports(resources_path)
return project_name_slug, omop_es_timestamp


Expand Down
68 changes: 8 additions & 60 deletions cli/src/pixl_cli/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@
from core.patient_queue.producer import PixlProducer
from core.patient_queue.subscriber import PixlBlockingConsumer

from pixl_cli._config import cli_config
from pixl_cli._config import SERVICE_SETTINGS, api_config_for_queue
from pixl_cli._database import filter_exported_or_add_to_db
from pixl_cli._io import (
config_from_log_file,
copy_parquet_return_logfile_fields,
messages_from_parquet,
messages_from_state_file,
project_info,
)
from pixl_cli._logging import logger, set_log_level
from pixl_cli._utils import clear_file, remove_file_if_it_exists
Expand Down Expand Up @@ -98,7 +98,7 @@ def populate(parquet_dir: Path, *, restart: bool, queues: str) -> None:
sorted_messages = filter_exported_or_add_to_db(
sorted_messages, messages[0].project_name
)
with PixlProducer(queue_name=queue, **cli_config["rabbitmq"]) as producer:
with PixlProducer(queue_name=queue, **SERVICE_SETTINGS["rabbitmq"]) as producer:
producer.publish(sorted_messages)


Expand All @@ -113,7 +113,7 @@ def extract_radiology_reports(parquet_dir: Path) -> None:
PARQUET_DIR: Directory containing the extract_summary.json
log file defining which extract to export radiology reports for.
"""
project_name, omop_es_datetime = config_from_log_file(parquet_dir)
project_name, omop_es_datetime = project_info(parquet_dir)

# Call the EHR API
api_config = api_config_for_queue("ehr")
Expand Down Expand Up @@ -143,10 +143,9 @@ def extract_radiology_reports(parquet_dir: Path) -> None:
"--rate",
type=float,
default=None,
help="Rate at which to process items from a queue (in items per second)."
"If None then will use the default rate defined in the config file",
help="Rate at which to process items from a queue (in items per second).",
)
def start(queues: str, rate: Optional[int]) -> None:
def start(queues: str, rate: Optional[float]) -> None:
"""Start consumers for a set of queues"""
if rate == 0:
msg = "Cannot start extract with a rate of 0. Must be >0"
Expand Down Expand Up @@ -186,10 +185,7 @@ def _update_extract_rate(queue_name: str, rate: Optional[float]) -> None:

if rate is None:
if api_config.default_rate is None:
msg = (
"Cannot update the rate for %s. No default rate was specified.",
queue_name,
)
msg = f"Cannot update the rate for {queue_name}. No valid rate was specified."
raise ValueError(msg)
rate = float(api_config.default_rate)
logger.info(f"Using the default extract rate of {rate}/second")
Expand Down Expand Up @@ -299,7 +295,7 @@ def consume_all_messages_and_save_csv_file(queue_name: str, timeout_in_seconds:
f"{timeout_in_seconds} seconds"
)

with PixlBlockingConsumer(queue_name=queue_name, **cli_config["rabbitmq"]) as consumer:
with PixlBlockingConsumer(queue_name=queue_name, **SERVICE_SETTINGS["rabbitmq"]) as consumer:
state_filepath = state_filepath_for_queue(queue_name)
if consumer.message_count > 0:
logger.info("Found messages in the queue. Clearing the state file")
Expand All @@ -325,51 +321,3 @@ def inform_user_that_queue_will_be_populated_from(path: Path) -> None: # noqa:
f"state files should be ignored, or delete this file to ignore. Press "
f"Ctrl-C to exit and any key to continue"
)


class APIConfig:
"""
Class to represent the configuration for an API
Attributes
----------
host : str
Hostname for the API
port : int
Port for the API
default_rate : int
Default rate for the API
Methods
-------
base_url()
Return the base url for the API
"""

def __init__(self, kwargs: dict) -> None:
"""Initialise the APIConfig class"""
self.host: Optional[str] = None
self.port: Optional[int] = None
self.default_rate: Optional[int] = None

self.__dict__.update(kwargs)

@property
def base_url(self) -> str:
"""Return the base url for the API"""
return f"http://{self.host}:{self.port}"


def api_config_for_queue(queue_name: str) -> APIConfig:
"""Configuration for an API associated with a queue"""
config_key = f"{queue_name}_api"

if config_key not in cli_config:
msg = (
f"Cannot update the rate for {queue_name}. {config_key} was"
f" not specified in the configuration"
)
raise ValueError(msg)

return APIConfig(cli_config[config_key])
Loading

0 comments on commit 00ca23e

Please sign in to comment.