Setting up and using ricu to access the publicly available ICU database AmsterdamUMCdb of Amsterdam UMC.
With the release of ricu version 0.2.0, this repository has become
obsolete and serves only to illustrate how to set up a new data source
with ricu.
In order to use AmsterdamUMCdb as a ricu-external dataset, some
configuration must be in place before ricu is loaded. The configuration
file data-sources.json describes the tables involved, and
concept-dict.json is required for loading ricu clinical concepts from
the new dataset. Finally, dataset-specific implementations of certain S3
generic functions exported from ricu are required (see r/ricu.R), as may
be some callback functions used in concept specification (see
r/callback.R).
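Because both configuration files have to be in place before ricu is loaded, a quick check can save a confusing start-up. The following is a minimal sketch in base R; check_config_dir() is a hypothetical helper that is neither part of this repository nor of ricu, and the demonstration runs against a temporary directory rather than the real config folder.

```r
# Hypothetical helper (not part of this repository or of ricu): report which
# of the two expected configuration files are present in a directory.
check_config_dir <- function(dir = "config") {
  files <- c("data-sources.json", "concept-dict.json")
  found <- file.exists(file.path(dir, files))
  names(found) <- files
  found
}

# Self-contained demonstration in a temporary directory that holds only one
# of the two files
demo_dir <- tempfile("ricu-config-demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, "data-sources.json")))

config_status <- check_config_dir(demo_dir)
print(config_status)
```

Calling check_config_dir("config") from the project root would perform the same check on the files shipped with this repository.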
This repository contains configuration files and required functions for
setting up AmsterdamUMCdb in two different ways: aumc_ext creates a
ricu-external version of AmsterdamUMCdb with ID types and column
defaults as implemented in the ricu-internal version of AmsterdamUMCdb,
whereas aumc_min provides a minimal version of AmsterdamUMCdb with only
a single ID type and no per-table column defaults, which considerably
simplifies initial setup.
Several environment variables can be set for ricu to facilitate the
integration of AmsterdamUMCdb as an external dataset:
- RICU_DATA_PATH: (optional) point this to a directory containing a folder aumc_ext which holds the AmsterdamUMCdb data
- RICU_CONFIG_PATH: point this to the ./config directory of this project
- RICU_SRC_LOAD: comma-separated list of data sources to load automatically when attaching ricu (e.g. mimic,mimic_demo,aumc_ext)
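For the case where the data lives outside the default location, all three variables might be set together. This is a sketch only: data_root is a hypothetical directory (a temporary one here, so that the snippet runs anywhere), and the choice of sources is illustrative.

```r
# Hypothetical location of the data; replace with a real directory that
# contains an aumc_ext folder.
data_root <- file.path(tempdir(), "icu-data")
dir.create(file.path(data_root, "aumc_ext"), recursive = TRUE,
           showWarnings = FALSE)

Sys.setenv(
  RICU_DATA_PATH   = data_root,
  RICU_CONFIG_PATH = "config",
  RICU_SRC_LOAD    = paste(c("mimic_demo", "aumc_ext"), collapse = ",")
)

# Read the settings back from the environment
requested <- strsplit(Sys.getenv("RICU_SRC_LOAD"), ",", fixed = TRUE)[[1]]
print(requested)
print(dir.exists(file.path(Sys.getenv("RICU_DATA_PATH"), "aumc_ext")))
```

These calls have to run before library(ricu), since the variables are read when the package is attached.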
For now, we leave RICU_DATA_PATH at its default and set the other
environment variables accordingly before loading ricu:
sources <- c("mimic_demo", "eicu_demo", "aumc", "aumc_ext", "aumc_min")
Sys.setenv(RICU_SRC_LOAD = paste(sources, collapse = ","),
           RICU_CONFIG_PATH = "config")
library(ricu)
#>
#> ── ricu 0.4.1.9000 ─────────────────────────────────────────────────────────────────────────────────
#>
#> The following data sources are configured to be attached:
#> (the environment variable `RICU_SRC_LOAD` controls this)
#>
#> ✔ mimic_demo: 25 of 25 tables available
#> ✔ eicu_demo: 31 of 31 tables available
#> ✔ aumc: 7 of 7 tables available
#> ✔ aumc_ext: 7 of 7 tables available
#> ✔ aumc_min: 7 of 7 tables available
#>
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
for (file in file.path("r", c("ricu.R", "callback.R"))) {
  source(file)
}
As aumc is already fully supported by ricu, all configuration in this
repo has been renamed to set up the AmsterdamUMCdb data as the external
dataset aumc_ext.
To download and set up the data as aumc_ext, while using ricu
functionality for downloading and importing aumc, run:
ext_dir <- src_data_dir("aumc_ext")
download_src("aumc", ext_dir)
import_src("aumc", ext_dir)
attach_src("aumc_ext")
file.symlink(ext_dir, src_data_dir("aumc_min"))
This mimics manual download and conversion to .fst files (as one might
do when setting up a new data source manually) and minimizes the amount
of configuration information needed during setup (compare the aumc_ext
entry of the data-sources.json provided by this repo with the aumc entry
of the data-sources.json provided by ricu). The final file.symlink()
call lets aumc_min reuse the same imported files.
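Symbolic links are not available on every system (on Windows, for instance, creating them may require extra privileges). A possible fallback is sketched below; link_or_copy() is a hypothetical helper that is not part of ricu, and the demonstration uses temporary directories with an empty placeholder file instead of real data.

```r
# Hypothetical helper (not part of ricu): make `to` refer to the data in
# `from`, copying the files if a symbolic link cannot be created.
link_or_copy <- function(from, to) {
  linked <- suppressWarnings(file.symlink(from, to))
  if (isTRUE(linked)) {
    return("symlink")
  }
  dir.create(to, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(list.files(from, full.names = TRUE), to, recursive = TRUE)
  if (!all(ok)) {
    stop("could not link or copy ", from, " to ", to)
  }
  "copy"
}

# Self-contained demonstration with temporary directories
from <- tempfile("aumc_ext")
dir.create(from)
invisible(file.create(file.path(from, "admissions.fst")))  # empty placeholder
to <- tempfile("aumc_min")

how <- link_or_copy(from, to)
print(how)
print(list.files(to))
```

Copying doubles the disk space used, so the symbolic link remains preferable wherever it works.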
Upon successful setup, data can be loaded from aumc_ext as:
aumc_ext$processitems
#> # <src_tbl>: [256,715 ✖ 6]
#> # ID options: patientid (patient) < admissionid (icustay)
#> # Defaults: `start` (index), `item` (val)
#> # Time vars: `start`, `stop`
#> admissionid itemid item start stop duration
#> <int> <int> <chr> <dbl> <dbl> <int>
#> 1 0 9159 Arterielijn Radialis 20520000 148800000 2138
#> 2 0 9166 Trilumen Jugularis 20520000 148800000 2138
#> 3 0 9174 Swan Ganz Jugularis 20520000 139020000 1975
#> 4 0 9328 Beademen 20520000 69120000 810
#> 5 0 9399 Wonddrain 1 20520000 94320000 1230
#> …
#> 256,711 23551 9422 Perifeer infuus 3900000 165660000 2696
#> 256,712 23551 12634 Tube 3900000 165660000 2696
#> 256,713 23551 13009 Parenchymdrain 3960000 165660000 2695
#> 256,714 23552 9422 Perifeer infuus 60000 64980000 1082
#> 256,715 23552 9422 Perifeer infuus 60000 73320000 1221
#> # … with 256,705 more rows
load_ts(aumc_ext$processitems, itemid == 9159, id_var = "patientid")
load_ts(aumc_ext$processitems, itemid == 9159, id_var = "admissionid")
gluc <- concept("gluc", unit = "mmol/l",
  item("aumc_ext", "numericitems", "itemid", list(c(9947L, 6833L, 9557L)))
)
load_concepts(gluc, id_type = "patient", verbose = FALSE)
#> # A `ts_tbl`: 587,791 ✖ 3
#> # Id var: `patientid`
#> # Units: `gluc` [mmol/l]
#> # Index var: `measuredat` (1 hours)
#> patientid measuredat gluc
#> <int> <drtn> <dbl>
#> 1 0 -271 hours 6.40
#> 2 0 -249 hours 7
#> 3 0 5 hours 11.4
#> 4 0 6 hours 10.7
#> 5 0 9 hours 9.60
#> …
#> 587,787 20325 38 hours 5.30
#> 587,788 20325 39 hours 6.40
#> 587,789 20325 41 hours 5.70
#> 587,790 20325 42 hours 5.40
#> 587,791 20326 13 hours 4.40
#> # … with 587,781 more rows
load_concepts(gluc, id_type = "icustay", verbose = FALSE)
#> # A `ts_tbl`: 771,720 ✖ 3
#> # Id var: `admissionid`
#> # Units: `gluc` [mmol/l]
#> # Index var: `measuredat` (1 hours)
#> admissionid measuredat gluc
#> <int> <drtn> <dbl>
#> 1 0 -271 hours 6.40
#> 2 0 -249 hours 7
#> 3 0 5 hours 11.4
#> 4 0 6 hours 10.7
#> 5 0 9 hours 9.60
#> …
#> 771,716 23551 38 hours 5.30
#> 771,717 23551 39 hours 6.40
#> 771,718 23551 41 hours 5.70
#> 771,719 23551 42 hours 5.40
#> 771,720 23552 13 hours 4.40
#> # … with 771,710 more rows
concept_availability(concepts = c("glu", "alb", "weight"))
#> aumc aumc_ext aumc_min eicu_demo mimic_demo
#> glu TRUE TRUE TRUE TRUE TRUE
#> alb TRUE TRUE TRUE TRUE TRUE
#> weight TRUE FALSE FALSE TRUE TRUE
load_concepts(c("glu", "alb"), "aumc_ext", verbose = FALSE)
#> # A `ts_tbl`: 780,378 ✖ 4
#> # Id var: `admissionid`
#> # Units: `glu` [mg/dL], `alb` [g/dL]
#> # Index var: `measuredat` (1 hours)
#> admissionid measuredat glu alb
#> <int> <drtn> <dbl> <dbl>
#> 1 0 -271 hours 115. NA
#> 2 0 -249 hours 126. NA
#> 3 0 5 hours 205. NA
#> 4 0 6 hours 193. 2.2
#> 5 0 9 hours 173. NA
#> …
#> 780,374 23551 38 hours 95.5 2
#> 780,375 23551 39 hours 115. NA
#> 780,376 23551 41 hours 103. NA
#> 780,377 23551 42 hours 97.3 NA
#> 780,378 23552 13 hours 79.3 NA
#> # … with 780,368 more rows
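The custom gluc concept above is reported in mmol/l, whereas the dictionary concept glu is reported in mg/dL, so the two outputs can be cross-checked by hand. The sketch below assumes the usual conversion factor of about 18.016 (glucose has a molar mass of roughly 180.16 g/mol); whether ricu applies exactly this factor internally is an assumption, and mmol_to_mgdl() is a hypothetical helper.

```r
# Assumed conversion: 1 mmol/l of glucose corresponds to about 18.016 mg/dL
mmol_to_mgdl <- function(x, factor = 18.016) x * factor

# First five glucose values as printed for the custom `gluc` concept
gluc_mmol <- c(6.40, 7, 11.4, 10.7, 9.60)
glu_mgdl <- mmol_to_mgdl(gluc_mmol)

# Round to three significant digits, as in the printed table
print(signif(glu_mgdl, 3))
```

Rounded to three significant digits, the results agree with the first five glu values printed above (115, 126, 205, 193 and 173 mg/dL).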
load_concepts(c("alb", "weight"), "aumc_ext", verbose = FALSE)
#> # A `ts_tbl`: 106,139 ✖ 4
#> # Id var: `admissionid`
#> # Units: `alb` [g/dL], `weight` [kg]
#> # Index var: `measuredat` (1 hours)
#> admissionid measuredat alb weight
#> <int> <drtn> <dbl> <dbl>
#> 1 0 6 hours 2.2 NA
#> 2 1 0 hours 2.9 NA
#> 3 2 0 hours 2.8 NA
#> 4 3 2 hours 2.8 NA
#> 5 4 41 hours 2.5 NA
#> …
#> 106,135 23549 9 hours 2.6 NA
#> 106,136 23550 2 hours 2.2 NA
#> 106,137 23551 1 hours 2.6 NA
#> 106,138 23551 14 hours 2.3 NA
#> 106,139 23551 38 hours 2 NA
#> # … with 106,129 more rows
and from aumc_min as:
aumc_min$processitems
#> # <src_tbl>: [256,715 ✖ 6]
#> admissionid itemid item start stop duration
#> <int> <int> <chr> <dbl> <dbl> <int>
#> 1 0 9159 Arterielijn Radialis 20520000 148800000 2138
#> 2 0 9166 Trilumen Jugularis 20520000 148800000 2138
#> 3 0 9174 Swan Ganz Jugularis 20520000 139020000 1975
#> 4 0 9328 Beademen 20520000 69120000 810
#> 5 0 9399 Wonddrain 1 20520000 94320000 1230
#> …
#> 256,711 23551 9422 Perifeer infuus 3900000 165660000 2696
#> 256,712 23551 12634 Tube 3900000 165660000 2696
#> 256,713 23551 13009 Parenchymdrain 3960000 165660000 2695
#> 256,714 23552 9422 Perifeer infuus 60000 64980000 1082
#> 256,715 23552 9422 Perifeer infuus 60000 73320000 1221
#> # … with 256,705 more rows
load_ts(aumc_min$processitems, itemid == 9159, id_var = "admissionid",
        index_var = "start", time_vars = c("start", "stop"))
gluc <- concept("gluc", unit = "mmol/l",
  item("aumc_min", "numericitems", "itemid", list(c(9947L, 6833L, 9557L)),
       index_var = "measuredat", val_var = "value", unit_var = "unit",
       time_vars = list(c("measuredat", "registeredat", "updatedat")))
)
load_concepts(gluc, id_type = "icustay", verbose = FALSE)
#> # A `ts_tbl`: 771,720 ✖ 3
#> # Id var: `admissionid`
#> # Units: `gluc` [mmol/l]
#> # Index var: `index_var` (1 hours)
#> admissionid index_var gluc
#> <int> <drtn> <dbl>
#> 1 0 -271 hours 6.40
#> 2 0 -249 hours 7
#> 3 0 5 hours 11.4
#> 4 0 6 hours 10.7
#> 5 0 9 hours 9.60
#> …
#> 771,716 23551 38 hours 5.30
#> 771,717 23551 39 hours 6.40
#> 771,718 23551 41 hours 5.70
#> 771,719 23551 42 hours 5.40
#> 771,720 23552 13 hours 4.40
#> # … with 771,710 more rows
load_concepts(c("glu", "alb"), "aumc_min", verbose = FALSE)
#> # A `ts_tbl`: 780,378 ✖ 4
#> # Id var: `admissionid`
#> # Units: `glu` [mg/dL], `alb` [g/dL]
#> # Index var: `index_var` (1 hours)
#> admissionid index_var glu alb
#> <int> <drtn> <dbl> <dbl>
#> 1 0 -271 hours 115. NA
#> 2 0 -249 hours 126. NA
#> 3 0 5 hours 205. NA
#> 4 0 6 hours 193. 2.2
#> 5 0 9 hours 173. NA
#> …
#> 780,374 23551 38 hours 95.5 2
#> 780,375 23551 39 hours 115. NA
#> 780,376 23551 41 hours 103. NA
#> 780,377 23551 42 hours 97.3 NA
#> 780,378 23552 13 hours 79.3 NA
#> # … with 780,368 more rows
Note that for aumc_min, data can only be queried using the icustay ID
type. Calls to load_ts() and related functions, as well as the
instantiation of data items, need to be more verbose, because the
corresponding default values are not available from the configuration.
Similarly, aumc_min items in concept-dict.json have to repeat this
information.
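One way to keep this repetition manageable in scripts is to record the column roles once and look them up per table. The sketch below is plain R and does not call ricu; aumc_min_defaults and table_args() are hypothetical names, and the values are copied from the calls and output shown above.

```r
# Column roles for two aumc_min tables, copied from the calls shown above.
# The list itself is a hypothetical convenience, not part of ricu.
aumc_min_defaults <- list(
  processitems = list(
    id_var = "admissionid", index_var = "start",
    time_vars = c("start", "stop")
  ),
  numericitems = list(
    id_var = "admissionid", index_var = "measuredat",
    val_var = "value", unit_var = "unit",
    time_vars = c("measuredat", "registeredat", "updatedat")
  )
)

# Return the recorded settings for one table, restricted to selected names
table_args <- function(table,
                       keep = c("id_var", "index_var", "time_vars")) {
  defaults <- aumc_min_defaults[[table]]
  if (is.null(defaults)) {
    stop("no defaults recorded for table: ", table)
  }
  defaults[intersect(keep, names(defaults))]
}

ts_args <- table_args("processitems")
str(ts_args)
```

The returned entries correspond to the id_var, index_var and time_vars arguments that the load_ts() call on aumc_min$processitems spells out explicitly.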