# ScottishLabData: a pipeline for harmonising NHS Scotland laboratory data to enable national-level analyses
In Scotland, laboratory data are captured, stored and reported by local health board systems with significant heterogeneity. For researchers and other users, working with laboratory datasets across regional cohorts therefore requires substantial effort and time.
The overarching requirement is to develop an open-source software pipeline that generates a harmonised laboratory dataset.
Some additional functional requirements are:
- Merging the different data tables in the database into a single, common table structure.
- Standardising the censored results.
- Mapping all test codes to Read Codes.
- Based on test frequency, focusing on the top 100 most frequently recorded Read Codes in each Safe Haven (180 unique Read Codes in total).
- Standardising test units, so that all tests with the same code are reported in the same unit.
- Technical validation of the harmonised and cleaned data through visualisation.
Two non-functional requirements were also identified:
- Operate in an environment where compute resources may be limited.
- Handling the large scale of the data, utilising tools/methods for best performance.
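As a sketch of the censored-result requirement above: censored values often arrive as strings such as "<5" or ">1000" (the exact formats vary by health board and are an assumption here, not taken from the pipeline, which implements this step in SQL). A minimal Python illustration of splitting such strings into a numeric value and a censor flag:

```python
import re

def parse_result(raw: str):
    """Split a raw laboratory result into (value, censor_flag).

    Censored results are assumed to look like "<5" or "> 1000";
    uncensored results are plain numbers. The flag is one of
    "left" (below detection limit), "right" (above) or "none".
    """
    raw = raw.strip()
    match = re.match(r"^([<>])\s*(\d+(?:\.\d+)?)$", raw)
    if match:
        flag = "left" if match.group(1) == "<" else "right"
        return float(match.group(2)), flag
    return float(raw), "none"

# Harmonise a mixed batch of results into a uniform representation
parsed = [parse_result(r) for r in ["<5", "12.3", "> 1000"]]
```

Keeping the censor flag alongside the numeric value lets downstream analyses decide how to treat detection-limit results, rather than silently discarding them.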
We completed the required data governance applications and extracted laboratory tests (biochemistry, haematology, immunology, virology, pathology and microbiology) from all registered SHARE participants who were resident in the region of each of the four regional Safe Havens from 01/01/2015 to 31/12/2021 (or until death).
The data were provided as flat files from the Safe Havens to HIC and loaded into a Microsoft SQL Server database (version 16.0.1) for ease of handling the large data volumes. Using WhiteRabbit, sampling one million rows from each flat file, we tested the data types, completeness (fraction empty) and consistency (N unique and fraction unique) of each Safe Haven's laboratory results. We used the Rabbit-In-a-Hat software to read and display the WhiteRabbit scan results. Rabbit-In-a-Hat allows a user to connect source data to tables and columns within the target data structure through a graphical user interface. Starting from the SQL skeleton auto-generated by Rabbit-In-a-Hat, the mapping was further edited in consultation with the local Safe Havens to establish the final SQL script for structural harmonisation.
Data exploration and harmonisation were conducted using a mixture of SQL and R (version 4.1.3).
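The scan metrics described above (fraction empty, N unique, fraction unique) can be illustrated with a short sketch. This is a simplified re-implementation of those column statistics for explanation only, not WhiteRabbit itself:

```python
def profile_column(values) -> dict:
    """Compute simple profile metrics for one column, mirroring the
    kind of statistics a WhiteRabbit scan reports: fraction empty,
    number of unique values and fraction unique.
    (Illustrative re-implementation, not the WhiteRabbit tool.)"""
    values = list(values)
    n = len(values)
    empty = sum(1 for v in values if v is None or str(v).strip() == "")
    unique = len({v for v in values if v is not None and str(v).strip() != ""})
    return {
        "n_rows": n,
        "fraction_empty": empty / n if n else 0.0,
        "n_unique": unique,
        "fraction_unique": unique / n if n else 0.0,
    }

# Profile a toy column of test codes (codes here are placeholders)
stats = profile_column(["44J3.", "44J3.", "", "423.."])
```

Running such a profile per column, per Safe Haven, makes it easy to spot columns that are largely empty or that carry unexpectedly many distinct values before designing the mapping.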
Repository sub-directories:

- SQL code for table structure/: SQL code for harmonising the structure of the data, standardising the censored results and mapping all test codes to Read Codes.
- R code for data harmonisation/: R scripts to select the 180 codes, standardise the test units and produce the visualisations.
- Paper Appendix/: appendix for the published paper related to this work.
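The code-selection step performed by the R scripts can be sketched as follows: take the top 100 most frequently recorded Read Codes per Safe Haven, then take the union across Safe Havens (overlap between regions reduces the union to the 180 unique codes reported above). A toy Python illustration, with hypothetical codes and k=2 for brevity:

```python
from collections import Counter

def top_codes(records, k=100):
    """Return the k most frequently recorded Read Codes in one
    Safe Haven's extract; `records` is an iterable of code strings."""
    return [code for code, _ in Counter(records).most_common(k)]

# Toy data; real extracts hold millions of rows per Safe Haven,
# and the codes below are placeholders, not the pipeline's actual list.
havens = {
    "DaSH":    ["44J3.", "44J3.", "423.."],
    "Lothian": ["423..", "44I5.", "44I5."],
}
national = sorted(set().union(*(top_codes(r, k=2) for r in havens.values())))
```

With k=100 and four Safe Havens, the same union-of-top-k logic yields the 180 unique Read Codes because many common tests appear in every region's top 100.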
Tables loaded into the database:
| Table name | Description |
|---|---|
| dbo.DaSH | DaSH all data |
| dbo.SCI_Store_20220802 | Glasgow all data |
| dbo.Labs_Biochem | HIC biochemistry data |
| dbo.HaematologyRestructured | HIC haematology data |
| dbo.ImmunologyRestructured | HIC immunology data |
| dbo.VirologyRestructured | HIC virology data |
| dbo.Lothian | Lothian all data |
| dbo.Lothian_TestCode2ReadCode | Lothian local code to Read Code mapping |
The SQL scripts should be run in the sequence marked "a", "b" and "c" in the file name. Scripts sharing the same mark can be run in either order; for example, "a DaSH2FHIRscot.sql" can be run before or after "a Glasgow2FHIRscot.sql".
The R scripts should be run in the sequence marked "0" to "6" in the file name. Scripts sharing the same mark can be run in either order.
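As an illustration of the unit-standardisation step handled by the R scripts: results for the same Read Code can be rescaled to a single target unit via a lookup of conversion factors. The table below is hypothetical (a single cholesterol entry using the standard mg/dL to mmol/L factor); the real mappings live in the R code:

```python
# Hypothetical conversion table: (read_code, source_unit) -> (target_unit, factor).
# Illustrative content only; the pipeline's actual mappings are in the R scripts.
CONVERSIONS = {
    ("44P..", "mg/dL"): ("mmol/L", 0.02586),  # serum cholesterol
}

def standardise_unit(read_code, value, unit):
    """Rescale a result so every test with the same Read Code is
    reported in the same unit; results already in the target unit
    (or with no known conversion) pass through unchanged."""
    target, factor = CONVERSIONS.get((read_code, unit), (unit, 1.0))
    return value * factor, target

# e.g. 200 mg/dL of cholesterol rescales to about 5.17 mmol/L
value, unit = standardise_unit("44P..", 200, "mg/dL")
```

Keeping the conversion table keyed by both code and source unit makes the pass-through behaviour explicit: only combinations that are known to need rescaling are touched.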
These research data cover only a subset of health board data, and no inference or implication should be made regarding the relative quality, accuracy or performance of individual health boards.
We acknowledge the support of the Grampian Data Safe Haven (DaSH) facility within the Aberdeen Centre for Health Data Science and the associated financial support of the University of Aberdeen, and NHS Research Scotland (through NHS Grampian investment in DaSH). This work was supported by DataLoch (dataloch.org), which is core-funded by the Data-Driven Innovation programme within the Edinburgh and South East Scotland City Region Deal (ddi.ac.uk) and the Chief Scientist Office, Scottish Government. The authors would like to acknowledge the support of the DataLoch service for their involvement in obtaining approvals and provisioning access to data from the Lothian region. The authors would like to acknowledge the work of the West of Scotland Safe Haven team at NHS Greater Glasgow and Clyde in supporting the extractions and linkage to de-identified NHS patient datasets. We acknowledge the support of the Health Informatics Centre, University of Dundee for managing and supplying the anonymised data and NHS Tayside/Fife the original data source. This project was supported by funding from Research Data Scotland (RDS) through the Systems Development Fund.