From 5f89ab63c7b3db579c81a6a1d7d0b8f414f9f067 Mon Sep 17 00:00:00 2001 From: Ray Pomponio Date: Thu, 27 Aug 2020 11:12:54 -0400 Subject: [PATCH] version 1.0.x with working NIFTI functionality --- README.rst | 114 ++++++++++++++++++++++++++++++++--------------------- setup.py | 4 +- 2 files changed, 71 insertions(+), 47 deletions(-) diff --git a/README.rst b/README.rst index 9047284..630aa4f 100644 --- a/README.rst +++ b/README.rst @@ -3,7 +3,7 @@ neuroHarmonize ============== Harmonization tools for multi-site neuroimaging analysis. Part of the work -reported in our first paper with data from the ISTAGING consoritum [1]_. +reported in our paper with data from the ISTAGING consoritum [1]_. contact: raymond (dot) pomponio (at) outlook (dot) com @@ -13,35 +13,23 @@ Overview This package extends the functionality of the package developed by Nick Cullen [2]_, ``neuroCombat``, which is hosted on GitHub: https://github.com/ncullen93/neuroCombat -The previously-released package, ``neuroCombat``, allows the user to perform a +Cullen's package, ``neuroCombat``, allows the user to perform a harmonization procedure using the ComBat [3]_ algorithm for correcting multi-site data. -This package, ``neuroHarmonize``, has similar functionality, but also allows the -user to perform additional procedures: - -1. Train a harmonization model on a subset of data, then apply the model to the - new set. For example, in longitudinal analyses, one may wish to train a - harmonization model on baseline cases and apply the model to follow-up cases, - to avoid double-counting subjects. -2. Specify covariates with nonlinear effects. Age tends to exhibit nonlinear - relationships with brain volumes. Nonlinear effects are implemented using - Generalized Additive Models (GAMs) via the ``statsmodels`` package. -3. Apply a pre-trained harmonization model to NIFTI images. When performing - image-level harmonization, loading the entire set of images may exceed - memory capacity. In such cases, it is still possible to harmonize images by - sequentially adjusting images one-by-one. This functionality is made - available via the ``nibabel`` package. -4. Train a harmonization model without the empirical Bayes (EB) step of ComBat. - -*Note: Biological covariate effects are modeled but not explicitly removed, as -per the original formulation of ComBat. Removing covariate effects can be -achieved with an external model, or by using the argument* `return_s_data` +This package, ``neuroHarmonize``, provides similar functionality, with additional +features: + +1. Support for working with NIFTI images. Implemented with the ``nibabel`` package. +2. Separate train/test datasets. +3. Specify covariates with generic nonlinear effects. Implemented using + Generalized Additive Models (GAMs) from the ``statsmodels`` package. +4. Skip the empirical Bayes (EB) step of ComBat, if desired. Installation ------------ -Latest version: ``0.4.x`` (August 2020) +Latest version: ``1.0.x`` (August 27, 2020) Requirements: @@ -58,7 +46,7 @@ cannot be released on PyPI until a developer version of statsmodels is released. **Option 2: Install from GitHub** -1. Install the developer version of ``statsmodels``. This package depends on ``statsmodels v0.12.0.dev0``. Until the dev version is released, the current workaround is to run the following in the command line: +1. Install the developer version of ``statsmodels``. This package depends on ``statsmodels v0.12.0.dev0``. Until the dev version is released on PyPI, the current workaround is to run the following in the command line: >>> pip install git+https://github.com/statsmodels/statsmodels @@ -69,6 +57,10 @@ cannot be released on PyPI until a developer version of statsmodels is released. Quick Start ----------- +*Please note that the ComBat [3]_ algorithm corrects for site effects but +intentionally preserves covariate effects.* If you wish to remove covariate +effects as well you can use the argument ``return_s_data``. + You must provide a **data matrix** which is a ``numpy.array`` containing the features to be harmonized. For example, an ``array`` of brain volumes: @@ -117,12 +109,53 @@ Example usage: >>> # run harmonization and store the adjusted data >>> my_model, my_data_adj = harmonizationLearn(my_data, covars) -If you wish to skip the empirical Bayes step of ComBat, simply pass the optional -argument ``eb=False`` to ``harmonizationLearn``. +The dimensionality of the matrix ``my_data_adj`` will be identical to +``my_data``: **N_samples x N_features** + +Working with NIFTI Images +------------------------- + +To work with NIFTI images, you compute a mask using a ``pandas.DataFrame`` which +contains file paths for all of the images in the **training set**. + + >>> import pandas as pd + >>> import numpy as np + >>> from neuroHarmonize.harmonizationNIFTI import createMaskNIFTI + >>> nifti_list = pd.read_csv('brain_images_paths.csv') + >>> nifti_avg, nifti_mask, affine = createMaskNIFTI(nifti_list, threshold=0) + +After the mask is created, you can flatten the images to a 2D ``numpy.array`` +very similar to what is done with the tabular data example above. + + >>> from neuroHarmonize.harmonizationNIFTI import flattenNIFTIs + >>> nifti_array = flattenNIFTIs(nifti_list, 'thresholded_mask.nii.gz') + +The next step is identical to working with tabular data. You simply pass the 2D +array to ``neuroHarmonize.harmonizationLearn``. + + >>> import neuroHarmonize as nh + >>> covars = pd.read_csv('subject_info.csv') + >>> my_model, nifti_array_adj = nh.harmonizationLearn(nifti_array, covars) + >>> nh.saveHarmonizationModel(my_model, 'MY_MODEL') + +Lastly, you can apply the model sequentially to images in a larger dataset with +``applyModelNIFTIs``. When performing NIFTI harmonization, loading the entire set +of images may exceed memory capacity. This function will reduce the burden on +memory by applying the model to images one-by-one and saving the results as NIFTIs. + + >>> from neuroHarmonize.harmonizationNIFTI import applyModelNIFTIs + >>> # load pre-trained model + >>> my_model = nh.loadHarmonizationModel('MY_MODEL') + >>> applyModelNIFTIs(covars, my_model, nifti_list, 'thresholded_mask.nii.gz') Applying Pre-Trained Models to New Data --------------------------------------- +This feature allows you to train a harmonization model on a subset of data, then +apply the model to the entire set. For example, in longitudinal analyses, one may +wish to train a harmonization model on baseline cases and apply the model to +follow-up cases, to avoid double-counting subjects. + If you have previously trained a harmonization model with ``harmonizationLearn``, you may apply the model parameters to new data with ``harmonizationApply``. @@ -146,23 +179,13 @@ After preparing the holdout data simply apply the model: >>> covars = pd.read_csv('subject_info_holdout.csv') >>> my_holdout_data_adj = harmonizationApply(my_holdout_data, covars, my_model) -Empirical Bayes ---------------- - -Note the default behavior is to run the empirical Bayes (EB) step of ComBat, which -is useful for harmonizing multiple features that are similar such as genes or -brain regional volumes. - -To run without EB, specify ``eb=False`` in ``harmonizationLearn``. This is -convenient when harmonizing a small number of features, e.g. fewer than 10. - Specifying Nonlinear Covariate Effects -------------------------------------- You may specify nonlinear covariate effects with the optional argument: ``smooth_terms``. For example, you may want to specify age as a nonlinear -term in the harmonization model. This can be done easily with -``harmonizationLearn``: +term in the harmonization model, if age exhibits nonlinear relationships with +brain volumes. This can be done easily with ``harmonizationLearn``: >>> from neuroHarmonize import harmonizationLearn >>> import pandas as pd @@ -181,15 +204,16 @@ The current workaround is to use the optional argument: ``smooth_term_bounds``, which controls the boundary knots for nonlinear estimation. You should specify boundaries that contain the limits of the entire dataset, including holdout data. -Working with NIFTI Images -------------------------- - -*This feature is currently in development.* +Empirical Bayes +--------------- - >>> from neuroHarmonize.harmonizationNIFTI import create_NIFTI_mask, flatten_NIFTIs +Note the default behavior is to run the empirical Bayes (EB) step of ComBat, which +is useful for harmonizing multiple features that are similar such as genes or +brain regional volumes. -Visualize Fits of EB Priors ---------------------------- +To run without EB, simply pass the optional argument ``eb=False`` to +``harmonizationLearn``. This is convenient when harmonizing a small number of +features, e.g. fewer than 10. When ``eb=True``, ComBat uses Empirical Bayes to fit a prior distribution for the site effects for each site. You may wish to visualize fit of the prior diff --git a/setup.py b/setup.py index 0cbb5b6..ad648a9 100644 --- a/setup.py +++ b/setup.py @@ -5,11 +5,11 @@ def readme(): return f.read() setup(name='neuroHarmonize', - version='0.4.1', + version='1.0.0', description='Harmonization tools for multi-center neuroimaging studies.', long_description=readme(), url='https://github.com/rpomponio/neuroHarmonize', - author='Ray Pomponio', + author='Raymond Pomponio', author_email='raymond.pomponio@outlook.edu', license='MIT', packages=['neuroHarmonize'],