Merge branch 'main' into 20240508/add_assessments
michellewang committed May 8, 2024
2 parents bc9ad1e + 752dc9e commit 3a80be5
Showing 42 changed files with 1,171 additions and 226 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -33,3 +33,7 @@ env/

# VS Code
.vscode/

# docs
nipoppy_cli/docs/build
nipoppy_cli/docs/source/schemas/*.json
27 changes: 27 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,27 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.11"
jobs:
pre_build:
- python nipoppy_cli/docs/scripts/pydantic_to_jsonschema.py

python:
install:
- method: pip
path: nipoppy_cli
extra_requirements:
- doc

# Build documentation with Sphinx
sphinx:
configuration: nipoppy_cli/docs/source/conf.py
fail_on_warning: true
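The `pre_build` job above regenerates JSON schema files from pydantic models before Sphinx runs (these generated `schemas/*.json` files are what the `.gitignore` addition excludes). As a minimal sketch of what such a conversion can look like with pydantic v2 — `GlobalConfig` and its fields here are hypothetical stand-ins, not the actual Nipoppy models:

```python
import json

from pydantic import BaseModel


# Hypothetical stand-in model; the real pre_build script converts the
# package's own pydantic config models, not this example class.
class GlobalConfig(BaseModel):
    dataset_name: str
    sessions: list[str] = []


# pydantic v2 emits a JSON schema directly from the model definition;
# a pre_build hook would write this to docs/source/schemas/<name>.json
# so Sphinx can include it at build time.
schema = GlobalConfig.model_json_schema()
print(json.dumps(schema, indent=2))
```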
44 changes: 24 additions & 20 deletions README.md
@@ -1,26 +1,30 @@
# Nipoppy: Parkinson's Progression Markers Initiative dataset
# Nipoppy

This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset. It is a fork of the main [Nipoppy](https://github.com/neurodatascience/nipoppy) repository. Nipoppy is a lightweight workflow management and harmonization tool for MRI and clinical data. This fork adds scripts, configuration files, and downstream analyses that are specific to PPMI.
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8084759.svg)](https://doi.org/10.5281/zenodo.8084759)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/license/mit)
[![codecov](https://codecov.io/gh/neurodatascience/nipoppy/graph/badge.svg?token=SN38ITRO4M)](https://codecov.io/gh/neurodatascience/nipoppy)
[![https://github.com/psf/black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://black.readthedocs.io/en/stable/)
[![Documentation Status](https://readthedocs.org/projects/nipoppy/badge/?version=latest)](https://nipoppy.readthedocs.io/en/latest/?badge=latest)

## BIDS data file naming
Nipoppy is a lightweight framework for standardized organization and processing of neuroimaging-clinical datasets. Its goal is to help users adopt the
[FAIR](https://www.go-fair.org/fair-principles/) principles
and improve the reproducibility of studies.

<!-- TODO: update link/path once tabular is moved under workflow -->
The [tabular/ppmi_imaging_descriptions.json](https://github.com/neurodatascience/nipoppy-ppmi/blob/main/nipoppy/workflow/tabular/ppmi_imaging_descriptions.json) file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.
The framework includes three components:

Here is a description of the available BIDS data and the tags that can appear in their filenames:
1. A specification for dataset organization that extends the [Brain Imaging Data Structure (BIDS) standard](https://bids.neuroimaging.io/) by providing additional guidelines for tabular (e.g., phenotypic) data and imaging derivatives.

- `anat`
- The available suffixes are: `T1w`, `T2w`, `T2starw`, and `FLAIR`
- Most images have an `acq` tag:
- Non-neuromelanin images: `acq-<plane><type>`, where
- `<plane>` is one of: `sag`, `ax`, or `cor` (for sagittal, axial, or coronal scans respectively)
- `<type>` is one of: `2D`, or `3D`
- Neuromelanin images: `acq-NM`
- For some images, the acquisition plane (`sag`/`ax`/`cor`) or type (`2D`/`3D`) cannot be easily obtained. In those cases, the filename will not contain an `acq` tag.
- `dwi`
- All imaging files have the `dwi` suffix.
- Most images have a `dir` tag corresponding to the phase-encoding direction. This is one of: `LR`, `RL`, `AP`, or `PA`
- Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a `dir` tag.
- Some participants have multi-shell sequences for their diffusion data. These files will have an additional `acq-B<value>` tag, where `value` is the b-value for that sequence.
![Nipoppy specification](nipoppy_cli/docs/source/_static/img/nipoppy_specification.jpg)

Currently, only structural (`anat`) and diffusion (`dwi`) MRI data are supported. Functional (`func`) data has not been converted to the BIDS format yet.
2. A protocol for data organization, curation and processing, with steps that include the following:
- **Organization** of raw data, including conversion of raw DICOMs (or NIfTIs) to [BIDS](https://bids.neuroimaging.io/)
- **Processing** of imaging data with existing or custom pipelines
- **Tracking** of data availability and processing status
- **Extraction** of imaging-derived phenotypes (IDPs) for downstream statistical modelling and analysis

![Nipoppy protocol](nipoppy_cli/docs/source/_static/img/nipoppy_protocol.jpg)

3. A **command-line interface** and **Python package** that provide user-friendly tools for applying the framework. The tools build upon existing technologies such as the [Apptainer container platform](https://apptainer.org/) and the [Boutiques descriptor framework](https://boutiques.github.io/). Several existing containerized pipelines are supported out-of-the-box, and new pipelines can be added easily by the user.
- We have also developed a [**web dashboard**](https://digest.neurobagel.org) for interactive visualizations of imaging and phenotypic data availability.

See the [documentation website](https://neurobagel.org/nipoppy/overview/) for more information!
26 changes: 26 additions & 0 deletions README_PPMI.md
@@ -0,0 +1,26 @@
# Nipoppy: Parkinson's Progression Markers Initiative dataset

This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset. It is a fork of the main [Nipoppy](https://github.com/neurodatascience/nipoppy) repository. Nipoppy is a lightweight workflow management and harmonization tool for MRI and clinical data. This fork adds scripts, configuration files, and downstream analyses that are specific to PPMI.

## BIDS data file naming

<!-- TODO: update link/path once tabular is moved under workflow -->
The [tabular/ppmi_imaging_descriptions.json](https://github.com/neurodatascience/nipoppy-ppmi/blob/main/nipoppy/workflow/tabular/ppmi_imaging_descriptions.json) file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.

Here is a description of the available BIDS data and the tags that can appear in their filenames:

- `anat`
- The available suffixes are: `T1w`, `T2w`, `T2starw`, and `FLAIR`
- Most images have an `acq` tag:
- Non-neuromelanin images: `acq-<plane><type>`, where
- `<plane>` is one of: `sag`, `ax`, or `cor` (for sagittal, axial, or coronal scans respectively)
- `<type>` is one of: `2D`, or `3D`
- Neuromelanin images: `acq-NM`
- For some images, the acquisition plane (`sag`/`ax`/`cor`) or type (`2D`/`3D`) cannot be easily obtained. In those cases, the filename will not contain an `acq` tag.
- `dwi`
- All imaging files have the `dwi` suffix.
- Most images have a `dir` tag corresponding to the phase-encoding direction. This is one of: `LR`, `RL`, `AP`, or `PA`
- Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a `dir` tag.
- Some participants have multi-shell sequences for their diffusion data. These files will have an additional `acq-B<value>` tag, where `value` is the b-value for that sequence.

Currently, only structural (`anat`) and diffusion (`dwi`) MRI data are supported. Functional (`func`) data has not been converted to the BIDS format yet.
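The `anat` tagging scheme above can be sketched as a small helper. This is illustrative only — the function name and entity ordering are not part of the repository:

```python
def build_anat_filename(subject, session, suffix,
                        plane=None, acq_type=None, neuromelanin=False):
    """Assemble a BIDS-style anat filename following the tagging scheme
    described above (illustrative helper, not part of the repo)."""
    entities = [f"sub-{subject}", f"ses-{session}"]
    if neuromelanin:
        # Neuromelanin images always get acq-NM
        entities.append("acq-NM")
    elif plane and acq_type:
        # e.g. acq-sag3D for a sagittal 3D scan
        entities.append(f"acq-{plane}{acq_type}")
    # When plane/type cannot be inferred, the acq tag is omitted entirely
    entities.append(suffix)
    return "_".join(entities) + ".nii.gz"


print(build_anat_filename("001", "BL", "T1w", plane="sag", acq_type="3D"))
# → sub-001_ses-BL_acq-sag3D_T1w.nii.gz
```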
67 changes: 0 additions & 67 deletions docs/README.md

This file was deleted.

4 changes: 2 additions & 2 deletions nipoppy/extractors/fmriprep/run_FC.py
@@ -229,7 +229,7 @@ def run(participant_id: str,
if output_dir is None:
output_dir = f"{DATASET_ROOT}/derivatives/"

fmriprep_dir = f"{DATASET_ROOT}/derivatives/fmriprep/{FMRIPREP_VERSION}/output"
fmriprep_dir = f"{DATASET_ROOT}/derivatives/fmriprep/v{FMRIPREP_VERSION}/output"
DKT_dir = f"{DATASET_ROOT}/derivatives/networks/0.9.0/output"
FC_dir = f"{output_dir}/FC"

@@ -290,4 +290,4 @@ def run(participant_id: str,
with open(FC_config_file, 'r') as f:
FC_configs = json.load(f)

run(participant_id, global_configs, FC_configs, session_id, output_dir)
run(participant_id, global_configs, FC_configs, session_id, output_dir)
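The change in this hunk adds a `v` prefix to the pipeline version directory under `derivatives/`. A sketch of the resulting path convention — the helper function is illustrative, since the repository builds these paths inline with f-strings:

```python
def derivatives_output_dir(dataset_root: str, pipeline: str, version: str) -> str:
    """Build the version-prefixed derivatives path used in the diff above.

    Illustrative helper; not a function that exists in the repository.
    """
    return f"{dataset_root}/derivatives/{pipeline}/v{version}/output"


print(derivatives_output_dir("/data/ppmi", "fmriprep", "20.2.7"))
# → /data/ppmi/derivatives/fmriprep/v20.2.7/output
```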
51 changes: 27 additions & 24 deletions nipoppy/extractors/freesurfer/run_structural_measures.py
@@ -14,28 +14,30 @@
# Globals
# Brainload has two separate functions to extract aseg data.
measure_column_names = ["StructName","Structure","Description","Volume_mm3", "unit"]
aseg_cols = ["StructName", "Volume_mm3"]
dkt_cols = ["StructName", "ThickAvg"]

def get_aseg_stats(participant_stats_dir, aseg_cols):
""" Parses the aseg.stats file
"""
aseg_cols = ["StructName", "Volume_mm3"]
aseg_stats = bl.stat(f'{participant_stats_dir}/aseg.stats')
table_df = pd.DataFrame(aseg_stats["table_data"], columns=aseg_stats["table_column_headers"])[aseg_cols]
measure_df = pd.DataFrame(data=aseg_stats["measures"], columns=measure_column_names)[aseg_cols]
_df = pd.concat([table_df,measure_df],axis=0)
return _df

def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
def get_DKT_stats(participant_stats_dir, dkt_cols, parcel="aparc.DKTatlas"):
""" Parses the <>.aparc.DKTatlas.stats file
"""
hemi = "lh"
stat_file = f"{hemi}.{parcel}.stats"
lh_dkt_stats = bl.stat(f'{participant_stats_dir}/{stat_file}')
lh_df = pd.DataFrame(lh_dkt_stats["table_data"], columns=lh_dkt_stats["table_column_headers"])[aparc_cols]
lh_df = pd.DataFrame(lh_dkt_stats["table_data"], columns=lh_dkt_stats["table_column_headers"])[dkt_cols]
lh_df["hemi"] = hemi

hemi = "rh"
stat_file = f"{hemi}.{parcel}.stats"
rh_dkt_stats = bl.stat(f'{participant_stats_dir}/rh.aparc.DKTatlas.stats')
rh_df = pd.DataFrame(rh_dkt_stats["table_data"], columns=rh_dkt_stats["table_column_headers"])[aparc_cols]
rh_dkt_stats = bl.stat(f'{participant_stats_dir}/{stat_file}')
rh_df = pd.DataFrame(rh_dkt_stats["table_data"], columns=rh_dkt_stats["table_column_headers"])[dkt_cols]
rh_df["hemi"] = hemi

_df = pd.concat([lh_df,rh_df], axis=0)
@@ -52,17 +54,16 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
parser.add_argument('--FS_config', type=str, help='path to freesurfer configs for a given nipoppy dataset', required=True)
parser.add_argument('--participants_list', default=None, help='path to participants list (csv or tsv)')
parser.add_argument('--session_id', type=str, help='session id for the participant', required=True)
parser.add_argument('--save_dir', default='./', help='path to save_dir')
parser.add_argument('--output_dir', default=None, help='path to save extracted output (default: derivatives/freesurfer/<version>/IDP/<session>)')

args = parser.parse_args()

global_config_file = args.global_config
FS_config_file = args.FS_config
participants_list = args.participants_list
session_id = args.session_id
save_dir = args.save_dir

session = f"ses-{session_id}"
output_dir = args.output_dir

# Read global configs
with open(global_config_file, 'r') as f:
@@ -77,9 +78,12 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
stat_configs = FS_configs["stat_configs"]
stat_config_names = stat_configs.keys()

print(f"Using dataset root: {DATASET_ROOT} and FreeSurfer version: {FS_version}")
print(f"Using dataset root: {DATASET_ROOT} and FreeSurfer version: v{FS_version}")
print(f"Using stat configs: {stat_config_names}")

if output_dir == None:
output_dir = f"{DATASET_ROOT}/derivatives/freesurfer/v{FS_version}/IDP/{session}/"

if participants_list == None:
# use doughnut
doughnut_file = f"{DATASET_ROOT}/scratch/raw_dicom/doughnut.csv"
@@ -97,17 +101,17 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):


# Extract stats for each participant
fs_output_dir = f"{DATASET_ROOT}/derivatives/freesurfer/{FS_version}/output/{session}/"
fs_output_dir = f"{DATASET_ROOT}/derivatives/freesurfer/v{FS_version}/output/{session}/"

aseg_df = pd.DataFrame()
aparc_df = pd.DataFrame()
dkt_df = pd.DataFrame()
for participant_id in bids_participants:
participant_stats_dir = f"{fs_output_dir}{participant_id}/stats/"
print(f"Extracting stats for participant: {participant_id}")

for config_name, config_cols in stat_configs.items():
print(f"Extracting data for config: {config_name}")
if config_name.strip() == "aseg":
if config_name.strip().lower() == "aseg":
try:
_df = get_aseg_stats(participant_stats_dir, config_cols)
# transpose it to wideform
@@ -122,36 +126,35 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
except:
print(f"Error parsing aseg data for {participant_id}")

elif config_name.strip() == "aparc":
elif config_name.strip().lower() == "dkt":
try:
_df = get_aparc_stats(participant_stats_dir, config_cols)
_df = get_DKT_stats(participant_stats_dir, config_cols)
# transpose it to wideform
names_col = config_cols[0]
values_col = config_cols[1]
cols = ["participant_id"] + list(_df["hemi"] + "." + _df[names_col])
vals = [participant_id] + list(_df[values_col])
_df_wide = pd.DataFrame(columns=cols)
_df_wide.loc[0] = vals
aparc_df = pd.concat([aparc_df,_df_wide], axis=0)
dkt_df = pd.concat([dkt_df,_df_wide], axis=0)

except Exception as e:
print(f"Error parsing aparc data for {participant_id} with exception: {e}")
print(f"Error parsing dkt data for {participant_id} with exception: {e}")

else:
print(f"Unknown stat config: {config_name}")

# Save configs
print(f"Saving collated stat tables at: {save_dir}")
aseg_csv = f"{save_dir}/aseg.csv"
aparc_csv = f"{save_dir}/aparc.csv"
print(f"Saving collated stat tables at: {output_dir}")
aseg_csv = f"{output_dir}/aseg.csv"
dkt_csv = f"{output_dir}/dkt.csv"

if len(aseg_df) > 0:
aseg_df.to_csv(aseg_csv, index=None)
else:
print("aseg_df is empty")

if len(aparc_df) > 0:
aparc_df.to_csv(aparc_csv, index=None)
if len(dkt_df) > 0:
dkt_df.to_csv(dkt_csv, index=None)
else:
print("aparc_df is empty")

print("dkt_df is empty")
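The per-participant long-to-wide transpose used above for both the aseg and DKT tables can be illustrated in isolation. The structure names and volumes below are made-up sample values:

```python
import pandas as pd

# Sample long-form stats for one participant (values are made up).
long_df = pd.DataFrame({
    "StructName": ["Left-Hippocampus", "Right-Hippocampus"],
    "Volume_mm3": [4100.0, 4250.0],
})

# Transpose to wide form: one row per participant, one column per
# structure, mirroring the loop body in the script above.
names_col, values_col = "StructName", "Volume_mm3"
cols = ["participant_id"] + list(long_df[names_col])
vals = ["sub-001"] + list(long_df[values_col])
wide_df = pd.DataFrame(columns=cols)
wide_df.loc[0] = vals

print(wide_df.shape)  # → (1, 3)
```

Rows built this way for each participant are then concatenated with `pd.concat` into the collated table that gets written to CSV.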