Merge branch 'main' into 20240508/add_assessments
michellewang committed May 8, 2024
2 parents bc9ad1e + 752dc9e commit 3a80be5
Showing 42 changed files with 1,171 additions and 226 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -33,3 +33,7 @@ env/

# VS Code
.vscode/

# docs
nipoppy_cli/docs/build
nipoppy_cli/docs/source/schemas/*.json
27 changes: 27 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,27 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.11"
jobs:
pre_build:
- python nipoppy_cli/docs/scripts/pydantic_to_jsonschema.py

python:
install:
- method: pip
path: nipoppy_cli
extra_requirements:
- doc

# Build documentation with Sphinx
sphinx:
configuration: nipoppy_cli/docs/source/conf.py
fail_on_warning: true
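The `pre_build` job above regenerates JSON schema files from pydantic models before Sphinx runs (these generated `schemas/*.json` files are what the `.gitignore` addition excludes). As a minimal sketch of what such a conversion can look like with pydantic v2 — `GlobalConfig` and its fields here are hypothetical stand-ins, not the actual Nipoppy models:

```python
import json

from pydantic import BaseModel


# Hypothetical stand-in model; the real pre_build script converts the
# package's own pydantic config models, not this example class.
class GlobalConfig(BaseModel):
    dataset_name: str
    sessions: list[str] = []


# pydantic v2 emits a JSON schema directly from the model definition;
# a pre_build hook would write this to docs/source/schemas/<name>.json
# so Sphinx can include it at build time.
schema = GlobalConfig.model_json_schema()
print(json.dumps(schema, indent=2))
```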
44 changes: 24 additions & 20 deletions README.md
@@ -1,26 +1,30 @@
# Nipoppy: Parkinson's Progression Markers Initiative dataset
# Nipoppy

This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset. It is a fork of the main [Nipoppy](https://github.com/neurodatascience/nipoppy) repository. Nipoppy is a lightweight workflow management and harmonization tool for MRI and clinical data. This fork adds scripts, configuration files, and downstream analyses that are specific to PPMI.
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8084759.svg)](https://doi.org/10.5281/zenodo.8084759)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/license/mit)
[![codecov](https://codecov.io/gh/neurodatascience/nipoppy/graph/badge.svg?token=SN38ITRO4M)](https://codecov.io/gh/neurodatascience/nipoppy)
[![https://github.com/psf/black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://black.readthedocs.io/en/stable/)
[![Documentation Status](https://readthedocs.org/projects/nipoppy/badge/?version=latest)](https://nipoppy.readthedocs.io/en/latest/?badge=latest)

## BIDS data file naming
Nipoppy is a lightweight framework for standardized organization and processing of neuroimaging-clinical datasets. Its goal is to help users adopt the
[FAIR](https://www.go-fair.org/fair-principles/) principles
and improve the reproducibility of studies.

<!-- TODO: update link/path once tabular is moved under workflow -->
The [tabular/ppmi_imaging_descriptions.json](https://github.com/neurodatascience/nipoppy-ppmi/blob/main/nipoppy/workflow/tabular/ppmi_imaging_descriptions.json) file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.
The framework includes three components:

Here is a description of the available BIDS data and the tags that can appear in their filenames:
1. A specification for dataset organization that extends the [Brain Imaging Data Structure (BIDS) standard](https://bids.neuroimaging.io/) by providing additional guidelines for tabular (e.g., phenotypic) data and imaging derivatives.

- `anat`
- The available suffixes are: `T1w`, `T2w`, `T2starw`, and `FLAIR`
- Most images have an `acq` tag:
- Non-neuromelanin images: `acq-<plane><type>`, where
- `<plane>` is one of: `sag`, `ax`, or `cor` (for sagittal, axial, or coronal scans respectively)
- `<type>` is one of: `2D`, or `3D`
- Neuromelanin images: `acq-NM`
- For some images, the acquisition plane (`sag`/`ax`/`cor`) or type (`2D`/`3D`) cannot be easily obtained. In those cases, the filename will not contain an `acq` tag.
- `dwi`
- All imaging files have the `dwi` suffix.
- Most images have a `dir` tag corresponding to the phase-encoding direction. This is one of: `LR`, `RL`, `AP`, or `PA`
- Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a `dir` tag.
- Some participants have multi-shell sequences for their diffusion data. These files will have an additional `acq-B<value>` tag, where `value` is the b-value for that sequence.
![Nipoppy specification](nipoppy_cli/docs/source/_static/img/nipoppy_specification.jpg)

Currently, only structural (`anat`) and diffusion (`dwi`) MRI data are supported. Functional (`func`) data has not been converted to the BIDS format yet.
2. A protocol for data organization, curation and processing, with steps that include the following:
- **Organization** of raw data, including conversion of raw DICOMs (or NIfTIs) to [BIDS](https://bids.neuroimaging.io/)
- **Processing** of imaging data with existing or custom pipelines
- **Tracking** of data availability and processing status
- **Extraction** of imaging-derived phenotypes (IDPs) for downstream statistical modelling and analysis

![Nipoppy protocol](nipoppy_cli/docs/source/_static/img/nipoppy_protocol.jpg)

3. A **command-line interface** and **Python package** that provide user-friendly tools for applying the framework. The tools build upon existing technologies such as the [Apptainer container platform](https://apptainer.org/) and the [Boutiques descriptor framework](https://boutiques.github.io/). Several existing containerized pipelines are supported out-of-the-box, and new pipelines can be added easily by the user.
- We have also developed a [**web dashboard**](https://digest.neurobagel.org) for interactive visualizations of imaging and phenotypic data availability.

See the [documentation website](https://neurobagel.org/nipoppy/overview/) for more information!
26 changes: 26 additions & 0 deletions README_PPMI.md
@@ -0,0 +1,26 @@
# Nipoppy: Parkinson's Progression Markers Initiative dataset

This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset. It is a fork of the main [Nipoppy](https://github.com/neurodatascience/nipoppy) repository. Nipoppy is a lightweight workflow management and harmonization tool for MRI and clinical data. This fork adds scripts, configuration files, and downstream analyses that are specific to PPMI.

## BIDS data file naming

<!-- TODO: update link/path once tabular is moved under workflow -->
The [tabular/ppmi_imaging_descriptions.json](https://github.com/neurodatascience/nipoppy-ppmi/blob/main/nipoppy/workflow/tabular/ppmi_imaging_descriptions.json) file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.

Here is a description of the available BIDS data and the tags that can appear in their filenames:

- `anat`
- The available suffixes are: `T1w`, `T2w`, `T2starw`, and `FLAIR`
- Most images have an `acq` tag:
- Non-neuromelanin images: `acq-<plane><type>`, where
- `<plane>` is one of: `sag`, `ax`, or `cor` (for sagittal, axial, or coronal scans respectively)
- `<type>` is one of: `2D`, or `3D`
- Neuromelanin images: `acq-NM`
- For some images, the acquisition plane (`sag`/`ax`/`cor`) or type (`2D`/`3D`) cannot be easily obtained. In those cases, the filename will not contain an `acq` tag.
- `dwi`
- All imaging files have the `dwi` suffix.
- Most images have a `dir` tag corresponding to the phase-encoding direction. This is one of: `LR`, `RL`, `AP`, or `PA`
- Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a `dir` tag.
- Some participants have multi-shell sequences for their diffusion data. These files will have an additional `acq-B<value>` tag, where `value` is the b-value for that sequence.

Currently, only structural (`anat`) and diffusion (`dwi`) MRI data are supported. Functional (`func`) data has not been converted to the BIDS format yet.
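The `anat` tagging scheme above can be sketched as a small helper. This is illustrative only — the function name and entity ordering are not part of the repository:

```python
def build_anat_filename(subject, session, suffix,
                        plane=None, acq_type=None, neuromelanin=False):
    """Assemble a BIDS-style anat filename following the tagging scheme
    described above (illustrative helper, not part of the repo)."""
    entities = [f"sub-{subject}", f"ses-{session}"]
    if neuromelanin:
        # Neuromelanin images always get acq-NM
        entities.append("acq-NM")
    elif plane and acq_type:
        # e.g. acq-sag3D for a sagittal 3D scan
        entities.append(f"acq-{plane}{acq_type}")
    # When plane/type cannot be inferred, the acq tag is omitted entirely
    entities.append(suffix)
    return "_".join(entities) + ".nii.gz"


print(build_anat_filename("001", "BL", "T1w", plane="sag", acq_type="3D"))
# → sub-001_ses-BL_acq-sag3D_T1w.nii.gz
```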
67 changes: 0 additions & 67 deletions docs/README.md

This file was deleted.

4 changes: 2 additions & 2 deletions nipoppy/extractors/fmriprep/run_FC.py
@@ -229,7 +229,7 @@ def run(participant_id: str,
if output_dir is None:
output_dir = f"{DATASET_ROOT}/derivatives/"

fmriprep_dir = f"{DATASET_ROOT}/derivatives/fmriprep/{FMRIPREP_VERSION}/output"
fmriprep_dir = f"{DATASET_ROOT}/derivatives/fmriprep/v{FMRIPREP_VERSION}/output"
DKT_dir = f"{DATASET_ROOT}/derivatives/networks/0.9.0/output"
FC_dir = f"{output_dir}/FC"

@@ -290,4 +290,4 @@ def run(participant_id: str,
with open(FC_config_file, 'r') as f:
FC_configs = json.load(f)

run(participant_id, global_configs, FC_configs, session_id, output_dir)
run(participant_id, global_configs, FC_configs, session_id, output_dir)
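The change in this hunk adds a `v` prefix to the pipeline version directory under `derivatives/`. A sketch of the resulting path convention — the helper function is illustrative, since the repository builds these paths inline with f-strings:

```python
def derivatives_output_dir(dataset_root: str, pipeline: str, version: str) -> str:
    """Build the version-prefixed derivatives path used in the diff above.

    Illustrative helper; not a function that exists in the repository.
    """
    return f"{dataset_root}/derivatives/{pipeline}/v{version}/output"


print(derivatives_output_dir("/data/ppmi", "fmriprep", "20.2.7"))
# → /data/ppmi/derivatives/fmriprep/v20.2.7/output
```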
51 changes: 27 additions & 24 deletions nipoppy/extractors/freesurfer/run_structural_measures.py
@@ -14,28 +14,30 @@
# Globals
# Brainload has two separate functions to extract aseg data.
measure_column_names = ["StructName","Structure","Description","Volume_mm3", "unit"]
aseg_cols = ["StructName", "Volume_mm3"]
dkt_cols = ["StructName", "ThickAvg"]

def get_aseg_stats(participant_stats_dir, aseg_cols):
""" Parses the aseg.stats file
"""
aseg_cols = ["StructName", "Volume_mm3"]
aseg_stats = bl.stat(f'{participant_stats_dir}/aseg.stats')
table_df = pd.DataFrame(aseg_stats["table_data"], columns=aseg_stats["table_column_headers"])[aseg_cols]
measure_df = pd.DataFrame(data=aseg_stats["measures"], columns=measure_column_names)[aseg_cols]
_df = pd.concat([table_df,measure_df],axis=0)
return _df

def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
def get_DKT_stats(participant_stats_dir, dkt_cols, parcel="aparc.DKTatlas"):
""" Parses the <>.aparc.DKTatlas.stats file
"""
hemi = "lh"
stat_file = f"{hemi}.{parcel}.stats"
lh_dkt_stats = bl.stat(f'{participant_stats_dir}/{stat_file}')
lh_df = pd.DataFrame(lh_dkt_stats["table_data"], columns=lh_dkt_stats["table_column_headers"])[aparc_cols]
lh_df = pd.DataFrame(lh_dkt_stats["table_data"], columns=lh_dkt_stats["table_column_headers"])[dkt_cols]
lh_df["hemi"] = hemi

hemi = "rh"
stat_file = f"{hemi}.{parcel}.stats"
rh_dkt_stats = bl.stat(f'{participant_stats_dir}/rh.aparc.DKTatlas.stats')
rh_df = pd.DataFrame(rh_dkt_stats["table_data"], columns=rh_dkt_stats["table_column_headers"])[aparc_cols]
rh_dkt_stats = bl.stat(f'{participant_stats_dir}/{stat_file}')
rh_df = pd.DataFrame(rh_dkt_stats["table_data"], columns=rh_dkt_stats["table_column_headers"])[dkt_cols]
rh_df["hemi"] = hemi

_df = pd.concat([lh_df,rh_df], axis=0)
@@ -52,17 +54,16 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
parser.add_argument('--FS_config', type=str, help='path to freesurfer configs for a given nipoppy dataset', required=True)
parser.add_argument('--participants_list', default=None, help='path to participants list (csv or tsv)')
parser.add_argument('--session_id', type=str, help='session id for the participant', required=True)
parser.add_argument('--save_dir', default='./', help='path to save_dir')
parser.add_argument('--output_dir', default=None, help='path to save extracted output (default: derivatives/freesurfer/<version>/IDP/<session>)')

args = parser.parse_args()

global_config_file = args.global_config
FS_config_file = args.FS_config
participants_list = args.participants_list
session_id = args.session_id
save_dir = args.save_dir

session = f"ses-{session_id}"
output_dir = args.output_dir

# Read global configs
with open(global_config_file, 'r') as f:
@@ -77,9 +78,12 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
stat_configs = FS_configs["stat_configs"]
stat_config_names = stat_configs.keys()

print(f"Using dataset root: {DATASET_ROOT} and FreeSurfer version: {FS_version}")
print(f"Using dataset root: {DATASET_ROOT} and FreeSurfer version: v{FS_version}")
print(f"Using stat configs: {stat_config_names}")

if output_dir == None:
output_dir = f"{DATASET_ROOT}/derivatives/freesurfer/v{FS_version}/IDP/{session}/"

if participants_list == None:
# use doughnut
doughnut_file = f"{DATASET_ROOT}/scratch/raw_dicom/doughnut.csv"
@@ -97,17 +101,17 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):


# Extract stats for each participant
fs_output_dir = f"{DATASET_ROOT}/derivatives/freesurfer/{FS_version}/output/{session}/"
fs_output_dir = f"{DATASET_ROOT}/derivatives/freesurfer/v{FS_version}/output/{session}/"

aseg_df = pd.DataFrame()
aparc_df = pd.DataFrame()
dkt_df = pd.DataFrame()
for participant_id in bids_participants:
participant_stats_dir = f"{fs_output_dir}{participant_id}/stats/"
print(f"Extracting stats for participant: {participant_id}")

for config_name, config_cols in stat_configs.items():
print(f"Extracting data for config: {config_name}")
if config_name.strip() == "aseg":
if config_name.strip().lower() == "aseg":
try:
_df = get_aseg_stats(participant_stats_dir, config_cols)
# transpose it to wideform
@@ -122,36 +126,35 @@ def get_aparc_stats(participant_stats_dir, aparc_cols, parcel="aparc.DKTatlas"):
except:
print(f"Error parsing aseg data for {participant_id}")

elif config_name.strip() == "aparc":
elif config_name.strip().lower() == "dkt":
try:
_df = get_aparc_stats(participant_stats_dir, config_cols)
_df = get_DKT_stats(participant_stats_dir, config_cols)
# transpose it to wideform
names_col = config_cols[0]
values_col = config_cols[1]
cols = ["participant_id"] + list(_df["hemi"] + "." + _df[names_col])
vals = [participant_id] + list(_df[values_col])
_df_wide = pd.DataFrame(columns=cols)
_df_wide.loc[0] = vals
aparc_df = pd.concat([aparc_df,_df_wide], axis=0)
dkt_df = pd.concat([dkt_df,_df_wide], axis=0)

except Exception as e:
print(f"Error parsing aparc data for {participant_id} with exception: {e}")
print(f"Error parsing dkt data for {participant_id} with exception: {e}")

else:
print(f"Unknown stat config: {config_name}")

# Save configs
print(f"Saving collated stat tables at: {save_dir}")
aseg_csv = f"{save_dir}/aseg.csv"
aparc_csv = f"{save_dir}/aparc.csv"
print(f"Saving collated stat tables at: {output_dir}")
aseg_csv = f"{output_dir}/aseg.csv"
dkt_csv = f"{output_dir}/dkt.csv"

if len(aseg_df) > 0:
aseg_df.to_csv(aseg_csv, index=None)
else:
print("aseg_df is empty")

if len(aparc_df) > 0:
aparc_df.to_csv(aparc_csv, index=None)
if len(dkt_df) > 0:
dkt_df.to_csv(dkt_csv, index=None)
else:
print("aparc_df is empty")

print("dkt_df is empty")
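The per-participant long-to-wide transpose used above for both the aseg and DKT tables can be illustrated in isolation. The structure names and volumes below are made-up sample values:

```python
import pandas as pd

# Sample long-form stats for one participant (values are made up).
long_df = pd.DataFrame({
    "StructName": ["Left-Hippocampus", "Right-Hippocampus"],
    "Volume_mm3": [4100.0, 4250.0],
})

# Transpose to wide form: one row per participant, one column per
# structure, mirroring the loop body in the script above.
names_col, values_col = "StructName", "Volume_mm3"
cols = ["participant_id"] + list(long_df[names_col])
vals = ["sub-001"] + list(long_df[values_col])
wide_df = pd.DataFrame(columns=cols)
wide_df.loc[0] = vals

print(wide_df.shape)  # → (1, 3)
```

Rows built this way for each participant are then concatenated with `pd.concat` into the collated table that gets written to CSV.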