Nathan Muncy edited this page Mar 15, 2022 · 21 revisions

Description

The cli section is the entry point for those who wish to control the workflows (and therefore the resources) from the command line. Each script (save one) corresponds to a separate pipeline, orients itself, and submits a specified number of jobs to the Slurm scheduler. For each pipeline script, a scheduled coordinator job determines which participants to submit work for, and a parent job is then submitted for each subject that meets the criteria. By default, eight participants (n=8) are submitted at a time to avoid hogging resources. Finally, intermediates can be kept via the [--keep-interm] option for the subject-level scripts.
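
The batching behavior described above can be pictured with a short sketch. This is not the project's submit_jobs code: `select_batch` and the subject IDs are hypothetical names used only for illustration, and the default of 8 is taken from the text.

```python
from typing import Iterable, List


def select_batch(candidates: Iterable[str], batch_size: int = 8) -> List[str]:
    """Return at most batch_size subjects, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for subj in candidates:
        if len(batch) == batch_size:
            break
        batch.append(subj)
    return batch


# Hypothetical list of subjects that still need work
todo = [f"sub-{i:04d}" for i in range(1, 13)]
print(select_batch(todo))
```

Under this sketch, subjects left out of a batch would simply be picked up the next time the pipeline script is submitted.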

Checks - checks.py

Rather than corresponding to a dedicated pipeline, checks.py is a preparatory script. It checks the state of the specified project directory against a dictionary of expected files in order to generate logs/completed_preprocessing.tsv, where the value of each cell is the timestamp from when the detected file was generated. The remaining scripts orient themselves using this TSV. checks.py is written to be executed both locally, to index data found on the NAS, and remotely, to index data found on the HPC. Each execution initiates a git pull, then updates completed_preprocessing.tsv, and concludes with a git commit and push. In this way, the GitHub repo can serve as the go-between for the NAS and HPC.

checks.py only orients to participants who have not been excluded or withdrawn, according to github.com/emu-project/data_pulling/data_pulling/data/pseudo_guid_list.csv. This approach was chosen over orienting to all subjects in a certain directory, say BIDS/dset, to help make sure that only consented data is included in analyses. Given the note below, logs/completed_preprocessing.tsv should be regenerated periodically to make sure that current consent is reflected in this document.

Note -- checks.py only checks blank fields and only appends; existing entries are left as they are. In this way you can index partial datasets in two locations (HPC, NAS) and have one comprehensive file. This note is particularly relevant when new fields are added to logs/completed_preprocessing.tsv, which requires a new dataframe to be generated.
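
A minimal sketch of the "blank fields only, append only" behavior, assuming the log can be represented as a mapping from subject to column values. The function `update_blank_fields`, the column names, and the timestamps are all hypothetical and are not taken from checks.py.

```python
from typing import Dict

Row = Dict[str, str]


def update_blank_fields(existing: Dict[str, Row], detected: Dict[str, Row]) -> Dict[str, Row]:
    """Fill blank cells from detected values and append unseen subjects.

    Cells that already hold a timestamp are never overwritten.
    """
    merged = {subj: dict(row) for subj, row in existing.items()}
    for subj, new_row in detected.items():
        row = merged.setdefault(subj, {})  # append subject if not yet logged
        for field, stamp in new_row.items():
            if not row.get(field):  # only blank or missing cells are filled
                row[field] = stamp
    return merged


# Hypothetical columns and timestamps: one run on the NAS, a later run on the HPC
nas_log = {"sub-0001": {"t1w": "2022-03-01 10:00", "fmriprep": ""}}
hpc_found = {
    "sub-0001": {"t1w": "2022-03-10 09:00", "fmriprep": "2022-03-12 14:30"},
    "sub-0002": {"t1w": "", "fmriprep": "2022-03-12 15:00"},
}
combined = update_blank_fields(nas_log, hpc_found)
print(combined)
```

Because filled cells are kept, stale entries persist until the file is regenerated, which is why periodic regeneration matters for reflecting current consent.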

Finally, as a special feature, the module resources.reports.check_complete.check_preproc can also be used to check if a single subject has all the required data. The report will be reflected in an updated logs/completed_preprocessing.tsv. Maybe this is useful?

It is intended that this script is run prior to each pipeline submission and, if updated records are required, executed again once the submitted work has finished. As it requires an internet connection, use the login node if working on the HPC (the script is lightweight).

Usage

Usage ought to be quite simple: the only required argument, [-t], takes a personal access token (PAT) for github.com/emu-project (used for cloning pseudo_guid_list.csv as well as managing logs/completed_preprocessing.tsv). It is recommended to set the PAT as a global variable in your environment for ease of use. The [-p] option is also useful for specifying which directory to index.

python checks.py -t $TOKEN_GITHUB_EMU

ASHS - ashs.py

Controlling automated segmentations of hippocampal subfields (ASHS) is accomplished through ashs.py.

  • First, this script will read in logs/completed_preprocessing.tsv and from that dataframe make a list of potential subjects.
  • Second, it will search through those potential subjects for those who (a) have a blank field in the log, and (b) have both T1- and T2-weighted files in the project "dset" directory.
  • Third, for each subject in a batch of size N, the script will then submit a workflow.control_ashs. This is accomplished via the submit_jobs function.
  • Fourth, output is saved to project directory derivatives/ashs.

A Singularity image of the ASHS container is required, as is the location of a set of ASHS atlases.

Usage

Submit an sbatch job, capture the stdout/err in logs. Passing the path to the code directory is required.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runAshs \
    --output=${code_dir}/logs/runAshs_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    ashs.py \
    -c $code_dir \
    -s /path/to/ashs_latest.simg

Refacing - reface.py

Controlling AFNI's @afni_refacer_run is accomplished through reface.py. All refacing methods (deface, reface, reface_plus) are supported via the method option (default=reface). The workflow steps are as follows:

  1. This script will read in logs/completed_preprocessing.tsv and from that dataframe make a list of potential subjects.
  2. Determine which potential subjects have a T1-weighted file but not a refaced file.
  3. For each subject in a batch of size N, the script will then submit a workflow.control_reface. This is accomplished via the submit_jobs function.
  4. Output is saved to project directory derivatives/<reface>.

Usage

Submit an sbatch job, capture the stdout/err in logs. Passing the path to the code directory is required.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runReface \
    --output=${code_dir}/logs/runReface_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    reface.py \
    -c $code_dir

fMRIprep - fmriprep.py

Take data from dset and produce freesurfer and fmriprep derivatives. This work is a precursor to the various AFNI workflows (below). Steps are as follows:

  1. Update the templateflow directory in /scratch to combat the purge.
  2. Determine which participants in dset do not have the pre-processed T1-weighted file output by fMRIprep.
  3. Submit workflow.control_fmriprep for a batch of said subjects.

Output will be saved to the respective derivatives folder in the project directory.
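
Step 2 above amounts to comparing dset against the fMRIprep derivatives. The sketch below assumes the derivatives live in one folder per subject and relies on fMRIprep's `desc-preproc_T1w` file naming; the function name and directory arguments are hypothetical and not taken from fmriprep.py.

```python
from pathlib import Path
from typing import List


def subjects_missing_fmriprep(dset: Path, fmriprep_dir: Path) -> List[str]:
    """List subjects in dset lacking a pre-processed T1w file from fMRIprep."""
    todo: List[str] = []
    for subj_dir in sorted(dset.glob("sub-*")):
        if not subj_dir.is_dir():
            continue
        out_dir = fmriprep_dir / subj_dir.name
        # Search recursively, since the anat folder may sit under a session
        done = out_dir.is_dir() and any(out_dir.rglob("*desc-preproc_T1w.nii.gz"))
        if not done:
            todo.append(subj_dir.name)
    return todo
```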

Usage

Submit an sbatch job, capture the stdout/err in logs. Specifying the code directory is required.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runPrep \
    --output=${code_dir}/logs/runPrep_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    fmriprep.py \
    -c $code_dir

AFNI subject task - afni_task_subj.py

Extra pre-processing steps not done by fMRIprep, as well as a setup for an AFNI-style deconvolution, are conducted by afni_task_subj.py. Steps are as follows:

  1. This script will read in logs/completed_preprocessing.tsv and from that dataframe make a list of potential subjects.
  2. Determine whether a decon_plan has been supplied, and if so read in the JSON.
  3. Determine which subjects have fMRIprep output AND are missing an eroded white matter mask, session-task intersection mask, session-task scaled files, and/or the intended deconvolution file.
  4. For each subject in a batch of size N, the script will then submit a workflow.control_afni.control_preproc to build afni_data, and also workflow.control_afni.control_deconvolution. This is accomplished via the submit_jobs function.
  5. Certain output is saved to project directory derivatives/afni, and housekeeping is conducted (see submit_jobs).
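
Steps 2 and 3 above can be sketched roughly as below. The column names in `REQUIRED_AFNI`, the `fmriprep` column, and both function names are hypothetical, and nothing is assumed about the internal structure of the decon_plan JSON; the sketch only shows the optional read and the "has fMRIprep output AND is missing at least one AFNI output" test.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Hypothetical log columns standing in for the eroded white matter mask,
# intersection mask, scaled files, and deconvolution file
REQUIRED_AFNI = ("wm_eroded", "intersect_mask", "scaled", "decon")


def load_decon_plan(plan_path: Optional[Path]) -> Optional[dict]:
    """Read the decon plan if one was supplied, otherwise return None."""
    if plan_path is None:
        return None
    with open(plan_path) as jf:
        return json.load(jf)


def needs_afni_work(row: Dict[str, str]) -> bool:
    """True when fMRIprep output exists and any required AFNI field is blank."""
    has_fmriprep = bool(row.get("fmriprep"))
    missing_any = any(not row.get(col) for col in REQUIRED_AFNI)
    return has_fmriprep and missing_any
```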

Naming conventions largely follow the BIDS specification, but some AFNI-esque filenames are also employed.

Usage

Submit an sbatch job, capture the stdout/err in logs. Specifying the session, task, and code directory is required.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runAfniTask \
    --output=${code_dir}/logs/runAfniTask_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    afni_task_subj.py \
    -s ses-S2 \
    -t task-test \
    -c $code_dir \
    --blur

AFNI subject resting - afni_resting_subj.py

Extra pre-processing steps not done by fMRIprep, as well as a setup for an AFNI-style resting state regression, are conducted by afni_resting_subj.py. Steps are as follows:

  1. This script will read in logs/completed_preprocessing.tsv and from that dataframe make a list of potential subjects.
  2. Determine which subjects have fMRIprep output AND are missing an eroded white matter mask, session-task intersection mask, session-task scaled files, and/or the intended correlation matrix.
  3. For each subject in a batch of size N, the script will then submit a workflow.control_afni.control_preproc to build afni_data, and also workflow.control_afni.control_resting. This is accomplished via the submit_jobs function.
  4. Certain output is saved to project directory derivatives/afni, and housekeeping is conducted (see submit_jobs).

Naming conventions largely follow the BIDS specification, but some AFNI-esque filenames are also employed.

Usage

Submit an sbatch job, capture the stdout/err in logs. Specifying the code directory is required, and --blur is not recommended.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runAfniRest \
    --output=${code_dir}/logs/runAfniRest_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    afni_resting_subj.py \
    -c $code_dir

AFNI group task - afni_task_group.py

Group pairwise comparisons are conducted via the 3dttest++ implementation of Equitable Thresholding and Clustering (ETAC). The steps employed are as follows:

  1. This script will read in logs/completed_preprocessing.tsv and from that dataframe make a list of potential subjects.
  2. Determine which subjects have both an EPI-anat intersection mask and a deconvolved file.
  3. Submit the group-level module workflow.control_afni.control_task_group. This is accomplished via the submit_jobs function.
  4. Output is saved to project directory derivatives/afni/analyses.

Usage

Submit an sbatch job, capture the stdout/err in logs. The --blur option should be identical to that used in cli/afni_task_subj.py. Specifying the code directory, session, task, decon filename, and list of behaviors is required.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runTaskGroup \
    --output=${code_dir}/logs/runAfniTaskGroup_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    afni_task_group.py \
    --blur \
    -c $code_dir \
    -s ses-S1 \
    -t task-study \
    -d decon_task-study_UniqueBehs_stats_REML+tlrc \
    -b neg neu

AFNI group resting - afni_resting_group.py

Conduct an A vs not-A analysis via 3dttest++ ETAC methods. The steps employed are:

  1. This script will read in logs/completed_preprocessing.tsv and from that dataframe make a list of potential subjects.
  2. Determine which subjects have both an EPI-anat intersection mask and a Z-transformed file.
  3. Submit the group-level module workflow.control_afni.control_resting_group. This is accomplished via the submit_jobs function.
  4. Output is saved to project directory derivatives/afni/analyses.

Usage

Submit an sbatch job, capture the stdout/err in logs. The --blur option should be identical to that used in cli/afni_resting_subj.py. Specifying the code directory and seed name (used to generate the Z-transformed matrix) is required.

code_dir="$(dirname "$(pwd)")"

sbatch --job-name=runRSGroup \
    --output=${code_dir}/logs/runAfniRestGroup_log \
    --mem-per-cpu=4000 \
    --partition=IB_44C_512G \
    --account=iacc_madlab \
    --qos=pq_madlab \
    afni_resting_group.py \
    -c $code_dir \
    -s rPCC

Are you still awake?