Getting Started


Step One: Update logs

The project is oriented around logs/preprocessing_completed.tsv, which is generated by cli/checks.py. The first step, then, is to execute cli/checks.py: it reads the subjects found in the project directory (supplied via the [-p] argument) and looks for a set of expected files, updating logs/preprocessing_completed.tsv with file-generation timestamps for each expected file it encounters.
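
For example (the project path below is a placeholder; supply your own):

```
python cli/checks.py -p /path/to/project_directory
```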

If additional resources are added in the future, such as FreeSurfer and fMRIPrep output, the dictionary expected_dict and the list col_names in resources.reports.check_complete.check_preproc should be updated, and a new logs/preprocessing_completed.tsv generated via the new_df argument; a sketch of such an update follows this paragraph. Also, if only part of your data exists in one location (e.g. on the HPC), you can complete logs/preprocessing_completed.tsv by also running cli/checks.py locally.
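
As a minimal sketch, the update might look like the following; the key, filename, and column name are hypothetical placeholders, so check the actual structure of expected_dict and col_names in resources.reports.check_complete before editing:

```python
# Inside resources.reports.check_complete.check_preproc -- a sketch only.
# Hypothetical: register fMRIPrep output as a tracked resource.
expected_dict["fmriprep"] = ["desc-preproc_bold.nii.gz"]

# Hypothetical: add a matching column to the completion log.
col_names.append("fmriprep_bold")
```

After updating both, regenerate the log via the new_df argument so the new column appears in logs/preprocessing_completed.tsv.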

Note -- a personal access token for github.com/emu-project is required by checks.py.

Step Two: Run cli script

A set of scripts exist in cli for the following workflows:

  • ASHS - ashs.py
  • Refacing data - reface.py
  • AFNI subject task - afni_task_subj.py
  • AFNI subject resting state - afni_resting_subj.py
  • AFNI group task - afni_task_group.py
  • AFNI group resting state - afni_resting_group.py

These scripts are intended to be executed from the command line. Access each script's built-in help via python script_name.py; this prints usage examples, default/optional options, and required options. Default use should work by copying and pasting the example into the terminal.
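
For instance, to print the help of the AFNI subject task workflow:

```
python afni_task_subj.py
```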

The cli scripts will import logs/preprocessing_completed.tsv and use this log to determine which participants need jobs submitted. Each script will then schedule a SLURM job per participant; as resources are limited and shared, the batch size is set to 8 participants but can be adjusted via an optional argument. The parent job for each participant (p1234) is the scheduled workflow, which will spawn child jobs as needed (1234name).
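
Conceptually, the participant-selection step resembles the sketch below; the "subj" column name is an assumption, and the authoritative logic lives in the cli scripts themselves:

```python
import pandas as pd

# Read the completion log maintained by cli/checks.py.
df = pd.read_csv("logs/preprocessing_completed.tsv", sep="\t")

# A participant with any missing timestamp still needs (part of) a workflow.
todo = df[df.isna().any(axis=1)]
print(todo["subj"].tolist())  # "subj" column name is hypothetical
```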

Stdout, stderr

Output can be found in three locations, according to whether it comes from the cli script, parent job, or child job:

  1. Output from cli/script.py will be located in a user-specified location; the examples use the logs directory. This is controlled by the sbatch submission command (see the example after this list).
  2. Output of the parent job (p1234) will be written to a slurm_out directory, most likely located in /scratch/madlab/McMakin_EMUR01/derivatives/<foo>. Here, stdout/err of the workflow, as well as the generated Python workflow script, will be written to a time-stamped directory. This is controlled via the slurm_dir variable in main() of cli/script.py.
  3. Output from child jobs (1234name) will be written to an sbatch_out location, most likely in /scratch/madlab/McMakin_EMUR01/derivatives/<foo>/sub-1234/ses-S?/sbatch_out. This is controlled by resources.afni.submit.submit_hpc_sbatch.
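
For item 1, a minimal submission that routes stdout/stderr to the logs directory might look like the following; the log filenames are illustrative, and cli/script.py stands in for any of the workflow scripts (%j expands to the SLURM job ID):

```
sbatch -o logs/slurm_%j.out -e logs/slurm_%j.err --wrap "python cli/script.py"
```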

Step Three: Check output

For individual-subject workflows, intermediates are written to /scratch on the HPC when using default options. If all assertion checks pass, certain files will then be copied to the main project directory and the directories removed from /scratch. This is controlled by the submit_jobs function in cli/script.py.
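
The pattern is roughly the following sketch; all paths and filenames here are hypothetical, and the authoritative logic is submit_jobs:

```python
import os
import shutil

# Hypothetical locations for one subject's intermediates and final home.
scratch_dir = "/scratch/madlab/McMakin_EMUR01/derivatives/afni/sub-1234"
proj_dir = "/home/data/madlab/McMakin_EMUR01/derivatives/afni/sub-1234"  # hypothetical

# Assertion check: an expected output must exist before anything is kept.
expected = os.path.join(scratch_dir, "final_output.nii.gz")  # hypothetical file
assert os.path.exists(expected), f"some file not found: {expected}"

# Copy the kept file to the project directory, then clean up /scratch.
os.makedirs(proj_dir, exist_ok=True)
shutil.copy2(expected, proj_dir)
shutil.rmtree(scratch_dir)
```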

Group-level jobs (afni_[task|resting]_group.py) have output written directly to the main project directory.

If assertion checks do not pass, check stdout/err in the locations described above. Output from the parent job (number 2 above) should point to the resource module that failed (e.g. AssertionError some file not found: check resources.afni.foo.bar).
