Getting Started
The project is oriented around `logs/preprocessing_completed.tsv`, which is generated by `cli/checks.py`. The first step, then, is to execute `cli/checks.py`. This will read the subjects found in the project directory (supplied via the `-p` argument) and look for a set of expected files. The file `logs/preprocessing_completed.tsv` will be updated with timestamps of file generation for any encountered files.
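For example, a first run might look like the line below. The project path is a placeholder; only the `-p` flag is taken from this README, so consult the script's built-in help for the full argument list.

```bash
# point checks.py at the project directory (placeholder path)
python cli/checks.py -p /path/to/project_dir
```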
If additional resources are added in the future, such as FreeSurfer or fMRIPrep, the dictionary `expected_dict` and list `col_names` in `resources.reports.check_complete.check_preproc` should be updated, and a new `logs/preprocessing_completed.tsv` generated via the `new_df` argument. Also, if only part of your data exists in one location (e.g. the HPC), you can fill out your `logs/preprocessing_completed.tsv` by also running `cli/checks.py` locally.
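As a rough illustration, such an update might look like the sketch below. The real definitions live in `resources.reports.check_complete.check_preproc`; every key, file name, and column name shown here is hypothetical.

```python
# Hypothetical sketch -- the actual structures are defined in
# resources.reports.check_complete.check_preproc.
expected_dict = {
    # ... existing resources stay as-is ...
    "fmriprep": [  # hypothetical new resource
        "desc-preproc_bold.nii.gz",
        "desc-confounds_timeseries.tsv",
    ],
}
col_names = [
    # ... existing column names stay as-is ...
    "fmriprep_preproc",  # hypothetical new columns
    "fmriprep_confounds",
]
```

After updating both, regenerate the log via the `new_df` argument so the new columns appear in `logs/preprocessing_completed.tsv`.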
Note -- a personal access token for github.com/emu-project is required by `checks.py`.
A set of scripts exists in `cli` for the following workflows:
- ASHS - `ashs.py`
- Refacing data - `reface.py`
- AFNI subject task - `afni_task_subj.py`
- AFNI subject resting state - `afni_resting_subj.py`
- AFNI group task - `afni_task_group.py`
- AFNI group resting state - `afni_resting_group.py`
These scripts are intended to be executed from the command line. Access the help built into these scripts via `python script_name.py`. This will print usage examples, default/optional options, and required options. Default use should work by copy-and-pasting the example into the terminal.
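For instance, running a script without arguments prints its help (the script chosen here is just one of the workflows listed above):

```bash
# print usage examples and options for the AFNI subject task workflow
python cli/afni_task_subj.py
```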
The `cli` scripts will import `logs/preprocessing_completed.tsv` and use this log to determine which participants the script should submit jobs for. The `cli` scripts will then schedule a job with SLURM for each of a set of participants; as resources are limited and shared, the batch size is set to 8 participants but can be adjusted via an optional argument. The parent job for each participant (p1234) is the scheduled workflow, and will spawn child jobs as needed (1234name).
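The overall pattern is roughly the sketch below. This is illustrative only, not the project's actual code; the column name, completeness test, workflow command, and sbatch options are all assumptions.

```python
# Illustrative sketch of the cli-script scheduling pattern described above.
import subprocess
import pandas as pd

# read the preprocessing log (tab-separated)
df = pd.read_csv("logs/preprocessing_completed.tsv", sep="\t")

# keep participants still missing an expected file (hypothetical test/column)
todo = df[df.isna().any(axis=1)]["subj"].tolist()

batch_size = 8  # default noted above; adjustable via an optional argument
for subj in todo[:batch_size]:
    # schedule the parent workflow job (p1234) with SLURM
    subprocess.run(
        ["sbatch", f"--job-name=p{subj}",
         f"--wrap=python workflow.py {subj}"],  # hypothetical workflow script
        check=True,
    )
```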
Output can be found in three locations, according to whether it comes from the `cli` script, parent job, or child job:
- Output from `cli/script.py` will be located in a user-specified location; the examples use the logs directory. This is controlled by the sbatch submission command.
- Output of the parent job (p1234) will be written to a slurm_out directory, most likely located in `/scratch/madlab/McMakin_EMUR01/derivatives/<foo>`. Here, stdout/err of the workflow, as well as the generated python workflow script, will be written to a time-stamped directory. This is controlled via the `slurm_dir` variable in `main()` of `cli/script.py`.
- Output from child jobs (1234name) will be written to an sbatch_out location, most likely in `/scratch/madlab/McMakin_EMUR01/derivatives/<foo>/sub-1234/ses-S?/sbatch_out`. This is controlled by `resources.afni.submit.submit_hpc_sbatch`.
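For the child-job case, the submission typically just points sbatch's output and error streams at that directory. The sketch below is hypothetical; the real implementation is `resources.afni.submit.submit_hpc_sbatch`, and the session name, wrapped command, and path details are assumptions.

```python
# Hypothetical sketch of directing child-job (1234name) output to sbatch_out.
import subprocess
from pathlib import Path

# "foo" and "ses-S1" stand in for the actual derivative and session names
out_dir = Path(
    "/scratch/madlab/McMakin_EMUR01/derivatives/foo/sub-1234/ses-S1/sbatch_out"
)
out_dir.mkdir(parents=True, exist_ok=True)

subprocess.run(
    ["sbatch",
     "--job-name=1234name",
     f"--output={out_dir}/%x.out",
     f"--error={out_dir}/%x.err",
     "--wrap=3dinfo sub-1234_bold.nii.gz"],  # hypothetical AFNI command
    check=True,
)
```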
For individual subject workflows, intermediates are written to `/scratch` on the HPC when using default options. If all assertion checks pass, certain files will then be copied to the main project directory and the directories removed from `/scratch`. This is controlled by the `submit_jobs` function in `cli/script.py`.
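Conceptually, that clean-up step behaves like this sketch. It is illustrative only; the real logic lives in `submit_jobs`, and the project path and file names here are made up.

```python
# Illustrative sketch of the copy-then-clean pattern described above;
# the actual behavior is implemented in submit_jobs of cli/script.py.
import shutil
from pathlib import Path

scratch_dir = Path("/scratch/madlab/McMakin_EMUR01/derivatives/foo/sub-1234")
proj_dir = Path("/path/to/project/derivatives/foo/sub-1234")  # hypothetical

keep_files = ["final_output.nii.gz"]  # hypothetical files to keep

# assertion checks: everything we intend to keep must exist
for name in keep_files:
    assert (scratch_dir / name).exists(), f"some file not found: {name}"

# copy kept files to the main project directory, then clear scratch
proj_dir.mkdir(parents=True, exist_ok=True)
for name in keep_files:
    shutil.copy2(scratch_dir / name, proj_dir / name)
shutil.rmtree(scratch_dir)
```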
Group-level jobs (`afni_[task|resting]_group.py`) have output written directly to the main project directory.
If assertion checks do not pass, check stdout/err in the locations described above. Output from the parent job (the second location above) should point to the resource module that failed (e.g. `AssertionError: some file not found, check resources.afni.foo.bar`).