Nathan Muncy edited this page Mar 9, 2022 · 5 revisions

The logs directory serves two purposes. First, it is a convenient output location for stdout and stderr: it is the default output location for the cli/name.py scripts when they are executed according to the examples.

Second, and much more importantly, logs contains completed_preprocessing.tsv. This file is generated by cli/checks.py, specifically by the module resources.reports.check_complete.check_preproc. Every script in cli, aside from checks.py, first loads completed_preprocessing.tsv as a dataframe to determine which participants are missing data and should potentially be included in the next batch. This design obviates the need to keep all files on the remote: files can be brought locally to free up remote space, while remote scripts can still determine which data are missing.
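A minimal sketch of that lookup, assuming pandas and hypothetical column names (the real columns are defined by resources.reports.check_complete.check_preproc):

```python
import io
import pandas as pd

# Hypothetical snippet of logs/completed_preprocessing.tsv; the column
# names here are illustrative, not the project's actual field names.
tsv = io.StringIO(
    "subjID\tfmriprep_bold\tfsl_stat\n"
    "sub-01\t2022-03-01T10:02\t2022-03-02T08:15\n"
    "sub-02\t2022-03-01T11:47\t\n"
    "sub-03\t\t\n"
)
df = pd.read_csv(tsv, sep="\t")

# Subjects with any empty endpoint cell are missing data and are
# candidates for the next processing batch.
missing = df[df.drop(columns="subjID").isna().any(axis=1)]["subjID"].tolist()
print(missing)  # sub-02 lacks fsl_stat; sub-03 lacks both endpoints
```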

The logs/completed_preprocessing.tsv is always synchronized with the GitHub project repo, and updates are only written to empty cells. In this way cli/checks.py can be executed locally to index the files that have been offloaded, and the remote can then pull the updated file to determine which subjects to submit.
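The "update empty cells only" rule can be expressed with pandas' combine_first, which keeps existing values and fills only NaN cells. A hedged illustration with made-up subject and column names:

```python
import numpy as np
import pandas as pd

# Hypothetical repo copy of completed_preprocessing.tsv: sub-02's
# endpoint cell is still empty.
repo = pd.DataFrame(
    {"subjID": ["sub-01", "sub-02"],
     "fmriprep_bold": ["2022-03-01T10:02", np.nan]}
).set_index("subjID")

# Hypothetical locally generated update after indexing offloaded files.
local = pd.DataFrame(
    {"subjID": ["sub-01", "sub-02"],
     "fmriprep_bold": ["2022-03-09T09:00", "2022-03-09T09:30"]}
).set_index("subjID")

# combine_first keeps non-null repo values and fills gaps from local,
# so existing timestamps are never overwritten.
updated = repo.combine_first(local)
print(updated.loc["sub-01", "fmriprep_bold"])  # unchanged
print(updated.loc["sub-02", "fmriprep_bold"])  # filled from local
```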

completed_preprocessing.tsv is a wide-format dataframe, containing one row per subject and one column per endpoint file (a file that marks the end of a certain pipeline stream and is needed for future analyses). Additional fields can be added by updating the module, but a new dataframe must then be generated (via new_df=True). Each cell contains a timestamp of when the endpoint file was generated.
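How a cell acquires its timestamp can be sketched as follows, again with hypothetical file and column names; the actual detection logic lives in resources.reports.check_complete.check_preproc:

```python
import datetime
import os
import tempfile

import pandas as pd

# One row per subject, one column per endpoint file; cells start empty.
df = pd.DataFrame(
    {"subjID": ["sub-01"], "fmriprep_bold": [pd.NA]}
).set_index("subjID")

with tempfile.TemporaryDirectory() as deriv_dir:
    # Stand-in for a real pipeline endpoint file on disk.
    endpoint = os.path.join(deriv_dir, "sub-01_desc-preproc_bold.nii.gz")
    open(endpoint, "w").close()

    # If the endpoint exists, record its modification time in the cell.
    if os.path.exists(endpoint):
        stamp = datetime.datetime.fromtimestamp(
            os.path.getmtime(endpoint)
        ).strftime("%Y-%m-%d %H:%M:%S")
        df.loc["sub-01", "fmriprep_bold"] = stamp

print(df.loc["sub-01", "fmriprep_bold"])
```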
