resources reports
The sub-package reports contains the single module check_complete, which consists of two functions: clone_guid and check_preproc.
clone_guid is used to clone pseudo_guid_list.csv from the private repository github.com/data_pulling.git. This CSV file contains subject identifiers, NDA GUID keys, consent information, and extra notes. clone_guid returns the CSV as a dataframe.
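The clone-then-load behaviour described above could be sketched roughly as follows. This is not the module's actual code: the function signature, the injectable run argument, the data_pulling target directory, and the assumption that the CSV sits at the repository root are all hypothetical, and a list of row dictionaries stands in for the dataframe.

```python
import csv
import subprocess
from pathlib import Path


def clone_guid(repo_url, work_dir, run=subprocess.run):
    """Clone repo_url into work_dir and return pseudo_guid_list.csv as rows.

    Hypothetical sketch: the real function returns a dataframe, whereas
    this one returns a list of dicts, one per CSV row.
    """
    repo_dir = Path(work_dir) / "data_pulling"
    # check=True makes subprocess.run raise if git exits with an error.
    run(["git", "clone", repo_url, str(repo_dir)], check=True)
    # Assumption: the CSV lives at the root of the cloned repository.
    with open(repo_dir / "pseudo_guid_list.csv", newline="") as csv_file:
        return list(csv.DictReader(csv_file))
```

Passing the command runner as an argument keeps the sketch testable without network access or credentials for the private repository.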
check_preproc is the function called by the script cli/checks.py. It contains the dictionary expected_dict, which is specified by the researcher. Each key of this dictionary corresponds to a derivatives sub-directory (e.g. ashs, afni), and the value of each key is a list of tuples. tuple[0] is used to create a column name in logs/completed_preprocessing.tsv, and tuple[1] is a unique string that is used by glob to find a single file.
For instance:
    {
        "ashs": [
            ("ashs_L", "left_lfseg_corr_usegray"),
            ("ashs_R", "right_lfseg_corr_usegray"),
        ]
    }
The example above uses the key ashs to search for data in derivatives/ashs. The value of ashs is a list of two tuples. The first tuple, ("ashs_L", "left_lfseg_corr_usegray"), will make a column in logs/completed_preprocessing.tsv called ashs_L, and left_lfseg_corr_usegray will be used by glob to identify the output of the ASHS workflow.
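As a minimal sketch of how a tuple[1] string might be turned into a glob search, the helper below looks for exactly one matching file and returns its write date. The function name find_output, the per-subject directory layout, and the date format are assumptions made for illustration; the actual search pattern used by check_preproc may differ.

```python
import glob
import os
from datetime import datetime


def find_output(deriv_dir, subj, search_str):
    """Return the write date (YYYY-MM-DD) of the single match, else None."""
    # Assumed layout: <deriv_dir>/<subj>/.../<name containing search_str>
    pattern = os.path.join(deriv_dir, subj, "**", f"*{search_str}*")
    matches = glob.glob(pattern, recursive=True)
    if len(matches) != 1:
        # Either nothing was found or the search string is not unique.
        return None
    # Modification time of the file, converted to local time.
    mtime = os.path.getmtime(matches[0])
    return datetime.fromtimestamp(mtime).strftime("%Y-%m-%d")
```

For the ashs example, a call such as find_output("derivatives/ashs", "sub-001", "left_lfseg_corr_usegray") would supply the value for the ashs_L column, where sub-001 is a made-up subject identifier.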
The list col_names is also specified by the researcher and contains the tuple[0] values. It is used when generating new dataframes.
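Because expected_dict and col_names are maintained by hand, they can drift apart. One possible safeguard, sketched below, compares the two; check_col_names is a hypothetical helper and is not part of the module.

```python
def check_col_names(expected_dict, col_names):
    """Return (missing, extra) column names relative to expected_dict."""
    # Collect every tuple[0] value across all derivatives keys.
    derived = [col for pairs in expected_dict.values() for col, _ in pairs]
    missing = [col for col in derived if col not in col_names]
    extra = [col for col in col_names if col not in derived]
    return missing, extra


expected_dict = {
    "ashs": [
        ("ashs_L", "left_lfseg_corr_usegray"),
        ("ashs_R", "right_lfseg_corr_usegray"),
    ]
}
col_names = ["ashs_L", "ashs_R"]
missing, extra = check_col_names(expected_dict, col_names)
```

Both lists are empty when the two structures agree.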
Next, the dataframe returned by clone_guid is used to make a list of subjects, omitting any participants who have been excluded or have withdrawn consent. The previous logs/completed_preprocessing.tsv is loaded as a dataframe (or a new one is generated, according to the new_df value). Using the subject list, the completed_preprocessing dataframe is then iterated through, with the tuple[0] value giving the column index and the subject identifier giving the row index. If a cell does not have a value, the script searches the appropriate derivatives sub-directory for a file matching the tuple[1] value. If a match is found, the cell at the subject row and tuple[0] column is updated with the write date of the file.
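The iteration described above can be illustrated with a small sketch. It uses a nested dictionary in place of the dataframe, and the names update_log and find_output are assumptions introduced here; find_output stands for any callable that returns a write date string or None.

```python
def update_log(log, subjects, expected_dict, find_output):
    """Fill empty cells and return the log sorted by subject ID.

    log: {subject: {column: write date or ""}}, a stand-in for the dataframe.
    find_output: callable(deriv_name, subject, search_str) -> str or None.
    The row dictionaries inside log are modified in place.
    """
    for subj in subjects:
        # Subjects without a row yet get an empty one.
        row = log.setdefault(subj, {})
        for deriv_name, pairs in expected_dict.items():
            for col_name, search_str in pairs:
                if row.get(col_name):
                    continue  # cell already has a date; leave it alone
                found = find_output(deriv_name, subj, search_str)
                row[col_name] = found if found else ""
    return dict(sorted(log.items()))
```

Cells that already hold a date are skipped, so only missing values trigger a search of the derivatives directory.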
New participants will be added to the dataframe, and the dataframe will be sorted by subject ID. To maintain versions, the script starts by initiating a git pull of logs/completed_preprocessing.tsv and, after updating the log, will git add/commit/push the same file.
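A rough sketch of the pull-then-push bracket around the update is given below. The function names, the commit message argument, and the injectable run parameter are hypothetical; the sketch also pulls the whole repository, since git pull does not operate on a single file.

```python
import subprocess


def git_pull_log(repo_dir, run=subprocess.run):
    """Update the local repository before the log is read."""
    run(["git", "pull"], cwd=repo_dir, check=True)


def git_push_log(repo_dir, log_path, message, run=subprocess.run):
    """Stage, commit, and push the updated log file."""
    for cmd in (
        ["git", "add", log_path],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ):
        # check=True stops the sequence at the first failing command.
        run(cmd, cwd=repo_dir, check=True)
```

Note that git commit exits with a non-zero status when there is nothing to commit, so a run in which no cells changed would raise here unless that case is handled separately.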