
resources reports

Nathan Muncy edited this page Mar 10, 2022 · 4 revisions

Description

The sub-package reports contains a single module, check_complete, which consists of two functions: clone_guid and check_preproc.

clone_guid

clone_guid is used to clone pseudo_guid_list.csv from the private repository github.com/data_pulling.git. This csv file contains subject identifiers, NDA GUID keys, consent information, and extra notes; clone_guid returns its contents as a dataframe.
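A minimal sketch of what clone_guid might look like, assuming it shells out to the git CLI and reads the csv with pandas. The repo_url and csv_name parameters are hypothetical and are not the module's actual signature.

```python
import subprocess
import tempfile
from pathlib import Path

import pandas as pd


def clone_guid(repo_url, csv_name="pseudo_guid_list.csv"):
    """Clone repo_url into a temporary directory and return csv_name as a DataFrame.

    Hypothetical signature: repo_url and csv_name are assumptions for illustration.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Shallow clone: only the latest commit is needed to read the csv.
        subprocess.run(["git", "clone", "--depth", "1", repo_url, tmp_dir], check=True)
        # Read while the clone still exists; the directory is deleted on exit.
        return pd.read_csv(Path(tmp_dir) / csv_name)
```

Cloning into a temporary directory means no copy of the private guid list is left on disk once the dataframe has been read.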

check_preproc

check_preproc is the function called by the script cli/checks.py. It contains the dictionary expected_dict, which is specified by the researcher: its keys correspond to derivatives sub-directories (e.g. ashs, afni), and the value of each key is a list of tuples. In each tuple, tuple[0] is used to create a column name in logs/completed_preprocessing.tsv, and tuple[1] is a unique string that glob uses to find a single file.

For instance:

{
  "ashs": [
     ("ashs_L", "left_lfseg_corr_usegray"),
     ("ashs_R", "right_lfseg_corr_usegray"),
  ]
}

uses the key ashs to search for data in derivatives/ashs. The value of ashs is a list of two tuples. The first tuple, ("ashs_L", "left_lfseg_corr_usegray"), creates a column called ashs_L in logs/completed_preprocessing.tsv, and glob uses left_lfseg_corr_usegray to identify the output of the ASHS workflow.
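The sketch below shows one way a tuple[1] string could be turned into a glob search. It assumes a derivatives/<key>/<subject>/ layout; find_output and that layout are hypothetical, not the module's actual code.

```python
from pathlib import Path


def find_output(deriv_dir, key, subj, search_str):
    """Return the one file under deriv_dir/key/subj whose name contains search_str.

    Hypothetical helper assuming a derivatives/<key>/<subj>/ layout. Returns
    None when nothing matches; raises ValueError when the match is not unique.
    """
    subj_dir = Path(deriv_dir) / key / subj
    if not subj_dir.is_dir():
        return None
    # Recursive search, e.g. derivatives/ashs/sub-001/**/*left_lfseg_corr_usegray*
    matches = sorted(p for p in subj_dir.rglob(f"*{search_str}*") if p.is_file())
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(f"{search_str!r} matched {len(matches)} files in {subj_dir}")
    return matches[0]
```

Raising on multiple matches enforces the requirement that tuple[1] be specific enough to identify a single file.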

The list col_names is also specified by the researcher and contains the tuple[0] values; it is used when generating new dataframes.
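Because col_names must hold the same tuple[0] values as expected_dict, one option (an assumption, not necessarily what check_complete does) is to derive it from the dictionary so the two cannot drift apart. The hypothetical helper new_log_df then builds an empty log with one row per subject.

```python
import pandas as pd

expected_dict = {
    "ashs": [
        ("ashs_L", "left_lfseg_corr_usegray"),
        ("ashs_R", "right_lfseg_corr_usegray"),
    ]
}

# col_names holds every tuple[0] value, in dictionary order.
col_names = [col for pairs in expected_dict.values() for col, _ in pairs]


def new_log_df(subj_list, col_names):
    """Hypothetical helper: an all-empty log, one row per subject, one column per output."""
    return pd.DataFrame(index=pd.Index(sorted(subj_list), name="subj"), columns=col_names)
```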

Next, the dataframe returned by clone_guid is used to build a list of subjects, omitting any participants who have been excluded or have withdrawn consent. The previous logs/completed_preprocessing.tsv is loaded as a dataframe (or a new one is generated, depending on the new_df value). Using the subject list, the script then iterates through the completed_preprocessing dataframe, with the subject identifier selecting the row and the tuple[0] value selecting the column. If a cell is empty, the script searches the corresponding derivatives sub-directory for a file matching the tuple[1] value. If a match is found, that cell (subject row, tuple[0] column) is updated with the write date of the file.
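A sketch of the subject filtering and cell-update loop described above. The guid column names (subj, status), the status values, the derivatives/<key>/<subject>/ layout, and the ISO date format are all assumptions for illustration; get_subjects and update_log are hypothetical names.

```python
import datetime
from pathlib import Path

import pandas as pd


def get_subjects(guid_df, subj_col="subj", status_col="status"):
    """Return subject IDs, skipping excluded or withdrawn participants.

    Hypothetical: the real guid csv may use different column names and values.
    """
    keep = ~guid_df[status_col].isin(["excluded", "withdrawn"])
    return guid_df.loc[keep, subj_col].tolist()


def update_log(log_df, subj_list, expected_dict, deriv_dir):
    """Fill empty log cells with the write date of the matching derivative file."""
    # Object dtype lets date strings go into columns pandas read as all-NaN floats.
    log_df = log_df.astype(object)
    for subj in subj_list:
        for key, pairs in expected_dict.items():
            subj_dir = Path(deriv_dir) / key / subj
            for col, search_str in pairs:
                # Keep dates already recorded by a previous run.
                if subj in log_df.index and pd.notna(log_df.at[subj, col]):
                    continue
                if not subj_dir.is_dir():
                    continue
                matches = [p for p in subj_dir.rglob(f"*{search_str}*") if p.is_file()]
                # tuple[1] should identify exactly one file; skip the cell otherwise.
                if len(matches) == 1:
                    mtime = matches[0].stat().st_mtime
                    # .loc adds a new row when subj is not yet in the log.
                    log_df.loc[subj, col] = datetime.date.fromtimestamp(mtime).isoformat()
    return log_df
```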

New participants are added to the dataframe, and the dataframe is sorted by subject ID. To maintain versions, the script starts with a git pull so that logs/completed_preprocessing.tsv is current, and after updating the log it runs git add, commit, and push on that file.
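The final steps might look like the following sketch. add_new_subjects, git_pull, and git_commit_push are hypothetical names; the git steps are plain git CLI commands run through subprocess, and the commit message is an assumption.

```python
import subprocess

import pandas as pd


def add_new_subjects(log_df, subj_list):
    """Add empty rows for subjects missing from the log, then sort by subject ID."""
    # dict.fromkeys drops duplicate IDs while preserving order.
    new_subjs = [s for s in dict.fromkeys(subj_list) if s not in log_df.index]
    new_index = log_df.index.append(pd.Index(new_subjs, name=log_df.index.name))
    # reindex keeps existing rows and creates all-NaN rows for the new subjects.
    return log_df.reindex(new_index).sort_index()


def _git(repo_dir, *args):
    subprocess.run(["git", *args], cwd=repo_dir, check=True)


def git_pull(repo_dir):
    """Run before reading the log so the local copy is current."""
    _git(repo_dir, "pull")


def git_commit_push(repo_dir, log_path, message="Update completed_preprocessing.tsv"):
    """Stage, commit, and push the log; skip the commit if nothing changed."""
    _git(repo_dir, "add", str(log_path))
    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir).returncode == 0:
        return
    _git(repo_dir, "commit", "-m", message)
    _git(repo_dir, "push")
```

Skipping the commit when nothing is staged avoids a failing git commit on runs where no new output was found.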
