
Psytools

Dimitri Papadopoulos Orfanos edited this page Mar 15, 2017 · 42 revisions

We use Psytools questionnaires to collect demographic, clinical and environmental data. Tasks may differ between age bands:

Age band ID   Age range   # Psytools tasks
C1            6-11        29
C2            12-17       33
C3            18-23       33

Here we merely describe how questionnaires are downloaded from the Delosis server as CSV files, then anonymized and further pseudonymized.

Download questionnaires

Store the credentials (login and password) for the Delosis server in the ~/.netrc file of the service account on the c-VEDA server.
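
For illustration, Python's standard netrc module can read credentials stored this way. The sketch below is minimal and rests on assumptions: the host name delosis.example.org, the login and the password are placeholders, and nothing here is taken from the actual download script.

```python
import netrc
import os
import tempfile

# Hypothetical host and placeholder credentials, for illustration only.
HOST = 'delosis.example.org'
NETRC_CONTENT = 'machine delosis.example.org login someuser password somepassword\n'


def read_credentials(netrc_path, host):
    """Return (login, password) for host from a netrc-style file."""
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        raise KeyError('no entry for host: ' + host)
    login, _account, password = auth
    return login, password


# Write a throwaway netrc-style file instead of touching the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'netrc')
    with open(path, 'w') as f:
        f.write(NETRC_CONTENT)
    os.chmod(path, 0o600)  # credentials should be readable by the owner only
    login, password = read_credentials(path, HOST)
```

The real ~/.netrc should likewise be readable by the service account only.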

Install the Ubuntu package moreutils, which provides the ts command, and create a dedicated directory for log files:

sudo apt-get install moreutils
sudo mkdir /var/log/databank
sudo chown databank:databank /var/log/databank

Finally, set up a crontab on the c-VEDA server to schedule daily downloads from the Delosis server at 6 AM IST, using the Python script cveda_databank/psytools/cveda_psytools_download.py and its wrapper shell script cveda_databank/psytools/cveda_psytools_all.sh:

sudo crontab -u databank - <<EOF
0 6 * * * /cveda/databank/framework/cveda_databank/psytools/cveda_psytools_all.sh
EOF

Date of birth

We rely on the date of birth (DOB) to further anonymize questionnaires: the date of every event related to a subject (be it a child or a parent) is replaced by the subject's age in days at the event. To compute it, we subtract the date of birth from the date of the event. Here is an example of typical Python code:

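from datetime import datetime
from cveda_databank import DOB_FROM_PSC1

psc1 = '110001234567'
# time stamp as found in Psytools CSV files (illustrative value)
event = '2017-01-21 10:30:00.000'

# date of birth of the subject, expected to be a datetime.date
date_of_birth = DOB_FROM_PSC1[psc1]

timestamp_of_event = datetime.strptime(event, '%Y-%m-%d %H:%M:%S.%f')
date_of_event = timestamp_of_event.date()
# subtracting two dates yields a timedelta, keep only the number of days
age_in_days = (date_of_event - date_of_birth).days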

Initially, the authoritative sources for the date of birth were Psytools questions ACEIQ_C2 and PHIR_01. Unfortunately there were discrepancies and no way to decide which value was correct. An additional Excel file is now provided (PSC1_DOB_2017-01-21.xlsx as of this writing) that contains the results of further investigations. This will probably become the authoritative source for the date of birth. Because new discrepancies were in turn detected in the Excel file itself, I have written a script mri/cveda_mri_collect.py to report discrepancies so that acquisition centres can update the Excel file if possible. The Psytools questionnaires will unfortunately not be updated.
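
A discrepancy report of this kind could be sketched as follows. This is not the actual script: the function name and the two dictionaries mapping PSC1 codes to dates of birth are hypothetical, and the codes and dates are made up.

```python
from datetime import date


def dob_discrepancies(dob_from_excel, dob_from_psytools):
    """Return {psc1: (excel_dob, psytools_dob)} where the two sources disagree.

    Only PSC1 codes present in both sources are compared.
    """
    report = {}
    for psc1 in sorted(dob_from_excel.keys() & dob_from_psytools.keys()):
        excel_dob = dob_from_excel[psc1]
        psytools_dob = dob_from_psytools[psc1]
        if excel_dob != psytools_dob:
            report[psc1] = (excel_dob, psytools_dob)
    return report


# Made-up PSC1 codes and dates, for illustration only.
excel = {'110001234567': date(2005, 3, 14), '110007654321': date(2001, 7, 2)}
psytools = {'110001234567': date(2005, 3, 14), '110007654321': date(2001, 2, 7)}
discrepancies = dob_discrepancies(excel, psytools)
```

In this made-up example the second subject is reported, with day and month apparently swapped between the two sources.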

De-identification

The goal is to further de-identify Psytools questionnaires using additional pseudonymization and anonymization techniques:

  1. PSC1 codes are converted to PSC2 codes in field User code.
  2. We remove cross-checking questions without scientific purpose from all Psytools questionnaires. Specifically, questions id_check_gender, ID_check_gender, id_check_dob and ID_check_dob should only be used for error checking with PSC1 codes, never with PSC2 codes.
  3. We substitute the age of the subject in days for dates, as described above:
    • in time stamps Completed Timestamp and Processed Timestamp for all questions,
    • in field Trial result of questions ACEIQ_C3, PDS_07a, PHIR_01 and PHIR_02.
  4. In the future we may remove or sanitize other questions, if they contain other sensitive information such as names.
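
The steps above can be sketched for a single CSV row as follows. This is a minimal illustration, not the actual pipeline: both lookup dictionaries hold made-up values, the column name Trial for the question identifier is an assumption, and only the two time stamp columns are converted (the dates in field Trial result are left out).

```python
from datetime import date, datetime

# Made-up lookup tables, for illustration only.
PSC2_FROM_PSC1 = {'110001234567': '990009876543'}
DOB_FROM_PSC1 = {'110001234567': date(2005, 3, 14)}

# Cross-checking questions to drop (step 2).
ID_CHECK = {'id_check_gender', 'ID_check_gender', 'id_check_dob', 'ID_check_dob'}
TIMESTAMP_FIELDS = ('Completed Timestamp', 'Processed Timestamp')


def deidentify_row(row):
    """Return a de-identified copy of row, or None if the row must be dropped."""
    if row['Trial'] in ID_CHECK:  # 'Trial' as question column is an assumption
        return None
    psc1 = row['User code']
    dob = DOB_FROM_PSC1[psc1]
    out = dict(row)
    out['User code'] = PSC2_FROM_PSC1[psc1]  # step 1: PSC1 -> PSC2
    for field in TIMESTAMP_FIELDS:  # step 3: time stamp -> age in days
        timestamp = datetime.strptime(row[field], '%Y-%m-%d %H:%M:%S.%f')
        out[field] = str((timestamp.date() - dob).days)
    return out


row = {
    'User code': '110001234567',
    'Trial': 'PHIR_02',
    'Completed Timestamp': '2017-03-14 10:30:00.000',
    'Processed Timestamp': '2017-03-15 06:00:00.000',
    'Trial result': '3',
}
clean = deidentify_row(row)
```

Returning None for dropped rows lets a caller filter them out while streaming through the CSV file with csv.DictReader and csv.DictWriter.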

Codebook, format and units

There is no codebook yet. Field Trial result in CSV files exported from Psytools includes:

  • dates (format is dd-mm-yyyy),
  • measures of length (formatting is currently quite loose: values include 12.3 inch, 12.3INCH, 12.3CM, 12.3 cms and a bare 12.3, so extracting useful information would require extensive curation),
  • integers,
  • floats,
  • a choice of F and M for gender,
  • ...

This needs to be improved as questionnaires mature.
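
As a starting point for such curation, the length formats listed above could be normalized with a regular expression. The sketch below is only one possible approach: the function name is hypothetical, centimetres are an arbitrary choice of target unit, and values without a unit are returned with unit None rather than guessed.

```python
import re

# number, optional whitespace, optional unit (inch/inches/in or cm/cms)
LENGTH_RE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*(inch(?:es)?|in|cms?)?\s*$',
                       re.IGNORECASE)


def parse_length(text):
    """Parse a loosely formatted length.

    Returns (value, 'cm') when a unit is given, (value, None) when the
    unit is missing, and None when text does not look like a length.
    """
    match = LENGTH_RE.match(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2)
    if unit is None:
        return value, None  # unit unknown: leave for manual curation
    if unit.lower().startswith('in'):
        return round(value * 2.54, 2), 'cm'  # 1 inch = 2.54 cm
    return value, 'cm'


samples = ['12.3 inch', '12.3INCH', '12.3CM', '12.3 cms', '12.3']
parsed = [parse_length(s) for s in samples]
```

Anything the pattern does not recognize yields None, so unexpected formats surface for review instead of being silently converted.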
