Psytools

Dimitri Papadopoulos Orfanos edited this page Feb 1, 2017 · 42 revisions

We use Psytools questionnaires to collect demographic, clinical and environmental data.

Here we merely describe how questionnaires are downloaded from the Delosis server as CSV files, then anonymized and further pseudonymized.

Download questionnaires

Store the credentials (login and password) for the Delosis server in the ~/.netrc file of the service account on the c-VEDA server.
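As an illustration of how a script might read these credentials, here is a minimal sketch based on the standard netrc module. The helper name load_credentials and the host name in the usage comment are hypothetical; the actual download script may proceed differently.

```python
import netrc


def load_credentials(host, netrc_path=None):
    """Return (login, password) for host from a netrc file.

    Hypothetical helper, not taken from cveda_databank.
    """
    # netrc.netrc() reads ~/.netrc when no explicit path is given
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        raise LookupError('no entry for {} in netrc file'.format(host))
    login, _account, password = auth
    return login, password


# Usage (the host name is a placeholder, not the real Delosis server):
# login, password = load_credentials('psytools.example.org')
```

Keeping credentials in ~/.netrc rather than in the script keeps them out of version control.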

Install the Ubuntu package moreutils, which provides ts, and create a dedicated directory for log files:

sudo apt-get install moreutils
sudo mkdir -p /var/log/databank/psytools
sudo chown -R databank:databank /var/log/databank

Finally, set a crontab on the c-VEDA server to schedule daily downloads from the Delosis server, at 6 AM IST, using the Python script cveda_databank/psytools/cveda_psytools_download.py, and ts to add timestamps to logs. Percent signs are escaped with a backslash because cron otherwise interprets % as a newline:

sudo crontab -u databank - <<EOF
0 6 * * * /cveda/databank/framework/cveda_databank/psytools/cveda_psytools_download.py 2>&1 | ts '\%Y-\%m-\%d \%H:\%M:\%S \%Z LOG:' >> /var/log/databank/psytools/download.log
EOF

Date of birth

We rely on the date of birth (DOB) to further anonymize questionnaires, by changing the date of every event related to a subject (be it a child or a parent) into the subject's age in days at the time of the event. For that we subtract the date of birth from the date of the event. Here is an example of typical Python code:

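from datetime import datetime
from cveda_databank import DOB_FROM_PSC1

psc1 = '110001234567'
# illustrative event timestamp, in the format parsed below
event = '2017-01-31 10:15:30.123'

# date of birth of the subject, assumed to be a datetime.date
date_of_birth = DOB_FROM_PSC1[psc1]

timestamp_of_event = datetime.strptime(event, '%Y-%m-%d %H:%M:%S.%f')
date_of_event = timestamp_of_event.date()
# subtracting two dates yields a timedelta; keep only the number of days
age_in_days = (date_of_event - date_of_birth).days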

The authoritative sources for the date of birth are Psytools questions ACEIQ_C2 and PHIR_01. In case these sources provide different values or no values, an additional Excel file is provided (PSC 1 code & D-O-B.xlsx as of 2017-02-01) that contains results from further investigations. If no value is given at all, neither from the authoritative sources nor the additional Excel file, dates are erased; in such cases researchers fall back on other questions that specifically ask for the age in years.
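The selection rule above could be sketched as follows. This is only an illustration: the function name resolve_dob and its parameters are hypothetical. The text does not say what happens when only one of the two questions has a value, so the sketch assumes that case also falls back on the Excel file.

```python
from datetime import date


def resolve_dob(aceiq_c2, phir_01, override=None):
    """Pick a date of birth, or None if it cannot be established.

    aceiq_c2, phir_01: answers to the two authoritative questions
                       (datetime.date or None)
    override: value from the additional Excel file (datetime.date or None)
    """
    # both authoritative sources are present and agree
    if aceiq_c2 is not None and aceiq_c2 == phir_01:
        return aceiq_c2
    # sources differ or are missing: rely on further investigations,
    # which may themselves yield nothing (None means dates get erased)
    return override


# sources disagree, the Excel file settles the question
dob = resolve_dob(date(2005, 3, 14), date(2005, 4, 13),
                  override=date(2005, 3, 14))
```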

Anonymize and pseudonymize questionnaires

The goal is to further de-identify questionnaires:

  1. PSC1 codes are converted to PSC2 codes in field User code.
  2. We remove cross-checking questions without scientific purpose from all Psytools questionnaires. Specifically, questions id_check_gender, ID_check_gender, id_check_dob and ID_check_dob should only be used for error checking with PSC1 codes, and must not be kept alongside PSC2 codes.
  3. We substitute age of subject in days for dates, as described above:
    • in time stamps Completed Timestamp and Processed Timestamp for all questions,
    • for questions ACEIQ_C3, PDS_07a, PHIR_01 and PHIR_02.
  4. In the future we may remove or sanitize other questions, if they contain other sensitive information such as names.
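Steps 1 to 3 above can be sketched on a single CSV row as follows. This is a rough illustration, not the actual pipeline: the function name, the two lookup tables and the column name Trial (assumed to hold the question identifier) are assumptions, and the sketch assumes well-formed values in the formats mentioned on this page.

```python
from datetime import datetime

ID_CHECK_QUESTIONS = {'id_check_gender', 'ID_check_gender',
                      'id_check_dob', 'ID_check_dob'}
DATE_QUESTIONS = {'ACEIQ_C3', 'PDS_07a', 'PHIR_01', 'PHIR_02'}
TIMESTAMP_FIELDS = ('Completed Timestamp', 'Processed Timestamp')


def age_in_days(text, fmt, dob):
    """Convert a date string to an age in days, or '' if it cannot be done."""
    if dob is None or not text:
        return ''  # unknown date of birth: the date is erased
    return str((datetime.strptime(text, fmt).date() - dob).days)


def deidentify_row(row, psc2_from_psc1, dob_from_psc1):
    """Return a de-identified copy of a row (a dict), or None to drop it."""
    if row['Trial'] in ID_CHECK_QUESTIONS:  # 'Trial' column is an assumption
        return None  # step 2: cross-checking questions are removed
    psc1 = row['User code']
    dob = dob_from_psc1.get(psc1)
    out = dict(row)
    out['User code'] = psc2_from_psc1[psc1]  # step 1: PSC1 to PSC2
    for field in TIMESTAMP_FIELDS:  # step 3: time stamps
        out[field] = age_in_days(row[field], '%Y-%m-%d %H:%M:%S.%f', dob)
    if row['Trial'] in DATE_QUESTIONS:  # step 3: questions holding dates
        out['Trial result'] = age_in_days(row['Trial result'], '%d-%m-%Y', dob)
    return out
```

Rows read with csv.DictReader can be passed through this function and written back with csv.DictWriter, skipping rows for which it returns None.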

Codebook, format and units

There is no codebook yet. The field Trial result in CSV files exported from Psytools may contain:

  • dates (format is dd-mm-yyyy),
  • measures of length (formatting rules are quite loose: values such as 12.3 inch, 12.3INCH, 12.3CM, 12.3 cms or a bare 12.3 all occur, so extensive curation is required to extract useful information),
  • integers,
  • floats,
  • a choice of F and M for gender,
  • ...

This needs to be improved as questionnaires mature.
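As an example of the curation needed for measures of length, here is a minimal sketch that normalizes the variants listed above. The function name and the regular expression are illustrative only and cover just those variants; real data is likely to contain other spellings.

```python
import re

# a number, then an optional unit: inch, inches, cm or cms, in any case
LENGTH_RE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*(inch(?:es)?|cms?)?\s*$',
                       re.IGNORECASE)


def parse_length(text):
    """Return (value, unit) with unit 'inch', 'cm' or None.

    Return None when the text cannot be parsed at all.
    """
    match = LENGTH_RE.match(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2)
    if unit is None:
        return value, None  # unit missing: left for manual curation
    return value, 'inch' if unit.lower().startswith('inch') else 'cm'


for raw in ('12.3 inch', '12.3INCH', '12.3CM', '12.3 cms', '12.3'):
    print(raw, '->', parse_length(raw))
```

Values without a unit are returned as such rather than guessed, since the unit cannot be inferred reliably from the number alone.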
