Psytools
We use Psytools questionnaires to collect demographics, clinical and environmental data.
Here we merely describe how questionnaires are downloaded from the Delosis server as CSV files, then anonymized and further pseudonymized.
Store the credentials (login and password) for the Delosis server in the ~/.netrc file of the service account on the c-VEDA server.
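As an illustration, a ~/.netrc entry has the form "machine HOST login USER password SECRET", and Python's standard netrc module can read it back. The sketch below is only an example: the host name psytools.example.org and the credentials are placeholders, it works on a temporary file rather than the real ~/.netrc, and it does not claim to reflect how the download script itself reads the file.

```python
import netrc
import os
import tempfile

# Hypothetical host name and credentials, for illustration only.
HOST = 'psytools.example.org'
NETRC_CONTENT = f'machine {HOST} login some_user password some_secret\n'


def read_credentials(netrc_path, host):
    """Return (login, password) for host, or None if host has no entry."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Write a throwaway netrc file rather than touching the real ~/.netrc.
with tempfile.NamedTemporaryFile('w', suffix='.netrc', delete=False) as f:
    f.write(NETRC_CONTENT)
    tmp_path = f.name
os.chmod(tmp_path, 0o600)  # credentials should be readable by the owner only

credentials = read_credentials(tmp_path, HOST)
missing = read_credentials(tmp_path, 'unknown.example.org')
os.remove(tmp_path)
```

The real file should likewise be restricted to the service account (mode 600), since it holds a password in clear text.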
Install the Ubuntu package moreutils, which provides ts, and create a dedicated directory for log files:
sudo apt-get install moreutils
sudo mkdir -p /var/log/databank/psytools
sudo chown -R databank:databank /var/log/databank
Finally, set a crontab on the c-VEDA server to schedule daily downloads from the Delosis server at 6 AM IST, using the Python script cveda_databank/psytools/cveda_psytools_download.py, with ts adding timestamps to the logs:
sudo crontab -u databank - <<EOF
0 6 * * * /cveda/databank/framework/cveda_databank/psytools/cveda_psytools_download.py 2>&1 | ts '\%Y-\%m-\%d \%H:\%M:\%S \%Z LOG:' >> /var/log/databank/psytools/download.log
EOF
We rely on the date of birth (DOB) to further anonymize questionnaires: the date of every event related to a subject (be it a child or a parent) is replaced by the subject's age in days at the event. To compute it, we subtract the date of birth from the date of the event. Here is an example of typical Python code:
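from datetime import datetime
from cveda_databank import DOB_FROM_PSC1
psc1 = '110001234567'
event = '2017-02-01 10:30:00.000'  # illustrative timestamp of an event
date_of_birth = DOB_FROM_PSC1[psc1]
timestamp_of_event = datetime.strptime(event, '%Y-%m-%d %H:%M:%S.%f')
date_of_event = timestamp_of_event.date()
# subtracting two dates yields a timedelta, hence .days to get an integer
age_in_days = (date_of_event - date_of_birth).days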
The authoritative sources for the date of birth are Psytools questions ACEIQ_C2 and PHIR_01. In case these sources provide different values or no values, an additional Excel file is provided (PSC 1 code & D-O-B.xlsx as of 2017-02-01) that contains results from further investigations. If no value is given at all, neither from the authoritative sources nor from the additional Excel file, dates are erased; in such cases researchers fall back on other questions that specifically ask for the age in years.
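The fallback order described above can be sketched as a small function. This is a minimal illustration, not the databank's actual code: the function name and arguments are made up, and it assumes that a single available answer is accepted as is, a case the text above does not spell out.

```python
from datetime import date


def resolve_date_of_birth(aceiq_c2, phir_01, curated=None):
    """Pick a date of birth, or None when dates must be erased.

    aceiq_c2 and phir_01 are the dates given in the two questions (or None).
    curated is the value from the additional Excel file (or None).
    """
    answers = {d for d in (aceiq_c2, phir_01) if d is not None}
    if len(answers) == 1:
        # Both answers agree, or only one was given (assumption: accept it).
        return answers.pop()
    # Conflicting or missing answers: fall back on the curated value, if any.
    return curated


agreed = resolve_date_of_birth(date(2005, 3, 14), date(2005, 3, 14))
conflict = resolve_date_of_birth(date(2005, 3, 14), date(2005, 4, 13),
                                 curated=date(2005, 3, 14))
unknown = resolve_date_of_birth(None, None)
```

A None result corresponds to the case where dates are erased and researchers rely on the questions asking for the age in years.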
The goal is to further de-identify questionnaires:
- PSC1 codes are converted to PSC2 codes in field User code.
- We remove cross-checking questions without scientific purpose from all Psytools questionnaires. Specifically, questions id_check_gender, ID_check_gender, id_check_dob and ID_check_dob should only be used for error checking with PSC1 codes and not used with PSC2 codes.
- We substitute age of subject in days for dates, as described above:
  - in time stamps Completed Timestamp and Processed Timestamp for all questions,
  - for questions ACEIQ_C3, PDS_07a, PHIR_01 and PHIR_02.
- In the future we may remove or sanitize other questions, if they contain other sensitive information such as names.
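The steps listed above could be combined roughly as follows. This is a sketch under several assumptions rather than the actual pipeline: the lookup tables are stand-ins with made-up codes, the column Trial is assumed to hold the question identifier, User code is assumed to hold a bare PSC1 code, time stamps are assumed to follow the format of the earlier example, and date answers are assumed to use dd-mm-yyyy.

```python
from datetime import date, datetime

# Stand-in lookup tables with made-up codes, for illustration only.
PSC2_FROM_PSC1 = {'110001234567': '220007654321'}
DOB_FROM_PSC1 = {'110001234567': date(2005, 3, 14)}

CHECK_QUESTIONS = {'id_check_gender', 'ID_check_gender',
                   'id_check_dob', 'ID_check_dob'}
DATE_QUESTIONS = {'ACEIQ_C3', 'PDS_07a', 'PHIR_01', 'PHIR_02'}
TIMESTAMP_FIELDS = ('Completed Timestamp', 'Processed Timestamp')


def to_age_in_days(text, date_format, dob):
    """Convert a date string into an age in days; erase it if DOB is unknown."""
    if dob is None or not text:
        return ''
    return str((datetime.strptime(text, date_format).date() - dob).days)


def deidentify(rows):
    """Yield de-identified copies of rows (dicts keyed by CSV column name)."""
    for row in rows:
        if row['Trial'] in CHECK_QUESTIONS:
            continue  # cross-checking question: dropped
        row = dict(row)  # work on a copy, leave the input untouched
        psc1 = row['User code']
        dob = DOB_FROM_PSC1.get(psc1)
        for field in TIMESTAMP_FIELDS:
            row[field] = to_age_in_days(row[field], '%Y-%m-%d %H:%M:%S.%f', dob)
        if row['Trial'] in DATE_QUESTIONS:
            row['Trial result'] = to_age_in_days(row['Trial result'], '%d-%m-%Y', dob)
        # An unknown PSC1 code raises KeyError instead of leaking through.
        row['User code'] = PSC2_FROM_PSC1[psc1]
        yield row
```

Rows can come from csv.DictReader and go to csv.DictWriter, so the column layout of the exported file is preserved.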
There is no codebook yet. Field Trial result in CSV files exported from Psytools includes:
- dates (format is dd-mm-yyyy),
- measures of length (formatting rules are quite loose and include 12.3 inch, 12.3INCH, 12.3CM, 12.3 cms, 12.3, which requires extensive curation to extract useful information),
- integers,
- floats,
- a choice of F and M for gender,
- ...
This needs to be improved as questionnaires mature.
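To give an idea of the curation needed for measures of length, here is a minimal sketch that handles only the variants listed above. The function name and the accepted spellings are assumptions; real data likely contains other forms, which this sketch reports as unparseable instead of guessing.

```python
import re

# Number, optional spaces, optional unit; case-insensitive.
LENGTH_PATTERN = re.compile(
    r'^\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>inch(?:es)?|cms?)?\s*$',
    re.IGNORECASE,
)


def parse_length(text):
    """Return (value, unit) with unit 'cm', 'inch' or None.

    Returns None when the text does not look like a length at all.
    """
    match = LENGTH_PATTERN.match(text)
    if match is None:
        return None
    value = float(match.group('value'))
    unit = match.group('unit')
    if unit is None:
        return value, None  # unit missing: left for manual curation
    unit = 'inch' if unit.lower().startswith('inch') else 'cm'
    return value, unit


parsed = [parse_length(s)
          for s in ('12.3 inch', '12.3INCH', '12.3CM', '12.3 cms', '12.3')]
```

Values without a unit are returned with unit None on purpose, since deciding between centimetres and inches requires looking at the question and the plausible range.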