masterfile

Tools to organize, document, and validate the variables of interest in scientific studies

Command line usage

masterfile --help will list all the subcommands.

Create

masterfile create masterfile_path out_file

Join

masterfile join masterfile_path out_file

Extract

masterfile extract [-s|--skip ROWS] [--index_column COL]
                      masterfile_path csv_file out_file

Validate

asterfile validate masterfile_path [file [file ...]]

Draft API usage example

import masterfile
# Load all of the .csv files from /path, and the dictionary files in
# /path/dictionaries. Takes settings info from a 'settings.json' file in
# /path.
# joins the .csv files on 'participant_id', which will be used as the index
# There will be warnings if the data look bad in some way
mf = masterfile.load('/path')
# Get the pandas dataframe associated
df = mf.dataframe  # aliased as mf.df

# All the variable stuff is less important, people can go look in data dicts
# So we'll write that stuff later.
v = mf.lookup('sr_t1_panas_pa')
v.contacts # list_of_names
v.measure.contact  # Someone
v.modality # Component("self-report")

CSV file format

CSV files should be comma-separated (no surprise there) and have DOS line endings (CRLF). They should not have the stupid UTF-8 signature at the start. UTF-8 characters are fine. Missing data is indicated by an empty cell. Quoting should be like Excel does.

Basically, you want Excel-for-Windows-style CSV files with no UTF-8 signature.

Dictionaries

CSV format
Has AT LEAST two columns: component, short_name
Those are the indexes
There shouldn't be any repeats in the index
The settings.json file should contain a "components" thing that says what should exist in the component column
Things with blank component are ignored (TODO: Maybe?)

Exclusion files

CSV format
Live in exclusions/
One row per ppt, one column per value
Has index column, same as data file
Blanks mean "Use this value," nonblanks mean "exclude this value"
Things in the cells may be codes; these codes may be defined in settings.json
If data is excluded for more than one reason, separate codes with ","
Not all rows / columns in masterfiles need to be included in exclusion files. Missing rows / columns are treated like blank values.

Data checks

Here are some (all?) of the things to do to verify you have semantically reasonable data:

Variable parts not in dictionaries
Missing participant_id column
Repeated paticipant_id column
Blanks in participant_id column
Duplicate columns
Column names not matching format

Getting started for development

Create a virtualenv:

virtualenv ~/env/masterfile
source ~/env/masterfile/bin/activate

Install the requirements and this module for development:

pip install -r requirements_dev.txt
pip install -e .

Run tests:

pytest

Run tests across all supported Python versions:

tox

To run in a specific python version:

tox -e py37

Credits

Written by Nate Vack [email protected] with help from Dan Fitch [email protected]

masterfile packages some wonderful tools: schema and attrs.

Name		Name	Last commit message	Last commit date
Latest commit History 183 Commits
.github/workflows		.github/workflows
masterfile		masterfile
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
pytest.ini		pytest.ini
requirements_dev.txt		requirements_dev.txt
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

masterfile

Tools to organize, document, and validate the variables of interest in scientific studies

Command line usage

Create

Join

Extract

Validate

Draft API usage example

CSV file format

Dictionaries

Exclusion files

Data checks

Getting started for development

Credits

About

Releases 2

Packages

Contributors 3

Languages

License

uwmadison-chm/masterfile

Folders and files

Latest commit

History

Repository files navigation

masterfile

Tools to organize, document, and validate the variables of interest in scientific studies

Command line usage

Create

Join

Extract

Validate

Draft API usage example

CSV file format

Dictionaries

Exclusion files

Data checks

Getting started for development

Credits

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 3

Languages

Packages