[ENH] Accept Parquet (<entities>_<suffix>.parquet) as alternative to .tsv and .tsv.gz formats #1792
Your idea
BIDS has generally followed the convention of adopting human-readable or widely adopted standards for its files. At 1.0, we used .tsv for all tabular files except physiological and stimulus recordings, which use a headerless .tsv.gz format. In 1.9, we added a headerless motion.tsv file, which is quite large. The eye-tracking BEP (#1128) is underway and is having to cope with some limitations in the TSV options.

In 2024 we now have over a decade of Apache Parquet format development. The format specification is open, and there is a project (Arrow) which includes native libraries or bindings for Python, MATLAB, R, Julia, Java, JavaScript and C, among others.
For data that do not benefit from human readability (TSV files > ~1k lines), Parquet offers advantages such as typed columns, chunked compression, and no round-trips between floating-point and ASCII decimal representations.
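To make the round-trip point concrete, here is a minimal sketch using only the Python standard library. It does not use Parquet itself: `struct` stands in for any binary column encoding, and the six-decimal text format is an assumed writer setting, not something BIDS prescribes.

```python
import struct

# A value whose exact double representation needs 17 significant digits.
value = 0.1 + 0.2

# TSV-style storage: write the value as decimal text, then parse it back.
# The ".6f" precision is an assumption standing in for a typical writer default.
text = f"{value:.6f}"
from_text = float(text)

# Binary storage: the 8 bytes of the IEEE 754 double are stored unchanged.
raw = struct.pack("<d", value)
from_binary = struct.unpack("<d", raw)[0]

print("text field:", text, "-> equal after round-trip:", from_text == value)
print("binary field:", len(raw), "bytes -> equal after round-trip:", from_binary == value)
```

A text writer can avoid the loss by emitting enough digits (17 significant digits for a double), but that costs file size and parsing time; a binary column sidesteps the trade-off.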
I propose the following:
- .parquet files anywhere that a TSV or TSV-GZ file is currently permitted.
- .tsv for high-level metadata tables, such as participants.tsv, *_sessions.tsv and *_scans.tsv, as well as *_channels.tsv, *_electrodes.tsv and similar metadata files.

This is pulled out of #197, which is about N-dimensional data. I am excerpting the relevant recent posts here:
@satra (#197 (comment))
@effigies (#197 (comment))
@bendichter (#197 (comment))