Metadata files are in JSON format. JSON provides a good compromise between the ability to store structured data and ease of use. Templates and a validator are provided.
The submitter.json file contains information about the submitting group. It sits at the top of the submitter's tree (see Submission structure) and applies to all submissions by the group. See submitter.json for an example. An empty template is also available: submitter.json.
- submitter_id - symbolic name for the submitter, assigned when the user registers for LRGASP. This will be a valid Python-style identifier.
- group_name - name of the submitting lab
- group_url - URL of the submitting lab page (optional)
- notes - notes (optional)
- contacts - array of contacts, with the first entry considered the primary contact
  - name - name of the contact
  - email - e-mail of the contact, which can be an e-mail list
  - notes - notes about the contact (optional)
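
As an illustration, the submitter fields above could be assembled and checked in Python roughly as follows. This is a minimal sketch, not the provided validator: the function name `check_submitter` and every sample value (identifier, lab name, URL, e-mail) are hypothetical, fields not marked optional are assumed to be required, and `str.isidentifier()` is used as an approximation of "valid Python-style identifier".

```python
import json

# Assumption: every field not marked "(optional)" is required.
REQUIRED_FIELDS = ("submitter_id", "group_name", "contacts")
REQUIRED_CONTACT_FIELDS = ("name", "email")


def check_submitter(submitter):
    """Return a list of problems found in a submitter metadata dict."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in submitter
    ]

    submitter_id = submitter.get("submitter_id")
    if submitter_id is not None and not str(submitter_id).isidentifier():
        problems.append(f"submitter_id is not a valid identifier: {submitter_id!r}")

    contacts = submitter.get("contacts")
    if contacts is not None:
        # The first entry is the primary contact, so at least one is assumed.
        if not contacts:
            problems.append("contacts must list at least one contact")
        for i, contact in enumerate(contacts):
            for field in REQUIRED_CONTACT_FIELDS:
                if field not in contact:
                    problems.append(
                        f"contacts[{i}] is missing required field: {field}"
                    )
    return problems


# Hypothetical example values, for illustration only.
example = {
    "submitter_id": "example_lab",
    "group_name": "Example Lab",
    "group_url": "https://example.org/lab",
    "contacts": [
        {"name": "Jane Doe", "email": "jane.doe@example.org"},
    ],
}

print(json.dumps(example, indent=4))
print(check_submitter(example))
```

An empty list from `check_submitter` means only that these basic checks passed; the provided validator remains the reference.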
The submission.json file describes the submission, specifying all data files. One is created in each submission directory (see Submission structure). Data files are either in the submission directory or in a sub-directory. All file paths in submission.json are relative to the directory containing submission.json.
See submission.json for an example.
An empty template is also available: submission.json.
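
Because paths are relative to the directory containing submission.json, code that reads the metadata should resolve them against that directory rather than the current working directory. A minimal Python sketch, in which the helper name `resolve_data_file` and the example paths are hypothetical:

```python
from pathlib import Path


def resolve_data_file(submission_json, fname):
    """Resolve a data file path against the directory holding submission.json."""
    return Path(submission_json).parent / fname


# Hypothetical submission directory and file name.
print(resolve_data_file("my_submission/submission.json", "models.gtf.gz"))
```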
- submitter_id - must match the submitter_id in submitter.json.
- submission_id - submitter-defined identifier, unique to that submitter; must be a valid Python-style identifier
- description - description of submission
- challenge_id - one of the valid challenge identifiers.
- submission_type - one of model or expression.
- model_submission_id - if an expression submission, the model submission_id for which the expressions were computed.
- technologies - sequencing technologies, one or more of PacBio, ONT, Illumina
- protocol - library preparation protocol, values will be defined later
- samples - list of sample names
- notes - notes (optional)
- files - list of file descriptions for submitted files:
  - fname - name of file (without directory), compressed with required extensions.
  - ftype - type of file, one of:
    - modelGTF - Transcript model format
    - readModelMap - Read to transcript model map
    - expressionMatrix - Transcript expression matrix format
  - md5 - md5 sum of file, as a hexadecimal string (standard output from the md5sum command)
  - units - expression units for expression results matrix: RPM, RPKM, FPKM, TPM, counts.
  - notes - notes about the file (optional)
- software - list of software used by the pipeline:
  - name - name of software package
  - description - description of software (optional)
  - version - version of software
  - url - URL to software repository
  - notes - notes about software or how it was used (optional)
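
Entries in the files list can be generated rather than typed by hand, which avoids mistakes in the md5 field. The following Python sketch computes the checksum with `hashlib` and assembles one entry. The helper names `md5_hex` and `make_file_entry` are hypothetical, the demo file is a placeholder rather than real data, and the sketch does not replace the provided validator.

```python
import hashlib
import os
import tempfile

# Allowed values as listed in the field descriptions above.
FTYPES = {"modelGTF", "readModelMap", "expressionMatrix"}
UNITS = {"RPM", "RPKM", "FPKM", "TPM", "counts"}


def md5_hex(path, chunk_size=1 << 20):
    """Return the md5 digest of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_file_entry(path, ftype, units=None, notes=None):
    """Build one entry for the files list of submission.json."""
    if ftype not in FTYPES:
        raise ValueError(f"unknown ftype: {ftype!r}")
    if units is not None and units not in UNITS:
        raise ValueError(f"unknown units: {units!r}")
    entry = {
        "fname": os.path.basename(path),  # name of file, without directory
        "ftype": ftype,
        "md5": md5_hex(path),
    }
    if units is not None:
        entry["units"] = units
    if notes is not None:
        entry["notes"] = notes
    return entry


# Demo on a throwaway placeholder file (not real transcript models).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "models.gtf.gz")
    with open(demo_path, "wb") as fh:
        fh.write(b"placeholder content\n")
    print(make_file_entry(demo_path, "modelGTF"))
```

The hexadecimal string returned by `md5_hex` corresponds to the checksum column printed by the md5sum command for the same file.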
- separate expression and matrix submission
- replicate
- updates: do we allow overwrite or keep all versions
- version
- target data set identifier
- add picture
- add read-to-transcript file