
HelixTripleFileFormat

Maxie D. Schmidt edited this page Nov 13, 2019 · 9 revisions

Specification of the helix triple file format (*.helix)

Helix files read by RNAStructViz must contain exactly one structure per file. The core components of the file format are as follows (in order):

  1. Header Comment Line: The first line(s) of the file that start with a ">" character are interpreted as ASCII text comments and are not part of the sequence, structure, or base specifications.
  2. Sequence Base String: A string of (uppercase) A/C/G/U characters without spaces or punctuation that specifies the base types of the sequence.
  3. Helix Triple Data: All components of a single contiguous helix must appear on the same line, though a file may contain multiple lines of helix data. A helix triple (i, j, k) follows the form produced by SFold: it denotes a set of k contiguous base pairs (i, j), (i + 1, j - 1), ..., (i + k - 1, j - k + 1). Taken together, the helix triples completely specify the secondary structure of the sequence. Helix triples may be given one per line, as in:
1 115 8
14 66 2
16 63 6
26 54 3
29 49 4
68 104 2
71 101 2
77 95 1
80 92 5

or alternatively with multiple triples on the same line, delimited by commas and plain spaces, as in:

6 116 4, 10 80 2, 12 77 4, 16 67 1, 17 64 5, 24 57 2, 27 54 11, 83 108 4, 89 102 4
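
The two layouts above, and the meaning of a triple, can be illustrated with a minimal Python sketch. The function names parse_triples and expand_helix are hypothetical and are not part of RNAStructViz; the sketch assumes only what the specification states, namely that triples are three whitespace-separated integers and that commas separate triples on a shared line.

```python
def parse_triples(line):
    """Parse one line of helix data into a list of (i, j, k) tuples.

    Handles both layouts: a single triple per line, or several triples
    separated by commas (a trailing comma is tolerated).
    """
    triples = []
    for chunk in line.split(","):
        fields = chunk.split()
        if not fields:
            continue  # empty chunk, e.g. after a trailing comma
        if len(fields) != 3:
            raise ValueError(f"expected 3 integers, got {chunk!r}")
        i, j, k = (int(f) for f in fields)
        triples.append((i, j, k))
    return triples


def expand_helix(i, j, k):
    """Expand a helix triple into its k base pairs:
    (i, j), (i + 1, j - 1), ..., (i + k - 1, j - k + 1)."""
    return [(i + t, j - t) for t in range(k)]
```

For example, expand_helix(1, 115, 8) yields eight pairs running from (1, 115) to (8, 108), and parse_triples accepts a line such as "6 116 4, 10 80 2" as two triples.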

RNAStructViz recognizes Helix Triple Format files by the extension .helix or .hlx.

Examples of the new format

First file (helix triples on separate lines)

> Structure 1 -35.00 0.59688E-02
UAAAGUUUUCUUUCAGGGAAUUAAAAUUUGAUCAUGGUUUAAGAUGAUUUAAAAUGGUAUUAUCUAAAUUUGAUUUACAGAGUAGGCAAUAAAAAUUUACCUCGGCAAUUUAUCGCUUGUAAAAUACUUGUUCCAGAAUAAUCGGCUAGACUUGUUAAAGCUUGUACUUUAAUUGAUGUUAAUUAUGAAAUUAUUAUAUUUUCUUUUAGAUCUAUGGUAGAAUUUGGAUUUAUAUUAGUGAAUUUUCAUAAUUUUAAGAUUUGUUGAACAAAGCAGAUUAGUACCUGGUUAGACAAAAAUUAAAAGAGCAGGAGUAAAGUUGUAUUUAAACUGAAAAGAUAUUGGCAGACAUUCUAAAUUAUCUUUGGAGGCUGAGUAGUAACUGAGAACCCUCAUUAACUACUUAAUUUUUUGACUCGUGUAUGAUCGUUUAUUUUAUUCUUAAGGAUUAUAAUAAAAAAUUUUUAAUUUAUUAAAAUAGAUAUAUACCCGGUUUAUGAUUUAAGAAACAUUUGGCCUACAAUAUUUUAUAUUAUGGAUUUUAGUUUUAGUUAACUAAAUGAAAUUGUAAAAGACAGUAAAAAAUUCUUAAUGUAUUUUUGAAGAUUAUCUAGAAGUGGUACAAAUCAUCCAUCAAUUGCCCAAAGGGGAGUAAGUUGUAGUAAAGUAGAUUUAGGGGAACCUGAAUCUAGUAAUA
1 115 8
14 66 2
16 63 6
26 54 3
29 49 4
68 104 2
71 101 2
77 95 1
80 92 5

Second file (multiple triples specified per line)

> Structure 2 -36.00 0.18584E-01
UAAAGUUUUCUUUCAGGGAAUUAAAAUUUGAUCAUGGUUUAAGAUGAUUUAAAAUGGUAUUAUCUAAAUUUGAUUUACAGAGUAGGCAAUAAAAAUUUACCUCGGCAAUUUAUCGCUUGUAAAAUACUUGUUCCAGAAUAAUCGGCUAGACUUGUUAAAGCUUGUACUUUAAUUGAUGUUAAUUAUGAAAUUAUUAUAUUUUCUUUUAGAUCUAUGGUAGAAUUUGGAUUUAUAUUAGUGAAUUUUCAUAAUUUUAAGAUUUGUUGAACAAAGCAGAUUAGUACCUGGUUAGACAAAAAUUAAAAGAGCAGGAGUAAAGUUGUAUUUAAACUGAAAAGAUAUUGGCAGACAUUCUAAAUUAUCUUUGGAGGCUGAGUAGUAACUGAGAACCCUCAUUAACUACUUAAUUUUUUGACUCGUGUAUGAUCGUUUAUUUUAUUCUUAAGGAUUAUAAUAAAAAAUUUUUAAUUUAUUAAAAUAGAUAUAUACCCGGUUUAUGAUUUAAGAAACAUUUGGCCUACAAUAUUUUAUAUUAUGGAUUUUAGUUUUAGUUAACUAAAUGAAAUUGUAAAAGACAGUAAAAAAUUCUUAAUGUAUUUUUGAAGAUUAUCUAGAAGUGGUACAAAUCAUCCAUCAAUUGCCCAAAGGGGAGUAAGUUGUAGUAAAGUAGAUUUAGGGGAACCUGAAUCUAGUAAUA
6 116 4, 10 80 2, 12 77 4, 16 67 1, 17 64 5, 24 57 2, 
27 54 11, 83 108 4, 89 102 4
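
As a rough illustration of how files like the two examples above could be read, the following Python sketch splits the text of a helix file into its three components. The function name parse_helix_text is hypothetical, and this is not the parser RNAStructViz uses. It assumes that comment lines start with ">", that the sequence consists only of uppercase A/C/G/U characters, and that every other non-blank line holds helix triples; it is more lenient than the specification in that it does not enforce the order of the components.

```python
def parse_helix_text(text):
    """Split helix-file text into (comments, sequence, triples)."""
    comments, sequence, triples = [], "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header comment line: keep the text after the ">" marker.
            comments.append(line[1:].strip())
        elif set(line) <= set("ACGU"):
            # Sequence base string (joined if it spans several lines).
            sequence += line
        else:
            # Helix data: comma-separated triples of three integers each.
            for chunk in line.split(","):
                fields = chunk.split()
                if fields:
                    i, j, k = map(int, fields)  # ValueError if malformed
                    triples.append((i, j, k))
    return comments, sequence, triples
```

Because a trailing comma leaves only an empty chunk, a triple list that wraps onto a second line, as in the second file above, is read the same way as one triple per line.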