Shortening tags is not important. #286
-
It might be worthwhile to unlearn the tendency to shrink the size of tags. There is nothing sacred about the number four, and indeed, all along we've had three (SEX) and five (underscore plus four). Now there is at least one more: SCHMA. What is the benefit of removing the 'E'? Humans don't often read GEDCOM directly, but some of us do, and too much push to shorten tags is at best an inconvenience to those who aren't native speakers of English. I'm not suggesting the other extreme of having really long ones, but four should not be the goal.
Replies: 6 comments 1 reply
-
There comes a point where tag length 'feels' too long, given what is required for expressiveness and differentiating between other tags. Everyone will have a different view on this. For me, 6 characters does feel longer than necessary. This is reinforced by the fact that I write raw GEDCOM as well as read it. There is also something to be said for having tags that are similar in length so that the components align across lines and the cognitive burden of reading it is minimised.
-
"reduce cognitive burden of reading": a more effective way of accomplishing this would be allowing leading white space on lines, and comments on lines that end with "@". Implementations (or human editors) could optionally add them, but they would be ignored on import.
-
This was allowed in the 5.5.1 spec, but a survey of existing GEDCOM tools during the 7.0.0 drafting stage revealed that many did not support this feature. We removed it from 7.0 because we felt that interoperability of tools was more important than readability of files. Incidentally, that finding surprised me; stripping leading whitespace seemed trivial to implement. My rationale for the finding (based on thought and not hard evidence) is that a sizable set of tools appears to add features to their parsers on demand when a user complains that a file exported by tool X fails to import correctly, and most software does not add extraneous whitespace, so failure to handle it was not reported.
-
Both adding and trimming the white space is indeed trivial. I once wrote a Perl script that split a GEDCOM into a separate file for each level zero record, added before each level number twice that number of spaces, and put each record into a Berkeley database keyed by the xref. And another one that put it all back together, removing the spaces. It's also not hard to add the comments in my example, and even easier to remove them.
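For anyone who wants to try the idea, here is a minimal sketch of that kind of round trip. It is in Python, not Perl, it is not the original script, and it leaves out the per-record files and the Berkeley database step. The function names (indent_gedcom, strip_indent, split_records) and the sample data are made up for illustration, and it assumes every physical line begins with its level number.

```python
import re

# Assumption: each GEDCOM line begins with a non-negative integer level,
# followed by a space or the end of the line.
_LEVEL = re.compile(r"^(\d+)(?= |$)")


def indent_gedcom(text: str, width: int = 2) -> str:
    """Prefix each line with width * level spaces (two per level by default)."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip(" \t")
        match = _LEVEL.match(stripped)
        # Lines without a recognisable level number are left unindented.
        pad = " " * (width * int(match.group(1))) if match else ""
        out.append(pad + stripped)
    return "\n".join(out)


def strip_indent(text: str) -> str:
    """Remove any leading spaces or tabs from every line."""
    return "\n".join(line.lstrip(" \t") for line in text.splitlines())


def split_records(text: str) -> list[str]:
    """Group lines into records, starting a new one at each level-0 line."""
    records: list[list[str]] = []
    for line in text.splitlines():
        stripped = line.lstrip(" \t")
        if stripped == "0" or stripped.startswith("0 ") or not records:
            records.append([])
        records[-1].append(line)
    return ["\n".join(record) for record in records]


if __name__ == "__main__":
    sample = "0 @I1@ INDI\n1 NAME John /Smith/\n2 GIVN John\n1 SEX M\n0 TRLR"
    print(indent_gedcom(sample))
```

Because the indentation sits before the level number, stripping it cannot touch a line's payload, so indenting and then stripping gives back the original text.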
-
Another program (not by me) that added and removed indentation was LifeLines: “The indentation shown in the examples is not part of GEDCOM format. When LifeLines prepares records for you to edit, however, it always indents the records, making them easier to read and understand. You do not need to follow this indentation scheme when you edit the records. Indentation is removed from the data before it is stored in the database.”
-
The steering committee has generally tried to balance the desire to be brief, the desire to match the style of previous versions, and the desire to be expressive when read by English speakers. For example, when introducing PHRASE we spelled it out in full, but when introducing SDATE we abbreviated "Sort date". However, more important than any of these has been avoiding ambiguity and name conflicts, which brings us to the specific example: GEDCOM 5.3 (only; not the versions before or after it) had a SCHEMA with different semantics. We picked a different tag to avoid potential name collisions with old files.