From 37887a5f5f7ebc1a201ef63bb9f1bee42bc04347 Mon Sep 17 00:00:00 2001 From: Luther Tychonievich Date: Wed, 31 Jul 2024 11:44:38 -0500 Subject: [PATCH] Draft v8.0 replacement structure for personal names. This draft was created based on conversation with members of the names working group in https://github.com/fisharebest/gedcom-name/pull/27 and #473. The text is mostly new, though, and may have failed to capture some elements of those conversations. A separate v7.1 draft is anticipated once conversation on this draft stabilizes. --- specification/gedcom-2-data-types.md | 18 --- .../gedcom-3-structures-1-organization.md | 107 +++++++++++------- .../gedcom-3-structures-3-meaning.md | 53 +++++---- .../gedcom-3-structures-4-enumerations.md | 73 +++++++++++- 4 files changed, 168 insertions(+), 83 deletions(-) diff --git a/specification/gedcom-2-data-types.md b/specification/gedcom-2-data-types.md index be8641a..f6d9885 100644 --- a/specification/gedcom-2-data-types.md +++ b/specification/gedcom-2-data-types.md @@ -271,24 +271,6 @@ The URI for the `List:Text` data type is `g7:type-List#Text`. The URI for the `List:Enum` data type is `g7:type-List#Enum`. - -## Personal Name - -A personal name is mostly free-text. It should be the name as written in the culture of the individual and should not contain line breaks, repeated spaces, or characters not part of the written form of a name (except for U+002F as explained below). - -```abnf -PersonalName = nameStr - / [nameStr] "/" [nameStr] "/" [nameStr] - -nameChar = %x20-2E / %x30-10FFFF ; any but '/' and '\t' -nameStr = 1*nameChar -``` - -The character U+002F (`/`, slash or solidus) has special meaning in a personal name, being used to delimit the portion of the name that most closely matches the concept of a surname, family name, or the like. -This specification does not provide any standard way of representing names that contain U+002F. - -The URI for the `PersonalName` data type is `g7:type-Name`. 
- ## Language The language data type represents a human language or family of related languages, as defined in [BCP 47](https://www.rfc-editor.org/info/bcp47). diff --git a/specification/gedcom-3-structures-1-organization.md b/specification/gedcom-3-structures-1-organization.md index a153daa..b21906a 100644 --- a/specification/gedcom-3-structures-1-organization.md +++ b/specification/gedcom-3-structures-1-organization.md @@ -1048,64 +1048,91 @@ See `SHARED_NOTE_RECORD` for advice on choosing between `NOTE` and `SNOTE`. A `NOTE_STRUCTURE` can contain a `SOURCE_CITATION`, which in turn can contain a `NOTE_STRUCTURE`, allowing potentially unbounded nesting of structures. Because each dataset is finite, this nesting is also guaranteed to be finite. - - -#### `PERSONAL_NAME_PIECES` := +#### `PERSONAL_NAME_STRUCTURE` := ```gedstruct -n NPFX {0:M} g7:NPFX -n GIVN {0:M} g7:GIVN -n NICK {0:M} g7:NICK -n SPFX {0:M} g7:SPFX -n SURN {0:M} g7:SURN -n NSFX {0:M} g7:NSFX +n NAME {1:1} g8:INDI-NAME + +1 TYPE {0:1} g8:NAME-TYPE + +2 PHRASE {0:1} g7:PHRASE + +1 PART {0:M} g8:NAME-PART + +2 TYPE {1:1} g8:NAME-PART-TYPE + +2 LANG {0:1} g7:LANG + +2 TRAN {0:M} g8:TRAN + +3 LANG {1:1} g7:LANG + +2 DATE {0:M} g7:DATE + +2 <> {0:M} + +1 FORM {1:M} g8:NAME-FORM + +2 TYPE {0:1} g8:NAME-FORM-TYPE + +2 LANG {0:1} g7:LANG + +2 TRAN {0:M} g8:TRAN + +3 LANG {1:1} g7:LANG + +2 DATE {0:M} g7:DATE + +2 <> {0:M} + +1 <> {0:M} ``` -Optional isolated name parts; see `PERSONAL_NAME_STRUCTURE` for more details. +A name identifying an individual, which may have multiple forms and be composed of multiple parts. +Both name forms and name parts are called "names" in some situations, but may be distinguished as follows: + +- A `g8:NAME-FORM` stores a string used to identify the individual by name; for example "`John Farmer`". +- A `g8:NAME-PART` stores a distinct component of a name; for example, "`John`". 
+- A `g8:INDI-NAME` stores all the variants and parts of an individual's name that are considered part of a single name. :::example -"Lt. Cmndr. Joseph Allen jr.” might be presented as +Leonardo da Vinci might have a name structure like this: ```gedcom -1 NAME Lt. Cmndr. Joseph /Allen/ jr. -2 NPFX Lt. Cmndr. -2 GIVN Joseph -2 SURN Allen -2 NSFX jr. +1 NAME +2 FORM Leonardo da Vinci +2 FORM Leonardo di ser Piero da Vinci +2 PART Leonardo +3 TYPE GIVN +2 PART di ser Piero +3 TYPE PATRONYMIC +2 PART da Vinci +3 TYPE LOCATION +2 PART da +3 TYPE PARTICLE +2 PART Vinci +3 TYPE LOCATION ``` + +There are other ways this could be encoded; how many parts and forms to add is up to the user. ::: -This specification does not define how the meaning of multiple parts with the same tag differs from the meaning of a single part with a concatenated larger payload. -However, some applications allow the user to chose whether to combine or split name parts, meaning the tag quantity should be treated as expressing at least a user preference. -Even when multiple `SURN` tags are used, the `PersonalName` data type identifies a single surname substring between its slashes. +The decision of whether two name forms count as variants of a single name or as distinct names varies by culture and individual. -#### `PERSONAL_NAME_STRUCTURE` := +It is common for much of each name form to be identified in a name part, +but there may be components of a name with no identified name part +and name parts that do not appear in any name form. -```gedstruct -n NAME {1:1} g7:INDI-NAME - +1 TYPE {0:1} g7:NAME-TYPE - +2 PHRASE {0:1} g7:PHRASE - +1 <> {0:1} - +1 TRAN {0:M} g7:NAME-TRAN - +2 LANG {1:1} g7:LANG - +2 <> {0:1} - +1 <> {0:M} - +1 <> {0:M} +:::example +The Polish family name `Kowalski` has a feminine variant `Kowalska` and plural variant `Kowalscy`. 
+Including all three variants as name parts, even though only one appears in any name form, may facilitate searching and indexing in some applications. + +```gedcom +1 NAME +2 FORM Alfred Jan Maksymillian Kowalski +2 PART Kowalski +3 TYPE SURN +2 PART Kowalska +3 TYPE SURN, HIDDEN +2 PART Kowalscy +3 TYPE SURN, HIDDEN ``` +::: -Names of individuals are represented in the manner the name is normally spoken, with the family name, surname, or nearest cultural parallel thereunto separated by slashes (U+002F `/`). Based on the dynamic nature or unknown compositions of naming conventions, it is difficult to provide a more detailed name piece structure to handle every case. The `PERSONAL_NAME_PIECES` are provided optionally for systems that cannot operate effectively with less structured information. The Personal Name payload shall be seen as the primary name representation, with name pieces as optional auxiliary information; in particular it is recommended that all name parts in `PERSONAL_NAME_PIECES` appear within the `PersonalName` payload in some form, possibly adjusted for gender-specific suffixes or the like. -It is permitted for the payload to contain information not present in any name piece substructure. +As with other structures, the first `NAME` in an `INDI` provides the most-preferred name +and its first `FORM` structure provides the most-preferred form of that name. +It is recommended that the first form of the first name be used to label individuals in a user interface or report when a single name string is desired. -The name may be translated or transliterated into different languages or scripts using the `TRAN` substructure. -It is recommended, but not required, that if the name pieces are used, the same pieces are used in each translation and transliteration. +The order of name parts is not significant; name parts may be reorganized within a name without any change in meaning. -A `TYPE` is used to specify the particular variation that this name is. 
-For example; it could indicate that this name is a name taken at immigration or that it could be an ‘also known as’ name. -See `g7:enumset-NAME-TYPE` for more details. +The name may be translated or transliterated into different languages or scripts using the `TRAN` substructures. -:::note -Alternative approaches to representing names are being considered for future versions of this specification. -::: +A `TYPE` is used to specify the particular variation that this name, name part, or name form is. +For example, it could indicate that this name is a name taken at immigration or that it is an ‘also known as’ name. +See `g8:enumset-NAME-TYPE`, `g8:enumset-NAME-PART-TYPE`, and `g8:enumset-NAME-FORM-TYPE` for more details. #### `PLACE_STRUCTURE` := diff --git a/specification/gedcom-3-structures-3-meaning.md b/specification/gedcom-3-structures-3-meaning.md index 039bbb6..24fcd4a 100644 --- a/specification/gedcom-3-structures-3-meaning.md +++ b/specification/gedcom-3-structures-3-meaning.md @@ -647,6 +647,11 @@ See also `INDIVIDUAL_EVENT_STRUCTURE`. A reference to an external file. See the [File Path datatype](#file-path) for more details. +#### `FORM` (Form) `g8:NAME-FORM` + +A string representation of a personal name. +See also `PERSONAL_NAME_STRUCTURE`. + #### `FORM` (Format) `g7:FORM` The [media type](#media-type) of the file referenced by the superstructure. @@ -934,9 +939,9 @@ If needed, `text/html` can be converted to `text/plain` using the following step The name of the superstructure's subject, represented as a simple string. -#### `NAME` (Name) `g7:INDI-NAME` +#### `NAME` (Name) `g8:INDI-NAME` -A `PERSONAL_NAME_STRUCTURE` with parts, translations, sources, and so forth. +A `PERSONAL_NAME_STRUCTURE` with parts, forms, translations, sources, and so forth. #### `NATI` (Nationality) `g7:NATI` @@ -1039,6 +1044,12 @@ and the `PAGE` may describe the entire source. 
``` ::: +#### `PART` (Name Part) `g8:NAME-PART` + +A portion of a personal name, isolated to facilitate identifying its type. +See also `PERSONAL_NAME_STRUCTURE`. + + #### `PEDI` (Pedigree) `g7:PEDI` An enumerated value from set `g7:enumset-PEDI` indicating the type of child-to-family relationship represented by the superstructure. @@ -1430,25 +1441,9 @@ Each `TRAN` structure must differ from its superstructure and from every other `TRAN` substructure of its superstructure in either its language tag or its media type or both. -#### `TRAN` (Translation) `g7:NAME-TRAN` - -A type of `TRAN` substructure specific to [Personal Names](#personal-name). -Each `NAME`.`TRAN` must have a `LANG` substructure. -See also `INDI`.`NAME`. - -:::example -The following presents a name in Mandarin, transliterated using Pinyin +#### `TRAN` (Translation) `g8:TRAN` -```gedcom -1 NAME /孔/德庸 -2 GIVN 德庸 -2 SURN 孔 -2 TRAN /Kǒng/ Déyōng -3 GIVN Déyōng -3 SURN Kǒng -3 LANG zh-pinyin -``` -::: +A type of `TRAN` substructure for structures with a human-language [Text](#text) payload. #### `TRAN` (Translation) `g7:PLAC-TRAN` @@ -1476,7 +1471,7 @@ and English translation #### `TRAN` (Translation) `g7:NOTE-TRAN` -A type of `TRAN` for unstructured human-readable text, +A type of `TRAN` for unstructured human-readable text with a media type, such as is found in `NOTE` and `SNOTE` payloads. Each `g7:NOTE-TRAN` must have either a `LANG` substructure or a `MIME` substructure or both. If either is missing, it is assumed to have the same value as the superstructure. @@ -1572,9 +1567,21 @@ Other descriptor values might include, for example, See also `FACT` and `EVEN` for additional examples. ::: -#### `TYPE` (Type) `g7:NAME-TYPE` +#### `TYPE` (Type) `g8:NAME-TYPE` + +A list of enumerated values from set `g8:enumset-NAME-TYPE` indicating the types of the name. +The order of values in the list is not significant. 
+ +#### `TYPE` (Type) `g8:NAME-FORM-TYPE` + +A list of enumerated values from set `g8:enumset-NAME-FORM-TYPE` indicating the types of the name form. +The order of values in the list is not significant. + +#### `TYPE` (Type) `g8:NAME-PART-TYPE` + +A list of enumerated values from set `g8:enumset-NAME-PART-TYPE` indicating the types of the name part. +The order of values in the list is not significant. -An enumerated value from set `g7:enumset-NAME-TYPE` indicating the type of the name. #### `TYPE` (Type) `g7:EXID-TYPE` diff --git a/specification/gedcom-3-structures-4-enumerations.md b/specification/gedcom-3-structures-4-enumerations.md index bf70b9a..381f909 100644 --- a/specification/gedcom-3-structures-4-enumerations.md +++ b/specification/gedcom-3-structures-4-enumerations.md @@ -230,14 +230,83 @@ and applications should be prepared to encounter non-current values. | `SUBMITTED` | All | Ordinance was previously submitted. | Deprecated. This status was defined for use with TempleReady which is no longer in use. | | `UNCLEARED` | All | Data for clearing the ordinance request was insufficient. | Deprecated. This status was defined for use with TempleReady which is no longer in use. | -### `g7:enumset-NAME-TYPE` +### `g8:enumset-NAME-TYPE` | Value | Meaning | | ----- | :---------------------------- | +| `ADOPTED` | Given as part of being adopted into a family. | | `AKA` | Also known as, alias, etc. | | `BIRTH` | Name given at or near birth. | +| `DIVORCED` | Name used after a divorce. | +| `FORMAL` | A name only used in official, formal settings. | +| `GENERAL` | A name used in a wide variety of settings, both formal and informal. | +| `NICK` | A descriptive or familiar name that is used instead of, or in addition to, one’s official or legal name. Some cultures use this for any name that is not used in legal documents, others only for names that would be inappropriate in formal settings. | | `IMMIGRANT` | Name assumed at the time of immigration. 
| +| `INFORMAL` | A name only used in casual, intimate, or informal settings. | +| `LEGAL` | A name used for legal and official documents, but not in daily use. | | `MAIDEN` | Maiden name, name before first marriage. | | `MARRIED` | Married name, assumed as part of marriage. | | `PROFESSIONAL` | Name used professionally (pen, screen, stage name). | -| `OTHER` | A value not listed here; should have a `PHRASE` substructure | +| `RELIGIOUS` | Religious name, name adopted when joining a religious order. | +| `VARIANT` | Different spelling for a name, including spellings based on other languages such as Latin or French. | +| `OTHER` | A value not listed here; should have a `PHRASE` substructure. | + +Five of these types deserve additional comparison: + +- A `LEGAL` name would be used on a contract but not in formal or informal settings. +- A `FORMAL` name would be used in formal settings but not informal ones; it is generally also used on contracts unless a different `LEGAL` name is present. +- A `GENERAL` name is used in both formal and informal settings, and on contracts unless a different `LEGAL` name is present. +- An `INFORMAL` name is used in informal settings but not in formal ones. +- A `NICK` is in some way unofficial, though exactly how varies by culture and individual, and may have any of the other types listed here. + +### `g8:enumset-NAME-FORM-TYPE` + +| Value | Meaning | +| ----- | :---------------------------- | +| `FULL` | How a name is displayed when written out in full. Incompatible with `SHORT`. | +| `SHORT` | An abbreviated version of a name. Incompatible with `FULL`. | +| `INFERRED` | A form not found in a source, but inferred from what was in the source and the local naming patterns. | +| `OTHER` | A value not listed here; should have a `PHRASE` substructure. | + +It is expected that many name forms will have no `TYPE`. +The researcher-preferred name form is indicated by its being the first `FORM` of the `NAME`, not by any `TYPE` value. 
+ +### `g8:enumset-NAME-PART-TYPE` + +| Value | Meaning | +| ----- | :---------------------------- | +| `ADOPTED` | Given as part of being adopted into a family. | +| `DIVORCED` | Name used after a divorce. | +| `ESTATE` | House name, farm name, or name after moving into or marrying into a house/farm. Implies `LOCATION`. Incompatible with `SURN`. | +| `FORMAL` | A name only used in official, formal settings. | +| `GENERAL` | A name used in a wide variety of settings, both formal and informal. | +| `GENERATIONAL` | A name part shared by a particular generation of a family (i.e. siblings or first cousins, but not their parents or children). Implies a cultural pattern of sharing this part, not just a particular family's aesthetic naming patterns. | +| `GIVN` | A name given to an individual by someone's choice, rather than dictated by the rules of the culture, often to be used to identify that individual and differentiate them from other members of the same family or community. Incompatible with `SURN`. | +| `HONORIFIC` | A word or phrase attached to a name in formal or polite contexts to indicate station, such as "Miss", "Doctor", "さん", "様", "mademoiselle", and so on. | +| `IMMIGRANT` | Name assumed at the time of immigration. | +| `INFORMAL` | A name only used in casual, intimate, or informal settings. | +| `LEGAL` | A name used for legal and official documents, but not in daily use. | +| `LOCATION` | A name indicating a location of note, such as a city associated with the person. Often includes "of" or "from" type particles. Incompatible with `SURN`. | +| `MAIDEN` | Maiden name, name before first marriage. | +| `MARRIED` | Married name, assumed as part of marriage. | +| `MATERNAL` | A name inherited from the individual's mother's family. Implies `SURN`. | +| `MATRONYMIC` | A name of the individual's mother, possibly with a matronymic modifier. 
| +| `NICK` | A descriptive or familiar name that is used instead of, or in addition to, one’s official or legal name. Some cultures use this for any name that is not used in legal documents, others only for names that would be inappropriate in formal settings. | +| `NPFX` | Text that appears on a name line before the given and surname parts of a name. Implies that the person attaches this part to their name, but does not consider it part of the name itself. | +| `NSFX` | Text that appears on a name line after or behind the given and surname parts of a name. Implies that the person attaches this part to their name, but does not consider it part of the name itself. | +| `PARTICLE` | A name part that connects or modifies other name parts but is not itself considered a name, like "of" or "son of". | +| `PATERNAL` | A name inherited from the individual's father's family. Implies `SURN`. | +| `PATRONYMIC` | A name of the individual's father, possibly with a patronymic modifier like prefix "bar" or "di ser" or suffix "sen" or "dotter". | +| `PRIMARY` | The name most prominent in importance among the names of that type. Requires `GIVN`, `SURN`, `NPFX`, or `NSFX`. | +| `PROFESSIONAL` | Name used professionally (pen, screen, stage name). | +| `RANK` | A designation of rank or position, for example in a military ("private first class"), nobility ("viscount de Spoelberch"), or educational ("Ph.D.") system. | +| `RELIGIOUS` | Religious name, name adopted when joining a religious order. | +| `ROEPNAAM` | A name provided at birth for use in all situations except legal documents. Implies `GIVN` and `BIRTH`. The tag of this value comes from Dutch instead of English because no suitable English word was found; the value does not imply Dutch culture or ancestry. | +| `RUFNAME` | A given name underlined or otherwise indicated on documents as one not to be omitted when only one given name is used. Implies `GIVN` and `PRIMARY`. 
The tag of this value comes from German instead of English because no suitable English word was found; the value does not imply German culture or ancestry. | +| `SPFX` | A name piece used as a non-indexing pre-part of a surname. Should be displayed as part of the surname, but ignored when sorting by surname. | +| `SURN` | A family name passed on or used by members of a family. Because `SURN` was part of GEDCOM before most other non-`GIVN` name part types, some existing data labels name parts as `SURN` that are more correctly labeled as `LOCATION` or `PATRONYMIC`; that use of `SURN` is not recommended for new data. Incompatible with `GIVN`. | +| `UNIFIED` | Unified spelling for a name part. Usually, though not always, paired with `VARIANT` and `SURN`. | +| `VARIANT` | Different spelling for a name, such as an alternative spelling or gendered form; generally used for variants that are not part of the name's written forms but may be useful for indexing or searching. | +| `OTHER` | A value not listed here; should have a `PHRASE` substructure. | + +See also `g8:enumset-NAME-TYPE` for comparisons of some of these values.
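
The "Implies", "Incompatible with", and "Requires" notes in `g8:enumset-NAME-PART-TYPE` can be read as constraints on a `PART`.`TYPE` list. The following is a minimal Python sketch of one way an application might check them; it is an illustration only and not part of this patch or the specification. The names `IMPLIES`, `INCOMPATIBLE`, `REQUIRES_ANY`, `parse_type_list`, `expand_types`, and `check_part_types` are hypothetical, and the tables transcribe only a subset of the rows. Applying "Implies" transitively and reading "Requires A, B, C, or D" as "at least one of" are assumptions; the draft does not state either.

```python
# Hypothetical constraint tables, transcribed from a subset of the
# g8:enumset-NAME-PART-TYPE rows in this draft.
IMPLIES = {
    "ESTATE": {"LOCATION"},
    "MATERNAL": {"SURN"},
    "PATERNAL": {"SURN"},
    "RUFNAME": {"GIVN", "PRIMARY"},
}
INCOMPATIBLE = {
    "ESTATE": {"SURN"},
    "GIVN": {"SURN"},
    "LOCATION": {"SURN"},
    "SURN": {"GIVN"},
}
# Assumption: "Requires GIVN, SURN, NPFX, or NSFX" means at least one of them.
REQUIRES_ANY = {
    "PRIMARY": {"GIVN", "SURN", "NPFX", "NSFX"},
}


def parse_type_list(payload):
    """Split a comma-separated TYPE payload such as 'SURN, PATERNAL'."""
    return [item.strip() for item in payload.split(",") if item.strip()]


def expand_types(types):
    """Return the given types plus everything they imply (assumed transitive)."""
    expanded = set(types)
    pending = list(expanded)
    while pending:
        current = pending.pop()
        for implied in IMPLIES.get(current, ()):
            if implied not in expanded:
                expanded.add(implied)
                pending.append(implied)
    return expanded


def check_part_types(payload):
    """Return a list of human-readable constraint violations for one payload."""
    effective = expand_types(parse_type_list(payload))
    problems = []
    for value in sorted(effective):
        # Report every listed or implied value that clashes with this one.
        for clash in sorted(INCOMPATIBLE.get(value, set()) & effective):
            problems.append(f"{value} is incompatible with {clash}")
        needed = REQUIRES_ANY.get(value)
        if needed and not (needed & effective):
            options = ", ".join(sorted(needed))
            problems.append(f"{value} requires one of {options}")
    return problems


if __name__ == "__main__":
    for payload in ("SURN, PATERNAL", "RUFNAME", "ESTATE, SURN", "PRIMARY"):
        print(payload, "->", check_part_types(payload))
```

Expanding implied values before checking means a payload such as `ESTATE, SURN` is flagged twice, once for `ESTATE` itself and once for the `LOCATION` it implies. Whether applications should report, repair, or silently accept such lists is left open by the draft.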