DRAFT MessageFormat 2.0 Data Model

This section defines a data model representation of MessageFormat 2 messages.

Implementations are not required to use this data model for their internal representation of messages. Neither are they required to provide an interface that accepts or produces representations of this data model.

The major reason this specification provides a data model is to allow interchange of the logical representation of a message between different implementations. This includes mapping legacy formatting syntaxes (such as MessageFormat 1) to a MessageFormat 2 implementation. Another use would be in converting to or from translation formats without the need to continually parse and serialize all or part of a message.

Implementations that expose APIs supporting the production, consumption, or transformation of a message as a data structure are encouraged to use this data model.

This data model provides these capabilities:

  • any MessageFormat 2.0 message can be parsed into this representation
  • this data model representation can be serialized as a well-formed MessageFormat 2.0 message
  • parsing a MessageFormat 2.0 message into a data model representation and then serializing it results in an equivalently functional message

This data model might also be used to:

  • parse a non-MessageFormat 2 message into a data model (and therefore re-serialize it as MessageFormat 2). Note that this depends on compatibility between the two syntaxes.
  • re-serialize a MessageFormat 2 message into some other format including (but not limited to) other formatting syntaxes or translation formats.

To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. Two equivalent definitions of the data model are also provided:

  • message.json is a JSON Schema definition, for use with message data encoded as JSON or compatible formats, such as YAML.
  • message.dtd is a document type definition (DTD), for use with message data encoded as XML.

Note that while the data model description below is the canonical one, the JSON and DTD definitions are intended for interchange between systems and processors. To that end, they relax some aspects of the data model, such as allowing declarations, options, and attributes to be optional rather than required properties.

Note

Users relying on XML representations of messages should note that XML 1.0 does not allow for the representation of all C0 control characters (U+0000-U+001F). Except for U+0000 NULL, these characters are allowed in MessageFormat 2 messages, so systems and users relying on this XML representation for interchange might need to supply an alternate escape mechanism to support messages that contain these characters.

Important

The data model uses the field name name to denote various interface identifiers. In the MessageFormat 2 syntax, the source for these name fields sometimes uses the production identifier. This happens when the named item, such as a function, supports namespacing.

Messages

A SelectMessage corresponds to a syntax message that includes selectors. A message without selectors and with a single pattern is represented by a PatternMessage.

In the syntax, a PatternMessage may be represented either as a simple message or as a complex message, depending on whether it has declarations and whether its pattern is allowed in a simple message.

type Message = PatternMessage | SelectMessage;

interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: VariableRef[];
  variants: Variant[];
}
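
As an illustration, the simple message `Hello, {$name}!` could be represented as the PatternMessage below. This is a minimal sketch, not a normative example: the type definitions are a trimmed-down subset of the data model (Markup and function references are left out for brevity), and the helper `allPatterns` is a hypothetical name introduced only for this example.

```typescript
// Trimmed-down subset of the data model, just enough for this sketch.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  attributes: Map<string, Literal | true>;
}
type Pattern = Array<string | Expression>;
interface Variant {
  keys: Array<Literal | { type: "*"; value?: string }>;
  value: Pattern;
}
interface PatternMessage { type: "message"; declarations: unknown[]; pattern: Pattern }
interface SelectMessage {
  type: "select";
  declarations: unknown[];
  selectors: VariableRef[];
  variants: Variant[];
}
type Message = PatternMessage | SelectMessage;

// Data model for the simple message: Hello, {$name}!
const hello: Message = {
  type: "message",
  declarations: [],
  pattern: [
    "Hello, ",
    { type: "expression", arg: { type: "variable", name: "name" }, attributes: new Map() },
    "!",
  ],
};

// Hypothetical helper: collect every pattern that a message could format.
// The `type` field discriminates between the two message shapes.
function allPatterns(msg: Message): Pattern[] {
  return msg.type === "select" ? msg.variants.map((v) => v.value) : [msg.pattern];
}
```

Because `type` is a discriminant, a consumer can branch on it once and then rely on the compiler to know which fields are present.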

Each message declaration is represented by a Declaration, which connects the name of a variable with its expression value. The name does not include the initial $ of the variable.

The name of an InputDeclaration MUST be the same as the name in the VariableRef of its VariableExpression value.

type Declaration = InputDeclaration | LocalDeclaration;

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}
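
The name constraint on InputDeclaration can be checked mechanically. The sketch below assumes simplified copies of the interfaces (options, functions, and attributes details are omitted), and `mismatchedInputs` is a hypothetical helper name, not part of the data model.

```typescript
// Simplified copies of the declaration interfaces for this sketch.
interface VariableRef { type: "variable"; name: string }
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: Map<string, unknown>;
}
interface InputDeclaration { type: "input"; name: string; value: VariableExpression }
interface LocalDeclaration { type: "local"; name: string; value: unknown }
type Declaration = InputDeclaration | LocalDeclaration;

// Hypothetical helper: return the names of input declarations whose name
// differs from the variable referenced by their expression value.
function mismatchedInputs(declarations: Declaration[]): string[] {
  const bad: string[] = [];
  for (const d of declarations) {
    if (d.type === "input" && d.name !== d.value.arg.name) {
      bad.push(d.name);
    }
  }
  return bad;
}

// Roughly what `.input {$count}` would produce: both names are "count",
// and neither includes the leading `$`.
const ok: InputDeclaration = {
  type: "input",
  name: "count",
  value: { type: "expression", arg: { type: "variable", name: "count" }, attributes: new Map() },
};

// A hand-built declaration that violates the constraint.
const broken: InputDeclaration = {
  type: "input",
  name: "count",
  value: { type: "expression", arg: { type: "variable", name: "total" }, attributes: new Map() },
};
```

A check like this is mostly useful for data models built programmatically or converted from another format, since a message parsed from MessageFormat 2 syntax would satisfy the constraint by construction.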

In a SelectMessage, each variant is represented by a Variant holding its keys and its value, and the variants are collected in an array. For the CatchallKey, a string value may be provided to retain an identifier. This is always '*' in MessageFormat 2 syntax, but may vary in other formats.

interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}

interface CatchallKey {
  type: "*";
  value?: string;
}
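
The optional value of a CatchallKey matters mainly when converting from another format. The sketch below assumes a message converted from a syntax whose catch-all key is spelled `other` (an assumption made for illustration); Pattern is reduced to plain strings, and `sourceKey` and `isFallback` are hypothetical helper names.

```typescript
interface Literal { type: "literal"; value: string }
interface CatchallKey { type: "*"; value?: string }
// Pattern is simplified to an array of strings for this sketch.
interface Variant { keys: Array<Literal | CatchallKey>; value: string[] }

const variants: Variant[] = [
  { keys: [{ type: "literal", value: "one" }], value: ["You have one notification."] },
  // The catch-all retains the identifier used by the (assumed) source format.
  { keys: [{ type: "*", value: "other" }], value: ["You have many notifications."] },
];

// Hypothetical helper: the identifier a key had in its source format.
// A catch-all without a retained value falls back to "*".
function sourceKey(key: Literal | CatchallKey): string {
  return key.type === "*" ? (key.value ?? "*") : key.value;
}

// Hypothetical helper: true when every key of the variant is a catch-all.
function isFallback(variant: Variant): boolean {
  return variant.keys.every((key) => key.type === "*");
}
```

Note that the catch-all is identified by its `type`, never by its `value`, so a literal key with the value "*" remains an ordinary Literal.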

Patterns

Each Pattern contains a linear sequence of text and placeholders corresponding to potential output of a message.

Each element of the Pattern MUST be a non-empty string, an Expression, or a Markup object. String values represent literal text and are fully processed: any escape sequences in the underlying text have already been resolved. Expression wraps each of the potential expression shapes. Markup wraps each of the potential markup shapes.

Implementations MUST NOT rely on the set of Expression and Markup interfaces defined in this document being exhaustive. Future versions of this specification might define additional expressions or markup.

type Pattern = Array<string | Expression | Markup>;

type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression;

interface LiteralExpression {
  type: "expression";
  arg: Literal;
  function?: FunctionRef;
  attributes: Attributes;
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  function?: FunctionRef;
  attributes: Attributes;
}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  function: FunctionRef;
  attributes: Attributes;
}
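
A consumer that walks a Pattern might look like the sketch below, which collects the names of all variables a pattern refers to. Several things here are assumptions made for illustration: the three expression shapes are merged into one loose `Expression` interface, `variablesIn` is a hypothetical helper, and the function and option names in the sample data are placeholders. The `default` branch reflects the requirement not to treat the set of part types as exhaustive.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
type Options = Map<string, Literal | VariableRef>;
type Attributes = Map<string, Literal | true>;
interface FunctionRef { type: "function"; name: string; options: Options }
// Loose merge of LiteralExpression, VariableExpression and FunctionExpression.
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  function?: FunctionRef;
  attributes: Attributes;
}
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}
type Pattern = Array<string | Expression | Markup>;

// Hypothetical helper: names of all variables referenced by a pattern.
function variablesIn(pattern: Pattern): Set<string> {
  const names = new Set<string>();
  const addFromOptions = (options: Options): void => {
    options.forEach((value) => {
      if (value.type === "variable") names.add(value.name);
    });
  };
  for (const part of pattern) {
    if (typeof part === "string") continue; // literal text
    switch (part.type) {
      case "expression": {
        const arg = part.arg;
        if (arg && arg.type === "variable") names.add(arg.name);
        if (part.function) addFromOptions(part.function.options);
        break;
      }
      case "markup":
        addFromOptions(part.options);
        break;
      default:
        // A part type this code does not know about: skip it, do not fail.
        break;
    }
  }
  return names;
}

// Sample data; "datetime" and "dateStyle" are placeholder names.
const example: Pattern = [
  { type: "markup", kind: "open", name: "b", options: new Map(), attributes: new Map() },
  { type: "expression", arg: { type: "variable", name: "user" }, attributes: new Map() },
  { type: "markup", kind: "close", name: "b", options: new Map(), attributes: new Map() },
  " joined on ",
  {
    type: "expression",
    arg: { type: "variable", name: "date" },
    function: {
      type: "function",
      name: "datetime",
      options: new Map<string, Literal | VariableRef>([
        ["dateStyle", { type: "variable", name: "style" }],
      ]),
    },
    attributes: new Map(),
  },
];
```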

Expressions

The Literal and VariableRef correspond to the literal and variable syntax rules. When they are used as the body of an Expression, they represent expression values with no function.

Literal represents all literal values, both quoted literal and unquoted literal. The presence or absence of quotes is not preserved by the data model. The value of Literal is the "cooked" value (i.e. escape sequences are processed).

In a VariableRef, the name does not include the initial $ of the variable.

interface Literal {
  type: "literal";
  value: string;
}

interface VariableRef {
  type: "variable";
  name: string;
}
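
To make the "cooked" value concrete, the sketch below turns the source text of a literal into a Literal. It is a simplification under stated assumptions: it assumes quoted literals are delimited by `|` and that the escape sequences are `\\`, `\{`, `\|`, and `\}`, and it performs no validation. `toLiteral` is a hypothetical helper name.

```typescript
interface Literal { type: "literal"; value: string }

// Hypothetical helper: build a Literal from the source text of a literal.
// Assumes `|...|` quoting and the escapes \\ \{ \| \} ; no validation is done.
function toLiteral(source: string): Literal {
  const quoted =
    source.length >= 2 && source.startsWith("|") && source.endsWith("|");
  const body = quoted ? source.slice(1, -1) : source;
  // Replace each escape sequence with the character it stands for.
  const value = body.replace(/\\([\\{|}])/g, "$1");
  return { type: "literal", value };
}

// Quoted and unquoted spellings of the same value give the same Literal,
// because the data model does not record whether quotes were present.
const unquoted = toLiteral("42");
const quoted = toLiteral("|42|");

// An escaped pipe inside a quoted literal becomes a plain pipe.
const escaped = toLiteral("|a\\|b|");
```

A serializer working from the data model therefore has to decide for itself whether a given value needs quoting; that information is not carried by Literal.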

A FunctionRef represents a function. The name does not include the : starting sigil.

Options is a key-value mapping containing options, and is used to represent the function and markup options.

interface FunctionRef {
  type: "function";
  name: string;
  options: Options;
}

type Options = Map<string, Literal | VariableRef>;
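
As a sketch of how a function annotation maps onto these types, the FunctionRef below corresponds to an annotation along the lines of `:number minimumFractionDigits=2`. The option shown is illustrative only. `splitIdentifier` is a hypothetical helper that assumes a namespaced name is written as `namespace:name`, and `acme:price` is a made-up identifier.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
type Options = Map<string, Literal | VariableRef>;
interface FunctionRef { type: "function"; name: string; options: Options }

// The name has no leading ":" sigil; option values are Literal or VariableRef.
const numberRef: FunctionRef = {
  type: "function",
  name: "number",
  options: new Map<string, Literal | VariableRef>([
    ["minimumFractionDigits", { type: "literal", value: "2" }],
  ]),
};

// Hypothetical helper: split a possibly namespaced identifier.
// Assumes the form "namespace:name"; a name without ":" has no namespace.
function splitIdentifier(name: string): { namespace?: string; name: string } {
  const i = name.indexOf(":");
  return i === -1
    ? { name }
    : { namespace: name.slice(0, i), name: name.slice(i + 1) };
}

const plain = splitIdentifier(numberRef.name);  // no namespace
const custom = splitIdentifier("acme:price");   // made-up namespaced name
```

Option values that are literals are always strings in the data model, even when they look numeric in the source.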

Markup

A Markup object has a kind of "open", "standalone", or "close", corresponding to open, standalone, and close markup respectively. The name in these does not include the starting sigils # and / or the ending sigil /. The options for markup use the same key-value mapping as FunctionRef.

interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}
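
The relationship between kind and the sigils can be sketched as follows. The helper `sketchSyntax` is hypothetical and deliberately lossy: it ignores options and attributes, so it is not a serializer, only an illustration of which sigils each kind implies.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
type Options = Map<string, Literal | VariableRef>;
type Attributes = Map<string, Literal | true>;
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}

// Hypothetical helper: show the sigils implied by each kind.
// Options and attributes are ignored, so this is not a full serializer.
function sketchSyntax(markup: Markup): string {
  switch (markup.kind) {
    case "open":
      return `{#${markup.name}}`;
    case "standalone":
      return `{#${markup.name}/}`;
    case "close":
      return `{/${markup.name}}`;
  }
}

// The name is stored without any sigil in all three cases.
const open: Markup = {
  type: "markup", kind: "open", name: "b",
  options: new Map(), attributes: new Map(),
};
const close: Markup = { ...open, kind: "close" };
const standalone: Markup = { ...open, kind: "standalone", name: "img" };
```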

Attributes

Attributes is a key-value mapping used to represent the expression and markup attributes.

Attributes with no value are represented by true here.

type Attributes = Map<string, Literal | true>;
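
For example, an expression carrying one attribute with a value and one without might hold the Attributes below. The attribute names `translate` and `locked` are placeholders chosen for illustration, and `attributeValue` is a hypothetical helper.

```typescript
interface Literal { type: "literal"; value: string }
type Attributes = Map<string, Literal | true>;

// One attribute with a literal value, one without a value.
const attributes: Attributes = new Map<string, Literal | true>([
  ["translate", { type: "literal", value: "no" }],
  ["locked", true],
]);

// Hypothetical helper: look up an attribute.
// Returns the literal's string value, `true` for a value-less attribute,
// or `undefined` when the attribute is absent.
function attributeValue(
  attrs: Attributes,
  name: string
): string | true | undefined {
  const attr = attrs.get(name);
  return attr === undefined || attr === true ? attr : attr.value;
}
```

Using `true` rather than an empty Literal keeps "present without a value" distinguishable from "present with an empty string value".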

Extensions

Implementations MAY extend this data model with additional interfaces, and MAY add new fields to existing interfaces. When encountering an unfamiliar field, an implementation MUST ignore it. For example, an implementation could include a span field on all interfaces encoding the corresponding start and end positions in its source syntax.

In general, implementations MUST NOT extend the sets of values for any defined field or type when representing a valid message. However, when using this data model to represent an invalid message, an implementation MAY do so. This is intended to allow for the representation of "junk" or invalid content within messages.
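
The span extension mentioned above could be sketched as follows. The shape of `Span` (numeric `start` and `end` offsets) and the names `VariableRefWithSpan` and `variableName` are assumptions made for this example; the data model itself does not define them.

```typescript
interface VariableRef { type: "variable"; name: string }

// Hypothetical extension: source positions attached to a node.
interface Span { start: number; end: number }
interface VariableRefWithSpan extends VariableRef { span?: Span }

// A consumer that only knows the base data model reads the fields it
// understands and ignores anything else on the object.
function variableName(ref: VariableRef): string {
  return ref.name;
}

// Data produced by an implementation that adds spans, e.g. received as JSON.
const received = JSON.parse(
  '{"type":"variable","name":"count","span":{"start":7,"end":13}}'
) as VariableRef;

// An extension-aware consumer can still read the extra field.
const extended = received as VariableRefWithSpan;
const name = variableName(received);
const start = extended.span ? extended.span.start : undefined;
```

Keeping extension fields optional means that data from implementations without the extension remains usable by extension-aware consumers.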