This section defines a data model representation of MessageFormat 2 messages.
Implementations are not required to use this data model for their internal representation of messages. Neither are they required to provide an interface that accepts or produces representations of this data model.
The major reason this specification provides a data model is to allow interchange of the logical representation of a message between different implementations. This includes mapping legacy formatting syntaxes (such as MessageFormat 1) to a MessageFormat 2 implementation. Another use would be in converting to or from translation formats without the need to continually parse and serialize all or part of a message.
Implementations that expose APIs supporting the production, consumption, or transformation of a message as a data structure are encouraged to use this data model.
This data model provides these capabilities:
- any MessageFormat 2.0 message can be parsed into this representation
- this data model representation can be serialized as a well-formed MessageFormat 2.0 message
- parsing a MessageFormat 2.0 message into a data model representation and then serializing it results in an equivalently functional message
This data model might also be used to:
- parse a non-MessageFormat 2 message into a data model (and therefore re-serialize it as MessageFormat 2). Note that this depends on compatibility between the two syntaxes.
- re-serialize a MessageFormat 2 message into some other format including (but not limited to) other formatting syntaxes or translation formats.
To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. Two equivalent definitions of the data model are also provided:
- `message.json` is a JSON Schema definition, for use with message data encoded as JSON or compatible formats, such as YAML.
- `message.dtd` is a document type definition (DTD), for use with message data encoded as XML.
Note that while the data model description below is the canonical one, the JSON and DTD definitions are intended for interchange between systems and processors. To that end, they relax some aspects of the data model, such as allowing declarations, options, and attributes to be optional rather than required properties.
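Because the interchange schemas relax some required properties, a consumer of JSON-encoded message data might first restore the canonical shape. The following is a minimal sketch under that assumption. The `RelaxedPatternMessage` and `NormalizedPatternMessage` types and the `normalizeMessage` function are hypothetical names, not part of this specification, and only the top-level `declarations` field is handled.

```typescript
// Hypothetical relaxed shape, as might be decoded from JSON:
// `declarations` may be absent.
interface RelaxedPatternMessage {
  type: "message";
  declarations?: unknown[];
  pattern: unknown[];
}

// Canonical shape, trimmed for illustration: `declarations` is required.
interface NormalizedPatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: unknown[];
}

// Fill in the field that the interchange schema allows to be omitted.
function normalizeMessage(msg: RelaxedPatternMessage): NormalizedPatternMessage {
  return {
    type: "message",
    declarations: msg.declarations ?? [],
    pattern: msg.pattern,
  };
}

const decoded: RelaxedPatternMessage = JSON.parse('{"type":"message","pattern":["Hello"]}');
const normalized = normalizeMessage(decoded);
```

A fuller version would apply the same defaulting to `options` and `attributes` wherever they occur.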
Note
Users relying on XML representations of messages should note that XML 1.0 does not allow for the representation of all C0 control characters (U+0000-U+001F). Except for U+0000 NULL, these characters are allowed in MessageFormat 2 messages, so systems and users relying on this XML representation for interchange might need to supply an alternate escape mechanism to support messages that contain these characters.
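To find out whether a given message needs such a mechanism, a producer could scan its text for characters that XML 1.0 cannot represent. The sketch below covers only the C0 range discussed in this note and relies on the fact that XML 1.0 does permit U+0009, U+000A, and U+000D. The names `XML_UNSAFE_C0` and `hasXmlUnsafeControls` are hypothetical.

```typescript
// C0 control characters that MessageFormat 2 permits but XML 1.0 cannot
// represent: U+0001-U+0008, U+000B, U+000C, and U+000E-U+001F.
// (U+0009, U+000A, and U+000D are valid XML 1.0 characters.)
const XML_UNSAFE_C0 = /[\u0001-\u0008\u000B\u000C\u000E-\u001F]/;

// Hypothetical helper: true if `text` contains a character that would need
// an alternate escape mechanism before being stored in an XML 1.0 document.
function hasXmlUnsafeControls(text: string): boolean {
  return XML_UNSAFE_C0.test(text);
}
```

The choice of escape mechanism itself is left open here, as it is in the note.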
Important
The data model uses the field name `name` to denote various interface identifiers. In the MessageFormat 2 syntax, the source for these `name` fields sometimes uses the production `identifier`. This happens when the named item, such as a function, supports namespacing.
A `SelectMessage` corresponds to a syntax message that includes selectors. A message without selectors and with a single pattern is represented by a `PatternMessage`.
In the syntax, a `PatternMessage` may be represented either as a simple message or as a complex message, depending on whether it has declarations and whether its `pattern` is allowed in a simple message.
type Message = PatternMessage | SelectMessage;
interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}
interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: VariableRef[];
  variants: Variant[];
}
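As an illustration, the simple message `Hello, {$name}!` might be represented as shown below. This is a sketch only: the interfaces are trimmed copies of the ones in this section (declarations and variants are left as `unknown`), and the helper `hasSelectors` is a hypothetical name.

```typescript
// Trimmed copies of the interfaces in this section, for illustration only.
interface VariableRef { type: "variable"; name: string }
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: Map<string, unknown>;
}
type Pattern = Array<string | VariableExpression>;
interface PatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: Pattern;
}
interface SelectMessage {
  type: "select";
  declarations: unknown[];
  selectors: VariableRef[];
  variants: unknown[];
}
type Message = PatternMessage | SelectMessage;

// Data model for the simple message: Hello, {$name}!
const hello: PatternMessage = {
  type: "message",
  declarations: [],
  pattern: [
    "Hello, ",
    { type: "expression", arg: { type: "variable", name: "name" }, attributes: new Map() },
    "!",
  ],
};

// Hypothetical helper: the `type` field discriminates the two message shapes.
function hasSelectors(msg: Message): boolean {
  return msg.type === "select";
}
```

Because both interfaces carry a literal `type`, a consumer can branch on that one field without inspecting the rest of the object.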
Each message declaration is represented by a `Declaration`, which connects the `name` of a variable with its expression `value`. The `name` does not include the initial `$` of the variable.
The `name` of an `InputDeclaration` MUST be the same as the `name` in the `VariableRef` of its `VariableExpression` `value`.
type Declaration = InputDeclaration | LocalDeclaration;
interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}
interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}
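The naming requirement on `InputDeclaration` can be checked mechanically. The following is a minimal sketch, assuming trimmed copies of the interfaces above. `isConsistentDeclaration` is a hypothetical helper, and its rejection of a leading `$` is an added assumption meant to catch a sigil that was not stripped.

```typescript
// Trimmed copies of the interfaces in this section, for illustration only.
interface VariableRef { type: "variable"; name: string }
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: Map<string, unknown>;
}
interface InputDeclaration { type: "input"; name: string; value: VariableExpression }
interface LocalDeclaration { type: "local"; name: string; value: unknown }
type Declaration = InputDeclaration | LocalDeclaration;

// Hypothetical check: an input declaration must name the same variable that
// its value expression refers to, and names carry no leading "$".
function isConsistentDeclaration(decl: Declaration): boolean {
  if (decl.name.startsWith("$")) return false;
  if (decl.type === "input") return decl.name === decl.value.arg.name;
  return true;
}

// Corresponds to a declaration such as `.input {$count}`.
const consistent: Declaration = {
  type: "input",
  name: "count",
  value: { type: "expression", arg: { type: "variable", name: "count" }, attributes: new Map() },
};

// Not valid: the declared name and the referenced variable differ.
const mismatched: Declaration = {
  type: "input",
  name: "total",
  value: { type: "expression", arg: { type: "variable", name: "count" }, attributes: new Map() },
};
```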
In a `SelectMessage`, the `keys` and `value` of each variant are represented as an array of `Variant`. For the `CatchallKey`, a string `value` may be provided to retain an identifier. This is always `'*'` in MessageFormat 2 syntax, but may vary in other formats.
interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}
interface CatchallKey {
  type: "*";
  value?: string;
}
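The sketch below shows how a pair of variants, including a catch-all that retains an identifier from another format, might look in this model. `Pattern` is simplified to an array of strings, and `describeKeys` is a hypothetical debugging helper rather than a serializer (literal keys are shown without quoting or escaping).

```typescript
// Trimmed copies of the interfaces in this section, for illustration only.
interface Literal { type: "literal"; value: string }
interface CatchallKey { type: "*"; value?: string }
// `Pattern` is simplified to plain strings for this illustration.
interface Variant { keys: Array<Literal | CatchallKey>; value: string[] }

// Variants with the keys `one` and `*`. The second variant keeps the
// identifier "other" that its catch-all might have had in another format.
const variants: Variant[] = [
  { keys: [{ type: "literal", value: "one" }], value: ["One item"] },
  { keys: [{ type: "*", value: "other" }], value: ["Many items"] },
];

// Hypothetical helper for debugging output. A catch-all key is always shown
// as "*", whatever identifier its optional `value` retains.
function describeKeys(variant: Variant): string {
  return variant.keys.map((key) => (key.type === "*" ? "*" : key.value)).join(" ");
}

const keySummary = variants.map((variant) => describeKeys(variant));
```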
Each `Pattern` contains a linear sequence of text and placeholders corresponding to potential output of a message.
Each element of the `Pattern` MUST either be a non-empty string, an `Expression`, or a `Markup` object. String values represent literal text. String values include all processing of the underlying text values, including escape sequence processing. `Expression` wraps each of the potential expression shapes. `Markup` wraps each of the potential markup shapes.
Implementations MUST NOT rely on the set of `Expression` and `Markup` interfaces defined in this document being exhaustive. Future versions of this specification might define additional expressions or markup.
type Pattern = Array<string | Expression | Markup>;
type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression;
interface LiteralExpression {
  type: "expression";
  arg: Literal;
  function?: FunctionRef;
  attributes: Attributes;
}
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  function?: FunctionRef;
  attributes: Attributes;
}
interface FunctionExpression {
  type: "expression";
  arg?: never;
  function: FunctionRef;
  attributes: Attributes;
}
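One way to honor the requirement not to treat these interfaces as exhaustive is to give every pattern walker an explicit fallback branch. The following sketch assumes trimmed interfaces. `UnknownPart` and `describePart` are hypothetical names, and the `"footnote"` element is an invented stand-in for a shape that a future version might define.

```typescript
// Trimmed copies of the interfaces in this section, for illustration only.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  attributes: Map<string, Literal | true>;
}
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
}
// Hypothetical stand-in for element shapes defined by future versions.
interface UnknownPart { type: string }
type PatternPart = string | Expression | Markup | UnknownPart;

// Classify a pattern element, with a fallback for unrecognized shapes.
function describePart(part: PatternPart): string {
  if (typeof part === "string") return "text";
  if (part.type === "expression") return "expression";
  if (part.type === "markup") return "markup";
  return "unrecognized: " + part.type;
}

const parts: PatternPart[] = [
  "You have ",
  { type: "expression", arg: { type: "variable", name: "count" }, attributes: new Map() },
  { type: "footnote" }, // invented shape, not defined by this specification
];
const summary = parts.map((part) => describePart(part));
```

What the fallback does (skip, report, or pass the element through) is a design choice for the implementation; the point is only that it exists.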
The `Literal` and `VariableRef` correspond to the literal and variable syntax rules. When they are used as the `arg` of an `Expression`, they represent expression values with no function. `Literal` represents all literal values, both quoted literal and unquoted literal. The presence or absence of quotes is not preserved by the data model. The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).
In a `VariableRef`, the `name` does not include the initial `$` of the variable.
interface Literal {
  type: "literal";
  value: string;
}
interface VariableRef {
  type: "variable";
  name: string;
}
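Since the model stores cooked values and bare names, a serializer has to put the quoting, escaping, and sigils back. The sketch below assumes a serializer that always emits quoted literals, which sidesteps the question of when an unquoted form is permitted; `serializeLiteral` and `serializeVariable` are hypothetical names.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }

// Hypothetical serializer that always quotes. Backslash and "|" are escaped
// with a backslash; everything else is emitted as stored.
function serializeLiteral(literal: Literal): string {
  const escaped = literal.value.replace(/[\\|]/g, (ch) => "\\" + ch);
  return "|" + escaped + "|";
}

// The `$` sigil is not stored in the data model, so it is added back here.
function serializeVariable(ref: VariableRef): string {
  return "$" + ref.name;
}

// Both `42` and `|42|` in source parse to this same object, so the
// serialized form cannot reveal which spelling was originally used.
const fortyTwo: Literal = { type: "literal", value: "42" };
```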
A `FunctionRef` represents a function. The `name` does not include the `:` starting sigil.
`Options` is a key-value mapping containing options, and is used to represent the function and markup options.
interface FunctionRef {
  type: "function";
  name: string;
  options: Options;
}
type Options = Map<string, Literal | VariableRef>;
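For example, an annotation such as `:number minimumFractionDigits=2 style=$style` might be modeled as follows. The function and option names are illustrative only. The sketch also shows one possible way to prepare `Options` for JSON output, because `JSON.stringify` renders a `Map` as `{}`. The helper `optionsToRecord` is hypothetical, and the plain-object encoding it produces is an assumption, not necessarily the shape the JSON Schema prescribes.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
type Options = Map<string, Literal | VariableRef>;
interface FunctionRef { type: "function"; name: string; options: Options }

// Data model for `:number minimumFractionDigits=2 style=$style`.
const numberFn: FunctionRef = {
  type: "function",
  name: "number", // stored without the leading ":"
  options: new Map<string, Literal | VariableRef>([
    ["minimumFractionDigits", { type: "literal", value: "2" }],
    ["style", { type: "variable", name: "style" }],
  ]),
};

// Hypothetical helper: copy the Map entries into a plain object so that
// JSON.stringify() can see them.
function optionsToRecord(options: Options): Record<string, Literal | VariableRef> {
  const record: Record<string, Literal | VariableRef> = {};
  options.forEach((value, key) => {
    record[key] = value;
  });
  return record;
}

const optionsJson = JSON.stringify(optionsToRecord(numberFn.options));
```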
A `Markup` object has a `kind` of either `"open"`, `"standalone"`, or `"close"`, each corresponding to open, standalone, and close markup. The `name` in these does not include the starting sigils `#` and `/` or the ending sigil `/`. The `options` for markup use the same key-value mapping as `FunctionRef`.
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}
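The relationship between `kind` and the sigils can be sketched as a small serializer. This version handles only `kind` and `name`; options and attributes are deliberately omitted, and `MarkupTag` and `serializeMarkupTag` are hypothetical names.

```typescript
// Trimmed copy of `Markup`: options and attributes are omitted here.
interface MarkupTag {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string; // stored without "#" or "/" sigils
}

// Restore the sigils that the data model leaves out of `name`.
function serializeMarkupTag(markup: MarkupTag): string {
  if (markup.kind === "close") return "{/" + markup.name + "}";
  if (markup.kind === "standalone") return "{#" + markup.name + " /}";
  return "{#" + markup.name + "}";
}

const openTag = serializeMarkupTag({ type: "markup", kind: "open", name: "b" });
const standaloneTag = serializeMarkupTag({ type: "markup", kind: "standalone", name: "img" });
const closeTag = serializeMarkupTag({ type: "markup", kind: "close", name: "b" });
```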
`Attributes` is a key-value mapping used to represent the expression and markup attributes. Attributes with no value are represented by `true` here.
type Attributes = Map<string, Literal | true>;
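As a sketch of how the two value forms differ, the attributes `@translate=no @locked` might be stored and written back out as shown below. The attribute names are illustrative, `serializeAttributes` is a hypothetical helper, and always quoting the literal value is a simplifying assumption.

```typescript
interface Literal { type: "literal"; value: string }
type Attributes = Map<string, Literal | true>;

// `@translate=no` has a literal value; `@locked` has none, so it maps to true.
const attributes: Attributes = new Map<string, Literal | true>([
  ["translate", { type: "literal", value: "no" }],
  ["locked", true],
]);

// Hypothetical serializer: valueless attributes become `@key`, others become
// `@key=|value|` with backslash and "|" escaped inside the quotes.
function serializeAttributes(attrs: Attributes): string {
  const parts: string[] = [];
  attrs.forEach((value, key) => {
    if (value === true) {
      parts.push("@" + key);
    } else {
      const escaped = value.value.replace(/[\\|]/g, (ch) => "\\" + ch);
      parts.push("@" + key + "=|" + escaped + "|");
    }
  });
  return parts.join(" ");
}

const attributeText = serializeAttributes(attributes);
```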
Implementations MAY extend this data model with additional interfaces, as well as adding new fields to existing interfaces. When encountering an unfamiliar field, an implementation MUST ignore it. For example, an implementation could include a `span` field on all interfaces encoding the corresponding start and end positions in its source syntax.
In general, implementations MUST NOT extend the sets of values for any defined field or type when representing a valid message. However, when using this data model to represent an invalid message, an implementation MAY do so. This is intended to allow for the representation of "junk" or invalid content within messages.
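The following sketch illustrates both sides of the extension rule: a producer that adds a `span` field, and a consumer that reads only the fields it knows. The `Span` shape (numeric `start` and `end` offsets) and the names `SpannedVariableRef` and `readVariableRef` are assumptions made for this example.

```typescript
interface VariableRef { type: "variable"; name: string }

// Hypothetical extension: source offsets attached by one implementation.
interface Span { start: number; end: number }
interface SpannedVariableRef extends VariableRef { span: Span }

const produced: SpannedVariableRef = {
  type: "variable",
  name: "count",
  span: { start: 7, end: 13 },
};

// A consumer that copies only the fields it recognizes, so unfamiliar
// fields such as `span` are ignored rather than rejected.
function readVariableRef(input: Record<string, unknown>): VariableRef | undefined {
  const name = input["name"];
  if (input["type"] !== "variable" || typeof name !== "string") return undefined;
  return { type: "variable", name };
}

const consumed = readVariableRef({ ...produced, vendorNote: "not in the spec" });
```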