A Modular and Extensible MessageFormat 2.0 #190

eemeli · 2021-08-21T06:50:47Z

eemeli
Aug 21, 2021
Maintainer

tl;dr We should split up the spec, making variables, functions, terms and elements extensions of the core.

Recent work with and discussions about the MF2 spec have led me to realise that there's a somewhat obvious level at which the spec may be made extensible: Pattern elements. So far, our discussions have identified at least five possible types:

literal: immediately defined values
variable: values defined at runtime
function: placeholders or formatting functions
term: including messages within other messages
element: formatting and styling elements

Our discussions of the data model have thus far been based on the premise that some explicit monolithic set of such elements is the correct answer, but we don't really agree on which:

Elango/Mihai: literal, function
Eemeli/Zibi: literal, variable, function, term
Staś: literal, variable, function

Rather than trying to resolve these various schools of thought into one, we should agree to disagree, and build the spec so that each element is defined as an extension of the core spec. By building an example implementation of such a modular system (see src/pattern/ for implementations), I've determined that an appropriate interface for such a pattern element formatter might look like this:

interface PatternElementFormatter {
  type: string; // needs to match exactly with a data model element
  asFormattable(ctx: Context, part: PatternElement): Formattable; // for selectors and options
  formatAsPart(ctx: Context, part: PatternElement): FormattedPart; // for format-to-parts
  formatAsString(ctx: Context, part: PatternElement): string; // for format-to-string
  initContext?: (mf: MessageFormat, resId: string) => unknown; // allow access to current runtime & messages
}

The scope of that interface also indicates how these pattern elements are different from the formatting functions available via function: They require more complex setup, and have a more complex API. Essentially, these two layers answer rather different questions:

A formatting function (available via function) should be relatively simple to implement. For example, the functions required for MF1 and Fluent compatibility are all 1-6 lines of code each. They're allowed to throw errors, and return a single value.
A pattern element formatter provides multiple invocation methods and has deeper access to its runtime environment. Its methods must never throw; instead, they need to report errors by other means and provide decent fallback values in such cases. It may even define its own representation in source form or XLIFF.
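As a rough illustration of the two layers, here is a minimal sketch. The `Context` shape with its `onError` callback, the `FormattedPart` shape, the `upper` function and the `{?}` fallback string are all assumptions made for this example; only the method names follow the interface above.

```typescript
// Hypothetical, simplified shapes; the real Context and part types are undecided.
interface PatternElement { type: string; [key: string]: unknown }
interface FormattedPart { type: string; value: string }
interface Context { onError(error: unknown): void }

// Layer 1: a formatting function. A few lines, may throw, returns a single value.
const upper = (_locales: string[], _options: undefined, arg: unknown): string => {
  if (typeof arg !== 'string') throw new TypeError('upper() expects a string');
  return arg.toUpperCase();
};

// Layer 2: a pattern element formatter. It never throws; it reports and falls back.
const literalFormatter = {
  type: 'literal',
  formatAsString(ctx: Context, part: PatternElement): string {
    const value = part['value'];
    if (typeof value === 'string') return value;
    ctx.onError(new Error('Malformed literal element'));
    return '{?}'; // fallback representation, chosen arbitrarily here
  },
  formatAsPart(ctx: Context, part: PatternElement): FormattedPart {
    return { type: 'literal', value: this.formatAsString(ctx, part) };
  }
};
```

The point of the split is that `upper` can stay trivial because something like `literalFormatter` (or its `function` counterpart) is responsible for catching failures and producing a fallback.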

For example, term is somewhat similar to function in that it enables relatively simple ways to define a value that may be embedded within a message, but leaves its definition controlled by the localiser or translator rather than a programmer.

Adopting a modular approach as presented here should make it easier to focus on and agree to individual parts of the data model (as we recently did on the interface of function), as well as providing a way for external interfaces to communicate requirements and expectations about what they support. For instance, MessageFormat 1 compatibility requires literal, variable, function, while Fluent also needs term. Similarly, a translation provider could e.g. claim complete function configurability, while needing to process element as pass-through values.

stasm · 2021-08-21T08:09:13Z

stasm
Aug 21, 2021
Maintainer

I think there's a difference between the data model extensibility and the runtime model extensibility. Looking at your implementation, I think you're proposing both?

export interface Function extends PatternElement {
  type: 'function';
  func: string;
  args: (Literal | Variable)[];
  options?: Record<string, Literal | Variable>;
}

export const formatter: PatternFormatter = {
  type: 'function',
  formatAsPart: formatFunctionAsPart,
  formatAsString: formatFunctionAsString,
  formatAsValue: formatFunctionAsValue,
  initContext: mf => mf.runtime
};

I agree about the runtime extensibility. I'm cautious about the data model one. My hope was that the data model can be stable to increase its portability. Ideally, messages should be understandable by all tools in the pipeline. If the user introduces a custom extension of the data model (e.g. adds a new PatternElement subclass), all tooling involved in the localization pipeline will need to support this extension too. My current thinking is that this is too much to ask for.

2 replies

stasm Aug 21, 2021
Maintainer

In the third data model proposal, I tried to provide extension points on the runtime via the following RuntimeValue interface (thanks to @mihnita for suggesting adding match):

export abstract class RuntimeValue<T> {
	public value: T;
	abstract formatToString(ctx: FormattingContext): string;
	abstract formatToParts(ctx: FormattingContext): IterableIterator<FormattedPart | OpaquePart>;
	abstract match(ctx: FormattingContext, key: VariantKey): boolean;
}

This way users can define custom runtime values, with custom formatting and selection logic. At the same time the underlying data model doesn't change at all.

eemeli Aug 21, 2021
Maintainer Author

Handling errors and fallbacks

I would actually argue that from a tooling point of view, a model with types+functions can provide at least an equivalent if not better experience than one with functions only. To start, let's consider a hypothetical element, which could be implemented in one of two ways (simplifying literals to strings):

{ type: 'element', name: 'a', tag: 'start', attr: { href: 'http://example.com/' } }
{ type: 'function', func: 'element', args: ['a', 'start'], attr: { href: 'http://example.com/' } }

If the tooling does not know what to do with this element, it'll need to in both cases either fail or use some sort of a fallback value when representing this part of the message. Next, consider also a missing function, for example:

{ type: 'function', func: 'list', args: ['foo', 'bar', 'baz'], attr: { type: 'disjunction' } }

This would be rendered the same in either sort of system, but with one key difference: With a types+functions model we can know that this would be performing some operation on just the args and attr values. With a functions-only model, we can't know that this isn't doing something like term and looking up an entire message that ought to be included here, or indicating the start of some formatting element.

I would argue that the need to use a fallback should be considered a different level of failure in the two above examples, and that a functions-only model can't differentiate between them. Also, in neither case does the data model actually change when it starts supporting an element; it's just packaged differently.
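To make the "different level of failure" concrete, here is a small sketch of how a tool might classify parts it cannot handle. The lists of known types and functions and the `Support` labels are hypothetical; the part shapes follow the examples above.

```typescript
// Hypothetical tool-side view of a pattern element.
type Part = { type: string; func?: string; name?: string };

// Element types and function names this particular tool understands (assumed).
const knownTypes = ['literal', 'variable', 'function'];
const knownFunctions = ['number', 'date'];

type Support = 'supported' | 'unknown-function' | 'unknown-element-type';

function classify(part: Part): Support {
  if (knownTypes.indexOf(part.type) < 0) return 'unknown-element-type';
  if (part.type === 'function' && knownFunctions.indexOf(part.func ?? '') < 0) {
    return 'unknown-function';
  }
  return 'supported';
}
```

With the types+functions encoding, `{ type: 'element', ... }` and `{ type: 'function', func: 'list', ... }` land in different categories. With the functions-only encoding, `func: 'element'` and `func: 'list'` are both just `'unknown-function'`.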

Formatting function signatures

Another way of putting this is that we're talking about the signature of a function call, which I'm arguing should be as simple and limited as possible, whereas you'd like it to have greater access:

stas/third:

message-format-wg/experiments/stasm/third/impl/registry.ts

Lines 5 to 9 in 2879325

    
export type RegistryFunc<T> = (
	ctx: FormattingContext,
	args: Array<Argument>,
	opts: Record<string, Parameter>
) => RuntimeValue<T>;

ts_eemeli (slightly updated here, but still pretty much the same):

message-format-wg/experiments/data_model/ts_eemeli/data-model.d.ts

Lines 181 to 185 in 2879325

    
type RuntimeFunction<R> = (
  locales: string[],
  options: FunctionOptions | undefined,
  ...args: any[]
) => R

The key difference here is that with types+functions we don't need to give a formatting function unrestricted access to the runtime context, which functions-only effectively requires in order to make it possible for e.g. term to access the current message resource. This makes reasoning about the function's behaviour much easier, and provides a significantly smaller attack surface for exploiting any vulnerabilities in custom formatter code.
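A minimal sketch of what the restricted signature buys, assuming a plain object as the function registry. The `join` function, the `callFunction` helper and the `{name}` fallback format are illustrative only and not part of either proposal.

```typescript
type FunctionOptions = Record<string, unknown>;
type RuntimeFunction<R> = (
  locales: string[],
  options: FunctionOptions | undefined,
  ...args: any[]
) => R;

// A hypothetical function registry; `join` is not a proposed built-in.
const runtime: Record<string, RuntimeFunction<string>> = {
  join: (_locales, options, ...args) => {
    const sep = options && options['separator'];
    return args.map(String).join(typeof sep === 'string' ? sep : ', ');
  }
};

// The caller (the `function` pattern element) decides what the function sees:
// locales, resolved options and resolved arguments, and nothing else.
function callFunction(
  name: string,
  locales: string[],
  options: FunctionOptions | undefined,
  args: unknown[]
): string {
  const fn = runtime[name];
  if (!fn) return `{${name}}`; // fallback for a missing function (format assumed)
  try {
    return fn(locales, options, ...args);
  } catch {
    return `{${name}}`; // formatting functions may throw; the caller never does
  }
}
```

Because `join` never receives the message resource or the full runtime context, a reviewer can reason about it from its arguments alone.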

Accessing separate parts of the context

With my current implementation, each of its types is distinguishable based on the specific part of the context that it accesses:

literal requires no context
variable gets the runtime scope, i.e. the parameters that the formatting function was called with
function uses the function registry
term builds a getter function for other messages

Eventually, element would probably get access to a yet-separate set of source components for it to overlay its contents; that need is in fact a decent argument for implementing it separately from the generic function.

stasm · 2021-08-21T08:25:11Z

stasm
Aug 21, 2021
Maintainer

Our discussions of the data model have thus far been based on the premise that some explicit monolithic set of such elements is the correct answer, but we don't really agree on which:

Elango/Mihai: literal, function

Eemeli/Zibi: literal, variable, function, term

Staś: literal, variable, function

This is a good summary. It's interesting to consider which elements can be expressed through other, more atomic elements. For example, I replaced term (a message reference) by the built-in PHRASE function which at runtime returns a special PatternValue instance, whose interface is almost identical to the PatternFormatter you're proposing above.

In fact, my proposal started out with even variables being represented by a built-in function: VAR("count"). However, I backed out of this idea and instead decided to add a separate data node, VariableReference, in order to allow some composability (PLURAL($count)), and at the same time forbid other sorts of composability (PLURAL(VAR("count"))).

This to me is a good example of how we can enforce rules and form expectations about the data model by introducing separate data types. I wouldn't want us to shy away from using this design tool.
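A small sketch of how a separate data type can enforce that rule. The node shapes and the `isValidCall` helper are hypothetical and only loosely follow the description above.

```typescript
// Hypothetical node shapes.
interface StringLiteral { type: 'literal'; value: string }
interface VariableReference { type: 'variable'; name: string }
interface FunctionCall {
  type: 'function';
  name: string;
  // Arguments may be literals or variable references, never other function calls.
  args: Array<StringLiteral | VariableReference>;
}

// PLURAL($count): allowed by the types.
const pluralOfCount: FunctionCall = {
  type: 'function',
  name: 'PLURAL',
  args: [{ type: 'variable', name: 'count' }]
};

// PLURAL(VAR("count")) would need a FunctionCall inside `args`, which the
// compiler rejects. The same rule can be checked on untyped input:
function isValidCall(node: any): node is FunctionCall {
  return (
    !!node &&
    node.type === 'function' &&
    typeof node.name === 'string' &&
    Array.isArray(node.args) &&
    node.args.every((a: any) => !!a && (a.type === 'literal' || a.type === 'variable'))
  );
}
```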

0 replies

mihnita · 2021-08-23T00:37:08Z

mihnita
Aug 23, 2021
Maintainer

In my mind (and in the EM model) the placeholder (which invokes a function) also handles "variables" and "terms" (which are actually message references), and "elements"

Variables

A variable without a formatting function does not make much sense (if the variable is a date then it needs formatting).

Elements

The "elements" used for formatting also need functions.

For example ...{element, id=1, kind=open, function:render, color:red}...{/element, id=1, kind=close, function:render}...

When you "render" this message in a GUI environment (let's say a html widget) the "render" function would generate <span style="color"> and </span>, respectively (...<span style="color">...</span>...)
When you do that in a console the "render" function will generate ansi escape sequences \e[91m and \e[37m (...\e[91m...\e[m...)
And when the console app has the output redirected to a file the elements are rendered to nothing.

That way you can reuse the message in various environments and platforms.
(For example, Android can use a Spannable with a ForegroundColorSpan for the above message.)

So elements also need a function.
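A minimal sketch of such a "render" function, assuming three output targets. The `Target` names, the colour table and the exact markup are illustrative; a real implementation would also need to validate the colour value before writing it into HTML.

```typescript
// Hypothetical targets and options.
type Target = 'html' | 'ansi' | 'plain';
type ElementKind = 'open' | 'close';

// Only one colour is mapped here; the table is an assumption for the example.
const ansiColors: Record<string, string> = { red: '\u001b[91m' };

function render(target: Target, kind: ElementKind, options: { color?: string }): string {
  const color = options.color;
  switch (target) {
    case 'html':
      if (kind === 'close') return '</span>';
      return color ? `<span style="color:${color}">` : '<span>';
    case 'ansi':
      if (kind === 'close') return '\u001b[0m'; // reset all attributes
      return (color && ansiColors[color]) || '';
    default:
      return ''; // e.g. console output redirected to a file
  }
}
```

The message itself stays the same in all three cases; only the function bound to the element differs per environment.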

Message refs (terms)

And you need a function to load the referred messages from various places (resource bundles, database, etc.)

That's why variable / term / element are all bundled under "placeholder" in the EM model.

2 replies

grhoten Aug 23, 2021
Maintainer

I really really hope that we aren't proposing any interpretation nor direct handling of HTML, XML, JSON, CSS, SSML or something else like that. I'm really hoping to focus on what has been proposed as the data model in this group. From my experience, such mappings can be fragile to maintain. I'd rather focus on passing that formatting/decoration information through unmodified, because I consider that a solved problem. I'm hoping that we can focus more on getting words into grammatical agreement.

Though I will admit that our implementation formats 2 representations at once. One is for the printed form, and the other is the spoken form. We're basically creating 2 strings in 1 shot. For example, I may want to print "1 mujer" in Spanish, but pronounce it as "una mujer". Both spans are the same by default.

eemeli Aug 24, 2021
Maintainer Author

@grhoten Absolutely agreed on not defining specific interfaces for HTML, its DOM, SSML, React, or anything else. However, I do think that there's value in considering the needs that such interfaces present, and how they might be best accommodated in the interfaces that we do define, so that a real-world implementation wouldn't be forced to e.g. output a message as stringified HTML or SSML and then immediately re-parse that with a different parser in order to actually use it.

This is particularly relevant in a live context such as React, which may use a complex non-stringifiable object as a value that's to be included in the middle of its output. Similarly, Fluent does really rather interesting things with its DOM overlays, which allow for an existing HTML DOM element to be used as a template or enriched by data coming from a message.

Hence e.g. the proposed element type, and indeed the desire that I share with you that we don't get stuck with the discussions on its details continuing to block progress on the core data model. If we could agree that the data model will support at least some set X of possible pattern element types, we could then agree separately on what that data model should look like, and what each of those pattern element types look like. That's what I mean by splitting up the spec.

dminor · 2021-08-25T13:21:31Z

dminor
Aug 25, 2021

I'm not sure I fully understand what @eemeli is proposing here, if this is meant to be a mental model for how we work, or something reflected in the published spec.

I agree with @stasm that looking at the proposals in terms of elements is a nice and tidy way of summarizing the models.

As a mental model, thinking of elements as extensions, developing them independently, and then considering them for inclusion in the data model seems like a nice way of breaking our impasse and moving forward. If we're all in agreement that literal and function are required, start there, and then let people who support variable develop that as an "extension", and make an argument for its inclusion in the spec.

I think it would be a failure of the group if we publish a spec where almost everything in the data model is an extension. But as a conceptual model for how we could work, it seems like it would let us first focus on areas of agreement, and then have a process for how to build on those areas of agreement.

0 replies

mihnita · 2021-09-15T19:37:50Z

mihnita
Sep 15, 2021
Maintainer

publish a spec where almost everything in the data model is an extension

I actually favor this approach, and here is why:

the standard would be the spec itself + a registry of official extensions
the registry would be "seeded" with what we already need for backward compatibility with ICU / Fluent: plural / ordinal selectors, date / time / number formatting, etc.

In a way that makes sure (and proves) that all functions are "first class citizens"
So something added to the registry 2 years from now is no more special than the one that comes "out of the box".
Think for example of Eclipse or Maven: everything is a plugin, even what some think of as "core".

0 replies

mihnita · 2021-09-15T19:42:01Z

mihnita
Sep 15, 2021
Maintainer

I find this idea weird, and if someone came to me with a code PR like that for review, I would reject it.

export abstract class RuntimeValue<T> {
	public value: T;
	abstract formatToString(ctx: FormattingContext): string;
	abstract formatToParts(ctx: FormattingContext): IterableIterator<FormattedPart | OpaquePart>;
	abstract match(ctx: FormattingContext, key: VariantKey): boolean;
}

There is no "function" that we know of that can be used both for formatting (formatToParts / formatToString) and for selection (match).

They should really be 2 different interfaces (traits).

1 reply

eemeli Sep 18, 2021
Maintainer Author

I think there might be some misunderstanding here. @stasm's RuntimeValue is a wrapper around a value that has a generic type T. It provides three different methods with concrete types for when that value needs to be formatted to a string, formatted to parts, or compared against a select key.

Why should this kind of encapsulation automatically be rejected in code review? For example, an implementation of this interface around a number could include its formatting options, and be able to use those both when formatting as well as when determining a match against a CLDR plural category.
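A sketch of such a number wrapper, under several simplifying assumptions: `formatToParts` returns an array rather than an iterator, the variant key is a plain string, formatting uses `toFixed` as a stand-in for locale-aware formatting, and the plural rule is an English-only approximation of the CLDR rule.

```typescript
// Hypothetical, simplified context and part types.
interface FormattingContext { locale: string }
interface FormattedPart { type: string; value: string }

abstract class RuntimeValue<T> {
  constructor(public value: T) {}
  abstract formatToString(ctx: FormattingContext): string;
  abstract formatToParts(ctx: FormattingContext): FormattedPart[];
  abstract match(ctx: FormattingContext, key: string): boolean;
}

class NumberValue extends RuntimeValue<number> {
  constructor(value: number, private fractionDigits = 0) {
    super(value);
  }
  formatToString(_ctx: FormattingContext): string {
    return this.value.toFixed(this.fractionDigits); // stand-in for real formatting
  }
  formatToParts(ctx: FormattingContext): FormattedPart[] {
    return [{ type: 'number', value: this.formatToString(ctx) }];
  }
  match(ctx: FormattingContext, key: string): boolean {
    if (key === String(this.value)) return true; // exact numeric key
    return key === this.pluralCategory(ctx);
  }
  // English-only simplification: "one" needs the integer 1 with no visible
  // fraction digits, so a value formatted as "1.0" selects "other".
  private pluralCategory(_ctx: FormattingContext): string {
    return this.value === 1 && this.fractionDigits === 0 ? 'one' : 'other';
  }
}
```

The same `fractionDigits` option drives both the formatted output and the selection result, which is the kind of sharing the wrapper makes possible.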

mihnita · 2021-09-15T19:49:09Z

mihnita
Sep 15, 2021
Maintainer

To understand (and properly decide) what the functions look like we really need to decide what the runtime looks like, and how the rendering happens.

There are a few relevant sections in this "whiteboard doc":
https://docs.google.com/document/d/1w6fxTfh4xeaqlKZ_QZK1sDpvTv1TqtBeh897aOT8qyk/edit?usp=sharing&resourcekey=0-jRKq3bi_UzcnK-5gA8yjsw
(it is shared with "read access" for the whole world, please ask for comment / edit rights and I will approve it)

Looks like about half of the "friction points" are around functions:
3. Ordered params / arguments vs named
5. Selection functions != formatting functions
7. Variable references
9. Various “states” in which the model can be
10. Function signatures

And I think that a lot of these are caused a bit less by "raw disagreements", but more by under-defined terms.
Things are not clearly explained, and various parties understand them based on their previous experiences.

0 replies

mihnita · 2021-09-15T20:03:58Z

mihnita
Sep 15, 2021
Maintainer

under-defined terms. ... Things are not clearly explained

TLDR: I think that until we define things the discussion is premature.
We will either spend time arguing on differences that are not real differences (just different "lingo"), or we will spend that time defining things.
Neither of these two options looks like something worth doing in a full WG meeting.

0 replies

mihnita · 2021-09-17T23:09:35Z

mihnita
Sep 17, 2021
Maintainer

I am kind of reluctant to duplicate a lot of the content that is already in the "whiteboard doc"
Especially since (at least for me) discussing on github feels a lot clunkier than on a GoogleDoc.

I think that there are 3 areas that need clarification in order to decide what functions look like:

1. What they look like in "the registry"

More precisely, the "registry schema" or the file that will accompany the MF2 standard.
So not the runtime "map" (or whatever other structure) where an implementer "registers" a name and a class / function pointer / lambda.

Think of it as a header file in C / C++

At this level the info would look something like this:

function
    name: DATE
    possible inputs (name : type):
        * timeStamp: long
        * date : Any // ??? to discuss the type. But should be very open.
        * instant: Any // ??? to discuss the type. But should be very open.
        * calendar: Any // ??? to discuss the type. But should be very open.
    options: map<String, ???> // a reduced number of types. Maybe string / number / boolean is probably enough

function
    name: INTERVAL
    possible inputs (name : type):
        * startTimeStamp: long
        * endTimeStamp: long
        * startDate : Any // Same as above. And will not repeat for Calendar, Instant
        * endDate : Any
        * interval<Any> // a class that has a start and an end member
        * pair<Any>
        * array<Any> and size 2
    options: map<String, ???> // a reduced number of types. Maybe string / number / boolean is probably enough

The options part is less relevant here.
There is still a need to decide what types we accept there (probably string, number, boolean are enough?)

The possible inputs are unclear:
both the types and the names of the possible arguments.
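One possible machine-readable shape for such a "header file" is sketched below. The field names, the restriction of option types to string, number and boolean, and the option names `year`, `month` and `day` are assumptions for illustration, and only a subset of the inputs listed above is included.

```typescript
// Hypothetical schema for registry entries.
type OptionType = 'string' | 'number' | 'boolean';

interface FunctionSignature {
  name: string;
  // Named inputs with deliberately open type labels, as in the listing above.
  inputs: Array<{ name: string; type: string }>;
  options: Record<string, OptionType>;
}

const registrySchema: FunctionSignature[] = [
  {
    name: 'DATE',
    inputs: [
      { name: 'timeStamp', type: 'long' },
      { name: 'date', type: 'any' }
    ],
    options: { year: 'string', month: 'string', day: 'string' }
  },
  {
    name: 'INTERVAL',
    inputs: [
      { name: 'startTimeStamp', type: 'long' },
      { name: 'endTimeStamp', type: 'long' }
    ],
    options: { year: 'string', month: 'string', day: 'string' }
  }
];

// A linter can check option names against the schema without any
// implementation of DATE being present.
function unknownOptions(fn: string, used: string[]): string[] {
  const sig = registrySchema.filter(s => s.name === fn)[0];
  if (!sig) return used;
  return used.filter(name => !Object.prototype.hasOwnProperty.call(sig.options, name));
}
```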

2. What does the model look like when parsed

This might be different than the model ready to use when rendering.
It is the model used for lint, refactoring, IDE plugins, export / import to XLIFF, tooling in general.

There is no need to have real implementations for the various functions.

Similar to what an editor needs to know about HTML.
No need to have all the images referred present, JS files, etc.

So:

msg = MessageFormat2.parse(...) // parsing some serialized form

At this level what info needs to be present?

With the examples above probably something like this:

"Your card issued on {DATE(issue_date, options:{year:numeric, month:abbrev})}, must be paid by {DATE(due_date, options:{year:numeric, month:full, day:numeric})}, or you will be charged late fees."

Rendered as "Your card issued on Feb 2019, must be paid by September 30, 2021, or you will be charged late fees."

And

"The XYZ Conference moved from {INTERVAL, oldInterval, options} to {INTERVAL, newInterval, options}"

Rendered: "The XYZ Conference moved from Oct 2-5 to Oct 9-12"

3. What the model looks like at runtime, and how the rendering is done

At this point we need real implementations for the various functions, and we also need to "bring in" somehow the various inputs.

msg = MessageFormat2.parse(...) // parsing some serialized form
msg.format(parameters)

What is the type of parameters?
What are the types of the various runtime values that we want to format?
And how do we support more than one placeholder-function (see the examples above).

To render the first message we would need to somehow connect:
issue_date.timeStamp : 1231231231 // a long, epoch time
due_date.date with a Java Date (in Java), JS Date (in JS), some custom DateTime from some Rust crate
and for the second message:
oldInterval.startTime : 1231231231
oldInterval.endTime : 1231239919
newInterval: new Interval(d1, d2)

At this point we need function implementations, and real types.
But the types can't be captured in the standard, or the data-model.
They are implementation details, might be platform dependent.

We want to be able to use the same serialized message, parse, then render, on Win, MacOS, gettext, Android, browser, etc.
The only part that is platform dependent is the rendering part, where we bring the runtime parameters together and we map them to the names we have in the model.

TLDR: looking at the examples above, and following the full "lifecycle", I think it is pretty clear that:

A. The data-model should stop at "parse level" (opinion)

B. We need a description of the rendering (runtime?) behavior

C. The model can't specify the types of the inputs for the functions.

Something like "Date" is platform / framework specific.

But I think that the good news is: we don't need to (opinion)

D. Would be "unclean" for the model to force on all platforms that "functions" must be class-like at rendering time

I should be free to use function pointers / references, lambdas, something else, whatever. It's an implementation detail.

We only need this at rendering time.
And all we need at that time is a way to map from a string (the function name) to something that can be called (invoked).

As an implementer I should be free to implement a runtime registry as one single map from string to classes with format, formatToParts, match members, or as one class with 3 maps, each map for one kind of function.

It's an implementation detail, not a data model issue.

E. A simple [] would not be enough for the arguments, we need something with names

1 reply

eemeli Sep 18, 2021
Maintainer Author

What they look like in "the registry"

[...] Think of it as a header file in C / C++

Yes, this will need to be decided. I would think that it'd be easiest to express this in the spec as the shape of the arguments with which each formatting function will be called, along with the type of its return value.

You've argued previously that the data model should only include the literal and function pattern elements, and that all of the other functions could be implemented via function. Is this still so?

The examples DATE and INTERVAL you give are relatively simple, as they only operate on their direct input values. Still, they'll need access to at least the current locale when they're called. How about message references? What would such a function need to have access to in order to work?

What does the model look like when parsed

This might be different than the model ready to use when rendering.
It is the model used for lint, refactoring, IDE plugins, export / import to XLIFF, tooling in general.

I don't think I understand this. Why or how would the model be "parsed"? I at least have understood the data model to be a description of a structure that is already parsed. When considering various use cases, it's of course likely that not all of them will make use of all parts of the data model, but do we really need to be explicit about which parts are and are not used for various tasks?

What the model looks like at runtime, and how the rendering is done

At this point we need real implementations for the various functions, and we also need to "bring in" somehow the various inputs.
msg = MessageFormat2.parse(...) // parsing some serialized form
msg.format(parameters)
What is the type of parameters?

I think it should be a Record<string, unknown>.

What are the types of the various runtime values that we want to format?

I'm pretty sure that we should not generally limit the types of runtime values.

And how do we support more than one placeholder-function (see the examples above).

Is this a question about how pattern elements in a message may refer to more than one runtime value, or something else? I'm not sure that I understand the concern here.

TLDR: looking at the examples above, and following the full "lifecycle", I think it is pretty clear that:

A. The data-model should stop at "parse level" (opinion)

As in, the data model should only contain stringifiable data?

B. We need a description of the rendering (runtime?) behavior

Yes, for both string and parts outputs.

C. The model can't specify the types of the inputs for the functions.

I agree that we should not limit what types of values might be passed in as formatting function arguments at runtime and passed to formatting functions that use them. However, we do need to clarify at least how the formatting function is told what the current locale is, and we need to have a solid story for how message references can work.

Furthermore, we can (and probably should) wrap all formatting function arguments in something like RuntimeValue or Formattable. Consider for instance your DATE and INTERVAL, and how they might handle format-to-parts output. If their input values are wrapped up as implementation-specific FormattableDate objects which encapsulate the code for formatting a date to a list of parts, implementing those functions becomes significantly simpler and duplicates far less code.
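A sketch of that idea follows. `FormattableDate`, its `toParts` method and the fixed year-month-day layout are hypothetical; the layout stands in for real locale-aware date formatting.

```typescript
interface FormattedPart { type: string; value: string }

// Hypothetical wrapper that owns the date-to-parts logic.
class FormattableDate {
  constructor(private date: Date) {}
  toParts(): FormattedPart[] {
    const iso = this.date.toISOString(); // e.g. "2021-09-30T00:00:00.000Z"
    return [
      { type: 'year', value: iso.slice(0, 4) },
      { type: 'literal', value: '-' },
      { type: 'month', value: iso.slice(5, 7) },
      { type: 'literal', value: '-' },
      { type: 'day', value: iso.slice(8, 10) }
    ];
  }
}

// DATE simply delegates to the wrapper.
const DATE = (d: FormattableDate): FormattedPart[] => d.toParts();

// INTERVAL reuses the same wrapper for both ends instead of duplicating date logic.
const INTERVAL = (start: FormattableDate, end: FormattableDate): FormattedPart[] =>
  start.toParts().concat([{ type: 'literal', value: ' to ' }], end.toParts());

const partsToString = (parts: FormattedPart[]): string =>
  parts.map(p => p.value).join('');
```

Both functions end up as one-liners because the part-producing code lives in the wrapped value rather than in each function.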

D. Would be "unclean" for the model to force on all platforms that "functions" must be class-like at rendering time

I should be free to use function pointers / references, lambdas, something else, whatever. It's an implementation detail.

Agreed. But we should still be able to use object-oriented concepts in the spec when it makes sense to do so. Because we're really talking about implementation internals, and those can of course be implemented in a number of different ways.

E. A simple [] would not be enough for the arguments, we need something with names

Why? I don't see how this follows from your previous arguments.

mihnita · 2021-09-18T01:52:45Z

mihnita
Sep 18, 2021
Maintainer

I am at this point not pushing for the EM model, or against any model.

But I want to explain how the EM model answered the problems described above:

1. The "schema" / registry

All functions only take one parameter, of type Any

There is no need for timeStamp / instant / date / calendar named inputs.

That is resolved in the implementation, by looking at the runtime-type info (RTTI)
If the programming language used for implementation does not have RTTI then a poor-man's version is to represent the Any as a struct with a type and value.
In raw C that would be something like struct Any { char* type; void* value; };
ICU4C has Formattable
I think the equivalent in Rust is the std::any::{Any, TypeId}
And in his proposal Stas used this pattern: interface Foo { type, value }
But I think that RTTI vs type + value is an implementation detail, and does not need to be visible in the data model spec.
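Both variants can be sketched briefly. The tag names in `AnyValue` and the helper names are assumptions; the first helper resolves a tagged value, the second relies on the language's own runtime type checks.

```typescript
// Variant 1: a tagged value, standing in for RTTI.
type AnyValue =
  | { type: 'timestamp'; value: number } // epoch milliseconds
  | { type: 'date'; value: Date };

// DATE takes one un-named input and resolves it by looking at the tag.
function toDate(input: AnyValue): Date {
  return input.type === 'timestamp' ? new Date(input.value) : input.value;
}

// Variant 2: the language's own runtime type information.
function toDateRtti(input: unknown): Date | undefined {
  if (input instanceof Date) return input;
  if (typeof input === 'number') return new Date(input);
  return undefined; // unsupported input; the caller decides on a fallback
}
```

Either way, the choice stays inside the implementation of the function and never shows up in the serialized message.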

So in the EM model it is not possible to pass startTime, endTime or anything like that as input.
You are forced to wrap them in "one thing" (class, array, pair, whatever the dev chooses / the prog language offers)

As such the inputs don't need names, it's only one parameter:

function
    name: DATE
    un-named input: Any
    options: map<String, ???> // a reduced number of types. Maybe string / number / boolean is probably enough

function
    name: INTERVAL
    un-named input: Any
    options: map<String, ???> // a reduced number of types. Maybe string / number / boolean is probably enough

Since all functions take one input and one input only, there is no real need to list it in the registry
(the same way C++ or Java don't explicitly list the first parameter of a member function, which is always this)

2. Parse time

The examples look 100% like described before:

"Your card issued on {DATE(issue_date, options:{year:numeric, month:abbrev})}, must be paid by {DATE(due_date, options:{year:numeric, month:full, day:numeric})}, or you will be charged late fees."
"The XYZ Conference moved from {INTERVAL, oldInterval, options} to {INTERVAL, newInterval, options}"

The issue_date, due_date, oldInterval, newInterval are the ids of the placeholders, nothing to do with the function argument (which is unnamed)
But they are used to differentiate between the 2 placeholders otherwise identical in the same message.
As such they are needed during translation (so that one can change the order of two parameters, if needed).
(... on {due_date} your card issued on {issue_date} will expire ...)
The way most string interpolation works (where "toString" or similar are "dumb" functions that take one parameter)

3. Rendering time

The parameters passed to msg.format(params) is a map from String (the name) to Any
The name of the placeholder (previous step) is also used to get the matching parameter, by name.

So the implementation is something like this (pseudo-code):

format(map<string, any> parameters)
    output = empty
    for each part in message:
        if part is plainText
            append part to output as is
        else  // placeholder, there is no other type
            something_to_call = use placeholder.function_name to get the right function from the runtime registry
            argument = use placeholder.name to get the right one from parameters
            result = invoke something_to_call with argument passed as (unnamed) input, and with placeholder.options (to know how to format it)
            append result to output
    return output

Output is string for format, some string + ranges info for formatToParts
something_to_call is not necessarily a function, it is anything that can be called.
Conceptually a function (that takes one parameter and returns another).
But can be lambda, class, etc (implementation detail)
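The loop above could look roughly like this in TypeScript. The `Part` shape, the plain-object registry and the `{name}` fallback for an unregistered function are assumptions made for the sketch; only format-to-string is shown.

```typescript
// Hypothetical shapes for a parsed message in the placeholder-only model.
type Part =
  | { kind: 'text'; value: string }
  | { kind: 'placeholder'; name: string; functionName: string; options: Record<string, string> };

type Formatter = (input: unknown, options: Record<string, string>) => string;

function format(
  message: Part[],
  parameters: Record<string, unknown>,
  registry: Record<string, Formatter>
): string {
  let output = '';
  for (const part of message) {
    if (part.kind === 'text') {
      output += part.value;
      continue;
    }
    const toCall = registry[part.functionName]; // looked up by function name
    const argument = parameters[part.name];     // looked up by placeholder name
    // Fallback format for a missing function is assumed here.
    output += toCall ? toCall(argument, part.options) : `{${part.name}}`;
  }
  return output;
}
```

Note the two separate lookups: the function name selects the callable, while the placeholder name selects the runtime parameter.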

The selector functions are called completely differently, yet another reason to not "dump" them together with the formatter ones.

The "map function name to real callable" part is solved by the "runtime registry"

In terms of interfaces:
format functions: take an Any input, return a string
formatToPart functions: take an Any input, return a something_with_parts_info
selector (or match) functions: take an Any input, return an integer (a score)

So a developer would register functions like this: function_registry[function_name, function_type] = function_implementation
With the equivalent way to get it: function_implementation = function_registry[function_name, function_type]

Real implementation:

   function_registry["DATE", TYPE_FORMAT] = function pointer / class ref / lambda / anything that can be called
   function_registry["DATE", TYPE_FORMAT2PARTS] = function pointer / class ref / lambda / anything that can be called
   function_registry["PLURAL", TYPE_SELECT] = function pointer / class ref / lambda / anything that can be called
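
One possible way to type such a registry in TypeScript, as a sketch only. FunctionRegistry, FunctionTypes and FormattedPart are hypothetical names, the key format is an arbitrary choice, and the registered implementations are stand-ins rather than real date or plural logic.

```typescript
// Hypothetical shapes for the three kinds of callables described above.
interface FormattedPart {
  type: string;
  value: string;
}

interface FunctionTypes {
  format: (input: unknown) => string;
  formatToParts: (input: unknown) => FormattedPart[];
  select: (input: unknown) => number; // returns a score
}

class FunctionRegistry {
  private entries = new Map<string, unknown>();

  register<K extends keyof FunctionTypes>(
    name: string,
    type: K,
    fn: FunctionTypes[K]
  ): void {
    this.entries.set(`${name}:${type}`, fn);
  }

  get<K extends keyof FunctionTypes>(
    name: string,
    type: K
  ): FunctionTypes[K] | undefined {
    // The cast is safe as long as entries are only written via register().
    return this.entries.get(`${name}:${type}`) as FunctionTypes[K] | undefined;
  }
}

const functionRegistry = new FunctionRegistry();
functionRegistry.register('DATE', 'format', (input) => String(input));
functionRegistry.register('DATE', 'formatToParts', (input) => [
  { type: 'date', value: String(input) }
]);
functionRegistry.register('PLURAL', 'select', (input) => (input === 1 ? 1 : 0));

const dateFormat = functionRegistry.get('DATE', 'format');
const pluralSelect = functionRegistry.get('PLURAL', 'select');
```

Keying on both name and type keeps the three kinds of functions independent, so a name can have a selector without a formatter, or the other way round.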

Or (yes) one can wrap all 3 kinds of functions in one single object.
But it is an implementation detail, not visible in the model.

I personally don't think I like the idea, and I would not do it.
Imagine interface { openFile(filename); closeFile(); dateToString(date); checkIfNull(object); }
The methods have nothing to do with each other other than the fact that they "can be called"

But as long as this is not visible in the data-model any implementer can do whatever they want.

And in the EM model this is not visible.

mihnita · 2021-09-19T02:08:15Z

mihnita
Sep 19, 2021
Maintainer

First, let me clarify what I'm trying here: I am trying to understand what the proposal is.
And asking questions about areas that seem to be under-specified (or over-specified, but not so much).

I am not arguing (yet?) against or pro some features.

You've argued previously that the data model should only include the literal and function pattern elements, and that all of the other functions could be implemented via function. Is this still so?

Yes, I think it would be enough. But I am open to listen.

The examples DATE and INTERVAL you give are relatively simple, as they only operate on their direct input values. Still, they'll need access to at least the current locale when they're called. How about message references? What would such a function need to have access to in order to work?

I would expect that all functions will need to have some kind of "context".
But that can't really be part of the spec.

Basically "the implementation captures in a context everything that is needed for the implementation to work"

I think that it is not really possible to define what the context contains.

Or the function would have access to the parent message, which has a locale. No need for that to be in the context.

For example to load strings using the framework that uses MF2 one would need access to native functionality. Stuff like LoadString on Windows, or a Resources.getString on Android, or NSLocalizedString on MacOS.
Some systems might decide that there is no need for an explicit locale: getString (or the equivalent) would return the string for the proper locale. Or the implementation might call something like getDefault. (Reminder: many OSes / frameworks can have more than one locale, depending on the use. For example, Java has a display locale (for string loading) and a formatting locale.
MacOS can have different locales for strings (UI), date / time format, sort. And so on.)

Can we force them to use one locale and one locale only?
Is there any value in putting in the standard how one is supposed to access the locale(s)?

At least for now I see these "functions" the same way Java (and other languages) see interfaces:
you must implement at least these N methods, with these signatures.
Classes implementing the interfaces are free to have extra info (passed in constructors, globals, injection, setters).

Whenever an interface definition forces the implementer to write some dummy function, or pass a null parameter,
just to "obey" the interface, without that method / parameter being needed, it means that the interface was overreaching.
And is a sign of bad API design.

I don't think I understand this. Why or how would the model be "parsed"? I at least have understood the data model to be a description of a structure that is already parsed.

You are right, that is not well described.

I'll try to say it differently: there would be one model that is the output of the parser.
And one that can be used for "rendering" (to string, or to parts).

Should we limit ourselves to specifying only the result of the parse (the parse model)?

I think it should be a Record<string, unknown>.

Is that the same thing as a Map?

I'm pretty sure that we should not generally limit the types of runtime values.

100% agree.

I am only confused by the model (as proposed by Stas) defining all kinds of types (string, decimal, boolean, etc).
Are they just for the options?

And are we still talking about the Stas model, or are you proposing a fourth model?

Because the RuntimeValue there seems quite confusing to me.

It is used both as return of a formatting function:

interface RegistryFunc<T> {
	(ctx: FormattingContext, args: Array<Argument>, opts: Record<string, Parameter>): RuntimeValue<T>
}

How can I represent "format to parts" with the few basic types defined?

And for vars in the FormattingContext
So I can't have a ResourceManager in context, because it is not a RuntimeValue.

Or RuntimeValue only means some "unknown type packed with type info"?
So I can write my own DateValue or PersonValue that extend RuntimeValue?

If that is all there is, do we even need it, if the programming language I'm using has RTTI?

And how do we support more than one placeholder-function (see the examples above).

Is this a question about how pattern elements in a message may refer to more than one runtime value, or something else?

Yes.

I'm not sure that I understand the concern here.

How do we get to them "at render time?"

Because I don't see explained anywhere how things are rendered.
Are there 2 steps, or not?

parse => gives me a model
model.format( arguments ) => gives me a String / parts (as in "format to parts", TBD the name / type)

Basically I don't think it is clear how functions and "variables" (or whatever the name is now) come together, and when.

As in, the data model should only contain stringifiable data?

No.
As in "the data model after parse might not be sufficient to render it" (toString or toParts)
(among other reasons because anything is stringifiable: I can base64 a blob and now it's a string)

B. We need a description of the rendering (runtime?) behavior

Yes, for both string and parts outputs.

Sure, we agree on this.
But what I mean is: in order to be able to discuss what the functions look like we need the runtime part described.
We can't go ahead to "let's discuss functions, do you agree this is good" without explaining how they work.

Furthermore, we can (and probably should) wrap all formatting function arguments in something like RuntimeValue or Formattable.

I think that only makes things "clunky"
If the programming language has RTTI there is no need to wrap things.
Instead of doing something like
if (runtimeVal.getType == "date") { ... }
one would do if (runtimeVal instanceof Date) { ... }

It is also less work when I try to render (format) the message.
Instead of parameters { startDate: new RuntimeDateValue(date) } I can do parameters { startDate: date }
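
The two styles can be put side by side in a short TypeScript sketch. RuntimeDateValue, formatWrapped and formatPlain are hypothetical names used only to illustrate the comparison; neither is taken from an existing implementation.

```typescript
// Style 1: a hypothetical tagged wrapper, checked by its type string.
class RuntimeDateValue {
  readonly type = 'date';
  readonly value: Date;
  constructor(value: Date) {
    this.value = value;
  }
}

function formatWrapped(runtimeVal: { type: string; value: unknown }): string {
  if (runtimeVal.type === 'date' && runtimeVal.value instanceof Date) {
    return runtimeVal.value.toISOString();
  }
  return String(runtimeVal.value);
}

// Style 2: rely on the language's runtime type information directly.
function formatPlain(runtimeVal: unknown): string {
  if (runtimeVal instanceof Date) {
    return runtimeVal.toISOString();
  }
  return String(runtimeVal);
}

const startDate = new Date(Date.UTC(2021, 8, 19));

// The caller has to wrap the value in style 1, but not in style 2.
const wrappedText = formatWrapped(new RuntimeDateValue(startDate));
const plainText = formatPlain(startDate);
```

Both calls produce the same string here; the difference is only in what the caller has to construct. The sketch does not cover the case where formatting options need to travel together with the value.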

Worse, we might need a RuntimeDateValue, RuntimeCalendarValue, RuntimeInstantValue

We should not burden the spec (and the devs) with extra abstractions, unless they are really needed.

If their input values are wrapped up as implementation-specific FormattableDate objects which encapsulate the code for formatting a date to a list of parts, implementing those functions becomes significantly simpler and duplicates far less code.

I don't see how.
We still need to treat epochTime / Date / Calendar / Instant differently in the implementation.

But we should still be able to use object-oriented concepts in the spec when it makes sense to do so.

Also agreed. It is hard to NOT agree when the statement contains "when it makes sense"
The trouble is agreeing when it makes sense :-)

E. A simple [] would not be enough for the arguments, we need something with names

Why? I don't see how this follows from your previous arguments.

If the arguments come as an array (as proposed) there is no way to extend things.

All the functions that take 2 RuntimeValues are "merged" into one.
When this happens in classic APIs the way people go about it is they add yet another parameter.
And some of them are null, and we ignore them. And it's all a mess.

For example, if we have a formatRange( int startIndex, int endIndex ) in this data model they will be an array with 2 IntegerLiterals.
Now I want formatRange( int startIndex, int rangeLength ). Can't do it.

That's why people like named parameters.
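
A small TypeScript sketch of the difference, using the formatRange example. The function names and the RangeArgs type are hypothetical, and this only illustrates the argument being made, not a settled design.

```typescript
// Positional: the meaning of each slot is fixed by convention,
// so [3, 7] can only ever mean (startIndex, endIndex).
function formatRangePositional(args: number[]): string {
  const [start, end] = args;
  return `${start}-${end}`;
}

// Named: the names say which variant is meant, so the same function
// can accept either (startIndex, endIndex) or (startIndex, rangeLength).
type RangeArgs =
  | { startIndex: number; endIndex: number }
  | { startIndex: number; rangeLength: number };

function formatRangeNamed(args: RangeArgs): string {
  const end =
    'endIndex' in args ? args.endIndex : args.startIndex + args.rangeLength;
  return `${args.startIndex}-${end}`;
}

const positional = formatRangePositional([3, 7]);
const namedByEnd = formatRangeNamed({ startIndex: 3, endIndex: 7 });
const namedByLength = formatRangeNamed({ startIndex: 3, rangeLength: 4 });
```

All three calls describe the same range. With the positional form, supporting the length variant would need a second function or an extra flag.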

I think there might be some misunderstanding here. @stasm's RuntimeValue is a wrapper around a value that has a generic type T.
It provides three different methods with concrete types for when that value needs to be formatted to a string, formatted to parts,
or compared against a select key.

So who owns the methods? The RuntimeValue?
Are the 3 methods members of the RuntimeValue?
That is not what the current code (from Stas) seems to show.

Is there some implied decision already (that we should make explicit) that we do:

class DateRuntimeValue extends RuntimeValue {
    type = "date"
    value : Date
    function format(Context, Date, Options) : String
    function formatToParts(Context, Date, Options) : Parts
    function matcher(Context, Date, Options) : ??? // Does it throw?
}

Or we do

class DateRuntimeValue extends RuntimeValue {
    type = "date"
    value : Date
}

function dateFmt = function dateFormat(Context, Date, Options) : String { implementation }
registry.register("date", "format", dateFmt);

or

class DateFormatter {
   function format(Context, Date, Options) : String { implementation }
   function formatToParts(Context, Date, Options) : Parts { implementation }
}
registry.register("date", new DateFormatter());

or maybe something else? (I have a factory, for example)

But the core question here is:
Are the formatting functions members of the object to be formatted (OO)? foo.toString(options)
Or we are functional? fooToString(foo, options)

The code seems to indicate functional.
Your answer seems to indicate OO.

Sorry, looking through it again, and it feels very argumentative.
That is not my intention.

It's just that I don't see how things come together.
"Look at the code" is not a good answer, we can't discuss these issues in the whole group by asking everybody to read the code.

The main reason is not "people can't read code" (of course they can)
But by looking at the code it is impossible to tell what is intentional, what is implementation detail, what is workaround for programming language limitations, what is a shortcut for cleaner code.

For example, in Stas's code there is stasm\third\impl\context.ts

And at the end the toRuntimeValue has a switch on node.type with cases for "StringLiteral" and so on.

What happens if I add "DateRuntimeValue"? TypeError
Do I have to modify the core implementation to add a new type?

I think that proof of concept implementation should simulate the "real world": I get ICUNext, or FluentNext, or ECMAScript.Intl.MessageFormat, and WITHOUT changing the code I add my own types to format, or to select on.

1 reply

eemeli Sep 19, 2021
Maintainer Author

The examples DATE and INTERVAL you give are relatively simple, as they only operate on their direct input values. Still, they'll need access to at least the current locale when they're called. How about message references? What would such a function need to have access to in order to work?

I would expect that all functions will need to have some kind of "context".
But that can't really be part of the spec.

Ah, but there I think we disagree, on both counts. First, I assert that the only part of the message/resource/formatting-invocation specific context that formatting functions should have access to is the current locale. Yes, a formatting function could make use of static tools such as the JS Intl.ListFormat, but it should not be able to behave differently depending on the context in which it's called, except so far as that context defines the explicit arguments and options passed to the function.

Second, I further assert that we should include in the spec an explicit method for pattern element formatters to be able to define the context that they require to operate. See here for more on this: #190 (reply in thread).

I don't think I understand this. Why or how would the model be "parsed"? I at least have understood the data model to be a description of a structure that is already parsed.

You are right, that is not well described.

I'll try to say it differently: there would be one model that is the output of the parser.
And one that can be used for "rendering" (to string, or to parts).

Should we limit ourselves to specifying only the result of the parse (the parse model)?

I think I'm still lost on this. Does this "rendering model" merge in some additional data from some other source, or is it a definition of some operations on the "parse model"?

I think it should be a Record<string, unknown>.

Is that the same thing as a Map?

Pretty much, yes. Here's the TS definition: https://www.typescriptlang.org/docs/handbook/utility-types.html#recordkeystype
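
A short TypeScript illustration of how the two relate. The variable names are made up for the example: a Record<string, unknown> is a type for a plain object with string keys, while Map is a class with get/set methods; both associate names with values of any type.

```typescript
// A Record<string, unknown> describes a plain object with string keys.
const asRecord: Record<string, unknown> = {
  count: 42,
  due_date: new Date(0)
};

// A Map<string, unknown> holds the same associations behind get/set.
const asMap = new Map<string, unknown>();
asMap.set('count', 42);
asMap.set('due_date', new Date(0));

// Lookup by name works in both cases; only the access syntax differs.
const fromRecord = asRecord['count'];
const fromMap = asMap.get('count');
```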

I am only confused by the model (as proposed by Stas) defining all kinds of types (string, decimal, boolean, etc).
Are they just for the options?

If you're referring to the Parameter type in Staś's data model, then yes, that is purely a description of the contents of the stringifiable data model. During formatting, runtime values are wrapped in a RuntimeValue, which may encapsulate a value of any type.

And are we still talking about the Stas model, or are you proposing a fourth model?

I'm not sure that it makes sense to really see this as a conversation about a specific model, but about the general structure of the eventual MF2 spec. My preferred model continues to be the one available and implemented here: https://github.com/messageformat/messageformat/tree/mf2/packages/messageformat

That is based on the EZ model, but has been updated to incorporate e.g. the recent consensus of only allowing literals and variables as function arguments.

Because the RuntimeValue there seems quite confusing to me.

It is used both as return of a formatting function:
interface RegistryFunc<T> {
	(ctx: FormattingContext, args: Array<Argument>, opts: Record<string, Parameter>): RuntimeValue<T>
}
How can I represent "format to parts" with the few basic types defined?

In case you're not familiar with the syntax, the <T> in the interface name indicates that it's a generic, and that this type T is used to further define the return value RuntimeValue<T>. That, in turn, is a generic abstract class that wraps a value: T, while requiring its implementations to provide methods such as formatToParts(ctx). If you look then at e.g. NumberValue, one of its concrete implementations, you'll see that it includes a member opts: Intl.NumberFormatOptions, which is used by the formatting methods.
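
To make the shape concrete, here is a simplified sketch of such a generic wrapper. It is written for this comment and is not copied from Staś's code: the FormattingContext and FormattedPart shapes, the formatToString name and the asPercent helper are assumptions, and formatToParts is reduced to a single part.

```typescript
// Assumed minimal shapes; the real ones carry more information.
interface FormattingContext {
  locale: string;
}

interface FormattedPart {
  type: string;
  value: string;
}

// A generic abstract wrapper around a value of type T.
abstract class RuntimeValue<T> {
  value: T;
  constructor(value: T) {
    this.value = value;
  }
  abstract formatToString(ctx: FormattingContext): string;
  abstract formatToParts(ctx: FormattingContext): FormattedPart[];
}

// A concrete implementation that keeps formatting options with the value.
class NumberValue extends RuntimeValue<number> {
  opts: Intl.NumberFormatOptions;
  constructor(value: number, opts: Intl.NumberFormatOptions = {}) {
    super(value);
    this.opts = opts;
  }
  formatToString(ctx: FormattingContext): string {
    return new Intl.NumberFormat(ctx.locale, this.opts).format(this.value);
  }
  formatToParts(ctx: FormattingContext): FormattedPart[] {
    // Simplified: a real implementation would return finer-grained parts.
    return [{ type: 'number', value: this.formatToString(ctx) }];
  }
}

// A formatting function returns a wrapped value rather than a string.
const asPercent = (value: number): RuntimeValue<number> =>
  new NumberValue(value, { style: 'percent' });

const percentValue = asPercent(0.5);
const percentText = percentValue.formatToString({ locale: 'en' });
const percentParts = percentValue.formatToParts({ locale: 'en' });
```

Because the function returns a RuntimeValue<number> instead of a string, the caller decides later whether it wants a string or parts.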

So I can't have a ResourceManager in context, because it is not a RuntimeValue.

Would a ResourceManager be something that you'd like to format? If not, then it probably doesn't make sense to wrap it inside a RuntimeValue.

Or RuntimeValue only means some "unknown type packed with type info"?
So I can write my own DateValue or PersonValue that extend RuntimeValue?

Yes, you can write your own implementations of RuntimeValue. That's actually exactly what you'd be expected to do in order to provide support for a new data type. In Staś's implementation, the base class is abstract, so a custom implementation is always needed. In my implementation, the corresponding class is Formattable, and the base class already provides an implementation that's sufficient for most generic data types: https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/formattable/formattable.ts

If that is all there is, do we even need it, if the programming language I'm using has RTTI?

It's needed to allow for formatting options to be included with the value.

And how do we support more than one placeholder-function (see the examples above).

Is this a question about how pattern elements in a message may refer to more than one runtime value, or something else?

Yes.

I'm not sure that I understand the concern here.

How do we get to them "at render time?"

Because I don't see explained anywhere how things are rendered.
Are there 2 steps, or not?
parse => gives me a model
model.format( arguments ) => gives me a String / parts (as in "format to parts", TBD the name / type)
Basically I don't think it is clear how functions and "variables" (or whatever the name is now) come together, and when.

I think I'm starting to understand a possible source of your confusion here. Each of the proposed models handles this rather differently, and stasm/third is lazier than the others, forcing each formatting function to internally resolve the values of its arguments and options.

Also, it might be easier if we could agree not to refer to a message or resource object as a "model", given that it has methods, and the data model only contains pure data. We should not overload the term with multiple meanings.

B. We need a description of the rendering (runtime?) behavior

Yes, for both string and parts outputs.

Sure, we agree on this.
But what I mean is: in order to be able to discuss what the functions look like we need the runtime part described.
We can't go ahead to "let's discuss functions, do you agree this is good" without explaining how they work.

Good. I just want to ensure that we continue to keep both the format-to-string and format-to-parts outputs in mind, as the need to provide more than one form of output has significant effects on the design of the runtime.

Furthermore, we can (and probably should) wrap all formatting function arguments in something like RuntimeValue or Formattable.

I think that only makes things "clunky"
If the programming language has RTTI there is no need to wrap things.
Instead of doing something like
if (runtimeVal.getType == "date") { ... }
one would do if (runtimeVal instanceof Date) { ... }

As mentioned previously, a wrapper is needed in order to include formatting options with the value. Without such a wrapper, partially formatted values can't be passed in as formatting parameters and building generic formatting functions becomes rather burdensome.

Worse, we might need a RuntimeDateValue, RuntimeCalendarValue, RuntimeInstantValue

We should not burden the spec (and the devs) with extra abstractions, unless they are really needed.

I agree that the specification of e.g. date-based runtime values should be beyond the scope of the spec. I do think that defining a generic RuntimeValue or Formattable interface is within the scope of the spec.

E. A simple [] would not be enough for the arguments, we need something with names

Why? I don't see how this follows from your previous arguments.

If the arguments come as an array (as proposed) there is no way to extend things.

All the functions that take 2 RuntimeValues are "merged" into one.
When this happens in classic APIs the way people go about it is they add yet another parameter.
And some of them are null, and we ignore them. And it's all a mess.

For example, if we have a formatRange( int startIndex, int endIndex ) in this data model they will be an array with 2 IntegerLiterals.
Now I want formatRange( int startIndex, int rangeLength ). Can't do it.

That's why people like named parameters.

I challenge this assertion. While in general I do agree that there are situations in which named parameters are good, there are plenty of situations where a list of arguments works really well.

To use your chosen example, as far as I know every real-world function in every programming language for formatting a range of numbers or dates uses [start, end] arguments. Or can you provide a counterexample?

I think there might be some misunderstanding here. @stasm's RuntimeValue is a wrapper around a value that has a generic type T.
It provides three different methods with concrete types for when that value needs to be formatted to a string, formatted to parts,
or compared against a select key.

So who owns the methods? The RuntimeValue?
Are the 3 methods members of the RuntimeValue?
That is not what the current code (from Stas) seems to show.

In Staś's code they are methods of RuntimeValue. In my code they are methods of Formattable. If this is not clear, could you point to specific parts of either implementation that are confusing to you?

But the core question here is:
Are the formatting functions members of the object to be formatted (OO)? foo.toString(options)
Or we are functional? fooToString(foo, options)

The code seems to indicate functional.
Your answer seems to indicate OO.

Please be specific about which parts of the code (either Staś's or mine) imply that either his RuntimeValue or my Formattable does not provide methods for formatting its contents.

Sorry, looking through it again, and it feels very argumentative.
That is not my intention.

It's just that I don't see how things come together.
"Look at the code" is not a good answer, we can't discuss these issues in the whole group by asking everybody to read the code.

The main reason is not "people can't read code" (of course they can)
But by looking at the code it is impossible to tell what is intentional, what is implementation detail, what is workaround for programming language limitations, what is a shortcut for cleaner code.

For example, in Stas's code there is stasm\third\impl\context.ts

And at the end the toRuntimeValue has a switch on node.type with cases for "StringLiteral" and so on.

What happens if I add "DateRuntimeValue"? TypeError
Do I have to modify the core implementation to add a new type?

If you have a Date as a runtime value, that will be a VariableReference in the data model. Hence when resolving a Parameter in toRuntimeValue(), you may observe that the value this.vars[node.name] is returned. Then, we may look at the type of this.vars (Record<string, RuntimeValue<unknown>>) and from this deduce that when this function is called, the runtime date value will be returned as a RuntimeValue<unknown>, i.e. just providing the default methods. If a custom formatting function needs to do something special with the wrapped date's value or formatting options, it could use an instanceof check to verify that it has a DateRuntimeValue.
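
The resolution path described above could be sketched roughly as follows. This is a simplified illustration and not the code in context.ts: the classes are cut down, the base class is made concrete for brevity, and isoDate is a hypothetical custom formatting function.

```typescript
// Simplified stand-in for the wrapper; the base class provides defaults.
class RuntimeValue<T> {
  value: T;
  constructor(value: T) {
    this.value = value;
  }
  formatToString(): string {
    return String(this.value);
  }
}

// A user-defined wrapper, added without touching the core.
class DateRuntimeValue extends RuntimeValue<Date> {}

type Parameter =
  | { type: 'StringLiteral'; value: string }
  | { type: 'VariableReference'; name: string };

class FormattingContext {
  vars: Record<string, RuntimeValue<unknown>>;
  constructor(vars: Record<string, RuntimeValue<unknown>>) {
    this.vars = vars;
  }
  toRuntimeValue(node: Parameter): RuntimeValue<unknown> {
    switch (node.type) {
      case 'StringLiteral':
        return new RuntimeValue(node.value);
      case 'VariableReference': {
        // The runtime value is returned as-is, whatever its concrete class.
        const value = this.vars[node.name];
        if (value === undefined) {
          throw new Error(`Unknown variable: ${node.name}`);
        }
        return value;
      }
    }
  }
}

// Hypothetical custom function that treats wrapped dates specially.
function isoDate(ctx: FormattingContext, arg: Parameter): string {
  const value = ctx.toRuntimeValue(arg);
  if (value instanceof DateRuntimeValue) {
    return value.value.toISOString().slice(0, 10);
  }
  return value.formatToString();
}

const dateCtx = new FormattingContext({
  due: new DateRuntimeValue(new Date(Date.UTC(2021, 11, 31)))
});
const dueText = isoDate(dateCtx, { type: 'VariableReference', name: 'due' });
const literalText = isoDate(dateCtx, { type: 'StringLiteral', value: 'soon' });
```

The switch only covers data model node types; the new DateRuntimeValue class never appears in it, because it enters through vars.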

Something very much like this is demonstrated here: https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/stasm/third/example/example_opaque.ts

I think that proof of concept implementation should simulate the "real world": I get ICUNext, or FluentNext, or ECMAScript.Intl.MessageFormat, and WITHOUT changing the code I add my own types to format, or to select on.

This is already now possible with Staś's and my implementations.

mihnita · 2021-09-19T02:13:42Z

mihnita
Sep 19, 2021
Maintainer

P.S. Don't feel pressured to answer over the week-end (the way I did).

Even if you do, we can't expect all the group members to "parse" this thread and get an informed opinion.
That's why I said (on slack) that I don't think that the topic is "ripe" enough to discuss on Monday with the WG.

1 reply

eemeli Sep 19, 2021
Maintainer Author

The specific and entire extent of what I'm looking to bring to the group for discussion tomorrow is contained in the "Formatting function signatures" part of this comment above: #190 (reply in thread)

Here's the text of it, in case it's difficult to find:

Another way of putting this is that we're talking about the signature of a function call, which I'm arguing should be as simple and limited as possible, whereas you'd like it to have greater access: [snip]

The key difference here is that with types+functions we don't need to give a formatting function unrestricted access to the runtime context, which functions-only effectively requires in order to make it possible for e.g. term to access the current message resource. This makes reasoning about the function's behaviour much easier, and provides a significantly smaller attack surface for exploiting any vulnerabilities in custom formatter code.

I have prepared a presentation clarifying the above, covering the current state of the art, how the different proposals currently address this, and why it's relevant to our design work. In general, the preceding discussion in this thread has not touched on this specific topic at all, or at most only in passing.

mihnita · 2021-09-20T08:10:00Z

mihnita
Sep 20, 2021
Maintainer

has not touched on this specific topic at all

Mostly because that description is too vague to discuss without clarifying what that means.

What does { type: 'function', func: 'list', args: ['foo', 'bar', 'baz'], attr: { type: 'disjunction' } } even mean?

The most critical missing piece is what is "known" after a parse, and what is known at "render time"
Is there a parse + msg.format(params), or not?

Does it mean that the ['foo', 'bar', 'baz'] are known after parse? So they are present in the serialized form (syntax)?

If that's the case, then there is no need for a function, to format a list, just write "foo, bar or baz"
If they are not in the serialized message, when do these pieces of info come together with the "parsed" format?

So if you don't want to clarify it here, I hope they are clarified in the presentation to the group.

1 reply

eemeli Sep 20, 2021
Maintainer Author

I'm sorry, I don't really know what to say here, so I'll try repeating what I've said before.

As I mention in my preceding comment, the relevant part in this thread to what I'd like us to discuss in the meeting is contained in the "Formatting function signatures" section of this comment, and I even went and block-quoted its relevant text.

The part that you quoted and ask about is not in that section. It is of course relevant to the wider discussion here, but not specifically to formatting function signatures. I am not intending to cover it in the meeting.

If you find that it's unclear what I mean by "Formatting function signatures" or why I think they're rather important, I would invite you to look at my presentation on this ahead of the meeting; it's linked from the meeting's agenda.

mihnita · 2021-09-20T22:08:38Z

mihnita
Sep 20, 2021
Maintainer

Quoting from that exact block:

literal requires no context
variable gets the runtime scope, i.e. the parameters that the formatting function was called with
function uses the function registry
term builds a getter function for other messages

For me the part that is unclear is the variable one.
Can't really display a variable without a function.
If the variable is a Date, then it needs to be handled by the DATE function.

That's what I try to understand.
More than once before Zibi insisted that the .format(parameters) call is my blind spot, and that Fluent does not do this.

In my mind there has to be "something like that", even in Fluent.
At some point "something" should take the count = 42 or date(2021-12-31) and "merge" it with the data model
(AST, whatever Fluent calls this) in order to render things (to a DOM, or whatever).
The fact that it is not called .format or that the user does not see that because it is done by the framework, deep-down, is not relevant.

So, what is "the thing" that brings the_answer=42 and "{count} is the answer" together?
Is that happening in one step (at parse?), or is a separate call?

This is what I tried to clarify in the "7. Variable references" section of the "whiteboard doc" I had with Stas.
https://docs.google.com/document/d/1w6fxTfh4xeaqlKZ_QZK1sDpvTv1TqtBeh897aOT8qyk/edit?resourcekey=0-jRKq3bi_UzcnK-5gA8yjsw#heading=h.qjtpf1lwpfvn

All that affects directly the function signatures.

1 reply

eemeli Sep 21, 2021
Maintainer Author

For me the part that is unclear is the variable one.
Can't really display a variable without a function.
If the variable is a Date, then it needs to be handled by the DATE function.

While it's true that a Date needs specific handling when formatting, this is not in fact provided by a custom formatting function in either the EZ proposal or Staś's third. Instead, both of these approaches provide the actual formatting via a wrapper around the Date. Staś doesn't have this implemented yet, but in EZ it's FormattableDateTime.

With this approach, the role of a DATE formatting function is to return a FormattableDateTime that includes any corresponding options.
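
As a rough sketch of that division of labour (this is not the actual EZ implementation): the wrapper owns the formatting and the DATE function only builds the wrapper. The single-part toParts() and the merging of options for an already-wrapped input are assumptions made for the example.

```typescript
// Simplified wrapper: holds the value, locales and options together.
class FormattableDateTime {
  value: Date;
  locales: string[];
  options: Intl.DateTimeFormatOptions;

  constructor(
    value: Date,
    locales: string[],
    options: Intl.DateTimeFormatOptions = {}
  ) {
    this.value = value;
    this.locales = locales;
    this.options = options;
  }

  toString(): string {
    return new Intl.DateTimeFormat(this.locales, this.options).format(this.value);
  }

  toParts(): { type: string; value: string }[] {
    // Simplified to one part; a real implementation would be finer-grained.
    return [{ type: 'datetime', value: this.toString() }];
  }
}

// Hypothetical DATE formatting function: it does no formatting itself,
// it only returns a wrapper that includes the given options.
function DATE(
  locales: string[],
  input: Date | FormattableDateTime,
  options: Intl.DateTimeFormatOptions = {}
): FormattableDateTime {
  if (input instanceof FormattableDateTime) {
    // Assumption: options on an already-wrapped value are merged.
    return new FormattableDateTime(input.value, locales, {
      ...input.options,
      ...options
    });
  }
  return new FormattableDateTime(input, locales, options);
}

const wrappedDate = DATE(['en-US'], new Date(Date.UTC(2021, 11, 31)), {
  year: 'numeric',
  timeZone: 'UTC'
});
const yearText = wrappedDate.toString();
const mergedDate = DATE(['en-US'], wrappedDate, { month: 'long' });
```

The string is only produced when toString() or toParts() is eventually called on the wrapper.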

At some point "something" should take the count = 42 or date(2021-12-31) and "merge" it with the data model
(AST, whatever Fluent calls this) in order to render things (to a DOM, or whatever).
The fact that it is not called .format or that the user does not see that because it is done by the framework, deep-down, is not relevant.

So, what is "the thing" that brings the_answer=42 and "{count} is the answer" together?
Is that happening in one step (at parse?), or is a separate call?

If I understand right, pattern element formatters are indeed here proposed as the "things" that connect e.g. variable references with runtime variable values.

So if we presume that we start with a message being first resolved into a list of pattern elements:

[
  { type: 'variable', var_path: ['count'] },
  { type: 'literal', value: ' is the answer' }
]

then it is the pattern element formatters that turn this into a list of Formattable objects by combining each with any appropriate context values:

[
  new FormattableNumber(42),
  new Formattable(' is the answer')
]

Leaving the concatenation of these formattables' toString() or toParts() values to be the task of the actual format() or formatToParts() call.
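
Put together as a simplified TypeScript sketch (not the actual implementation): Formattable and FormattableNumber here are cut-down stand-ins, and resolvePath, asFormattable and formatToString are hypothetical helpers that fold the per-type pattern element formatters into one function for brevity.

```typescript
// Cut-down stand-ins for the wrapper classes.
class Formattable {
  value: unknown;
  constructor(value: unknown) {
    this.value = value;
  }
  toString(): string {
    return String(this.value);
  }
}

class FormattableNumber extends Formattable {
  toString(): string {
    return new Intl.NumberFormat('en').format(Number(this.value));
  }
}

type PatternElement =
  | { type: 'literal'; value: string }
  | { type: 'variable'; var_path: string[] };

type Scope = Record<string, unknown>;

// Walk a var_path such as ['user', 'name'] through the runtime scope.
function resolvePath(scope: Scope, path: string[]): unknown {
  let current: unknown = scope;
  for (const key of path) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// The role of the pattern element formatters: element + context -> Formattable.
function asFormattable(scope: Scope, elem: PatternElement): Formattable {
  switch (elem.type) {
    case 'literal':
      return new Formattable(elem.value);
    case 'variable': {
      const value = resolvePath(scope, elem.var_path);
      return typeof value === 'number'
        ? new FormattableNumber(value)
        : new Formattable(value);
    }
  }
}

// The format() call itself only concatenates.
function formatToString(pattern: PatternElement[], scope: Scope): string {
  return pattern.map((elem) => asFormattable(scope, elem).toString()).join('');
}

const answerPattern: PatternElement[] = [
  { type: 'variable', var_path: ['count'] },
  { type: 'literal', value: ' is the answer' }
];
const answerText = formatToString(answerPattern, { count: 42 });
```

In this sketch the value of count comes from the scope object passed to formatToString, i.e. the parameters of the formatting call.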

mihnita · 2021-09-21T22:39:49Z

mihnita
Sep 21, 2021
Maintainer

If I understand right, pattern element formatters are indeed here proposed as the "things"

I don't know. It is your proposal, and I am trying to understand it.
So if you don't understand it right, who does?

the role of a DATE formatting function is to return a FormattableDateTime that includes any corresponding options.

I'll try to map things to a different set of concepts, to see if I get it:

Would it be fair to say that "FormattableDateTime" is a formatter in ICU meaning?
That works like this: FormattableDateTime df = new FormattableDateTime(options, locale, ... more?)
But does not encapsulate the actual value of a date, that comes later on.
So it can be used (df.format(date) or equivalent) again and again, for different dates?
(for example if I write a logger message, like "Log entry: {date} {time} {component}::{class}: {tag} ::: {message}")

And would that make DATE a factory? That takes options and other "stuff" and builds a "formatter" (or Formattable*)
(not arguing about names right now, just trying to understand)

So there is another "secret function" that takes something like

[
  { type: 'variable', var_path: ['count'] },
  { type: 'literal', value: ' is the answer' }
]

and takes the current value of count (where from) and produces

[
  new FormattableNumber(42),
  new Formattable(' is the answer')
]

Is that the case?

If yes, then I have to think about the implications.
If that's the case, I already have some more clarifying questions :-)
I can see some advantages, and I can see some possible disadvantages. But I don't have a strong opinion yet.

But before spinning my wheels in the wrong direction I would like to understand if this is the case, and where the value of that count comes from.

3 replies

eemeli Sep 21, 2021
Maintainer Author

If I understand right, pattern element formatters are indeed here proposed as the "things"

I don't know. It is your proposal, and I am trying to understand it.
So if you don't understand it right, who does?

"If I understand right" was a reference to understanding what you were trying to say. I could not be certain that I had understood right what you meant by "the thing" in your previous message.

the role of a DATE formatting function is to return a FormattableDateTime that includes any corresponding options.

I'll try to map things to a different set of concepts, to see if I get it:

Would it be fair to say that "FormattableDateTime" is a formatter in ICU meaning?
That works like this: FormattableDateTime df = new FormattableDateTime(options, locale, ... more?)
But does not encapsulate the actual value of a date, that comes later on.
So it can be used (df.format(date) or equivalent) again and again, for different dates?
(for example if I write a logger message, like "Log entry: {date} {time} {component}::{class}: {tag} ::: {message}")

And would that make DATE a factory? That takes options and other "stuff" and builds a "formatter" (or Formattable*)
(not arguing about names right now, just trying to understand)

No. It would perhaps be easiest if you could allow yourself to become "tainted" (as you put it on Slack) by studying the actual implementations of Fluent, the EZ model and/or Staś's proposal. Each of them has an equivalent of this type of wrapper (Fluent: FluentValue, EZ: Formattable, third: RuntimeValue) that wraps a value and its formatting options.

For example, the actual FormattableDateTime constructor takes the arguments value, locales and options. It explicitly includes the actual value being formatted, so that its toParts() and toString() methods do not set or change the initially set value.
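
To make the distinction concrete, here is a minimal TypeScript sketch of a wrapper that holds the value together with its locales and options. It is an illustration only: the FormattedPart shape, the getValue() method and the use of Intl.DateTimeFormat are assumptions made for this example, not the code of any of the implementations mentioned above.

```typescript
// Hypothetical sketch of a value wrapper; names follow the discussion,
// not the API of any real package.
interface FormattedPart {
  type: string;
  value: string;
}

class FormattableDateTime {
  private value: Date;
  private locales: string | string[];
  private options: Intl.DateTimeFormatOptions;

  constructor(
    value: Date,
    locales: string | string[],
    options: Intl.DateTimeFormatOptions = {}
  ) {
    // The value is captured once, at construction time.
    this.value = value;
    this.locales = locales;
    this.options = options;
  }

  // Returns the wrapped value unchanged.
  getValue(): Date {
    return this.value;
  }

  // Neither output method takes a value argument: both format the stored one.
  toParts(): FormattedPart[] {
    const dtf = new Intl.DateTimeFormat(this.locales, this.options);
    return dtf.formatToParts(this.value).map(({ type, value }) => ({ type, value }));
  }

  toString(): string {
    return new Intl.DateTimeFormat(this.locales, this.options).format(this.value);
  }
}

const due = new FormattableDateTime(new Date(Date.UTC(2021, 11, 17)), 'en-US', {
  year: 'numeric',
  month: 'short',
  day: 'numeric',
  timeZone: 'UTC'
});
const dueText = due.toString();
```

In this sketch, formatting a different date means constructing a new wrapper. The reusable, value-less formatter of the ICU kind corresponds more closely to the Intl.DateTimeFormat instance that the wrapper uses internally.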

So there is another "secret function" that takes something like

[
  { type: 'variable', var_path: ['count'] },
  { type: 'literal', value: ' is the answer' }
]

and takes the current value of count (where from) and produces

[
  new FormattableNumber(42),
  new Formattable(' is the answer')
]

Is that the case?

There's no need to use an appellation like "secret function". You are describing a pattern element formatter. This is in fact explicitly stated in the paragraph between those two code blocks which you elided:

then it is the pattern element formatters that turn this into a list of Formattable objects by combining each with any appropriate context values

If you would like to find out more about how a pattern element formatter might work, you are invited to study the implementations available here: https://github.com/messageformat/messageformat/tree/mf2/packages/messageformat/src/pattern

I would be very happy to answer any questions you might have about the code, in case it's hard to understand.

mihnita Sep 23, 2021
Maintainer

you could allow yourself to become "tainted" (as you put it on Slack) by studying the actual implementations

There are reasons I don't want to do that.

In code it is not possible to say what is accidental, implementation detail, shortcut, tech stack limitations, vs intentional, relevant to the data model.
The code does not tell you "why", only "how", does not tell you the big picture.
The code does not answer questions like "how is X handled" if there is not already some unit test for it.
Specs are standalone. Even when there is a reference implementation, the spec does not point to the implementation, but the other way around
Because I already looked at it, and it is unclear (that's why I'm asking about INTENT)

Based on the answers here (and what I got from the code) the intention to have that construct (variable without function) is convenience.
So that as a developer I can say "You have a meeting in {count} days, at {time}" without being forced to say "...{NUMBER(count)}..."

ICU has the same thing, I can say "...{dueDate}..." or "...{dueDate, DATE}..." or "...{dueDate, DATE, ::yMMMd}..."

Is that the case?

I think that the goal is not just "let's put what Fluent does in a standard", but to make things better where possible.

So when we design something we should also look at alternate ways, to see if there might be something better.
And be open if someone else suggests something. Might be better, or not, TBD.

What I am thinking goes back to some of the core "good practices" that I try to follow (and I acknowledge this was not voted / agreed)

For me:

All functions and types should be equal, and behave the same

There should be no "second class citizens".
A function I write should be able to replace a "stock" one.
A type I add should be able to be default for a type, replacing the stock one.

I should not have to say "...{dueDate}..." works, because Date is a well known type, but "...{price}..." does not work, because the type of price is Amount, and that is your custom data type.

All functions should behave as if they are custom

In that the core implementation should not know about any types, and the "well known types" (and formatters) should be in a different folder. The core should never do if (foo.type == "date")

The only difference between the currently recognized Fluent / ICU types and formatters (long, date, etc) should be that:

most popular implementations of the core spec will also implement these functions out of the box
they are in the "standard registry" already (both the non-runtime registry (schema?) and the runtime one). So I don't need a company registry / schema, and don't need to register them at runtime (in the runtime reg)

But basically if you use a library implementing the spec in 5 years you should not be able to feel any difference between the way numbers and dates (and plurals) work and lists, or person, or amount (types added 2 years later).

After this big detour, this is how I would handle this...

Values are nothing special, they are still formatted with functions.
The "save some typing" is relevant for syntax, not for data model.

It is like defining a function with default parameters (def greet(name, msg="Good morning!"): ...)

When I say "...{dueDate}..." it is still a function there, and there is still a default style (for example LONG)

But I would "register" those function in the runtime registry, associated with the type (as key).
(and I can also replace "standard" ones)

The core implementation never does if (type == "date") ..., it does runtimeRegistry.getFormatterForType(type) ...

So there is no difference between variables and formatting functions, they are all functions.
But there is no need to bother to tell you the function if you already know how to get it from the type.
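
A minimal TypeScript sketch of this idea follows. Everything in it is hypothetical: the RuntimeRegistry class, the typeKey() helper that derives the key from a value's constructor name, and the Amount type are assumptions for illustration, not part of any proposal in this thread.

```typescript
// Hypothetical registry keyed by a type name.
type Formatter = (value: unknown, locale: string) => string;

class RuntimeRegistry {
  private formatters = new Map<string, Formatter>();

  // A later registration for the same type replaces the earlier one,
  // so a "stock" formatter can be overridden.
  register(type: string, formatter: Formatter): void {
    this.formatters.set(type, formatter);
  }

  getFormatterForType(type: string): Formatter {
    const formatter = this.formatters.get(type);
    if (!formatter) throw new Error(`No formatter registered for type "${type}"`);
    return formatter;
  }
}

// Assumed way of deriving the registry key; the core never names a type.
function typeKey(value: unknown): string {
  return typeof value === 'object' && value !== null
    ? value.constructor.name
    : typeof value;
}

// A custom type, registered in exactly the same way as Date below.
class Amount {
  value: number;
  currency: string;
  constructor(value: number, currency: string) {
    this.value = value;
    this.currency = currency;
  }
}

const registry = new RuntimeRegistry();
registry.register('Date', (v, locale) =>
  new Intl.DateTimeFormat(locale, {
    year: 'numeric',
    month: 'long',
    day: 'numeric',
    timeZone: 'UTC'
  }).format(v as Date)
);
registry.register('Amount', (v, locale) => {
  const amount = v as Amount;
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency: amount.currency
  }).format(amount.value);
});

// What "...{dueDate}..." or "...{price}..." would do when no function is named.
function formatValue(value: unknown, locale: string): string {
  return registry.getFormatterForType(typeKey(value))(value, locale);
}
```

With this shape, formatValue(new Amount(9.5, 'USD'), 'en-US') and formatValue(someDate, 'en-US') go through the same code path, and registering 'Date' a second time swaps out the default.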

There might be a difference between solving the type at rendering time vs at parse time (which is why making the distinction is important)

And that might result in treating them differently in the data model.
But that should be a deliberate decision, with documented pros and cons, which is what a design document does.

We should not drag in "the Fluent way" or "the ICU way" without any discussion.

It is very likely that the people who designed Fluent DID all this (I would be worried if they didn't)
And there was a deliberate decision to do things a certain way.

But those documents:

Are not accessible to outsiders
What was good for a browser / JS / custom localization tooling / company standard might be different for a server / C++ / standard l10n industry / universal standard
Some decisions might be mistaken. People design stuff and then learn better when the rubber hits the road

So we can't just take some code that emulates what Fluent does and "enshrine" it as a Unicode standard.

eemeli Sep 23, 2021
Maintainer Author

you could allow yourself to become "tainted" (as you put it on Slack) by studying the actual implementations

There are reasons I don't want to do that.

Alas. I can only speak from personal experience, in that I have myself learned much by studying how others have solved similar problems to ones that I've faced.

Based on the answers here (and what I got from the code) the intention to have that construct (variable without function) is convenience.
So that as a developer I can say "You have a meeting in {count} days, at {time}" without being forced to say "...{NUMBER(count)}..."

ICU has the same thing, I can say "...{dueDate}..." or "...{dueDate, DATE}..." or "...{dueDate, DATE, ::yMMMd}..."

Is that the case?

Convenience is certainly one reason, but I would posit that there is value in separating different operations from each other, in this case:

Determine the value of a variable.
Get the result of calling a formatting function.

A key facet of this proposal, though, is to realise that we may use a higher level of abstraction when dealing with various pattern elements. It isn't specifically about how we deal with a variable or a function, but it's about the fact that each of those (along with e.g. literal values) are pattern elements, and when formatting a message we may use the abstraction of a "pattern element formatter" to answer the question of how we deal with each one.

I think that the goal is not just "let's put what Fluent does in a standard", but to make things better where possible.

So when we design something we should also look at alternate ways, to see if there might be something better.
And be open if someone else suggests something. Might be better, or not, TBD.

Oh, absolutely agreed. For instance, the abstraction of pattern element formatters proposed here has no Fluent equivalent whatsoever.

What I am thinking goes back to some of the core "good practices" that I try to follow (and I acknowledge this was not voted / agreed)

For me:

All functions and types should be equal, and behave the same

There should be no "second class citizens".
A function I write should be able to replace a "stock" one.
A type I add should be able to be default for a type, replacing the stock one.

I should not have to say "...{dueDate}..." works, because Date is a well known type, but "...{price}..." does not work, because the type of price is Amount, and that is your custom data type.

Agreed. I actually think the MF2 spec does not need to define the actual formatting for numbers or dates either, leaving those entirely as implementation details. With such an approach, the explicit formatters of numbers and datetime objects would only need to be defined in terms of the options that they might receive.

All functions should behave as if they are custom

In that the core implementation should not know about any types, and the "well known types" (and formatters) should be in a different folder. The core should never do if (foo.type == "date")

Agreed. I think there is a specific case to be made for the spec to make reference to the matching of CLDR plural categories and string representations of integers to number values, and for the concatenated-string output to be somehow defined in addition to the formatted-parts output. Other than those specific cases and their resulting requirements, I do not believe the spec should concern itself with the actual types of values.

After this big detour, this is how I would handle this...

Values are nothing special, they are still formatted with functions.
The "save some typing" is relevant for syntax, not for data model.

It is like defining a function with default parameters (def greet(name, msg="Good morning!"): ...)

When I say "...{dueDate}..." it is still a function there, and there is still a default style (for example LONG)

But I would "register" those function in the runtime registry, associated with the type (as key).
(and I can also replace "standard" ones)

The core implementation never does if (type == "date") ..., it does runtimeRegistry.getFormatterForType(type) ...

So there is no difference between variables and formatting functions, they are all functions.
But there is no need to bother to tell you the function if you already know how to get it from the type.

There might be a difference between solving the type at rendering time vs at parse time (which is why making the distinction is important)

And that might result in treating them differently in the data model.
But that should be a deliberate decision, with documented pros and cons, which is what a design document does.

Perhaps it would help to consider things from a slightly higher level of abstraction? We have all sorts of values that might be coming in as runtime arguments, or as the results of formatting functions. Rather than considering how to format each type separately, let's assume that we can build a wrapper around each { value, options } tuple that itself knows how to format itself as a string, or as a list of formatted parts, and how to match itself against selector keys.

With this abstraction, we may consider the output of a formatting function to be such a wrapped value. So for instance something like the formatting function corresponding to {dueDate, DATE, ::yMMMd} would know how to construct such a wrapper around a date object, and would be able to parse the skeleton given to it, should the native formatter require some different shape of options.

One benefit of such an abstraction is that the spec could then be written in a way that requires all runtime values to use this wrapper, allowing for the spec itself to be completely agnostic on the type of a value. Some specific consideration would probably need to be made regarding the behaviour of a wrapper around a number, and how it behaves when compared to selector keys.

How does that sound? In my implementation, I use the term "Formattable" to refer to such a wrapper.
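
As a rough sketch of such a wrapper for numbers, including the selector-key behaviour mentioned above: the interface shape, the method names other than toString(), and the rule that an exact numeric key is tried before the plural category are all assumptions for this example, not the actual Formattable API.

```typescript
// Hypothetical wrapper interface: a value plus options that knows how to
// format itself and how to compare itself against selector keys.
interface Formattable<T> {
  getValue(): T;
  matchSelectKey(key: string): boolean;
  toString(): string;
}

class FormattableNumber implements Formattable<number> {
  private value: number;
  private locale: string;
  private options: Intl.NumberFormatOptions;

  constructor(value: number, locale: string, options: Intl.NumberFormatOptions = {}) {
    this.value = value;
    this.locale = locale;
    this.options = options;
  }

  getValue(): number {
    return this.value;
  }

  // Assumed matching rule: the string form of the number matches first,
  // then the plural category of the number in the wrapper's locale.
  matchSelectKey(key: string): boolean {
    if (key === String(this.value)) return true;
    return key === new Intl.PluralRules(this.locale).select(this.value);
  }

  toString(): string {
    return new Intl.NumberFormat(this.locale, this.options).format(this.value);
  }
}

const one = new FormattableNumber(1, 'en');
const many = new FormattableNumber(42, 'en');
const matchesOne = one.matchSelectKey('one');      // plural category match
const matchesExact = many.matchSelectKey('42');    // exact value match
const matchesOther = many.matchSelectKey('other'); // plural category match
```

Code that handles selection then only ever calls matchSelectKey() and never inspects the type of the wrapped value.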

We should not drag in "the Fluent way" or "the ICU way" without any discussion.

I do not think anything at all is getting dragged in without thorough and exhaustive discussion.

It is very likely that the people who designed Fluent DID all this (I would be worried if they didn't)
And there was a deliberate decision to do things a certain way.

But those documents:

Are not accessible to outsiders
What was good for a browser / JS / custom localization tooling / company standard might be different for a server / C++ / standard l10n industry / universal standard
Some decisions might be mistaken. People design stuff and then learn better when the rubber hits the road

So we can't just take some code that emulates what Fluent does and "enshrine" it as a Unicode standard.

I'm not really sure what you're referring to here. This proposal is rather separate from anything done in Fluent, in case that's what you're implying.

The development of Fluent has actually all been done in the open, and its history is accessible to anyone. If you're interested, this is the repository where you can find all of it: https://github.com/projectfluent/fluent. I would in fact encourage you to read through it, as the discussions there have covered quite a bit of the same ground as we have in the MF2 working group, but have often used a different approach.

Thankfully, though, some of the original developers of Fluent (namely @stasm and @zbraniecki) are with us in this working group, and have brought with them the benefit of their experiences while designing that language, and then seeing its adoption by real-world users. This allows us to build the MF2 spec with the benefit of their hindsight, which should be considered invaluable. Would you agree?

mihnita · 2021-09-28T00:54:38Z

mihnita
Sep 28, 2021
Maintainer

I have myself learned much by studying how others have solved similar problems to ones that I've faced

Absolutely.

But when judging APIs I usually prefer getting in the shoes of a random developer who uses that API directly, usually only reading the documentation when they get stuck.
If the names are clear, the methods parameters intuitive, good examples available, they are happy to use things.
They might read the doc if they are confused.
They go and ask on StackOverflow.
Maybe read the spec.
But I would never ask a user of my library to read the code.

So what I want here is to clarify how things are expected to behave.
Not look at the implementation to figure out what the intended (and not accidental) behavior is.

Perhaps it would help to consider things from a slightly higher level of abstraction? We have all sorts of values that might be coming in as runtime arguments, or as the results of formatting functions
...
How does that sound? In my implementation, I use the term "Formattable" to refer to such a wrapper.

I have the nagging feeling that we talk about the same thing, but somehow we don't understand each other.

So I keep rephrasing the same question in all kind of ways, to maybe click at some point...

Here is another way for me to ask it...

Does Formattable contain the value to format, or not?

When I parse the syntax (whatever that is) and I have a model, we end up with a PatternElement[]
Are some of them Formattable, at some point? Or not?

If the syntax was "...{dueDate, DATE, ::yMMMd}...", what would the PatternElement[] look like?
Is there a Formattable there?
Does it contain the value to be formatted, or not?

Can I do

parserMessage.format ( params : {dueDate: {2021, 12, 17} } )
parserMessage.format ( params : {dueDate: {2022, 1, 27} } )
parserMessage.format ( params : {dueDate: {2023, 7, 3} } )

and get different results?

That means that the value to be formatted is not part of the Formattable PatternElement, it "comes in" later, based on an argument name (dueDate)

Or not?

So the Formattable contains the value to be formatted? Or the name of the parameter that later will contain the value to be formatted?

Thankfully, though, some of the original developers of Fluent (namely @stasm and @zbraniecki) are with us in this working group,
...
This allows us to build the MF2 spec with the benefit of their hindsight, which should be considered invaluable. Would you agree?

OK, in my answer I don't want to say anything about Fluent or the designers of Fluent.
I will try to explain what is my position, in general.

So, having the designers available: very-very-very valuable.
Invaluable? In the "indispensable" or "can't be replaced" sense?
Then my answer might be controversial, but no, I don't think it is "invaluable"

The value is added when the original developers answer questions, not send us to existing docs or code.
Otherwise I can read said docs / code without them.

I value the experience of someone who used a system for a long time over the person who designed it.
The designers can tell you why certain choices were made a certain way.

The power users can tell you why some of the choices were bad and some of the choices are good.

If I design something I will think it is the best thing since sliced bread.
I am biased, and we are all humans.

But if the power user that has experience with my tool and 20 other tools tells me that I could have done better, I should listen.
Even if they say "yours is best, BUT...", I should listen to that "but" part.

And adoption is not a criterion.

ICU has very high adoption as a library.
But we all agree that MessageFormat is sub-par (that's why we are here)

Python 2 adoption is huge. But we still have Python 3.
And the fixes in the way strings / bytes / encoding were handled were long overdue.
Python 2 was a pile of *** in that area.

2 replies

zbraniecki Sep 28, 2021
Collaborator

I value the experience of someone who used a system for a long time over the person who designed it.

We're in a lucky position to have a very tight feedback loop with the engineers using Fluent today in their day job building localizable Web interfaces in HTML, JS, WebComponents and React.

We just completed the migration of the main UI fully to Fluent and we have around 5500 Fluent messages in use in the product used by tens of millions of users. We also have a good paper trail of the work of hundreds of front end engineers who work on new features and maintain the codebase that uses Fluent over the span of the last 3 years.
And that's just one product, we have multiple that we can test, some of them purely "web", some of them fully "react" based etc.

We have 3 localization project managers who maintain those projects and can share relevant experience.
We also have access to the developer community of the Web based CAT tool which is used to localize Firefox and other Mozilla products to Fluent.

Finally, we have over 100 volunteer localization teams from all around the World who work with the CAT tool on localization of all of those products.

All of that work is public, open source and accessible to all of us for any incidental, or statistical analysis.

Are there any questions that we could ask of those groups that would help us formulate an objective position on the questions we are facing about the viability of the solutions implemented in Fluent?

eemeli Sep 28, 2021
Maintainer Author

I have the nagging feeling that we talk about the same thing, but somehow we don't understand each other.

So I keep rephrasing the same question in all kind of ways, to maybe click at some point...

Here is another way for me to ask it...

Does Formattable contain the value to format, or not?

And I keep trying to rephrase my affirmative answer to this. Let me try one more time: Yes, a Formattable contains the value to format, along with any relevant formatting options.

For more on this, I invite you to review my previous answers in this thread or look at PR #198, which adds the relevant section to the spec.

When I parse the syntax (whatever that is) and I have a model, we end up with a PatternElement[] Are some of them Formattable, at some point? Or not?

If the syntax was "...{dueDate, DATE, ::yMMMd}...", what would the PatternElement[] look like? Is there a Formattable there? Does it contain the value to be formatted, or not?

This too seems like a question that gets repeated. This is the PatternElement that I would use to represent your example:

{
  type: 'function',
  func: 'DATE',
  args: [{ type: 'variable', var_path: [{ type: 'literal', value: 'dueDate' }] }],
  options: { skeleton: { type: 'literal', value: 'yMMMd' } }
}

That is not a Formattable, it's a PatternElement. Those are different abstractions.

The PatternElement contains e.g. a reference to the dueDate variable. During formatting, the appropriate Pattern Element Formatter for type: 'function' is used to format it. A method of this Pattern Element Formatter is then called with the current Formatting Context and this PatternElement as its arguments. It would then:

1. Call the type: 'variable' Pattern Element Formatter to resolve the value of args[0] into a Formattable. During that process, the type: 'variable' Pattern Element Formatter would call the type: 'literal' Pattern Element Formatter to resolve the var_path[0] value into a Formattable.
2. Call the type: 'literal' Pattern Element Formatter to resolve the value of options.skeleton first into a Formattable, and then using its getValue() method, into the actual value.
3. Identify the DATE formatting function in the registry.
4. Call the DATE formatting function with the resolved args and options, as well as the current locale(s), and receive a Formattable representation of the resulting value.
5. Depending on the desired form of the output, call either the toString() or the toParts() method of the returned Formattable to determine its output value.
6. Return either a string or a list of MessageFormatPart objects (which are yet again different from both PatternElement and Formattable).

As you may observe, in this flow Formattable is only used internally to wrap runtime values, and does not show up either in the formatter's input arguments or its output.
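
The steps above can be sketched in TypeScript roughly as follows. This is a simplified illustration under stated assumptions, not the implementation in the linked repository: the Context shape, the single asFormattable() function standing in for the separate pattern element formatters, and the DATE function that understands only the one skeleton from the example are all hypothetical.

```typescript
// Hypothetical stand-ins for the abstractions discussed in this thread.
interface Formattable { getValue(): unknown; toString(): string; }

type Literal = { type: 'literal'; value: string };
type Variable = { type: 'variable'; var_path: Literal[] };
type FunctionRef = {
  type: 'function';
  func: string;
  args: (Literal | Variable)[];
  options: Record<string, Literal>;
};
type PatternElement = Literal | Variable | FunctionRef;

type FormattingFunction = (
  locale: string,
  options: Record<string, unknown>,
  ...args: unknown[]
) => Formattable;

interface Context {
  locale: string;
  variables: Record<string, unknown>;
  registry: Record<string, FormattingFunction>;
}

const wrap = (value: unknown): Formattable => ({
  getValue: () => value,
  toString: () => String(value)
});

function asFormattable(ctx: Context, el: PatternElement): Formattable {
  switch (el.type) {
    case 'literal':
      return wrap(el.value);
    case 'variable': {
      // Each path segment is itself resolved via the literal case.
      let value: unknown = ctx.variables;
      for (const segment of el.var_path) {
        const key = String(asFormattable(ctx, segment).getValue());
        value = (value as Record<string, unknown>)[key];
      }
      return wrap(value);
    }
    case 'function': {
      // Steps 1-2: resolve arguments and options to plain values.
      const args = el.args.map(arg => asFormattable(ctx, arg).getValue());
      const options: Record<string, unknown> = {};
      for (const [name, option] of Object.entries(el.options)) {
        options[name] = asFormattable(ctx, option).getValue();
      }
      // Step 3: find the formatting function in the registry.
      const fn = ctx.registry[el.func];
      if (!fn) throw new Error(`Unknown formatting function ${el.func}`);
      // Step 4: the function returns a Formattable wrapping the value.
      return fn(ctx.locale, options, ...args);
    }
  }
}

// Steps 5-6, string output only.
function formatAsString(ctx: Context, el: PatternElement): string {
  return asFormattable(ctx, el).toString();
}

// Assumed DATE function; only the 'yMMMd' skeleton is mapped to options.
const DATE: FormattingFunction = (locale, options, value) => {
  const dtfOptions: Intl.DateTimeFormatOptions =
    options.skeleton === 'yMMMd'
      ? { year: 'numeric', month: 'short', day: 'numeric', timeZone: 'UTC' }
      : { timeZone: 'UTC' };
  const dtf = new Intl.DateTimeFormat(locale, dtfOptions);
  return { getValue: () => value, toString: () => dtf.format(value as Date) };
};

const dueDateElement: FunctionRef = {
  type: 'function',
  func: 'DATE',
  args: [{ type: 'variable', var_path: [{ type: 'literal', value: 'dueDate' }] }],
  options: { skeleton: { type: 'literal', value: 'yMMMd' } }
};
const contextFor = (dueDate: Date): Context => ({
  locale: 'en-US',
  variables: { dueDate },
  registry: { DATE }
});

const firstResult = formatAsString(contextFor(new Date(Date.UTC(2021, 11, 17))), dueDateElement);
const secondResult = formatAsString(contextFor(new Date(Date.UTC(2022, 0, 27))), dueDateElement);
```

The same dueDateElement data is used for both calls. Only the context differs, and the Formattable exists only for the duration of each call.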

Can I do

parserMessage.format ( params : {dueDate: {2021, 12, 17} } )
parserMessage.format ( params : {dueDate: {2022, 1, 27} } )
parserMessage.format ( params : {dueDate: {2023, 7, 3} } )

and get different results?

Yes you can.

That means that the value to be formatted is not part of the Formattable PatternElement, it "comes in" later, based on an argument name (dueDate)

Or not?

The value to be formatted is not part of the PatternElement, but it is at runtime included in the Formattable wrapper around the dueDate value. PatternElement and Formattable are different things.

So the Formattable contains the value to be formatted? Or the name of the parameter that later will contain the value to be formatted?

Yes, Formattable contains the value to be formatted.

mihnita · 2021-09-29T18:11:16Z

mihnita
Sep 29, 2021
Maintainer

I think I am starting to (slowly?) understand...

So there is a model after parse, that does not contain any Formattable.
The things that will be formatted at some point (like dueDate) are at this point just some kind of identifier.

But then at some later point there is a model where (some of?) the "functions" are replaced by Formattables, which now contain the real dueDate, which is (probably) a wrapper around Date (DateRuntimeValue?)

Is this accurate?

If this is the case, then these are the areas that I think are getting in the way of understanding:

There are in fact 2 models:

One with functions but not formattables, and no real values, just names of values. That is the one from parse?
The second model is derived from the first, with variable names like dueDate & functions replaced by RuntimeValue<?>, which are Formattables

We can call these 2 models, or the same model transformed, whatever, tbd

This is not explicit anywhere.

And I think that trying to describe 2 models (or 1 model with transformations in time) is what makes things confusing.
That is the part that I did not clearly understand until now.

So the runtime workflow is something like this?

model1 = parser( some syntax )
// model1 has functions + variable names
model2 = model1.secretFunction( map of variables ? )
// model2 has Formattables, which wrap real values (RuntimeValue)
model2.format(???) => string
model2.formatToParts(???) => ???

I called the secretFunction that because there is no explicit mentioning of it that I can find.
But ignoring the names, is this what's happening, more or less?

Because at least for me this was not at all clear (still not sure that's the case)

This is what I asked in the "whiteboard document", the "9. Various “states” for in which the model can be" section.

I think reading that document might help reduce all this back and forth, as I was trying there to understand the model proposed by Stas, which is what I think all of these PRs are trying to implement.

I think that the "naming" also gets in the way of understanding, but let's take it one at a time...

1 reply

eemeli Sep 29, 2021
Maintainer Author

I think I am starting to (slowly?) understand...

So there is a model after parse, that does not contain any Formattable. The things that will be formatted at some point (like dueDate) are at this point just some kind of identifier.

But then at some later point there is a model where (some of?) the "functions" are replaced by Formattables, which now contain the real dueDate, which is (probably) a wrapper around Date (DateRuntimeValue?)

Is this accurate?

Yes, with the specification that the "some later point" that you refer to is during the format() or formatToParts() call.

If this is the case, then these are the areas that I think are getting in the way of understanding:

There are in fact 2 models:
One with functions but not formattables, and no real values, just names of values. That is the one from parse?

Could you clarify what specifically you mean by "functions" here?

The second model is derived from the first, with variable names like dueDate & functions replaced by RuntimeValue<?>, which are Formattables

We can call these 2 models, or the same model transformed, whatever, tbd

I'm not sure that it's really beneficial to call the second representation a "model", but sure.

This is not explicit anywhere.

And I think that trying to describe 2 models (or 1 model with transformations in time) is what makes things confusing. That is the part that I did not clearly understand until now.

So the runtime workflow is something like this?

model1 = parser( some syntax )
// model1 has functions + variable names
model2 = model1.secretFunction( map of variables ? )
// model2 has Formattables, which wrap real values (RuntimeValue)
model2.format(???) => string
model2.formatToParts(???) => ???

I called the secretFunction that because there is no explicit mentioning of it that I can find. But ignoring the names, is this what's happening, more or less?

Because at least for me this was not at all clear (still not sure that's the case)

The reason you're not seeing any reference to this secretFunction method is that it doesn't exist. There is no model2. It's more like this:

const msgRes = parseMessageResource(str) // pure data, with messages represented as PatternElement sequences
const mf = new MessageFormat(locale) // an object that has formatting methods
mf.addResource(msgRes) // and now it has messages too
mf.format(msgId, variables) // returns a string

Inside that mf.format() call, the following happens:

1. The right message is found from the available resources.
2. Its select case is resolved (if necessary).
3. Pattern element formatters turn each of its pattern element values into string values.
4. Those string values are concatenated.

Of these, Formattable values only exist and are relevant in step 3. Please see my preceding comment in this thread for an explicit step-by-step walkthrough of what's happening inside such a pattern element formatter.
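
As a sketch of that flow, the following TypeScript stand-in walks through the same four steps. It is deliberately simplified and hypothetical: the class name SketchMessageFormat, the Message shape with a single select variable and keyed cases, and the flat variable names (instead of var_path) are assumptions for this example only.

```typescript
// Hypothetical, simplified data model: pure data, no functions.
type Part =
  | { type: 'literal'; value: string }
  | { type: 'variable'; name: string };

interface Message {
  select?: string; // name of the variable to select on (assumed shape)
  cases: { key: string; pattern: Part[] }[];
}

type Resource = Record<string, Message>;

class SketchMessageFormat {
  private locale: string;
  private resources: Resource[] = [];

  constructor(locale: string) {
    this.locale = locale;
  }

  addResource(resource: Resource): void {
    this.resources.push(resource);
  }

  format(msgId: string, variables: Record<string, unknown> = {}): string {
    // 1. Find the right message from the available resources.
    const resource = this.resources.find(res => msgId in res);
    if (!resource) throw new Error(`Message not found: ${msgId}`);
    // 2. Resolve its select case, if it has one.
    const pattern = this.selectPattern(resource[msgId], variables);
    // 3. Turn each pattern element into a string. Any runtime wrapper
    //    around a value would exist only inside this step.
    const strings = pattern.map(part =>
      part.type === 'literal' ? part.value : String(variables[part.name])
    );
    // 4. Concatenate the string values.
    return strings.join('');
  }

  private selectPattern(msg: Message, variables: Record<string, unknown>): Part[] {
    if (msg.select === undefined) return msg.cases[0].pattern;
    const value = variables[msg.select];
    const category =
      typeof value === 'number'
        ? new Intl.PluralRules(this.locale).select(value)
        : String(value);
    const match =
      msg.cases.find(c => c.key === String(value)) ??
      msg.cases.find(c => c.key === category) ??
      msg.cases.find(c => c.key === 'other');
    if (!match) throw new Error('No matching case');
    return match.pattern;
  }
}

const sketch = new SketchMessageFormat('en');
sketch.addResource({
  meeting: {
    select: 'count',
    cases: [
      { key: 'one', pattern: [
        { type: 'literal', value: 'You have a meeting in ' },
        { type: 'variable', name: 'count' },
        { type: 'literal', value: ' day' }
      ] },
      { key: 'other', pattern: [
        { type: 'literal', value: 'You have a meeting in ' },
        { type: 'variable', name: 'count' },
        { type: 'literal', value: ' days' }
      ] }
    ]
  }
});
const singular = sketch.format('meeting', { count: 1 });
const plural = sketch.format('meeting', { count: 3 });
```

The resource passed to addResource() stays unchanged between the two format() calls, and no second, transformed copy of it is ever built.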

This is what I asked in the "whiteboard document", the "9. Various “states” for in which the model can be" section.

I think reading that document might help reduce all this back and forth, as I was trying there to understand the model proposed by Stas, which is what I think all of these PRs are trying to implement.

While the models that Staś and I are proposing are similar, they are not the same. It is unfortunate that you find it so challenging to read others' code, as that might be the most direct way of understanding what they're doing, and how they differ.

mihnita · 2021-09-29T18:23:54Z

mihnita
Sep 29, 2021
Maintainer

@zbraniecki

I am sorry, it looks like my answer about "invaluable" did upset you.
That was not my intention, and that is why I tried to stay away from Fluent and talk about ICU, and Python.

So I will just "walk away" from that discussion, to not make it worse.

If you really want to hear my take on it, ask, and I will try to put it as neutrally as possible. Maybe not in this GitHub discussion.
Or we can chat about it in person at the IUC, so that my face / body language can (hopefully) help to show that it is not a personal attack.

And it is all my personal opinion. Personal opinions can be of course wrong. It's not science.
And my "view of the world" probably does not belong in a github discussion anyway.

1 reply

zbraniecki Sep 29, 2021
Collaborator

It did not upset me at all.

My intention was to emphasize that we operate in a scenario (a lucky one, compared to most standards WGs) where we have access to prior art authors and to multiple classes of users of the system. So if we have a question like "will CAT tools be able to X" or "will developers want to use Y" or "will localizers benefit from Z", we can ask those questions directly and get answers.

It was my response to your comment that you value the experience of someone who used a system for a long time over the person who designed it.

So it seems like we can get those opinions and reduce the he-said-she-said if we can formulate the questions that would unblock us.

mihnita · 2021-09-30T15:02:09Z

mihnita
Sep 30, 2021
Maintainer

Could you clarify what specifically you mean by "functions" here?

I was thinking that this is what you describe as "function" (things that are registered, like "DATE")

====

I'm not sure that it's really beneficial to call the second representation a "model", but sure.

It is kind of a different state of the model.
And in order to call .format(param1) .format(param2) and get different results it means it is a copy + transform of the parsed model.
If you have names for these 2 models (or states of the same model), we can use them.

But they are conceptually different.

My main question then is: if this "second state" exists only inside the format call, why expose it?
Why make it part of the official data model at all?
Why not an implementation detail?
What is the benefit in exposing it?

It is clearly possible to implement format / formatToParts without going through this state.
And it is not useful as some kind of "cache", because it already depends on the parameter.
So successive calls of format with different parameters result in different "state 2 models" (with Formattables).

1 reply

eemeli Oct 1, 2021
Maintainer Author

Could you clarify what specifically you mean by "functions" here?

I was thinking that this is what you describe as "function" (things that are registered, like "DATE")

Okay, with this understanding, the description that you provided of the "first model" does not match mine:

One with functions but not formattables, and no real values, just names of values. That is the one from parse?

Specifically, I do not think functions are included within any such model. In my view, the model may include e.g. a type: 'function' pattern element, which is purely data. As described in this comment, when that pattern element is being formatted, the registered function is identified in the registry and called with appropriate arguments.

In my understanding, the data model is at all times a model that only contains data. In fact, I don't think there's ultimately any reason for it to contain anything other than maps, lists, and string values.
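
To make the distinction concrete, here is a minimal TypeScript sketch. The `FunctionElement` shape, the `registry` map, the `UPPER` function and the `{MISSING}` fallback are all hypothetical names chosen for illustration, not anything taken from the spec. The point is only that the pattern element is plain data, and the registered function is looked up when that element is formatted.

```typescript
// Hypothetical data-model shapes: only maps, lists, and strings.
type Literal = { type: 'literal'; value: string };
type FunctionElement = {
  type: 'function';
  func: string; // name of a registered function, not the function itself
  args: string[]; // names of runtime variables
  options: Record<string, string>;
};
type PatternElement = Literal | FunctionElement;

// Hypothetical runtime registry, kept outside the data model.
type RegisteredFunction = (args: unknown[], options: Record<string, string>) => string;
const registry: Record<string, RegisteredFunction> = {
  UPPER: (args) => String(args[0]).toUpperCase(),
};

function formatElement(el: PatternElement, scope: Record<string, unknown>): string {
  if (el.type === 'literal') return el.value;
  const fn = registry[el.func];
  if (!fn) return `{${el.func}}`; // fallback representation; an assumption
  return fn(el.args.map((name) => scope[name]), el.options);
}

const pattern: PatternElement[] = [
  { type: 'literal', value: 'Hello, ' },
  { type: 'function', func: 'UPPER', args: ['name'], options: {} },
];

// The model survives a JSON round trip because it holds no functions.
const roundTripped: PatternElement[] = JSON.parse(JSON.stringify(pattern));
const result = roundTripped.map((el) => formatElement(el, { name: 'world' })).join('');
```

Because the model holds only a function name, the same serialized message could be paired with different registries at runtime.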

My main question then is: if this "second state" exists only inside the format call, why expose it? Why make it part of the official data model at all? Why not an implementation detail? What is the benefit in exposing it?

Could you clarify how you see it being exposed? Later on in this comment I provide a step-by-step walkthrough of the internals of a format() call, and this "second state" is never reached, as each pattern element is formatted directly to a string.

In brief, I'm trying to figure out why you think that this "second state" exists, and if perhaps there's something unclear about my previous comments where I've tried to explain how the formatting would happen? Please do ask if I've left something unclear, I've tried to be quite thorough.

When introducing Formattable to this discussion in this comment, I tried to clarify that it's primarily a wrapper around runtime values. Was that not clear? When formatting a message, there usually isn't any reason to represent it or its parts directly as Formattables, even if Formattables are used internally in the pattern element formatters.

mihnita · 2021-10-05T17:42:21Z

mihnita
Oct 5, 2021
Maintainer

Specifically, I do not think functions are included within any such model. In my view, the model may include e.g. a type: 'function' pattern element, which is purely data.

I am perfectly happy with that, and I would like to see if we can agree on the implications.

For example this means:

Let's not talk about Formattables, it's an implementation detail
The interface that defines the select/format/formatToParts is irrelevant

What is important is the signatures of the 3 methods, and describing what they do.
It is not relevant if the "runtime registry" is implemented as a map from string (function name) to "Function" (class+3 methods),
or there are 2 classes, a "format" one and a "select" one, or if there are 3, one for each method type.

We specify the methods exposed by an implementation of the "runtime registry" and that's good enough
(that is working in terms of interfaces).
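
As a rough illustration of working in terms of interfaces, the sketch below shows just one of the possible shapes mentioned above: a map from function name to an object with the three methods. All names (`RegistryFunction`, `RuntimeRegistry`, `PLURAL`) and the exact method signatures are assumptions made for the example, and the plural rule is a toy English-only one.

```typescript
// Hypothetical part shape for format-to-parts output.
interface FormattedPart {
  type: string;
  value: string;
}

// One possible shape: a single object exposing all three methods.
interface RegistryFunction {
  select(value: unknown, keys: string[]): string; // pick the matching case key
  format(value: unknown): string;
  formatToParts(value: unknown): FormattedPart[];
}

type RuntimeRegistry = Map<string, RegistryFunction>;

const registry: RuntimeRegistry = new Map();
registry.set('PLURAL', {
  select(value, keys) {
    // Toy English-only rule; a real implementation would use plural rules data.
    const category = Number(value) === 1 ? 'one' : 'other';
    return keys.includes(category) ? category : 'other';
  },
  format: (value) => String(value),
  formatToParts: (value) => [{ type: 'integer', value: String(value) }],
});

const plural = registry.get('PLURAL')!;
const selected = plural.select(1, ['one', 'other']);
const text = plural.format(3);
const parts = plural.formatToParts(3);
```

Splitting this into two or three classes would change how the registry is stored, but not the three signatures that callers rely on.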

In my understanding, the data model is at all times a model that only contains data. In fact, I don't think there's ultimately any reason for it to contain anything other than maps, lists, and string values.

100% to that.
TBD if we want to stop at maps, lists, string values, or we want to add a few more (numbers, bool), basically JSON types.

But in principle I agree that a reduced set of "core types" is the way to go.

there usually isn't any reason to represent it or its parts directly as Formattables, even if Formattables are used internally in the pattern element formatters

Maybe I don't understand some of the explanations in this discussion.

So if there is no reason to represent something as Formattables, and it is only used internally, why expose it?

Could you clarify how you see it being exposed?

There is a PR adding it to the spec: #198
I call that "exposing it"

This is the reason I use "fuzzy lingo" when I talk about implementation details, and I put things in quotes,
or give slash-separated alternate names (function/formatter/lambda/callable), and use pseudo-code.
Not because I can't decide on a specific term.

So that nobody confuses possible implementation with spec.

1 reply

eemeli Oct 11, 2021
Maintainer Author

Specifically, I do not think functions are included within any such model. In my view, the model may include e.g. a type: 'function' pattern element, which is purely data.

I am perfectly happy with that, and I would like to see if we can agree on the implications.

For example this means:
* Let's not talk about `Formattables`, it's an implementation detail
* The interface that defines the select/format/formatToParts is irrelevant

I have difficulty seeing how this follows, i.e. how agreeing on the data model only containing data leads to conclusions about discussion on other parts of the specification. I agree that Formattables and the interfaces of format/formatToParts calls don't really relate to the data model, but whether or not they're otherwise relevant to the whole of the MF2 spec is an entirely separate question.

What is important is the signatures of the 3 methods, and describing what they do. It is not relevant if the "runtime registry" is implemented as a map from string (function name) to "Function" (class+3 methods), or there are 2 classes, a "format" one and a "select" one, or if there are 3, one for each method type.

We specify the methods exposed by an implementation of the "runtime registry" and that's good enough (that is working in terms of interfaces).

The first question this raises for me: Do you see a reason to expose an external selectMessageBody method/function in the public API? I would've presumed that this would be strictly internal.

Second, do you see how Formattables may be used as an abstraction for defining e.g. the "runtime registry" function interfaces? As in, we need a way to encapsulate the behaviour of a formatting function in at least the three places that you've mentioned:

When the function's value is used to select between message cases.
When the function's value is a part of a message that is formatted to a string.
When the function's value is a part of a message that is formatted to a sequence of parts.

Formattable is an abstraction for a value that allows for a single function to implement each of the above requirements. It is not a mandated implementation. If you do not think that it is a useful abstraction, could you e.g. write up a description of how you would specify the runtime registry methods? If you'd find it easier to do so in Java or TypeScript rather than text, I'd be happy to look at those as well.
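
A minimal TypeScript sketch of that idea follows. The method names (`getValue`, `matchSelectKey`, `formatToString`, `formatToParts`), the `fractionDigits` option and the helper `formattableNumber` are hypothetical and are not copied from the proposal. The sketch only shows how one registered function, written once, could serve all three of the uses listed above.

```typescript
interface FormattedPart {
  type: string;
  value: string;
}

// Hypothetical Formattable: a wrapper around a runtime value.
interface Formattable {
  getValue(): unknown;
  matchSelectKey(key: string): boolean; // used for case selection
  formatToString(): string; // used for format-to-string
  formatToParts(): FormattedPart[]; // used for format-to-parts
}

function formattableNumber(value: number, fractionDigits = 0): Formattable {
  const text = value.toFixed(fractionDigits);
  return {
    getValue: () => value,
    // Exact numeric match, or a toy English-only plural category.
    matchSelectKey: (key) => key === String(value) || key === (value === 1 ? 'one' : 'other'),
    formatToString: () => text,
    formatToParts: () => [{ type: 'number', value: text }],
  };
}

// The registered function does not know how its result will be consumed.
function numberFunction(arg: Formattable, options: { fractionDigits?: number } = {}): Formattable {
  return formattableNumber(Number(arg.getValue()), options.fractionDigits ?? 0);
}

const count = numberFunction(formattableNumber(1));
const caseKey = ['one', 'other'].find((key) => count.matchSelectKey(key));
const asString = count.formatToString();
const asParts = count.formatToParts();
```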

In my preferred implementation, the MessageFormat constructor accepts a runtime option that may override the default datetime and number formatting functions. This runtime function registry is then included in the formatting context, and from there used by the function pattern element formatter when needed. When called, the arguments of a registered function are all wrapped up as Formattables, as is its return value. This means that when called, the registered formatting function doesn't need to know if its output is being e.g. formatted to parts vs. formatted to string.
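
Sketched very roughly in TypeScript, that flow could look like the following. This is a simplified stand-in rather than the actual implementation: the `Runtime` type, the `callFunction` method, the reduced `Formattable` class and the default functions are assumptions made for the example.

```typescript
// Hypothetical minimal Formattable: wraps a runtime value.
class Formattable {
  readonly value: unknown;
  constructor(value: unknown) {
    this.value = value;
  }
  toString(): string {
    return String(this.value);
  }
}

type RuntimeFunction = (args: Formattable[], options: Record<string, string>) => Formattable;
type Runtime = Record<string, RuntimeFunction>;

// Stand-ins for the default number and datetime formatting functions.
const defaultRuntime: Runtime = {
  number: (args) => new Formattable(Number(args[0].value).toFixed(0)),
  datetime: (args) => new Formattable(new Date(args[0].value as number).toISOString()),
};

class MessageFormat {
  readonly locale: string;
  readonly runtime: Runtime;

  constructor(locale: string, options: { runtime?: Runtime } = {}) {
    this.locale = locale;
    // Custom entries override the defaults that share their name.
    this.runtime = { ...defaultRuntime, ...options.runtime };
  }

  // Wraps raw arguments as Formattables before calling the registered function.
  callFunction(name: string, rawArgs: unknown[], fnOptions: Record<string, string> = {}): Formattable {
    const fn = this.runtime[name];
    if (!fn) throw new Error(`Unknown function: ${name}`);
    return fn(rawArgs.map((arg) => new Formattable(arg)), fnOptions);
  }
}

const custom = new MessageFormat('en', {
  runtime: { number: (args) => new Formattable(`#${args[0].value}`) },
});
const overridden = custom.callFunction('number', [42]).toString();
const fallback = custom.callFunction('datetime', [0]).toString();
```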

So if you do not think that Formattables should be included in the spec, then please do indeed describe what these registry methods ought to look like.

Could you clarify how you see it being exposed?

There is a PR adding it to the spec: #198 I call that "exposing it"

Did you perhaps not notice the description of the PR? It starts with this paragraph:

As the proposal's text describes, a Formattable is a wrapper around values and their formatting options. It may be thought of as an abstraction that allows us to define the rest of the specification in a manner that's independent of the actual type of any runtime value.

A Formattable is an abstraction. It is a tool that allows us to talk about values being passed to and from functions, such that the function does not need to care about how it is being called. This does not in fact mean that it becomes a required part of each implementation.

This is the reason I use "fuzzy lingo" when I talk about implementation details, and I put things in quotes, or give slash-separated alternate names (function/formatter/lambda/callable), and use pseudo-code. Not because I can't decide on a specific term.

So that nobody confuses possible implementation with spec.

On the other hand, using "fuzzy lingo" means that we can't really be sure we're talking about the same thing, or that we're not mixing up different things. This is why I'm doing my best to refer to each thing every time with the same name as before.

I do not buy the argument that this clarity should be avoided because it might make someone mix up what's in the spec and what isn't. That is very clearly answered by looking at the spec, and seeing if it's there or not.

grhoten · 2021-10-13T05:16:53Z

grhoten
Oct 13, 2021
Maintainer

This is a very lengthy discussion. So I may be missing part of the discussion.

I like the fact that skeleton has been mentioned for dates. An explicit date is probably not a good idea. Though for our use case of grammatically correct sentences we have a modified relative date, which is a little harder to do, but it goes to show a generic problem with supporting prepositions. Here are 2 example scenarios that I'd like to see supported with this MessageFormat. These are just 2 scenarios that we frequently struggle with.

Dates with a preposition
You can have the following examples for a date or time when trying to represent an appointment made at a specific time for the same message. Here are some examples.

You have an appointment today at 1 PM.
You have an appointment tomorrow at 1 PM.
You have an appointment on November 11 at 1 PM.
You have an appointment on November 11, 2022 at 1 PM.

Notice that the "on" preposition is only sometimes there. In some languages, the preposition "at" varies depending on whether the time is 1 or not. What's more interesting is that some languages (e.g. Brazilian Portuguese) can format the time as 13:00 and then verbally say "1 in the afternoon" for the same time, which is confusing if the preposition is written one way and the entire phrase is spoken differently. I'm unsure if CLDR can fully support this scenario.

Locations with a preposition
Another common issue is handling the right preposition for a city, nation, island or other region-specific things. Here are some examples:

Finding things in California
Finding things on Hawaii
Finding things in San Jose

In a language like French, the rules are a little more complicated for choosing the right preposition.

Can we handle such situations with the current proposals? I'm not saying that it needs to be solved, but the infrastructure should be able to handle them.

1 reply

eemeli Oct 25, 2021
Maintainer Author

The short answer is "yes, such situations will be supported". I would envision that both examples could/would be handled by (custom) formatters, so in syntax you'd end up with something like You have an appointment { locative($temporal) } and Finding things { locative($place) }, with the locative() formatter handling the complicated magic.
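
As a toy illustration only, a custom `locative()` formatter for the place example might be sketched like this in TypeScript. The lookup table, the default preposition and the `formatFindingThings` helper are hypothetical and English-only. A real formatter would need per-locale grammatical data that this sketch does not attempt to model.

```typescript
// Hypothetical lookup data; real data would come from a per-locale lexicon.
const placePrepositions: Record<string, string> = {
  California: 'in',
  Hawaii: 'on',
  'San Jose': 'in',
};

// Toy English-only locative formatter: picks a preposition for a place name.
function locative(place: string): string {
  const preposition = placePrepositions[place] ?? 'in'; // default is an assumption
  return `${preposition} ${place}`;
}

// Stand-in for formatting the message "Finding things { locative($place) }".
function formatFindingThings(place: string): string {
  return `Finding things ${locative(place)}`;
}

const hawaii = formatFindingThings('Hawaii');
const california = formatFindingThings('California');
```

The message itself stays the same for every place, and all of the language-specific logic lives inside the formatter.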

Perhaps a more fundamental question here would be whether the "formatted" output ought to be static and fully resolved, or whether it could contain functions/methods that expect additional parameters to be given. So e.g. a "formatted" date could still expect to be passed an argument determining whether it's to be displayed on screen or read out loud. And this is where we connect with the questions about how deeply we define the MF2 runtime; do we assert a specific answer to this question, or do we only define the general shape of what a formatter might do with a message?
