Add note to "Function Resolution" section about function argument and result types #686

Merged (17 commits) on Feb 28, 2024

catamorphism
Collaborator

Currently, the introduction to the spec states:

The form of the resolved value is implementation defined and the
value might not be evaluated or formatted yet.
However, it needs to be "formattable", i.e. it contains everything required
by the eventual formatting.

And the "Expression and Markup Resolution" section says:

Since a variable can be referenced in different ways later,
implementations SHOULD NOT immediately fully format the value for output.

However, the "Function Resolution" section is not as clear as it could be about the implications of these requirements for the interface with formatting functions.

I added some text that effectively implies that functions have the same operand type and result type. If this wasn't the case, it wouldn't make sense to bind the result of a function to a variable and use that result as an operand for another function call.

I think it's useful guidance for implementors to state this explicitly rather than letting it be inferred from the two existing passages that I quoted.

This relates to #515; some version of #645 would make this much more precise, but it's a start.

The reason this came up was that I was discussing the API for custom functions with members of the ICU TC, who were puzzled at first about why formatting functions take and return the same type (in my implementation).

If an implementor instead requires custom functions to take a "formattable" thing as an argument, and return a "formatted" thing, examples like the first use case in #515 (with the two calls to :number) wouldn't work. In my opinion, the current spec doesn't say that you can't do this -- it could be read as saying that a "resolved operand value" as mentioned in step 4, and the value referred to by the text "resolve the value of the expression as the result of that function call", might refer to different kinds of "resolved values".

(Taking and returning the same type makes formatting functions seem more like "transformers" than "formatters" -- and that's in line with the text in syntax.md saying, "Functions are used to evaluate, format, select, or otherwise process data values during formatting." -- but changing the name might be more controversial.)

@catamorphism added the Agenda+ (Requested for upcoming teleconference) label on Feb 23, 2024
Comment on lines 227 to 228
Thus, formatting functions SHOULD use a structure for the resolved _operand_ value
that is interconvertible with the structure for the result of the _function_.
Collaborator

A few observations:

  1. It is misleading to refer to "formatting" functions here, given that their output may also be used for selection. Note how the text around this avoids that term.

  2. I at least have never encountered the term "interconvertible", and using it here should be avoided.

  3. There are cases where it makes sense for the operand type to be wider than the output type. For example, consider the resolution of `{$n :number}`, where the implicit input variable `$n` has a string value `'42'`. In a programming language like JS, it's easier for the custom function implementation to accept that its input may be a number, bigint, string, or object, rather than requiring each of those to come pre-wrapped as suggested by the SHOULD. See steps 6 and 7 [here](https://tc39.es/proposal-intl-messageformat/#sec-messageformat-numberfunctions) for the JS implementation details of this particular case.

    I do fully agree that the output of a function is expected to be in a shape that's acceptable as input, but I am not convinced that the input must always have that same shape.
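
The asymmetry in point 3 can be sketched in TypeScript. This is only an illustration, not the API of the linked proposal: `NumberValue`, `NumberInput`, and `numberFunction` are hypothetical names, and the coercion rules are assumptions.

```typescript
// Hypothetical wrapped result; not part of any spec or library.
interface NumberValue {
  kind: "number";
  value: number | bigint;
  options: Intl.NumberFormatOptions;
}

// The input type is wider than the output type: raw values or an earlier result.
type NumberInput = number | bigint | string | NumberValue;

function isNumberValue(x: NumberInput): x is NumberValue {
  return typeof x === "object" && x !== null && x.kind === "number";
}

function numberFunction(
  operand: NumberInput,
  options: Intl.NumberFormatOptions = {}
): NumberValue {
  if (isNumberValue(operand)) {
    // Result of an earlier call: keep its value, layer the new options on top.
    return {
      kind: "number",
      value: operand.value,
      options: { ...operand.options, ...options },
    };
  }
  if (typeof operand === "string") {
    const parsed = Number(operand);
    if (isNaN(parsed)) {
      throw new RangeError(`Not a number: ${operand}`);
    }
    return { kind: "number", value: parsed, options };
  }
  // number or bigint: wrap as-is.
  return { kind: "number", value: operand, options };
}

// A string goes in, a wrapped number comes out, and the output is valid input.
const first = numberFunction("42");
const second = numberFunction(first, { minimumFractionDigits: 1 });
```

The output shape is always accepted as input, while the input also admits unwrapped values, which is the weaker requirement argued for above.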

Member

@aphillips aphillips Feb 23, 2024

This is what you're trying to get at with your PRs for the post-45 period, @catamorphism. I recognize the problem here.

There is some question about whether the original operand value is available (transitive) or whether it becomes masked. That is, using @eemeli's example, `{$n :number}` where `$n=="42"`, it seems reasonable that the value passed might be some number type. In a strongly typed language, this might vary depending on the input or it might be a specifically expansive type like BigDecimal. That would be up to the implementer. Probably the original string is not available (well... you can get it through the original variable).

Perhaps:

Suggested change
Thus, formatting functions SHOULD use a structure for the resolved _operand_ value
that is interconvertible with the structure for the result of the _function_.
Thus, implementations SHOULD provide a means for _functions_ to expose
the resolved value of their _operand_
and _functions_ SHOULD populate that mechanism
with a data structure or type consistent with the set of implementation-defined
types that they would support as input.
> For example,
> Suppose the value of the _variable_ `$n` were the string `1`.
> The resolved value of the _operand_ assigned to `$num` in the example
> below would be a numeric type (such as an `int` or `BigInteger` in Java).
>```
> .input {$n :number}
> .local $num = {$n :integer}
>```

Collaborator Author

> A few observations:
>
> 1. It is misleading to refer to "formatting" functions here, given that their output may also be used for selection. Note how the text around this avoids that term.

I wrote it that way since selector functions don't have an output (or at least, not in the way that the "output" is being described here). Though I'll look and see if Addison's suggested changes address that.

> 2. I at least have never encountered the term "interconvertible", and using it here should be avoided.

OK.

> 3. There are cases where it makes sense for the operand type to be wider than the output type. For example, consider the resolution of `{$n :number}`, where the implicit input variable `$n` has a string value `'42'`. In a programming language like JS, it's easier for the custom function implementation to accept that its input may be a number, bigint, string, or object, rather than requiring each of those to come pre-wrapped as suggested by the SHOULD. See steps 6 and 7 [here](https://tc39.es/proposal-intl-messageformat/#sec-messageformat-numberfunctions) for the JS implementation details of this particular case.

That seems to not be ruled out by my original text, since even if the number formatter will never return a string "42", there might be other formatting functions that do just return their inputs, in some cases.

The goal here is to make a statement about all custom functions, and I think the only logical thing we can say about all of them is that (ignoring options and context), they have the type signature `T -> T`, for some `T`. In JS, `T` would mean something like `number | BigInt | string | object`. That wouldn't need to be explicitly written down in JS, but in C++ (for example), you do need a type to describe the interface that functions need to implement.

I'm not sure if this is clear, but one reason why it's hard to be precise about this is that we're slipping between the object-language sense of operand types and output types (as expressed in the specification of the function registry) and the meta-language sense (as expressed in an implementation's description of the calling conventions for functions).

> I do fully agree that the output of a function is expected to be in a shape that's acceptable as input, but I am not convinced that the input must always have that same shape.
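
One way to picture the `T -> T` point in a typed language is the sketch below. The names `Resolved`, `MessageFunction`, `numberFn`, and `pipeline` are hypothetical, and options are simplified to a plain string record; nothing here is prescribed by the spec.

```typescript
// T from the comment above: every function takes and returns the same shape.
type Raw = number | bigint | string | object;

interface Resolved {
  source: Raw;                      // the value as originally supplied
  options: Record<string, string>;  // options seen so far (an assumption)
}

type MessageFunction = (
  operand: Resolved,
  options: Record<string, string>
) => Resolved;

// Toy `:number`: passes the source through and records its options.
const numberFn: MessageFunction = (operand, options) => ({
  source: operand.source,
  options: { ...operand.options, ...options },
});

// Because input and output types match, any sequence of calls type-checks.
// With a "formattable in, formatted out" signature this reduce would not compile.
function pipeline(
  start: Resolved,
  calls: Array<[MessageFunction, Record<string, string>]>
): Resolved {
  return calls.reduce((current, [fn, opts]) => fn(current, opts), start);
}

const chained = pipeline({ source: "1", options: {} }, [
  [numberFn, { minimumIntegerDigits: "3" }],
  [numberFn, { maximumFractionDigits: "3" }],
]);
```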

Collaborator Author

> Perhaps: [...]

I took some inspiration from this suggestion, but didn't use it exactly.

Comment on lines 225 to 226
Since the result of a function call can be bound to a _variable_,
the output of one _function_ may be the input of another _function_.
Member

We should use our own internal jargon here so that there is no confusion about what we're talking about. We should also avoid 2119 keywords, even if non-normatively formatted.

Suggested change
-Since the result of a function call can be bound to a _variable_,
-the output of one _function_ may be the input of another _function_.
+A _local-declaration_ binds the output of an _expression_ to a _variable_,
+thus the output of one _function_ is potentially the _operand_ of another.

Collaborator Author

I made a slightly different change instead (in order to avoid implying that an expression has output, which isn't really a concept in the spec); let me know what you think.

@aphillips added the normative (Issue affects normative text in the specification) and formatting labels on Feb 23, 2024
@aphillips
Member

In the 2024-02-26 call we agreed that a revision of this PR would be "last in" for 45-alpha. If you care about this issue, following @catamorphism's update (which should appear after this comment) you must comment on the proposed text before COB 2024-02-27 in the America/Los_Angeles time zone. Please note that we will not be fixing this issue with the adopted note.

@aphillips added the Action-Item (Action item assigned by the WG) and fast-track (Non-spec editorial changes, etc.) labels and removed the Agenda+ (Requested for upcoming teleconference) label on Feb 26, 2024
Avoid using the word "interconvertible"

Include example of composability

Include example for how the function interface would be defined
in a typed implementation language

Add note about multiple interpretations of composition
that requests feedback
@catamorphism
Collaborator Author

I made some significant changes -- let me know what you think, @aphillips @eemeli .

This got pretty bulky, but I think it's necessary to avoid being so general as to be non-useful to implementors.

@aphillips aphillips requested a review from eemeli February 27, 2024 04:37
Member

@aphillips aphillips left a comment

I think our comments should be more cautious, even though in general I think we are on the right track here. We also need to describe resolved value handling carefully.

Comment on lines 227 to 235
Thus, the output of one _function_ is potentially the _operand_
of another _function_. In other words, formatting functions
compose with each other.
For example, in
```
.input {$n :number minIntegerDigits=3}
.local {$n1 :number maxFractionDigits=3}
```
the second call to `:number` composes with the first call.
Member

@aphillips aphillips Feb 27, 2024

I think adding the idea of "compose" is going too far. We should be very conservative here. Also, this still has the "output" of a function in play. Perhaps:

Suggested change
Thus, the output of one _function_ is potentially the _operand_
of another _function_. In other words, formatting functions
compose with each other.
For example, in
```
.input {$n :number minIntegerDigits=3}
.local {$n1 :number maxFractionDigits=3}
```
the second call to `:number` composes with the first call.
Thus, the _operand_ for one _function_ might be the resolved value
of another _function_.
Further, the _options_ for one _expression_ might affect the operation
of another.
> For example, if the value of the variable `n` were `1`:
> ```
> .input {$n :number minimumFractionDigits=1}
> .local $num = {$n :number minimumIntegerDigits=3}
> .match {$num}
> * {{Prints 001.0 for {$num}}}
> ```
> ... because the _options_ for the `.input` and `.local` are both applied to the value
> for the purposes of both formatting and selection.
> (Note that in English, fractional values match the plural rule `other`)

Also note that the .local in the original is incorrect.

Collaborator

Fixed the indents in the above suggestion. Its example should be updated, because we should not be showing any matching (even if only with a single * variant) on a non-integral number. Then we could also drop the irrelevant parenthetical about `other`.

Collaborator

So the example could read:

.input {$n :number minimumFractionDigits=1}
.local $num = {$n :number minimumIntegerDigits=3}
{{Prints 001.0 for {$num}}}
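
To see why `001.0` is the expected output, here is a rough TypeScript sketch. It assumes that options from the `.input` annotation are carried into the `.local` one, which the spec does not mandate. `Annotated`, `applyNumber`, and `formatToString` are hypothetical helpers; `Intl.NumberFormat` is the standard ECMAScript API.

```typescript
interface Annotated {
  value: number;
  options: Intl.NumberFormatOptions;
}

// Hypothetical stand-in for a `:number` annotation:
// keeps options from an earlier annotation and adds the new ones.
function applyNumber(
  operand: number | Annotated,
  options: Intl.NumberFormatOptions
): Annotated {
  if (typeof operand === "number") {
    return { value: operand, options };
  }
  return { value: operand.value, options: { ...operand.options, ...options } };
}

function formatToString(v: Annotated, locale: string): string {
  return new Intl.NumberFormat(locale, v.options).format(v.value);
}

// .input {$n :number minimumFractionDigits=1}
const n = applyNumber(1, { minimumFractionDigits: 1 });
// .local $num = {$n :number minimumIntegerDigits=3}
const num = applyNumber(n, { minimumIntegerDigits: 3 });

const rendered = `Prints 001.0 for ${formatToString(num, "en")}`;
console.log(rendered); // Prints 001.0 for 001.0
```

If the second annotation discarded the earlier options instead, the same value would format as `001` in this sketch.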

Collaborator

I don't mind using the word "compose" here, but I'm also OK with dropping it and replacing with a description of the observed behavior, like @aphillips proposed.

I would, however, edit @aphillips's suggestion slightly:

-Thus, the _operand_ for one _function_ might be the resolved value
-of another _function_.
+Thus, the resolved value of one _function_ might be the _operand_
+or an _option_ value for another _function_.
 Further, the _options_ for one _expression_ might affect the operation of another.

This does two things:

  • The subject of the sentence is the resolved value, similar to the second sentence.
  • It's not only operands that can be resolved values of other expressions; option values can as well, as per our syntax and data model.

Member

> because we should not be showing any matching (even if only with a single * variant) on a non-integral number.

`:number` really does need to do plural matching on fractions and this isn't a problem. Your concern is, I think, about exact matching, which I do not show. I actually think the `.match` is important to call out precisely because the fraction bits happen to `$n`.

Collaborator

Ah, fair point, I was confused. Including a `one {{This is never selected}}` variant would clarify things a bit.

Member

LOL. Actually, I had it originally, but removed it before submitting--I thought for clarity!

Collaborator Author

"the resolved value of another function" doesn't really make sense here. The resolved value of a function is a thing that represents the function; the resolved value of a function applied to arguments (using more conventional declarative-language terminology) is something left unspecified, but is not a representation of a function itself. The spec doesn't really offer us the language for disambiguating the two.

Collaborator Author

Also, "the options for one expression might affect the operation of another" makes it sound like the language has some weird non-local side effects, which it does not. The options passed to a function affect its output (return value, etc.) and nothing else.

Collaborator Author

I'm not sure if replacing the original example with the one with .match clarifies the point, although more examples are generally a good thing. I wanted to show an example with two formatting functions, because that's where it might not be obvious that functions compose.

Comment on lines 237 to 275
In addition, selector functions compose with formatting functions
in the sense that a selector function's _operand_
may be the output of any formatting function.

Implementations SHOULD provide a means for formatting functions
to compose with each other
and for formatting functions to compose with selector functions.
Implementations that provide a means for defining custom functions
SHOULD provide a means for those functions to return values
that contain enough information
(e.g. the resolved _operand_ and _option_ values
that the function was called with)
to be used as inputs to subsequent function calls.
For example, an implementation in a typed programming language
MAY define an interface that custom functions implement.
Such an interface SHOULD define an implementation-specific
argument type `T` and return type `U` for custom formatting functions
such that `U` can be coerced to `T` without loss of information.
The type `U`
(or a type that `U` can be coerced to without loss of information)
SHOULD also be the input type of custom selector functions.

> [!NOTE]
> In the Tech Preview, the spec leaves the behavior of the previous
> example implementation-dependent. Supposing that
> the external input variable `n` is bound to the string `"1"`,
> and that the implementation formats to a string,
> the formatted result of the following message:
>
> ```
> .input {$n :number minIntegerDigits=3}
> .local {$n1 :number maxFractionDigits=3}
> {{$n1}}
> ```
>
> is implementation-dependent.
> Depending on whether the options are preserved across
> the two calls to `:number`, a conformant implementation
> could produce either "001.000" or "1.000"
Member

I would leave all of this out.

Member

Agreed. This is too specific.

We can really say very little (at this point) about what "resolved value" means. As far as I can tell, it means "value of a variable derived from an expression". The form of that variable is as determined by the implementation of the function.

We cannot require any particular internal structure for the RV, just how it behaves.

  1. If the RV is derived from an expression with a selection function X, it can match literal values (eg :number can match literals 0, 1, one, ...) producing a comparable value (aka relative weight).
  2. If the RV is derived from an expression with a formatting function X, it can produce a formatted string or "parts".
  3. Another function Y can use the RV as an operand, or as an option value. In these cases, it becomes clear that we want Y to be able to access information in RV. Exactly what that information is will depend on the expression that RV was derived from.

For #3, we have not delved into what the specification for outbound communication from an RV to an expression (as operand or option value) is. That is something to examine in the Tech Preview period. For example, for Stas's case, the RV might be able to supply the gender of an RV representing a noun. I think part of the function registry needs to specify what information a variable derived from an expression using a function can supply, and (at least logically) how to access that. In some implementations, I could see having an API so that `$bar = {$foo :funct1 gender=$fii case=$fii}` (logically) results in an internal call to `$fii.get("gender")` and a call to `$fii.get("case")`.

For each function, the function registry needs to specify how RVs deriving from that function behave in #1, #2, and #3.
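
The three behaviors and the `get()` idea could look roughly like this in TypeScript. Everything here is a hypothetical sketch: `ResolvedValue`, `makeValue`, and the toy `funct1` are illustrative names, not a proposed API.

```typescript
// Hypothetical interface covering the three behaviors listed above.
interface ResolvedValue {
  // 1. Selection: return the matching key, or undefined if none matches.
  selectKey(keys: string[]): string | undefined;
  // 2. Formatting: produce a string.
  formatToString(): string;
  // 3. Information exposed to other functions that receive this value.
  get(property: string): string | undefined;
}

// Toy resolved value that carries some text plus named properties.
function makeValue(
  content: string,
  properties: Record<string, string>
): ResolvedValue {
  return {
    selectKey: (keys) => (keys.indexOf(content) >= 0 ? content : undefined),
    formatToString: () => content,
    get: (property) => properties[property],
  };
}

// Toy `:funct1`: reads what it needs from its option values via get().
function funct1(
  operand: ResolvedValue,
  options: { gender: ResolvedValue; case: ResolvedValue }
): ResolvedValue {
  const gender = options.gender.get("gender") || "unknown";
  const grammaticalCase = options.case.get("case") || "unknown";
  return makeValue(operand.formatToString(), { gender, case: grammaticalCase });
}

// $bar = {$foo :funct1 gender=$fii case=$fii}
const fii = makeValue("Tisch", { gender: "masculine", case: "nominative" });
const foo = makeValue("klein", {});
const bar = funct1(foo, { gender: fii, case: fii });
```

What each function's resolved value answers for `get()` is the part the registry would have to document.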

Collaborator Author

> For #3, we have not delved into what the specification for outbound communication from an RV to an expression (as operand or option value) is.

I think this is a good way of putting it; as it is, the spec doesn't say anything about how to go from the return value of a function (as implemented in an underlying programming language) to an expression (in MessageFormat). If there's going to be a custom function interface at all, I don't know how to not specify that.

Comment on lines 276 to 278
> Feedback from users and implementers is desired
> about whether to require one interpretation or the other
> in the spec.
Member

And I would highlight this...

Suggested change
-> Feedback from users and implementers is desired
-> about whether to require one interpretation or the other
-> in the spec.
+> [!NOTE]
+> During the Technical Preview, feedback on how the registry
+> describes how _functions_ inherit resolved values and _options_
+> and what requirements this specification should impose
+> are highly desired.

Collaborator Author

I'm working on a revised version of this, but note that the word "inherit" should be completely off-limits since it connotes object-oriented inheritance, which would just confuse the issue.

Comment on lines 237 to 239
In addition, selector functions compose with formatting functions
in the sense that a selector function's _operand_
may be the output of any formatting function.
Collaborator

As I mention in #686 (comment), we really should avoid "formatting function" and "selector function" as terms here, given that the resolved value of a function can theoretically be used for both.

This also applies to the next paragraph.

Collaborator

@eemeli eemeli left a comment

I agree with @aphillips's points above.

Collaborator

@stasm stasm left a comment

In my mind, this PR sufficiently clarifies the expectations of the resolution mechanism to be useful to implementors and to allow early adopters to experiment with function composition.

Note that there's also a mention of resolved values in lines 106-118, similar to the wording in line 223:

The form that resolved values take is implementation-dependent,
and different implementations MAY choose to perform different levels of resolution.
> For example, the resolved value of the _expression_ `{|0.40| :number style=percent}`
> could be an object such as
>
> ```
> { value: Number('0.40'),
> formatter: NumberFormat(locale, { style: 'percent' }) }
> ```
>
> Alternatively, it could be an instance of an ICU4J `FormattedNumber`,
> or some other locally appropriate value.
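
As a concrete reading of that quoted example, the sketch below builds such an object in TypeScript and defers formatting until output is needed. `ResolvedNumber` and `resolvePercent` are hypothetical names; `Intl.NumberFormat` is the standard ECMAScript API.

```typescript
// One possible shape for a resolved value: the raw value plus a ready formatter.
interface ResolvedNumber {
  value: number;
  formatter: Intl.NumberFormat;
}

// Hypothetical resolution of `{|0.40| :number style=percent}`.
function resolvePercent(literal: string, locale: string): ResolvedNumber {
  return {
    value: Number(literal),
    formatter: new Intl.NumberFormat(locale, { style: "percent" }),
  };
}

const resolved = resolvePercent("0.40", "en");

// Nothing has been formatted yet; a later function could still read resolved.value.
const formatted = resolved.formatter.format(resolved.value);
console.log(formatted); // 40%
```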

Comment on lines 244 to 249
Implementations that provide a means for defining custom functions
SHOULD provide a means for those functions to return values
that contain enough information
(e.g. the resolved _operand_ and _option_ values
that the function was called with)
to be used as inputs to subsequent function calls.
Collaborator

This is the key part and I wouldn't want to drop it, as other reviewers suggested. Furthermore, I think we need to say something somewhere about resolved values. Otherwise, @aphillips's example from his suggestion above ends up being the only place in the spec where we implicitly require that resolved values carry some extra information. I'd prefer to be explicit.

Member

But we need to avoid being too specific about how it works. "return" has a specific meaning and I don't necessarily think it's a good idea to think of this as a function call's return value.

Our language is a declarative language and we call the stuff in an expression an "annotation" for a reason. Perhaps:

Suggested change
-Implementations that provide a means for defining custom functions
-SHOULD provide a means for those functions to return values
-that contain enough information
-(e.g. the resolved _operand_ and _option_ values
-that the function was called with)
-to be used as inputs to subsequent function calls.
+When resolving the value of an _operand_ or other variable
+(such as the value of an _option_)
+implementations SHOULD provide interfaces so that _annotation_
+applied in statements can accompany the value where appropriate.
+Implementations of _functions_ SHOULD define whether they change the
+value of the _operand_ in any way.
+Implementations of _functions_ SHOULD define whether the value of
+each _option_ is transitive or local.

I mention "statements" here because .match can be where the annotation is applied (not just .local or .input)

Some examples might help here. Here might be an example of non-transitive options (we might say that field options are non-transitive in the spec):

.input {$d :datetime weekday=short month=medium day=numeric}
.local $d1 = {$d :datetime hour=|2-digit| minute=numeric}
{{The transaction was on {$d} at {$d1}.}}

Here's a similar example:

.input {$d :datetime timeZone=|Europe/Paris|}
.local $date = {$d :datetime dateStyle=short}
.local $time = {$date :datetime timeStyle=short}
{{What does {$date} and {$time} print?}}

I think it is less surprising if $time forgets the earlier style annotation but not the time zone.
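
The distinction between options that carry over and options that stay local could be modeled roughly as follows. This is only a sketch: the `CARRIED_OPTIONS` list and the `annotate` helper are hypothetical, and which options a real `:datetime` should carry over is exactly the open question.

```typescript
type Options = Record<string, string>;

interface DateValue {
  value: Date;
  options: Options;
}

// Assumption for illustration: only the time zone survives into later annotations.
const CARRIED_OPTIONS = ["timeZone"];

function annotate(operand: Date | DateValue, options: Options): DateValue {
  if (operand instanceof Date) {
    return { value: operand, options: { ...options } };
  }
  // Copy only the options declared as carried; everything else is dropped.
  const carried: Options = {};
  for (const key of CARRIED_OPTIONS) {
    const previous = operand.options[key];
    if (previous !== undefined) {
      carried[key] = previous;
    }
  }
  return { value: operand.value, options: { ...carried, ...options } };
}

// .input {$d :datetime timeZone=|Europe/Paris|}
const d = annotate(new Date(0), { timeZone: "Europe/Paris" });
// .local $date = {$d :datetime dateStyle=short}
const dateValue = annotate(d, { dateStyle: "short" });
// .local $time = {$date :datetime timeStyle=short}
const timeValue = annotate(dateValue, { timeStyle: "short" });
// timeValue.options keeps timeZone and timeStyle; dateStyle is not carried over.
```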

We don't currently define any functions that change the value of an operand, but we certainly might do:

.local $regular = {|Addison| :string}
.local $shouted = {$regular :transform to=uppercase}
.match {$shouted}
ADDISON {{... is selected... }}
* {{ ... }}

Collaborator

I see your point about "calling" and "returning".

> (...) so that annotation applied in statements can accompany the value where appropriate.

This sounds OK, although I'm now not sure about the exact meaning of "the value" here. I realize that this was the whole point of @catamorphism's opening the other PR...

Btw. I think expressions would be more appropriate than statements — it's possible to annotate inside placeholders, too.

> Implementations of functions SHOULD define whether they change the
> value of the operand in any way.
> Implementations of functions SHOULD define whether the value of
> each option is transitive or local.

Maybe move this part to the note about seeking feedback? We don't really know yet what defining these constraints and requirements should look like.

Member

> Btw. I think expressions would be more appropriate than statements — it's possible to annotate inside placeholders, too.

Yes, but annotations in placeholders are terminal.

I agree about "the value"

Collaborator Author

I would also like to avoid using the word "transitive", because transitivity is a property of a mathematical relation and we haven't defined any such relations in the spec.

Collaborator Author

Re: this example:

.input {$d :datetime weekday=short month=medium day=numeric}
.local $d1 = {$d :datetime hour=|2-digit| minute=numeric}
{{The transaction was on {$d} at {$d1}.}}

It's not obvious to me that the options from the first :datetime call shouldn't be preserved -- should $d1 be a formatted date with the union of all the options shown in both :datetime annotations? Or just the hour and minute options and defaults for the others? To me there's no "obvious" answer, though maybe it's obvious to people who have more experience with message formatting.

It's certainly worth thinking about what the example means, but I'm not sure if it's the best example if the goal is to show something where certain options obviously shouldn't be preserved.

Collaborator Author

"Change the value of the operand" should also be avoided, since (this being a purely functional language), functions never change the value of their operands.

Collaborator Author

On the whole, I understand the suggestion and think it's getting at something useful, but I don't see how it makes sense in the current framework of the spec, in which there is no data model for runtime values.

I think the first sentence ("...implementations SHOULD provide interfaces so that annotation applied in statements can accompany the value where appropriate.") is too vague, but I'm also not sure how to make it less vague. I would argue that the text in my revised commit is better because it's focused on functions, and the boundary between "inside the formatter" and "in a function implementation" is the one place where information is likely to get "lost".

I don't get the concept of "changing the value of the operand" even if I mentally replace that with "mapping the operand onto a new operand" or something like that. In your :transform example, I would understand :transform as returning something like this (if I borrow some of the machinery from #645 but simplify it for presentation):

AnnotatedFormattableValue {
  source: AnnotatedFormattableValue { source: Formattable("Addison"), value: FormattedValue("Addison") },
  formatter: "transform",
  options: { "to": "uppercase" },
  value: FormattedValue("ADDISON")
}

As with functions in other examples, it passes the same operand through (a wrapped thing ultimately representing the string "Addison"), and the transformed operand (the string "ADDISON") is the "formatted value" thing inside the structure representing the return value.

Obviously this isn't necessarily going to be the data model for runtime values, but still, it's not obvious to me that the result is just "ADDISON" with a bunch of options, rather than something that encapsulates the input "Addison", the output "ADDISON", and the options.

Finally, the sentence about defining options that are or aren't preserved (I would use "preserved" rather than "transitive" as I already said) is important, but we don't have a way in the registry to declare that info, currently.
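A minimal Python sketch of the nested structure described above, for illustration only. It is not the data model from #645 or from any implementation; `AnnotatedValue` and `transform` are hypothetical names. It shows a function result that wraps its operand rather than replacing it, so the input "Addison", the output "ADDISON", and the options all stay reachable:

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(frozen=True)
class AnnotatedValue:
    # Hypothetical wrapper: holds either a raw input string or the
    # result of an earlier function call as its source.
    source: Union[str, "AnnotatedValue"]
    formatted: str
    formatter: Optional[str] = None
    options: dict = field(default_factory=dict)


def transform(operand: AnnotatedValue, options: dict) -> AnnotatedValue:
    # Toy stand-in for the :transform function discussed above.
    text = operand.formatted
    if options.get("to") == "uppercase":
        text = text.upper()
    # The operand is passed through unchanged as `source`.
    return AnnotatedValue(source=operand, formatted=text,
                          formatter="transform", options=dict(options))


name = AnnotatedValue(source="Addison", formatted="Addison")
result = transform(name, {"to": "uppercase"})
print(result.formatted)         # ADDISON
print(result.source.formatted)  # Addison
```

Whether a later function should read `result.formatted` or walk back through `result.source` is exactly the kind of question the thread leaves open.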

Comment on lines 232 to 233
.input {$n :number minIntegerDigits=3}
.local {$n1 :number maxFractionDigits=3}
Collaborator

As @aphillips noticed, the .local needs a name. Making this suggestion to make sure we don't miss it, in case the other review comment isn't committed.

Suggested change
- .input {$n :number minIntegerDigits=3}
- .local {$n1 :number maxFractionDigits=3}
+ .input {$n :number minIntegerDigits=3}
+ .local $x = {$n1 :number maxFractionDigits=3}

Comment on lines 260 to 261
> In the Tech Preview, the spec leaves the behavior of the previous
> example implementation-dependent. Supposing that
Collaborator

Isn't this PR still in scope of the Tech Preview? Or do you mean that it's implementation-dependent because of the SHOULD?

Member

This PR's text is part of the Tech Preview.

Member

We need to make clear that the 'return type' is not the 'thing bound to a variable'. The 'thing bound to a variable' also includes options.

Collaborator Author

That's one of the questions, though. If the contract with function implementations is that they are responsible for returning a thing containing all the options they want to preserve, then whatever a function returns is the thing bound to a variable (modulo possible lazy evaluation). If function implementations have no such responsibility, then yes, the formatter has to do additional processing to transform the "thing returned by a function" into the "thing bound to a variable".
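The two contracts can be sketched side by side in Python. This is only an illustration under stated assumptions: `Bound`, `number_a`, `number_b`, and `bind_b` are hypothetical names, and merging earlier options with later ones is one possible policy, not something the spec prescribes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bound:
    # Hypothetical "thing bound to a variable": a value plus the options kept with it.
    value: float
    options: dict = field(default_factory=dict)


# Contract A: the function implementation returns the bound thing itself,
# including whichever options it chooses to preserve.
def number_a(operand: Bound, options: dict) -> Bound:
    return Bound(operand.value, {**operand.options, **options})


# Contract B: the function implementation returns only a bare result...
def number_b(value: float, options: dict) -> float:
    return value


# ...and the formatter does the extra step of building the bound thing.
def bind_b(operand: Bound, options: dict) -> Bound:
    raw = number_b(operand.value, options)
    return Bound(raw, {**operand.options, **options})


n = Bound(1.0)
a = number_a(number_a(n, {"minIntegerDigits": 3}), {"maxFractionDigits": 3})
b = bind_b(bind_b(n, {"minIntegerDigits": 3}), {"maxFractionDigits": 3})
print(a.options)  # {'minIntegerDigits': 3, 'maxFractionDigits': 3}
```

In this toy both contracts end with the same bound value; the difference is only in who is responsible for constructing it.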

Comment on lines 237 to 275
In addition, selector functions compose with formatting functions
in the sense that a selector function's _operand_
may be the output of any formatting function.

Implementations SHOULD provide a means for formatting functions
to compose with each other
and for formatting functions to compose with selector functions.
Implementations that provide a means for defining custom functions
SHOULD provide a means for those functions to return values
that contain enough information
(e.g. the resolved _operand_ and _option_ values
that the function was called with)
to be used as inputs to subsequent function calls.
For example, an implementation in a typed programming language
MAY define an interface that custom functions implement.
Such an interface SHOULD define an implementation-specific
argument type `T` and return type `U` for custom formatting functions
such that `U` can be coerced to `T` without loss of information.
The type `U`
(or a type that `U` can be coerced to without loss of information)
SHOULD also be the input type of custom selector functions.

> [!NOTE]
> In the Tech Preview, the spec leaves the behavior of the previous
> example implementation-dependent. Supposing that
> the external input variable `n` is bound to the string `"1"`,
> and that the implementation formats to a string,
> the formatted result of the following message:
>
> ```
> .input {$n :number minIntegerDigits=3}
> .local {$n1 :number maxFractionDigits=3}
> {{$n1}}
> ```
>
> is implementation-dependent.
> Depending on whether the options are preserved across
> the two calls to `:number`, a conformant implementation
> could produce either "001.000" or "1.000"
Member

Agreed. This is too specific.

We can really say very little (at this point) about what "resolved value" means. As far as I can tell, it means "value of a variable derived from an expression". The form of that variable is as determined by the implementation of the function.

We cannot require any particular internal structure for the RV, just how it behaves.

  1. If the RV is derived from an expression with a selection function X, it can match literal values (eg :number can match literals 0, 1, one, ...) producing a comparable value (aka relative weight).
  2. If the RV is derived from an expression with a formatting function X, it can produce a formatted string or "parts".
  3. Another function Y can use the RV as an operand, or as an option value. In these cases, it becomes clear that we want Y to be able to access information in RV. Exactly what that information is will depend on the expression that RV was derived from.

For #3, we have not delved into what the specification for outbound communication from an RV to an expression (as operand or option value) is. That is something to examine in the Tech Preview period. For example, for Stas's case, an RV representing a noun might be able to supply its gender. I think part of the function registry needs to specify what information a variable derived from an expression using a function can supply, and (at least logically) how to access that. In some implementations, I could see having an API so that $bar = {$foo :funct1 gender=$fii case=$fii} (logically) results in an internal call to $fii.get("gender") and a call to $fii.get("case").

For each function, the function registry needs to specify how RVs deriving from that function behave in #1, #2, and #3.
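As a rough illustration of the three behaviors listed above, here is a Python sketch. It is not an API from the spec or from any implementation: `NounValue`, `select`, `format`, `get`, and `resolve_options` are hypothetical names, and the noun data is invented for the example.

```python
from typing import Dict, List, Optional


class NounValue:
    # Hypothetical resolved value (RV) representing a noun.
    def __init__(self, text: str, gender: str, case: str) -> None:
        self.text = text
        self._info = {"gender": gender, "case": case}

    def select(self, keys: List[str]) -> Optional[str]:
        # Behavior 1: match against literal keys when used by a selector.
        gender = self._info["gender"]
        return gender if gender in keys else None

    def format(self) -> str:
        # Behavior 2: produce a formatted string.
        return self.text

    def get(self, name: str) -> Optional[str]:
        # Behavior 3: expose information to another function's operand or options.
        return self._info.get(name)


def resolve_options(sources: Dict[str, NounValue]) -> Dict[str, Optional[str]]:
    # Mirrors the logical calls $fii.get("gender") and $fii.get("case").
    return {name: rv.get(name) for name, rv in sources.items()}


fii = NounValue("mesa", gender="feminine", case="nominative")
opts = resolve_options({"gender": fii, "case": fii})
print(opts)  # {'gender': 'feminine', 'case': 'nominative'}
```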

Implementations SHOULD provide a means for formatting functions
to compose with each other
and for formatting functions to compose with selector functions.
Implementations that provide a means for defining custom functions
Member

I agree with the sense of this. We should also note somewhere that any two functions are not required to meaningfully compose; there is no requirement or expectation that the following make a meaningful composition:

.input {$date :datetime}
.local $person = {$date :x:personname}

Collaborator

+1 to this. I'm not sure if I was able to explain this in yesterday's call, but I definitely agree that there should be no requirement for any two functions to meaningfully compose, as you've nicely put it. This is why I've been using the term cooperative composition (although I'm happy to call it something else), i.e. the functions must be aware of each other's interface in order to meaningfully compose.

Member

+100 although there are some different things going on here. In @macchiati's example, the types don't match. The .local declaration should emit an Invalid Expression error because $date isn't of a supported type for :x:personname (presumably).

In other cases, the types can match but the annotations might not be supported:

.input {$date :date style=long}
.local $foo = {$date :time}  // not style=long because style means "dateStyle"

For now, we should basically say something like "functions can decide what operands to support" and "functions can decide what functions or function options to support". We permit "composition" without requiring it or prohibiting it.

In the default registry we can add (in the TP period) some guidance for date/time/datetime and number functions as a guide.
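A small Python sketch of "functions can decide what operands to support", for illustration only. `personname` is a toy stand-in for `:x:personname`, and `UnsupportedOperandError` is a hypothetical name for whatever error an implementation would signal:

```python
import datetime


class UnsupportedOperandError(Exception):
    """Hypothetical error; stands in for whatever an implementation signals."""


def personname(operand: object) -> str:
    # Toy function that accepts only strings and rejects everything else.
    if not isinstance(operand, str):
        raise UnsupportedOperandError(
            f"unsupported operand type: {type(operand).__name__}")
    return operand.title()


def try_personname(operand: object) -> str:
    # The caller, not the function, decides how to surface the error.
    try:
        return personname(operand)
    except UnsupportedOperandError as err:
        return f"error: {err}"


print(try_personname("addison"))                     # Addison
print(try_personname(datetime.date(2024, 2, 27)))    # error: unsupported operand type: date
```

Nothing here prevents composition; it only lets each function reject operands it does not understand.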

Member

Agreed.

  Such an interface SHOULD define an implementation-specific
- argument type `T` and return type `U` for custom formatting functions
+ argument type `T` and return type `U`
+ for implementations of formatting functions
  such that `U` can be coerced to `T` without loss of information.
Member

Look at the following mutating function. Do we expect to be able to extract the month from the result of :extractDay?

.local $foo = {$date :extractDay calendar=gregorian}
.local $fii = {$foo :extractMonth calendar=gregorian}

Would the above text mean that :extractDay SHOULD have a "return type" such that it preserves the month value from $date?

Collaborator Author

No; to be concrete, I'll use a simplified version of the C++ implementation.

:extractDay and :extractMonth would be implemented as classes that provide a format() method. The type of that method (ignoring options and context, for simplicity) is:

FormattedPlaceholder format(FormattedPlaceholder&& argument);

A particular instance of the type FormattedPlaceholder could contain any set of options, or none.

Sorry for talking in implementation terms, but I'm not sure how else to say it, given that my goal here is to suggest that implementors not do the "obvious" thing, which would be to define something like a FormatterInput and FormatterOutput type, which are not coercible to each other, and define the interface like:

FormatterOutput format(FormatterInput&& argument);

I would expect every implementation to have some sort of interface between the formatter and calls to custom functions, so I think it's meaningful to refer to it in this note. (I say "custom" because built-in functions could be handled inside the formatter.) In a unityped implementation language like JavaScript, there's less of a hazard since there's only one type, which trivially guarantees the property stated in the note.

Member

This is still in an example and must not be normative. Even if it weren't an example, it's way too specific to put in the specification.

If we want to provide guidance to users/implementers, instead of pouring stuff into the spec, we should write some user guide material.

Member

@macchiati, Feb 27, 2024

I wasn't suggesting the example be added to the spec, but rather using it to illustrate that

such that U can be coerced to T without loss of information

is not something that should be in the spec.

Collaborator Author

catamorphism commented Feb 27, 2024

In the hopes of making the discussion easier to follow, I'll summarize the feedback and how I did or didn't address it in 5803dd8:

  • Fixed the syntax of the example
  • Eliminated "compose" and "composability" per @aphillips
  • In language referring to return values, changed "function" to "function implementation" to make it clear that we're speaking in object language rather than metalanguage. I don't know how to express these properties in metalanguage. I don't think this imposes any requirements on the internal structure of resolved values in the implementation. It does make recommendations about the interface between custom functions and the formatter, but I don't see how we can avoid specifying that interface.
  • For now, I kept in the passage that @aphillips suggested dropping, but rewrote it to refer to function implementations rather than functions.
  • Generalized references to formatting/selection functions to just "functions" (where possible), per @eemeli
  • Added text saying that the result of one call can also be the option value of another call, per @stasm
  • Added text reiterating that function implementations are free to signal errors for any unhandled input, per @macchiati's comment about "functions are not required to meaningfully compose" (unfortunately, not using "compose" means I can't say it quite so elegantly).

There's one more comment from @aphillips that I didn't address yet; will do that in another commit (edit: not a commit, but a comment instead).

Co-authored-by: Richard Gibson
@catamorphism
Collaborator Author

As a meta-comment, my objective with this PR is to help implementers not paint themselves into a corner, so while it would fit more naturally with the spec to use metalanguage (talking about annotations, expressions, resolved values, etc.) rather than object language, in the absence of the machinery that would be needed for that, I used the escape hatch of object language (e.g. "function implementations") instead.

Collaborator

@eemeli left a comment

Overall this is good, but I've left one "Blocking" comment below that needs to be addressed before merging this. If that is done, please don't hesitate to dismiss this review when merging, as I might not be awake at end-of-day Pacific Time.

Co-authored-by: Eemeli Aro
Member

@aphillips left a comment

(as contributor)

I really believe that less is more here. We're getting into the weeds of how "functions" are "called" and what their "inputs" and "outputs" are. This is imperative programming thinking.

In my opinion, we should focus on calling out "here be dragons" and following up with carefully considered text across the spec (or with user-guide-like material).

Comment on lines 235 to 236
the output of the first call to `:number`
is the input of the second call to `:number`.
Member

This is saying outputs and inputs, which, as I noted before, isn't correct.

Collaborator Author

I tried to address this in 8a5f589

catamorphism and others added 2 commits February 27, 2024 12:23
...to bundle their results with a "parsed" version of their input
Co-authored-by: Eemeli Aro
@catamorphism
Collaborator Author

(as contributor)

I really believe that less is more here. We're getting into the weeds of how "functions" are "called" and what their "inputs" and "outputs" are. This is imperative programming thinking.

In my opinion, we should focus on calling out "here be dragons" and following up with carefully considered text across the spec (or with user-guide-like material).

I don't quite know how to specify a foreign function interface without talking about inputs and outputs (or arguments and return values). This is not specific to imperative languages; all functional languages have a concept of function application, arguments, and return values.

Collaborator

@eemeli left a comment

I believe that my concerns about this PR have been sufficiently addressed.

@aphillips
Member

(chair hat)

I have two approvals, which is what I need to merge this. I'm going to make a last call in this comment, ignoring my own review. Any objection to merging this?

catamorphism and others added 2 commits February 27, 2024 12:58
Co-authored-by: Mark Davis
@@ -222,6 +222,71 @@ the following steps are taken:

The form that resolved _operand_ and _option_ values take is implementation-defined.

A _declaration_ binds the resolved value of an _expression_
Member

Not asking for any change in the tech preview, but this will definitely need to be revised afterwards.
The whole notion of a 'resolved value' is very muddy, unless it literally means "what is bound to a variable by an expression declaration", which means it also includes selection/formatting options and their values whenever those are carried over.

Collaborator Author

#645 is an attempt to address this. (But it could go even further.)

Collaborator

@stasm left a comment

LGTM after the most recent updates.

I acknowledge and agree that we need to spend more time and deliberation specifying this fully for 46. That said, I do think that the current wording does a good job of giving guidance to implementors and of stating that it's a work-in-progress.

Co-authored-by: Mark Davis
@eemeli eemeli requested a review from macchiati February 27, 2024 22:04
@aphillips aphillips merged commit 56fcc84 into unicode-org:main Feb 28, 2024
1 check passed
Labels
Action-Item (Action item assigned by the WG), fast-track (Non-spec editorial changes, etc.), formatting, normative (Issue affects normative text in the specification)
6 participants