Data Model Q&A #181
-
As discussed at the last WG meeting, some disagreement exists regarding the expressive powers of the candidate data models. My assertion is that the model proposed by @zbraniecki and myself is able to represent all messages, while the model proposed by @echeran and @mihnita is not. To demonstrate that, please find below a few messages that I assert are not directly expressible in the more limited data model. I have attempted to prune down these examples to their simplest form, and as such they could certainly be individually expressed differently to achieve their specific goals. However, my contention is that the specific message and variable shapes used here cannot be used in the "EM" data model, and would require at least one of the following:
For syntax, the examples use a Fluent-ish notation that's hopefully clear enough about its intent. A few formatting functions are used:
Apologies for the delay in getting this out, as I'm aware that our next "extended" meeting is already on this coming Monday. If it's of use, I'd be happy to also implement these with the initial implementation of the "EZ" model that's available here: https://github.com/messageformat/messageformat/tree/mf2/packages/messageformat
Presuming the following context values:
each of
Presuming the following context values:
each of
-
The first case does not include anything "tree-like", and most of it is as easy to represent in the EM model as in the EZ model. The one difference in how arguments are passed to functions is that the EZ model relies on the order of parameters.
In the EM model all parameters are named. Both examples show the kind of bad practice that the EM model tries to prevent. Freely mixing messages everywhere will break things. In the first case the year (localizable) is used as input for the What will Second example makes
BUT, as an answer to "EM can't do it": you can ABSOLUTELY do whatever you want with a bit of coding. I can write and register an Similar for But you can write and register an And that can explode or succeed the same way the default one does in the EZ model. Heck, one can write and register a But you have to explicitly write, register, and use that kind of unsafe functionality.
One can argue: but Mihai, True... until you try to implement it. Since variables are not available by name at runtime in most programming languages (not even in JS, because of obfuscation / minification), you still end up with a map of variables with string keys containing the pre-compilation variable names. That is what the EZ Rust implementation does. Variable and function references are in fact strings there too, used to look up the real thing in tables. I am also fine with, and open to, creating explicit
That is not the core problem. The problem is the free "mix and match" of tree-like things in a path. Why would I allow BY DESIGN a message reference (the message translated, possibly containing selectors, placeholders, what not) as part of a path to a variable, or another message? Any developer who has ever had to deal with translations knows better than to put non-translatable stuff in localization files, and then directly use the result at runtime to reference other things.
Question, to settle the "it can't be done" argument: should I write, register, and use those 2 unsafe functions to show it can be done? Or is my description above enough? They would do the same unsafe operations that If the argument is: but writing these functions is ugly and dangerous, basically a dangerous EVAL?
The only difference is that the EZ model has it available by default; you don't need to do extra work to shoot yourself in the foot.
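A minimal sketch of what "write and register" could look like, assuming a flat, string-keyed map of variables and a registry whose functions only take named options. None of this is taken from the EM or EZ implementations: `register`, `call`, the `UNSAFE_DYNAMIC_MSG` function, and its `id_from_var` option are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: none of these names come from the EM or EZ code.

// Runtime variables live in a flat map keyed by their source names.
const variables = new Map([['which', 'msg_b']]);

// A flat message catalog; each message here is just a plain string.
const messages = new Map([
  ['msg_a', 'First message'],
  ['msg_b', 'Second message'],
]);

// Function registry: every function receives named options only.
const registry = new Map();

function register(name, fn) {
  if (registry.has(name)) {
    throw new Error(`Function already registered: ${name}`);
  }
  registry.set(name, fn);
}

// The "unsafe" part: the message id is only known at runtime,
// so tooling cannot verify the reference ahead of time.
register('UNSAFE_DYNAMIC_MSG', (options, ctx) => {
  const id = ctx.variables.get(options.id_from_var);
  const message = ctx.messages.get(id);
  if (message === undefined) {
    throw new Error(`No message with id: ${String(id)}`);
  }
  return message;
});

// Look up a registered function by its string name and invoke it.
function call(name, options) {
  const fn = registry.get(name);
  if (fn === undefined) {
    throw new Error(`Unknown function: ${name}`);
  }
  return fn(options, { variables, messages });
}
```

In this sketch the dynamic lookup only exists because someone registered it explicitly; a catalog that never references `UNSAFE_DYNAMIC_MSG` cannot trigger it.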
-
Answer to 1
The "f:", "m:", and "$" are not documented anywhere, and they are opaque. A company might decide that this is a company-wide convention, and add some helper functions to make that easier. They are not first-class citizens, because I think they are bad ideas: they may have limited and rare utility, but the dangers outweigh the usefulness. But! If in time they prove to be good ideas, we register a function that can eval that kind of string, and it becomes standard. For a simpler, cleaner approach I would define different parameters to (Stas' idea) And since these are developer-written functions that we can't control, then yes, they can do
Answer to 2
I don't think I understand this.
Answer to 3
Since one gets to write custom functions, they can write functions that can take anything they like:
or
or
I don't think it is inherently wrong to have the "path" be a string instead of an array of items. It means that one can put in there a string-eval kind of thing (and many do): If, for example, 10 years from now the community decides that a well-defined "eval" kind of functionality is needed, then all you need to do is register a new function in the central registry, properly documented, and it becomes as standard as the stuff that was there on day one. With the EZ model we have to say:
-
Backward compat
There is something that I think I didn't wrap my head around properly during the meeting. There is always a risk of breaking backward compatibility with non-standard functions in both models. Company A can define the Now company A buys company B. Conflict, problems. The same happens in the HTML world: I can write a custom component, call it It happens all the time when you allow developers the flexibility to extend a standard: PUA in Unicode, BCP 47 We can easily say: all functions that start with
-
Thoughts about string / paths / trees
Don't take this as an argument pro / con. The only "natural" place to have a path-like identifier (to access a tree) would be for messages. The EZ model only has an array of Parts in paths for message refs and variable refs, but not function refs. Programming languages don't have anything like this for variables except namespaces, but those apply to variables AND functions. When implementing MF2 the variable names (as they are in the source) are not available, so the dev has to put them in some kind of map anyway. Both the EM and EZ implementations use a flat map, with a string as the key. And by "flattening" we add another problem with the EZ model: Using something other than "/" does not help: I can put in my plain text Part the string used as the separator when flattening, whatever that is (as long as the key of the runtime map of vars is a string). Unless we say "no, when you flatten you must escape the separator", so ["foo", "bar", "baz"] And since one of the parts is a message I can in theory have anything in there: Unicode characters and what not. And they become part of the "path to a variable". So some escaping might actually be required anyway, if we accept messages as part of a path. But now we made this so much more clunky... and "fully resolved" (array of parts) is actually clunkier than a string.
TLDR: There is no real benefit to an "array of things" as a path in references.
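The flattening problem described above can be sketched in a few lines. This is an illustration only: `flattenNaive` and `flattenEscaped` are hypothetical helpers, not functions from either implementation, and the sketch assumes a single-character separator and a distinct single-character escape.

```javascript
// Hypothetical helpers, for illustration only.

// Naive flattening: join the path parts with a separator.
function flattenNaive(path, sep = '/') {
  return path.join(sep);
}

// Escaping variant: escape the escape character first, then the separator,
// so that a separator inside a part can never be mistaken for a boundary.
function flattenEscaped(path, sep = '/', esc = '\\') {
  return path
    .map((part) =>
      part.split(esc).join(esc + esc).split(sep).join(esc + sep)
    )
    .join(sep);
}

// Two different paths: three parts versus two parts.
const threeParts = ['foo', 'bar', 'baz'];
const twoParts = ['foo/bar', 'baz'];

// With naive flattening both paths collapse to the same map key.
const naiveCollision = flattenNaive(threeParts) === flattenNaive(twoParts);

// With escaping the keys stay distinct, at the cost of extra machinery.
const escapedCollision =
  flattenEscaped(threeParts) === flattenEscaped(twoParts);
```

Changing the separator does not remove the collision in the naive version, since any string chosen as separator can also appear inside a part.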
There is no current file format designed for localization that supports nested structures to store messages. JSON / XML / YAML don't count: they were not designed for localization, they were designed for something else, and people thought "hey, wouldn't that be nice? I already know how to parse this". But major platforms lived with flat messages for many, many years, and nobody had a problem. So even messages "can live without" (maybe with some "fake namespaces", like "." or "/" in the ID). Look at gettext, macOS strings, Windows .rc files, Java and .NET properties, Android strings.xml, etc. Even with a natural container for messages (JSON / XML) the problems are bigger with trees: if I "merge" the strings for my app with the strings from 10 libraries, it is easier to merge a "flat catalog of messages" than 10 full trees.
We know the implications, because a lot of people and companies did it in the last 40-50 years, both in localization and in programming language design. So for me this was YAGNI ("You aren't gonna need it"). And if we need it, we can easily "fake" it.
-
I'd like to share a realization that helped me understand the crux of the discussion (or so I think). We set out to test if there are messages that can't be represented in any of the proposed data models. We didn't define what it means to "represent" a message in the data model, however. As it turned out, the EZ model leans towards expressing complexity in the data model itself (through nested, composable AST nodes), while the EM model actively avoids it by moving the complexity into the implementation (i.e. the execution model). I hypothesize that this allows both models to express the same set of messages.
-
Coming back to the data model, here's how I imagine dynamic message references expressed in EZ and EM:
EZ
EM
-
At today's meeting, @mihnita made a significant clarification regarding the EM model, as the function reference With this approach, this would be a possible EM model representation of the first three messages I posted above:
{
msg_start: { id: 'msg_start', locale: 'en', parts: ['1970'] },
msg_1: {
id: 'msg_1',
locale: 'en',
parts: [
'Years ',
{
name: ['msg', 'var', 'range'],
formatter_name: 'DATETIME_RANGE',
options: {
msg_ref: 'msg_start',
msg_target: 'start',
var_ref: 'var_end',
var_target: 'end',
skeleton: 'y'
}
}
]
},
msg_2: {
id: 'msg_2',
locale: 'en',
parts: [
'Years ',
{
name: ['var', 'var', 'range'],
formatter_name: 'DATETIME_RANGE',
options: {
'0_var_ref': 'var_start',
'0_var_target': 'start',
'1_var_ref': 'var_end',
'1_var_target': 'end',
skeleton: 'y'
}
}
]
},
msg_3: {
id: 'msg_3',
locale: 'en',
parts: [
'Years ',
{
name: ['var', 'func', 'range'],
formatter_name: 'DATETIME_RANGE',
options: {
var_ref: 'var_name',
var_target: 'start',
func_name: 'NOW',
func_target: 'end',
skeleton: 'y'
}
}
]
}
}
Taking
With this, I'm satisfied that the two models are indeed equally powerful in their abilities to represent messages. There are still strict differences between the models, but at least we should be able to represent all possible messages in either data model. Any message expressed in one data model can be mapped to the other. This also means that we do not need to consider e.g. the XLIFF representation of MF2 messages or the translator's view of them when deciding between data models, as any representation achievable with one model may also be used with the other. The same goes for the syntax. So all we're really left with as differentiating factors are 1) elegance and 2) the concerns of a programmer: how can we establish guarantees and confidence in correctness while writing code and during execution? Here, the two models do present different abilities and requirements, which we ought to consider in more depth.
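One possible reading of the `*_ref` / `*_target` option pairs in the representation above is that each pair names a source and the formatter argument it feeds. The sketch below follows that reading, which is an assumption and not something either proposal specifies; `resolveArguments` and the shape of `ctx` (pre-formatted messages, variables, and zero-argument functions keyed by name) are hypothetical.

```javascript
// Hypothetical resolver for option pairs such as msg_ref/msg_target,
// 0_var_ref/0_var_target, and func_name/func_target.
function resolveArguments(options, ctx) {
  const args = {};
  for (const [key, value] of Object.entries(options)) {
    // Group 1 is the full prefix (e.g. "0_var"), group 2 is the kind.
    const match = /^((?:\d+_)?(msg|var|func))_(?:ref|name)$/.exec(key);
    if (match === null) continue; // targets and plain options like "skeleton"
    const [, prefix, kind] = match;
    const target = options[`${prefix}_target`];
    if (target === undefined) {
      throw new Error(`Missing ${prefix}_target for ${key}`);
    }
    if (kind === 'msg') {
      args[target] = ctx.messages[value];
    } else if (kind === 'var') {
      args[target] = ctx.variables[value];
    } else {
      args[target] = ctx.functions[value]();
    }
  }
  return args;
}

// Example context; the values are made up for illustration.
const exampleCtx = {
  messages: { msg_start: '1970' },
  variables: { var_start: 1980, var_end: 2000 },
  functions: { NOW: () => 'now' },
};

// Options copied from msg_1 above.
const msg1Args = resolveArguments(
  {
    msg_ref: 'msg_start',
    msg_target: 'start',
    var_ref: 'var_end',
    var_target: 'end',
    skeleton: 'y',
  },
  exampleCtx
);
```

Under this reading, a formatter such as `DATETIME_RANGE` would receive `start` and `end` without knowing whether each came from a message, a variable, or another function.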
-
To help with comparisons between the models, I added a page to the wiki: Data & Execution Model Differences. As the title suggests, that expands the scope a bit from just the data model to also include the requirements each proposal puts on the execution or runtime behaviour. In particular, formatting functions are treated rather differently by the two proposals. I invite anyone interested to add to or amend the contents of the page, of course.
-
Q&A about Data Model
Related documentation & issues:
MF2 data model questions @eemeli
Strawman Proposal for an XLIFF 2 MessageFormat Module
MF2 Models: the big question(s). @mihnita
#140 #141 #139 #178
Code:
Experiments by @mihnita & @echeran
Please add all references for documentation and issues here as a comment, and I will update this post.