Disagreements in token count are not handled correctly #29

JakeWharton · 2023-01-14T05:02:16Z

res/values/strings.xml:

<resources>
  <string name="library_text_argument">{name}</string>
</resources>

res/values-es/strings.xml:

<resources>
  <string name="library_text_argument">{name} {age}</string>
</resources>

They also could disagree on names:
res/values/strings.xml:

<resources>
  <string name="library_text_argument">{name}</string>
</resources>

res/values-es/strings.xml:

<resources>
  <string name="library_text_argument">{age}</string>
</resources>

Currently it seems like only tokens from the main configuration are parsed. Or maybe they're all parsed but it's first/last wins.

Do we fail and enforce all configurations to agree on token count and token names? Or,
Do we allow configurations to contribute different token names?

The text was updated successfully, but these errors were encountered:

JakeWharton · 2023-01-14T05:04:45Z

It's also worth considering how this might happen.

I may add a token to my values/ and forget to update translations. Should we fail?
In language X I need to use two tokens whereas in all other languages only one suffices. Is that allowed?

Both of these are at odds, unless we use something like XML attributes to inform behavior. We did this in Android's view binding for very exceptional cases.

The other case is that you remove a token from values/ but forget to update translations which keep requiring N+1. Do we catch this for you? Or is that just a valid state?

JakeWharton · 2023-01-14T05:12:25Z

My preference here is to fail fast and have an XML attribute in the default configuration to define the full set of supported names.

So both examples above fail the build unless you add tools:ginghamNames="name age" to the res/values/strings.xml entry.

Not sure how translation services will handle this... I assume since the res/values/strings.xml is usually the input it will be fine. Only the other translations are output?

The presence of the attribute on any non-default configurations is an error, although some tool might copy this over, so maybe if the value matches we allow it. But disagreements in the XML attribute value always fail.

theisenp · 2023-01-19T18:59:16Z

Great question. I can imagine projects with different translation pipelines wanting to treat this case differently.

First, from the Cash perspective.

I may add a token to my values/ and forget to update translations. Should we fail?

We're planning to run a cron job that updates translations periodically (say every 12 hours). Developers never make changes to other languages - those string files are generated by the tooling.

So there will be several cases where strings are temporarily out of sync on master:

Adding Strings
- You commit a new English string.
- It takes 4 days to translate the new string to some language, say French.
- If you use a master build in French in the meantime, you see the string in English.
Deleting Strings
- You delete an English string.
- In 12 hours the cron job runs and deletes all the translations.
- In the meantime a master build contains orphaned translations.
Updating Strings
- You change an existing English string.
- In 12 hours the cron job runs and deletes all the translations (since they were associated with an old version).
- It takes 4 days to translate the updated string to some language, say French.
- If you use a master build in French in the meantime, you see the string in English.
Adding Languages
- You add support for a new language, say German.
- It takes a week to get the whole app translated to German.
- New translations are added every 12 hours by the cron job as they roll in.
- If you use a master build in German throughout the week, you see some things in English.

All of that is fine. We'll have some extra validation that we run on release builds to make sure that we have a full set of up-to-date translations, but we don't block developers from merging to master.

So for us, I think that mismatched argument counts/types/names is not a huge deal (at least on master). We can treat the English strings as the source of truth, and there may be some period of time where other languages lag behind, but in most cases it's no worse than any other string change (you might see a placeholder, or an argument that you pass might not be used).

I do think that there will be some edge cases where changing a type leads to a crash when using other languages on a master build, but:

I expect it to be exceptionally rare (ballpark estimate, once a year or less)
It would automatically be resolved by the tooling within 12 hours
It's a flaw of the underlying translation pipeline, the plugin isn't doing anything to exacerbate it

So for Cash I'm fine with emitting a warning, but would prefer not to fail.

In language X I need to use two tokens whereas in all other languages only one suffices. Is that allowed?

I think not, I would consider that a mistranslation.

It might happen in the short term on a master build after updating a string, but I'm not aware of any long term situation where it would be valid.

theisenp · 2023-01-19T19:00:51Z

Other projects will have different translation pipelines though, and may want to be stricter. If they update strings synchronously across languages they might want to fail if the tokens don't match.

Maybe we could offer a configuration option that enables/disables strict argument checking? Cash could enable it only for release builds but other projects could decide for themselves?

JakeWharton · 2023-01-25T17:08:39Z

Adding Strings

👍

This should be allowed and should get a generated code binding.

Deleting Strings

👍

I don't think it's an unreasonable policy for this tool to say that you must delete all translations of a string if you delete the string.

However, regardless of whether I think you should do this or not, we don't have to enforce it. But then I think these orphaned strings do not get a generated code binding.

Updating Strings

There are two phases to this in your list of steps. The second phase is the same as the "Adding Strings" case above. The first phase is between these two steps:

You change an existing English string.

In 12 hours the cron job runs and deletes all the translations (since they were associated with an old version).

There are at least three okay forms of this phase:

You rearrange existing tokens and text. Eventually the translations will be removed or updated. This is fine.
You add a new token the set of all tokens. You'll have to pass a new argument to the generated binding. Eventually the translations will be removed or updated. This is fine.
You refine a token type to be more specific than it was (e.g., {whatever} -> {whatever,number}). You'll have to update code which calls into the generated binding. The untyped/more-loosely-typed version still works in translations. Eventually the translations will be removed or updated. This is fine.

However, there are at least three Very Bad™ forms of this phase:

You remove a token.

What does the tool do?

We could only bind the tokens from the "values" string? I don't actually know if this even works or not. But assuming it does, it means those not using automated tools could leave dead tokens in their translations until it's caught by a user.

We could bind the superset of tokens from all configurations. When our automated tool runs it means the token will finally disappear. That will change the generated binding and break the build preventing the PR from landing.

I think the default behavior should be to fail when translations disagree with the default configuration. This would require that you update/remove all translations when removing a token.

If Android does not crash on unbound tokens we could add a configuration option to opt-out of this behavior. This would allow our builds on master to wait for the translations to update. Builds on release branches would NOT opt-out.
You widen a token type (e.g., {whatever,select} -> {whatever}).

What does the tool do?

We could not change the type argument and keep it as Int until all of the translations update. After that happens it would widen to Any and your code would still compile. You would have to remember to go back and fix your code but who knows how long that would take. This isn't what a developer would want.

We could change the type argument immediately to Any, but what happens to the translations where it's still a select? Hopefully it crashes, but I haven't tested.
You change a token type incompatibly (e.g., {whatever,date} -> {whatever,select}).

What does the tool do?

This is basically a worse case of the previous one.

Adding Languages

👍 This is basically an extreme form of "Adding Strings".

So for Cash I'm fine with emitting a warning, but would prefer not to fail.

For Cash we can be lenient where possible. In general I would prefer the tool to be as strict as possible and provide opt-out configurations where it's possible to provide behavior that is not perfect.

We'll have some extra validation that we run on release builds to make sure that we have a full set of up-to-date translations, but we don't block developers from merging to master.

If we have the tool be strict by default, it will also serve as validation that someone could rely on if they don't have a separate translation service and tooling. And if the leniency is configurable, it ensures we only allow our developer builds to be the ones which opt-out of any kind of strict validation.

theisenp · 2023-01-30T20:51:45Z

I don't think it's an unreasonable policy for this tool to say that you must delete all translations of a string if you delete the string.

Yeah that's fair. We're going to have a separate plugin for managing translations and I think we could do something there to make this relatively painless.

However, regardless of whether I think you should do this or not, we don't have to enforce it. But then I think these orphaned strings do not get a generated code binding.

Agreed

I think the default behavior should be to fail when translations disagree with the default configuration. This would require that you update/remove all translations when removing a token.

I'm comfortable with that. Later on we can decide internally if we'd rather:

Keep it strict and then have tooling that makes it easy for developers to resolve conflicts (probably just running the same script as the cron job)
Add an opt-out that we use for dev builds.

JakeWharton mentioned this issue Jan 14, 2023

Disagreements in token type are not handled correctly #30

Open

JakeWharton mentioned this issue Jan 25, 2023

Introduce a more comprehensive resource processing pipeline #52

Merged

JakeWharton added the bug Something isn't working label Feb 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disagreements in token count are not handled correctly #29

Disagreements in token count are not handled correctly #29

JakeWharton commented Jan 14, 2023

JakeWharton commented Jan 14, 2023

JakeWharton commented Jan 14, 2023

theisenp commented Jan 19, 2023 •

edited

Loading

theisenp commented Jan 19, 2023

JakeWharton commented Jan 25, 2023

theisenp commented Jan 30, 2023

Disagreements in token count are not handled correctly #29

Disagreements in token count are not handled correctly #29

Comments

JakeWharton commented Jan 14, 2023

JakeWharton commented Jan 14, 2023

JakeWharton commented Jan 14, 2023

theisenp commented Jan 19, 2023 • edited Loading

theisenp commented Jan 19, 2023

JakeWharton commented Jan 25, 2023

theisenp commented Jan 30, 2023

theisenp commented Jan 19, 2023 •

edited

Loading