CLDR-16617 Additional tweak to transforms spec (#3327)
macchiati authored Oct 17, 2023
1 parent f581268 commit d331b75
Showing 1 changed file (docs/ldml/tr35-general.md) with 33 additions and 2 deletions.

```
x → y | z ;
z a → w ;
```

First, "xa" is converted to "yza". Because of the "|", processing then continues from just after the "y", picks up the "za", and converts it to "w", giving "yw". Had there been no "|", the cursor would have been placed after "yza", and the result would have been simply "yza".
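
The revisiting behavior can be sketched in Python. This is a simplified model under stated assumptions, not the CLDR or ICU implementation: rules have literal keys and no contexts, they are tried in order at each cursor position, and the hypothetical `rule` helper reads a `|` in the replacement as the position where processing resumes. There is no guard against rule sets that loop forever.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    key: str     # literal text that must start at the cursor
    output: str  # replacement text with the '|' removed
    cursor: int  # new cursor position, as an offset from the start of output

def rule(key: str, replacement: str) -> Rule:
    """Build a Rule (hypothetical helper); a '|' marks where processing resumes."""
    bar = replacement.find("|")
    output = replacement.replace("|", "")
    return Rule(key, output, bar if bar >= 0 else len(output))

def transform(text: str, rules: list[Rule]) -> str:
    pos = 0
    while pos < len(text):
        for r in rules:
            if text.startswith(r.key, pos):
                text = text[:pos] + r.output + text[pos + len(r.key):]
                pos += r.cursor  # resume at the '|' (or after the output)
                break
        else:
            pos += 1  # no rule matched: advance one code point
    return text

print(transform("xa", [rule("x", "y|z"), rule("za", "w")]))  # yw
print(transform("xa", [rule("x", "yz"), rule("za", "w")]))   # yza
```

With the "|", the second rule gets a chance to see the "z" produced by the first; without it, the cursor skips past the whole output.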

The '@' character can be used as a filler character to place the revisiting point before the start or after the end of the replacement text, but only within the context. Consider the following rules; the table after them shows how they are applied.

```
1. [a-z]{x > |@ab ;
2. ab > J;
3. ca > M;
```
The ⸠ indicates the virtual cursor:

| Current text | Matching rule |
| - | - |
| ⸠cx | no match, cursor advances one code point |
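| c⸠x | matches rule 1, so "x" is replaced by "ab" and the cursor backs up one code point, to before the "c". |
| ⸠cab | matches rule 3, so "ca" is replaced by "M", with the cursor after the "M". |
| M⸠b | no match, cursor advances one code point |
| Mb⸠ | cursor is at the end, so we are done. |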

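Notice that rule 2 never had a chance to trigger: because the cursor backed up to before the "c", rule 3 matched "ca" and consumed the "a", so the cursor never reached a position where "ab" starts.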

Currently, '@' cannot move the cursor before the start of the before_context or past the end of the after_context.
Consider what happens if rule 1 is changed to have no before_context:

```
1'. x > |@ab ;
2. ab > J ;
3. ca > M;
```

In that case, the results are different.

| Current text | Matching rule |
| - | - |
| ⸠cx | no match, cursor advances one code point |
| c⸠x | matches rule 1', so the text is replaced and the cursor backs up, but only as far as the start of the match, because rule 1' has no before_context. |
| c⸠ab | matches **rule 2**, so the text is replaced, with the cursor after the "J". |
| cJ⸠ | cursor is at the end, so we are done. |
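
Both tables can be reproduced with a small Python sketch. It is a simplified model, not the actual CLDR or ICU engine, and it rests on several assumptions: the before_context is written as a Python regular expression, after_context is not supported, '@' fillers appear only directly after a leading "|" or directly before a trailing "|", and the names `Rule`, `parse_replacement`, and `transform` are hypothetical. The clamp on the cursor models the '@' restriction described above.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    before: str       # regex for the before_context; "" means no before_context
    key: str          # literal text that must start at the cursor
    replacement: str  # output text, optionally with '|' and '@' fillers

def parse_replacement(replacement: str) -> tuple[str, int]:
    """Return (output text, cursor offset relative to the start of the output)."""
    no_bar = replacement.replace("|", "")
    output = no_bar.strip("@")
    if "|" not in replacement:
        return output, len(output)
    leading = len(no_bar) - len(no_bar.lstrip("@"))  # '@' fillers before the output
    return output, replacement.index("|") - leading

def transform(text: str, rules: list[Rule]) -> str:
    pos = 0
    while pos < len(text):
        for r in rules:
            if not text.startswith(r.key, pos):
                continue
            ctx_start = pos
            if r.before:
                m = re.search(f"(?:{r.before})\\Z", text[:pos])
                if m is None:
                    continue  # before_context does not match
                ctx_start = m.start()
            output, offset = parse_replacement(r.replacement)
            text = text[:pos] + output + text[pos + len(r.key):]
            # '@' may not move the cursor before the before_context, nor past
            # the end of the output (after_context is not modeled here).
            pos = max(ctx_start, min(pos + offset, pos + len(output)))
            break
        else:
            pos += 1  # no rule matched: advance one code point
    return text

with_context = [Rule("[a-z]", "x", "|@ab"), Rule("", "ab", "J"), Rule("", "ca", "M")]
without_context = [Rule("", "x", "|@ab"), Rule("", "ab", "J"), Rule("", "ca", "M")]
print(transform("cx", with_context))     # Mb
print(transform("cx", without_context))  # cJ
```

The only difference between the two rule sets is the before_context of the first rule, which is exactly what decides how far the cursor may back up.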

#### <a name="Example" href="#Example">Example</a>

The following shows how these features are combined in the Transliterator "Any-Publishing". This transform converts ASCII typewriter conventions into text more suitable for desktop publishing (in English). It turns straight quotation marks or UNIX-style quotation marks into curly quotation marks, fixes multiple spaces, and converts double hyphens into a dash.