We are not using Java 1.4 anymore. #950

Merged on Oct 17, 2024 (42 commits).
Commits, all by eggrobin:

- e443707 (Oct 9, 2024): Remap rules, but I don’t want to restructure LineBreakTest et al.…
- c455a8a (Oct 9, 2024): Remove SegmenterCldr.txt
- 71b6c88 (Oct 9, 2024): Drop comments in the default
- 2db096a (Oct 9, 2024): A hopeful syntax
- 06288cc (Oct 9, 2024): This should still be more tractable than the alternative
- 42b9c8c (Oct 9, 2024): Life doesn’t commute
- 058e4df (Oct 10, 2024): Don’t generate it
- 3090838 (Oct 10, 2024): Don’t generate it
- 2fc7d47 (Oct 10, 2024): Merge branch '🚫🦭🦌' into remap-redux
- cab8cf1 (Oct 10, 2024): Merge branch '🚫🦭🦌' into remap
- 2060099 (Oct 10, 2024): Really don’t generate it
- f7510a8 (Oct 10, 2024): Merge branch '🚫🦭🦌' into remap
- 7f66e09 (Oct 10, 2024): Don’t pretend we can generate it
- 596f006 (Oct 10, 2024): Merge branch '🚫🦭🦌' into remap
- 40f296e (Oct 10, 2024): If we cannot deal with lengthening we cannot deal with shortening
- e0d34b3 (Oct 10, 2024): meow
- 9bc951b (Oct 10, 2024): Merge branch 'remap' of https://github.com/eggrobin/unicodetools into…
- 7015c68 (Oct 10, 2024): abstract
- 16c162d (Oct 11, 2024): Could it be working???
- db4ae3c (Oct 11, 2024): first cut of parsing remap rules
- 22e4da9 (Oct 11, 2024): Merge branch 'remap' into remap-redux
- 21bd40a (Oct 11, 2024): Now it parses correctly but does not work. Progress!
- 8d6b659 (Oct 11, 2024): Merge branch 'remap' into remap-redux
- 485354d (Oct 11, 2024): Much better
- d4b398d (Oct 11, 2024): Merge branch 'remap' into remap-redux
- 7852352 (Oct 11, 2024): *, which apparently makes it work
- 419dea4 (Oct 11, 2024): Regenerate
- f7c5fc1 (Oct 14, 2024): Merge remote-tracking branch 'la-vache/main' into remap
- c81086e (Oct 14, 2024): Merge branch 'remap' into remap-redux
- b4229ef (Oct 14, 2024): spotless
- 911069e (Oct 14, 2024): Merge branch 'remap' into remap-redux
- f044280 (Oct 14, 2024): Update documentation
- 8318bcb (Oct 14, 2024): also the code that generates it
- 899beec (Oct 14, 2024): X
- c90b12e (Oct 14, 2024): meow
- 0704fa1 (Oct 14, 2024): loose ends
- 2e716a8 (Oct 14, 2024): spotless
- 966fb32 (Oct 14, 2024): Welcome to the year 2004
- 98746ec (Oct 15, 2024): Merge remote-tracking branch 'la-vache/main' into remap-redux
- 9df040d (Oct 17, 2024): Merge branch 'remap-redux' into no-java-1.4
- 2ca9241 (Oct 17, 2024): Merge remote-tracking branch 'la-vache/main' into remap-redux
- 783c447 (Oct 17, 2024): Merge branch 'remap-redux' into no-java-1.4
unicodetools/src/main/java/org/unicode/tools/Segmenter.java: 33 changes (12 additions, 21 deletions)
@@ -393,8 +393,8 @@ public RegexRule(String before, Breaks result, String after, String line) {
     before = ".*(" + before + ")";
     String parsing = null;
     try {
-        matchPrevious = Pattern.compile(parsing = before, REGEX_FLAGS).matcher("");
-        matchSucceeding = Pattern.compile(parsing = after, REGEX_FLAGS).matcher("");
+        this.before = Pattern.compile(parsing = before, REGEX_FLAGS);
+        this.after = Pattern.compile(parsing = after, REGEX_FLAGS);
     } catch (PatternSyntaxException e) {
         // Format: Unclosed character class near index 927
         int index = e.getIndex();
@@ -440,8 +440,12 @@ public Breaks applyAt(
         CharSequence remappedString,
         Integer[] indexInRemapped,
         Consumer<CharSequence> remap) {
-    if (matchAfter(matchSucceeding, remappedString, indexInRemapped[position])
-            && matchBefore(matchPrevious, remappedString, indexInRemapped[position])) {
+    if (after.matcher(remappedString)
+                    .region(indexInRemapped[position], remappedString.length())
+                    .lookingAt()
+            && before.matcher(remappedString)
+                    .region(0, indexInRemapped[position])
+                    .matches()) {
         return breaks;
     }
     return Breaks.UNKNOWN_BREAK;
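The second hunk replaces the Java 1.4 helpers, which copied the text with `subSequence` before matching, with `Matcher.region`. The sketch below is hypothetical (the class and method names are not from Segmenter.java). It assumes, as in the diff, that both patterns are precompiled and that `before` has already been wrapped as `.*(...)`, and it takes `pos` to be an index into the already remapped string, standing in for `indexInRemapped[position]`. It runs the old and new checks side by side on a toy rule `a × b`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not code from Segmenter.java: decides whether a rule
 * "before × after" applies at a position, given precompiled patterns.
 */
public class RegionRuleSketch {

    /** Post-PR style: one string, two regions, no copies. */
    static boolean appliesWithRegions(Pattern before, Pattern after, CharSequence text, int pos) {
        // The after context must start exactly at pos but may end anywhere.
        boolean afterOk = after.matcher(text).region(pos, text.length()).lookingAt();
        // The before context must cover all of [0, pos); the ".*(...)" wrapper
        // means only a suffix of that range is really constrained.
        return afterOk && before.matcher(text).region(0, pos).matches();
    }

    /** Pre-PR (Java 1.4) style: match against copies made with subSequence. */
    static boolean appliesWithSubSequences(
            Pattern before, Pattern after, CharSequence text, int pos) {
        Matcher a = after.matcher(text.subSequence(pos, text.length()));
        Matcher b = before.matcher(text.subSequence(0, pos));
        return a.lookingAt() && b.matches();
    }

    public static void main(String[] args) {
        // Toy rule "a × b", with before wrapped as ".*(" + before + ")" like RegexRule does.
        Pattern before = Pattern.compile(".*(a)");
        Pattern after = Pattern.compile("b");
        String text = "xab";
        for (int pos = 0; pos <= text.length(); pos++) {
            System.out.println(pos + ": " + appliesWithRegions(before, after, text, pos)
                    + " / " + appliesWithSubSequences(before, after, text, pos));
        }
    }
}
```

With the default opaque and anchoring bounds, lookaround, boundary constructs and `^`/`$` cannot see past a region's edges, so a region behaves like the `subSequence` copy it replaces while avoiding the copy. Both columns agree, and only position 2 (between `a` and `b`) satisfies the toy rule.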
@@ -455,29 +459,16 @@ public String toString(boolean showResolved) {
     }
 
     // ============== Internals ================
-    // in Java 5, this can be more efficient, and use a single regex
-    // of the form "(?<= before) after". MUST then have transparent bounds
-    private Matcher matchPrevious;
-    private Matcher matchSucceeding;
+    // We cannot use a single regex of the form "(?<= before) after" because
+    // (RI RI)* RI × RI would require unbounded lookbehind.
+    private Pattern before;
+    private Pattern after;
     private String name;
 
     private String resolved;
     private Breaks breaks;
 }
 
-/** utility, since we are using Java 1.4 */
-static boolean matchAfter(Matcher matcher, CharSequence text, int position) {
-    return matcher.reset(text.subSequence(position, text.length())).lookingAt();
-}
-
-/**
- * utility, since we are using Java 1.4 depends on the pattern having been built with .* not
- * very efficient, works for testing and the best we can do.
- */
-static boolean matchBefore(Matcher matcher, CharSequence text, int position) {
-    return matcher.reset(text.subSequence(0, position)).matches();
-}
-
 /** Separate the builder for clarity */
 
 /** Sort the longest strings first. Used for variable lists. */
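The new comment explains why the rule cannot be a single `(?<= before) after` regex: `java.util.regex` only supports lookbehind with a bounded maximum length, and the `(RI RI)*` context is unbounded. Matching `before` with `matches()` over the whole prefix region sidesteps that limit. The following sketch is a toy under stated assumptions: `R` stands in for a regional indicator, the rule is written as `(^|[^R]) (R R)* R × R` in the spirit of the UAX #29 rules that pair regional indicators, and the class name and exact patterns are illustrative rather than taken from the unicodetools rule files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: a toy "(RI RI)* RI × RI" rule evaluated with two patterns.
 * 'R' stands in for a regional indicator; real rules use Unicode property classes.
 */
public class RegionalIndicatorSketch {
    // Before context: start of text or a non-RI, then an odd number of RIs, ending at
    // the candidate position. Wrapped in ".*(...)" as RegexRule does, so matches()
    // over [0, pos) only constrains a suffix of the prefix.
    static final Pattern BEFORE = Pattern.compile(".*((?:^|[^R])(?:RR)*R)");
    // After context: the next character is an RI.
    static final Pattern AFTER = Pattern.compile("R");

    /** Returns the positions at which the toy rule forbids a break. */
    static List<Integer> noBreakPositions(String text) {
        List<Integer> result = new ArrayList<>();
        for (int pos = 0; pos <= text.length(); pos++) {
            boolean after = AFTER.matcher(text).region(pos, text.length()).lookingAt();
            boolean before = BEFORE.matcher(text).region(0, pos).matches();
            if (after && before) {
                result.add(pos);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(noBreakPositions("xRRRRR")); // [2, 4]: segments x|RR|RR|R
        System.out.println(noBreakPositions("RRR")); // [1]: segments RR|R
    }
}
```

The `(?:^|[^R])` anchor matters: without it, the leading `.*` could swallow one indicator of an even-length run so that the remaining odd-length tail matches, which would suppress every break inside the run.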