Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace go-runewidth with uniseg #71

Closed

Conversation

mikelorant
Copy link

Replace the use of RuneWidth and StringWidth from mattn/go- runewidth with equivalent functions from rivo/uniseg.

It is important to be aware that using RuneWidth will not be accurate as the width of a rune cannot be determined in isolation. This requires a shift to thinking about grapheme clusters instead.

Unfortunately due to the complexity of identifying grapheme clusters, there has been some signifcant performance regressions in two functions:

  • PrintableRuneWidth: 10x slower
  • TruncateString: 4x slower

Two other functions have had performance improvements:

  • MarginString: 2x faster
  • PaddingString: 2x faster

The documentation for rivo/uniseg mentions the use of Step and StepString performing "orders of magnitude faster" than using the NewGraphemes method. However, implementing these changes only resulted in a 10% performance increase.

@mikelorant
Copy link
Author

@rivo Would you be able to help provide some guidance on how to boost the performance of these two regressed functions?

I switched the truncate package to Step as follows:

diff --git a/truncate/truncate.go b/truncate/truncate.go
index 6905a6c..4a4558f 100644
--- a/truncate/truncate.go
+++ b/truncate/truncate.go
@@ -79,20 +79,23 @@ func (w *Writer) Write(b []byte) (int, error) {
    w.width -= uint(tw)
    var curWidth uint

-   gr := uniseg.NewGraphemes(string(b))
-   for gr.Next() {
-       rs := gr.Runes()
+   by := b
+   state := -1
+   var c []byte
+   for len(by) > 0 {
+       var boundaries int
+       c, by, boundaries, state = uniseg.Step(by, state)

        switch {
-       case len(rs) == 1 && rs[0] == ansi.Marker:
+       case len(c) == 1 && rune(c[0]) == ansi.Marker:
            // ANSI escape sequence
            w.ansi = true
-       case len(rs) == 1 && w.ansi && ansi.IsTerminator(rs[0]):
+       case len(c) == 1 && w.ansi && ansi.IsTerminator(rune(c[0])):
            // ANSI sequence terminated
            w.ansi = false
        case w.ansi:
        default:
-           curWidth += uint(gr.Width())
+           curWidth += uint(boundaries>>uniseg.ShiftWidth)
        }

        if curWidth > w.width {
@@ -103,7 +106,7 @@ func (w *Writer) Write(b []byte) (int, error) {
            return n, err
        }

-       _, err := w.ansiWriter.Write([]byte(gr.Str()))
+       _, err := w.ansiWriter.Write(c)
        if err != nil {
            return 0, err
        }

Is this a mistake in my refactor or is it a case that identifying grapheme clusters just requires more CPU cycles?

@rivo
Copy link

rivo commented Jan 27, 2024

Would you be able to help provide some guidance on how to boost the performance of these two regressed functions?

(I only see one function here, Write.)

I'm not quite sure what you're asking. Your patch looks like you switched from using the Graphemes class to the Step function. In the past, the benchmarks suggested that this would vastly improve performance. Until someone pointed out a bug in the benchmark... Now they are about the same. (I will adjust the documentation.)

Under the hood, the Graphemes class uses the Step function (actually the StepString function) and I was under the impression that the overhead of the Graphemes class will result in a performance penalty. That does not seem to be the case, however.

So using Step is not significantly slower than Graphemes, it's almost the same. I'm surprised that you're seeing major differences here. Maybe you can post the original code and the new code, both in a way I can run them without any other dependencies. Then I can check what the issue is.

Anyway, Step doesn't just calculate graphemes and widths, it also calculates word boundaries, sentence boundaries, and line breaks. It seems to me that you're not interested in all of that. I would suggest that you use FirstGraphemeCluster instead. This will only determine grapheme clusters. And it will produce a width a the same time. The benchmarks show that this is about three times faster than using Step.

It should be quite simple to replace Step with FirstGraphemeCluster. Here is an example. FirstGraphemeClusterInString also works fine, if that's more convenient for you. (It's also slightly faster than FirstGraphemeCluster.)

I haven't run comparison benchmarks with go-runewidth's functions so I can't comment on that. From a quick glance, they also have to look up properties in tables (also using binary search). The overhead in uniseg shouldn't be so big.

But again, if you're really just interested in the width and don't care about other types of boundaries, don't use the Graphemes class or the Step function, use FirstGraphemeCluster or FirstGraphemeClusterInString if you need to manually step through a string.

@rivo
Copy link

rivo commented Jan 27, 2024

See also my follow-up comment here: charmbracelet/bubbletea#831 (comment)

@mikelorant
Copy link
Author

@rivo As usual, thanks again for the very detailed explanation as well as recommendations.

A quick clarification; while I only provided the patch for a refactor of the Write function, I was focused on the benchmarks for PrintableRuneWidth and TruncateString. These were the two functions that were slower. TruncateString was relying on the Write function and that seemed easiest to switch over to using StepString.

Benchmarks

I have taken the opportunity to rerun all the benchmarks in this package to determine how the changes have impacted them.

master

Benchmark_PrintableRuneWidth-4     47716898          26.84 ns/op         0 B/op         0 allocs/op
BenchmarkWriter_Write-4             3094910          331.6 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8797719          157.9 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             4818718          331.9 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4              601867           2054 ns/op      1563 B/op        35 allocs/op
BenchmarkPaddingString-4             750488           1339 ns/op      1007 B/op        23 allocs/op
BenchmarkTruncateString-4           6223754          247.8 ns/op       291 B/op         3 allocs/op

refactor/go-runewidth-to-uniseg

Benchmark_PrintableRuneWidth-4      4120680         264.7 ns/op        23 B/op         1 allocs/op
BenchmarkWriter_Write-4             4554146         291.1 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8991362         143.2 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             5173473         238.7 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4             1192938         923.9 ns/op       763 B/op        11 allocs/op
BenchmarkPaddingString-4            2209376         550.5 ns/op       406 B/op         5 allocs/op
BenchmarkTruncateString-4           1531975         790.8 ns/op       295 B/op         4 allocs/op

refactor/go-runewidth-to-uniseg-go-1.21-uniseg-0.4.6

Benchmark_PrintableRuneWidth-4      6293344         174.3 ns/op        23 B/op         1 allocs/op
BenchmarkWriter_Write-4             4174240         282.5 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8070469         163.0 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             4410526         255.4 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4             1293753         838.6 ns/op       763 B/op        11 allocs/op
BenchmarkPaddingString-4            2722042         438.5 ns/op       407 B/op         5 allocs/op
BenchmarkTruncateString-4           2388880         500.8 ns/op       295 B/op         4 allocs/op

Performance

Comparing the results from before the switch to uniseg to after with the most recent performance improvements.

master -> refactor/go-runewidth-to-uniseg-go-1.21-uniseg-0.4.6

Benchmark Speed
PrintableRuneWidth 6.5x slower
Writer_Write 15% faster
Dedent Same
IndentString 25% faster
MarginString 60% faster
PaddingString 65% faster
TruncateString 2x slower

Considering that PrintableRuneWidth is a poorly named function, we should update the documentation to clearly state the problems and why performance has regressed. A new function should then be added called either Width or PrintableWidth that does the same thing.

The only major concern is the performance issues with TruncateString. I plan to look into this further and see what might be improvable.

@mikelorant mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch 3 times, most recently from 484ab73 to fa505b9 Compare January 28, 2024 07:27
@mikelorant
Copy link
Author

@rivo Here are the updated benchmarks for the truncate function.

BenchmarkTruncateString-4-NewGraphemes          2514505         499.0 ns/op       279 B/op         3 allocs/op
BenchmarkTruncateString-4-FirstGraphemeCluster  5609870         211.8 ns/op       275 B/op         2 allocs/op

This has improved the truncate function to now perform faster than the original version 😄

@mikelorant mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch 2 times, most recently from a5a63a6 to 26d85b8 Compare January 29, 2024 08:39
@mikelorant
Copy link
Author

Benchmarks

Before

Benchmark_PrintableRuneWidth-4     51635997          23.67 ns/op         0 B/op         0 allocs/op
BenchmarkWriter_Write-4             4415070          264.4 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8735187          141.7 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             5119086          249.7 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4              612964           1645 ns/op      1563 B/op        35 allocs/op
BenchmarkPaddingString-4            1045546           1150 ns/op      1007 B/op        23 allocs/op
BenchmarkTruncateString-4           6143715          188.8 ns/op       291 B/op         3 allocs/op

After

Benchmark_PrintableRuneWidth-4      9631588          122.1 ns/op        95 B/op         0 allocs/op
BenchmarkWriter_Write-4             4528722          260.0 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8445904          139.2 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             5190386          244.7 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4             1612345          746.7 ns/op       763 B/op        11 allocs/op
BenchmarkPaddingString-4            2926327          444.6 ns/op       407 B/op         5 allocs/op
BenchmarkTruncateString-4           5670631          210.9 ns/op       275 B/op         2 allocs/op

Performance

Benchmark Speed
PrintableRuneWidth 400% slower
Writer_Write Same
Dedent Same
IndentString Same
MarginString 50% faster
PaddingString 60% faster
TruncateString 10% faster

Outcome

All functions except PrintableRuneWidth have either stayed the same or increased in performance. PrintableRuneWidth now provides accurate results whereas before it was incorrectly calculating the width. This is a reasonable compromise and is just something that we need to make clear to the users.

All unit tests are passing which helps confirm we are getting the same results. Obviously, some emojis will provide a different result, which is to be expected and the driver for this work.

Thanks

A special thanks to @rivo for taking so much of his time for explaining and helping with this refactor. I doubt this pull request would have gotten to this state without him. Improving the performance of uniseg was well above and beyond expectations. You the real ⭐ in this work.

Nest Steps

I am hoping @muesli can help with reviewing this pull request and providing guidance on any changes necessary. I do think some mention of these changes needs to go into the documentation and some thought should be considered to bumping the major version to indicate there are breaking changes.

@muesli
Copy link
Owner

muesli commented Jan 29, 2024

We'll have to bump the required Go version to 1.18.

Replace the use of `RuneWidth` and `StringWidth` from `mattn/go-
runewidth` with equivalent functions from `rivo/uniseg`.

It is important to be aware that using `RuneWidth` will not be accurate
as the width of a rune cannot be determined in isolation. This requires
a shift to thinking about grapheme clusters instead.

Unfortunately due to the complexity of identifying grapheme clusters,
there has been some signifcant performance regressions in two
functions:

- PrintableRuneWidth: 10x slower
- TruncateString: 4x slower

Two other functions have had performance improvements:

- MarginString: 2x faster
- PaddingString: 2x faster

The documentation for `rivo/uniseg` mentions the use of `Step` and
`StepString` performing "orders of magnitude faster" than using the
`NewGraphemes` method. However, implementing these changes only
resulted in a 10% performance increase.

Signed-off-by: Michael Lorant <[email protected]>
@mikelorant mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch from 26d85b8 to 7edce3e Compare January 29, 2024 22:15
@mikelorant
Copy link
Author

Bumped go.mod to 1.18 as well as updated the GitHub workflow to use the correct version when building.

@mikelorant
Copy link
Author

@muesli Would be great to get your sign off on this work. This is going to have some pretty big performance impacts on many of the bubbles components. I'd like get started on the code changes for Lipgloss and Bubbles.

@mikelorant
Copy link
Author

@muesli Any chance you can take a look at this. Without this going to master branch I cant begin the changes to switch Lipgloss and Bubbles over to it.

To drive home the point how much of a benefit this will be, we will likely see 10x performance in Bubbles components especially textarea which is very reliant on this package.

@meowgorithm
Copy link
Contributor

@mikelorant just a heads up that I talked to muesli and he's been looking into this. so far so good, from what I hear.

@mikelorant
Copy link
Author

All good, this one should take a while because this is going to have a huge ripple effect as it updated in the other Charm tools.

We likely need a test repository to verify the suite of tools works as expected. But we also need a reference terminal to go with it (crossing fingers Ghostty is near perfect for rendering emojis).

Personally, I believe if this is accepted, it may be worth using semantic versions here to indicate breaking changes. That would make it very clear and require applications depending on it to specifically use it. Risk factor would drop to zero as developers are aware that something major has changed.

aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 13, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/x that referenced this pull request Mar 14, 2024
This implements an ANSI and wide-characters aware truncation algorithm
that uses the newly merged [ANSI parser state machine][statemachine] and
the fantastic library uniseg.

Since this is using the ANSI state machine, it's compatible with `CSI m`
(SGR) style sequence, `OSC 8` (hyperlinks), and basically any other
escape sequence supported in the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 14, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 15, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 17, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 28, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 28, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 28, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 28, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 28, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 29, 2024
* feat: switch to term/ansi for text manipulation

Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220

* Update get.go
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Mar 29, 2024
* feat: switch to term/ansi for text manipulation

Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220

* fix: combining both conditional and unconditional wrapping

Uses `ansi.SmartWrap` charmbracelet/x#57

Fixes: muesli/reflow#43

* chore: update deps

* Update get.go
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Apr 23, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
aymanbagabas added a commit to charmbracelet/lipgloss that referenced this pull request Apr 29, 2024
Use ANSI aware, wide characters support, uniseg backed term/ansi package
to calculate string widths, truncate, and wrap strings.

Related: muesli/reflow#71
Fixes: #258
Fixes: #220
@mikelorant
Copy link
Author

Withdrawing this pull request as this issue has been resolved for the Charm suite of tools by the creation of the new ansi package. This package has already replaced most references to reflow in Bubble Tea, Lip Gloss and Bubbles.

@mikelorant mikelorant closed this May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants