-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize CodedBufferWriter._copyInto
to memcpy
#885
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dart2wasm currently can't optimize loops into memcpy, however `setRange` methods have type tests to generate `array.copy` (Wasm's `memcpy`). Replacing the loops in `CodedBufferWriter._copyInto` with `setRange` improves an internal benchmark extracted from a real use case significantly in all targets: | | Before | After | Diff | |------------------------------|------------|------------|---------------------| | AOT | 127,587 us | 95,634 us | -31,953 us, -25.0% | | JIT | 106,880 us | 92,800 us | -14,080 us, -13.1% | | dart2js -O4 | 285,587 us | 262,222 us | -23,365 us, -8.1% | | dart2wasm --omit-type-checks | 337,000 us | 236,100 us | -100,900 us, -29.9% |
mkustermann
approved these changes
Oct 23, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice \o/
osa1
added a commit
to osa1/protobuf.dart
that referenced
this pull request
Oct 24, 2023
By avoiding allocating empty `Uint8List`s this saves a few percent in dart2js -O4 in the benchmark reported in google#885. | | Before | After | Diff | |------------------------------|------------|------------|---------------------| | AOT | 122,917 us | 122,741 us | | | JIT | 93,376 us | 94,880 us | | | dart2js -O4 | 271,111 us | 258,250 us | -12,861 us, -4.7% | | dart2wasm --omit-type-checks | 196,454 us | 195,300 us | | Number of splices in the benchmark before this change: 21,123 After: 20,857
osa1
added a commit
to osa1/protobuf.dart
that referenced
this pull request
Oct 24, 2023
…edBufferWriter` Similar to google#885, this optimizes some more buffer copying to `memcpy`. Becuase the chunks can be very small, we use a simple loop as before on small chunks and `setRange` on large chunks. The chunk size to check for when to use a loop is somewhat arbitrarily chosen as 20. In many benchmarks the chunks are either too small (less than 10 bytes) or a more than a hundred bytes, so in the benchmarks I've tried it doesn't matter whether the threshold is 20, 30, or 40. The slowness in dart2js is probably caused by the `Uint8List` allocation, to be passed to `setRange` as the source. Results from the same benchmark in google#885. Before numbers are using the changes in google#886. | | Before | After | Diff | |------------------------------|------------|------------|---------------------| | AOT | 122,741 us | 114,582 us | - 8,159 us, -6.6% | | JIT | 94,880 us | 92,317 us | - 2,483 us, -2.6% | | dart2js -O4 | 258,250 us | 266,000 us | + 7,750 us, +3.0% | | dart2wasm --omit-type-checks | 195,300 us | 169,166 us | -26,134 us, -13.3% | AOT and JIT tested on x64.
osa1
added a commit
that referenced
this pull request
Oct 24, 2023
…edBufferWriter` (#887) Similar to #885, this optimizes some more buffer copying to `memcpy`. Results from the same benchmark in #885: | | Before | After | Diff | |------------------------------|------------|------------|---------------------| | AOT | 114,713 us | 109,838 us | - 4,875 us, -4.2% | | JIT | 91,960 us | 92,887 us | + 927 us, +1.0% | | dart2js -O4 | 259,125 us | 257,000 us | - 2,125 us, -0.8% | | dart2wasm --omit-type-checks | 196,909 us | 182,333 us | -14,576 us, -7.4% | AOT and JIT tested on x64.
osa1
added a commit
that referenced
this pull request
Oct 26, 2023
By avoiding allocating empty `Uint8List`s this saves a few percent in dart2js -O4 in the benchmark reported in #885. | | Before | After | Diff | |------------------------------|------------|------------|---------------------| | AOT | 122,917 us | 122,741 us | | | JIT | 93,376 us | 94,880 us | | | dart2js -O4 | 271,111 us | 258,250 us | -12,861 us, -4.7% | | dart2wasm --omit-type-checks | 196,454 us | 195,300 us | | Number of splices in the benchmark before this change: 21,123 After: 20,857
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
dart2wasm currently can't optimize loops into memcpy, however
setRange
methods have type tests to generatearray.copy
(Wasm'smemcpy
).Replacing the loops in
CodedBufferWriter._copyInto
withsetRange
improves an internal benchmark extracted from a real use case significantly in all targets:cl/575887404