-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Rust built-in UTF-16 conversion #8
Comments
I have been working on this issue (diff here). On the whole it is a straightforward change, mainly involving the replacement of some unsafe code with In most places where we pass a I wanted to reach out and get thoughts on this before moving forward. |
I am not super convinced that this change is actually necessary for
`rust_icu` in particular. But please note that my opinion on this is
biased, and a possible way forward is laid out below.
I feel that some background is in order to explain my opinion. The folks
who started `rust_icu` are mostly working on Fuchsia OS. If you're an
operating system, it pays to use just one library to work with Unicode,
because you get to reuse it many times. So for us, using Rust code to do
something that ICU can already do is not essential.
This tradeoff is different for other uses. Most "regular" users are not
super concerned with binary size, so it makes sense to them to simplify the
APIs as you did.
In a world where I could make a wish, I'd leave both approaches, possibly
hidden behind a feature switch, so that there is an option to use pure ICU
or a hybrid approach. There will be wrinkles as you noted, where the ICU
API is not self-consistent: it's not clear to me why for msgfmt in
particular they departed from the "pascal" array passing convention
(size+data buffer). I think that were the built-in UTF-16 conversion hidden
behind a flag, you could simply use the conversion that copies the bytes.
This way you could get a choice between using ICU only, without the
conversion; and rust+icu which is a better API but slightly inefficient.
Thoughts?
I apologize if this isn't exactly the feedback you may have hoped for. We
should have discussed this issue ahead of time to avoid wasted effort.
F
|
I realized we were missing a label that says "this issue has open questions to be resolved", so I added the label and added it to this issue. |
I'm not hard set on making this change. When I was first getting acquainted with Do you have any suggestions for a better issue to pick up? Perhaps one of the missing ICU4X APIs (#138 is an example)? |
A couple of notes:
|
Just checking in, @bdragon, Is there something else from the list above that piqued your interest? |
@filmil sorry for letting this collect dust. I will take another look at the UTF-16 work and reply in more detail this weekend. Regarding the remaining ECMA 402 tasks, I would be happy to work on any of them, really, though I don't believe I can assign one to myself. To be less indifferent, #134 (Implement Intl.Collator) is appealing. |
That's all OK. I just wanted to understand if there is a task that piques your interest. No need to be super-formal about assignment, especially if there are permission issues -- just drop a note on the bug that you intend to take and I'll assign you to it. |
@filmil I can see a path for making this an optional feature. Conversion to UTF-16 with ICU via FFI is obviously fallible; conversion in Rust with Offering conversion in Rust as an optional feature would enable this slightly friendlier API for end users, but I think having one fallible and one infallible API, depending on how the conversion is performed, would burden those call sites within rust_icu with Therefore I think it would be best to continue using What do you think? Looking now at my initial concern, since the problem seems to be scoped to rust_icu_umsg, maybe the solution is to ensure that any string arguments passed to |
I'm a bit busy at the moment, but I have this on my TODO list, and will
respond to this by the end of the week.
Thank you for your patience.
In general, the direction of (1) using a feature and (2) making a minimally
invasive change would be most welcome. I still need to look in detai.
|
Sounds good—no rush on my end. |
This avenue seems to me like the least amount of complication for the end user. A single constructor is used throughout. I am not super-worried about the fallibility of the constructor remaining even if the underlying code is infallible. Here are a few alternatives, none spectacularly better:
Thanks for your patience. |
Note, I hope to have answered your question here, let me know if I can be of more help. |
Yes, thank you—I think I have enough to get this wrapped up. |
Split from #6
The code currently calls
u_strToUTF8
. It would be better to use Rust standard library functions like std::str::encode_utf16. You would avoid having to go across the FFI boundary when performing the conversion.The text was updated successfully, but these errors were encountered: