A path to ex_cldr 3.0 (Localize 1.0) #244
-
I like the direction of all of the things you mentioned. I'm mostly curious about the backend-less architecture, but I'm also very interested in Message Format 2.0 out of personal curiosity.
-
"No backend" architecture
My working hypothesis is that storing locale data in … Of course this hypothesis has to be validated. And there is some very tricky work for those areas - like number formatting - where the data is compiled to code. For those few but important cases, generating modules at runtime will be required. I suspect that will be trickier than I currently think.
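As a rough illustration of what "generating modules at runtime" can look like on the BEAM, the sketch below uses Elixir's `Module.create/3`. The names `RuntimeFormatterSketch`, `LocaleSketch.De` and `decimal_separator/0` are hypothetical and are not part of ex_cldr. Real number formatting would compile far more than a single value, so treat this as a minimal sketch of the mechanism only.

```elixir
defmodule RuntimeFormatterSketch do
  # Hypothetical helper (an assumption for illustration, not ex_cldr API):
  # builds a module at runtime whose function body is generated from data
  # that is only known once the application is running.
  def define(module_name, decimal_separator)
      when is_atom(module_name) and is_binary(decimal_separator) do
    body =
      quote do
        # The value is injected into the generated function at creation time.
        def decimal_separator, do: unquote(decimal_separator)
      end

    # Module.create/3 compiles and loads the quoted body as a new module.
    {:module, ^module_name, _bytecode, _result} =
      Module.create(module_name, body, Macro.Env.location(__ENV__))

    module_name
  end
end

# Usage: generate a module for some locale data loaded at runtime.
mod = RuntimeFormatterSketch.define(LocaleSketch.De, ",")
mod.decimal_separator()
# => ","
```

Whether generated modules or another runtime store is the better fit for each kind of locale data is exactly the part of the hypothesis that still needs validating.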
-
Message Format 2.0
This format is syntactically and semantically more complex than Message Format 1.0 (which is implemented in … There is a lot to unpack here - and the actual formatting is the most straightforward part. In addition I want to break away from a Gettext dependency and a … One of the other design decisions is how to manage "message keys". In Gettext, the embedded message is the key. For complex messages in Message Format 2.0 this is very likely not a sustainable or scalable approach. Suffice to say, there is a lot of design work required first to fulfil the vision I have - including being a first-class part of the Phoenix ecosystem.
-
Will this help with the issue of localizing strings containing HTML tags formatting? @kipcole9 and I have talked about this a few times and there has been some discussion on the forums about this issue as well, but I've never seen a really great solution. Essentially the problem (as I see it) is that of localizing something like the following:
Right now, this cannot be done without fooling around with …
-
Let me know if/when you reach that point! I'd love to collaborate. I still have very fond memories of the time we worked on the calendar and datetime support in Elixir core. (And also, thanks for your flattering words! 😅)
-
After considerable reflection I'm preparing to embark on the next generation of ex_cldr. And since it will have breaking changes it will be a 3.0 release. One of the objectives is to make the library easier to discover, understand and use for developers. I will likely also rename it to localize, version 1.0.
I expect that development will take most of 2025 to complete.
Localize 1.0 (ex_cldr 3.0) goals
Since the first commit in May 2016, ex_cldr has been installed nearly 5m times, suggesting it has good support from the Elixir community. At the same time OTP and Elixir have improved. A new type system is in active development. And some of the constraints of the existing family of nearly 30 ex_cldr libraries have continued to irritate.
The main goals are:
Rename to Localize
Make the libraries more discoverable and name them in a way that is more reflective of what the libraries do.
CLDR is the core data and specification. The CLDR team's main library is called icu. The JavaScript implementation is intl. In my opinion, icu is just as opaque as CLDR. Internationalisation implies a distinction between "us" and "others" - reinforcing the common perception that localisation is something that can be deferred to "later". As Wikipedia defines:
Therefore the implementation of localize (ex_cldr) is "Internationalisation". The use of the libraries is "Localization". My hope is that the libraries will become more mainstream, and the standard go-to for formatting numbers, dates, times, Phoenix routes and so on, whether the application is targeting only one language or culture or many.
Remove the "backend" approach to hosting CLDR data
During the first implementation of ex_cldr, gettext was the architectural model. Hence the CLDR data is encapsulated in "backend modules" that are configured at compile time. There are benefits to this approach on the BEAM, but there are also limitations:
Separate the core CLDR data ingestion and data loading
The current ex_cldr implementation incorporates the modules that import, transform and store the raw CLDR locale data. From a developer's point of view, these modules are irrelevant at best and an overhead at worst. It also makes contributing to ex_cldr more difficult because of the combination of concerns. The updated implementation will clearly separate the data transform process and more strictly define the transformed locale data struct.
Simplify library packaging and testing
The proliferation of libraries was historically intended to serve a few purposes:
This new development will re-package into a smaller number of libraries. One possible packaging might be:
- localize_transformer to be the CLDR data transformer
- ex_cldr, ex_cldr_numbers, ex_cldr_currencies, ex_cldr_dates_times, ex_cldr_calendars, ex_cldr_units (formatting only), ex_cldr_territories and ex_cldr_languages into a single library. These are the core formatting libraries.
- ex_cldr_html and ex_cldr_localized_routes that could be combined into a localize_phoenix library.
Canonical error and exception handling
Today the libraries consistently return {:error, {ExceptionStruct, binary_message}}, a shape which dates back to when I didn't correctly understand exception structs. In the new version the return should be {:error, ExceptionStruct} with correctly defined exceptions including a message formatter that can, itself, be translated.
Leverage the emerging type system
A simple majority of bug reports on ex_cldr libraries has been around dialyzer warnings. Today most of the common libraries are quite robust and will pass dialyzer with the flags :error_handling, :unknown, :underspecs, :extra_return, :missing_return but these tend to lend little overall developer benefit.
The aim of the new libraries will be to fully leverage the emerging type system and type notation. The most significant work will likely be to type the locale data itself given how complex that data structure is. However the benefit is likely to be more correct code, written faster, for library development (I don't expect it will provide great benefit to library consumers).
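To tie the typing goal back to the {:error, ExceptionStruct} return shape described under canonical error handling, here is a minimal sketch using today's `defexception` and typespecs. The module names `Sketch.UnknownLocaleError` and `LocaleLookupSketch`, the `:locale` field and the hard-coded locale list are all assumptions made for illustration. None of them are taken from ex_cldr or the planned libraries, and the emerging type system may express this differently.

```elixir
defmodule Sketch.UnknownLocaleError do
  # Hypothetical exception: the name and the :locale field are assumptions.
  defexception [:locale]

  @type t :: %__MODULE__{locale: String.t()}

  # The message is built on demand from the struct's data, which is the
  # point at which a translated message formatter could be plugged in.
  @impl true
  def message(%__MODULE__{locale: locale}) do
    "The locale #{inspect(locale)} is not known"
  end
end

defmodule LocaleLookupSketch do
  # Stand-in data for the sketch; real locale data would come from CLDR.
  @known ~w(en de fr)

  # Old shape, for contrast: {:error, {Sketch.UnknownLocaleError, "message"}}
  # New shape: {:error, %Sketch.UnknownLocaleError{}}
  @spec validate(String.t()) ::
          {:ok, String.t()} | {:error, Sketch.UnknownLocaleError.t()}
  def validate(locale) when locale in @known, do: {:ok, locale}
  def validate(locale), do: {:error, %Sketch.UnknownLocaleError{locale: locale}}
end

# Usage: callers pattern match on the struct and format the message lazily.
case LocaleLookupSketch.validate("xx") do
  {:ok, locale} -> IO.puts("Using #{locale}")
  {:error, exception} -> IO.puts(Exception.message(exception))
end
```

Because the error carries a struct rather than a pre-rendered string, callers can match on the exception type and its fields, and the message is only produced when someone asks for it.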
Build a new translator library
With the approval of the Unicode Message Format 2.0 in November 2024, the time is right to implement it for Elixir. In addition, there is room for a more fully-featured translation ecosystem for Elixir that goes beyond gettext. Gettext's backend architecture, compile-time configuration and .pot serialisation have benefits - especially consistency with other language ecosystems. However gettext messages have important drawbacks with plurals and grammatical gender and in how easily translators can understand the intent and context of a message. Message Format 2.0 seeks to overcome these issues and the timing is right from an Elixir perspective.
One possibility will be to explore making message translation pluggable in Phoenix after the library is able to provide a solid foundation.
The broader goals are to be able to:
Re-implement units of measure
The library I most enjoy from a functional perspective is ex_cldr_units but it has some major flaws due to my misunderstanding of the spec (to be fair, the spec has improved a lot in its readability in the last few years too). Remediating it to be more extendable, correct and maintainable is difficult.
I plan to reimplement it and split it into two libraries. One will focus on parsing, conversion and localisation. The second will focus on unit math - basically an algebra engine that should be able to work with any struct that supports a protocol I will define, similar to how the ratio library operates today. Indeed it may be possible to leverage the ratio library directly given that @Qqwy is far smarter than I am.
Fill out the missing pieces
On this new foundation, there is still work to be done to implement:
These are very interesting topics in their own right and deserve the time to implement them. I'm hoping with the new implementation there could be volunteers to help.
Other topics
Feedback is welcome on other areas of improvement, especially with the goal of being a flexible runtime platform with improved DX, packaging, testing, integration and CLDR coverage.
cc: @LostKobrakai @Schultzer