-
-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(core): support normalization in Core 🌱 #9999
Closed
7 tasks done
Comments
This was referenced Nov 22, 2023
mcdurdin
added a commit
that referenced
this issue
Jan 15, 2024
First half of #9999. Adds support for normalization (to NFD) of input app context into the cached context. The keyboard processor will work with the NFD cached context. Adds unit tests for the normalization as part of the LDML keyboard processor test suite. TODO: * Comparing modified cached context to app context to determine the transform required to send to the app * Handling illegal unicode and unpaired surrogates on input context
mcdurdin
added a commit
that referenced
this issue
Jan 15, 2024
mcdurdin
changed the title
feat(core): support normalization in Core
feat(core): support normalization in Core 🌱
Jan 16, 2024
mcdurdin
added a commit
that referenced
this issue
Jan 16, 2024
mcdurdin
added a commit
that referenced
this issue
Jan 16, 2024
Relates to #9999. Fixes #10384. The context API endpoints should no longer be considered as part of the standard Core API. The only consumers that have a need to access these APIs are the IMX integration in Engine for Windows, and the Keyman Developer Debugger. These symbols are currently used by Developer: * `km_core_context` struct * `km_core_context_type` enum * `km_core_context_item` struct * `KM_CORE_CONTEXT_ITEM_END` macro * `km_core_state_context()` * `km_core_context_set()` * `km_core_context_clear()` These symbols are currently used by Windows IMX: * `km_core_context` struct * `km_core_context_type` enum * `km_core_context_item` struct * `KM_CORE_CONTEXT_ITEM_END` macro * `km_core_context_items_dispose()` * `km_core_context_item_list_size()` * `km_core_state_get_intermediate_context()` The following functions and symbols are moving to keyman_core_api_context.h: * `km_core_context` struct * `km_core_context_type` enum * `km_core_context_item` struct * `KM_CORE_CONTEXT_ITEM_END` macro * `km_core_state_context()` function * `km_core_state_get_intermediate_context()` function * `km_core_context_set()` function * `km_core_context_clear()` function * `km_core_context_get()` function * `km_core_context_items_from_utf16()` function * `km_core_context_items_from_utf8()` function * `km_core_context_items_to_utf8()` function * `km_core_context_items_to_utf16()` function * `km_core_context_items_to_utf32()` function * `km_core_context_items_dispose()` function * `km_core_context_length()` function * `km_core_context_append()` function * `km_core_context_shrink()` function * `km_core_context_item_list_size()` function
mcdurdin
added a commit
that referenced
this issue
Jan 16, 2024
mcdurdin
added a commit
that referenced
this issue
Jan 16, 2024
Relates to #9999. Establishes functions, unit test sources, normalization entry point and an effectively no-op unit test for normalization support.
2 tasks
mcdurdin
added a commit
that referenced
this issue
Jan 17, 2024
Fixes #9999. Note TODO items: - [ ] Renormalize cached_context across action boundary. Blocked by #10369. - [ ] Add extra tests for surrogate pairs - [ ] Move set_context_from_string into helper module - [ ] if we don't apply normalization, we still need to fixup the app_context, to keep it coherent with cached_context (or at least we need to verify that app_context is never used in this situation)
mcdurdin
added a commit
that referenced
this issue
Jan 17, 2024
mcdurdin
added a commit
that referenced
this issue
Jan 17, 2024
Relates to #9999. Fixes #10384. The context API endpoints should no longer be considered as part of the standard Core API. The only consumers that have a need to access these APIs are the IMX integration in Engine for Windows, and the Keyman Developer Debugger. These symbols are currently used by Developer: * `km_core_context` struct * `km_core_context_type` enum * `km_core_context_item` struct * `KM_CORE_CONTEXT_ITEM_END` macro * `km_core_state_context()` * `km_core_context_set()` * `km_core_context_clear()` These symbols are currently used by Windows IMX: * `km_core_context` struct * `km_core_context_type` enum * `km_core_context_item` struct * `KM_CORE_CONTEXT_ITEM_END` macro * `km_core_context_items_dispose()` * `km_core_context_item_list_size()` * `km_core_state_get_intermediate_context()` The following functions and symbols are moving to keyman_core_api_context.h: * `km_core_context` struct * `km_core_context_type` enum * `km_core_context_item` struct * `KM_CORE_CONTEXT_ITEM_END` macro * `km_core_state_context()` function * `km_core_state_get_intermediate_context()` function * `km_core_context_set()` function * `km_core_context_clear()` function * `km_core_context_get()` function * `km_core_context_items_from_utf16()` function * `km_core_context_items_from_utf8()` function * `km_core_context_items_to_utf8()` function * `km_core_context_items_to_utf16()` function * `km_core_context_items_to_utf32()` function * `km_core_context_items_dispose()` function * `km_core_context_length()` function * `km_core_context_append()` function * `km_core_context_shrink()` function * `km_core_context_item_list_size()` function
mcdurdin
added a commit
that referenced
this issue
Jan 17, 2024
2 tasks
Closed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Impacted modules
Steps
Add a Core API to convert
km_core_actions
tokm_core_action_item*
(mirrorsaction_item_list_to_actions_object
)Update Engines to always call
km_core_state_context_set_if_needed
, and stop managing their own cached context.LDML processor can then generate
km_core_actions
and existing engines can consume it with no further modification.Normalization support for LDML keyboards:
km_core_state_set_context_if_needed
)km_core_state_get_actions
andactions_object_to_action_item_list
)17.0 target API usage is:
17.0 TODO
Move to
km_core_state_context_set_if_needed
km_core_state_context_set_if_needed
➡ chore(windows): remove cached context management from Keyman Engine #10052km_core_state_context_set_if_needed
➡ feat(mac): invoking Keyman Core from Keyman Engine for Mac 🍕 #8403km_core_state_context_set_if_needed
#10212km_core_state_context_set_if_needed
#10213km_core_state_context_set_if_needed
🌱 #10215Longer term
Implementation notes
in
km_core_state_context_set_if_needed
we'll need to do something like this:In
km_core_state_get_actions
andactions_object_to_action_item_list
, Core will need to track the context back to the last base codepoint (per Unicode spec) to give a starting anchor for normalization of output.Core will need to maintain a copy of the exact text from the app as well as the NFD cached context for LDML, and need to sync those up to determine the number of codepoints to delete from the app. Remember that the input context is in NFU "Normalization Form Unknown" -- and may well have mixed content, and we need to know exactly how many codepoints to delete from that mixed content.
Application context synchronization
Note: this discussion is working in UTF32; if working in UTF16 then consideration needs to be given to whether counts are in codepoints or codeunits.
Given an application context in NFU and a cached context in NFD (for the purposes of this discussion, stripped of markers), we need to be able to calculate the number of characters to delete.
del
(number of characters) andins
(string to insert).cached_context
is always NFD.app_context
is always NFU.The LDML keyboard processor will return a number of NFD codepoints to delete. However, this number cannot be passed straight to the app, because the normalization of the string may not not match up. We also need to consider normalization with string concatenation. This requires a two step process:
del
characters fromcached_context
cached_context
is not empty, and the last character ofcached_context
is not a stable character, prepend it to theins
and incrementdel
.app_context
, and additional fixups that need to be prepended to the inserted textins
because the change happened over a normalization boundary:app_del
will then be the number of characters to delete from the documentins
will be the text to insertUTF-16 Considerations
Core has enough data to provide the number of codepoints to delete and number of UTF-16 codeunits to delete. Recommend that this information is provided in the action object to consumers, as that covers both of our deletion use cases (injecting backspaces vs direct string manipulation).
The text was updated successfully, but these errors were encountered: