-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for Arabic/Hebrew/Persian languages. #3137
Conversation
@ebraminio the fix for issues reported in #1429 was submitted in this PR. Do you know who should review it ? I just wonder if we need to attract anybody's attention to it. |
I am not familiar with Ace project maintainers but this looks good, even the fact I was not in favor of adding a bidi algorithm but if this in not avoidable, perhaps ace should go for it. Also, I just spotted a minor issue on reverting functionality (double click a RTL word, remove it then press Ctrl+Z, highlighting area is misplaced) which IMO, this totally can go without it; so functionally, this LGTM. BTW, "Pharsi" is misspelling of "Farsi" but even better you can use "Persian" to refer to the language. Thanks for this great effort, I like to see this ASAP on websites using Ace. |
@ebraminio thanks a lot for the feedback. This patch is far from being perfect / final. It is basically a prototype which illustrate basic approach to solution (and make essential features to work - cursor movement, typing etc.). Once we finalize the approach we will obviously polish the implementation to make sure all use cases are covered. |
@nightwing it looks like you were involved in discussion and review of this issue from the very beginning. Would you please have a look at proposed solution and share your comments ? Many thanks in advance. |
@ebraminio Thank you for your comments. |
Thank you very much for cooperation @ebraminio. |
@ashensis Thanks so much for the detailed write up. It was very informative. |
Sorry for not finding time to review this earlier. Hardcoding "Courier" font doesn't work for mac, and doesn't work well with wrapping, since character limits are computed for the main font, maybe for now we should just ask users pick a monospace font that supports their language. Caret movement is generally different from that of textarea. E.g. up/down keep logical column instead of the visual one, left/right move by logical column too, is this intentional? |
Thank you @nightwing for your comments.
|
|
Please rename "Pharsi" to "Persian" (or, "Farsi", which is less favorable) on the title. |
@nightwing Anything I can help with to get this merged in? Perhaps with testing? |
Thank you @nightwing. |
@nightwing when you think @ashensis code can be merged ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good, there are only a couple of changes still needed to merge this.
Since this has non-trivial impact on performance, and possible optimizations for very long lines, it would be better to merge this as a preference turned off by default for now.
We'll turn it on after testing this for a while.
lib/ace/bidihandler.js
Outdated
* @param {Number} the document row to be checked [optional] | ||
* @param {Number} the wrapped screen line index [ optional] | ||
**/ | ||
this.isBidiRow = function(screenRow, docRow, splitIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you change tabs to 4 spaces to match the rest of the code, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/ace/bidihandler.js
Outdated
this.bidiUtil = bidiUtil; | ||
/* Arabic/Hebrew character width differs from regular character width */ | ||
this.charWidths = []; | ||
this.EOL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- bidiUtil references lightweight Bidi engine used to produce Logical <-> Visual maps. These maps are used to correctly calculate caret position.
- charWidths contains widths of various characters (English, Bidi, Zero width, Arabic Lam-Alef). Again, they are used to correctly calculate caret position.
- EOL - end of line character, it may vary in ACE. Line may be followed by end-of-line or end-of-file symbol (when show invisibles) and they are used in caret position calculations (sometimes EOL occurs inside the Bidi line, not at the end)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the issue is this.EOL;
is somehow NOOP here and it is not a function call like this.EOL();
or an assignment and the question is how this line and the next contributing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the suggestion would be putting some default sane value here instead this NOOP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this for both EOL and showInvisibles
lib/ace/lib/bidiutil.js
Outdated
/************************************************************************************************************************************/ | ||
/* 0 1 2 3 4 5 6 7 8 9 A B C D E F */ | ||
/************************************************************************************************************************************/ | ||
/*0-*/ TB00, L , L , L , L , TB05, TB06, TB07, R , L , L , L , L , L , L , L , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to reduce the size of these tables? Maybe we could use something similar to https://github.com/codemirror/CodeMirror/blob/master/src/util/bidi.js#L62?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of fact, approach implemented in codemirror is a rough one, it doesn't account for many outstanding cases.
However, after revising my classification tables I concluded that few of them might be disposed off without much harm.
But I would prefer to retain, partly, at least few tables, they might be compressed to incur less upload impact.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should find a way to compress this significantly, otherwise we have to keep it as a separate extension, and not include in ace.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retained only 2 character classification tables (actually only part of each one) and compressed them, the rest has been replaced by regular expressions
lib/ace/layer/text.js
Outdated
@@ -311,6 +319,21 @@ var Text = function(parentEl) { | |||
row++; | |||
} | |||
this.element.innerHTML = html.join(""); | |||
if (this.session.$bidiHandler.isCourierFontFace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it would be better to switch font for the whole editor, or let the author of the site to pick font.
This doesn't work on mac anyway, and we are going to add proper support for variable width fonts soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you would agree, I can just remove dynamic font change and let site owner to pick correct font.
But I would say that the font name suitable for Chrome should be somehow documented, isn't it?
May be in the the same editor.css
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove dynamic font change
lib/ace/edit_session.js
Outdated
@@ -1116,6 +1118,7 @@ var EditSession = function(text, mode) { | |||
* | |||
**/ | |||
this.insert = function(position, text) { | |||
this.$bidiHandler.markAsDirty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly speaking I do not understand this proposal.
Line 250 of edit-session.js falls with 'onChangeFold' method that has no bearing on our subject.
The intended changes of mine (invocation of 'markAsDirty') serve to rebuilt Bidi info pertinent to line being edited, in order to be able to correctly calculate caret position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i meant line 255 https://github.com/ashensis/ace/blob/8b21722f4dff1e43a5c4debf066a54b0bcf6a9d3/lib/ace/edit_session.js#L255, session can be changed without calls to insert/remove. The only reliable way to handle that is to listen to the change event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/ace/edit_session.js
Outdated
@@ -1128,6 +1131,7 @@ var EditSession = function(text, mode) { | |||
* | |||
**/ | |||
this.remove = function(range) { | |||
this.$bidiHandler.markAsDirty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for this and the other markAsDirty calls
lib/ace/edit_session.js
Outdated
@@ -2430,6 +2443,10 @@ var EditSession = function(text, mode) { | |||
this.$stopWorker(); | |||
}; | |||
|
|||
this.isFullWidth = function(c) { | |||
return isFullWidth(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to do this.isFullWidth = isFullWidth;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@nightwing I addressed all your comments, please have a look at submitted changes. |
@nightwing Did @ashensis answered all your comments ? Are you happy with his changes ? |
looks good, i've merged with some trivial style changes, and will try using it for a while. Thanks for all your hard work on implementing this, and your persistence and patience for getting this merged. |
Thank you @Nigtwing |
I am very eager to test this too. How do I access a build with this change included? |
@lyndondrake clone this repository, install node, then run
or simply open https://rawgit.com/ajaxorg/ace/255c3a1793276b77eeba90e40df99c3ccc941780/kitchen-sink.html |
This may not be exactly related to this PR but asking here as most knowledgeable people seem to be here: The spacing issues as described at #2667 and #3037 seem to still not be fixed even at https://rawgit.com/ajaxorg/ace/255c3a1793276b77eeba90e40df99c3ccc941780/kitchen-sink.html -- is that expected? For example if I put some random Indic text on a long line:
and click somewhere in it and hit Enter, the line break gets inserted somewhere other than where the cursor was. Is there a way to resolve it by using a different font? Because it would be surprising if RTL text with varying char widths can work, but not left-to-right. |
(All I wanted was a textarea into which I can put say 2 MB of plain text, with some minimal highlighting for From what I can understand of this PR by looking at the code, it looks like there are two parts to this support for Arabic/Hebrew/Persian:
This PR does both! But what we need for supporting Indic scripts (or even characters like ⟦ ♞ ℝ as on the linked issues) is just the first part (correctly handling characters whose widths vary). Is there a way of extracting out just the first part, so that it would be more useful? |
Initially chose Ace for "performance" but the text so far is barely 1MB, also Ran into issues with Indic text: ajaxorg/ace#3137 (comment) So trying CodeMirror, which certainly has the features I want (https://codemirror.net/demo/bidi.html, https://codemirror.net/demo/variableheight.html) and doesn't have these bugs.
So am I correct in understanding that glyph fallback replacement on Macs breaks this font width calculation, since it replaces with glyphs that are not monospaced ? Maybe there is a font feature setting that we can use to correct this ? I'm not sure.. anyone more experience with fonts than me ? :) |
Hi @shreevatsa. |
Hi @hartman. |
Hi @ashensis -- from your comments it looks like you've done most of the work for supporting variable-width left-to-right scripts, and only some minor tweak remains. I am hopeful: if you have a few minutes could you look into the example I posted earlier (#3137 (comment) : you can install a Devanagari font like Noto Sans Devanagari if it doesn't render) and see why it has the issues I mentioned? My guess is that, if the text is simply detected / marked as being "bidi" (in terms of your code in this PR — even if it's strictly left-to-right), then everything would actually work, which would be great. AFAIK the Ace team has no other plans in progress for variable-width characters, and as you've already done such a lot of work I feel there's probably only a minor change or two required for everything to work perfectly. |
Hi @shreevatsa. lastly I said that this PR was targeting only Arabic/Hebrew. |
I didn't try to install Devanagari font but if you already have it installed, you can play around by measuring the widths of various Indic characters to see if it is truly monospaced from this respect. |
And then to measure respecting character for script being configured. |
@ashensis Thanks for that explanation, it was really helpful and cleared up my misunderstanding. I somehow seem to have missed that what's now supported is monospaced Arabic/Hebrew; I didn't even know those existed. :-) Thanks, guess the goal is further than I thought (as I'm not aware of monospaced Indic fonts, and what's really needed is variable-width support which as you say implies performance costs). Sorry for the noise arising from my misunderstanding. |
This PR is intended to provide ACE with proper Arabic/Hebrew/Farsi (aka. Bidi) characters typing/display. (references the discussion under issue 'RTL editing' #1429)
Currently ACE fails to do this for 2 primary reasons.
ACE bases itself on monospaced font in calculating screen columns offsets.
Unfortunately, Bidi character's width differs from that of regular (say English)
characters. This poses challenge, especially when Bidi and none Bidi (English or neutral) characters
are present in the same line.
Moreover, for Chrome engine based browsers (like Chrome and Opera), on some OS the default 'Monako' font
defined in editor.css is not monospaced when it comes to Bidi characters.
This might happen or not happen even on different Windows7 systems.
Apparently this happens since 'Monaco' has no glifs for Bidi characters and font substitution
on different OS(s) picks up different fonts from awailable stock.
Bidi languages differ from none Bidi ones in their flow direction (from right to left),
thus, character order on display (right-to-left) differs from respective order in text buffer (left-to-right).
The respective character positions on display versus buffer are defined by Bidi algorithm implemented by platform
(either by OS for native applications or by browser for web based applications).
The correspondence between respective character positions on display (aka Visual) and in buffer (aka Logical)
is defined by Visual<->Logical maps produced by Bidi Engine implementing Bidi algorithm.
Between additional miscellaneous challenges it worth mentioning:
For references to Bidi algorithm see, for instance, http://unicode.org/reports/tr9/
For general references regarding Bidi see, for instance, https://www.w3.org/International/articlelist#direction
As a consequence, the character screen position (offset) in pure Bidi or mixed Bidi/None-Bidi text lines
should be calculated basing on character screen position rather then buffer one.
The following implementation details are important for understanding how this PR deals with aforementioned problems.
Statefull 'bidihandler' class is introduced to concentrate all Bidi related activities.
It depends on lightweight Bidi engine implemented by library class 'bidiutil'.
'bidihandler' class stores information related to text line under processing, this info
consists of so named 'bidiMap' that contains Bidi levels and Visual<->Logical map (see above for terms definition).
Above mentioned information is updated when caret is moved to another text line.
In order to cope with the problem of calculating correct horizontal offsets of displayed characters
'bidihandler' class stores width of Bidi characters alongside with widths for None Bidi characters, end-of-lines etc.
In cases when 'Monaco' font doesn't furnish monospaced glyphs for Bidi characters, this font gets dynamically
replaced by "Courier' font on per line basis (for lines containing Bidi characters)
Special methods of 'bidihandler' class, like 'getPosLeft', 'offsetToCol', 'getSelections' ensure correct caret display,
calculation of mouse click position and text (single or multiple) selection.
This solution has been tested on IE, FF, Chrome/Opera, Safary.
The test cases will be added to this PR after current approach is approved and code implementation is finalized.