[i18n] Support of non-european languages and non-latin scripts #36
Comments
Same solution might help us to get rid of
Maybe a good solution would be making it look like this:
I fully agree.
I agree this is probably the best option. I can see three issues:
But:
Agreed.
I'm not sure, I think you're right. Regardless, all other attributes in the root element refer to the format itself (format; revision), all details of the dictionary content are in
Here I disagree. Transliteration, pronunciation and key-phrases are very different semantically and should be handled differently, and generally displayed differently, by the DS. This is only possible if they are defined separately.
(I will be using Chinese as an example, the situation should be similar in other non-phonetic scripts.)
E.g.: the word 词 (zh-Hans) or 詞 (zh-Hant), meaning “word”, is pronounced cí (zh-Latn-pinyin). 词 and 詞 are exactly the same word; cí is just how it is pronounced. It's not a key-phrase: a dozen other words have the same pronunciation. So if I type 词, the DS should show this word's entry, but if I type cí, it should show a list of entries with the same pronunciation, including 瓷 porcelain, 雌 female, 磁 magnetism, 慈 compassion, etc., all of which are pronounced cí. All of these words are
Also visually these tags are different,
In short, I believe the transliteration is semantically (very) different from a key-phrase and should be displayed differently by the DS. It is not a headword and should not be defined as such. The transliteration is very important for non-phonetic writing systems, and defining it separately from the headword can make it easier for DS developers to support these languages better.
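To make the distinction concrete, here is a minimal sketch of how a DS might treat the two kinds of query differently. The entry layout, the index names, and the `lookup` function are all hypothetical and only illustrate the idea; they are not part of any proposed format.

```python
import unicodedata
from collections import defaultdict

# Hypothetical in-memory entries, using the words from the example above.
ENTRIES = [
    {"headwords": ["词", "詞"], "translit": "cí", "gloss": "word"},
    {"headwords": ["瓷"], "translit": "cí", "gloss": "porcelain"},
    {"headwords": ["雌"], "translit": "cí", "gloss": "female"},
    {"headwords": ["磁"], "translit": "cí", "gloss": "magnetism"},
    {"headwords": ["慈"], "translit": "cí", "gloss": "compassion"},
]


def nfc(text):
    # Compare strings in one Unicode normalization form (NFC).
    return unicodedata.normalize("NFC", text)


def build_indexes(entries):
    # Headwords map to a single entry (homographs are ignored in this sketch);
    # a transliteration maps to every entry that shares it.
    by_headword = {}
    by_translit = defaultdict(list)
    for entry in entries:
        for headword in entry["headwords"]:
            by_headword[nfc(headword)] = entry
        by_translit[nfc(entry["translit"])].append(entry)
    return by_headword, by_translit


def lookup(query, by_headword, by_translit):
    # A headword hit returns that one entry; otherwise the query is treated
    # as a transliteration and returns the list of entries pronounced that way.
    query = nfc(query)
    if query in by_headword:
        return [by_headword[query]]
    return list(by_translit.get(query, []))


BY_HEADWORD, BY_TRANSLIT = build_indexes(ENTRIES)
```

With these indexes, `lookup("词", BY_HEADWORD, BY_TRANSLIT)` and `lookup("詞", BY_HEADWORD, BY_TRANSLIT)` return the same single entry, while `lookup("cí", BY_HEADWORD, BY_TRANSLIT)` returns all five.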
Transcription and pronunciation are different from transliteration: they are used for languages that already use alphabetic scripts; they're not nearly as important (many dictionaries don't have them); they're not standardised (different dictionaries will have different pronunciations for the same word, even if published in the same country); the DS is not expected to recognize different ways of inputting them (such as v for ü, or ci2 for cí); and, at least in the case of IPA transcription, many (maybe most) users don't even know how to read it. Clearly, they should not be indexed (at least by default), as no one looks up a word by pronunciation in European languages. These two ways of representing may possibly be defined by the same element with different attributes, but it should not be (Note:
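For transliterations, by contrast, a DS might want to accept the input variants mentioned above (v for ü, ci2 for cí). The following is only a rough sketch of that normalization for a single pinyin syllable; the function name and the simplified tone-placement rule (a or e first, then the o of "ou", otherwise the last vowel) are assumptions made for illustration, not something any DS is known to implement this way.

```python
import unicodedata

# Combining diacritics for tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is ü


def normalize_pinyin(syllable):
    """Turn numbered input such as 'ci2' or 'nv3' into tone-marked pinyin."""
    s = syllable.lower().replace("v", "\u00fc")
    if not s or s[-1] not in "12345":
        return unicodedata.normalize("NFC", s)  # no tone digit: leave as typed
    base, tone = s[:-1], s[-1]
    if tone == "5":
        return unicodedata.normalize("NFC", base)
    # Simplified placement rule (an assumption of this sketch).
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        positions = [i for i, ch in enumerate(base) if ch in VOWELS]
        if not positions:
            return base
        idx = positions[-1]
    marked = base[: idx + 1] + TONE_MARKS[tone] + base[idx + 1 :]
    # NFC composes the base letter and the combining mark into one character.
    return unicodedata.normalize("NFC", marked)
```

The normalized form could then be used as the key into a transliteration index, so that typing ci2 and typing cí find the same list of entries.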
If I understood this correctly, the point of
I find this much clearer. The DS would list it under U, not T, and if a user started typing "United", he should still easily find this entry. [I'm sorry I'm not able to reply in a more timely manner. I am overloaded with work, and it took me several days just to reply to this comment of yours. And it might still not be very clear, you know, "if I had the time, I'd write a shorter reply". I am reading the comments of this repo as they are made, I just need time to reply.]
Choosing between
I read that article before. If I understand it correctly, it's appropriate use
You are much more knowledgeable about this than me, so I'm not going to give opinions about this. I think an actual example with different writing systems (simplified and traditional Chinese), variant pronunciations (Mainland and Taiwan) and different transliterations (pinyin and bopomofo) would be helpful. This is how the 各个, "every", entry from the Cross-Strait Dictionary looks in GoldenDict in my current conversion:
I believe in your current proposal (and retaining my insistence on transliteration element) this would be:
Is this right? This works for me as it allows for script variants and transliterations, as well as regional differences. It is already a huge improvement for languages in non-alphabetic scripts. However, this is what I would prefer (while still using the BCP47-mandated ISO codes):
This is because it just looks much clearer, but also because it states plainly what is being defined.
I think that it is important to introduce new tags slowly, since dictionary software is not very fast to accommodate changes. I have 2 solutions:
<ar>
<k xml:lang="zh-Hans">各个</k>
<k xml:lang="zh-Hant">各個</k>
<k xml:lang="zh-Latn-pinyin-CN">ɡèɡè</k>
<k xml:lang="zh-Latn-pinyin-TW">ɡèɡe</k>
<k xml:lang="zh-Bopo-CN">ㄍㄜˋ ㄍㄜˋ</k>
<k xml:lang="zh-Bopo-TW">ㄍㄜˋ ˙ㄍㄜ</k>
<def>
...
</def>
</ar>
Either way, all
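As a rough illustration of how a DS could consume such an article, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `keys_by_language` helper is hypothetical; the only library-specific detail is that ElementTree exposes `xml:lang` under the predefined XML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The article from the example above, with the definition elided.
ARTICLE = """<ar>
  <k xml:lang="zh-Hans">各个</k>
  <k xml:lang="zh-Hant">各個</k>
  <k xml:lang="zh-Latn-pinyin-CN">ɡèɡè</k>
  <k xml:lang="zh-Latn-pinyin-TW">ɡèɡe</k>
  <k xml:lang="zh-Bopo-CN">ㄍㄜˋ ㄍㄜˋ</k>
  <k xml:lang="zh-Bopo-TW">ㄍㄜˋ ˙ㄍㄜ</k>
  <def>...</def>
</ar>"""


def keys_by_language(article_xml):
    # Map each <k> element's xml:lang value to its text.
    # A <k> without xml:lang ends up under the key None.
    root = ET.fromstring(article_xml)
    return {k.get(XML_LANG): (k.text or "") for k in root.iter("k")}


KEYS = keys_by_language(ARTICLE)
```

A DS that does not understand `xml:lang` would simply see six `<k>` elements, which is what makes this variant backward compatible.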
Here is a list of proposals:
1. Writing systems and scripts
@k-sl wrote:
Proposed solution:
We allow putting <k> with and without a specification of which language, script, or country variant this <k> is:
How to encode languages and scripts? The most reasonable approach, and the one that takes the least work, is to use the BCP 47 standard to support various writing systems.
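To show what a DS would have to do with such tags, here is a minimal sketch that splits a tag into language, script, region, and variants by the shape of each subtag. It is a deliberate simplification, not a full BCP 47 parser, and the `split_tag` name is made up for this example. One caveat: BCP 47 orders subtags as language-script-region-variant, so a tag like zh-Latn-pinyin-CN would, as far as I can tell, need to be written zh-Latn-CN-pinyin to be well-formed; the sketch classifies by shape rather than position, so it tolerates both.

```python
def split_tag(tag):
    """Split a language tag into its parts by subtag shape (simplified)."""
    parts = tag.split("-")
    info = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha() and info["script"] is None:
            info["script"] = sub.title()  # four letters: script, e.g. Hans, Latn, Bopo
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            info["region"] = sub.upper()  # two letters or three digits: region
        else:
            info["variants"].append(sub.lower())  # anything else: treated as a variant
    return info
```

For example, `split_tag("zh-Latn-CN-pinyin")` yields language `zh`, script `Latn`, region `CN`, and the variant `pinyin`, which would let a DS decide that this key is a transliteration rather than a headword in the native script.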
What to do with multilingual dictionaries?
lang_to and lang_from to support xml:lang and allow us to encode several languages for multilingual dictionaries. This is not possible with <!ATTLIST>, I think. So I guess we will have to create a new <!ELEMENT> inside meta_info. Am I wrong?
For this reason, I don't think that we need additional tags <tl> (for transliteration) and <pr> (for pronunciation).