Notation for inline auxiliary for any kind of character set #177

MrBrezina · 2024-08-27T10:47:43Z

There are situations when we want to list auxiliary characters for other kinds of things (e.g. punctuation, numerals). Perhaps it would be better to have an inline notation for optional/auxiliary characters that could be used in any list of characters.

Instead of:

base: a b c
auxiliary: x y z

we could have the following (or use a different escape character):

base: a b c \x \y \z
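A minimal sketch of how such a value might be split back into the two levels. The function name `split_inline` and the choice of backslash as the default escape are assumptions for illustration, not an existing API:

```python
def split_inline(value: str, escape: str = "\\") -> tuple[list[str], list[str]]:
    """Split a space-separated character list into (base, auxiliary).

    Hypothetical helper: a token prefixed with the escape character is
    treated as auxiliary, everything else as base.
    """
    base, auxiliary = [], []
    for token in value.split():
        # A lone escape character is kept as a literal base character.
        if token.startswith(escape) and len(token) > len(escape):
            auxiliary.append(token[len(escape):])
        else:
            base.append(token)
    return base, auxiliary


base, auxiliary = split_inline(r"a b c \x \y \z")
print(base)       # the unescaped tokens
print(auxiliary)  # the escaped tokens, with the escape removed
```

The same helper would work for any attribute (punctuation, numerals, ...), which is the point of the inline notation.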

kontur · 2024-08-28T15:12:19Z

I'm torn on this one. In a way the base vs auxiliary is a very binary distinction, and as you mention, it could be extended to more than just the characters of base. However, adding more implicit notation seems like it will be less clear and less simple to author.

That said, this would be pretty neat. What if any auxiliary chars were placed in parentheses? I think that does not interfere with YAML parsing, but it would require custom parsing of all YAML strings:

name: English
orthographies:
- autonym: English
  characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Œ a b c d e f g h i j k l m n o p q r s t u v w x y z æ œ (À Á Ç È É Ê Ë Ï Ñ Ô Ö à á ç è é ê ë ï ñ ô ö)
  currency: $ ¢ £ € (¥)
  marks: (◌̀ ◌́ ◌̂ ◌̃ ◌̈ ◌̧)
  numerals: 0 1 2 3 4 5 6 7 8 9
  punctuation: '. , ; : ? ! “ ” ‘ ’ ' ( ) (% & ¿ ¡)

Food for thought.
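To make the "custom parsing" part concrete, here is a rough sketch of how a parenthesised value could be split after YAML has loaded it as a plain string. The function name and the rule for literal parentheses are assumptions: a token that is exactly `(` or `)` is taken as a literal character, so that the punctuation line above can still list them as base.

```python
def split_parenthesised(value: str) -> tuple[list[str], list[str]]:
    """Split 'a b (c d)' style lists into (base, auxiliary). Hypothetical helper."""
    base, auxiliary = [], []
    in_group = False
    for token in value.split():
        # Assumption: a lone "(" or ")" is a literal character, not a delimiter.
        if token in ("(", ")"):
            (auxiliary if in_group else base).append(token)
            continue
        # "(x" opens an auxiliary group; strip the delimiter.
        if not in_group and token.startswith("("):
            in_group = True
            token = token[1:]
        # "x)" closes the group; strip the delimiter.
        closes = in_group and token.endswith(")")
        if closes:
            token = token[:-1]
        if token:
            (auxiliary if in_group else base).append(token)
        if closes:
            in_group = False
    if in_group:
        raise ValueError("unclosed auxiliary group")
    return base, auxiliary


print(split_parenthesised("$ ¢ £ € (¥)"))
print(split_parenthesised(". , ; : ? ! ( ) (% & ¿ ¡)"))
```

One consequence of this rule is that a single literal parenthesis cannot be the first or last member of an auxiliary group without extra escaping, which may be a reason to prefer a different delimiter.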

Also, I wonder if "extended" makes more sense as a term in the docs and CLI parameters. Like checking for basic language support vs checking for extended language support.

kontur · 2024-08-28T15:14:03Z

Also, for now we're not set on what kind of requirement the currency/numerals/punctuation are. We talked about them either as opt-in or auxiliary level requirements, but having this more nuanced notation might open the door to also having some core currency/numerals/punctuation as base level required.

MrBrezina · 2024-08-30T07:57:29Z

So far, I have actually managed without it. See #155. We could use this notation to distinguish the Standard and Alternative notions as described here: https://en.wikipedia.org/wiki/Quotation_mark

moyogo · 2024-08-30T12:31:10Z

Why is putting them in auxiliary not an option?

MrBrezina · 2024-09-04T06:37:12Z

We would not be able to say whether it is punctuation, currency, numeral or character.

moyogo · 2024-09-04T07:24:52Z

Unicode character categories would help but there may be a few exceptions where a character category doesn't match its use in a language orthography system I guess.
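As a rough illustration of that idea, Python's standard `unicodedata.category` can be used to guess which attribute a character belongs to. The mapping and the function name below are assumptions made for this sketch, not something the database defines:

```python
import unicodedata

# Hypothetical mapping from Unicode general categories (or their first
# letter) to the attribute names used in this thread.
CATEGORY_TO_ATTRIBUTE = {
    "Sc": "currency",
    "L": "characters",
    "M": "marks",
    "N": "numerals",
    "P": "punctuation",
}


def guess_attribute(char: str) -> str:
    """Guess the attribute for a single code point from its general category."""
    category = unicodedata.category(char)
    # Try the exact two-letter category first, then its major class.
    return (
        CATEGORY_TO_ATTRIBUTE.get(category)
        or CATEGORY_TO_ATTRIBUTE.get(category[0], "unknown")
    )


for char in ["a", "5", "¥", "\u0300", "!", "\u2019"]:
    print(f"U+{ord(char):04X}", unicodedata.category(char), guess_attribute(char))
```

U+2019 shows the kind of exception mentioned: its category is `Pf` (punctuation), so it is guessed as punctuation even in an orthography that uses it as a letter.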

kontur · 2024-09-04T09:39:30Z

I suppose this very much has pros and cons, regardless of what such an implementation would look like. Either is conceptually nice, having or not having an auxiliary attribute. Not having it, we're not over-crowding the attributes and leaving the reader to guess what exactly "auxiliary" means. Syntactic highlighting of such characters could be more readable overall and, in the case at hand, their categories would be obvious. However, having the dedicated attribute is a clear signal of the different levels and that the database does indeed make this distinction.

> Unicode character categories would help but there may be a few exceptions where a character category doesn't match its use in a language orthography system I guess.

Yes, I was thinking about this as well, and had the same reservation. Firstly, it's just less distinct, but secondly, I too saw cases where e.g. modifier characters or apostrophe-like symbols may be auxiliary, and it is unclear whether they are punctuation or character.

What do we think of the above proposed (...) notation? I think it would be "typographically" easy to comprehend and doesn't add more programmery syntax to editing the database files.

MrBrezina added the question Further information is requested label Aug 27, 2024

MrBrezina assigned MrBrezina and kontur Aug 27, 2024
