Notation for inline auxiliary for any kind of character set #177

Open
MrBrezina opened this issue Aug 27, 2024 · 7 comments

@MrBrezina
Member

There are situations when we want to list auxiliary characters for other kinds of things (e.g. punctuation, numerals). Perhaps it would be better to have an inline notation for optional/auxiliary characters that could be used in any list of characters.

Instead of:

base: a b c
auxiliary: x y z

we could have the following (or use a different escape character):

base: a b c \x \y \z
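
As a rough illustration of how the escaped form could be read back into two sets, here is a minimal Python sketch. The function name `split_inline_auxiliary` and the rule that a lone escape character stays a literal are assumptions made for this example, not anything the project defines. Note that in an unquoted YAML scalar the backslash is kept literally, whereas in a double-quoted scalar `\x` would be read as a hex escape, so this only works on plain or single-quoted values.

```python
def split_inline_auxiliary(value, escape="\\"):
    """Split a space-separated character list into (base, auxiliary).

    Hypothetical helper: tokens prefixed with the escape character are
    treated as auxiliary, everything else as base.
    """
    base, auxiliary = [], []
    for token in value.split():
        # A lone escape character is kept as a literal base character.
        if token.startswith(escape) and len(token) > len(escape):
            auxiliary.append(token[len(escape):])
        else:
            base.append(token)
    return base, auxiliary


base, aux = split_inline_auxiliary(r"a b c \x \y \z")
print(base)  # ['a', 'b', 'c']
print(aux)   # ['x', 'y', 'z']
```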

MrBrezina added the "question" label Aug 27, 2024
@kontur
Contributor

kontur commented Aug 28, 2024

I'm torn on this one. In a way the base vs auxiliary is a very binary distinction, and as you mention, it could be extended to more than just the characters of base. However, adding more implicit notation seems like it will be less clear and less simple to author.

That said, this would be pretty neat. What if any auxiliary chars were put in parentheses? (I think that doesn't interfere with YAML parsing, but it would need custom parsing of all YAML strings.)

name: English
orthographies:
- autonym: English
  characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Œ a b c d e f g h i j k l m n o p q r s t u v w x y z æ œ (À Á Ç È É Ê Ë Ï Ñ Ô Ö à á ç è é ê ë ï ñ ô ö)
  currency: $ ¢ £ € (¥)
  marks: (◌̀ ◌́ ◌̂ ◌̃ ◌̈ ◌̧)
  numerals: 0 1 2 3 4 5 6 7 8 9
  punctuation: '. , ; : ? ! “ ” ‘ ’ ' ( ) (% & ¿ ¡)

Food for thought.
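
To make the "custom parsing" part concrete, here is a minimal Python sketch of how a parenthesized group could be split out of a string after the YAML has been loaded. The name `split_parenthesized` is hypothetical, and the rule that a standalone `(` or `)` token counts as the literal character (so that punctuation lists can still contain parentheses) is an assumption of this sketch, not a decided convention.

```python
def split_parenthesized(value):
    """Split a space-separated list into (base, auxiliary).

    Hypothetical helper: tokens between a token starting with "(" and a
    token ending with ")" are treated as auxiliary.
    """
    base, auxiliary = [], []
    in_group = False
    for token in value.split():
        # Assumption: a lone "(" or ")" is the literal character itself.
        if token in ("(", ")"):
            (auxiliary if in_group else base).append(token)
            continue
        if not in_group and token.startswith("("):
            in_group = True
            token = token[1:]
        closes = in_group and token.endswith(")")
        if closes:
            token = token[:-1]
        if token:
            (auxiliary if in_group else base).append(token)
        if closes:
            in_group = False
    return base, auxiliary


print(split_parenthesized("$ ¢ £ € (¥)"))
# (['$', '¢', '£', '€'], ['¥'])
print(split_parenthesized(". , ( ) (% & ¿ ¡)"))
# (['.', ',', '(', ')'], ['%', '&', '¿', '¡'])
```

The literal-parenthesis rule is the fragile part: it relies on whitespace around `(` and `)` when they are meant as characters, which may or may not be acceptable for people editing the files by hand.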

Also, I wonder if "extended" makes more sense as a term in the docs and CLI parameters. Like checking for basic language support vs checking for extended language support.

@kontur
Contributor

kontur commented Aug 28, 2024

Also, for now we're not set on what kind of requirement the currency/numerals/punctuation are. We talked about them as either opt-in or auxiliary-level requirements, but having this more nuanced notation might open the door to also having some core currency/numerals/punctuation as base-level required.

@MrBrezina
Member Author

MrBrezina commented Aug 30, 2024

So far, I have actually managed without it. See #155. We could use this notation to distinguish the Standard and Alternative notions as described here: https://en.wikipedia.org/wiki/Quotation_mark

@moyogo
Contributor

moyogo commented Aug 30, 2024

Why is putting them in auxiliary not an option?

@MrBrezina
Member Author

MrBrezina commented Sep 4, 2024 via email

@moyogo
Contributor

moyogo commented Sep 4, 2024

Unicode character categories would help but there may be a few exceptions where a character category doesn't match its use in a language orthography system I guess.
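
A small Python sketch of what relying on Unicode general categories might look like, using the standard library's `unicodedata` module. The `guess_bucket` function and its mapping from categories to attribute names are assumptions made up for this illustration. The modifier letter apostrophe (U+02BC) is one example of the kind of exception mentioned above: it looks like punctuation but is categorized as a letter.

```python
import unicodedata


def guess_bucket(char):
    """Guess an attribute name from a character's Unicode general category.

    Hypothetical mapping, for illustration only.
    """
    category = unicodedata.category(char)
    if category == "Sc":  # currency symbols
        return "currency"
    buckets = {"L": "characters", "M": "marks", "N": "numerals", "P": "punctuation"}
    return buckets.get(category[0], "other")


samples = [
    "a",       # Ll, lowercase letter
    "0",       # Nd, decimal digit
    "$",       # Sc, currency symbol
    "'",       # Po, APOSTROPHE
    "\u2019",  # Pf, RIGHT SINGLE QUOTATION MARK
    "\u02bc",  # Lm, MODIFIER LETTER APOSTROPHE
    "\u0300",  # Mn, COMBINING GRAVE ACCENT
]
for char in samples:
    print(f"U+{ord(char):04X} {unicodedata.category(char)} -> {guess_bucket(char)}")
```

Whether U+02BC belongs with the letters or with the punctuation of a given orthography is a per-language decision that the category alone cannot settle.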

@kontur
Contributor

kontur commented Sep 4, 2024

I suppose this very much has pros and cons, regardless of what such an implementation would look like. Both options are conceptually nice, having an auxiliary attribute and not having one. Without it, we avoid over-crowding the attributes and don't leave the reader guessing what exactly "auxiliary" means. Inline syntactic highlighting of such characters could be more readable overall and, in the case at hand, their categories would be obvious. However, having the dedicated attribute is a clear signal of the different levels and that the database does indeed make this distinction.

> Unicode character categories would help but there may be a few exceptions where a character category doesn't match its use in a language orthography system I guess.

Yes, I was thinking about this as well, and had the same reservation. Firstly, it's just less distinct, but secondly, I too saw cases where, e.g., modifier characters or apostrophe-like symbols may be auxiliary, and it is unclear whether they are punctuation or characters.

What do we think of the above proposed (...) notation? I think it would be "typographically" easy to comprehend and doesn't add more programmery syntax to editing the database files.
