nasin Token
This document describes ilo Token's own style of Toki Pona, focusing mainly on its grammar.
ilo Token is inherently prescriptive: it's a bunch of rules, and that's just how ilo Token is. We've tried our best to keep everything as descriptive as possible. ilo Token's limitations are documented, and its error messages say "unexpected" rather than "incorrect". However, ilo Token also uses telo misikeke for error messages, and those are out of our control.
But at its core, ilo Token is still a bunch of rules. This document describes ilo Token's grammar rules; they are never meant for people to abide by. Don't try to learn Toki Pona from this document; you'll only be confused.
Just remember that these rules can change. You're welcome to challenge ilo Token's nasin to make it better.
When you encounter an "a priori term", it means the term was invented solely to have a descriptor inside ilo Token. Better terms may exist; if you know one, please tell us. jan Koko, one of the main developers, is not a linguist but a programmer with a background in programming language development, so you'll see them borrowing terms from that field, such as "Abstract Syntax Tree".
"Word unit" is an a priori term describing a single word, or multiple words acting as a single unit, that can serve as a headword, modifier, preverb, preposition, etc.
These are:
- A single word
- X ala X constructions
- Reduplication such as "mute mute" in "pona mute mute"
- Number phrase such as "tu wan" in "kili tu wan"; ilo Token recognizes nasin nanpa pona
- Any of these followed by emphasis particle "a"
Defining word units lets us handle more complex phrases: in "sina sona mute ala mute", "sona mute ala mute" is one of the phrases, with "sona" as the headword and "mute ala mute" as the modifier.
Word units are either headed or unheaded. Only number phrases are unheaded; all the others are headed. This distinction matters for preverbs and prepositions: they can take multiple forms, such as "wile a moku", "wile ala wile moku", "wile wile moku", etc., but never a number phrase, so preverbs and prepositions can only take headed word units.
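To make the word unit kinds and the headed/unheaded split concrete, here is a minimal Python sketch. It is not ilo Token's actual code; the function names and the small number table are hypothetical. It also assumes a number phrase has at least two words in descending order, so a lone "tu" is treated only as a single word. A word list may match several kinds at once (for example "mute mute"), so the sketch returns every possible reading.

```python
# Hypothetical number table for this sketch; "ala" is left out because it
# only ever stands alone as 0.
NUMBER_VALUES = {"ale": 100, "ali": 100, "mute": 20, "luka": 5, "tu": 2, "wan": 1}

def word_unit_kinds(words):
    """Possible word unit readings of `words` as (kind, headed) pairs; [] if none."""
    if len(words) > 1 and words[-1] == "a":
        words = words[:-1]  # the emphasis particle "a" may follow any word unit
    kinds = []
    if len(words) == 1:
        kinds.append(("single word", True))
    if len(words) == 3 and words[1] == "ala" and words[0] == words[2]:
        kinds.append(("X ala X", True))
    if len(words) >= 2 and all(w == words[0] for w in words):
        kinds.append(("reduplication", True))
    values = [NUMBER_VALUES.get(w) for w in words]
    if len(words) >= 2 and None not in values and values == sorted(values, reverse=True):
        kinds.append(("number phrase", False))  # the only unheaded kind
    return kinds

def usable_as_preverb(words):
    """Preverbs (and prepositions) only take headed word units."""
    return any(headed for _, headed in word_unit_kinds(words))
```

With this, "wile a", "wile ala wile" and "wile wile" all qualify as preverbs, while "tu wan" does not, because its only reading is an unheaded number phrase.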
Don't forget that ilo Token considers many possible grammatical constructions. For example, in "kili tu wan", "tu wan" can be grouped and treated as one word unit, or separated and treated as two distinct modifiers. The separated interpretation is translated as "2 singular fruits".
We've come up with an a priori term for this: partial parsing. Partial parsing happens when words that could be grouped into one word unit are separated instead, as in the example above.
Partial parsing can be very granular. For example, in "kili luka tu wan", "luka" can be separated from "tu wan" while "tu wan" stays grouped together.
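One way to picture partial parsing is to enumerate every way of cutting a modifier sequence into contiguous groups. The Python sketch below does exactly that; `groupings` is a hypothetical helper, not ilo Token's implementation. A sequence of n words has n - 1 possible cut points, so it yields 2^(n-1) groupings.

```python
from itertools import combinations

def groupings(words):
    """Every way to split `words` into contiguous, non-empty groups."""
    n = len(words)
    result = []
    for k in range(n):  # k = number of cut points used
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            result.append([words[a:b] for a, b in zip(bounds, bounds[1:])])
    return result
```

For "luka tu wan" this gives four groupings: "luka tu wan" as one group, "luka" + "tu wan", "luka tu" + "wan", and three separate words.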
ilo Token considers all of these groupings when parsing Toki Pona text. However, that doesn't mean every one of them is translated and shown in the final output. ilo Token avoids multiple numeral or quantifier determiners: you won't see "5 3 fruits", but you can see "3 hand-related fruits", since "hand-related" is an adjective.
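The determiner rule can be sketched as a filter over readings. In this Python sketch, which is an illustration and not ilo Token's translator, each modifier group is read either as a number or, for a few single words, as an adjective. Any combination with more than one numeral is dropped. The tiny lexicon (`NUMBER_VALUES`, `ADJECTIVES`), the naive pluralization, and the function names are all hypothetical, and groups are assumed to already be in valid descending order.

```python
from itertools import product

# Hypothetical mini-lexicon for illustration only; not ilo Token's real data.
NUMBER_VALUES = {"wan": 1, "tu": 2, "luka": 5, "mute": 20}
ADJECTIVES = {"luka": "hand-related", "wan": "singular"}

def readings(group):
    """Possible readings of one modifier group: ("num", value) or ("adj", text)."""
    options = []
    if all(w in NUMBER_VALUES for w in group):
        options.append(("num", sum(NUMBER_VALUES[w] for w in group)))
    if len(group) == 1 and group[0] in ADJECTIVES:
        options.append(("adj", ADJECTIVES[group[0]]))
    return options

def translations(noun, groups):
    """English renderings of a noun plus modifier groups, keeping at most one numeral."""
    results = []
    for combo in product(*(readings(g) for g in groups)):
        numbers = [v for kind, v in combo if kind == "num"]
        adjectives = [v for kind, v in combo if kind == "adj"]
        if len(numbers) > 1:
            continue  # would give something like "5 3 fruits", so it is skipped
        head = noun if numbers == [1] else noun + "s"  # naive pluralization
        results.append(" ".join([str(n) for n in numbers] + adjectives + [head]))
    return results
```

Grouped as "luka" + "tu wan", the only surviving output is "3 hand-related fruits". Grouped as "tu" + "wan", it is "2 singular fruits", matching the "kili tu wan" example earlier.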
There are exceptions to partially parsing word units: X ala X constructions and reduplicated modifiers such as "mute mute" in "pona mute mute" aren't partially parsed unless this is turned on in the settings. Phrases such as "sitelen sitelen" can still be partially parsed, since the first "sitelen" is a headword, not a modifier.
From now on, this document assumes any combination is possible unless otherwise specified.
On X ala X constructions: perhaps this is a bias on jan Koko's part, but X ala X seems to be very rarely meant as anything other than an interrogative construction (this needs validation). Not splitting it also helps reduce the number of outputs.
On reduplicated modifiers: perhaps this is another bias on jan Koko's part. However, the main reason partially parsing them is disallowed is to reduce the number of outputs. This will be explained later, as it needs further context.
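The exceptions above amount to a small check on the modifier before it is split. The Python sketch below is a hypothetical illustration. `allow_split` stands in for the setting that turns partial parsing of these constructions back on; its name is an assumption, not ilo Token's actual option. The headword is separated out first, which is why "sitelen sitelen" is not treated as a reduplicated modifier.

```python
def is_x_ala_x(words):
    """True for X ala X constructions such as "mute ala mute"."""
    return len(words) == 3 and words[1] == "ala" and words[0] == words[2]

def is_reduplication(words):
    """True for reduplicated words such as "mute mute"."""
    return len(words) >= 2 and all(w == words[0] for w in words)

def may_split_modifier(modifier, allow_split=False):
    """Whether a multi-word modifier may be partially parsed into smaller groups.

    `allow_split` is a hypothetical stand-in for the corresponding setting.
    """
    if allow_split:
        return True
    return not (is_x_ala_x(modifier) or is_reduplication(modifier))

def split_phrase(phrase):
    """Separate the headword from its modifiers; the headword is never a modifier."""
    return phrase[0], phrase[1:]
```

For "pona mute mute" the modifier is "mute mute", so splitting is refused by default. For "sitelen sitelen" the modifier is just "sitelen", and the reduplication check never sees the headword.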
ilo Token recognizes nasin nanpa pona, and the pu system is recognized as usual. As a recap, the pu system works by adding up the values of the number words:
Word | Number |
---|---|
ala | 0 |
wan | 1 |
tu | 2 |
luka | 5 |
mute | 20 |
ale/ali | 100 |
So 6 is "luka wan" and 36 is "mute luka luka luka wan".
This isn't one-to-one: 20 could be "mute", but it could also be "luka luka luka luka". This fact is often used for disambiguation.
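To see how many spellings a single value can have, this Python sketch lists every pu spelling of a number below 100, each combination once, in descending order (the order ilo Token requires). The `spellings` function is hypothetical. It leaves out "ale"/"ali" to keep the search small, and it leaves out "ala" because "ala" cannot be used for adding.

```python
# Pu number words below 100, largest first (hypothetical table for this sketch).
WORDS = [("mute", 20), ("luka", 5), ("tu", 2), ("wan", 1)]

def spellings(n, start=0):
    """All descending-order pu spellings of n using the words WORDS[start:]."""
    if n == 0:
        return [[]]  # nothing left to add
    result = []
    for i in range(start, len(WORDS)):
        word, value = WORDS[i]
        if value <= n:
            # Only recurse into words at index >= i, so order never ascends.
            for rest in spellings(n - value, i):
                result.append([word] + rest)
    return result
```

For 20 this finds 30 spellings, from "mute" and "luka luka luka luka" down to twenty "wan"s.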
There is a restriction, however: the words must be in descending order. So it's "luka wan", not "wan luka". Restricting the order seems necessary for supporting nasin nanpa pona, where order partly matters.
Another restriction is that "ala" can only be used to mean 0; it cannot be used in addition, so "luka ala" cannot mean 5. This restriction seems obvious but is worth mentioning for completeness.
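Putting the addition rule and both restrictions together, a pu-style number phrase can be checked and valued as in the Python sketch below. `parse_pu` is a hypothetical name, and the sketch covers the pu system only. It does not handle nasin nanpa pona, where "ale" works differently.

```python
# Pu values; "ala" is handled separately because it only ever means 0 on its own.
PU_VALUES = {"ale": 100, "ali": 100, "mute": 20, "luka": 5, "tu": 2, "wan": 1}

def parse_pu(words):
    """Value of a pu-style number phrase, or None if it breaks the rules."""
    if words == ["ala"]:
        return 0
    if not words or any(w not in PU_VALUES for w in words):
        return None  # also rejects "ala" used for adding, e.g. "luka ala"
    values = [PU_VALUES[w] for w in words]
    if any(a < b for a, b in zip(values, values[1:])):
        return None  # not in descending order, e.g. "wan luka"
    return sum(values)
```

For example, "mute luka luka luka wan" yields 36, while "wan luka" and "luka ala" are rejected.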
TODO
For completeness, here's the complete syntax rule in EBNF:
```ebnf
sub hundred = {"mute"}, {"luka"}, {"tu"}, {"wan"};
non empty sub hundred = sub hundred - "";
ale = "ale" | "ali";
hundredths unit = non empty sub hundred, ale, {ale};
number = "ala"
       | hundredths unit, {hundredths unit}, sub hundred
       | {ale}, sub hundred;
```
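Because this grammar has no recursion, it can be transcribed into a regular expression to test which word sequences it accepts. The Python sketch below does this. The one-letter encoding, the constant names and `is_number` are hypothetical, and the sketch only recognizes numbers; it does not compute their values. The lookbehind `(?<=[MLTW])` plays the role of "non empty sub hundred". The `{ale}, sub hundred` alternative as written also matches an empty sequence, so the sketch rejects empty input up front.

```python
import re

# One letter per number word so the EBNF rules can be written as a regex.
LETTERS = {"mute": "M", "luka": "L", "tu": "T", "wan": "W",
           "ale": "A", "ali": "A", "ala": "Z"}

SUB_HUNDRED = "M*L*T*W*"                         # sub hundred
HUNDREDTHS_UNIT = SUB_HUNDRED + "(?<=[MLTW])A+"  # non empty sub hundred, ale, {ale}
NUMBER = re.compile(
    "Z"                                          # "ala"
    f"|(?:{HUNDREDTHS_UNIT})+{SUB_HUNDRED}"      # hundredths unit, {hundredths unit}, sub hundred
    f"|A*{SUB_HUNDRED}"                          # {ale}, sub hundred
)

def is_number(words):
    """True if the word list matches the `number` rule (empty input rejected)."""
    if not words or any(w not in LETTERS for w in words):
        return False
    return NUMBER.fullmatch("".join(LETTERS[w] for w in words)) is not None
```

Under this grammar, "tu ale luka" and "ale ale tu" are accepted, while "ale tu ale", "wan luka" and "luka ala" are not.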