Pronunciation Rules
Lexicanter uses a custom-built sound change engine that lets you configure automatic conversion from orthography to phonetic notation.
If you're not familiar with sound change appliers, you might be lost here, and if you are familiar with sound change appliers, you might be wondering how such a thing is supposed to be used to convert your romanization to IPA. In any case, let's start from the beginning.
Let's say you write the phoneme /θ/ as ⟨th⟩. To convert your grapheme ⟨th⟩ to phonetic notation /θ/, we can use a rule.
th > θ
This is straightforward enough. ⟨th⟩ becomes /θ/. But what if /θ/ only occurs before front vowels, and everywhere else it's /tʰ/? How would we write a rule to accommodate this? One way is to use a context.
th > θ / _e
th > θ / _i
th > tʰ
Now we're telling it only to turn ⟨th⟩ into /θ/ if it comes before /e/ or /i/. But this is a little bulky; we can condense these rules.
th > θ / _{e, i}
th > tʰ
These two rules do the exact same thing as the last set, but now we're using an anonymous category {e, i} to mean "either e or i".
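If you already think in regular expressions, the two rules above can be pictured with a rough Python sketch. This is only an analogy, not how Lexicanter's engine is implemented: `apply_rules` is a made-up helper, and I'm assuming rules are applied in order from top to bottom.

```python
import re

# Hypothetical helper: applies (regex, replacement) pairs one after another.
def apply_rules(word, rules):
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

RULES = [
    (r"th(?=[ei])", "θ"),  # th > θ / _{e, i}  (the lookahead consumes nothing)
    (r"th", "tʰ"),         # th > tʰ           (whatever "th" is left over)
]

print(apply_rules("thena", RULES))  # θena
print(apply_rules("thora", RULES))  # tʰora
```

The order matters in this sketch: if the general rule ran first, there would be no ⟨th⟩ left for the contextual rule to match.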
But let's say we have several rules which make use of the same anonymous category:
th > θ / _{e, i}
th > tʰ
c > s / _{e, i}
c > k
g > ʤ / _{e, i}
We can avoid writing {e, i} a bunch of times by defining a named category.
E :: e, i
th > θ / _E
th > tʰ
c > s / _E
c > k
g > ʤ / _E
That's a little better.
But several of these rules are very similar. There's something we can do about that:
E :: e, i
{th, c, g} > {θ, s, ʤ} / _E
{th, c} > {tʰ, k}
We can create anonymous categories of equal length on both sides of the conversion in order to convert between them. We can also do this with named categories, if we prefer:
P :: th, c, g
F :: θ, s, ʤ
K :: tʰ, k, g
E :: e, i
P > F / _E
P > K
This takes up a few more lines, but now we can re-use these named categories in further rules should we need to.
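One way to picture category-to-category conversion is as a position-by-position lookup: the first member of the pattern category maps to the first member of the substitution category, and so on. Here is a small Python sketch of that idea. The names `map_category` and `pronounce` are invented for illustration, and the sketch assumes rules run in order; it is not Lexicanter's actual implementation.

```python
import re

P = ["th", "c", "g"]
F = ["θ", "s", "ʤ"]
K = ["tʰ", "k", "g"]
E = ["e", "i"]

def map_category(word, source, target, following=None):
    """Replace each member of `source` with the member of `target` at the same index."""
    # Try longer members first so "th" is not shadowed by a shorter member.
    members = sorted(source, key=len, reverse=True)
    pattern = "(?:" + "|".join(re.escape(m) for m in members) + ")"
    if following:
        # Context "_E": the match must be followed by a member of `following`.
        pattern += "(?=" + "|".join(re.escape(f) for f in following) + ")"
    lookup = dict(zip(source, target))
    return re.sub(pattern, lambda m: lookup[m.group(0)], word)

def pronounce(word):
    word = map_category(word, P, F, following=E)  # P > F / _E
    return map_category(word, P, K)               # P > K

print(pronounce("thica"))  # θika
print(pronounce("cica"))   # sika
```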
Now that we understand the basics, let's look at how Lexicanter's sound change engine can deal with common and important edge cases.
Before we get into those, though, let's briefly talk about the rule format. On this page, I've been using the formats
pattern > substitution
pattern > substitution / context
In the first case, the implicit context is just _, or 'always'.
But many of you may be familiar with another standard sound change notation format:
pattern/substitution/context
Lexicanter accepts rules written this way as well. In fact, all of the following are valid and will be accepted:
pattern>substitution
pattern / substitution
pattern/substitution / context
pattern > substitution/context
The rule parser ignores all spaces, and the implicit 'always' context shorthand can be written with either > or / as the separator.
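To make the equivalence of these formats concrete, here is a minimal Python sketch of how such a rule string could be normalized. `Rule` and `parse_rule` are hypothetical names, and the approach (strip whitespace, split on either separator, default the context to _) is my assumption about one reasonable way to do it, not a description of Lexicanter's parser.

```python
import re
from typing import NamedTuple

class Rule(NamedTuple):
    pattern: str
    substitution: str
    context: str

def parse_rule(text):
    compact = "".join(text.split())    # drop every space
    parts = re.split(r"[>/]", compact) # ">" and "/" both act as separators
    if len(parts) == 2:
        parts.append("_")              # implicit 'always' context
    if len(parts) != 3:
        raise ValueError(f"cannot parse rule: {text!r}")
    return Rule(*parts)

print(parse_rule("th > θ / _e") == parse_rule("th/θ/_e"))  # True
print(parse_rule("th>θ").context)                          # _
```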
Often, the pronunciation of a grapheme depends on whether it is at the beginning or end of a word. Lexicanter reserves two symbols for this purpose: ^ and #. Both of them have the exact same purpose, and which one you use is a matter of personal preference. I'll be using ^.
Turn ⟨e⟩ into /ə/ at the end of words:
e > ə / _^
Turn ⟨y⟩ to /ɪ/ at the beginning of words:
y > ɪ / ^_
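For readers coming from regular expressions, the boundary symbol behaves roughly like a start or end anchor. The Python sketch below mirrors the two rules above, assuming each input is a single word; `pronounce` is a made-up name and this is an analogy only.

```python
import re

def pronounce(word):
    word = re.sub(r"e\Z", "ə", word)  # e > ə / _^  (only a word-final e)
    word = re.sub(r"\Ay", "ɪ", word)  # y > ɪ / ^_  (only a word-initial y)
    return word

print(pronounce("yeye"))  # ɪeyə
```

The medial ⟨e⟩ and ⟨y⟩ are left alone because neither is next to a word boundary.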
Sometimes a grapheme has the same pronunciation in every case except a specific one. For example, this rule turns all ⟨x⟩ into /ks/ as long as they are preceded by anything except /e/ or /i/:
x > ks / {!e, i}_
Note that this does assume {!e, i} must be some character, just not one of those in the group. If you would like to make this rule also apply at word boundaries, you can write it like so:
x > ks / {^, {!e, i}}_
or write it as two rules:
x > ks / {!e, i}_
x > ks / ^_
There are usually multiple ways to get things to happen.
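The difference between "some character that isn't e or i" and "not e or i, boundary included" maps fairly neatly onto two regex constructs. The following Python sketch is just an analogy with invented function names; it is not taken from Lexicanter.

```python
import re

def strict(word):
    # x > ks / {!e, i}_   : there must BE a preceding character, and it is not e or i
    return re.sub(r"(?<=[^ei])x", "ks", word)

def with_boundary(word):
    # x > ks / {^, {!e, i}}_   : the preceding position is simply not e or i,
    # which is also true at the start of the word
    return re.sub(r"(?<![ei])x", "ks", word)

print(strict("axa"), strict("exa"), strict("xa"))  # aksa exa xa
print(with_boundary("xa"))                         # ksa
```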
Sometimes, you just need to specify that any character at all must be in a certain position. For that, another character is reserved: the period (.).
The following rule turns ⟨h⟩ to /ç/ when it comes before /u/ and two other characters:
h > ç / _u..
Other times, you may need to specify that any number of certain characters can appear. Two more symbols are reserved: * for 0 or more and + for 1 or more.
The following rule turns ⟨y⟩ to /i/ when it comes before zero or more /d/ followed by at least one /o/ and then the end of the word:
y > i / _ d* o+ ^
These two symbols can also be used with categories. The spaces in the rule above are just for readability; the parser ignores all spaces.
There is also a way to specify that something is optional without necessarily being allowed to repeat indefinitely. The character ? is reserved for this.
The following rule changes ⟨n⟩ to /ŋ/ before a single optional /g/ or /k/ followed by the end of the word.
n > ŋ / _ {g, k}? ^
As you can see, this works with categories as well.
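The wildcard and the three quantifiers read much like their regular-expression namesakes. As a rough illustration (an analogy only, with a hypothetical `pronounce` helper and the assumption that each input is one word), the three example rules from this part of the page could be pictured in Python like this:

```python
import re

RULES = [
    (r"h(?=u..)", "ç"),      # h > ç / _u..         (u plus any two characters)
    (r"y(?=d*o+\Z)", "i"),   # y > i / _ d* o+ ^    (zero or more d, one or more o, end)
    (r"n(?=[gk]?\Z)", "ŋ"),  # n > ŋ / _ {g, k}? ^  (optional g or k, then end)
]

def pronounce(word):
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word

print(pronounce("hula"), pronounce("hul"))     # çula hul
print(pronounce("yddoo"), pronounce("ydda"))   # iddoo ydda
print(pronounce("sing"), pronounce("sina"))    # siŋg sina
```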
In rare cases, a grapheme or sequence of graphemes may always be pronounced multiple times. For these cases, we can use _ in the substitution to represent the entire captured pattern.
This is also useful in some cases for auto-inflection rules.
The following rule turns /er/ into /erer/ wherever it comes in the middle of a word:
er > __ / ._.
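In regex terms, _ in the substitution plays a role similar to a reference to the whole match. Here is a small Python sketch of the rule above; `double_medial_er` is an invented name and this is an analogy rather than Lexicanter's implementation.

```python
import re

def double_medial_er(word):
    # er > __ / ._.   : "er" with at least one character on each side,
    # replaced by the matched text written twice
    return re.sub(r"(?<=.)er(?=.)", lambda m: m.group(0) * 2, word)

print(double_medial_er("mera"))  # merera
print(double_medial_er("era"))   # era  (nothing before "er")
print(double_medial_er("mer"))   # mer  (nothing after "er")
```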
When writing rules for auto-inflections, you may need to add prefixes and suffixes to a word. You can write such rules using an insertion format. (For these, # typically looks better than ^.)
Adds a suffix:
/ate/._#
Adds a prefix:
/eta/#_.
Alternatively, you can replace the word boundary itself. Equivalent to the rules above:
# > ate / ._
# > eta / _.
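An insertion rule can be thought of as replacing an empty match at a specific position. The Python sketch below mirrors the suffix and prefix rules above under that reading; the function names are hypothetical, and I'm assuming that the . in the context means the word must contain at least one character.

```python
import re

def add_suffix(word):
    # /ate/._#   : insert "ate" where a character precedes the end of the word
    return re.sub(r"(?<=.)\Z", "ate", word)

def add_prefix(word):
    # /eta/#_.   : insert "eta" where the start of the word precedes a character
    return re.sub(r"\A(?=.)", "eta", word)

print(add_suffix("kal"))  # kalate
print(add_prefix("kal"))  # etakal
```

In this sketch an empty string comes back unchanged, because the . in the context has nothing to match.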