Haiku (Cog)
This cog attempts to find haikus within certain messages and potentially respond if it finds one. It also contains the \syllables command, which returns the number of syllables in a word.
If you're from the future and just want to fix a single word, have a look at the "How to Fix Miscounts" section.
Haikus are detected by counting the number of syllables in each line using the _number_of_syllables_in_word function (see below). Note that only non-bot messages in a channel listed in ALLOWED_CHANNEL_NAMES are checked. The cog also calculates a probability of how likely the haiku is to be intentional or "good". Messages whose syllables fit a haiku (i.e. that can be split into lines of 5-7-5 syllables) are given a default probability of HAIKU_BASE_PROBABILITY. Having punctuation (more specifically, a non-alphanumeric character) at the end of a line increases the probability by an "index" given by HAIKU_PUNCTUATION_PROBABILITY_INCREASE. The probability is modified by an index using the _increased_probability function (see below). Certain words can also increase the probability, by the indexes given in HAIKU_FAVOURITE_WORD_LIST.
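The sketch below illustrates the 5-7-5 check. It assumes the syllable count of each word is already known (the cog obtains these from _number_of_syllables_in_word) and that a line break may only fall between words. The function name split_into_haiku is hypothetical, and the cog's actual splitting logic may differ.

```python
from typing import List, Optional


def split_into_haiku(words: List[str], syllable_counts: List[int]) -> Optional[List[List[str]]]:
    """Hypothetical helper: split words into lines of 5, 7 and 5 syllables.

    Returns the three lines, or None if the words cannot be split that way
    without breaking a word across two lines.
    """
    if len(words) != len(syllable_counts):
        raise ValueError("each word needs exactly one syllable count")
    targets = [5, 7, 5]
    lines: List[List[str]] = []
    current: List[str] = []
    total = 0
    line_index = 0
    for word, count in zip(words, syllable_counts):
        if line_index >= len(targets):
            return None  # words left over after the third line
        current.append(word)
        total += count
        if total == targets[line_index]:
            # This line is exactly full, so start the next one.
            lines.append(current)
            current, total = [], 0
            line_index += 1
        elif total > targets[line_index]:
            return None  # a word would straddle a line boundary
    if line_index != len(targets) or current:
        return None  # too few syllables overall
    return lines
```

In this sketch, a message that overshoots or undershoots any of the three targets returns None. Only messages that fit 5-7-5 exactly produce lines that could then be scored.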
Haiku response probabilities can be updated to reflect how strongly the bot believes that a message is a good haiku. This is done by the _increased_probability function, which takes a prior probability p and an index n. It returns the probability that n coin flips, each landing heads with probability p, would produce at least one head, i.e. 1 - (1 - p)^n. This keeps the result within the range 0 to 1 (for a non-negative index). It also means that an index greater than 1 increases the probability, while an index between 0 and 1 decreases it. A graph of how this function works can be seen below.
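Here is a minimal sketch of the formula described above. It is named increased_probability rather than _increased_probability to avoid implying that it is the cog's actual code. The input validation is an assumption.

```python
def increased_probability(prior: float, index: float) -> float:
    """Probability of at least one success in `index` trials, each succeeding
    with probability `prior`: 1 - (1 - prior) ** index."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability in [0, 1]")
    if index < 0:
        raise ValueError("index must be non-negative")  # keeps the result in [0, 1]
    return 1.0 - (1.0 - prior) ** index


print(increased_probability(0.5, 2))    # index > 1 raises the probability
print(increased_probability(0.5, 0.5))  # 0 < index < 1 lowers it
```

The formula has a useful consequence. Applying it with index a and then with index b gives the same result as applying it once with index a × b, because ((1 - p)^a)^b = (1 - p)^(ab).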
Counting syllables is an ill-defined problem with many subjective aspects, so there are several possible implementations. The aim of the cog is to count syllables using rules that are as general as possible, without outsourcing the problem to an external library or dictionary. This is so that it can be adapted to work with computer science jargon and UQCS slang. That being said, it would be nice to incorporate an external syllable dictionary and include it within the probability calculations.
A YAML file is used for the syllable rules instead of a JSON file purely because YAML allows comments. These comments are needed to record the context and examples for each rule.
The syllable count of a single word is calculated roughly as follows:
1. Convert the word to lowercase and remove emotes.
2. Fix any words that need an extra syllable because of accents (e.g. "résumé" has 3 syllables, whereas "resume" has only 2).
3. Replace or remove any "illegal" characters (i.e. non-alphabetic characters). Accented letters are replaced with the corresponding plain letters, and most other characters are removed. Special care is needed when removing "'s", as this may add another syllable (e.g. "church's").
4. Check if the word is a special exception. These should be very limited.
5. If the word has no vowels, treat it as an acronym, so each letter counts as a separate syllable.
6. Check if there are any suffixes to remove. If removing a suffix would leave no vowels behind, do not remove it (e.g. "less"). For each group of consecutive vowels in a removed suffix, add a syllable to the count. Some suffixes (as specified in the YAML file) need one extra syllable or one less syllable added to the count.
7. For each group of consecutive vowels in the word, add a syllable to the count. This is the most fundamental step.
8. Remove any terminating "s" character, and add a syllable for certain word endings (e.g. "ges" in words like "ages").
9. Apply general suffix rules. Currently, these only apply to words ending in "ed" or with a suspected terminal "silent e":
   i) If a word ends in "Xed", where "X" is a consonant other than "t" or "d", remove a syllable. The "e" in "Xed" adds a syllable during step 7, but it is not pronounced as one (e.g. "flipped"), so we subtract 1 from the syllable count.
   ii) If a word ends in "Xe", where "X" is a consonant, and the word contains a vowel other than this terminal "e", then the "e" is likely silent (i.e. does not contribute a syllable), so we subtract 1 from the syllable count.
10. Apply general prefix rules. Currently, this only applies to words starting with "reX", where "X" is a vowel, which get an extra syllable. Step 7 merges the "e" and the following vowel into one group, so words like "reapply" would otherwise be missing a syllable.
11. Apply the more specific rules (given in the YAML file) for both prefixes and suffixes. These cover affixes that step 7 gets wrong, and they are usually what should be changed when a word has the wrong number of syllables. Note that only one rule from each category can be applied to each word, which is why the more general rules exist.
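The steps above can be sketched in Python as follows. This is a simplified, hypothetical sketch and not the cog's code:
- It implements step 1 without emote removal, step 3 without the "'s" handling, and steps 5 and 7 to 10.
- Steps 2, 4, 6 and 11 are omitted.
- It treats "y" as a vowel and uses an assumed list of endings for step 8.
- It adds a guard so that step 9 i) never removes a word's only syllable.

```python
import re
import unicodedata

# Assumption: "y" counts as a vowel; the cog's actual vowel set may differ.
VOWELS = "aeiouy"
# Hypothetical endings that regain a syllable once a terminal "s" is removed.
# The text only gives "ges" (as in "ages") as an example.
ENDINGS_REGAINING_SYLLABLE = ("ce", "ge", "se")


def sketch_syllable_count(word: str) -> int:
    """Simplified sketch of steps 1, 3, 5 and 7-10 (steps 2, 4, 6 and 11 omitted)."""
    # Steps 1 and 3: lowercase, map accented letters to plain ones, drop non-letters.
    word = unicodedata.normalize("NFKD", word.lower())
    word = re.sub(r"[^a-z]", "", word.encode("ascii", "ignore").decode("ascii"))
    if not word:
        return 0

    # Step 5: a word with no vowels is read as an acronym (one syllable per letter).
    if not any(ch in VOWELS for ch in word):
        return len(word)

    # Step 7: one syllable per group of consecutive vowels.
    count = len(re.findall(f"[{VOWELS}]+", word))

    # Step 8: drop a terminal "s"; some endings ("ages") keep an extra syllable.
    if word.endswith("s"):
        word = word[:-1]
        if word.endswith(ENDINGS_REGAINING_SYLLABLE):
            count += 1

    # Step 9 i): "Xed" (X a consonant other than t/d) is not a separate syllable,
    # e.g. "flipped". The count > 1 guard is an assumption so "bed" stays at 1.
    if (len(word) >= 3 and word.endswith("ed") and word[-3] not in VOWELS
            and word[-3] not in "td" and count > 1):
        count -= 1
    # Step 9 ii): a "silent e" after a consonant, when the word has another vowel.
    elif (len(word) >= 2 and word.endswith("e") and word[-2] not in VOWELS
            and any(ch in VOWELS for ch in word[:-1])):
        count -= 1

    # Step 10: "reX" with X a vowel ("reapply") regains the syllable step 7 merged.
    if len(word) >= 3 and word.startswith("re") and word[2] in VOWELS:
        count += 1

    return count


for w in ["flipped", "wanted", "cakes", "ages", "reapply", "HTML", "table"]:
    print(w, sketch_syllable_count(w))
```

Because the sketch has no YAML rules, it counts "table" as 1 syllable, since step 9 ii) treats its "e" as silent. This is the kind of miscount that the specific rules in step 11 exist to correct.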
How to Fix Miscounts
All the smaller exceptions for syllable counting are kept in uqcsbot/static/syllable_rules.yaml. When a word does not seem to produce the right number of syllables, try similar words to find the core issue. It is much better to find the root cause and write one exception that applies to many words.
For most corrections you will need to modify a list whose name starts with prefixes_needing or suffixes_needing. The exact list depends on whether the bot is currently over-counting or under-counting the number of syllables. Even if the exception is an entire word, treat it as a prefix or suffix, as this lets the rule cover many variants of the word. If you can't decide whether a prefix or a suffix is more appropriate, use a prefix rule, as these tend to be more effective.
Note that only one rule from each list can apply to a word. If an exception needs to cover a larger category of words, consider modifying the general rules in haiku.py instead, so that the lists can be kept for finer adjustments.
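The "one rule per list" behaviour can be sketched as below. The full list names (prefixes_needing_extra_syllable and so on) and the +1/-1 adjustments are hypothetical. The text only says that the real names start with prefixes_needing or suffixes_needing, and the real loading and matching code in haiku.py may differ.

```python
from typing import Dict, List


def apply_affix_rules(word: str, count: int, rules: Dict[str, List[str]]) -> int:
    """Hypothetical sketch: adjust a syllable count using prefix/suffix lists.

    At most one entry from each list is applied; scanning a list stops at
    its first match.
    """
    # Hypothetical list names mapped to (matching method, adjustment).
    adjustments = {
        "prefixes_needing_extra_syllable": (str.startswith, +1),
        "prefixes_needing_one_less_syllable": (str.startswith, -1),
        "suffixes_needing_extra_syllable": (str.endswith, +1),
        "suffixes_needing_one_less_syllable": (str.endswith, -1),
    }
    for list_name, (matches, delta) in adjustments.items():
        for affix in rules.get(list_name, []):
            if matches(word, affix):
                count += delta
                break  # only one rule from each list may apply
    return count


# A whole-word exception written as a prefix also covers its variants.
rules = {"prefixes_needing_extra_syllable": ["create"]}
print(apply_affix_rules("create", 1, rules), apply_affix_rules("creates", 1, rules))
```

This also shows why overlapping entries in the same list do not stack. In the sketch, once "crea" matches, a later "create" entry in that list is never applied.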
Once you have made an exception, make sure to explain why you made it in a comment and, ideally, list some example words. Also include your word as a new test case in tests\test_haiku.py. Feel free to add test cases for related words too, including similar words that should not have the exception applied to them.