Use dictionaries from several languages to create new words, by applying a Markov chain!

moraesvic/neologism-generator

Note

This project was developed in October 2021 to improve my abilities with C, algorithms and data structures. It works as intended but several improvements could be made, such as: documentation, tests, code organization and performance. Please take it with a grain of salt.

What is this?

This is a neologism generator. A neologism is a freshly coined word. Coining new words involves some creativity, for which we rightly credit the genius of Shakespeare, Joyce and Guimarães Rosa, but neologisms also follow phonological and morphological patterns that can be replicated by a mechanical process.

Starting from word frequency lists for natural languages, obtained from this blog referenced by Wikipedia, we produce new words that follow the patterns of those lists as a Markov (memoryless) process. That is, the i-th letter in a word depends solely on the previous N letters, where N is set by the parameter "Trie-Depth". If you use a Trie-Depth that is too low, you will end up with almost random words, as the next letter depends only on the last one or two. As the parameter goes up, we get words that look somewhat funny but could plausibly be found in the language, even if only as a pun. With a high Trie-Depth, the words are probably proper words of the language that were missing from the original list because they were too specific (too many derivational affixes and so on).
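The letter-by-letter process described above can be sketched in a few lines of C. This is only an illustration and not the project's actual code: it rescans the word list at every step instead of using a trie, it works on bytes (so accented letters in UTF-8 count as two symbols), and the names `generate_word`, `count_followers`, `MAX_PADDED`, and the padding symbols `^` and `$` are all hypothetical. It assumes the words themselves contain neither `^` nor `$`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PADDED 128  /* hypothetical cap on depth + word length + end marker */

/* Count which symbols follow the `depth`-byte context `ctx` in the word list.
 * Each word is padded with `depth` '^' bytes at the start and one '$' at the
 * end, so word starts and word ends are learned like ordinary letters. */
static void count_followers(const char *const *words, size_t n_words,
                            const char *ctx, size_t depth,
                            unsigned counts[256])
{
    char padded[MAX_PADDED];
    memset(counts, 0, 256 * sizeof counts[0]);
    for (size_t w = 0; w < n_words; w++) {
        size_t len = strlen(words[w]);
        if (depth + len + 1 > sizeof padded)
            continue;                        /* skip words that do not fit */
        memset(padded, '^', depth);
        memcpy(padded + depth, words[w], len);
        padded[depth + len] = '$';
        for (size_t i = 0; i <= len; i++)    /* every position with a follower */
            if (memcmp(padded + i, ctx, depth) == 0)
                counts[(unsigned char)padded[i + depth]]++;
    }
}

/* Build one word in `out`; each new symbol is sampled in proportion to how
 * often it follows the last `depth` symbols in the word list.
 * Returns the length of the generated word. */
size_t generate_word(const char *const *words, size_t n_words, size_t depth,
                     char *out, size_t out_size)
{
    char buf[MAX_PADDED];
    size_t len = 0;

    assert(depth >= 1 && depth < MAX_PADDED / 2 && out_size > 0);
    memset(buf, '^', depth);                 /* start-of-word context */

    while (len + 1 < out_size && depth + len + 1 < sizeof buf) {
        unsigned counts[256], total = 0;
        count_followers(words, n_words, buf + len, depth, counts);
        for (int c = 0; c < 256; c++)
            total += counts[c];
        if (total == 0)
            break;                           /* context never seen: stop */

        unsigned pick = (unsigned)rand() % total;
        int c = 0;
        while (pick >= counts[c])            /* walk the cumulative counts */
            pick -= counts[c++];
        if (c == '$')
            break;                           /* sampled the end-of-word marker */
        buf[depth + len++] = (char)c;
    }
    memcpy(out, buf + depth, len);
    out[len] = '\0';
    return len;
}
```

Calling `generate_word(words, n_words, 3, out, sizeof out)` corresponds to a Trie-Depth of 3 in this sketch. Nothing here prevents the result from being a word already in the list, so a real generator would still need to filter those out.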

Examples

English: generate words with a trie-depth of 3: joken, mattoo, worrow

English: generate words with a trie-depth of 5: caughter, knowledged, bundless

English: generate words with a trie-depth of 10: air-conditioner, understandings, unconvention, indistinguish, misinterpreter

Portuguese: generate words with a trie-depth of 5: profissionado, massassinando, encontratou

Example in English

Example in German

Example in Portuguese

How was it implemented?

The backend is implemented in C, using my own implementation of the trie data structure, which makes processing the frequency lists quite fast. Most of the work actually goes into finding a word that follows the given Markov process but is not in the original list. Even in the worst reasonable scenarios, the server usually responds with only 100 MB of RAM usage and a 4-second delay.
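To make the "not in the original list" check concrete, here is a minimal trie sketch in C. It is not the repository's implementation: the names `TrieNode`, `trie_insert`, `trie_contains` and `is_new_word` are hypothetical, and the 256-way child array is chosen for simplicity rather than memory efficiency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TRIE_FANOUT 256  /* one child slot per byte value; simple, not compact */

typedef struct TrieNode {
    struct TrieNode *child[TRIE_FANOUT];
    bool is_word;        /* true if a dictionary word ends at this node */
} TrieNode;

/* Allocate an empty node (all children NULL, is_word false). */
TrieNode *trie_new(void)
{
    return calloc(1, sizeof(TrieNode));
}

/* Insert `word`, creating nodes as needed. Returns false if out of memory. */
bool trie_insert(TrieNode *root, const char *word)
{
    assert(root != NULL && word != NULL);
    TrieNode *node = root;
    for (const unsigned char *p = (const unsigned char *)word; *p; p++) {
        if (node->child[*p] == NULL) {
            node->child[*p] = trie_new();
            if (node->child[*p] == NULL)
                return false;
        }
        node = node->child[*p];
    }
    node->is_word = true;
    return true;
}

/* True only if `word` was inserted as a whole word (prefixes do not count). */
bool trie_contains(const TrieNode *root, const char *word)
{
    const TrieNode *node = root;
    for (const unsigned char *p = (const unsigned char *)word; *p && node; p++)
        node = node->child[*p];
    return node != NULL && node->is_word;
}

/* A candidate counts as a neologism if it is non-empty and absent from the list. */
bool is_new_word(const TrieNode *dict, const char *candidate)
{
    return candidate[0] != '\0' && !trie_contains(dict, candidate);
}

/* Free a node and everything below it. */
void trie_free(TrieNode *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < TRIE_FANOUT; i++)
        trie_free(node->child[i]);
    free(node);
}
```

A generator could call `is_new_word` on each candidate and retry until it returns true. With a high Trie-Depth most candidates are existing words, so many retries may be needed, which is consistent with the search taking most of the time.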

The web backend is implemented with Node.js.
