Chat bot that translates Japanese words into English using JMdict.
You can talk to the bot right now on Telegram. The bot's handle is @TranslateJPBot. The bot will try to translate any text that you send to it.
The bot parses the JMdict XML file and inserts each word, in both hiragana and kanji forms, into a red-black tree. Because of this, finding a word is an O(log n) operation. Once the tree is ready, the bot spins up a Scotty server exposing a single endpoint for the Telegram webhook. When Telegram posts an update, the bot tries to reply with a translation.
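The indexing idea can be sketched in a few lines of Haskell. This is only an illustration, not the bot's actual code: the `Entry` type, its fields, and the function names are hypothetical, and `Data.Map` (a size-balanced binary tree from the `containers` package) stands in for the red-black tree. It has the same O(log n) lookup cost.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (nub)

-- Hypothetical, simplified entry; real JMdict entries carry much more data.
data Entry = Entry
  { kanji   :: String   -- kanji spelling
  , reading :: String   -- kana reading
  , glosses :: [String] -- English translations
  } deriving (Show, Eq)

-- Homophones share a key, so each key maps to a list of entries.
type Dictionary = Map.Map String [Entry]

-- Index every entry under both its kanji form and its kana form.
-- nub avoids a duplicate when both forms are the same string.
buildDictionary :: [Entry] -> Dictionary
buildDictionary entries =
  Map.fromListWith (++)
    [ (key, [e]) | e <- entries, key <- nub [kanji e, reading e] ]

-- O(log n) lookup; an unknown word yields an empty list.
translate :: Dictionary -> String -> [String]
translate dict word = concatMap glosses (Map.findWithDefault [] word dict)

sampleDictionary :: Dictionary
sampleDictionary =
  buildDictionary [Entry "猫" "ねこ" ["cat"], Entry "犬" "いぬ" ["dog"]]

main :: IO ()
main = do
  print (translate sampleDictionary "ねこ") -- looked up by kana
  print (translate sampleDictionary "犬")   -- looked up by kanji
  print (translate sampleDictionary "とり") -- not in the sample data
```

Storing every entry under two keys is what makes both spellings searchable, and it is also one reason the in-memory index grows large.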
There's still a lot of room for improvement. Here's a list of current limitations that I want to change in the future.
- Telegram is the only platform supported. Eventually it should support other popular chat apps like Facebook or Discord.
- The bot can't translate sentences, let alone verb/adjective conjugations. It lacks an algorithm to map conjugated forms to their dictionary forms, as well as an algorithm to split a sentence into words. Right now it only works if you send the word exactly as it appears in the dictionary.
- The bot is memory-intensive. Since it loads the whole Japanese dictionary into a binary tree in memory, it uses about 2 GB of RAM when running.
Clone with git and build with make and Stack:
git clone https://github.com/GAumala/TranslateJPBot
make
stack setup
stack build
Run the server with:
TELEGRAM_TOKEN=<MY_SECRET_TOKEN> stack exec bot
If you already have an nginx server set up with SSL, you can easily deploy the bot by adding a new location to your existing server block.
# /etc/nginx/nginx.conf
server {
    # Existing configuration...
+   location /telegram/ {
+       proxy_pass http://localhost:4000;
+   }
}
After that, you need to register a webhook with the Telegram API. The secret token is part of the webhook URL so that anyone who doesn't know the token can't send forged updates to the bot. To register the URL, use curl:
curl -F "url=https://<YOURDOMAIN.EXAMPLE>/telegram/<MY_SECRET_TOKEN>" https://api.telegram.org/bot<MY_SECRET_TOKEN>/setWebhook
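To illustrate why putting the token in the URL helps, here is a minimal sketch of the check a server could make on each incoming request path. The helper name `isWebhookPath` is hypothetical and this is not necessarily how the bot implements it. It only shows that a request is accepted when its path is exactly `/telegram/<token>`.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper: accept a request path only if it is exactly
-- "/telegram/<token>" for the configured, non-empty token.
isWebhookPath :: String -> String -> Bool
isWebhookPath token path =
  not (null token) && stripPrefix "/telegram/" path == Just token

main :: IO ()
main = do
  let token = "123:ABC" -- placeholder value, not a real token
  print (isWebhookPath token "/telegram/123:ABC") -- True
  print (isWebhookPath token "/telegram/guess")   -- False
  print (isWebhookPath token "/other/123:ABC")    -- False
```

Since only Telegram and the server operator know the token, a request to any other path can be rejected without parsing its body.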
That's it! You're done! If you want to test that the server is running correctly, you can modify test/serverTests.sh to point to your server and run the tests.
token=$(printenv TELEGRAM_TOKEN)
-host=http://localhost:4000
+host=https://<YOURDOMAIN.EXAMPLE>
url=$host/telegram/$token
If you see status 200 on each request, then the bot is running correctly.