Feature: Use muan/emojilib dataset #286
Comments
I think the aliases that are listed in muan/emojilib are too broad for this project. |
I don't know if you saw this, but our database is actually just a Python dict. We recently needed to compress the dict into a single line, which makes it unreadable, but you can look at an older version to see how everything is stored. Extending the dict with custom aliases is possible at runtime. See #268 (comment) on how to add a single alias. So it would be easy to just load the JSON file from muan/emojilib and add its aliases. Regarding demojize, there is also the function replace_emoji, which accepts a callable as its replace argument:

    import emoji

    def repl(emj, emj_data):
        # Start with the English name, then append any aliases
        name_list = [emj_data['en']]
        if 'alias' in emj_data:
            name_list += emj_data['alias']
        # Here you could also add aliases from muan/emojilib
        # just look up `emj` in their json data
        return "_".join(name_list)

    print(emoji.replace_emoji('Test 🤗', replace=repl))
    # Outputs: Test :smiling_face_with_open_hands:_:hugging_face:_:hugs:

    # In the repl function:
    # emj = "🤗"
    # emj_data = {
    #     'match_start': 5,
    #     'match_end': 6,
    #     'en': ':smiling_face_with_open_hands:',
    #     'status': 2,
    #     'E': 1,
    #     'alias': [':hugging_face:', ':hugs:'],
    #     'de': ':gesicht_mit_umarmenden_händen:',
    #     'es': ':cara_con_manos_abrazando:',
    #     ...}
|
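A minimal sketch of the lookup suggested in the code comments above: append emojilib keywords to the names the callback already returns. It assumes the emojilib JSON, once loaded, maps each emoji character to a list of keyword strings. `make_repl` is a hypothetical helper, and both sample dicts are hand-made stand-ins rather than real data from either project.

```python
# Hand-made stand-in for the data passed to the callback (shape as in the comment above).
SAMPLE_EMJ_DATA = {
    'en': ':smiling_face_with_open_hands:',
    'alias': [':hugging_face:', ':hugs:'],
}

# Assumed shape of the emojilib JSON after json.load(): emoji -> list of keywords.
SAMPLE_EMOJILIB = {'🤗': ['hugging_face', 'face', 'smile', 'hug']}


def make_repl(extra_keywords):
    """Build a callback with the (emj, emj_data) signature used above (hypothetical helper)."""
    def repl(emj, emj_data):
        name_list = [emj_data['en']]
        name_list += emj_data.get('alias', [])
        # Wrap external keywords in colons so they look like the built-in names.
        for keyword in extra_keywords.get(emj, []):
            name = f':{keyword}:'
            if name not in name_list:  # skip keywords that duplicate an alias
                name_list.append(name)
        return '_'.join(name_list)
    return repl


repl = make_repl(SAMPLE_EMOJILIB)
result = repl('🤗', SAMPLE_EMJ_DATA)
print(result)
```

The returned function has the same signature as `repl` in the comment above, so it could presumably be passed as `replace=` to `emoji.replace_emoji` in the same way.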
Oh awesome stuff! Thank you so much for that! I guess the remaining question is: would you like me to open a PR to merge any of the other aliases? I see myself using this functionality in a couple of downstream repos, and it seems a bit silly to write a wrapper library for this if it would be useful here too. Also, as a general question, how come the data is stored directly in Python? I'm assuming this has a performance benefit? Thanks again for your help :) |
Did you already do anything or is it just a plan at this point? I am not so sure it is feasible. As I said, the aliases need to be unique, one alias can only belong to one emoji. For each alias that has multiple meanings in muan/emojilib you would have to decide to which emoji it should belong. Presumably you would have to do this manually for each emoji.
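To get a feel for how much manual work the uniqueness requirement implies, a rough sketch like the following could list the keywords that are claimed by more than one emoji. The input shape (emoji mapped to a list of keywords) is an assumption about the emojilib data, the sample entries are hand-made, and `split_by_uniqueness` is a hypothetical name.

```python
from collections import defaultdict

# Hand-made sample in the assumed emojilib shape: emoji -> list of keywords.
SAMPLE_KEYWORDS = {
    '🤗': ['hugging_face', 'face', 'smile', 'hug'],
    '😊': ['blush', 'face', 'smile', 'happy'],
    '🫂': ['people_hugging', 'hug'],
}


def split_by_uniqueness(keywords_by_emoji):
    """Return (unique, ambiguous): keyword -> emoji, and keyword -> list of emoji."""
    owners = defaultdict(list)
    for emj, keywords in keywords_by_emoji.items():
        for keyword in keywords:
            if emj not in owners[keyword]:
                owners[keyword].append(emj)
    # A keyword is safe to use as an alias only if exactly one emoji claims it.
    unique = {k: v[0] for k, v in owners.items() if len(v) == 1}
    ambiguous = {k: v for k, v in owners.items() if len(v) > 1}
    return unique, ambiguous


unique, ambiguous = split_by_uniqueness(SAMPLE_KEYWORDS)
print(sorted(unique))
print(sorted(ambiguous))
```

Only the entries in `ambiguous` would need a manual decision; the `unique` ones could be added as aliases automatically, provided they do not clash with names already in the database.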
It was already directly in Python when I started contributing to this project and the original developers are no longer contributing, so I don't know. There is a performance benefit compared to a JSON file, but it is not that big a difference (at least with newer Python versions). I am thinking about moving to JSON and also splitting the file into several smaller files. I recently did a comparison between the python-dict and JSON: #280 (comment) |
FYI there is a proposed major change to the keywords in muan/emojilib. I am thinking it might be better to include the muan/emojilib keywords as a separate entry:

    '\U0001F917': {
        'en': ':smiling_face_with_open_hands:',
        'status': fully_qualified,
        'E': 1,
        'keywords': ['hugging_face', 'face', 'smile', 'hug'],
        'alias': [':hugging_face:', ':hugs:'],
        'de': ':gesicht_mit_umarmenden_händen:',
        'es': ':cara_con_manos_abrazando:',
        ...
    },

And then add a function to retrieve them, as you suggested. By the way, we also use a script to update the emoji and aliases. The aliases specifically are added here: emoji/utils/get_codes_from_unicode_emoji_data_files.py, lines 586 to 601 in ceddc11.
Adding a new entry keywords to this script would be simple.
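A retrieval function for such a 'keywords' entry might look roughly like this. It is only a sketch: `get_keywords` and `search` are hypothetical names, and `SAMPLE_DATA` is a hand-made dict shaped like the proposed entry, not the library's actual database.

```python
# Hand-made sample shaped like the proposed entry; 'keywords' is the proposed new key.
SAMPLE_DATA = {
    '\U0001F917': {
        'en': ':smiling_face_with_open_hands:',
        'keywords': ['hugging_face', 'face', 'smile', 'hug'],
        'alias': [':hugging_face:', ':hugs:'],
    },
    '\U0001F60A': {
        'en': ':smiling_face_with_smiling_eyes:',
        'keywords': ['blush', 'face', 'smile'],
    },
}


def get_keywords(emj, data):
    """Return the keywords stored for one emoji, or an empty list if there are none."""
    return list(data.get(emj, {}).get('keywords', []))


def search(keyword, data):
    """Return every emoji whose 'keywords' list contains the given keyword."""
    return [emj for emj, entry in data.items() if keyword in entry.get('keywords', [])]


print(get_keywords('\U0001F917', SAMPLE_DATA))
print(search('smile', SAMPLE_DATA))
```

Since keywords would live in their own list and are only used for lookup in this sketch, one keyword could map to several emoji, which would sidestep the uniqueness constraint that applies to aliases.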
|
Hi, thanks for getting back to me on this, and apologies for the silence over the past few months.
I've not had the opportunity to look at this, unfortunately. I'm happy to help however I can, though, as I should have some more free time (appreciate this may be too little, too late).
Makes sense tbh, and yeah, I wonder if that'll help with some of the maintenance work? But yeah, the perf improvements in Python have helped a lot. I guess one option is a CI/CD step which glues together a load of JSON files and wraps them in some Python for the best of both worlds?
Thanks again :) |
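The "glue JSON files into Python" idea could be sketched as a small build script like the one below. Everything here is an assumption for illustration: the one-file-per-language layout, the file names, the `EMOJI_DATA` variable in the generated module, and the hypothetical `build_module` helper.

```python
import json
import pprint
import tempfile
from pathlib import Path


def build_module(json_dir, out_file):
    """Merge every <lang>.json in json_dir into one generated Python module."""
    merged = {}
    for path in sorted(Path(json_dir).glob('*.json')):
        lang = path.stem  # file name without extension, e.g. 'en'
        names = json.loads(path.read_text(encoding='utf-8'))
        for emj, name in names.items():
            merged.setdefault(emj, {})[lang] = name
    # Write the merged dict as a Python literal so it can be imported directly.
    source = '# Generated file, do not edit.\nEMOJI_DATA = ' + pprint.pformat(merged) + '\n'
    Path(out_file).write_text(source, encoding='utf-8')
    return merged


# Demo with two tiny hand-made language files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'en.json').write_text(
        json.dumps({'🤗': ':smiling_face_with_open_hands:'}), encoding='utf-8')
    Path(tmp, 'de.json').write_text(
        json.dumps({'🤗': ':gesicht_mit_umarmenden_händen:'}), encoding='utf-8')
    out_file = Path(tmp, 'generated_data.py')
    merged = build_module(tmp, out_file)
    # Executing the generated source gives back the same dict.
    namespace = {}
    exec(out_file.read_text(encoding='utf-8'), namespace)
    loaded = namespace['EMOJI_DATA']

print(loaded == merged)
```

In this arrangement the JSON files would be the editable source of truth, and the generated module would only exist as a build artifact.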
I forgot about this project. I had started implementing a JSON solution in April: one JSON file for each language, plus the possibility to extend it with custom data such as emojilib. As far as I remember, my implementation was almost ready. I'll try to find some time in the next few weeks for a pull request. |
I just realized that extending the database with custom data is not as simple as I thought. I guess the solution is to use a class/object to keep separate databases, something like this (pseudo code):

    emoji_config = emoji.new_emoji_instance()  # Create a new copy of the database
    emoji_config.extend_database(custom_aliases)  # Modify the new database
    emoji_config.emojize(':a_custom_alias:')  # Use emojize/demojize with the new database
|
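One way the pseudo code above could be fleshed out is a small class that deep-copies the database, so custom aliases never touch the shared data. This is a sketch under assumptions, not the library's API: `EmojiConfig` is a hypothetical class, `DEFAULT_DATA` is a hand-made stand-in for the real database, and the `emojize` here is a simplified regex substitution.

```python
import copy
import re

# Tiny hand-made stand-in for the shared, module-level database.
DEFAULT_DATA = {
    '🤗': {'en': ':smiling_face_with_open_hands:', 'alias': [':hugging_face:', ':hugs:']},
}


class EmojiConfig:
    """Holds a private copy of the database so changes do not leak globally."""

    def __init__(self, data=DEFAULT_DATA):
        self.data = copy.deepcopy(data)

    def _name_to_emoji(self):
        """Map every English name and alias to its emoji."""
        mapping = {}
        for emj, entry in self.data.items():
            names = [entry['en']] if 'en' in entry else []
            for name in names + entry.get('alias', []):
                mapping[name] = emj
        return mapping

    def extend_database(self, custom_aliases):
        """Add aliases given as {emoji: [':alias:', ...]}; an alias may have only one owner."""
        known = self._name_to_emoji()
        for emj, aliases in custom_aliases.items():
            entry = self.data.setdefault(emj, {})
            for alias in aliases:
                if known.get(alias, emj) != emj:
                    raise ValueError(f'{alias} already belongs to {known[alias]}')
                if alias not in known:
                    entry.setdefault('alias', []).append(alias)
                    known[alias] = emj

    def emojize(self, text):
        """Replace every known :name: in text with its emoji; leave unknown names as-is."""
        mapping = self._name_to_emoji()
        return re.sub(r':[^:\s]+:', lambda m: mapping.get(m.group(0), m.group(0)), text)


config = EmojiConfig()
config.extend_database({'🤗': [':a_custom_alias:']})
print(config.emojize('Test :a_custom_alias:'))
```

A second `EmojiConfig()` created afterwards would not know `:a_custom_alias:`, because each instance works on its own copy.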
Background
I've fallen down a bit of a rabbit hole while looking for ways to search emoji by plaintext, which often calls for aliases. For example, '🤗' isn't called 'hug', but that is a useful alias (which this lib supports).
I installed Element on my phone a few months ago for discussion on another project and saw that it has really great search functionality for a ton of aliases that are not present in this lib, for example ':)' for '😊'. So I started digging.
Basically, they use the following Python script during build to fetch the latest emoji and aliases from a few other sources: https://github.com/element-hq/element-android/blob/def2a8a83351c06cb65fdbd4d483ac811329b023/tools/import_emojis.py#L20
One of these is the dataset available from https://github.com/muan/emojilib which seems really good for this
The questions / feature request
Would you accept a PR to add the aliases from muan/emojilib to this project?
Also, I noticed that the demojize function only exposes the first alias if available, so I've written my own implementation for a lib that returns an underscore-separated string of keywords. Is such a function (maybe called get_aliases) something you'd accept a PR for?
Thanks for your time and for the awesome project