AraNet: A Deep Learning Toolkit for Arabic Social Media

AraNet is a deep learning toolkit for a host of Arabic social media processing tasks. It predicts age, dialect, gender, emotion, irony, and sentiment from social media posts, delivering either state-of-the-art or competitive performance on these tasks. It also has the advantage of using a unified, simple framework based on the recently developed BERT model. AraNet has the potential to alleviate issues related to comparing across different Arabic social media NLP tasks by providing one way to test new models against AraNet predictions (i.e., model-based comparisons). AraNet can be used to make important discoveries about the Arab world, a vast geographical region of strategic importance. It can also enhance our understanding of Arabic online communities and of Arabic digital culture in general.

How to install

  • Using pip
 pip install git+https://github.com/UBC-NLP/aranet
  • Clone and install
 git clone https://github.com/UBC-NLP/aranet
 cd aranet
 pip install .

Download models

How to use

You can easily add AraNet to your code.

Load the model

from aranet import aranet
dialect_obj = aranet.AraNet(model_path)

Predict one sentence

dialect_obj.predict(text=text_str)

Load from file/batch

dialect_obj.predict(path=file_path)
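
The batch call reads its input from a text file with one sentence per line (see the batch example later in this README). A minimal sketch of preparing such a file follows. `write_batch_file` and the file name `input_text.txt` are hypothetical and not part of AraNet, and UTF-8 is assumed as the file encoding for the Arabic text.

```python
from pathlib import Path


def write_batch_file(sentences, path):
    """Write one sentence per line as UTF-8 (hypothetical helper, not part of AraNet)."""
    # Collapse runs of whitespace, including line breaks, so each sentence stays on one line.
    cleaned = [" ".join(sentence.split()) for sentence in sentences]
    # Drop empty entries so the file contains no blank lines.
    cleaned = [sentence for sentence in cleaned if sentence]
    Path(path).write_text("".join(s + "\n" for s in cleaned), encoding="utf-8")
    return len(cleaned)


# Example use (assumes AraNet is installed and dialect_obj is loaded as shown above):
# write_batch_file(["انا هاخد ده لو سمحت", "يعيشك برقا"], "input_text.txt")
# dialect_obj.predict(path="input_text.txt")
```

The helper returns the number of sentences written, which can later be compared with the number of predictions returned.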

You can also use AraNet from the terminal (prefix the command with `!` only when running it inside a Jupyter notebook cell)

python ./aranet/aranet.py \
    --path model_path \
    --batch file_path
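
If the command-line script needs to be driven from another Python program, the call can be wrapped with the standard `subprocess` module. This is only a sketch: `build_aranet_command` and `run_aranet_cli` are hypothetical names, the script location is assumed to match the command shown above, and nothing is assumed about the format of what the script prints.

```python
import subprocess
import sys


def build_aranet_command(model_path, file_path, script="./aranet/aranet.py"):
    """Build the argument list for the command-line call shown above."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, script, "--path", model_path, "--batch", file_path]


def run_aranet_cli(model_path, file_path):
    """Run the script and return its raw standard output as text."""
    result = subprocess.run(
        build_aranet_command(model_path, file_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.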

Examples

Dialect

from aranet import aranet

# Load the AraNet dialect model
model_path = "./models/dialect_aranet/"
dialect_obj = aranet.AraNet(model_path)
text_str="انا هاخد ده لو سمحت"
dialect_obj.predict(text=text_str)

[('Egypt', 0.9993844)]

text_str="العشا اليوم كان عند الشيخ علي حمدي الحداد ، لمؤخذة بقى على الخيانة ، ايش مشاك غادي"
dialect_obj.predict(text=text_str)

[('Libya', 0.763)]

text_str ="يعيشك برقا"
dialect_obj.predict(text=text_str)

[('Tunisia', 0.998887)]

Sentiment

#load AraNet sentiment model
model_path = "./models/sentiment_aranet/"
senti_obj = aranet.AraNet(model_path)
text_str ="ما اكره واحد قد هذا المنافق"
senti_obj.predict(text=text_str)

[('neg', 0.8975404)]

text_str ="يعيشك برقا"
senti_obj.predict(text=text_str)

[('pos', 0.747435)]

Emotion

#load AraNet emotion model
model_path = "./models/emotion_aranet/"
emo_obj = aranet.AraNet(model_path) 
text_str ="الله عليكي و انتي دائما مفرحانا"
emo_obj.predict(text=text_str)

[('happy', 0.89688617)]

text_str ="لم اعرف المستحيل يوما"
emo_obj.predict(text=text_str)

[('trust', 0.27242294)]
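
Judging by the outputs shown so far, each call returns a list of `(label, score)` tuples, and the scores vary widely: the last emotion example scores only about 0.27. A minimal sketch of keeping only the higher-scoring predictions follows. `confident_predictions` and the 0.5 threshold are illustrative assumptions, not part of AraNet and not a recommended cut-off.

```python
def confident_predictions(predictions, threshold=0.5):
    """Keep only (label, score) pairs whose score reaches the threshold."""
    return [(label, score) for label, score in predictions if score >= threshold]


# Outputs copied from the emotion examples above.
print(confident_predictions([("happy", 0.89688617)]))  # kept
print(confident_predictions([("trust", 0.27242294)]))  # dropped, prints []
```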

Gender

#load AraNet gender model
model_path = "./models/gender_aranet/"
gender_obj = aranet.AraNet(model_path)
text_str ="الله عليكي و انتي دائما مفرحانا"
gender_obj.predict(text=text_str)

[('female', 0.8405795)]

Load from file/batch

Input text file: one sentence per line, for example:
--------------
انا هاخد ده لو سمحت
العشا اليوم كان عند الشيخ علي حمدي الحداد ، لمؤخذة بقى على الخيانة ، ايش مشاك غادي
----------------
model_path = "./models/dialect_aranet/"
dialect_obj = aranet.AraNet(model_path)
dialect_obj.predict(path=file_path)

[('Egypt', 0.9993844), ('Libya', 0.76300025)]
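
The batch output above lists one `(label, score)` tuple per input line. A minimal sketch of pairing each sentence with its prediction follows. It assumes the predictions come back in the same order as the lines in the file, which the example suggests but this README does not state, and `pair_with_predictions` is a hypothetical helper.

```python
def pair_with_predictions(sentences, predictions):
    """Pair each input sentence with its prediction, assuming matching order."""
    if len(sentences) != len(predictions):
        # A length mismatch means the order assumption cannot hold.
        raise ValueError(
            f"{len(sentences)} sentences but {len(predictions)} predictions"
        )
    return list(zip(sentences, predictions))


# Sentences and output copied from the batch example above.
sentences = [
    "انا هاخد ده لو سمحت",
    "العشا اليوم كان عند الشيخ علي حمدي الحداد ، لمؤخذة بقى على الخيانة ، ايش مشاك غادي",
]
predictions = [("Egypt", 0.9993844), ("Libya", 0.76300025)]
for sentence, (label, score) in pair_with_predictions(sentences, predictions):
    print(f"{label}\t{score:.3f}\t{sentence}")
```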

Inquiries?

If you have any questions about AraNet, please contact us at muhammad.mageed[at]ubc[dot]ca.


Reference/Citation:

Please cite our work:

@inproceedings{abdul-mageed-etal-2020-aranet,
    title = "{A}ra{N}et: A Deep Learning Toolkit for {A}rabic Social Media",
    author = "Abdul-Mageed, Muhammad  and Zhang, Chiyu  and Hashemi, Azadeh  and Nagoudi, El Moatez Billah",
    booktitle = "Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resource Association",
    url = "https://www.aclweb.org/anthology/2020.osact-1.3",
    pages = "16--23",
    language = "English",
    ISBN = "979-10-95546-51-1",
}