This extension to ASReview implements a multilingual feature extractor, which makes it possible to screen records written in multiple languages. The supported languages are:
Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish.
The extension uses the sentence-transformers/distiluse-base-multilingual-cased-v1 model.
This multilingual sentence-transformers model maps sentences to a 512-dimensional dense vector space. For more information about the feature extraction method, see:
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. ArXiv, abs/1908.10084. https://arxiv.org/abs/1908.10084
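The embedding step can be illustrated with a short Python sketch. It is not the extension's own code: the helper names (record_texts, embed, cosine_similarity) are hypothetical, and joining each record's title and abstract into one text is an assumption. The model is loaded through the sentence-transformers SentenceTransformer class, whose encode method returns one 512-dimensional vector per input text.

```python
"""Sketch: turning multilingual records into sentence embeddings.

Assumptions (not taken from the extension's source):
- each record's text is its title and abstract joined by a space;
- embeddings come from sentence-transformers' SentenceTransformer.encode.
"""
import math
from typing import List, Optional, Sequence

MODEL_NAME = "sentence-transformers/distiluse-base-multilingual-cased-v1"


def record_texts(
    titles: Sequence[Optional[str]], abstracts: Sequence[Optional[str]]
) -> List[str]:
    """Join title and abstract per record, skipping missing parts (hypothetical helper)."""
    texts = []
    for title, abstract in zip(titles, abstracts):
        parts = [p.strip() for p in (title, abstract) if p and p.strip()]
        texts.append(" ".join(parts))
    return texts


def embed(texts: Sequence[str]):
    """Encode texts into 512-dimensional vectors.

    Requires `pip install sentence-transformers`; the model weights are
    downloaded on first use.
    """
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(MODEL_NAME)
    # Returns a numpy array with one row per text.
    return model.encode(list(texts))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, embed(record_texts(titles, abstracts)) yields one 512-dimensional row per record. Because the model is multilingual, a title and its translation (say, English and Dutch) should land close together in the vector space, which cosine_similarity makes measurable.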
Install the multilingual feature extractor from a local clone of the repository with:
pip install .
or directly from GitHub with:
pip install git+https://github.com/asreview/asreview-multilingual-feature-extractor.git
ASReview LAB users can select the model in the Model Selection step of the project setup. Select "Multilingual Sentence Transformer" under "Feature extraction".
The new feature extractor, Multilingual Sentence Transformer, is defined in asreviewcontrib/models/distiluse-base-multilingual.py and can be used in a simulation by passing -e multilingual:
asreview simulate example_data_file.csv -e multilingual
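To check that installation registered the extractor under the name passed to -e, the following Python sketch lists the names of installed feature-extractor entry points. It assumes ASReview discovers feature extractors through the entry-point group asreview.models.feature_extraction (as in ASReview 1.x; verify this for your version), and the function name is hypothetical.

```python
"""Sketch: list feature extractors registered by installed ASReview plugins.

Assumption: ASReview discovers feature extractors via the entry-point group
"asreview.models.feature_extraction"; check the ASReview version you use.
"""
from importlib.metadata import entry_points
from typing import List

GROUP = "asreview.models.feature_extraction"  # assumed entry-point group


def registered_feature_extractors(group: str = GROUP) -> List[str]:
    """Return the sorted names of all entry points in `group` (hypothetical helper)."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: filter entry points by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict of group -> entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)
```

After pip install, print(registered_feature_extractors()) should include multilingual if the plugin registers itself under that name; an empty list suggests the package is not installed in the active environment.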
Test the feature extractor on the van_de_Schoot_2017 benchmark dataset, combined with the SVM classifier (-m svm):
asreview simulate benchmark:van_de_Schoot_2017 -e multilingual -m svm
For any questions or remarks, please email the ASReview team or open an issue on the GitHub repository.