ldig (Language Detection with Infinity Gram)

This is a prototype of language detection for short messages (Twitter), with 99.1% accuracy for 17 languages.

About this fork

This fork only adds a few conveniences so that ldig can be used as a library in Python programs.

Changes include an updated .gitignore that ignores unpacked models, an __init__.py file so that ldig can be treated as a module, a new ldig_standalone.py file with a convenience class that detects the language of text strings rather than files, updated model file permissions, and a setup.py for easy installation.

The original C++ branch is also merged with the original master for convenience: it takes nothing away from the Python side and adds an extra C++ version.

All the real work was done by the original author, Nakatani Shuyo / Cybozu Labs Inc., under the MIT License (see below or https://github.com/shuyo/ldig).

Usage

1. Extract the model directory:
    tar xvzf models/[select model archive]
2. Detect:
    ldig.py -m [model directory] [text data file]
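If you prefer to drive these two steps from a Python script, a minimal sketch follows. It is not part of ldig: `extract_model`, `build_ldig_command` and `run_ldig` are hypothetical helper names, and the sketch assumes that `ldig.py` sits in the working directory, runs under the current interpreter and prints its results to standard output.

```python
import subprocess
import sys
import tarfile
from pathlib import Path


def extract_model(archive: str, dest: str = ".") -> None:
    """Python counterpart of `tar xvzf <archive>`: unpack a gzipped model archive.

    Like tar, it extracts into the current directory by default.
    Only extract archives you trust.
    """
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Pythons support extraction filters; "data" rejects unsafe paths.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def build_ldig_command(model_dir: str, data_file: str, script: str = "ldig.py") -> list[str]:
    """Build the argument list for `ldig.py -m [model directory] [text data file]`."""
    return [sys.executable, script, "-m", model_dir, data_file]


def run_ldig(model_dir: str, data_file: str) -> str:
    """Run ldig on a data file and return whatever it prints (hypothetical wrapper)."""
    result = subprocess.run(
        build_ldig_command(model_dir, data_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ldig exits with an error
    )
    return result.stdout
```

For example, `run_ldig("my_model_dir", "tweets.txt")` would return ldig's output as one string; both names are placeholders for your own model directory and data file.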

Data format

In the input text file, each tweet is one line in the following format:

[label]\t[some metadata separated '\t']\t[text without '\t']
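A small sketch of how tweets could be written in this format. `format_input_line` and `write_input_file` are hypothetical helpers; replacing tabs and line breaks in the text with spaces, and writing the file as UTF-8, are assumptions on our part rather than documented ldig behavior.

```python
from collections.abc import Iterable, Sequence


def format_input_line(label: str, metadata: Sequence[str], text: str) -> str:
    """Return one tweet as a tab-separated line: label, metadata fields, text.

    The label may be an empty string when it is unknown.
    """
    for field in (label, *metadata):
        if "\t" in field:
            raise ValueError(f"label/metadata field contains a tab: {field!r}")
    # The text must not contain tabs and each tweet must stay on one line,
    # so tabs and line breaks are replaced with spaces (our choice).
    clean = " ".join(text.replace("\t", " ").splitlines())
    return "\t".join([label, *metadata, clean])


def write_input_file(path: str, rows: Iterable[tuple[str, Sequence[str], str]]) -> None:
    """Write (label, metadata, text) rows to an input file, one tweet per line."""
    # UTF-8 is an assumption about the encoding ldig reads.
    with open(path, "w", encoding="utf-8") as f:
        for label, metadata, text in rows:
            f.write(format_input_line(label, metadata, text) + "\n")
```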

[label] is a language code such as en, de, fr and so on. Like the metadata, it is optional. (Of course, ldig uses neither the metadata nor the label for detection :D)

ldig's output has the following format:

[correct label]\t[detected label]\t[original metadata and text]
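Because the output keeps the given label next to the detected one, it can be scored directly. The sketch below is one way to do that; `DetectionResult`, `parse_output_line` and `accuracy` are hypothetical names, and skipping lines whose correct label is empty (unlabeled input) is our assumption.

```python
from collections.abc import Iterable
from typing import NamedTuple


class DetectionResult(NamedTuple):
    correct: str   # label given in the input line (may be empty if there was none)
    detected: str  # label chosen by ldig
    rest: str      # original metadata and text, still tab-separated


def parse_output_line(line: str) -> DetectionResult:
    """Split one output line into correct label, detected label and the rest."""
    correct, detected, rest = line.rstrip("\r\n").split("\t", 2)
    return DetectionResult(correct, detected, rest)


def accuracy(lines: Iterable[str]) -> float:
    """Fraction of labeled lines whose detected label equals the correct label."""
    results = [parse_output_line(line) for line in lines if line.strip()]
    labeled = [r for r in results if r.correct]
    if not labeled:
        raise ValueError("no labeled lines to score")
    hits = sum(r.correct == r.detected for r in labeled)
    return hits / len(labeled)
```

Called on ldig's output for a labeled test file, `accuracy` returns the share of tweets whose detected label matches the given one.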

Estimation Tool

ldig has an estimation tool.

./server.py -m [model directory]

Open http://localhost:48000 and enter the target text into the textarea. ldig then shows the language probabilities and the feature parameters found in the text.

Supported Languages

cs Czech
da Danish
de German
en English
es Spanish
fi Finnish
fr French
id Indonesian
it Italian
nl Dutch
no Norwegian
pl Polish
pt Portuguese
ro Romanian
sv Swedish
tr Turkish
vi Vietnamese
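The list above can be kept as a Python mapping, for instance to check labels before building a labeled test file. `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names; the codes and names are copied from the list.

```python
SUPPORTED_LANGUAGES: dict[str, str] = {
    "cs": "Czech", "da": "Danish", "de": "German", "en": "English",
    "es": "Spanish", "fi": "Finnish", "fr": "French", "id": "Indonesian",
    "it": "Italian", "nl": "Dutch", "no": "Norwegian", "pl": "Polish",
    "pt": "Portuguese", "ro": "Romanian", "sv": "Swedish", "tr": "Turkish",
    "vi": "Vietnamese",
}


def language_name(code: str) -> str:
    """Return the language name for a listed code, or raise ValueError."""
    try:
        return SUPPORTED_LANGUAGES[code]
    except KeyError:
        raise ValueError(f"{code!r} is not in the supported-language list") from None
```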

Documents

Copyright & License

All code and resources are available under the MIT License.
