How to use your library? #4

Open
it19862 opened this issue Oct 14, 2018 · 6 comments

Comments

@it19862

it19862 commented Oct 14, 2018

Could you give a small example of using your library?

Windows 7 x64
Visual Studio 2017

I installed NTextCat through NuGet.
I need to detect the language of the text entered in textBox2.Text
and output the result to textBox1.Text.
The input text may be in European languages, languages written with Chinese characters (Chinese, Japanese), and others.

I found some sample code, but I get an error on this line:
var identifier = factory.Load("NTextCat 0.2.1.1\\LanguageModels\\Core14.profile.xml");

Code:

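using System;
using System.Linq;
using System.Windows.Forms;
using NTextCat;

namespace rsh
{
    public partial class Form2 : Form
    {
        public Form2()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            var factory = new RankedLanguageIdentifierFactory();
            // The error is reported on this line (loading the language profile)
            var identifier = factory.Load("NTextCat 0.2.1.1\\LanguageModels\\Core14.profile.xml");
            // Identify the language of the text typed into textBox2
            var languages = identifier.Identify(textBox2.Text);
            var mostCertainLanguage = languages.FirstOrDefault();

            // Show the ISO 639-3 code of the best match in textBox1
            textBox1.Text = mostCertainLanguage.Item1.Iso639_3;
        }
    }
}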

How to solve the problem?

(Screenshot: 2018-10-14_18-48-10)
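If the error is that the profile file cannot be found (the screenshot is not reproduced here, so this is an assumption), note that a relative path such as "NTextCat 0.2.1.1\\LanguageModels\\Core14.profile.xml" is typically resolved against the current working directory, not against the project or package folder. A minimal sketch, assuming Core14.profile.xml ends up next to the executable (as the owner's later call factory.Load("Core14.profile.xml") suggests; in Visual Studio the file's "Copy to Output Directory" property does this): the hypothetical helper ResolveProfilePath builds an absolute path and fails with a readable message. It is written as a C# 9 top-level program; in a VS 2017 WinForms project it would become a private method of Form2.

```csharp
using System;
using System.IO;

// Hypothetical helper: build an absolute path to a profile that sits next to the
// application's assemblies, and fail with a readable message if the file is missing.
string ResolveProfilePath(string fileName)
{
    // AppContext.BaseDirectory is the folder the assembly resolver probes (e.g. bin\Debug).
    string path = Path.Combine(AppContext.BaseDirectory, fileName);
    if (!File.Exists(path))
    {
        throw new FileNotFoundException(
            $"Language profile '{fileName}' was not found. Check that it is copied to the output directory.",
            path);
    }
    return path;
}

// With NTextCat referenced, the button handler could then look like this (not compiled here):
// var identifier = new RankedLanguageIdentifierFactory().Load(ResolveProfilePath("Core14.profile.xml"));
// var best = identifier.Identify(textBox2.Text).FirstOrDefault();
// textBox1.Text = best == null ? "unknown" : best.Item1.Iso639_3;

try
{
    Console.WriteLine(ResolveProfilePath("Core14.profile.xml"));
}
catch (FileNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}
```

The null check on FirstOrDefault() in the commented usage also avoids a NullReferenceException when no candidate comes back.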

@mohammad-khoddami

How can I detect text in an unsupported language as unknown, rather than as some other language?
For example, "Aţi văzut ce moacă a făcut?" is Romanian, but NTextCat detects it as English.

@ivanakcheurov
Owner

I don't understand the problem from the description. If your code works correctly, then mostCertainLanguage.Item1.Iso639_3 (and so textBox1.Text) would contain the language code (for example, eng for English).
Perhaps you are getting an error? If so, could you post a screenshot of it?

@ivanakcheurov
Owner

@mohammad-khoddami, you can assess how confident NTextCat is in the language tag it returns:

var factory = new RankedLanguageIdentifierFactory();
var identifier = factory.Load("Core14.profile.xml");
var languages = identifier.Identify("some text");
var mostCertainLanguage = languages.FirstOrDefault();

var languageCode = mostCertainLanguage.Item1.Iso639_3;
var confidenceLevel = mostCertainLanguage.Item2;
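Identify returns a ranked list, so taking its first entry always yields some language, even for text the profile does not cover. One way to get "unknown" instead, sketched here as an assumption rather than a documented NTextCat feature, is to post-process the ranked list: reject the top answer when its score is too high, or when the runner-up scores almost as well. The sketch assumes Item2 behaves like a distance (lower means a closer match, which fits the first result being the most certain one); the thresholds maxDistance and minMargin are hypothetical and would need tuning on your own labelled samples. It takes plain (code, score) tuples so it runs without NTextCat (C# 9 top-level program).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical post-processing step (not part of NTextCat): decide "unknown"
// from (ISO code, score) pairs, assuming a LOWER score means a closer match.
string PickLanguageOrUnknown(
    IEnumerable<(string Code, double Distance)> candidates,
    double maxDistance, // illustrative: reject if even the best match is farther than this
    double minMargin)   // illustrative: reject if the runner-up is within this margin of the best
{
    var ranked = candidates.OrderBy(c => c.Distance).ToList();
    if (ranked.Count == 0)
        return "unknown";

    var best = ranked[0];
    if (best.Distance > maxDistance)
        return "unknown"; // no profile fits the text well, e.g. an unsupported language

    if (ranked.Count > 1 && ranked[1].Distance - best.Distance < minMargin)
        return "unknown"; // too close to call between the top two languages

    return best.Code;
}

// Made-up scores for illustration only.
var sample = new List<(string Code, double Distance)>
{
    ("eng", 3900.0), ("deu", 3920.0), ("nld", 3950.0),
};
Console.WriteLine(PickLanguageOrUnknown(sample, maxDistance: 3800, minMargin: 50));
```

With NTextCat you could feed it identifier.Identify(text).Select(t => (t.Item1.Iso639_3, t.Item2)). The margin test is there because a single absolute cut-off is brittle when right and wrong answers get similar scores.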

@diegosasw

diegosasw commented Apr 12, 2022

How is the confidence level measured?
I get values like 3495.569 for a long Spanish text that is detected correctly,

but values like 3924.144 for text in Czech, which is incorrectly detected as English:

Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu, tanga a quickstepu.

or 3928.28 for text in Bulgarian, which is incorrectly detected as Russian:

Ах чудна българска земьо, полюшвай цъфтящи жита.

I suppose the models are not very accurate?
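On how such a score can be measured: NTextCat's name points to TextCat, which follows Cavnar and Trenkle's character n-gram approach, but whether RankedLanguageIdentifier computes exactly the measure below, with these parameters, is an assumption. The sketch shows the classic "out-of-place" distance: build rank-ordered n-gram profiles, then sum the rank differences, charging a fixed maxPenalty for every document n-gram the language profile lacks. Lower is better, and the total is capped at (document profile size × maxPenalty), so large values mostly mean that many n-grams were not found at all. BuildProfile and OutOfPlaceDistance are hypothetical names (C# 9 top-level program).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build a rank-ordered n-gram profile (n = 1..maxN): most frequent first, at most topK entries.
List<string> BuildProfile(string text, int maxN, int topK)
{
    var counts = new Dictionary<string, int>();
    var words = text.ToLowerInvariant()
        .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (var word in words)
    {
        // Underscores mark word boundaries, so n-grams also capture word starts and ends.
        string padded = "_" + word + "_";
        for (int n = 1; n <= maxN; n++)
        {
            for (int i = 0; i + n <= padded.Length; i++)
            {
                string gram = padded.Substring(i, n);
                counts[gram] = counts.TryGetValue(gram, out int c) ? c + 1 : 1;
            }
        }
    }
    return counts
        .OrderByDescending(kv => kv.Value)
        .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
        .Take(topK)
        .Select(kv => kv.Key)
        .ToList();
}

// Out-of-place distance: for each n-gram in the document profile, add the difference between
// its rank there and its rank in the language profile; n-grams the language profile lacks
// cost maxPenalty. Lower totals mean a closer match.
double OutOfPlaceDistance(List<string> docProfile, List<string> langProfile, int maxPenalty)
{
    var langRank = new Dictionary<string, int>();
    for (int r = 0; r < langProfile.Count; r++)
        langRank[langProfile[r]] = r;

    double total = 0;
    for (int r = 0; r < docProfile.Count; r++)
        total += langRank.TryGetValue(docProfile[r], out int lr) ? Math.Abs(r - lr) : maxPenalty;
    return total;
}

// Toy "language profile" built from a single sentence, for illustration only.
var english = BuildProfile("the quick brown fox jumps over the lazy dog", maxN: 3, topK: 300);
Console.WriteLine(OutOfPlaceDistance(BuildProfile("the lazy fox", 3, 300), english, 300));
Console.WriteLine(OutOfPlaceDistance(BuildProfile("zzz qqq", 3, 300), english, 300));
```

Under this measure the raw total also grows with the number of n-grams in the document profile, so comparing candidates for the same text is more meaningful than comparing absolute values across different texts.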


I've tried Wiki82.profile.xml and Wiki280.profile.xml, and I get better results with Wiki82.profile.xml, because with Wiki280.profile.xml texts are often detected as aa.

One thing I've noticed is that the detected language ISO code is not correct. With Core14.profile.xml I get a proper three-letter code in mostCertainLanguage.Item1.Iso639_3, but with Wiki82.profile.xml or Wiki280.profile.xml I get a two-letter code there (which is incorrect for an ISO 639-3 field).
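Since the Wiki profiles return two-letter codes in Iso639_3 (per the comment above), one workaround is to normalise the code yourself. A minimal sketch: the table is hypothetical and deliberately partial, ToIso639_3 is an illustrative name, and codes the table does not know are passed through unchanged (C# 9 top-level program).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, deliberately partial table from two-letter (ISO 639-1) to
// three-letter (ISO 639-3) codes; extend it with the languages you actually need.
var iso1ToIso3 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["en"] = "eng", ["es"] = "spa", ["cs"] = "ces", ["bg"] = "bul",
    ["ru"] = "rus", ["de"] = "deu", ["fr"] = "fra", ["ro"] = "ron",
};

// Three-letter codes pass through (lower-cased), known two-letter codes are mapped,
// and anything else is returned unchanged.
string ToIso639_3(string code) =>
    code.Length == 3 ? code.ToLowerInvariant()
    : iso1ToIso3.TryGetValue(code, out var iso3) ? iso3
    : code;

Console.WriteLine(ToIso639_3("es"));  // a two-letter code, as reported for Wiki82.profile.xml
Console.WriteLine(ToIso639_3("eng")); // already three letters, as with Core14.profile.xml
```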

@andreyka26-git

@ivanakcheurov

Hello, thank you very much for your work.

May I ask about the profiles as well?

  1. As was asked above, what do the weight numbers mean? As I understand it, the closer they are to 4000, the less accurate the result is, but what is the cut-off at which we can consider a result accurate: 3700? 3500?

  2. I'm using wiki82.profile.xml, and sometimes I get "simple" or "new" as the language for pure English text. What do they mean?

(Screenshot)
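For the "simple"/"new" results, one option that avoids interpreting them at all is to restrict the answer to the languages your application expects. A minimal sketch under that assumption: BestAllowed is a hypothetical helper that skips candidates outside an allow-list and returns the lowest-scoring remaining one (again assuming lower scores are better). It uses two-letter codes because the Wiki profiles were reported above to return them, and the scores are made up (C# 9 top-level program).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: ignore candidates outside an allow-list of languages your
// application expects, then return the closest remaining one (lowest score).
string BestAllowed(IEnumerable<(string Code, double Distance)> candidates, ISet<string> allowed)
{
    foreach (var candidate in candidates.OrderBy(x => x.Distance))
    {
        if (allowed.Contains(candidate.Code))
            return candidate.Code;
    }
    return "unknown"; // no expected language among the candidates
}

// Two-letter codes, as reported for the Wiki profiles; scores are made up.
var expected = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "en", "de", "fr" };
var sampleResults = new[] { ("simple", 3600.0), ("new", 3650.0), ("en", 3700.0) };
Console.WriteLine(BestAllowed(sampleResults, expected));
```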

@diegosasw

I suppose this library is abandoned. Any luck, @andreyka26-git?

5 participants