Preprocess Text: Add Spacy POS tagger #1070

Draft
wants to merge 3 commits into base: master

Conversation

ajdapretnar
Collaborator

Issue

Implements #596.

Description of changes

Add spaCy, first as a POS tagger because that is the most sorely missed feature.

Later: implement spaCy for NER (also sorely missed) and for other NLP tasks (a standalone spaCy preprocessor).

Includes
  • Code changes
  • Tests
  • Documentation

@ajdapretnar
Collaborator Author

The only thing left to discuss is the additional dependencies that certain models require (Chinese, Japanese, Russian and Ukrainian). Should we remove these models or handle them gracefully somehow?
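
One way the "handle gracefully" option could look is a check that runs before a model is loaded and reports which extra modules are missing. This is a minimal sketch, not code from this PR: the function name `missing_dependencies` and the `extras` mapping are hypothetical, and the real module names for each language would have to come from the spaCy model requirements.

```python
from importlib.util import find_spec
from typing import Dict, List


def missing_dependencies(language: str, extras: Dict[str, List[str]]) -> List[str]:
    """Return the extra modules for `language` that cannot be imported.

    `extras` is a hypothetical mapping from a language to the top-level
    module names its model needs in addition to the model package itself.
    """
    # find_spec returns None when a top-level module is not importable.
    return [name for name in extras.get(language, []) if find_spec(name) is None]
```

The widget could then disable the language or show a message naming the missing modules instead of failing at tagging time. Only top-level module names should be passed, since `find_spec` may raise for dotted names whose parent package is absent.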

@VesnaT
Contributor

VesnaT commented Jul 19, 2024

I get this if the model is not installed:
[image]

def __getitem__(self, language: str) -> str:
    model = find_model(language)
    if model not in self.installed_models:
        download(model)
Contributor

self.installed_models should be updated at this point. Otherwise, the package is downloaded again on every lookup.
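
A minimal sketch of the suggested fix, under stated assumptions: the class name `ModelCache` is hypothetical, `find_model` and `download` are injected here as plain callables rather than the PR's actual helpers, and `installed_models` is assumed to be a mutable set (in the PR it may be a list or a computed property).

```python
from typing import Callable, Set


class ModelCache:
    """Hypothetical stand-in for the class under review."""

    def __init__(self, find_model: Callable[[str], str],
                 download: Callable[[str], None]) -> None:
        self._find_model = find_model
        self._download = download
        # Assumption: installed models are tracked as a set of model names.
        self.installed_models: Set[str] = set()

    def __getitem__(self, language: str) -> str:
        model = self._find_model(language)
        if model not in self.installed_models:
            self._download(model)
            # Record the model so that later lookups skip the download.
            self.installed_models.add(model)
        return model
```

With this change, a second lookup for the same language returns the model name without calling the download function again.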

@ajdapretnar
Collaborator Author

So, the downloaded models are indeed packages. We have to warn the user that selecting a given language will install additional dependencies to the Orange environment (think about the wording).
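
The warning could be wired in as a confirmation step before the download. The sketch below is only an illustration: `ensure_model`, `confirm` and `download` are hypothetical names, the message text is a placeholder for the wording still to be decided, and in the widget `confirm` would presumably be backed by a dialog.

```python
from typing import Callable, Set


def ensure_model(model: str, installed: Set[str],
                 confirm: Callable[[str], bool],
                 download: Callable[[str], None]) -> bool:
    """Install `model` only if the user agrees; return True if it is available."""
    if model in installed:
        return True
    # Placeholder wording, not the final text.
    message = (f"Selecting this language will install the package '{model}' "
               "into the Orange environment. Continue?")
    if not confirm(message):
        return False
    download(model)
    installed.add(model)
    return True
```

Declining leaves the environment untouched, and a model that is already installed never triggers the prompt.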

@ajdapretnar marked this pull request as draft on August 29, 2024, 13:11