Multi-Word Expression Tokenization #3

Open
Fennec2000GH opened this issue Sep 13, 2021 · 0 comments
Labels
enhancement New feature or request
Description

Enable rule-based tokenization that regroups neighboring tokens that logically belong to the same entity, such as compound words or full names.

Objectives

  1. Update the tokenization functions to accept a variable number of parameters, so that callers can pass specific rules and exceptions to apply during tokenization.
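
As a rough illustration of the objective above, here is a minimal sketch of a tokenizer that takes a variable number of multi-word rules. The function name `tokenize`, the `separator` parameter, and the regex used for the base split are all assumptions made for this example and are not taken from the project's code. Each rule is a sequence of words, and matching is case-insensitive with the longest rule tried first.

```python
import re
from typing import List, Sequence


def tokenize(text: str, *expressions: Sequence[str], separator: str = "_") -> List[str]:
    """Split text into tokens, then regroup the given multi-word expressions.

    Hypothetical sketch: each positional argument after `text` is one rule,
    given as a sequence of words, e.g. ("New", "York").
    """
    # Base tokenization (assumed): runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)

    # Try longer rules first so ("New", "York", "City") wins over ("New", "York").
    rules = sorted(
        (tuple(word.lower() for word in expr) for expr in expressions),
        key=len,
        reverse=True,
    )

    merged: List[str] = []
    i = 0
    while i < len(tokens):
        for rule in rules:
            n = len(rule)
            window = tuple(tok.lower() for tok in tokens[i:i + n])
            if n and window == rule:
                # Join the original (case-preserved) tokens into one token.
                merged.append(separator.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No rule matched at this position; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


result = tokenize(
    "Ada Lovelace moved to New York City.",
    ("Ada", "Lovelace"),
    ("New", "York"),
    ("New", "York", "City"),
)
print(result)
```

Passing no rules leaves the base tokenization unchanged, so existing callers of such a function would not need to change. Exceptions (expressions that must never be merged) could be handled with an additional keyword argument, which is left out of this sketch.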
Fennec2000GH added the enhancement (New feature or request) label Sep 13, 2021
Fennec2000GH added this to the Preprocessing milestone Sep 13, 2021
Fennec2000GH changed the title from "Multi-Word Tokenization" to "Multi-Word Expression Tokenization" Sep 13, 2021
Patrick-Lapid added a commit that referenced this issue Sep 24, 2021