
stm handling contractions #293

Open
val-pf opened this issue Oct 29, 2024 · 0 comments


val-pf commented Oct 29, 2024

It seems that the stopword list does not handle contractions such as "we've" and "they're" well, even though these are common in spoken language. Is there a recommended way to preprocess a corpus to detect and replace contractions, or an option to remove them specifically?
They show up in my topics' FREX words as "weve" or "theyr", so the order of punctuation removal and stemming may matter, too.
Longer term, it would be great to have a spoken-language option for prepDocuments() that handles these cases (and others).
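
One possible workaround, sketched here in base R, is to expand contractions in the raw text before any punctuation removal or stemming happens. The helper name `expand_contractions` and its pattern list are illustrative assumptions, not part of stm, and the list is deliberately incomplete ('d and 's are ambiguous and left alone):

```r
# Hypothetical helper (not part of stm): expand common English contractions
# in raw text so that later stopword removal sees ordinary words.
expand_contractions <- function(x) {
  # Normalise curly apostrophes to straight ones so the patterns below match
  x <- gsub("\u2019", "'", x, fixed = TRUE)

  # Irregular forms must be handled before the generic "n't" rule
  x <- gsub("\\bwon't\\b", "will not", x, ignore.case = TRUE, perl = TRUE)
  x <- gsub("\\bcan't\\b", "cannot", x, ignore.case = TRUE, perl = TRUE)

  # Regular suffixes; 'd and 's are skipped because they are ambiguous
  x <- gsub("n't\\b", " not", x, ignore.case = TRUE, perl = TRUE)
  x <- gsub("'re\\b", " are", x, ignore.case = TRUE, perl = TRUE)
  x <- gsub("'ve\\b", " have", x, ignore.case = TRUE, perl = TRUE)
  x <- gsub("'ll\\b", " will", x, ignore.case = TRUE, perl = TRUE)
  x <- gsub("'m\\b", " am", x, ignore.case = TRUE, perl = TRUE)
  x
}

docs <- c("We've seen it", "they're here", "I can't and won't", "don\u2019t stop")
clean_docs <- expand_contractions(docs)
print(clean_docs)
```

The cleaned vector could then go through the usual textProcessor() and prepDocuments() steps, where words like "have" and "are" would typically be caught by a standard stopword list. Alternatively, stripped forms such as "weve" could be supplied through the `customstopwords` argument of textProcessor(), though whether they match depends on the order in which punctuation is removed.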
