It seems that the stopword list does not handle contractions such as "we've" and "they're" well. These are common in spoken language. Is there a recommended way to preprocess a corpus to detect and expand contractions, or an option to remove them specifically?
They show up in my topic FREX words as "weve" or "theyr", so perhaps the order of punctuation removal and stemming matters, too.
Long term, it would be great to have a spoken-language option for prepDocuments() that handles these cases (and others).
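As a stopgap, something like the following could expand contractions before the text reaches the stopword and punctuation steps. This is only a rough sketch in base R: the `contractions` map and the `expand_contractions()` helper are hypothetical names made up for illustration, not part of stm, and the map would need many more entries for a real corpus.

```r
# Hypothetical contraction map (not part of stm); extend as needed.
contractions <- c(
  "we've"   = "we have",
  "they're" = "they are",
  "don't"   = "do not",
  "it's"    = "it is"
)

# Hypothetical helper: lowercases the text and expands known contractions.
expand_contractions <- function(text, map = contractions) {
  text <- tolower(text)
  # Normalise curly apostrophes to straight ones so the map keys match.
  text <- gsub("\u2019", "'", text, fixed = TRUE)
  for (pattern in names(map)) {
    # Word boundaries keep the pattern from matching inside longer words.
    text <- gsub(paste0("\\b", pattern, "\\b"), map[[pattern]], text, perl = TRUE)
  }
  text
}

docs <- c("We've seen this before.", "They're common in spoken language.")
expanded <- expand_contractions(docs)
print(expanded)

# The expanded text could then be passed on as usual, for example:
# processed <- stm::textProcessor(documents = expanded)
```

The idea is that once "we've" has become "we have", both words are plain tokens that an ordinary English stopword list should be able to match, so stems like "weve" would not be produced in the first place.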