Replies: 1 comment
-
The only way to do this currently is indeed by creating a number of custom patterns like so:

pos_patterns = [
    [{'POS': 'ADJ'}, {'POS': 'NOUN'}],
    [{'POS': 'NOUN'}],
    [{'POS': 'ADJ'}]
]

and extend them with the |
-
RE: https://maartengr.github.io/BERTopic/api/representation/pos.html
I need to make sure I am properly differentiating prepositions:
if pos in ignored_words and pos_patterns:
    ignored_words.remove(pos)
The issue here is that I only want to exclude a preposition when it is acting as an actual stopword, not when it is an integral part of a particular candidate. Take the word 'of', for example:
'Secretary of State' is a candidate; 'I think that's one of her purses' is not, and both examples may well appear in the same document! In other words, PPN PREP PPN should match, but PRN PREP PRN should not. NOUN PREP NOUN remains a grey area for now. I do not assume that simply pointing to a word or syntax in pos_patterns is enough for this to work. Thanks!
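To make the behaviour I am after concrete, here is a rough sketch in plain Python. It is not BERTopic's or spaCy's API: the function `extract_candidates`, the constant `POS_PATTERNS`, and the hand-assigned tags are all hypothetical, and I am assuming Universal POS tag names (PROPN for proper noun, ADP for preposition, PRON for pronoun) in place of my PPN/PREP/PRN shorthand.

```python
# Hypothetical sketch: tokens are pre-tagged (word, POS) pairs. In practice
# the tags would come from a tagger; here they are assigned by hand.

# POS sequences that count as candidates. The first pattern keeps a
# preposition only when it sits between two proper nouns.
POS_PATTERNS = [
    ["PROPN", "ADP", "PROPN"],
    ["ADJ", "NOUN"],
    ["NOUN"],
    ["ADJ"],
]


def extract_candidates(tagged, patterns=POS_PATTERNS):
    """Greedy left-to-right scan that tries the longest patterns first."""
    ordered = sorted(patterns, key=len, reverse=True)
    candidates = []
    i = 0
    while i < len(tagged):
        for pattern in ordered:
            window = tagged[i:i + len(pattern)]
            if [pos for _, pos in window] == pattern:
                candidates.append(" ".join(word for word, _ in window))
                i += len(pattern)
                break
        else:
            # No pattern starts at this token, so it is skipped. This is
            # where an 'of' that only acts as a stopword falls away.
            i += 1
    return candidates


title = [("Secretary", "PROPN"), ("of", "ADP"), ("State", "PROPN")]
chatter = [
    ("I", "PRON"), ("think", "VERB"), ("that", "PRON"), ("'s", "AUX"),
    ("one", "NUM"), ("of", "ADP"), ("her", "PRON"), ("purses", "NOUN"),
]

print(extract_candidates(title))
print(extract_candidates(chatter))
print(extract_candidates(title + chatter))
```

With these hand-assigned tags, 'of' survives inside 'Secretary of State' but is dropped from the second sentence, even when both sit in the same document. NOUN PREP NOUN is deliberately left out of the patterns, since that case is still undecided.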
somewhat related:
#933