Stop words still in top_n_words of topics #933
Unanswered
DominikMann asked this question in Q&A
-
Could you also share your code for instantiating the BERTopic model? Also, could you share some of the topics themselves that still have those custom stop words? After having trained your BERTopic model, you can inspect the vectorizer to see if the stopwords were properly added to the model with: topic_model.vectorizer_model.stop_words_
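As a rough illustration of that kind of check, here is a minimal sketch using scikit-learn alone. It assumes the vectorizer is a plain CountVectorizer, uses a made-up custom_stop_words list (the original one is not shown in this thread), and builds a standalone vectorizer in place of topic_model.vectorizer_model. It reads the configured list through get_stop_words(), which is a different route from the attribute mentioned above.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical custom stop words; the list used in the question is not shown.
custom_stop_words = ["foo", "bar"]

# Stand-in for topic_model.vectorizer_model (assumed to be a CountVectorizer).
vectorizer_model = CountVectorizer(
    stop_words=list(ENGLISH_STOP_WORDS) + custom_stop_words
)

# get_stop_words() returns the stop word list the analyzer will filter with.
effective = set(vectorizer_model.get_stop_words())

# Any custom stop word listed here never reached the vectorizer.
missing = [w for w in custom_stop_words if w not in effective]
print("custom stop words missing from the vectorizer:", missing)
```

If the list comes back empty, the stop words were registered, and the cause is more likely in how they are matched against the tokens.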
-
Hi,
I am removing English and custom stop words with the CountVectorizer and passing it to the BERTopic model as the vectorizer_model. Still, when inspecting the topics, some have custom stop words among their top_n_words. Is there a way to remove them (without having to preprocess the dataset myself)?
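The code from the question is not preserved here, so the following is only a sketch of what such a setup might look like, not the original. The custom_stop_words list and the docs are invented for illustration. One thing worth ruling out: with its default settings, CountVectorizer lowercases tokens before filtering stop words, so a custom stop word containing uppercase letters will not match anything unless it is lowercased first.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical custom stop words, deliberately written with uppercase letters.
custom_stop_words = ["Foo", "BAR"]

# Lowercase the custom words so they match the lowercased tokens,
# then merge them with scikit-learn's built-in English list.
stop_words = sorted(ENGLISH_STOP_WORDS | {w.lower() for w in custom_stop_words})

vectorizer_model = CountVectorizer(stop_words=stop_words)

# Toy documents, only to confirm which terms survive the filtering.
docs = ["Foo and the quick fox", "BAR is a lazy dog"]
vectorizer_model.fit(docs)
vocabulary = set(vectorizer_model.vocabulary_)
print(sorted(vocabulary))

# The vectorizer would then be handed to BERTopic, for example:
# topic_model = BERTopic(vectorizer_model=vectorizer_model)
```

Checking the fitted vocabulary on a few sample documents like this is a quick way to see whether the stop words are filtered at the vectorizer level before training a full topic model.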