Hi, Thanks!
from sklearn.feature_extraction.text import CountVectorizer
from bertopic import BERTopic

# a_list_of_keywords_i_want_to_exclude is a list of strings; docs is your list of documents
vectorizer_model = CountVectorizer(stop_words=a_list_of_keywords_i_want_to_exclude)

# Train a model
topic_model = BERTopic(vectorizer_model=vectorizer_model)
topics, probs = topic_model.fit_transform(docs)

# If you want to update an already trained model
topic_model.update_topics(docs, vectorizer_model=vectorizer_model)
Sure, you can use the CountVectorizer to decide how the words will be tokenized before ending up in the topic representation. Here, you can decide which words to include in or exclude from the resulting topic representation. More specifically, you can treat the words you want to exclude as stop words that should not appear in the topic labels, and pass them to the CountVectorizer through its stop_words parameter, as in the code above.