Can a keyword be excluded from topic labels? #917

Answered by MaartenGr
salderma asked this question in Q&A
Sure, you can use the CountVectorizer to control how words are tokenized before they end up in the topic representation, including which words are kept and which are excluded. The simplest way to exclude a keyword is to treat it as a stop word, so that it never appears in the topic labels:

from sklearn.feature_extraction.text import CountVectorizer
from bertopic import BERTopic

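# a_list_of_keywords_i_want_to_exclude is a placeholder for your own list of strings
vectorizer_model = CountVectorizer(stop_words=a_list_of_keywords_i_want_to_exclude)

# Train a model (docs is your list of documents)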
topic_model = BERTopic(vectorizer_model=vectorizer_model)
topics, probs = topic_model.fit_transform(docs)

# If you want to u…
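
To check the effect before training a full model, the vectorizer can be fitted on its own. The snippet below is a minimal sketch, not part of the original answer: `excluded_keywords` and `sample_docs` are made-up illustration data. It also combines the custom keywords with scikit-learn's built-in English stop word list, since passing a list to `stop_words` replaces the built-in list rather than extending it.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical keywords that should never show up in topic labels
excluded_keywords = ["customer", "service"]

# A custom list replaces the built-in "english" list, so merge the two
# if common English stop words should be filtered out as well
stop_words = sorted(ENGLISH_STOP_WORDS.union(excluded_keywords))

vectorizer_model = CountVectorizer(stop_words=stop_words)

# Made-up documents, only used to inspect the resulting vocabulary
sample_docs = [
    "The customer praised the delivery speed",
    "Customer service resolved the billing issue",
]
vectorizer_model.fit(sample_docs)

# vocabulary_ maps each kept token to its column index
vocabulary = set(vectorizer_model.vocabulary_)
print(sorted(vocabulary))
```

The excluded keywords are matched against lowercased single tokens, so "Customer" is removed as well. The same `vectorizer_model` can then be passed to `BERTopic(vectorizer_model=vectorizer_model)` as in the answer above.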

Answer selected by salderma