How to fold back in new topics after redistributing outliers #711
Hi! First off I looove this library and can't believe it's something I can just download for free! My question is this: I followed the FAQ's suggestion for reducing the number of outlier topics, using the following lines:

    probability_threshold = 0.01
    new_topics = [np.argmax(prob) if max(prob) >= probability_threshold else -1 for prob in probs]

This works perfectly. However, after doing this the order of the topics is no longer correct, because reassigning the outliers changes the frequency of each topic significantly. The word frequencies within each topic are no longer correct either, since they can change a lot once the outliers are added. This also affects the bar chart, because it relies on selecting the n most frequent words per topic. So I was just wondering if there is a way to tell the model about the new topics.

What I've tried, but which doesn't seem to work:

    df2 = pd.DataFrame({"Topic": new_topics, "Document": data})
    model._update_topic_size(df2)
    model._sort_mappings_by_frequency(df2)

Would appreciate any insight! Thanks :)
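To make the frequency shift described above concrete, here is a minimal sketch using only NumPy and the standard library. The `probs` values and the `old_topics` list are made up for illustration (they are assumptions, not output from a real model), and nothing here touches the model itself; it only shows how topic sizes, and therefore the frequency ordering, change once outliers are reassigned.

```python
from collections import Counter

import numpy as np

# Hypothetical document-topic probabilities for 5 documents and 3 topics.
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.005, 0.004, 0.003],  # max is below the threshold, so it stays an outlier
    [0.10, 0.15, 0.75],
    [0.02, 0.60, 0.38],
    [0.30, 0.05, 0.65],
])

# Hypothetical original assignments; -1 marks an outlier.
old_topics = [0, -1, -1, -1, 2]

# Same thresholding rule as in the question.
probability_threshold = 0.01
new_topics = [
    int(np.argmax(prob)) if max(prob) >= probability_threshold else -1
    for prob in probs
]

# Count documents per topic before and after the reassignment.
old_sizes = Counter(old_topics)
new_sizes = Counter(new_topics)

print("old sizes:", dict(old_sizes))
print("new sizes:", dict(new_sizes))
# Topics ordered from most to least frequent after the reassignment.
print("new order:", [topic for topic, _ in new_sizes.most_common()])
```

In this toy data two of the three outliers move into existing topics, so the sizes the model stored at fit time no longer match the assignments in `new_topics`.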
Replies: 1 comment 6 replies
Although it is possible to add `new_topics` back into the model, there are a number of things that you should be careful of. First, when you assign these outliers to different topics, it is difficult for the model to track which topics should be mapped to one another. For example, if you have two outliers but one is assigned to topic 1 and the other to topic 5, there is no fixed way to map one topic to another. Second, after updating the topics with your method, it is not possible to further merge topics, since no clean mapping between topics will then exist. This means that updating topics that way might prevent other features from working. Third, it is necessary that all unique topics in `topics`