Get top n words that are nearest to cluster centroid #16
Comments
In this regard I have another question.
@fkolokathi @PabloRR100 apologies, I haven't had a chance to look back at this in quite some time. Regarding @fkolokathi's question: I'm not sure what, beyond words, would make up the cluster centroid. As @PabloRR100 points out, the centroid is really a "fake film synopsis", not a fake word. @PabloRR100 I think you're correct, if my memory serves. Do you have any suggestions for how things could be improved for clarity?
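To make the "fake film synopsis" idea concrete, here is a minimal sketch in plain Python. It assumes the centroid is the component-wise mean of the TF-IDF vectors of the documents in a cluster (which is what k-means produces), so it lives in the same term space as the documents. The vocabulary, the weights, and the helper names `centroid` and `top_terms` are made up for illustration and are not taken from the repository.

```python
def centroid(vectors):
    """Component-wise mean of equally long document vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def top_terms(center, vocabulary, n=2):
    """Terms with the largest centroid weights, in descending order."""
    order = sorted(range(len(center)), key=lambda i: center[i], reverse=True)
    return [vocabulary[i] for i in order[:n]]


# Hypothetical vocabulary and TF-IDF rows for one cluster of three synopses.
vocabulary = ["army", "family", "love", "police"]
cluster_docs = [
    [0.0, 0.5, 0.75, 0.0],
    [0.25, 0.25, 0.5, 0.0],
    [0.0, 0.75, 0.5, 0.25],
]

# The centroid has one weight per vocabulary term, like any document vector.
center = centroid(cluster_docs)

# Sorting the centroid's components picks the terms that carry the most
# average weight in the cluster.
top = top_terms(center, vocabulary, n=2)
print(top)
```

Read this way, the "top words" are the terms that contribute most to the centroid vector, rather than individual words measured by distance to it.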
Thank you so much for replying, @brandomr. What do you think about choosing the words for the word cloud by importance: either the k words with the highest IDF among the words that appear in the cluster's documents, or some metric that combines the average TF across those documents with the IDF?
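One way this proposal could be sketched, in plain Python: score each term in a cluster by its mean term frequency over the cluster's documents, multiplied by its IDF over the whole corpus. The smoothed IDF formula, the toy corpus, and the function names below are assumptions made for illustration, not something the thread or the repository specifies.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed IDF per term over the whole corpus (one possible definition)."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def cluster_term_scores(cluster_docs, idf):
    """Mean relative term frequency within the cluster, times corpus IDF."""
    n = len(cluster_docs)
    mean_tf = Counter()
    for doc in cluster_docs:
        for term, count in Counter(doc).items():
            mean_tf[term] += (count / len(doc)) / n
    return {t: tf * idf[t] for t, tf in mean_tf.items()}


def top_k(scores, k):
    """k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical tokenized synopses; the first two form one cluster.
corpus = [
    ["the", "family", "love", "love"],
    ["the", "love", "wedding"],
    ["the", "police", "killed"],
    ["the", "army", "police"],
]
cluster = corpus[:2]

scores = cluster_term_scores(cluster, idf_weights(corpus))
print(top_k(scores, 2))
```

Ranking by IDF alone would favor whichever terms are rarest in the corpus, even if they occur once in the cluster. Weighting by the mean TF keeps terms that are both frequent within the cluster and distinctive across the corpus.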
@PabloRR100 I think that makes sense. I'd definitely spot check things to ensure that the results you are seeing are actually logical. You might check out this paper on vennclouds and the associated repo that automatically generates dynamic word clouds comparing documents. That methodology might be useful for you.
I cannot understand how taking the indices of the words with the maximum tf-idf per cluster center finds the top words that are nearest to the cluster centroid. Also, is the cluster centroid the center of each cluster?