Get top n words that are nearest to cluster centroid #16

fkolokathi · 2017-11-22T11:08:02Z

I cannot understand how taking the indices of the words with the highest tf-idf per cluster center gives you the top words nearest to the cluster centroid. Also, I want to ask: is the cluster centroid simply the center of each cluster?

PabloRR100 · 2019-11-21T22:43:46Z

In this regard I have another question.
If you are clustering synopses (and therefore films), the centroid should represent a "fake" film, not a fake word. The points closest to the center should be the closest films, not the closest words to the film, right?

brandomr · 2019-11-21T22:57:33Z

@fkolokathi @PabloRR100 apologies, I haven't had a chance to look back at this in quite some time. Regarding @fkolokathi's question: I'm not sure what, other than words, would make up the cluster centroid. As @PabloRR100 points out, the centroid is really a "fake film synopsis", not a fake word.

@PabloRR100 I think you're correct if my memory serves. Do you have any suggestions for how things could be improved for clarity?
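
To make the distinction concrete, here is a minimal sketch in plain Python. It assumes the centroid is the mean of the TF-IDF vectors of the synopses in a cluster, which is how k-means defines it. The vocabulary, weights, and cluster labels are made up for illustration and do not come from the tutorial's data. The "top words" are the vocabulary terms with the largest weights in that mean vector, while the "nearest points" are synopses (rows), not words.

```python
# Toy TF-IDF matrix: one row per synopsis, one column per vocabulary term.
# All terms, weights, and labels below are hypothetical.
terms = ["alien", "love", "ship", "wedding"]
tfidf = [
    [0.9, 0.0, 0.4, 0.0],  # doc 0
    [0.7, 0.1, 0.6, 0.0],  # doc 1
    [0.8, 0.0, 0.2, 0.0],  # doc 2
    [0.0, 0.8, 0.0, 0.5],  # doc 3
    [0.1, 0.6, 0.0, 0.7],  # doc 4
    [0.0, 0.7, 0.1, 0.6],  # doc 5
]
labels = [0, 0, 0, 1, 1, 1]  # assumed cluster assignment per synopsis


def centroid(rows):
    """Mean vector of the rows: the 'fake synopsis' of a cluster."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def top_terms(center, vocabulary, n):
    """Terms whose coordinates in the centroid are the n largest."""
    order = sorted(range(len(center)), key=lambda i: center[i], reverse=True)
    return [vocabulary[i] for i in order[:n]]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


centroids = {}
for k in sorted(set(labels)):
    members = [row for row, lab in zip(tfidf, labels) if lab == k]
    centroids[k] = centroid(members)

# Heaviest terms per centroid: what the "top words per cluster" lists show.
top_by_cluster = {k: top_terms(c, terms, 2) for k, c in centroids.items()}

# Closest synopsis per centroid: the nearest *film*, not the nearest word.
nearest_doc = {
    k: min(
        (i for i, lab in enumerate(labels) if lab == k),
        key=lambda i: squared_distance(tfidf[i], c),
    )
    for k, c in centroids.items()
}

print(top_by_cluster)
print(nearest_doc)
```

With scikit-learn's `KMeans`, the centroid rows are available as `cluster_centers_`, so sorting each row in descending order yields the same kind of per-cluster term ranking.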

PabloRR100 · 2019-11-22T08:02:21Z

Thank you so much for replying, @brandomr.
I am trying to wrap my head around this because I have a bunch of documents that I want to cluster and then plot a WordCloud of the most relevant words for each cluster. So essentially the same use case. Until now I was using this "closeness" to the center as the word importance for the WordCloud.

What do you think about taking the words that appear in the documents of a cluster and using the k with the highest IDF (that is, those considered most important across the list of documents), or some metric that combines the average TF across documents with the IDF, as their importance for the WordCloud?
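
A rough sketch of the second variant (average TF across the cluster's documents multiplied by IDF) could look like the following. The documents, labels, and helper names are hypothetical, and the smoothed IDF formula is one common choice rather than the only one. Note that IDF alone is computed over the whole corpus and does not depend on the cluster, so it is the TF part that makes the weights cluster-specific.

```python
import math
from collections import Counter

# Hypothetical tokenized documents and cluster labels, for illustration only.
docs = [
    "alien ship lands alien attack".split(),
    "ship crew fights alien".split(),
    "wedding love story love".split(),
    "love letter before wedding".split(),
]
labels = [0, 0, 1, 1]


def idf(all_docs):
    """Smoothed IDF over the whole corpus: log((1 + N) / (1 + df)) + 1."""
    n = len(all_docs)
    df = Counter(term for doc in all_docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def cluster_word_weights(all_docs, all_labels, cluster, idf_by_term):
    """Average relative TF over the cluster's documents, times corpus IDF."""
    members = [doc for doc, lab in zip(all_docs, all_labels) if lab == cluster]
    mean_tf = Counter()
    for doc in members:
        for term, count in Counter(doc).items():
            # Relative frequency in this doc, averaged over cluster members.
            mean_tf[term] += count / len(doc) / len(members)
    return {term: tf * idf_by_term[term] for term, tf in mean_tf.items()}


idf_by_term = idf(docs)
weights = cluster_word_weights(docs, labels, 0, idf_by_term)

# The k highest-weighted words for the cluster.
top = sorted(weights, key=weights.get, reverse=True)[:2]
print(top)
```

A dictionary like `weights` is the kind of input that the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts, so each cluster could get its own cloud from its own weights.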

brandomr · 2019-11-22T17:16:04Z

@PabloRR100 I think that makes sense. I'd definitely spot check things to ensure that the results you are seeing are actually logical.

You might check out this paper on vennclouds and the associated repo that automatically generates dynamic word clouds comparing documents. That methodology might be useful for you.
