(word1, frequency), (word2, frequency), ...
then trying to measure how far that distribution is from uniform
one simple nice way is entropy: -(P(word1)*log(P(word1)) + P(word2)*log(P(word2)) + ...)
where P(word1) is just frequency of word1 / total words (note the leading minus sign, so entropy comes out non-negative)
It's nice because it measures how "unpredictable" the signal is. If only a few words are common and most barely appear, it's predictable (low entropy). Or if the text is one word repeated over and over, it's completely predictable (entropy zero). But if it's crazy town, with lots of words all roughly equally likely, then it's not predictable (entropy near its maximum, which is what the uniform distribution gives you).
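A minimal sketch of the idea in Python (names like `entropy` are my own, not from the thread): count word frequencies, normalize to probabilities, and sum -P*log2(P).

```python
import math
from collections import Counter

def entropy(words):
    """Shannon entropy (in bits) of the word-frequency distribution."""
    counts = Counter(words)
    total = sum(counts.values())
    # -sum of P * log2(P) over each distinct word
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Uniform distribution -> maximum entropy (4 equally likely words = 2.0 bits).
print(entropy(["a", "b", "c", "d"]))

# Skewed distribution -> lower entropy (more predictable).
print(entropy(["a"] * 7 + ["b"]))

# One repeated word -> entropy 0 (completely predictable).
print(entropy(["a"] * 5))
```

Using log base 2 gives entropy in bits; any base works as long as you're consistent when comparing distributions.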
From Bill:
This seems like a decent resource: http://normal-extensions.com/2013/08/04/entropy-for-n-grams/