Dataset | Structure | Size | Source |
---|---|---|---|
Capitals and countries | ca1 co1 ca2 co2 | 505 | Word2Vec |
Currency (and Countries) | cu1 co1 cu2 co2 | 866 | Word2Vec |
Cities and State | ci1 st1 ci2 st2 | 2,467 | Word2Vec |
(All) capitals and countries | ca1 co1 ca2 co2 | 4,523 | Word2Vec |
The task takes quadruplets (v_1, v_2, v_3, v_4) and uses the first three vectors to predict the fourth. The vectors nearest to the predicted one are then retrieved from among all the vectors, with the dot product as the similarity measure.
def default_analogy_function(a, b, c): return b - a + c
The vector returned by the function (the predicted vector) is compared with all the vectors, and the top_k most similar ones are retrieved. If the actual fourth vector is among these top_k vectors, the answer is considered correct.
The analogy function to compute the predicted vector and the top_k value can be customised.
Metric | Range | Optimum |
---|---|---|
Accuracy | [0,1] | Highest |
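
The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation: the helper names `is_correct` and `accuracy`, the dictionary layout of the embeddings, and the toy 2-dimensional vectors are all assumptions made for the example. The sketch also assumes that the three input vectors are not excluded from the candidates, and that accuracy is the fraction of quadruplets answered correctly.

```python
import numpy as np


def default_analogy_function(a, b, c):
    # Predicted fourth vector, as in the default function described above.
    return b - a + c


def is_correct(vectors, quad, top_k, analogy_function=default_analogy_function):
    """Check one quadruplet (hypothetical helper).

    vectors: dict mapping an entity name to its embedding (np.ndarray).
    quad: tuple of four entity names (v_1, v_2, v_3, v_4).
    """
    n1, n2, n3, n4 = quad
    predicted = analogy_function(vectors[n1], vectors[n2], vectors[n3])
    names = list(vectors)
    matrix = np.stack([vectors[n] for n in names])
    scores = matrix @ predicted            # dot product with every vector
    top = np.argsort(-scores)[:top_k]      # indices of the top_k highest scores
    return n4 in {names[i] for i in top}


def accuracy(vectors, quadruplets, top_k, analogy_function=default_analogy_function):
    # Fraction of quadruplets whose fourth vector is among the top_k candidates.
    hits = [is_correct(vectors, q, top_k, analogy_function) for q in quadruplets]
    return sum(hits) / len(hits)


# Toy embeddings, invented purely for illustration.
toy = {
    "paris": np.array([2.0, 1.0]),
    "france": np.array([2.0, 3.0]),
    "rome": np.array([3.0, 1.0]),
    "italy": np.array([3.0, 3.0]),
}
quads = [
    ("paris", "france", "rome", "italy"),
    ("rome", "italy", "paris", "france"),
]

print(accuracy(toy, quads, top_k=1))  # 0.5
print(accuracy(toy, quads, top_k=2))  # 1.0
```

In the second toy quadruplet the predicted vector equals the embedding of `france` exactly, yet `italy` scores higher because the dot product is not length-normalised. This is why the expected answer only appears once top_k is raised to 2. A custom analogy function can be passed through the `analogy_function` argument of the sketch, provided it accepts three vectors and returns one.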