Using deep metric learning for imbalanced dataset #637

learnfromgroundup · 2023-06-04T11:07:00Z

learnfromgroundup
Jun 4, 2023

Hi Kevin! Thanks for creating such a powerful library!

I have a question about using deep metric learning on an imbalanced dataset.
My dataset has 3 classes and only about 1400 images in total.
Classes 0, 1, and 2 account for 91%, 4%, and 5% of the dataset, respectively.
My task is image classification/retrieval.

I used the TrainWithClassifier trainer with triplet loss, and Optuna with cross-validation for hyperparameter tuning.
At test time, I used KNN with majority voting to classify my test set.

Based on the confusion matrix, the test set results look OK to me.
Yet when I look at the mean_average_precision from AccuracyCalculator, the value is very small, only around 0.035.

Below is how I compute the score:

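import numpy as np
from pytorch_metric_learning import testers
from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator

def get_all_embeddings(dataset, trunk_model, embedder_model=None):
    # BaseTester runs the dataset through the trunk (and the optional embedder).
    tester = testers.BaseTester()
    return tester.get_all_embeddings(dataset, trunk_model=trunk_model, embedder_model=embedder_model)

def pml_testing(train_set, test_set, model, accuracy_calculator, embedder_model=None):
    """
    model: trunk model
    embedder_model: embedder model (optional)
    """
    train_embeddings, train_labels = get_all_embeddings(train_set, model, embedder_model)
    test_embeddings, test_labels = get_all_embeddings(test_set, model, embedder_model)
    # squeeze(1) drops the trailing singleton dimension of the label tensors.
    train_labels = train_labels.squeeze(1)
    test_labels = test_labels.squeeze(1)
    # Query = test set, reference = train set; False because the reference set does not contain the queries.
    accuracies = accuracy_calculator.get_accuracy(
        test_embeddings, test_labels, train_embeddings, train_labels, False
    )
    return accuracies

# train_dataset, val_dataset, trunk, and embedder are defined elsewhere.
accuracy_calculator = AccuracyCalculator(include=("precision_at_1","mean_average_precision"), k=3, return_per_class=True)
accuracies = pml_testing(train_dataset, val_dataset, trunk, accuracy_calculator, embedder_model=embedder)
# With return_per_class=True each metric is reported per class, so average over the classes.
mean_average_precision = np.mean(accuracies["mean_average_precision"])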

Do you have any intuition for why the mean_average_precision is so low? Am I overfitting?
Although I can see clear clusters in the UMAP plot, the low mean_average_precision score makes me unsure whether my embedding space is a good one.

Answered by KevinMusgrave

KevinMusgrave · 2023-06-04T11:28:12Z

KevinMusgrave
Jun 4, 2023
Maintainer

It's low because if there are, say, 1500 relevant items and 1497 of them aren't retrieved, the score gets penalized for that regardless of the value of k:
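
For reference, the mean average precision definition from the cited chapter, which uses the same MAP(Q), m_j, and R_jk notation that appears later in this thread, is reconstructed below. Here Q is the set of queries, m_j is the number of relevant items for query j, and R_jk is the ranked result list from the top down to the k-th relevant item. A relevant item that is never retrieved contributes a precision of 0.

```latex
\mathrm{MAP}(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \mathrm{Precision}(R_{jk})
```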

(from https://nlp.stanford.edu/IR-book/pdf/08eval.pdf)

Try setting k=None so that it finds all nearest neighbors instead of just the closest k.
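
To make the penalty concrete, here is a minimal standalone sketch of average precision for a single query, following the textbook definition rather than the library's internal implementation. The function name `average_precision` and its arguments are illustrative assumptions; the value 1500 is the hypothetical count used in the answer above.

```python
from typing import Sequence


def average_precision(ranked_hits: Sequence[bool], num_relevant: int) -> float:
    """AP for one query: precision at each relevant hit, summed and divided by m_j.

    ranked_hits[i] is True when the (i+1)-th retrieved item is relevant.
    num_relevant is m_j, the total number of relevant items in the reference set.
    Relevant items missing from ranked_hits contribute a precision of 0.
    """
    hits_so_far = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits_so_far += 1
            precision_sum += hits_so_far / rank
    return precision_sum / num_relevant


m_j = 1500  # relevant items for this query (hypothetical number from the answer)

# k=3: even a perfect top-3 leaves 1497 relevant items unretrieved.
ap_top3 = average_precision([True, True, True], m_j)

# All neighbors retrieved, with every relevant item ranked first.
ap_full = average_precision([True] * m_j, m_j)

print(ap_top3)  # 0.002
print(ap_full)  # 1.0
```

With only three neighbors the score cannot exceed 3/1500 no matter how good the embedding is, which is why retrieving all neighbors gives a more informative number.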

0 replies

learnfromgroundup · 2023-06-04T15:32:56Z

learnfromgroundup
Jun 4, 2023
Author

Thanks for your answer, Kevin!

Just to clarify: suppose my only test query is an image with label 0 (the majority class) and there are 1500 training images with label 0. Do you mean that m_j is 1500 and Precision(R_jk) is 0 for k > 3 in the MAP(Q) equation?

1 reply

KevinMusgrave Jun 4, 2023
Maintainer

Yes
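
The same reasoning gives a ceiling on the per-class score for any k. The sketch below is a rough illustration only: the helper `ap_upper_bound` is hypothetical, and the class counts are an assumption derived from the 91% / 4% / 5% split of about 1400 images, since the actual training-fold sizes are not given in the thread.

```python
def ap_upper_bound(k: int, num_relevant: int) -> float:
    """Best possible AP when only k neighbors are retrieved and m_j items are relevant."""
    return min(k, num_relevant) / num_relevant


# ASSUMPTION: reference-set class counts approximated from the 91% / 4% / 5%
# split of ~1400 images; the real training-fold sizes are not stated.
approx_counts = {0: 1274, 1: 56, 2: 70}
k = 3

bounds = {label: ap_upper_bound(k, count) for label, count in approx_counts.items()}
mean_bound = sum(bounds.values()) / len(bounds)

for label, bound in bounds.items():
    print(f"class {label}: AP <= {bound:.4f}")
print(f"mean of per-class bounds: {mean_bound:.4f}")
```

Under these assumed counts the mean bound comes out near 0.033, the same order of magnitude as the reported 0.035. A smaller reference set would raise the bound slightly, so a score in this range with k=3 says little about embedding quality.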

learnfromgroundup · 2023-06-05T01:08:44Z

learnfromgroundup
Jun 5, 2023
Author

Great. Thanks a lot!

0 replies
