Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Comparing Contrastive and Triplet Loss: Variance Analysis and Optimization Behavior

Created by
  • Haebom

Author

Donghuo Zeng

Outline

This paper theoretically and empirically compares the representational quality of contrastive loss and triplet loss, two losses widely used in deep metric learning. Focusing on intra- and inter-class variance and on optimization behavior (e.g., greedy updates), the authors run task-specific experiments on synthetic data and on real-world datasets such as MNIST and CIFAR-10. They find that triplet loss maintains greater intra- and inter-class variance, which supports fine-grained distinctions, whereas contrastive loss tends to compress intra-class embeddings and thereby obscures subtle semantic differences. By further analyzing the loss-decay rate, activity ratio, and gradient norm, they show that contrastive loss induces many small updates early in training, while triplet loss generates fewer but stronger updates that facilitate learning on hard examples. Classification and retrieval results on MNIST, CIFAR-10, CUB-200, and CARS196 show that triplet loss outperforms contrastive loss.
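For reference, below is a minimal sketch of the two loss formulations being compared, written in PyTorch. This is not the paper's code; the margin values and the way pairs and triplets are sampled here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_class, margin=1.0):
    # Pairwise contrastive loss: pull same-class pairs together and push
    # different-class pairs at least `margin` apart.
    d = F.pairwise_distance(z1, z2)                              # Euclidean distance per pair
    pos = same_class.float() * d.pow(2)                          # attract positive pairs
    neg = (1 - same_class.float()) * F.relu(margin - d).pow(2)   # repel negatives inside the margin
    return 0.5 * (pos + neg).mean()

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Triplet loss: the anchor-positive distance must beat the anchor-negative
    # distance by at least `margin`; easy triplets contribute zero loss.
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

# Example usage with random embeddings (batch of 8, 32-dim):
z = torch.randn(8, 32)
labels = torch.randint(0, 2, (8,))
print(contrastive_loss(z, torch.randn(8, 32), labels))
print(triplet_loss(z, torch.randn(8, 32), torch.randn(8, 32)))
```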

Takeaways, Limitations

Takeaways:
Triplet loss is better suited to preserving fine-grained distinctions, while contrastive loss is better suited to smoother, more comprehensive embedding refinement.
Triplet loss tends to focus its updates on hard samples.
Contrastive loss induces many small updates early in training (a diagnostic sketch follows this list).
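The hard-sample behavior above can be monitored with an activity-ratio style diagnostic: the fraction of sampled triplets that still violate the margin and therefore receive a non-zero gradient. The sketch below shows one plausible way to compute it and is an illustrative assumption, not the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

def triplet_activity_ratio(anchor, positive, negative, margin=0.2):
    # Fraction of sampled triplets that still violate the margin and hence
    # produce a non-zero gradient; a low ratio means most triplets are "easy".
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return ((d_ap - d_an + margin) > 0).float().mean().item()

# Example: ratio close to 0 means the margin is already satisfied for most triplets.
print(triplet_activity_ratio(torch.randn(64, 32), torch.randn(64, 32), torch.randn(64, 32)))
```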
Limitations:
The study is limited to a specific set of datasets (MNIST, CIFAR-10, CUB-200, CARS196); further work is needed to establish generalizability to other datasets.
The impact of specific optimizers (e.g., Adam) and of hyperparameter choices on performance is not analyzed in depth.
Combinations or variants of the two loss functions are not explored.