Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval

Created by
  • Haebom

Author

Michael Gunther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Bo Wang, Sedigheh Eslami, Scott Martens, Maximilian Werk, Nan Wang, Han Xiao

Outline

Jina-embeddings-v4 is a 3.8-billion-parameter multimodal embedding model that unifies text and image representations through a novel architecture. It supports both single-vector (dense) embeddings and multi-vector embeddings suited to late-interaction retrieval, and it incorporates task-specific low-rank adaptation (LoRA) adapters to optimize performance across retrieval scenarios such as query-document retrieval, semantic text similarity, and code retrieval. Comprehensive evaluations show that jina-embeddings-v4 achieves state-of-the-art performance on both single-modal and cross-modal retrieval tasks, with particular strength on visually rich content such as tables, charts, diagrams, and mixed-media formats. The authors also introduce Jina-VDR, a new benchmark for visually rich image retrieval.
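The difference between the two retrieval modes mentioned above can be illustrated with a toy numpy sketch: single-vector retrieval scores a query against a document with one cosine similarity, while late-interaction (ColBERT-style MaxSim) scoring keeps one vector per token and sums, over query tokens, each token's best match in the document. The dimensions, pooling choice, and random vectors here are illustrative only, not the model's actual implementation.

```python
import numpy as np

def single_vector_score(q, d):
    """Cosine similarity between one pooled query vector and one pooled document vector."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def late_interaction_score(Q, D):
    """ColBERT-style MaxSim: for each query token vector, take its maximum
    cosine similarity over all document token vectors, then sum over query tokens."""
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    sim = Qn @ Dn.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # toy query: 4 token vectors, dim 8 (illustrative)
D = rng.normal(size=(6, 8))  # toy document: 6 token vectors, dim 8
q_pooled, d_pooled = Q.mean(axis=0), D.mean(axis=0)  # mean pooling, for illustration

print(single_vector_score(q_pooled, d_pooled))
print(late_interaction_score(Q, D))
```

Late interaction preserves token-level matching (useful for fine-grained visual or textual details) at the cost of storing many vectors per document instead of one.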

Takeaways, Limitations

Takeaways:
  • Effectively integrates text and images in a 3.8-billion-parameter multimodal embedding model.
  • Supports both single-vector and multi-vector embeddings, enabling use in a variety of retrieval scenarios.
  • Enables per-task performance optimization through LoRA adapters.
  • Shows strength in handling visually rich content and achieves state-of-the-art performance.
  • Introduces Jina-VDR, a new benchmark for visually rich image retrieval.
Limitations:
  • The paper does not explicitly discuss its limitations or future research directions.
  • More detail is needed on the construction and reliability of the Jina-VDR benchmark.
  • Further analysis is needed of the efficiency and generalization performance of the LoRA adapters.