This page curates AI-related papers published worldwide. All summaries are generated with Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
Created by
Haebom
Author
Michael Gunther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Bo Wang, Sedigheh Eslami, Scott Martens, Maximilian Werk, Nan Wang, Han Xiao
Outline
Jina-embeddings-v4 is a 3.8-billion-parameter multimodal embedding model that unifies text and image representations in a novel architecture. It supports both single-vector embeddings and multi-vector embeddings in the late-interaction style. Task-specific low-rank adaptation (LoRA) adapters optimize performance across retrieval scenarios such as query-document retrieval, semantic text similarity, and code retrieval. Comprehensive evaluations show that jina-embeddings-v4 achieves state-of-the-art performance on both single-modal and cross-modal retrieval tasks, and it is particularly strong on visually rich content such as tables, charts, diagrams, and mixed-media formats. The authors also introduce Jina-VDR, a new benchmark for visually rich image retrieval.
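The late-interaction scoring used with multi-vector embeddings can be illustrated with a minimal NumPy sketch of ColBERT-style MaxSim: each query token embedding is matched against its most similar document token embedding, and the maxima are summed. The function name, shapes, and the random toy data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance score.

    query_vecs: (num_query_tokens, dim) L2-normalized token embeddings.
    doc_vecs:   (num_doc_tokens, dim) L2-normalized token embeddings.
    For each query token, take the cosine similarity to its best-matching
    document token, then sum these maxima over all query tokens.
    """
    sims = query_vecs @ doc_vecs.T          # (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())    # sum of per-query-token maxima

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy example: 2 query tokens, 3 document tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
q = l2_normalize(rng.normal(size=(2, 4)))
d = l2_normalize(rng.normal(size=(3, 4)))
score = maxsim_score(q, d)
```

Because each query token is scored independently, document token embeddings can be precomputed and indexed, which is what makes late interaction practical at retrieval time.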
Takeaways, Limitations
•
Takeaways:
◦
Effectively integrates text and images in a single 3.8-billion-parameter multimodal embedding model.
◦
Supports both single-vector and multi-vector embeddings, making it applicable to a variety of retrieval scenarios.
◦
Task-specific LoRA adapters allow performance to be optimized for a variety of tasks.
◦
Achieves state-of-the-art performance and is particularly strong at processing visually rich content.
◦
Introduces Jina-VDR, a new benchmark for visually rich image retrieval.
•
Limitations:
◦
The paper does not explicitly discuss its own limitations or future research directions.
◦
More detail is needed on the construction and reliability of the Jina-VDR benchmark.
◦
Further analysis is needed on the efficiency and generalization performance of LoRA adapters.
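The LoRA adapters discussed above replace a full weight update with a trainable low-rank factorization added to a frozen pretrained weight: the effective weight is W + (α/r)·B·A. A minimal NumPy sketch follows; the shapes, the scaling convention, and the toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass through a frozen weight W plus a LoRA update.

    x: (batch, d_in) input activations.
    W: (d_out, d_in) frozen pretrained weight.
    A: (r, d_in) trainable down-projection, r << min(d_in, d_out).
    B: (d_out, r) trainable up-projection, initialized to zero.
    Effective weight is W + (alpha / r) * B @ A.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
x = rng.normal(size=(4, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))   # zero init: the adapter starts as a no-op
y = lora_forward(x, W, A, B)
```

Because only A and B are trained, each task-specific adapter adds just 2·r·d parameters per layer, which is why one base model can carry several retrieval-task adapters cheaply.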