
Google unveils Gemini, a model that outperforms GPT-4

Haebom
Gemini has now been applied directly to Bard. AI technology has reached a turning point that is transforming everyday life, and Google's Gemini is one of the latest technologies leading this change: a multimodal AI model capable of understanding and processing various types of information, such as text, images, audio, and video.
In its tech report, Google claims that Gemini outperforms GPT-4, the most powerful foundation model to date, and presents experimental results showing that it not only excels at text generation but also leads in multimodal understanding and processing. Rather than a single model, Google introduced three models by size: Gemini Ultra, Gemini Pro, and Gemini Nano. It has also publicly disclosed Nano's parameter counts: Nano-1 at 1.8B and Nano-2 at 3.25B. That makes Nano look like a true sLM (small language model).
Attachment: Gemini_tech_report_Resized.pdf (1.66 MB)
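
To put the Nano parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need. The bit widths are illustrative assumptions, not deployment settings stated in this article, and the estimate ignores activations, caches, and runtime overhead.

```python
# Parameter counts quoted above for the two Nano variants.
NANO_PARAMS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in NANO_PARAMS.items():
        # Assumed precisions for illustration only.
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.2f} GB")
```

Under these assumptions, Nano-1 would need roughly 0.9 GB at 4 bits per weight and Nano-2 about 6.5 GB at 16 bits, which gives a sense of why low precision matters for on-device models of this size.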

Confidence in its performance

Text processing abilities

On the MMLU benchmark, which spans 57 subjects, Gemini Ultra scored 90.0%, surpassing human expert level.
On the same test, OpenAI's GPT-4 scored 86.4%, somewhat lower than Gemini Ultra. Even on Big-Bench Hard, which tests challenging multi-step reasoning, Gemini Ultra edged out GPT-4, 83.6% to 83.1%.

Multimodal processing abilities

In image understanding, Gemini Ultra scored 77.8%, slightly higher than GPT-4V's 77.2%.
In document understanding, Gemini Ultra achieved 90.9%, outperforming GPT-4V's 88.4%.
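
The four head-to-head figures quoted in this section can be lined up to see how large each lead actually is. This is only a small arithmetic sketch using the numbers as reported in this article; the dictionary labels are my own shorthand, not official benchmark names.

```python
# Scores in percent as quoted in this article: (Gemini Ultra, GPT-4 or GPT-4V).
SCORES = {
    "MMLU": (90.0, 86.4),
    "Big-Bench Hard": (83.6, 83.1),
    "Image understanding": (77.8, 77.2),
    "Document understanding": (90.9, 88.4),
}


def margins(scores):
    """Return Gemini Ultra's lead in percentage points for each benchmark."""
    return {name: round(gemini - gpt, 1) for name, (gemini, gpt) in scores.items()}


for name, lead in margins(SCORES).items():
    print(f"{name}: +{lead} points")
```

Two of the four leads come out under one percentage point, so the headline claim rests mostly on the MMLU and document understanding results.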

Points of note

Multimodal Understanding: Gemini AI surpasses current state-of-the-art models in multimodal comprehension, demonstrating the ability to understand and solve problems within images without needing help from OCR systems.
Code generation: It can produce high-quality code in popular programming languages like Python, helping developers launch apps and improve services faster and more efficiently.

Key features by model size

Gemini Ultra is the largest model, delivering the most powerful performance for handling complex tasks.
Extremely complex tasks: Gemini Ultra is engineered to handle very complex challenges and excels in this segment. It achieves cutting-edge results on several major benchmarks.
Multimodal understanding: As a multimodal model, Gemini Ultra shows powerful performance in comprehending and reasoning over diverse data types like text, images, audio, and video.
Scale and efficiency: Trained on large-scale TPUv4 accelerators, it is optimized for highly efficient large-scale operations.
Cutting-edge capability: Gemini Ultra reached an impressive 90.04% accuracy on the MMLU benchmark, and also demonstrates strong performance in other domains like mathematics and coding.
Gemini Pro is a model designed for efficient scalability across a broad array of tasks.
Scalable for various tasks: Gemini Pro is best suited to scale over diverse tasks. Thanks to its infrastructure and training algorithms, it allows rapid pre-training with fewer resources than Gemini Ultra.
Optimized performance: It offers finely-tuned performance for diverse AI tasks, making it ideal for enterprise clients and developers building and scaling AI.
Versatility: While not as large as Gemini Ultra, Gemini Pro delivers similar performance and operates more efficiently.
Gemini Nano is the smallest model, designed for efficient on-device operations.
Efficiency for on-device tasks: The Nano model is built for on-device deployment, prioritizing speed and efficiency.
Small but mighty: Despite its compact size, the Nano model shows impressive results on tasks like summarization and reading comprehension.
Accessibility: Gemini Nano models, equipped to run on diverse platforms and devices, make advanced AI features more approachable.
Gemini AI opens a new chapter in Google's AI technology advancement. With outstanding performance across everything from text to multimodal domains and the ability to understand and process complex information efficiently, it shines a bright light on the future of AI. It is expected to bring significant value to everyone who uses AI.

Launch plan

Gemini Pro
We're bringing Gemini to billions worldwide through Google products.

Starting today, Bard uses a fine-tuned version of Gemini Pro to offer even more advanced reasoning, planning, and understanding. This marks Bard's biggest upgrade since its launch.
It is available in English in more than 170 countries and regions, with plans to expand soon to other modalities, languages, and locations.

Gemini Nano
Gemini runs right on your smartphone

The Pixel 8 Pro is the first smartphone designed to run Gemini Nano. It powers new features such as Summarize in the Recorder app and is rolling out in Smart Reply in Gboard, starting with WhatsApp.
We're planning to add support for even more messaging apps next year.

More products and services

Within the next few months, Gemini will be added to more Google products and services, including Search, Ads, Chrome, and Duet AI. It has already been trialed in Search, where it reduced latency by 40% and improved quality for English searches in the U.S.

Access for developers and business users

Starting December 13, 2023, developers and enterprise customers will be able to access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI. Google AI Studio is a free, web-based development tool for quickly prototyping and launching apps with an API key. Vertex AI is a fully managed AI platform that lets you customize Gemini with full data control and additional Google Cloud capabilities.
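
For a feel of what calling Gemini Pro with an API key might look like, here is a minimal sketch that uses only the Python standard library. The endpoint URL, the `gemini-pro` model name, and the request and response shapes are assumptions based on the public REST documentation around launch and may have changed since. The `GEMINI_API_KEY` environment variable and the helper names are hypothetical.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name from the public API docs at launch; both may have changed.
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{API_BASE}/{MODEL}:generateContent?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first candidate (assumed response shape)."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # hypothetical variable name
    if key:
        print(generate("Summarize the Gemini launch in one sentence.", key))
    else:
        print("Set GEMINI_API_KEY to send a live request.")
```

Building the request separately from sending it makes it possible to inspect the payload without a key or network access. In practice, the official client libraries are likely the more convenient route.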

Android developers

Android developers can now build with Gemini Nano, the most efficient model for on-device tasks, through a new system capability called AICore in Android 14.
Check out the full Gemini keynote in the video below. 2024 looks like it’s going to bring even bigger shifts.
Attachment: AlphaCode2_Tech_Report.pdf (650.85 KB)