Google unveils Gemini, a model that outperforms GPT-4

Haebom

Dec 7, 2023

Gemini has now been applied directly to Bard. AI technology has become a key force transforming human life, and Google's Gemini is one of the latest technologies leading this change: a multimodal AI model capable of understanding and processing various types of information, such as text, images, audio, and video.

Hands-on with Gemini: Interacting with multimodal AI

Gemini is our natively multimodal AI model capable of reasoning across text, images, audio, video and code. This video highlights some of our favorite interactions with Gemini. Learn more and try the model: https://deepmind.google/gemini Explore our prompting approaches here: https://goo.gle/how-its-made-gemini For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.

Chapters: 0:00 Intro, 0:19 Multimodal Dialogue, 1:32 Multilinguality, 2:04 Game Creation, 2:31 Visual Puzzles, 3:17 Making Connections, 3:39 Image & Text Generation, 4:06 Logic & Spatial Reasoning, 4:55 Translating Visuals, 5:27 Cultural Understanding

In its tech report, Google claims that Gemini outperforms GPT-4, widely regarded as the most powerful foundation model, and presents experimental results showing that it not only excels at text generation but also leads in multimodal understanding and processing. Rather than a single model, Google introduced three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. It also disclosed Nano's parameter counts publicly: 1.8B for Nano-1 and 3.25B for Nano-2. That really does look like a true sLM (small language model).
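To put those Nano parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different precisions. The parameter counts are the ones quoted above; the bit widths (16, 8, 4) and the helper names are illustrative assumptions, and the estimate ignores activations, caches, and runtime overhead.

```python
# Parameter counts as quoted in the post (Nano-1: 1.8B, Nano-2: 3.25B).
NANO_PARAMS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def footprint_table(bit_widths=(16, 8, 4)):
    """Map each model name to {bits_per_weight: footprint in GB}."""
    return {
        name: {bits: round(weight_footprint_gb(n, bits), 3) for bits in bit_widths}
        for name, n in NANO_PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{bits}-bit: {gb} GB" for bits, gb in row.items())
        print(f"{name}: {cells}")
```

Under these assumptions, 4-bit weights would come to roughly 0.9 GB for Nano-1 and about 1.6 GB for Nano-2, which is the range where running on a phone becomes plausible.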

Gemini_tech_report_Resized.pdf (1.66 MB)

Gemini: Explaining reasoning in math and physics

Introducing Gemini, Google’s newest and most capable AI model. Gemini was trained to recognize and understand text, images, audio, and more at the same time, so it better understands nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects like math and physics. Join Google Interaction Designer Sam Cheung as she uses Gemini’s multimodal capabilities and sophisticated reasoning to check a handwritten homework sheet. Watch Gemini create customised explanations and practice questions to help test and expand her knowledge of physics. Check out more Gemini demos: https://goo.gle/4164rNO Find out more about Gemini: https://deepmind.google/gemini Read the blog post: https://goo.gle/3uRyug7

Confidence in its performance

Text processing abilities

• On the MMLU benchmark, which spans 57 subjects, Gemini Ultra scored 90.0%, surpassing human expert level.

• On the same test, OpenAI's GPT-4 scored 86.4%, somewhat lower than Gemini Ultra. On Big-Bench Hard, which tests complex multi-step reasoning, Gemini Ultra also edged out GPT-4, 83.6% to 83.1%.

Multimodal processing abilities

• In image understanding (VQAv2), Gemini Ultra scored 77.8%, slightly higher than GPT-4V's 77.2%.

• In document understanding (DocVQA), Gemini Ultra scored 90.9%, outperforming GPT-4V's 88.4%.
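To see how wide or narrow these leads actually are, the figures quoted above can be laid side by side. This is only a small sketch over the four score pairs listed in this post; the variable and function names are made up for illustration, and the numbers are not re-measured.

```python
# (benchmark, Gemini Ultra %, GPT-4 / GPT-4V %) as quoted in this post.
SCORES = [
    ("MMLU", 90.0, 86.4),
    ("Big-Bench Hard", 83.6, 83.1),
    ("Image understanding", 77.8, 77.2),
    ("Document understanding", 90.9, 88.4),
]


def margins(scores):
    """Return (benchmark, Gemini lead in percentage points) for each row."""
    return [(name, round(gemini - gpt4, 1)) for name, gemini, gpt4 in scores]


if __name__ == "__main__":
    for name, lead in margins(SCORES):
        print(f"{name}: Gemini Ultra ahead by {lead} points")
```

The leads range from 0.5 points (Big-Bench Hard) to 3.6 points (MMLU), so several of these results are very close.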

Points of note

• Multimodal understanding: Gemini surpasses current state-of-the-art models in multimodal comprehension, and can understand and solve problems presented inside images without relying on a separate OCR system.

• Code generation: It can produce high-quality code in popular programming languages such as Python, helping developers launch apps and improve services faster and more efficiently.

Gemini: Unlocking insights in scientific literature

Introducing Gemini, Google’s newest and most capable AI model. Watch Google DeepMind Research Scientist Sebastian Nowozin and Software Engineer Taylor Applebaum use Gemini to read, understand and filter 200,000 scientific papers to extract crucial scientific information. All in a lunch break. Check out more Gemini demos: https://goo.gle/4164rNO Find out more about Gemini: https://deepmind.google/gemini Read the blog post: https://goo.gle/3uRyug7

Key features by model size

• Gemini Ultra is the largest model, delivering the most powerful performance for handling complex tasks.

  ◦ Extremely complex tasks: Gemini Ultra is engineered to handle very complex challenges and excels in this segment. It achieves cutting-edge results on several major benchmarks.

  ◦ Multimodal understanding: As a multimodal model, Gemini Ultra shows powerful performance in comprehending and reasoning over diverse data types like text, images, audio, and video.

  ◦ Scale and efficiency: Trained on large-scale TPUv4 accelerators, it is optimized for highly efficient large-scale operations.

  ◦ Cutting-edge capability: Gemini Ultra reached an impressive 90.04% accuracy on the MMLU benchmark, and also demonstrates strong performance in other domains like mathematics and coding.

• Gemini Pro is a model designed for efficient scalability across a broad array of tasks.

  ◦ Scalable for various tasks: Gemini Pro is best suited to scale over diverse tasks. Thanks to its infrastructure and training algorithms, its pre-training could be completed quickly with fewer resources than Gemini Ultra required.

  ◦ Optimized performance: It offers finely tuned performance for diverse AI tasks, making it ideal for enterprise clients and developers building and scaling AI.

  ◦ Versatility: While not as large as Gemini Ultra, Gemini Pro delivers similar performance and operates more efficiently.

• Gemini Nano is the smallest model, designed for efficient on-device operations.

  ◦ Efficiency for on-device tasks: The Nano model is built for on-device deployment, prioritizing speed and efficiency.

  ◦ Small but mighty: Despite its compact size, the Nano model shows impressive results on tasks like summarization and reading comprehension.

  ◦ Accessibility: Gemini Nano models, equipped to run on diverse platforms and devices, make advanced AI features more approachable.

Gemini AI opens a new chapter in Google's AI technology advancement. With outstanding performance across everything from text to multimodal domains and the ability to understand and process complex information efficiently, it shines a bright light on the future of AI. It is expected to bring significant value to everyone who uses AI.

Launch plan

Gemini Pro
We're bringing Gemini to billions worldwide through Google products.

• Starting today, Bard uses a fine-tuned version of Gemini Pro to offer more advanced reasoning, planning, and understanding. This marks Bard’s biggest upgrade since its launch.

• It is available in English in over 170 countries and regions, with plans to expand to other modalities, languages, and locations soon.

Gemini Nano
Gemini runs right on your smartphone

• The Pixel 8 Pro is the first smartphone designed to run Gemini Nano, which powers new features such as Summarize in the Recorder app and Smart Reply in Gboard, starting with WhatsApp.

• Support for more messaging apps is planned for next year.

More products and services

Within the next few months, Gemini will be added to more Google products and services, such as Search, Ads, Chrome, and Duet AI. It has already been trialed in Search, where it reduced latency by 40% for English searches in the U.S. while also improving quality.

Access for developers and business users

Starting December 13, 2023, developers and enterprise customers can access Gemini Pro through Google AI Studio or Google Cloud Vertex AI. Google AI Studio is a free, web-based developer tool for quickly prototyping and launching apps with an API key. Vertex AI is a fully managed AI platform that lets you customize Gemini with control over your own data, along with additional Google Cloud capabilities.
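For a feel of what calling Gemini Pro with an API key might look like, here is a minimal sketch that only builds the HTTP request and does not send it. It assumes the public REST endpoint of the Gemini API (`generativelanguage.googleapis.com`, `v1beta`, `generateContent`) and the model name `gemini-pro` as they were at launch; model names and API versions change over time, so check the current documentation. The helper names `build_generate_request` and `extract_text` are made up for this example.

```python
import json
import urllib.request

# Assumption: endpoint and model name as of the December 2023 launch.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt: str, api_key: str, model: str = "gemini-pro"):
    """Build (but do not send) a text-only generateContent request."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Pull the first text part out of a generateContent-style response."""
    return response_json["candidates"][0]["content"]["parts"][0]["text"]


# Sending the request needs a real key from Google AI Studio and network access:
#   with urllib.request.urlopen(build_generate_request("Hi", "YOUR_API_KEY")) as r:
#       print(extract_text(json.load(r)))
```

Building the request separately from sending it keeps the example runnable offline and makes the payload shape easy to inspect.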

Android developers

Android developers can now build with Gemini Nano, the most efficient model for on-device tasks, through AICore, a new system capability in Android 14.

Check out the full Gemini keynote in the video below. 2024 looks like it’s going to bring even bigger shifts.

Gemini: Google’s newest and most capable AI model

Gemini marks the next phase on our journey to making AI more helpful for everyone. Unlike other AI models, Gemini was trained to recognize, understand, and combine different types of information including text, images, audio, video, and code. Its state-of-the-art performance gives it remarkable new capabilities. And it’s built with safety and responsibility at its core. Welcome to the Gemini era. Check out more Gemini videos: https://goo.gle/47I6kTp Find out more about Gemini: https://deepmind.google/gemini Read the blog post: https://goo.gle/3uRyug7


Haebom

Dec 7, 2023

AlphaCode2_Tech_Report.pdf (650.85 KB)

Haebom

Dec 8, 2023

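Google is being criticized on two counts: although the end of the video notes that it was staged, it can still be misleading, and the comparison conditions are not exactly the same as those used for GPT-4.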

Google's Gemini isn't the generative AI model we expected | TechCrunch

Google's Gemini model is finally here. But it's probably not what you were expecting.

techcrunch.com
