Gemini 1.5 Pro now supports a 2-million-token context window (waitlist currently open). The official blog mentions "a series of quality improvements across key use cases such as translation, coding, and reasoning," but no benchmarks were released.
A fourth model, Gemini 1.5 Flash, joins the previous three. It is described as "optimized for fast, frequently-needed AI tasks" and offers a 1-million-token context at a slightly lower price point than GPT-3.5, though no specific speed figures were revealed. The Gemini lineup released so far includes:
Gemini Live: "Lets you have in-depth, natural two-way conversations by voice." This leads directly into Project Astra, a real-time video-understanding personal-assistant chatbot shown in a 2-minute demo.
Gemma 2: previously available in 2B and 7B sizes, the family now extends up to 27B. The 27B model is still in training and reportedly comes close to Llama-3-70B's performance at well under half the size (it fits on a single TPU). It, too, is planned for free release to run locally.
Imagen 3: Google's image-generation model, the next generation after the previous Imagen, with stronger prompt comprehension and interpretation that makes it easier to use.
They also announced AI integration across Google products, including Workspace, Gmail, Docs, Sheets, Photos, AI Overviews in Search, multi-step reasoning in Search, Circle to Search on Android, and Lens.
CNET has posted a 12-minute summary. If you're interested, please check out the video below or the summarized details on Release AI.
At Google I/O, Gemini 1.5 Pro stood out for faster processing and improved MMLU scores, and its much longer context window compared to older models should significantly enhance the user experience. Meanwhile, despite being lightweight, Gemini 1.5 Flash keeps the 1M-token context and shows a remarkable boost in text-generation speed. This demonstrates the strength of Google's infrastructure integration, enabling fast and effective responses.
Innovative features like Project Astra's real-time audio/video processing and response generation make real-time conversation possible even on prototype hardware such as Google Glass, a noteworthy step forward. In addition, the rapid evolution of open models like Gemma improves access to AI research and development through collaboration with the developer community. These strategies show Google's ongoing effort to deliver better services to both users and developers.
The introduction of context caching could eliminate repeated uploads of long contexts, cutting costs and greatly improving convenience. Upgraded interfaces that accept a wider range of inputs will also diversify and enrich the user experience. These advancements show that the technologies introduced at Google I/O are having a significant impact on both the user experience and the developer ecosystem.
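The idea behind context caching can be sketched as a content-addressed store: the long context is uploaded and processed once, and later requests reference it by key instead of resending it. This is a minimal conceptual illustration only; the `ContextCache` class and its methods are hypothetical, not the actual Gemini API.

```python
import hashlib

class ContextCache:
    """Stores long contexts by content hash so repeat queries skip re-upload."""

    def __init__(self):
        self._store = {}  # maps content hash -> cached context

    def get_or_add(self, context: str) -> str:
        # Hash the context so identical content always maps to the same key.
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Pay the upload/processing cost only on the first request.
            self._store[key] = context
        return key

    def lookup(self, key: str) -> str:
        # Later requests retrieve the context by key instead of resending it.
        return self._store[key]

cache = ContextCache()
long_doc = "report text " * 10000  # stand-in for a very long document
key1 = cache.get_or_add(long_doc)
key2 = cache.get_or_add(long_doc)  # second call reuses the cached entry
assert key1 == key2
```

In a hosted API the savings come from the provider keeping the tokenized context server-side, so each follow-up question is billed and transmitted without the full document attached.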
Still, I couldn't help feeling a bit let down. Despite the lengthy, extensive presentation, nothing truly made me go "Wow." There was no innovation in the form of brand-new products, though I did sense some business innovation. The presentation showed solid improvements in cost, operation, and usability; perhaps OpenAI and Apple simply got the spotlight for those kinds of highlights first.