Gemini 1.5 Pro now supports a 2-million-token context window (waitlist currently open). The official blog mentions "a series of quality improvements across key use cases such as translation, coding, and reasoning," but no benchmarks were released.
A fourth model, Gemini 1.5 Flash, joins the previous three. It is described as "optimized for fast, frequently-needed AI tasks" and offers a 1-million-token context at a slightly lower price point than GPT-3.5, though no specific speed figures were revealed. The Gemini lineup released so far includes:
Gemini Live: "Lets you have in-depth, natural two-way conversations by voice." This leads directly into Project Astra, a real-time video-understanding personal-assistant chatbot shown in a 2-minute demo.
Gemma 2: previously available in 2B and 7B sizes, the family now extends up to 27B. The 27B model is still in training and reportedly comes close to Llama-3-70B's performance at well under half the size (it fits on a single TPU). It, too, is planned for free release to run locally.
Imagen 3: Google's image-generation model, the next generation after the previous Imagen, with stronger prompt comprehension and interpretation that makes it easier to use.
They also announced AI integration across Google products, including Workspace, Gmail, Docs, Sheets, Photos, AI Overviews in Search, multi-step reasoning in Search, Circle to Search on Android, and Lens.
CNET has posted a 12-minute summary. If you're interested, please check out the video below or the summarized details on Release AI.
At Google I/O, Gemini 1.5 Pro stood out for faster processing and improved MMLU scores, and its much longer context window compared to older models should significantly enhance the user experience. Meanwhile, despite being lightweight, Gemini 1.5 Flash keeps the 1M-token context and shows a remarkable boost in text-generation speed. This demonstrates the strength of Google's infrastructure integration, enabling fast and effective responses.
Innovative features like Project Astra's real-time audio/video processing and response generation make real-time conversation possible even on prototype hardware such as Google Glass, a noteworthy step forward. In addition, the rapid evolution of open models like Gemma improves access to AI research and development through collaboration with the developer community. These strategies show Google's ongoing effort to deliver better services to both users and developers.
The introduction of context caching could eliminate repeated uploads of long contexts, cutting costs and greatly improving convenience. Upgraded interfaces that accept a wider range of inputs will also diversify and enrich the user experience. These advancements show that the technologies introduced at Google I/O are having a significant impact on both the user experience and the developer ecosystem.
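The idea behind context caching can be sketched as a content-addressed store: the long context is uploaded and processed once, and later requests reference it by key instead of resending it. This is a minimal conceptual illustration only; the `ContextCache` class and its methods are hypothetical, not the actual Gemini API.

```python
import hashlib

class ContextCache:
    """Stores long contexts by content hash so repeat queries skip re-upload."""

    def __init__(self):
        self._store = {}  # maps content hash -> cached context

    def get_or_add(self, context: str) -> str:
        # Hash the context so identical content always maps to the same key.
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Pay the upload/processing cost only on the first request.
            self._store[key] = context
        return key

    def lookup(self, key: str) -> str:
        # Later requests retrieve the context by key instead of resending it.
        return self._store[key]

cache = ContextCache()
long_doc = "report text " * 10000  # stand-in for a very long document
key1 = cache.get_or_add(long_doc)
key2 = cache.get_or_add(long_doc)  # second call reuses the cached entry
assert key1 == key2
```

In a hosted API the savings come from the provider keeping the tokenized context server-side, so each follow-up question is billed and transmitted without the full document attached.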
Still, I couldn't help feeling a bit let down. Despite the lengthy, extensive presentation, nothing truly made me go "Wow." There was no innovation in the form of brand-new products, though I did sense some business innovation. The presentation showed solid improvements in cost, operation, and usability; perhaps OpenAI and Apple simply got the spotlight for those kinds of highlights first.