EMMA is an end-to-end multimodal model for autonomous driving built on a multimodal large language model such as Gemini. EMMA maps raw camera sensor data directly to various driving-related outputs, such as planner trajectories, detected objects, and road graph elements. It makes the most of the world knowledge in the pre-trained large language model by expressing both non-sensor inputs, such as navigation instructions and vehicle status, and outputs, such as trajectories and 3D positions, as natural language text. This allows EMMA to jointly handle diverse driving tasks within a unified language space and to generate the output for each task via task-specific prompts. Its effectiveness is demonstrated experimentally by competitive results in motion planning on nuScenes and the Waymo Open Motion Dataset (WOMD), and in camera-based 3D object detection on the Waymo Open Dataset (WOD). Jointly training EMMA on planner trajectories, object detection, and road graph tasks improves performance across all three domains, highlighting EMMA's potential as a generalizable model for autonomous driving applications.
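
To make the inputs-and-outputs-as-text idea concrete, below is a minimal, hypothetical sketch of how non-sensor inputs and a task-specific prompt could be serialized as plain text, and how a textual trajectory answer might be parsed back into numbers. The names (`DrivingContext`, `build_planning_prompt`, `parse_waypoints`) and the exact prompt format are illustrative assumptions, not EMMA's actual interface; camera images would be fed to the multimodal model alongside the text.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DrivingContext:
    """Hypothetical container for non-sensor inputs (field names are illustrative)."""
    navigation_instruction: str                  # e.g. "turn right at the next intersection"
    ego_velocity_mps: Tuple[float, float]        # (vx, vy) in the ego frame, m/s
    ego_history_xy: List[Tuple[float, float]]    # past waypoints in the ego frame, metres

def build_planning_prompt(ctx: DrivingContext, horizon_s: float = 5.0) -> str:
    """Render non-sensor inputs plus a task-specific request as natural language text."""
    history = "; ".join(f"({x:.2f}, {y:.2f})" for x, y in ctx.ego_history_xy)
    vx, vy = ctx.ego_velocity_mps
    return (
        "Task: predict the future driving trajectory.\n"
        f"Navigation instruction: {ctx.navigation_instruction}\n"
        f"Ego velocity (m/s): ({vx:.2f}, {vy:.2f})\n"
        f"Past waypoints (m): {history}\n"
        f"Output the next {horizon_s:.0f} seconds of waypoints as (x, y) pairs."
    )

def parse_waypoints(model_text: str) -> List[Tuple[float, float]]:
    """Parse a textual trajectory like '(1.2, 0.0); (2.5, 0.1); ...' back into floats."""
    pairs = re.findall(r"\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)", model_text)
    return [(float(x), float(y)) for x, y in pairs]

if __name__ == "__main__":
    ctx = DrivingContext(
        navigation_instruction="continue straight, then turn right in 100 m",
        ego_velocity_mps=(8.3, 0.1),
        ego_history_xy=[(-8.1, 0.0), (-4.0, 0.0), (0.0, 0.0)],
    )
    print(build_planning_prompt(ctx))
    # A model reply in text form is decoded back into numeric waypoints:
    print(parse_waypoints("(4.1, 0.0); (8.3, 0.1); (12.4, 0.3)"))
```

Swapping the task line and the expected output format in the prompt is, under these assumptions, all that is needed to reuse the same text interface for object detection or road graph estimation, which is what allows a single model to handle all three tasks jointly.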