CameraBench is a large-scale dataset and benchmark designed to evaluate and improve camera motion understanding. It consists of approximately 3,000 diverse internet videos, annotated by experts through a rigorous, multi-stage quality control process. In collaboration with cinematographers, we propose a taxonomy of camera motion primitives and find that some primitives, such as "tracking," can only be identified with reference to scene content, such as moving subjects. A large-scale human study quantifies annotation performance, demonstrating that domain expertise and tutorial-based training substantially improve accuracy. For example, novice annotators may confuse zooming in (an intrinsic parameter change) with moving forward (an extrinsic parameter change), but training enables them to distinguish the two. Using CameraBench to evaluate Structure-from-Motion (SfM) and Video-Language Models (VLMs), we find that SfM models struggle to capture semantic primitives that depend on scene content, while VLMs struggle to capture geometric primitives that require accurate trajectory estimation. We then fine-tune a generative VLM on CameraBench to achieve the best of both worlds, demonstrating applications including motion-augmented captioning, video question answering, and video-to-text search. With our taxonomy, benchmark, and tutorials, we anticipate future efforts toward the ultimate goal of understanding camera motion in all videos.
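To make the intrinsic/extrinsic distinction concrete, consider the standard pinhole projection model (a textbook sketch; the notation below is illustrative and not taken from CameraBench itself):
\[
P = K\,[\,R \mid t\,], \qquad
K = \begin{pmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{pmatrix}.
\]
Zooming in changes the intrinsics \(K\) (the focal length \(f\) increases while the pose \((R, t)\) stays fixed), whereas moving forward changes the extrinsics (the translation \(t\) shifts along the optical axis while \(K\) stays fixed); the two can look similar in 2D yet are geometrically distinct.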