Best multimodal models still can't crack 50 percent on basic visual entity recognitionA new benchmark called WorldVQA tests whether multimodal AI models actually recognize what they see or just make it up. Even the best performer, Gemini 3 Pro, tops out at 47.4 percent when asked for specific details like exact species or product names instead of generic labels. Worse, the models are convinced they're right even when they're wrong.