PixelHumor is a dataset of 2,800 annotated multi-panel comics designed to evaluate how well large multimodal models (LMMs) understand visual humor and narrative sequencing. In our experiments, the best-performing LMMs achieve only 61% accuracy on panel ordering, far below human performance. This result highlights critical limitations in current models' ability to integrate visual and textual cues into a coherent narrative and to comprehend humor. PixelHumor thus provides a rigorous framework for evaluating multimodal context understanding and narrative inference, with the aim of advancing LMMs that support more natural, socially aware interactions.