This paper addresses the problem that modern BPE tokenizers split dates into meaningless fragments. To study this, we introduce the date fragment ratio, a metric that quantifies how heavily a tokenizer fragments dates, and release DateAugBench, a dataset covering three temporal inference tasks: context-based date interpretation, format-invariant puzzles, and date operations across historical, contemporary, and future timelines. Using layer-by-layer probing and causal attention-hop analysis, we then examine how large language models (LLMs) stitch date fragments together to perform temporal inference. We show that excessive date fragmentation degrades accuracy, especially on rare dates (historical and future dates). Finally, we demonstrate that the mechanism by which LLMs assemble date fragments differs from human interpretation (year → month → day). The dataset and code are publicly available.
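To make the fragmentation idea concrete, the following sketch shows one plausible formalization of a date fragment ratio: the number of subword tokens a tokenizer emits for a date string, divided by the number of semantic date fields (year, month, day). This definition, the function name, and the token pieces are illustrative assumptions, not the paper's exact formulation; a ratio of 1.0 would mean each field maps to one token, and higher values indicate fragmentation.

```python
# Hypothetical sketch (assumed definition, not the paper's exact metric):
# ratio = (subword tokens produced for a date string) / (semantic date fields).

def date_fragment_ratio(tokens: list[str], n_fields: int = 3) -> float:
    """tokens: the subword pieces a tokenizer produced for one date string;
    n_fields: number of semantic fields in the date (year, month, day)."""
    return len(tokens) / n_fields

# Toy example: a BPE vocabulary with no dedicated date tokens might split
# "1997-05-12" into pieces like these (illustrative, not from a real model):
pieces = ["19", "97", "-", "05", "-", "12"]
print(date_fragment_ratio(pieces))  # 6 tokens over 3 fields -> 2.0
```

Under this reading, rare dates (historical or future) would tend to produce higher ratios, since their digit sequences appear less often in tokenizer training data and so lack merged vocabulary entries.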