This paper introduces the NordDRG-AI-Benchmark, the first publicly available benchmark for evaluating how well large language models (LLMs) reason about diagnosis-related groups (DRGs), a crucial component of hospital funding. Given that trillions of dollars of healthcare spending in OECD countries are channelled through DRG systems, transparency and auditability are essential. The NordDRG-AI-Benchmark comprises a machine-readable NordDRG definition table, an expert manual, and a change-log template, and provides two task suites of 13 tasks each: a logic benchmark and a grouper benchmark. The logic benchmark covers code lookups, cross-table reasoning, grouping functions, multilingual terminology, and CC/MCC validation, while the grouper benchmark requires exact emulation of the DRG grouper. In experiments, GPT-5 Thinking and Opus 4.1 scored highly on the logic benchmark, but even GPT-5 Thinking failed to emulate the grouper perfectly. The benchmark thus offers an objective way to evaluate LLM performance in the domain of hospital financing.
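To make the task structure concrete, the following is a minimal sketch of how a code-lookup item in the logic benchmark might be scored; the table rows, DRG codes, and exact-match scoring rule here are illustrative assumptions, not actual NordDRG content or the paper's evaluation harness.

```python
# Hypothetical fragment of a machine-readable DRG definition table.
# Codes and labels are invented for illustration only.
definition_table = {
    "A01A": "Infectious disease, complicated",
    "A01B": "Infectious disease, uncomplicated",
}

def score_lookup(model_answer: str, drg_code: str) -> bool:
    """Exact-match scoring: the model's answer must reproduce the
    definition-table entry for the queried DRG code verbatim."""
    return model_answer.strip() == definition_table[drg_code]

# A correct answer passes; a paraphrase or wrong label fails.
print(score_lookup("Infectious disease, complicated", "A01A"))   # True
print(score_lookup("Complicated infectious disease", "A01A"))    # False
```

Exact-match scoring of this kind is deliberately strict: for the grouper benchmark in particular, any deviation from the reference grouping counts as a failure, which is why perfect emulation is hard even for strong models.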