DM-Bench is the first benchmark designed to evaluate large language models (LLMs) on daily-life decision-making tasks faced by people with diabetes. It provides a comprehensive evaluation framework for prototyping patient-centered AI solutions in diabetes care, glycemic management, and metabolic health. Covering seven task categories, it generates 360,600 personalized questions from one month of time-series data (blood glucose readings from continuous glucose monitoring (CGM) and behavioral logs such as meal and activity patterns) collected from 15,000 individuals across three populations: Type 1 diabetes, Type 2 diabetes, and prediabetes/general health and wellness. We evaluate eight state-of-the-art LLMs and assess model performance on five metrics: accuracy, evidence base, safety, clarity, and feasibility.