This paper highlights that current benchmarks for evaluating large language models (LLMs) focus heavily on standardized writing styles and fail to adequately reflect the diversity of human communication patterns. To test the hypothesis that LLMs may be vulnerable to non-standard input, we leverage persona-based LLM prompting to mimic diverse writing styles and analyze how variations in the writing style and format of semantically identical prompts affect LLM performance. Our results demonstrate that specific writing styles consistently lead to higher or lower performance across diverse LLMs and tasks, regardless of model type, size, or recency. This study presents a scalable approach for extending existing benchmarks, enhancing the external validity of LLM performance evaluations with respect to linguistic variation.
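The persona-based rewriting step can be illustrated with a minimal sketch; this is not the paper's actual pipeline, and the `call_llm` helper, persona list, and exact-match scoring are assumptions introduced only for illustration. The idea is to restyle a semantically fixed benchmark question for each persona, query the target model with each variant, and compare the resulting answers.

```python
# Minimal sketch (not the paper's actual pipeline) of persona-based prompt
# rewriting: restyle a semantically fixed question per persona, then score the
# target model's answer on each variant. `call_llm` is a hypothetical helper.

PERSONAS = [
    "a formal academic writer",
    "a casual texter who uses abbreviations and little punctuation",
    "a non-native speaker using simple vocabulary",
]


def call_llm(prompt: str) -> str:
    """Hypothetical single-prompt model call; replace with a real API client."""
    raise NotImplementedError("Plug in an LLM client here.")


def rewrite_with_persona(question: str, persona: str) -> str:
    """Ask an LLM to restyle a benchmark question without changing its meaning."""
    instruction = (
        f"Rewrite the following question in the writing style of {persona}. "
        "Keep the semantic content identical; change only style and format.\n\n"
        f"Question: {question}"
    )
    return call_llm(instruction)


def evaluate_style_variants(question: str, gold_answer: str) -> dict[str, bool]:
    """Query the target model with each styled variant and record exact-match hits."""
    hits = {}
    for persona in PERSONAS:
        variant = rewrite_with_persona(question, persona)
        answer = call_llm(variant)
        hits[persona] = answer.strip().lower() == gold_answer.strip().lower()
    return hits
```

Aggregating such per-persona hit rates over a full benchmark is what would reveal whether particular writing styles consistently raise or lower a model's measured performance.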