This paper systematically analyzes personality traits in large language models (LLMs), assessing how trait expression evolves across training stages, the predictive validity of self-reported traits, and the impact of interventions such as persona infusion. Our findings show that instruction tuning (e.g., RLHF) stabilizes trait expression and strengthens inter-trait correlations in ways that resemble human data; however, self-reported traits do not reliably predict behavior, and the correlations observed in behavior often diverge from human patterns. Persona infusion reliably steers self-reports in the intended direction but has little or inconsistent effect on actual behavior. By distinguishing superficial trait expression from behavioral consistency, we challenge prevailing assumptions about personality in LLMs and highlight the need for deeper evaluation of alignment and interpretability.