This paper addresses the evaluation and monitoring of agentic artificial intelligence (AI): multi-agent systems that combine large language models, external tools, and autonomous planning. Previous studies presented a five-axis framework and preliminary indicators (e.g., target bias and damage reduction); building on them, we provide an algorithmic implementation and empirical evidence. Specifically, we propose the Adaptive Multidimensional Monitoring (AMDM) algorithm, which normalizes heterogeneous indicators, applies exponentially weighted moving average (EWMA) thresholds to each axis, and performs joint anomaly detection using the Mahalanobis distance. We evaluate the algorithm in simulations and field experiments, in which AMDM reduces both anomaly detection latency and the false positive rate. To support reproducibility, we release the accompanying code, data, and a reproducibility checklist.
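The three steps named above (normalization, EWMA thresholds, and joint Mahalanobis detection) can be illustrated with a minimal sketch. This is not the released AMDM implementation. The function name `amdm_sketch`, the warm-up window, the plain averaging of z-scores within an axis, the asymptotic EWMA control limit, and all default values (`lam`, `k`, `joint_threshold`) are assumptions chosen for illustration only.

```python
import numpy as np


def amdm_sketch(X, axis_of, warmup=50, lam=0.2, k=3.0, joint_threshold=15.0):
    """Illustrative monitoring pass over a (T, M) array of raw metric readings.

    axis_of[j] is the evaluation axis (0..A-1) that metric j belongs to;
    every axis is assumed to own at least one metric.
    Returns (axis_alarm, joint_alarm) with shapes (T, A) and (T,).
    All parameter names and defaults are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    axis_of = np.asarray(axis_of)
    n_axes = int(axis_of.max()) + 1

    # 1) Normalize heterogeneous metrics to z-scores, using a warm-up window
    #    that is assumed to contain only normal behaviour.
    mu = X[:warmup].mean(axis=0)
    sigma = X[:warmup].std(axis=0, ddof=1) + 1e-9
    Z = (X - mu) / sigma

    # 2) Collapse metrics into one score per axis (plain mean of z-scores).
    S = np.stack(
        [Z[:, axis_of == a].mean(axis=1) for a in range(n_axes)], axis=1
    )

    # 3) Per-axis EWMA with a k-sigma control limit based on the
    #    asymptotic EWMA variance, sigma^2 * lam / (2 - lam).
    s_mu = S[:warmup].mean(axis=0)
    s_sd = S[:warmup].std(axis=0, ddof=1) + 1e-9
    limit = k * s_sd * np.sqrt(lam / (2.0 - lam))
    ewma = np.zeros_like(S)
    prev = np.zeros(n_axes)
    for t in range(S.shape[0]):
        prev = lam * (S[t] - s_mu) + (1.0 - lam) * prev
        ewma[t] = prev
    axis_alarm = np.abs(ewma) > limit

    # 4) Joint detection: squared Mahalanobis distance of the axis-score
    #    vector from the warm-up mean, under the warm-up covariance
    #    (a small ridge term keeps the matrix invertible).
    cov = np.cov(S[:warmup], rowvar=False) + 1e-6 * np.eye(n_axes)
    cov_inv = np.linalg.inv(cov)
    D = S - s_mu
    d2 = np.einsum("ta,ab,tb->t", D, cov_inv, D)
    joint_alarm = d2 > joint_threshold

    return axis_alarm, joint_alarm
```

For example, calling `amdm_sketch(X, [0, 0, 1, 1, 2, 2, 3, 3, 4, 4])` on a `(T, 10)` array treats the ten columns as two indicators for each of five axes. The per-axis alarms respond to sustained drift on a single axis, while the joint alarm is meant to catch combinations of axis scores that are unusual together even when no single axis crosses its own limit.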