This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation settings. AI agents in such settings must adapt to diverse human operators and stakeholders, a requirement in applications ranging from inter-team coordination to civil-military interactions. We therefore systematically evaluated how personality traits and AI agent characteristics influence the outcomes of social negotiations simulated with large language models (LLMs), in two experiments run on the Sotopia simulation testbed. In Experiment 1, we used causal discovery methods to measure the impact of personality traits on price negotiations, finding that agreeableness and extraversion significantly affect trustworthiness, goal achievement, and knowledge acquisition outcomes. Sociocognitive lexical measures extracted from team communications detected subtle differences in agents' empathic communication, moral foundations, and opinion patterns, providing actionable insights for agentic AI systems that must operate reliably in high-risk operational scenarios. In Experiment 2, we evaluated human-AI job negotiations by manipulating simulated human personality traits and AI system characteristics (specifically transparency, competence, and adaptability) to demonstrate how the trustworthiness of AI agents influences mission effectiveness. These results directly support operational requirements for robust AI systems by establishing a repeatable methodology for testing the reliability of AI agents across diverse operator personalities and human-agent team dynamics. This research advances the evaluation of agentic AI workflows by moving beyond standard performance metrics to incorporate the social dynamics essential for mission success in complex operations.