Existing conversational AI benchmarks simulate single-control environments in which only the AI agent uses tools to act on the environment, while the user remains a passive information provider. This differs from real-world settings such as technical support, where the user must actively modify the state of the (shared) environment. To close this gap, this paper presents $\tau^2$-bench, which models a dual-control telecom domain as a Dec-POMDP in which both the agent and the user act on a shared, dynamic environment through tools. $\tau^2$-bench tests both agent coordination and communication, and enables fine-grained analysis of agent performance through a configurable task generator, a reliable user simulator tightly coupled to the environment, and multiple ablation experiments that separate reasoning errors from communication/coordination errors. Experimental results show that agent performance degrades significantly when moving from the no-user (solo) setting to dual-control, highlighting the challenge of guiding a user's actions. $\tau^2$-bench thus provides a controlled testing environment for agents that must both reason effectively and guide the user's actions.