This paper studies unsupervised reinforcement learning in multi-agent environments, with a focus on task-agnostic exploration. While task-agnostic exploration has been studied extensively in single-agent environments through state entropy maximization, it remains largely unexplored in multi-agent environments. We first characterize several problem formulations and show that, although the problem is solvable in theory, it is hard to address in practice. We then present a scalable, distributed trust-region policy search algorithm for realistic environments, and demonstrate through numerical experiments that optimizing the mixture entropy objective strikes a balance between ease of estimation and exploration performance.
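As an illustrative sketch only (not the paper's algorithm), the mixture entropy objective mentioned above can be estimated nonparametrically from state-visitation samples: pool the states visited by all agents and apply a k-nearest-neighbor entropy estimator to the pooled sample. The simplified Kozachenko-Leonenko estimator below, the 2-D Gaussian visitation data, and the variable names are all hypothetical choices for the sake of the example.

```python
import numpy as np

def knn_entropy(samples, k=3):
    # Simplified Kozachenko-Leonenko k-NN entropy estimate (up to an
    # additive constant): larger k-th-neighbor distances -> higher entropy.
    n, d = samples.shape
    dists = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    dists.sort(axis=1)
    eps = np.maximum(dists[:, k], 1e-12)  # distance to k-th nearest neighbor
    return d * np.log(eps).mean() + np.log(n - 1)

rng = np.random.default_rng(0)
# Hypothetical state-visitation samples for two agents in a 2-D state space.
states_a = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
states_b = rng.normal(loc=5.0, scale=1.0, size=(200, 2))

# Per-agent entropy vs. the entropy of the mixture of both agents'
# state distributions (pool the samples, then estimate).
h_a = knn_entropy(states_a)
h_b = knn_entropy(states_b)
h_mix = knn_entropy(np.concatenate([states_a, states_b]))
```

Because the two agents cover different regions, the estimated mixture entropy exceeds either per-agent entropy; an objective of this form rewards the team for spreading its joint coverage rather than each agent exploring the same area.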