This paper presents the results of an exploratory study of multi-agent systems that leverage the reasoning capabilities of modern large language models (LLMs) for domain-specific applications. In particular, we focus on how to combine reasoning techniques, code generation, and software execution across multiple specialized LLMs. Unlike previous studies that evaluate LLMs, reasoning techniques, and applications separately, this paper defines a clear specification for a multi-agent LLM system and introduces an agent schema language; it then presents a method for implementing and evaluating that specification through a multi-agent system architecture and prototype. We demonstrate the feasibility of the architecture and evaluation approach on test cases involving cybersecurity tasks, and report evaluation results showing successful completion of question-answering, server-security, and network-security tasks using LLMs from OpenAI and DeepSeek.
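To make the notion of an agent schema concrete, the following is a minimal, purely illustrative sketch in Python; it is not the paper's actual schema language, and all names here (AgentSpec, MultiAgentSystem, the model identifiers, and the task-type strings) are assumptions introduced for illustration. It shows one plausible shape such a schema could take: each agent is declared by name, backing model, role description, and the task types it handles, and a simple router dispatches tasks accordingly.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical illustration only: the paper's agent schema language is not
# reproduced here. This sketch assumes agents are declared by role, model
# backend, and the task types they accept.

@dataclass
class AgentSpec:
    name: str            # agent identifier, e.g. "coder" (assumed)
    model: str           # backing LLM identifier (assumed example values below)
    role: str            # natural-language role description for the system prompt
    handles: List[str]   # task types this agent accepts

@dataclass
class MultiAgentSystem:
    agents: Dict[str, AgentSpec] = field(default_factory=dict)

    def register(self, spec: AgentSpec) -> None:
        # Store the agent spec under its declared name.
        self.agents[spec.name] = spec

    def route(self, task_type: str) -> AgentSpec:
        # Dispatch a task to the first registered agent that declares
        # support for its task type.
        for spec in self.agents.values():
            if task_type in spec.handles:
                return spec
        raise LookupError(f"no agent handles task type {task_type!r}")

# Usage sketch: two specialized agents covering the paper's three task areas.
system = MultiAgentSystem()
system.register(AgentSpec(
    name="reasoner",
    model="gpt-4o",  # assumed placeholder for an OpenAI model
    role="Answers cybersecurity questions and plans tasks.",
    handles=["question_answering"],
))
system.register(AgentSpec(
    name="coder",
    model="deepseek-chat",  # assumed placeholder for a DeepSeek model
    role="Generates and executes remediation scripts.",
    handles=["server_security", "network_security"],
))
print(system.route("network_security").name)  # -> coder
```

Under this reading, combining reasoning, code generation, and execution amounts to registering specialized agents and routing each task to the appropriate one; the schema is the declarative layer that makes the system's composition explicit and testable.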