This paper evaluates whether a structured multi-agent system (MAS) can manage requirements elicitation, functional decomposition, and simulator code generation more effectively than a simple two-agent system (2AS). The goal is to overcome the limitations of existing large language model (LLM) workflows in early-stage engineering design, where complex, iterative reasoning is required. Targeting the design of a solar water treatment system, we introduce the design state graph (DSG). The DSG is a JSON-serializable representation whose graph nodes bundle requirements, physical implementations, and Python-based physical models. Using it, we compare a nine-role MAS with a 2AS built on a generator-reflector loop. We conduct 60 experiments in total using Llama 3.3 70B and a reasoning-distilled DeepSeek R1 70B model. We measure JSON validity, requirements fulfillment rate, implementation presence, code compatibility, workflow completion rate, execution time, and graph size. Both systems maintain perfect JSON validity and implementation tagging, but the requirements fulfillment rate is low, below 20%. Code compatibility reached 100% in one 2AS configuration, whereas it remained below 50% for the MAS. The DeepSeek R1 70B-based MAS produced finer-grained DSGs (5-6 nodes on average), and the reasoning-distilled model improved the workflow completion rate. However, low requirements fulfillment and limited coding fidelity persisted.
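To make the DSG idea more concrete, the following is a minimal Python sketch under assumptions of our own. Each node pairs a requirement with an optional physical implementation and optional Python model source, and decomposition is recorded as parent-to-child edges. The class names `DSGNode` and `DesignStateGraph`, their field names, and the helpers `is_valid_json` and `implementation_presence` are hypothetical illustrations. They are not the paper's actual data format or metric definitions, and the example requirement and model strings are placeholders rather than results.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


# Hypothetical DSG node schema (illustrative assumption, not the paper's format).
@dataclass
class DSGNode:
    node_id: str
    requirement: str                       # requirement text this node addresses
    implementation: Optional[str] = None   # physical implementation tag, if any
    model_code: Optional[str] = None       # Python source of a physical model, if any
    children: List[str] = field(default_factory=list)  # functional decomposition edges


@dataclass
class DesignStateGraph:
    nodes: Dict[str, DSGNode] = field(default_factory=dict)

    def add(self, node: DSGNode, parent: Optional[str] = None) -> None:
        # Register the node and, if a parent is given, record the decomposition edge.
        self.nodes[node.node_id] = node
        if parent is not None:
            self.nodes[parent].children.append(node.node_id)

    def to_json(self) -> str:
        # Serialize every node (including its children list) to a JSON object keyed by id.
        return json.dumps({nid: asdict(n) for nid, n in self.nodes.items()}, indent=2)


def is_valid_json(text: str) -> bool:
    # Proxy for a JSON-validity check: does the text parse at all?
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def implementation_presence(graph: DesignStateGraph) -> float:
    # Hypothetical metric: fraction of nodes carrying a non-empty implementation tag.
    if not graph.nodes:
        return 0.0
    tagged = sum(1 for n in graph.nodes.values() if n.implementation)
    return tagged / len(graph.nodes)


# Placeholder example graph (illustrative strings only).
dsg = DesignStateGraph()
dsg.add(DSGNode("root", "Provide treated water using solar energy"))
dsg.add(
    DSGNode(
        "n1",
        "Collect solar heat for treatment",
        implementation="solar thermal collector",
        model_code="def heat_gain(area, irradiance, eff):\n    return area * irradiance * eff\n",
    ),
    parent="root",
)
payload = dsg.to_json()
```

Keeping the model source as a plain string inside the node is one way to preserve JSON serializability while still attaching executable physics. In this design, checks such as parsing the payload or compiling `model_code` can run independently of the agents that produced the graph.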