In this paper, we propose CoordField, a novel system for efficiently coordinating swarms of heterogeneous unmanned aerial vehicles (UAVs) in complex urban environments. CoordField uses a large language model (LLM) to translate high-level human commands into executable task instructions (e.g., patrolling, target tracking) for the UAV swarm. It then relies on a coordination field mechanism to allocate UAV movements and task selections in a distributed and adaptive manner. Through 50 comparative experiments in a 2D simulation environment, we demonstrate that CoordField outperforms existing methods in task coverage, response time, and adaptability to dynamic changes.