This paper proposes EAI-Avatar, a novel emotion-aware avatar generation framework for two-way (dyadic) conversation. To overcome the limitation of existing portrait animation methods, which support only one-way generation, we leverage the dialogue generation capabilities of large language models (LLMs, e.g., GPT-4) to produce virtual avatars with rich, temporally consistent emotional dynamics. Specifically, we design a Transformer-based head mask generator that learns temporally consistent motion features in a latent mask space, allowing us to generate mask sequences of arbitrary length to control head movements. Furthermore, we introduce an interactive dialogue tree in which each node stores its parent, child, and sibling links together with the current character's emotional state, so that paths through the tree represent conversational state transitions. By traversing the tree in reverse level order from the current node, we extract rich historical emotional cues to guide facial expression synthesis. Extensive experiments demonstrate the superior performance and effectiveness of the proposed method.
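To make the dialogue tree concrete, the following is a minimal sketch of one possible realization; all class and function names here are hypothetical illustrations, not the paper's actual implementation, and the "reverse" traversal is interpreted as walking from the current node back toward the root to gather earlier emotional states.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the interactive dialogue tree: each node keeps a
# parent link and a child list (siblings are the parent's other children)
# together with the character's emotional state at that turn.
@dataclass
class DialogueNode:
    emotion: str                              # emotional state at this turn
    parent: "DialogueNode | None" = None
    children: list = field(default_factory=list)

    def add_child(self, emotion: str) -> "DialogueNode":
        child = DialogueNode(emotion, parent=self)
        self.children.append(child)
        return child

def past_emotion_cues(node: DialogueNode) -> list:
    """Walk ancestors from the current node back to the root (a reverse,
    level-by-level view of the conversation path), collecting the earlier
    emotional states that can guide expression synthesis."""
    cues = []
    cur = node.parent
    while cur is not None:
        cues.append(cur.emotion)
        cur = cur.parent
    return cues

# Toy dialogue: neutral opening, then a happy turn, then a surprised turn.
root = DialogueNode("neutral")
turn1 = root.add_child("happy")
turn2 = turn1.add_child("surprised")
print(past_emotion_cues(turn2))  # ['happy', 'neutral']
```

In this sketch the emotional history is ordered from most recent to oldest, so a synthesis module could weight recent cues more heavily; how the paper actually aggregates these cues is not specified in the abstract.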