This paper presents a native solution that merges multiple channels at each time step to achieve low latency in a full-duplex conversational model. To address the degradation in language modeling performance caused by existing word-level alignment methods, we introduce "natural monologues," continuous sequences of sentences and pauses that mimic human conversational behavior. To semantically align natural monologues with audio, we develop a dual training method that alternates the position of the monologue relative to the audio during language learning. Building on this dual training method, we develop FLM-Audio, a full-duplex conversational chatbot with 7B parameters. Experimental results demonstrate that FLM-Audio provides superior response quality and conversational experience compared to existing models, while requiring significantly less training data.