In this paper, we propose Att-Adapter, a novel plug-and-play module that addresses the problem of simultaneously and precisely controlling multiple attributes in a pre-trained diffusion model. Att-Adapter learns a single control adapter from a set of sample images in which multiple visual attributes appear unpaired. It employs a decoupled cross-attention module to naturally harmonize multiple domain attributes with textual conditions, and a conditional variational autoencoder (CVAE) to mitigate overfitting and accommodate the diverse characteristics of the visual world. Evaluations on two public datasets show that Att-Adapter outperforms all LoRA-based baselines in continuous attribute control, demonstrating a wider control range and improved inter-attribute separation. In addition, Att-Adapter requires no paired synthetic data for training and extends easily to multiple attributes.
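
To make the decoupled cross-attention idea concrete, below is a minimal single-head PyTorch sketch of one plausible form of such a block: the image query attends to the text features and to the attribute features through separate key/value projections, and the two results are summed. All names here (`DecoupledCrossAttention`, `to_k_attr`, `attr_scale`, and the single-head simplification) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledCrossAttention(nn.Module):
    """Sketch: image queries attend separately to text and attribute
    conditions via two key/value projection pairs; outputs are summed."""

    def __init__(self, dim: int, text_dim: int, attr_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Key/value projections for the (frozen) text-conditioning branch.
        self.to_k_text = nn.Linear(text_dim, dim, bias=False)
        self.to_v_text = nn.Linear(text_dim, dim, bias=False)
        # Separate, trainable key/value projections for attribute features.
        self.to_k_attr = nn.Linear(attr_dim, dim, bias=False)
        self.to_v_attr = nn.Linear(attr_dim, dim, bias=False)

    def forward(self, x, text_feats, attr_feats, attr_scale: float = 1.0):
        q = self.to_q(x)
        # Standard text cross-attention.
        text_out = F.scaled_dot_product_attention(
            q, self.to_k_text(text_feats), self.to_v_text(text_feats)
        )
        # Attribute cross-attention with its own keys/values.
        attr_out = F.scaled_dot_product_attention(
            q, self.to_k_attr(attr_feats), self.to_v_attr(attr_feats)
        )
        # Harmonize the two condition streams by summation; attr_scale
        # (a hypothetical knob) weights the attribute contribution.
        return text_out + attr_scale * attr_out


# Example usage with dummy shapes: (batch, tokens, channels).
block = DecoupledCrossAttention(dim=320, text_dim=768, attr_dim=128)
out = block(torch.randn(2, 64, 320),       # image latents
            torch.randn(2, 77, 768),       # text embeddings
            torch.randn(2, 8, 128))        # attribute embeddings
```

Keeping the text branch's projections separate from the attribute branch's is what lets the adapter inject new attribute conditions without disturbing the pre-trained text-to-image pathway.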