The widespread adoption of large language models (LLMs) has increased the demand for high-quality and personalized outputs. However, existing alignment methods typically require retraining the large pre-trained model, which is costly. To address this limitation, this paper proposes a novel Residual Alignment Model (RAM) that formalizes alignment as a form of importance sampling. In this framework, the unaligned upstream model acts as the proposal distribution, and alignment is performed by resampling its outputs under the guidance of an autoregressive alignment module that estimates the importance weights. RAM thereby decouples the alignment module from the target aligned model, improving flexibility and scalability. Furthermore, we develop an efficient sequence-level training strategy for the alignment module that operates independently of the proposal model, and a resampling algorithm based on iterative token-level decoding that mitigates the first-token latency common to related methods. Experimental evaluations on two leading open-source LLMs across diverse tasks show that the proposed approach consistently outperforms baseline models.
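To make the importance-sampling view concrete, the following schematic formulation is one possible way to write it; the symbols $\pi_{\text{base}}$, $r_\phi$, and $Z(x)$ are introduced here for illustration and are not notation taken from the paper:
\[
\pi_{\text{aligned}}(y \mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\text{base}}(y \mid x)\, r_\phi(x, y),
\qquad
Z(x) \;=\; \sum_{y} \pi_{\text{base}}(y \mid x)\, r_\phi(x, y),
\]
where $\pi_{\text{base}}$ denotes the unaligned proposal distribution, $r_\phi$ is the importance weight estimated by the residual alignment module, and $Z(x)$ normalizes the product. Under this reading, candidate outputs drawn from the proposal model are reweighted or resampled according to $r_\phi$, so the large base model itself never needs to be retrained.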