To overcome the shortcomings of existing explainable recommender systems, this paper proposes a dynamic interaction optimization framework driven by human-like feedback. The framework employs a large language model (LLM) as a human simulator to predict human feedback, and strengthens the LLM's language understanding and logical reasoning capabilities through a user-tailored reward scoring method. Furthermore, Pareto optimization is introduced to balance the trade-off among multiple dimensions of explanation quality, and an off-policy optimization pipeline enables efficient model learning. Experimental results demonstrate that the proposed method outperforms existing explainable recommendation methods.
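
As a rough illustration of how simulator-based reward scoring and Pareto optimization over multiple explanation-quality dimensions might fit together, the minimal sketch below stubs out the LLM human simulator with a placeholder scorer and keeps only the non-dominated explanation candidates. All names here (`Candidate`, `score_with_simulator`, `pareto_front`) are hypothetical illustrations under assumed interfaces, not the paper's actual implementation; the off-policy update step is omitted.

```python
# Hypothetical sketch: score explanation candidates on several quality
# dimensions with a (stubbed) LLM human simulator, then keep the Pareto front.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    explanation: str
    scores: List[float]  # one reward per quality dimension, e.g. clarity, faithfulness


def score_with_simulator(explanation: str, user_profile: Dict) -> List[float]:
    """Placeholder for the LLM human simulator: in the real system an LLM
    would return user-tailored per-dimension reward scores. Dummy values
    are returned here so the sketch stays runnable."""
    return [float(len(explanation) % 5), float(hash(explanation) % 5)]


def dominates(a: List[float], b: List[float]) -> bool:
    """a Pareto-dominates b if a >= b on every dimension and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the explanations not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(o.scores, c.scores) for o in candidates if o is not c)
    ]


if __name__ == "__main__":
    user = {"id": 42}
    texts = [
        "Recommended because you liked similar items.",
        "This matches your recent purchases.",
        "Popular with users like you.",
    ]
    pool = [Candidate(t, score_with_simulator(t, user)) for t in texts]
    # The non-dominated explanations would serve as training signal for the
    # off-policy optimization step (not shown).
    for c in pareto_front(pool):
        print(c.scores, c.explanation)
```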