This paper proposes RL-MoE, a framework that addresses the conflict between the need for rich visual data and the right to privacy, a conflict sharpened by the proliferation of AI-powered cameras in Intelligent Transportation Systems (ITS). RL-MoE converts sensitive visual data into privacy-preserving text descriptions, eliminating the need to transmit images directly. It uses a Mixture-of-Experts (MoE) architecture to decompose each scene into multiple detailed facets, and a reinforcement learning (RL) agent to optimize the generated text for both semantic accuracy and privacy. In experiments, RL-MoE provides stronger privacy protection, reducing the replay attack success rate to 9.4% on the CFP-FP dataset, while generating richer text content than existing methods. The framework offers a practical and scalable approach to building trustworthy AI systems in privacy-critical domains, paving the way for safer smart cities and autonomous vehicle networks.
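
The dual objective described above, text that is both semantically accurate and private, can be illustrated with a scalar reward that trades the two off. The following is only a minimal sketch, not the paper's reward: the function names `combined_reward` and `pick_best`, the weighting scheme, and the assumption that both scores are already normalized to [0, 1] are all hypothetical and are not taken from the summary above.

```python
from typing import List, Tuple


def combined_reward(semantic_score: float,
                    privacy_leak: float,
                    privacy_weight: float = 0.5) -> float:
    """Hypothetical reward: favor semantic fidelity, penalize privacy leakage.

    semantic_score: 1.0 means the text fully captures the scene (assumed scale).
    privacy_leak:   1.0 means the text fully exposes identity (assumed scale).
    privacy_weight: share of the reward assigned to the privacy term.
    """
    for name, value in (("semantic_score", semantic_score),
                        ("privacy_leak", privacy_leak),
                        ("privacy_weight", privacy_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Convex combination of accuracy and (1 - leakage).
    return ((1.0 - privacy_weight) * semantic_score
            + privacy_weight * (1.0 - privacy_leak))


def pick_best(candidates: List[Tuple[str, float, float]],
              privacy_weight: float = 0.5) -> str:
    """Return the candidate description with the highest combined reward.

    Each candidate is (text, semantic_score, privacy_leak).
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    best = max(candidates,
               key=lambda c: combined_reward(c[1], c[2], privacy_weight))
    return best[0]
```

In this sketch, raising `privacy_weight` shifts the preference toward descriptions that reveal less, at the cost of detail; an RL agent would be trained against a signal of this kind rather than selecting among fixed candidates.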