Large language models (LLMs) excel at tasks such as mathematical reasoning, factual question answering, and code generation, but their ability to perform these tasks in multiple languages remains underdeveloped. Especially in low-resource languages such as Swahili or Thai, LLMs often misinterpret prompts or default to reasoning in English. This implicit bias toward high-resource languages undermines factual accuracy, interpretability, and reliability. In this paper, we propose M2A, a novel method that combines multi-scale multilingual alignment with a language-consistency reward on machine-translated questions, training models to reason directly and accurately in the target language. Furthermore, existing multilingual benchmarks evaluate only final answers, overlooking whether the reasoning itself is carried out in the intended language. To address this gap, we introduce GeoFact-X, a geography-based multilingual factual reasoning benchmark with reasoning traces in English, Hindi, Japanese, Swahili, and Thai. Our results show that M2A significantly improves multilingual reasoning fidelity in both mathematical and factual reasoning tasks, highlighting the importance of reasoning-aware multilingual reinforcement learning for robust cross-lingual generalization.