Mutarjim is a compact yet powerful language model for bidirectional Arabic-English translation. Built on Kuwain-1.5B, it is significantly smaller than contemporary large language models, yet it outperforms much larger models on multiple benchmarks, thanks to an optimized two-stage training approach and a carefully curated, high-quality training corpus. Furthermore, to overcome the limitations of existing Arabic-English benchmarks (narrow domain coverage, short sentence lengths, and English-source bias), we introduce Tarjama-25, a new benchmark of 5,000 expert-reviewed sentence pairs. Mutarjim achieves state-of-the-art performance on the English-to-Arabic translation task of Tarjama-25, surpassing even large proprietary models such as GPT-4o mini. The Tarjama-25 dataset is publicly available.