This paper focuses on improving the multimodal in-context learning (MICL) capabilities of multimodal large language models (MLLMs). We observe that existing MLLMs struggle to leverage visual information and over-rely on textual patterns, leading to mere text imitation rather than genuine multimodal adaptation. To address these issues, we propose Dynamic Attention Reallocation (DARA), an efficient fine-tuning strategy that rebalances attention between visual and textual tokens, directing the model's attention toward the visual context. Furthermore, we introduce TrueMICL, a MICL-specific dataset that explicitly requires the integration of multimodal information, particularly visual content, for correct task completion. Experimental results demonstrate that the proposed method substantially improves true multimodal in-context learning capabilities.
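
To make the idea of rebalancing attention concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes DARA can be approximated by a small set of learnable, per-head biases that are added to the attention logits of visual tokens during fine-tuning, leaving textual tokens untouched. The module name, parameterization, and tensor layout are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DARAAttentionRescale(nn.Module):
    """Hypothetical sketch of Dynamic Attention Reallocation (DARA).

    Assumption: attention is rebalanced by adding a learnable, per-head bias
    to the pre-softmax attention logits of visual key tokens, so that tuning
    only these few parameters shifts attention mass toward the visual context.
    """

    def __init__(self, num_heads: int):
        super().__init__()
        # One learnable bias per attention head, initialized to zero (no change).
        self.visual_bias = nn.Parameter(torch.zeros(num_heads))

    def forward(self, attn_logits: torch.Tensor, visual_mask: torch.Tensor) -> torch.Tensor:
        # attn_logits: (batch, heads, query_len, key_len) pre-softmax scores.
        # visual_mask: (batch, key_len) bool, True where the key token is visual.
        bias = self.visual_bias.view(1, -1, 1, 1)                      # (1, heads, 1, 1)
        mask = visual_mask[:, None, None, :].to(attn_logits.dtype)     # (batch, 1, 1, key_len)
        # Raise (or lower) logits on visual keys only; text keys are unchanged.
        return attn_logits + mask * bias


# Usage sketch: rescale logits before softmax inside an attention layer.
rescale = DARAAttentionRescale(num_heads=8)
logits = torch.randn(2, 8, 16, 32)
visual_mask = torch.zeros(2, 32, dtype=torch.bool)
visual_mask[:, :10] = True  # first 10 key tokens are visual
attn = torch.softmax(rescale(logits, visual_mask), dim=-1)
```

Because only the per-head biases are trained in this sketch, the number of tunable parameters stays tiny relative to full fine-tuning, which is consistent with the abstract's framing of DARA as an efficient strategy.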