This paper presents a study that extends Memory Mosaics networks, which have demonstrated compositional and in-context learning abilities on conventional medium-sized (GPT-2 scale) networks and small synthetic datasets, to a large language model (Llama-8B scale) trained on real-world datasets. The resulting Memory Mosaics v2, scaled to 10 billion parameters and trained on 1 trillion tokens, is evaluated along three dimensions: storage of training knowledge, storage of new knowledge, and in-context learning. The results show that Memory Mosaics v2 matches the Transformer in learning training knowledge and significantly outperforms it in performing new tasks at inference time (the second and third dimensions). Notably, Memory Mosaics v2 trained on 1 trillion tokens outperforms a Transformer trained on 8 trillion tokens, suggesting that such performance gains are difficult to achieve simply by increasing the Transformer's training data.