Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Memory Mosaics at scale

Created by
  • Haebom

Author

Jianyu Zhang, Léon Bottou

Outline

This paper extends Memory Mosaics, a network architecture that had previously demonstrated effective compositional and in-context learning on medium-sized (GPT-2 scale) networks and small synthetic datasets, to large language models (Llama-8B scale) and real-world datasets. The scaled-up architecture, Memory Mosaics v2, is trained at the 10-billion-parameter scale on 1 trillion tokens and evaluated along three dimensions: storage of training knowledge, storage of new knowledge, and in-context learning. The results show that Memory Mosaics v2 matches the Transformer on learning training knowledge and significantly outperforms it on the second and third dimensions, which measure the ability to perform new tasks at inference time. In particular, a Memory Mosaics v2 model trained on 1 trillion tokens outperforms a Transformer trained on 8 trillion tokens, suggesting that these gains are difficult to achieve simply by increasing the Transformer's training data.
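For readers unfamiliar with the architecture, the sketch below illustrates the kind of kernel-smoothed key-value (associative memory) retrieval that Memory Mosaics layers are built around, used in place of standard attention. This is a minimal illustration assuming a softmax-normalized kernel; the function name, shapes, and parameters are illustrative and do not reproduce the authors' implementation.

```python
import numpy as np

def associative_memory_retrieve(keys, values, query, beta=1.0):
    """Kernel-smoothed key-value retrieval (illustrative sketch).

    keys:   (T, d) array of stored keys
    values: (T, d) array of stored values
    query:  (d,)   query key for the current position
    beta:   sharpness of the kernel
    """
    # Similarity of the query to every stored key.
    scores = beta * keys @ query                  # shape (T,)
    # Softmax weights over the stored items (numerically stabilized).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Retrieved value: weighted average of the stored values.
    return weights @ values                       # shape (d,)

# Toy usage: store three (key, value) pairs, then query with the second key.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))
out = associative_memory_retrieve(keys, values, keys[1], beta=4.0)
```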

Takeaways, Limitations

Takeaways:
The superior compositional and in-context learning capabilities of Memory Mosaics are validated at large language model scale.
Memory Mosaics v2 was found to be superior to the Transformer in new knowledge storage and in-context learning.
Memory Mosaics v2 outperforms the Transformer despite being trained on far less data, highlighting the architectural advantage of Memory Mosaics.
Limitations:
The study reports results for a single model and dataset scale; the same outcome is not guaranteed at other scales.
There is a lack of detailed information on the architectural improvements in Memory Mosaics v2.
Additional evaluations for various real-world applications are needed.