Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and institutions; please credit the source when sharing.

PLaMo 2 Technical Report

Created by
  • Haebom

Author

Preferred Networks: Kaizaburo Chubachi, Yasuhiro Fujita, Shinichi Hemmi, Yuta Hirokawa, Toshiki Kataoka, Goro Kobayashi, Kenichi Maehashi, Calvin Metzger, Hiroaki Mikami, Shogo Murai, Daisuke Nishino, Kento Nozawa, Shintarou Okada, Daisuke Okanohara, Shunta Saito, Shotaro Sano, Shuji Suzuki, Daisuke Tanaka, Avinash Ummadisingu, Hanqin Wang, Sixue Wang, Tianqi Xu

Outline

PLaMo 2 is a series of large language models specialized for Japanese. The models begin from a Samba-based hybrid architecture and transition to full attention through continual pre-training, supporting 32K-token contexts. To address data scarcity, they were trained on an extensive synthetic corpus, with computational efficiency achieved through weight reuse and structural pruning; this efficient pruning methodology yielded an 8B model whose performance is comparable to that of a 100B model. Post-training further improved the models through supervised fine-tuning (SFT) and direct preference optimization (DPO) pipelines, leveraging synthetic Japanese instruction data and model merging techniques. Inference was optimized with vLLM and quantization, minimizing accuracy loss. PLaMo 2 achieves state-of-the-art results on Japanese benchmarks, outperforming similarly sized open models in instruction following, language fluency, and Japanese-specific knowledge.
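To make the architecture concrete, below is a minimal sketch of a Samba-style hybrid stack in PyTorch: recurrent mixing layers interleaved with causal attention layers. Everything here is an illustrative assumption rather than the PLaMo 2 implementation; in particular, a GRU stands in for the selective state-space (Mamba) mixer, and the sizes and interleaving pattern are made up.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One layer of a Samba-style hybrid stack: a recurrent (SSM-style)
    mixer or a causal attention mixer, followed by an MLP."""
    def __init__(self, d_model: int, n_heads: int, use_attention: bool):
        super().__init__()
        self.use_attention = use_attention
        self.norm1 = nn.LayerNorm(d_model)
        if use_attention:
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        else:
            # GRU used as a cheap stand-in for the selective SSM (Mamba)
            # mixer; the real model uses a state-space layer here.
            self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        if self.use_attention:
            # Boolean causal mask: True marks positions a token may NOT attend to.
            L = x.size(1)
            mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
            h, _ = self.attn(h, h, h, attn_mask=mask)
        else:
            h, _ = self.rnn(h)
        x = x + h
        return x + self.mlp(self.norm2(x))

# Interleave recurrent and attention layers, e.g. attention every 4th layer.
layers = nn.ModuleList(
    [HybridBlock(512, 8, use_attention=(i % 4 == 3)) for i in range(8)]
)
x = torch.randn(1, 16, 512)
for layer in layers:
    x = layer(x)
```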
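The report also credits weight reuse and structural pruning for the compute savings. As a hedged illustration of structural pruning in general (not the report's specific criterion, which this summary does not detail), the sketch below drops low-norm hidden channels from an FFN's up/down projection pair:

```python
import torch
import torch.nn as nn

def prune_ffn(up: nn.Linear, down: nn.Linear, keep_ratio: float = 0.5):
    """Drop low-importance hidden channels from an FFN projection pair.
    The L2-norm importance score is an illustrative assumption."""
    importance = up.weight.norm(dim=1)          # one score per hidden channel
    k = int(keep_ratio * importance.numel())
    keep = importance.topk(k).indices.sort().values
    new_up = nn.Linear(up.in_features, k, bias=up.bias is not None)
    new_down = nn.Linear(k, down.out_features, bias=down.bias is not None)
    with torch.no_grad():
        new_up.weight.copy_(up.weight[keep])
        if up.bias is not None:
            new_up.bias.copy_(up.bias[keep])
        new_down.weight.copy_(down.weight[:, keep])
        if down.bias is not None:
            new_down.bias.copy_(down.bias)
    return new_up, new_down

up, down = nn.Linear(512, 2048), nn.Linear(2048, 512)
up2, down2 = prune_ffn(up, down, keep_ratio=0.25)  # 2048 -> 512 hidden units
```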

Takeaways, Limitations

Takeaways:
  • Continual pre-training of a Samba-based hybrid architecture into a full-attention model with 32K-token contexts improved both the efficiency and performance of large language models.
  • Synthetic data and efficient structural pruning slimmed the model down: an 8B model matches the performance of a 100B model.
  • Post-training with SFT, DPO, synthetic Japanese instruction data, and model merging achieved state-of-the-art performance on Japanese benchmarks (see the DPO sketch below).
  • Inference optimization with vLLM and quantization enables efficient serving with minimal accuracy loss (see the serving sketch below).
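For reference, the DPO stage cited above optimizes the policy to prefer chosen over rejected responses relative to a frozen reference model. Below is a minimal sketch of the standard DPO loss; it assumes per-sequence log-probabilities have already been computed, and the batch values are made up for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """L = -log sigmoid(beta * [(log pi/pi_ref)(y_w) - (log pi/pi_ref)(y_l)])."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up log-probabilities for a batch of 2 preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.0, -10.5]))
print(loss.item())
```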
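And a minimal serving sketch with vLLM, reflecting the 32K-token context reported for PLaMo 2. The Hugging Face model ID, the commented-out quantization option, and the sampling settings are assumptions, not details from the report; serving a quantized model additionally requires a checkpoint exported in that format.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="pfnet/plamo-2-8b",   # assumed checkpoint name
    max_model_len=32768,        # 32K-token context window from the report
    trust_remote_code=True,     # custom architectures often need this
    # quantization="awq",       # only with an AWQ-quantized checkpoint
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(["日本で一番高い山は何ですか？"], params)
print(outputs[0].outputs[0].text)
```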
Limitations:
  • Heavy reliance on synthetic data leaves room for performance degradation where synthetic and real-world data distributions diverge.
  • Although the model size has been reduced, it may still demand significant computational resources.
  • The paper gives limited detail on the synthetic data generation methods and on the exact settings of the SFT and DPO pipelines.
  • Applicability and generalization to languages other than Japanese remain unvalidated.