This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Created by
Haebom
Author
Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda, Tom Goldstein
Outline
This paper points out that existing scaling law research has been conducted within a narrow range of hyperparameter choices, and instead explores scaling laws across diverse model architectures and hyperparameter settings. We release Gemstones, an open-source scaling law dataset containing over 4,000 checkpoints from Transformer models with up to 2 billion parameters, along with ablation studies on learning rates and cooldown schedules. This enables more complex scaling studies, such as analyzing the relationship between width and depth. Our results reveal that scaling law prescriptions are highly sensitive to the experimental design and to the specific model checkpoints used during fitting.
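To make the fitting step concrete, below is a minimal sketch of fitting a Chinchilla-style parametric scaling law to (parameter count, training tokens, loss) checkpoints. The synthetic data, the specific functional form, and the use of `scipy.optimize.curve_fit` are illustrative assumptions, not the paper's exact procedure; with real checkpoints, the fitted exponents shift depending on which checkpoints are included, which is the sensitivity the paper highlights.

```python
# Minimal sketch: fitting a Chinchilla-style scaling law
#   L(N, D) = E + A * N^(-alpha) + B * D^(-beta)
# to (parameter count N, training tokens D, loss) checkpoints.
# The synthetic data below stands in for real Gemstones checkpoints.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(x, E, A, alpha, B, beta):
    N, D = x
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Synthetic "checkpoints" spanning roughly 50M-2B parameters and 1B-100B tokens.
rng = np.random.default_rng(0)
N = rng.uniform(5e7, 2e9, size=200)
D = rng.uniform(1e9, 1e11, size=200)
loss = scaling_law((N, D), 1.7, 400.0, 0.34, 410.0, 0.28)
loss += rng.normal(0.0, 0.01, size=200)

# Fit the five coefficients. Which checkpoints enter this fit strongly
# influences the resulting exponents and any compute-optimal prescription.
popt, _ = curve_fit(scaling_law, (N, D), loss,
                    p0=[2.0, 100.0, 0.3, 100.0, 0.3], maxfev=20000)
E, A, alpha, B, beta = popt
print(f"fitted: E={E:.2f}, alpha={alpha:.3f}, beta={beta:.3f}")
```

Refitting after dropping a subset of the synthetic checkpoints (for example, the smallest models) gives a quick sense of how strongly the fitted exponents depend on checkpoint selection.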
Takeaways, Limitations
•
Takeaways:
◦
We emphasize the importance of studying scaling laws across diverse model architectures and hyperparameter settings.
◦
We provide an open-source dataset, Gemstones, to lay the foundation for future scaling law research.
◦
We show that scaling law prescriptions are sensitive to the experimental design and to the model checkpoints used for fitting.
◦
The suite enables analysis of relationships between architectural factors such as width and depth (illustrated in the sketch at the end of this section).
•
Limitations:
◦
Model sizes in the Gemstones suite are limited to at most 2 billion parameters.
◦
This research is limited to a specific model architecture (Transformer).
◦
Further research is needed on a broader range of hyperparameters and architectures.
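As a companion to the sketch above, the following hypothetical example keeps width and depth as separate predictors instead of collapsing them into a single parameter count. The functional form is an assumption chosen for illustration, not the parameterization used in the Gemstones paper.

```python
# Hypothetical width/depth-aware fit: keep width w and depth d as separate
# predictors instead of collapsing them into one parameter count.
# The form L(w, d) = E + A * w^(-a) * d^(-b) is an illustrative assumption.
import numpy as np
from scipy.optimize import curve_fit

def wd_law(x, E, A, a, b):
    w, d = x
    return E + A * w ** (-a) * d ** (-b)

rng = np.random.default_rng(1)
w = rng.choice([512.0, 768.0, 1024.0, 1536.0, 2048.0], size=150)  # hidden width
d = rng.integers(4, 48, size=150).astype(float)                   # number of layers
loss = wd_law((w, d), 1.8, 60.0, 0.45, 0.30) + rng.normal(0.0, 0.01, size=150)

popt, _ = curve_fit(wd_law, (w, d), loss, p0=[2.0, 10.0, 0.3, 0.3], maxfev=20000)
print("fitted (E, A, a, b):", np.round(popt, 3))
```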