haebom
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
Created by Haebom