Daily Arxiv

This page curates AI-related papers from around the world.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Created by
  • Haebom

Authors

Shanghai AI Lab: Yicheng Bao, Guanxu Chen, Mingkang Chen, Yunhao Chen, Chiyu Chen, Lingjie Chen, Sirui Chen, Xinquan Chen, Jie Cheng, Yu Cheng, Dengke Deng, Yizhuo Ding, Dan Ding, Xiaoshan Ding, Yi Ding, Zhichen Dong, Lingxiao Du, Yuyu Fan, Ruijun Ge, Tianle Gu, Lujun Gui, Jiaxuan Guo, Qianxi He, Yuenan Hou, Xuhao Hu, Hong Huang, Kaichen Huang, Shiyang Huang, Yuxian Jiang, Shanzhe Lei, Jie Li, Lijun Li, Hao Li, Juncheng Li, Xiangtian Li, Yafu Li, Lingyu Li, Xueyan Li, Haotian Liang, Dongrui Liu, Qihua Liu, Zhixuan Liu, Bangwei Liu, Huacan Liu, Yuexiao Liu, Zongkai Liu, Chaochao Lu, Yudong Lu, Xiaoya Lu, Zhenghao Lu, Qitan Lv, Caoyuan Ma, Jiachen Ma, Xiaoya Ma, Zhongtian Ma, Lingyu Meng, Ziqi Miao, Yazhe Niu, Yuezhang Peng, Yuan Pu, Han Qi, Chen Qian, Xingge Qiao, Jingjing Qu, Jiashu Qu, Wanying Qu, Wenwen Qu, Xiaoye Qu, Qihan Ren, Qingnan Ren, Qingyu Ren, Jing Shao, Wenqi Shao, Shuai Shao, Dongxing Shi, Xin Song, Xinhao Song, Yan Teng, Xuan Tong, Yingchun Wang, Xuhong Wang, Shujie Wang, Ruofan Wang, Wenjie Wang, Yajie Wang, Muhao Wei, Xiaoyu Wen, Fenghua Weng, Yuqi Wu, Yingtong Xiong, …

Outline

We present SafeWork-R1, a state-of-the-art multimodal reasoning model developed with the SafeLadder framework, which integrates large-scale, progressive, safety-oriented reinforcement-learning post-training with a suite of multi-principle verifiers. Unlike existing RLHF methods, SafeWork-R1 develops intrinsic safety reasoning and self-reflection capabilities, producing safety "aha" moments. It outperforms its base model, Qwen2.5-VL-72B, by an average of 46.54% on safety-related benchmarks and surpasses leading proprietary models such as GPT-4.1 and Claude Opus 4. At inference time, step-level verification is further strengthened through two intervention methods and a deliberative search mechanism. The SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B models were also developed, demonstrating that safety and capability can coevolve synergistically.
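The abstract names a multi-principle verifier and a deliberative search mechanism but does not describe their algorithms. Below is a minimal, hypothetical Python sketch of how such pieces could fit together: per-principle scores are aggregated into one scalar that can serve both as a reward signal during RL post-training and as a reranking criterion for best-of-N deliberation at inference time. All identifiers (Principle, verify, deliberative_search) are illustrative inventions, not the paper's actual implementation.

```python
# Hypothetical sketch of a multi-principle verifier plus a deliberative
# best-of-N search. Names and logic are illustrative assumptions, not
# the SafeLadder framework's real algorithms.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Principle:
    name: str
    check: Callable[[str], float]  # maps a response to a score in [0, 1]
    weight: float


def verify(response: str, principles: List[Principle]) -> float:
    """Aggregate per-principle scores into one scalar.

    During RL post-training this scalar would act as the reward;
    a weighted average is only one plausible aggregation rule.
    """
    total_weight = sum(p.weight for p in principles)
    return sum(p.weight * p.check(response) for p in principles) / total_weight


def deliberative_search(prompt: str,
                        generate: Callable[[str], str],
                        principles: List[Principle],
                        n_candidates: int = 8) -> str:
    """Inference-time deliberation: sample several candidate responses
    and keep the one the verifier scores highest (a best-of-N stand-in
    for the paper's deliberative search mechanism)."""
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: verify(c, principles))


if __name__ == "__main__":
    import random

    # Toy principles: a keyword-based safety check and a length-based
    # helpfulness proxy, purely for illustration.
    principles = [
        Principle("safety", lambda r: 0.0 if "unsafe" in r else 1.0, weight=2.0),
        Principle("helpfulness", lambda r: min(len(r) / 100.0, 1.0), weight=1.0),
    ]

    def fake_generate(prompt: str) -> str:
        return random.choice(
            ["A careful, sourced answer. " * 5, "unsafe shortcut answer"])

    print(deliberative_search("How should I dispose of old batteries?",
                              fake_generate, principles))
```

The same verifier serving both roles is what would let safety and capability co-train rather than trade off, which is the coevolution claim the abstract makes.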

Takeaways, Limitations

Takeaways:
Demonstrates the effectiveness of the SafeLadder framework, which integrates large-scale reinforcement-learning post-training with multi-principle verifiers.
Presents a new method for improving safety and capability simultaneously.
Achieves safety performance surpassing leading proprietary models.
Suggests that models can develop intrinsic safety reasoning and self-reflection capabilities.
Verifies the framework's generalizability across multiple base models.
Limitations:
Lacks a detailed explanation of the SafeLadder framework's specific implementation and algorithms.
Lacks a detailed description of the benchmarks and evaluation metrics used.
Safety and reliability in real-world environments require further verification.
Lacks discussion of the developed model's potential risks and ethical issues.