P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

Created by
  • Haebom

Authors

Yun Luo, Futing Wang, Qianjia Cheng, Fangchen Yu, Haodi Lei, Jianhao Yan, Chenxi Li, Jiacheng Chen, Yufeng Zhao, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Wenxuan Zeng, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, Bowen Zhou, Peng Ye, Ganqu Cui

💡 Overview

This paper proposes P1-VL, a Vision-Language Model (VLM) that combines visual information with language understanding to solve complex scientific reasoning problems such as those in Physics Olympiads. P1-VL uses Curriculum Reinforcement Learning with progressively increasing difficulty, together with Agentic Augmentation that enables iterative self-verification at inference time, so that its reasoning stays consistent with physical laws. Notably, on the HiPhO benchmark, which comprises 13 Physics Olympiad exams, it earns 12 gold medals and achieves the best performance among open-source VLMs.
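The two ideas named above can be illustrated with a toy sketch. This summary gives no implementation details, so everything here is an assumption for illustration only and not the authors' method: `curriculum_stages` simply orders hypothetical (problem, difficulty) pairs from easy to hard and splits them into stages, and `solve_with_self_verification` is a generic propose, check, retry loop in which `propose` and `verify` are hypothetical callables standing in for a model and a checker.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def curriculum_stages(
    problems: Sequence[Tuple[str, float]], n_stages: int
) -> List[List[str]]:
    """Split (problem_id, difficulty) pairs into stages of increasing difficulty.

    Hypothetical helper: the difficulty scores and equal-size staging are
    assumptions, not something the summary specifies.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    if not problems:
        return []
    ordered = sorted(problems, key=lambda p: p[1])  # easiest first
    size = -(-len(ordered) // n_stages)  # ceiling division
    return [
        [pid for pid, _ in ordered[i:i + size]]
        for i in range(0, len(ordered), size)
    ]


def solve_with_self_verification(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Propose an answer, check it, and retry using the checker's feedback.

    `verify` returns None when the answer passes, otherwise a feedback string.
    Returns the last answer and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    answer = ""
    for round_idx in range(1, max_rounds + 1):
        answer = propose(feedback)
        feedback = verify(answer)
        if feedback is None:  # check passed, stop early
            return answer, round_idx
    return answer, max_rounds  # budget exhausted; answer is unverified
```

In a real system the checker might test units, limiting cases, or conservation laws; here it is left abstract because the summary does not say which checks are used.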

🔑 Implications and Limitations

• Implication 1: Scientific reasoning ability can be improved substantially by making effective use of diagram information, which carries physical constraints (e.g., boundary conditions, spatial symmetry) that text alone does not fully convey.
• Implication 2: P1-VL shows strong scientific reasoning and generalization not only on physics problems but across STEM fields more broadly, opening up the possibility of developing general-purpose physical intelligence models.
• Limitations and future work: This study focuses on the specific domain of Physics Olympiads; further research and model improvements are needed to better understand and reason about complex, dynamic real-world physical phenomena.