
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

Created by
  • Haebom

Authors

Romain Froger, Pierre Andrews, Matteo Bettini, Amar Budhiraja, Ricardo Silveira Cabral, Virginie Do, Emilien Garreau, Jean-Baptiste Gaya, Hugo Laurençon, Maxime Lecanu, Kunal Malkan, Dheeraj Mekala, Pierre Menard, Gerard Moreno-Torres Bertran, Ulyana Piterbarg, Mikhail Plekhanov, Mathieu Rita, Andrey Rusakov, Vladislav Vorotilov, Mengjue Wang, Ian Yu, Amine Benhalloum, Gregoire Mialon, Thomas Scialom

💡 Overview

This paper proposes Gaia2, a new benchmark for evaluating large language model (LLM) agents in dynamic, asynchronous environments. Gaia2 contains realistic scenarios in which the environment evolves independently of the agent's actions, and it evaluates an agent's ability to operate under time constraints, adapt to noise and dynamic events, resolve ambiguity, and collaborate with other agents. Each scenario comes with a verifier that supports fine-grained, action-level evaluation, so the benchmark can be used directly for reinforcement learning from verifiable rewards.
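
To make the idea of an action-level verifier that yields a verifiable reward more concrete, here is a minimal, hypothetical Python sketch. It is not Gaia2's actual verifier or API: the `Action`/`OracleAction` schema, the tool names, the in-order matching rule, and the optional per-action deadline are all illustrative assumptions. Each expected ("oracle") action is matched against the agent's action log, producing a per-action pass/fail signal, and the scenario reward is 1 only if every check passes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Action:
    """One action taken by the agent (hypothetical schema, not Gaia2's)."""
    tool: str                                      # illustrative tool name
    args: Dict[str, Any] = field(default_factory=dict)
    time: float = 0.0                              # simulated time in seconds


@dataclass
class OracleAction:
    """An expected action with an optional deadline (hypothetical schema)."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)
    deadline: Optional[float] = None               # None means no time limit


def verify(log: List[Action], oracle: List[OracleAction]) -> Tuple[List[bool], int]:
    """Check each oracle action, in order, against the agent's action log.

    Returns a per-oracle-action pass/fail list (the fine-grained signal) and
    a binary reward that is 1 only if every oracle action was matched.
    """
    results: List[bool] = []
    cursor = 0  # log entries before `cursor` were consumed by earlier matches
    for expected in oracle:
        matched = False
        for i in range(cursor, len(log)):
            act = log[i]
            on_time = expected.deadline is None or act.time <= expected.deadline
            if act.tool == expected.tool and act.args == expected.args and on_time:
                matched = True
                cursor = i + 1  # enforce ordering: later checks start after this
                break
        results.append(matched)
    return results, int(all(results))


if __name__ == "__main__":
    oracle = [
        OracleAction("send_email", {"to": "alice@example.com"}, deadline=60.0),
        OracleAction("create_event", {"title": "sync"}),
    ]
    log = [
        Action("search_contacts", {"name": "Alice"}, time=5.0),
        Action("send_email", {"to": "alice@example.com"}, time=42.0),
        Action("create_event", {"title": "sync"}, time=90.0),
    ]
    print(verify(log, oracle))  # ([True, True], 1)
```

Returning the per-action list alongside the scalar reward reflects the point that verification happens at the level of individual actions: an RL loop would consume only the scalar, while the list shows which step failed (for example, an email sent after its deadline).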

🔑 Implications and Limitations

• Even current frontier models such as GPT-5 struggle with time-constrained tasks in dynamic, asynchronous environments, and the results reveal fundamental trade-offs between reasoning, efficiency, and robustness across models.
• Gaia2 clearly exposes the difficulties LLM agents face in closing the "sim2real" gap, and it provides an important foundation for building agent systems that work in real-world deployments.
• Among open-source models, Kimi-K2 currently performs best, but it still trails frontier proprietary models; further research is needed to improve open-source agents.