The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Created by
  • Haebom
Authors

Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Francisco Piedrahita-Velez, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Jun Wang, Shuicheng Yan, Philip Torr, Lei Bai

💡 Overview

This paper examines a paradigm shift in reinforcement learning (RL) for large language models (LLMs): from passive sequence generation to autonomous decision-making agents. It contrasts the single-step MDP that underlies conventional LLM RL with agentic RL, which it formalizes as a temporally extended, partially observable Markov decision process (POMDP). The paper presents a systematic taxonomy organized around core agentic capabilities (planning, tool use, memory, reasoning, self-improvement, and perception) and around a range of application domains. It argues that RL is the key mechanism that turns these capabilities from static modules into adaptive, robust agentic behavior.
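The contrast between the two formulations can be illustrated with a minimal sketch. Everything here is a hypothetical toy and is not taken from the paper: the `HiddenNumberEnv` environment, the `binary_search_policy`, the reward values, and the discount factor `gamma=0.9` are all assumptions chosen for illustration. In the single-step case the policy emits one response and receives one reward. In the POMDP case the agent never sees the hidden state, acts over several turns, and is scored by a discounted return.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def single_step_episode(policy: Callable[[str], str],
                        prompt: str,
                        reward_fn: Callable[[str, str], float]) -> float:
    """Single-step setting: one prompt, one response, one scalar reward."""
    response = policy(prompt)
    return reward_fn(prompt, response)


@dataclass
class HiddenNumberEnv:
    """Toy POMDP (hypothetical): the hidden state is a secret integer in [0, 15].

    The agent never observes the secret directly; each step returns only a
    partial observation: "higher", "lower", or "correct".
    """
    secret: int
    max_steps: int = 8
    steps: int = 0

    def reset(self) -> str:
        self.steps = 0
        return "start"

    def step(self, guess: int) -> Tuple[str, float, bool]:
        """Return (observation, reward, done) for one guess."""
        self.steps += 1
        if guess == self.secret:
            return "correct", 1.0, True
        obs = "higher" if guess < self.secret else "lower"
        return obs, 0.0, self.steps >= self.max_steps


def binary_search_policy(history: List[Tuple[int, str]]) -> int:
    """Pick the next guess from the full (action, observation) history.

    Because the state is hidden, the policy conditions on the history
    rather than on the state itself.
    """
    lo, hi = 0, 15
    for guess, obs in history:
        if obs == "higher":
            lo = guess + 1
        elif obs == "lower":
            hi = guess - 1
    return (lo + hi) // 2


def rollout(env: HiddenNumberEnv,
            policy: Callable[[List[Tuple[int, str]]], int],
            gamma: float = 0.9) -> Tuple[float, int]:
    """Run one multi-turn episode; return (discounted return, number of steps)."""
    history: List[Tuple[int, str]] = []
    env.reset()
    ret, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(history)
        obs, reward, done = env.step(action)
        history.append((action, obs))
        ret += discount * reward
        discount *= gamma
    return ret, len(history)
```

For example, `rollout(HiddenNumberEnv(secret=0), binary_search_policy)` takes four guesses (7, 3, 1, 0), so the single terminal reward is discounted by `gamma` three times. A single-step episode has no such horizon: the reward depends only on the one response.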

🔑 Implications and Limitations

• Agentic RL plays an important role in moving LLMs beyond simple text generators toward agents that act autonomously in complex environments.
• The paper systematically categorizes the capabilities of LLM-based agents (planning, tool use, memory, and others) and their application domains, which helps set research directions.
• It consolidates a large body of recent work and catalogs open-source environments, benchmarks, and frameworks, providing practical information for future research.
• The field is evolving quickly, and both opportunities and open challenges remain in building scalable, general-purpose AI agents.