
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models

Written by
  • Haebom

Authors

Boyang Shen, Kaixiang Yang, Hao Wang, Qiuyu Yu, Qiang Xie, Qiang Li, Zhiwei Wang

💡 Overview

This paper points out a limitation of existing Vision-Language-Action (VLA) models: they always predict actions from representations at a single, high level of abstraction, which the authors show is a poor fit for the repeated spatial adjustments that robot manipulation requires. To address this, the paper proposes LoopVLA, a new VLA architecture that jointly learns recurrent refinement, action prediction, and sufficiency estimation. LoopVLA iteratively refines multimodal tokens through a shared transformer block and, at each step, produces an action candidate together with a sufficiency score that estimates whether further refinement is needed.
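The loop described above can be sketched in a few lines of plain Python. This is only a toy illustration of the control flow, not the authors' implementation: the class `ToyLoopModel`, its weight shapes, the tanh residual update standing in for a transformer block, and the mean-pooled linear heads are all assumptions made for the example.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyLoopModel:
    """Hypothetical stand-in: one shared block reused at every refinement step."""

    def __init__(self, dim=8, action_dim=3, seed=0):
        rng = random.Random(seed)
        self.shared_w = make_matrix(dim, dim, rng)         # same weights at every step
        self.action_w = make_matrix(action_dim, dim, rng)  # action-candidate head
        self.suff_w = make_matrix(1, dim, rng)             # sufficiency head

    def refine(self, tokens):
        # Residual update per token; a real model would use a transformer block here.
        return [
            [x + math.tanh(u) for x, u in zip(tok, matvec(self.shared_w, tok))]
            for tok in tokens
        ]

    def heads(self, tokens):
        # Mean-pool the tokens, then read out an action and a score in (0, 1).
        dim = len(tokens[0])
        pooled = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]
        action = matvec(self.action_w, pooled)
        logit = matvec(self.suff_w, pooled)[0]
        sufficiency = 1.0 / (1.0 + math.exp(-logit))
        return action, sufficiency

    def rollout(self, tokens, max_steps=4):
        """Refine repeatedly; collect (action candidate, sufficiency) per step."""
        outputs = []
        for _ in range(max_steps):
            tokens = self.refine(tokens)
            outputs.append(self.heads(tokens))
        return outputs
```

Calling `ToyLoopModel().rollout(tokens)` on a list of 8-dimensional token vectors returns one `(action, sufficiency)` pair per refinement step, which mirrors the per-step outputs described in the summary.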

🔑 Takeaways and Limitations

• Adaptive representation refinement: LoopVLA dynamically refines its representation at every iteration, adjusting the level of abstraction to what the task needs and thereby improving efficiency.
• Efficient action prediction: The sufficiency estimation mechanism cuts unnecessary computation and produces an accurate action prediction at the point where it is needed, improving performance.
• Future work: Further improving the accuracy of sufficiency estimation, and evaluating generalization across diverse robot manipulation environments and more complex tasks, remain open.
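One way the sufficiency score could save computation is a simple early-exit rule. The sketch below assumes a fixed threshold and a fallback to the last step; the summary does not say how LoopVLA actually turns the score into a stopping decision, so the function names and the threshold value are hypothetical.

```python
def first_sufficient_step(scores, threshold=0.9):
    """Return the 0-based index of the first step whose score meets the threshold.

    Falls back to the last step when no score is high enough.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    for step, score in enumerate(scores):
        if score >= threshold:
            return step
    return len(scores) - 1


def mean_steps_used(batch_scores, threshold=0.9):
    """Average number of refinement steps executed across several inputs."""
    used = [first_sufficient_step(s, threshold) + 1 for s in batch_scores]
    return sum(used) / len(used)
```

Under this assumed rule, an input whose score crosses the threshold at the first step costs one pass through the shared block, while a harder input runs until the score is high enough or the step budget is exhausted.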