
K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

Created by
  • Haebom

Authors

Mingxuan Mu, Guo Yang, Lei Chen, Ping Wu, Jianxun Cui

💡 Overview

This paper proposes K-Gen, an interpretable, keypoint-guided multimodal framework for generating realistic and diverse trajectories in autonomous driving simulation, a task that remains difficult. K-Gen uses a multimodal large language model (MLLM) to combine bird's-eye-view (BEV) map images with textual scene descriptions. Instead of predicting trajectories directly, the MLLM generates interpretable keypoints, together with reasoning, that reflect each agent's intent. A refinement module then turns these keypoints into accurate trajectories, and a trajectory-aware reinforcement learning algorithm called T-DAPO is applied to further improve performance.
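The keypoint-to-trajectory step can be pictured with a minimal sketch. The summary does not describe how the refinement module works internally, so the code below is only an assumption: it replaces the learned module with plain piecewise-linear interpolation that expands a few sparse (x, y) keypoints into a dense waypoint sequence. The names `densify_keypoints` and `steps_per_segment` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def densify_keypoints(keypoints: List[Point], steps_per_segment: int) -> List[Point]:
    """Expand sparse keypoints into a dense trajectory.

    Hypothetical stand-in for K-Gen's learned refinement module: it only
    interpolates linearly between consecutive keypoints.
    """
    if len(keypoints) < 2:
        # Nothing to interpolate between; return a copy of the input.
        return list(keypoints)
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be >= 1")

    trajectory: List[Point] = [keypoints[0]]
    for (x0, y0), (x1, y1) in zip(keypoints, keypoints[1:]):
        # Emit steps_per_segment points per segment; the last one lands on the next keypoint.
        for k in range(1, steps_per_segment + 1):
            t = k / steps_per_segment
            trajectory.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return trajectory


# Example: an L-shaped path given by three keypoints becomes 11 waypoints.
dense = densify_keypoints([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)], steps_per_segment=5)
```

Linear interpolation only shows the interface (sparse keypoints in, dense trajectory out). A learned refinement module, as described for K-Gen, would presumably also adjust the keypoints and produce smoother, more physically plausible motion.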

🔑 Implications and Limitations

• Existing text-only trajectory generation methods fail to capture visual context. By combining BEV maps with textual descriptions, K-Gen's multimodal approach overcomes this limitation and generates trajectories from a richer understanding of the scene.
• Generating interpretable keypoints and reasoning, rather than predicting trajectories directly, exposes each agent's intent and makes the generation process transparent, which can increase trust in the generated trajectories.
• Experiments on the WOMD and nuPlan datasets show that K-Gen outperforms existing methods, demonstrating that multimodal reasoning and keypoint-guided trajectory generation combine effectively.
• Future work should evaluate how well K-Gen generalizes to diverse environments and complex driving scenarios, and explore ways to make the generated keypoints and reasoning even more interpretable.
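The overview credits part of K-Gen's performance to T-DAPO, a trajectory-aware reinforcement learning algorithm, but does not give its reward or objective. The sketch below therefore only illustrates generic DAPO-style ingredients under stated assumptions: a hypothetical trajectory reward (negative average displacement error against a ground-truth track), group-normalized advantages over several trajectories sampled for the same scene, dropping zero-variance groups (dynamic sampling), and a clipped surrogate with decoupled lower and upper clip ranges ("clip-higher"). The function names and default values are illustrative, not taken from the paper.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def displacement_reward(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Hypothetical trajectory reward: negative average displacement error."""
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
                for (px, py), (gx, gy) in zip(pred, gt))
    return -total / len(gt)


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> Optional[List[float]]:
    """Normalize rewards within one group of samples for the same scene.

    Returns None when the rewards have (near-)zero spread: such a group gives
    no learning signal, so DAPO-style dynamic sampling would drop it.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    if sigma < eps:
        return None
    return [(r - mu) / sigma for r in rewards]


def clipped_surrogate(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style clipped objective (to maximize) with decoupled clip ranges."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


# Example: two sampled trajectories for one scene, scored against ground truth.
ground_truth = [(0.0, 0.0), (1.0, 0.0)]
samples = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
advs = group_advantages([displacement_reward(s, ground_truth) for s in samples])
```

In a real training loop these quantities would be computed for every sampled output and applied per token of the MLLM's generation; the sketch keeps everything scalar so each ingredient stays visible.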