
Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation

Created by
  • Haebom

Authors

Lingyong Yan, Jiulong Wu, Dong Xie, Weixian Shi, Deguo Xia, Jizhou Huang

💡 Overview

Existing end-to-end video generation models fall short in scenarios that demand strict logical accuracy and precise knowledge representation, such as educational content. To address this, the paper proposes LAVES, a hierarchical LLM-based multi-agent system that generates high-quality instructional videos from educational problems. LAVES formulates educational video generation as a multi-objective task that simultaneously requires step-by-step reasoning, pedagogical narration, semantically faithful visual demonstrations, and precise audio-visual synchronization.

🔑 Implications and Limitations

• LAVES decomposes the educational video generation workflow across specialized agents coordinated by a central orchestrating agent, and integrates quality gates and an iterative critique mechanism to overcome limitations in procedural fidelity, production cost, and controllability.
• Instead of synthesizing pixels directly, LAVES builds a structured, executable video script and deterministically compiles it into synchronized visuals and narration through template-based assembly rules, enabling fully automated end-to-end production without manual editing.
• In large-scale deployment, LAVES achieves a cost reduction of more than 95% compared with current industry-standard approaches while maintaining a high acceptance rate, with a throughput exceeding one million videos per day.
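
The two mechanisms described above, a quality gate with iterative critique and the deterministic compilation of a structured script into a synchronized timeline, can be illustrated with a minimal sketch. This summary does not give the actual LAVES script schema, agent interfaces, or timing rules, so every name here (`Step`, `Cue`, `refine_until_accepted`, `compile_script`, the toy critic and reviser, and the words-to-seconds rule) is a hypothetical stand-in rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One script step: narration plus a visual template name (hypothetical schema)."""
    narration: str
    template: str


@dataclass(frozen=True)
class Cue:
    """A compiled timeline entry with explicit start and end times in seconds."""
    start: float
    end: float
    template: str
    narration: str


def refine_until_accepted(
    draft: List[Step],
    critic: Callable[[List[Step]], List[str]],
    reviser: Callable[[List[Step], List[str]], List[Step]],
    max_rounds: int = 3,
) -> Tuple[List[Step], bool]:
    """Quality gate with iterative critique: revise until the critic reports no issues."""
    for _ in range(max_rounds):
        issues = critic(draft)
        if not issues:
            return draft, True
        draft = reviser(draft, issues)
    # Out of rounds: report whether the final revision passes the gate.
    return draft, not critic(draft)


def compile_script(steps: List[Step], seconds_per_word: float = 0.5) -> List[Cue]:
    """Lay cues out back to back; duration is ASSUMED proportional to word count."""
    cues: List[Cue] = []
    clock = 0.0
    for step in steps:
        duration = len(step.narration.split()) * seconds_per_word
        cues.append(Cue(clock, clock + duration, step.template, step.narration))
        clock += duration
    return cues


# Toy stand-ins for LLM agents (assumptions, for illustration only).
def toy_critic(steps: List[Step]) -> List[str]:
    return [f"step {i} has no narration" for i, s in enumerate(steps) if not s.narration.strip()]


def toy_reviser(steps: List[Step], issues: List[str]) -> List[Step]:
    return [s if s.narration.strip() else Step("Explain this step.", s.template) for s in steps]


draft = [Step("Add two and three.", "equation"), Step("", "number_line")]
script, accepted = refine_until_accepted(draft, toy_critic, toy_reviser)
timeline = compile_script(script)
```

Because `compile_script` is a pure function of the script, the same script always yields the same timeline, which is the property that makes the output controllable and lets narration and visuals share one clock. In a real system the critic and reviser would be LLM-backed agents and the durations would come from synthesized audio rather than a word count.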