DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

Created by
  • Haebom

Authors

Jiahao Zhao, Shaoxuan Xu, Zhongxiang Sun, Fengqi Zhu, Jingyang Ou, Yuling Shi, Chongxuan Li, Xiao Zhang, Jun Xu

💡 Overview

This paper proposes DLLM-Searcher, a framework that uses diffusion large language models (dLLMs) to address the severe latency of existing search agents. Agentic SFT and Agentic VRPO strengthen the dLLM's information-seeking and reasoning abilities, and a new agent paradigm, P-ReAct, enables parallel reasoning and acting, which reduces latency. In experiments, DLLM-Searcher performs on par with existing LLM-based agents while improving inference speed by about 15%.

🔑 Implications and Limitations

• The parallel decoding ability and flexible generation paradigm of diffusion large language models (dLLMs) can substantially improve the efficiency of search agents.
• Strengthening the dLLM's agent capabilities through Agentic SFT and Agentic VRPO effectively improves its information-seeking and reasoning performance.
• The P-ReAct paradigm addresses the sequential processing bottleneck of the conventional ReAct approach and so helps reduce actual inference time.
• The proposed method may still be constrained by the underlying reasoning ability of the dLLM, and its performance in complex and diverse tool-calling scenarios needs further validation.
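
To make the latency argument concrete, the toy model below compares a strictly sequential ReAct-style round with a round in which reasoning overlaps the tool wait. It is only a sketch under stated assumptions: the summary does not describe how P-ReAct schedules its steps, so the overlap rule used here (emit the tool call first, then keep reasoning while the tool runs) is a hypothetical reading, and the `Round` type, the function names, and all timings are invented for illustration rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    """One agent round. All timings are made-up illustrative values."""
    think_s: float  # seconds spent decoding the reasoning text
    call_s: float   # seconds spent decoding the tool-call instruction
    tool_s: float   # seconds the external search tool takes to respond


def sequential_latency(rounds):
    """ReAct-style: think, then emit the call, then wait for the tool."""
    return sum(r.think_s + r.call_s + r.tool_s for r in rounds)


def overlapped_latency(rounds):
    """Assumed overlap: emit the call first, then think while the tool runs.

    Each round costs the call time plus whichever of thinking or the
    tool response takes longer, since the two proceed concurrently.
    """
    return sum(r.call_s + max(r.think_s, r.tool_s) for r in rounds)


rounds = [
    Round(think_s=2.0, call_s=0.5, tool_s=1.5),
    Round(think_s=1.0, call_s=0.5, tool_s=3.0),
]

seq = sequential_latency(rounds)
ovl = overlapped_latency(rounds)
print(f"sequential: {seq:.1f}s, overlapped: {ovl:.1f}s")
```

In this model the overlapped schedule is never slower than the sequential one, and the gain per round is the shorter of the thinking time and the tool time. The figures printed here say nothing about the speedup reported for DLLM-Searcher itself.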