REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

Posted by
  • Haebom

Authors

Buyun Liang, Jinqi Luo, Liangzu Peng, Kwan Ho Ryan Chan, Darshan Thaker, Kaleab A. Kinfu, Fengrui Tian, Hamed Hassani, Rene Vidal

💡 Overview

This paper proposes REALISTA, a realistic adversarial attack method that elicits hallucinations in large language models (LLMs). Existing discrete prompt attacks are limited by a small search space, while continuous latent-space attacks tend to produce unrealistic results. To overcome both limitations, REALISTA combines an input-dependent dictionary of valid editing directions with optimization in latent space. In experiments, REALISTA matched or outperformed existing state-of-the-art attack methods. In particular, it successfully attacked large reasoning models in the free-form response setting, where earlier realistic attacks had failed.
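To make the idea of "optimizing in latent space along a dictionary of editing directions" concrete, here is a minimal toy sketch. It is not the paper's algorithm: the summary above does not describe the actual objective, how the dictionary is built, or the optimizer, so the names `perturb`, `attack_in_span`, `score_grad`, the clipping bound `max_coeff`, and the quadratic stand-in score are all assumptions made for illustration. The only point it shows is that the search is over coefficients of a fixed set of allowed directions rather than over the full latent space.

```python
def perturb(z, directions, coeffs):
    """Return z plus a weighted sum of the dictionary's edit directions."""
    out = list(z)
    for c, d in zip(coeffs, directions):
        for i, d_i in enumerate(d):
            out[i] += c * d_i
    return out


def attack_in_span(z, directions, score_grad, steps=100, lr=0.1, max_coeff=1.0):
    """Gradient ascent over direction coefficients (hypothetical helper).

    z          : latent vector of the original input
    directions : list of allowed edit directions (the "dictionary")
    score_grad : gradient of some attack score with respect to the latent
    """
    coeffs = [0.0] * len(directions)
    for _ in range(steps):
        g = score_grad(perturb(z, directions, coeffs))
        for k, d in enumerate(directions):
            # Chain rule: d(score)/d(coeff_k) is the dot product <g, direction_k>.
            step = lr * sum(g_i * d_i for g_i, d_i in zip(g, d))
            # Clip so each edit stays small (an illustrative constraint).
            coeffs[k] = max(-max_coeff, min(max_coeff, coeffs[k] + step))
    return coeffs, perturb(z, directions, coeffs)


# Toy setup: 3-D latent, two allowed directions, and a stand-in score
# -||z - target||^2 whose gradient is -2 * (z - target).
target = [0.5, -0.5, 2.0]
toy_grad = lambda z: [-2.0 * (z_i - t_i) for z_i, t_i in zip(z, target)]
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
coeffs, z_adv = attack_in_span([0.0, 0.0, 0.0], dictionary, toy_grad)
```

In this toy run the first two coordinates of `z_adv` move toward the target, while the third stays at zero because no direction in the dictionary touches it. That restriction is the property the sketch is meant to illustrate: the perturbation can only move along directions that were declared valid.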

🔑 Implications and Limitations

• Presents a realistic and effective attack method for eliciting LLM hallucinations, contributing to the evaluation of LLM reliability.
• Addresses the shortcomings of existing attack methods with a new attack framework that allows diverse exploration while preserving semantic coherence.
• The method as presented so far may be biased toward specific LLM architectures or datasets, and further research on optimization techniques is needed to raise the attack success rate.