Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

$\mu$KE: Matryoshka Unstructured Knowledge Editing of Large Language Models

Created by
  • Haebom

Author

Zian Su, Ziyang Huang, Kaiyuan Zhang, Xiangyu Zhang

Outline

This paper addresses a key limitation of large language models (LLMs): because their knowledge is frozen in static training data, they are prone to hallucinations and security risks. The locate-and-edit paradigm, which directly modifies a model's internal knowledge, has proven a cost-effective alternative to retraining, but current unstructured approaches, particularly window-based autoregressive methods, often break the causal dependencies between early memory updates and subsequent output tokens. This study theoretically analyzes that limitation and presents Matryoshka Unstructured Knowledge Editing ($\mu$KE), a novel memory-update mechanism that preserves these dependencies through a Matryoshka-style objective and adaptive loss coefficients. Evaluations of two models across four benchmarks show that $\mu$KE improves edit efficacy by up to 12.33% over state-of-the-art methods and remains robust across diverse editing formats, highlighting its potential for effective unstructured knowledge editing in LLMs.
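
To make the Matryoshka-style objective more concrete, below is a minimal sketch of the nested-prefix idea, not the paper's actual implementation: the function name, tensor shapes, and the reciprocal form of the adaptive coefficient are all illustrative assumptions. The point is that every prefix of the target span contributes a loss term, so the memory update is trained to stay consistent with all subsequent output tokens rather than with a single fixed window.

```python
import torch
import torch.nn.functional as F

def matryoshka_edit_loss(logits, target_ids, alpha=1.0):
    """Hypothetical sketch of a Matryoshka-style editing objective.

    Instead of scoring only the final window, every nested prefix of the
    target span contributes a loss term, so the memory update stays
    causally consistent with all subsequent output tokens.

    logits:     (T, V) model logits over the target span after the edit
    target_ids: (T,)   token ids of the desired unstructured output
    alpha:      scale for the (illustrative) adaptive coefficients
    """
    T = target_ids.shape[0]
    # Per-token negative log-likelihood over the whole target span.
    per_token_nll = F.cross_entropy(logits, target_ids, reduction="none")  # (T,)

    total = logits.new_zeros(())
    for t in range(1, T + 1):
        # Loss of the nested prefix covering the first t target tokens.
        prefix_loss = per_token_nll[:t].mean()
        # Adaptive coefficient (placeholder): down-weight prefixes that
        # are already well fit so later tokens keep receiving gradient.
        coeff = alpha / (1.0 + prefix_loss.detach())
        total = total + coeff * prefix_loss
    return total / T
```

In the paper, the adaptive loss coefficients follow from its theoretical analysis of the window-based failure mode; the reciprocal down-weighting above is only a placeholder showing where such a coefficient enters the objective.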

Takeaways, Limitations

Takeaways:
  • Matryoshka Unstructured Knowledge Editing ($\mu$KE) achieves up to 12.33% higher edit efficacy than existing unstructured knowledge editing methods.
  • It remains robust across diverse editing formats.
  • It offers an effective approach to editing unstructured knowledge in LLMs.
  • Its memory-update mechanism preserves the causal dependencies between initial memory updates and subsequent output tokens.
Limitations:
  • Further research is needed to establish the generality and scalability of the proposed method.
  • $\mu$KE needs further evaluation across diverse LLM architectures and model sizes.
  • Its applicability and safety in real-world settings require further validation.