Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Dual-Route Model of Induction

Created by
  • Haebom

Authors

Sheridan Feucht, Eric Todd, Byron Wallace, David Bau

Outline

In this paper, we show that, in addition to the well-known token-level induction head that copies text token by token, language models contain a concept-level induction head that copies entire lexical units. The concept-level induction head learns to attend to the last tokens of multi-token words and copies meaningful text in parallel with the token-level induction head. We show that the concept-level induction head is responsible for semantic tasks such as word-level translation, while the token-level induction head is essential for tasks that require verbatim copying, such as copying nonsense tokens. The two routes operate independently: ablating the token-level induction head causes the model to paraphrase instead of copying literally. By patching and analyzing the output of the concept-level induction head, we find that it contains word representations that are independent of language and surface form, suggesting that large language models represent abstract word meanings independently of language and form.
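As a rough illustration of the ablation experiment described above, the sketch below zero-ablates a hand-picked set of attention heads in GPT-2 using standard PyTorch forward pre-hooks, then checks how the model continues a repeated prompt. This is not the authors' code: the model choice, the (layer, head) indices, and the prompt are placeholder assumptions; in the paper, the token-level induction heads are identified empirically before being ablated.

```python
# Minimal sketch (assumed setup, not the paper's code): zero-ablate a chosen
# set of attention heads in GPT-2 and observe how copying of a repeated
# prompt changes.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "gpt2"  # assumption; the paper studies other LLMs as well
tok = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

# Hypothetical (layer, head) pairs standing in for token-level induction heads.
HEADS_TO_ABLATE = {(5, 1), (5, 5), (6, 9)}

n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

def make_ablation_hook(layer_idx):
    # The input to GPT2Attention's c_proj is the concatenation of per-head
    # outputs, so zeroing one head's slice before the output projection
    # removes that head's contribution to the residual stream.
    def hook(module, inputs):
        (hidden,) = inputs  # shape: (batch, seq, n_embd)
        hidden = hidden.clone()
        for (l, h) in HEADS_TO_ABLATE:
            if l == layer_idx:
                hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,)
    return hook

handles = [
    block.attn.c_proj.register_forward_pre_hook(make_ablation_hook(i))
    for i, block in enumerate(model.transformer.h)
]

# A repeated prompt where induction heads normally drive verbatim copying.
prompt = "Colorless green ideas sleep furiously. Colorless green ideas"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=5, do_sample=False,
                         pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))

for h in handles:
    h.remove()
```

Under the paper's actual ablation of token-level induction heads, the reported effect is that the model tends to paraphrase the repeated span rather than reproduce it verbatim, while concept-level copying remains.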

Takeaways, Limitations

Takeaways:
We reveal that large-scale language models process and replicate information not only at the token level but also at the concept level.
We demonstrate that the concept-level induction head plays an important role in semantic tasks, especially word-level translation.
We present evidence that large-scale language models represent abstract word meanings independent of language and form.
We describe the mechanism by which the two copying routes, token-level and concept-level, operate independently and in parallel.
Limitations:
A detailed account of the working mechanism of the concept-level induction head may be lacking.
The presented results may be limited to the specific models or datasets studied.
Further research is needed on generalizability to various types of large-scale language models.