
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

StackTrans: From Large Language Model to Large Pushdown Automata Model

Created by
  • Haebom

Author

Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Yihong Dong, Jia Li, Jingjing Xu, Zhi Jin

Outline

In this paper, the authors propose StackTrans to address a key limitation of the Transformer architecture: its inability to effectively capture the Chomsky hierarchy (e.g., regular expressions or deterministic context-free grammars). Inspired by pushdown automata, StackTrans explicitly integrates hidden state stacks between Transformer layers. The stack operations (push and pop) are differentiable, end-to-end trainable, and compatible with existing frameworks such as flash-attention. StackTrans outperforms existing Transformer models and other baselines on Chomsky hierarchy tasks and large-scale natural language benchmarks, and scales from 360 million to 7 billion parameters. In particular, StackTrans-360M outperforms several open-source LLMs with 2–3 times more parameters, demonstrating its efficiency and inference capability.
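The summary describes differentiable push and pop operations on hidden state stacks inserted between Transformer layers. Below is a minimal sketch, assuming a PyTorch implementation, of how such a soft stack could be wired in; the class name SoftStackLayer, the three-way push/pop/no-op gating, and the specific projections are illustrative assumptions, not the authors' actual implementation.

```python
# A minimal sketch (not the paper's implementation) of a differentiable stack
# interleaved between Transformer layers. Push/pop are soft, probability-weighted
# mixtures of stack states, so the whole layer stays end-to-end trainable.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftStackLayer(nn.Module):
    """Maintains a soft stack of hidden states and mixes the stack top back
    into each token representation."""

    def __init__(self, d_model: int, stack_depth: int = 16):
        super().__init__()
        self.stack_depth = stack_depth
        self.action_head = nn.Linear(d_model, 3)        # logits for (push, pop, no-op) -- assumed gating
        self.value_proj = nn.Linear(d_model, d_model)   # what gets pushed onto the stack
        self.out_proj = nn.Linear(d_model, d_model)     # how the stack top re-enters the hidden state

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        batch, seq_len, d_model = hidden.shape
        stack = hidden.new_zeros(batch, self.stack_depth, d_model)
        outputs = []
        for t in range(seq_len):
            h_t = hidden[:, t]                                           # (batch, d_model)
            p_push, p_pop, p_noop = F.softmax(self.action_head(h_t), dim=-1).unbind(-1)
            v_t = self.value_proj(h_t)

            # Soft push: shift the stack down one slot and place v_t on top.
            pushed = torch.cat([v_t.unsqueeze(1), stack[:, :-1]], dim=1)
            # Soft pop: shift the stack up one slot (the old top is discarded).
            popped = torch.cat([stack[:, 1:], stack.new_zeros(batch, 1, d_model)], dim=1)

            # Probability-weighted mixture keeps the update differentiable.
            stack = (p_push[:, None, None] * pushed
                     + p_pop[:, None, None] * popped
                     + p_noop[:, None, None] * stack)

            # Read the soft stack top and fold it back into the token state.
            outputs.append(h_t + self.out_proj(stack[:, 0]))
        return torch.stack(outputs, dim=1)


# Usage: interleave with standard Transformer layers.
if __name__ == "__main__":
    encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    stack_layer = SoftStackLayer(d_model=64)
    x = torch.randn(2, 10, 64)
    y = stack_layer(encoder_layer(x))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Because every stack update is a probability-weighted mixture of configurations, gradients flow through the push/pop decisions, which is what makes the mechanism end-to-end trainable as the summary describes.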

Takeaways, Limitations

Takeaways:
Presents a novel approach to improving the Transformer architecture's ability to handle the Chomsky hierarchy.
Demonstrates the efficiency and performance gains of the StackTrans architecture, which incorporates hidden state stacks.
Achieves performance that surpasses larger models while using far fewer parameters.
Maintains compatibility with existing frameworks such as flash-attention.
Limitations:
Further research is needed on the generalization ability of the proposed StackTrans architecture.
The performance of StackTrans on more complex grammatical structures still needs to be evaluated.
Further research is needed on optimizing stack size and stack operations.