
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Change of Thought: Adaptive Test-Time Computation

Created by
  • Haebom

Author

Mrinal Mathur, Mike Doan, Barak Pearlmutter, Sergey Plis

Outline

In this paper, we note that a Transformer evaluated at a fixed depth is limited in expressive power to the TC⁰ circuit class, and we propose a way to increase the expressive power of encoder Transformers without resorting to autoregression. Whereas existing autoregressive approaches (next-token prediction, chain-of-thought reasoning) rely on a feedback loop that decodes intermediate states into tokens and re-encodes them, the proposed SELF-Transformer iteratively refines the attention weights inside the encoder layer toward a fixed point, adjusting test-time computation to the difficulty of the input. Instead of producing the alignment matrix that mixes the input sequence in a single pass, the layer updates this alignment matrix internally over multiple iterations. As a result, we achieve up to 20% accuracy improvement on encoder-style benchmarks without increasing the number of parameters, showing that input-adaptive alignment yields substantial test-time benefits at modest additional computational cost. The SELF-Transformer thus recovers much of the expressive power of recurrent reasoning while preserving the simplicity of a pure encoder architecture.
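
To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the fixed-point refinement idea: the alignment (attention) matrix is updated iteratively inside one encoder layer until it approximately converges, so harder inputs consume more refinement steps. The class and parameter names (SelfRefiningAttention, max_iters, tol) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumed, not the authors' code) of iterating an attention
# alignment matrix to a fixed point inside a single encoder layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfRefiningAttention(nn.Module):
    """Single-head self-attention whose alignment matrix is refined
    iteratively instead of being computed in one pass."""

    def __init__(self, d_model: int, max_iters: int = 8, tol: float = 1e-3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.max_iters = max_iters   # upper bound on test-time refinement steps
        self.tol = tol               # convergence threshold on the alignment matrix
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        align = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)

        for _ in range(self.max_iters):
            # Mix values with the current alignment, re-derive queries from the
            # mixed representation, and update the alignment matrix in place.
            mixed = align @ v
            q_new = self.q(mixed)
            align_new = F.softmax(q_new @ k.transpose(-2, -1) * self.scale, dim=-1)

            # Stop early once the alignment matrix has (approximately) converged;
            # harder inputs therefore use more refinement iterations.
            if (align_new - align).abs().max() < self.tol:
                align = align_new
                break
            align = align_new

        return self.out(align @ v)


# Usage: the number of iterations actually executed varies with input difficulty.
layer = SelfRefiningAttention(d_model=64)
tokens = torch.randn(2, 10, 64)
print(layer(tokens).shape)  # torch.Size([2, 10, 64])
```

Note that the refinement loop reuses the layer's existing projections, which is consistent with the summary's claim that the gains come without adding parameters.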

Takeaways, Limitations

Takeaways:
We present a novel method to overcome the expressive limitations of fixed-depth Transformers.
The expressive power of encoder Transformers is improved without relying on autoregressive approaches.
Input-adaptive alignment at test time improves accuracy at only a small additional computational cost.
Performance improvements are achieved without increasing the number of parameters.
The benefits of recurrent reasoning are recovered while maintaining the simplicity of a pure encoder architecture.
Limitations:
Additional experiments are needed to determine whether the SELF-Transformer is equally effective across all encoder benchmarks.
The growth in test-time computation with input difficulty requires quantitative analysis and further study of optimization methods.
Further research is needed on generalizability to other architectures and tasks.