This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation
Created by
Haebom
Author
Ziniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang, Jennifer Dy, Hongyang R. Zhang
Outline
This paper presents an algorithm for selecting demonstration examples for a query set in in-context learning. The problem is how to choose k examples from a pool of n to serve as the conditioning context for downstream inference. Unlike existing methods based on token-embedding similarity, the paper proposes a novel approach that uses the gradient of the model's outputs in the input embedding space. Model outputs are estimated via a first-order approximation using these gradients, and the estimate is applied to multiple randomly sampled subsets. An influence score is computed for each demonstration, and the k most relevant examples are selected. Because the model outputs and gradients need to be computed only once, the algorithm runs in time linear in the model and training-set sizes. Extensive experiments across various models and datasets demonstrate its efficiency: the gradient-estimation procedure approximates full inference with less than 1% error on six datasets, enabling subset selection up to 37.7x faster than existing methods and yielding an average 11% improvement over input-embedding-based selection.
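The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-example gradient contributions have already been collapsed to precomputed scalars (`grads`), uses a scalar base output, and uses random data in place of a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed quantities (names and shapes are assumptions):
n = 20                                # n candidate demonstrations
k = 3                                 # number of demonstrations to select
base_out = 0.5                        # model output with no demonstrations
grads = rng.normal(size=n) * 0.1      # precomputed per-example gradient terms

def estimate_output(subset):
    """First-order estimate: base output plus summed gradient contributions.

    Stands in for running full inference on the subset; the real method
    computes outputs and gradients once and reuses them here.
    """
    return base_out + grads[subset].sum()

# Sample random subsets and average each example's influence on the estimate.
num_subsets, subset_size = 200, 5
scores = np.zeros(n)
counts = np.zeros(n)
for _ in range(num_subsets):
    S = rng.choice(n, size=subset_size, replace=False)
    est = estimate_output(S)
    scores[S] += est                  # credit the estimate to each member of S
    counts[S] += 1

influence = scores / np.maximum(counts, 1)
selected = np.argsort(-influence)[:k]  # k most influential demonstrations
```

The key property is that `estimate_output` never calls the model: once the base output and gradients are cached, scoring any number of subsets costs only vector sums, which is what makes the overall selection linear in model and dataset size.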
Takeaways, Limitations
•
Takeaways:
◦
Presents a demonstration selection algorithm that is both more efficient and better-performing than input-embedding-based methods.
◦
Provides an accurate approximation to full inference through gradient-based estimation.
◦
Selects demonstrations quickly and efficiently, even for large-scale models.
◦
Applicable to a range of settings, such as prompt tuning and chain-of-thought reasoning.
•
Limitations:
◦
Because gradient-based estimation relies on first-order approximations, errors may increase in complex models or datasets.
◦
The algorithm's efficiency relies on precomputing the model outputs and gradients, which can itself require significant computational resources.
◦
Hyperparameter tuning may be required to adapt the method to specific models and datasets.