Daily Arxiv

This page curates AI-related papers from around the world.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Stochastic Parameter Decomposition

Created by
  • Haebom

Authors

Lucius Bushnaq, Dan Braun, Lee Sharkey

Outline

This paper addresses a core step in reverse engineering neural networks: decomposing them into simpler component parts. It builds on linear parameter decomposition, a framework that addresses limitations of earlier decomposition methods by decomposing neural network parameters into a sum of sparsely used vectors in parameter space. However, the leading method in this framework, Attribution-based Parameter Decomposition (APD), is impractical due to its computational cost and sensitivity to hyperparameters. The paper introduces Stochastic Parameter Decomposition (SPD), a method that is more scalable and more robust to hyperparameters than APD. SPD can decompose larger and more complex models than APD, avoids issues such as shrinkage of learned parameters, and identifies ground-truth mechanisms in toy models more accurately. By bridging causal mediation analysis and network decomposition methods, the work removes a key barrier to scaling linear parameter decomposition to large models and opens up new research avenues for mechanistic interpretability. A library for running SPD and reproducing the experiments has been released ( https://github.com/goodfire-ai/spd/tree/spd-paper ).
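To make the core idea concrete, below is a minimal, hypothetical sketch of decomposing one weight matrix into rank-one subcomponents whose contributions are stochastically masked during training. This is not the authors' implementation or the released spd library's API: the component count, the straight-through masking trick, and the exact loss terms here are illustrative assumptions, chosen only to show the flavor of faithfulness, robustness-to-ablation, and sparsity objectives described in the paper.

import torch

# Hypothetical sketch, not the spd library's API. Approximate a weight
# matrix W as a sum of rank-one subcomponents U[:, c] @ V[c, :], only a
# few of which should matter for reconstructing W.
torch.manual_seed(0)

d_in, d_out, n_components = 16, 16, 8
W_target = torch.randn(d_out, d_in)  # the pretrained weights to decompose

# Learnable rank-one factors: W ≈ sum_c U[:, c] V[c, :]
U = torch.nn.Parameter(torch.randn(d_out, n_components) * 0.1)
V = torch.nn.Parameter(torch.randn(n_components, d_in) * 0.1)
# Per-component importance logits (in SPD these are input-dependent;
# constants here keep the sketch minimal).
gate_logits = torch.nn.Parameter(torch.zeros(n_components))

opt = torch.optim.Adam([U, V, gate_logits], lr=1e-2)

for step in range(2000):
    # Sample a stochastic mask: each subcomponent survives with a
    # probability tied to its learned importance. The straight-through
    # trick lets gradients reach the probabilities; SPD's exact sampling
    # scheme differs.
    p = torch.sigmoid(gate_logits)
    mask = p + (torch.bernoulli(p) - p).detach()

    W_masked = (U * mask) @ V  # reconstruction with some components ablated
    W_full = U @ V             # reconstruction with all components

    # Faithfulness: the full sum should match the original parameters.
    # Robustness: ablating low-importance components should not change W.
    # Sparsity: pressure on importances keeps the decomposition sparse.
    loss = ((W_full - W_target) ** 2).mean() \
         + ((W_masked - W_target) ** 2).mean() \
         + 1e-3 * p.sum()

    opt.zero_grad()
    loss.backward()
    opt.step()

print("reconstruction error:", ((U @ V - W_target) ** 2).mean().item())

In this toy setup, components whose ablation never hurts reconstruction are driven toward zero importance by the sparsity term, while the rest converge to a faithful rank-one decomposition; this is the intuition, under the stated assumptions, behind replacing APD's attribution-based top-k selection with stochastic masking.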

Takeaways, Limitations

Takeaways:
Presents the SPD algorithm, which is more scalable and more robust to hyperparameters than APD.
Avoids the learned-parameter shrinkage problem that affects APD.
Identifies ground-truth mechanisms in toy models more accurately.
Extends mechanistic interpretability research by connecting causal mediation analysis with network decomposition methods.
Releases an open-source library for running SPD and reproducing the experiments.
Limitations:
SPD has so far been evaluated only on toy models, so its generalizability to real-world large-scale models requires further study.
Further verification of its performance and efficiency on real, complex neural networks is needed.