Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning

Created by
  • Haebom

Authors

Sanskar Pandey, Ruhaan Chopra, Saad Murtaza Bhat, Ark Abhyudaya

Outline

Hecto is a lightweight Mixture-of-Experts (MoE) architecture proposed to overcome two limitations of existing MoE models: a single shared inductive bias and static computation paths. It exploits architectural heterogeneity by combining a GRU expert (temporal reasoning) with an FFNN expert (static abstraction) under a Top-1 gating mechanism. Evaluated on three reasoning benchmarks (AG News, SST-2, HotpotQA) and a regression task (STS-B), Hecto performs on par with, or within a small margin of, homogeneous baselines despite its decoupled input representations. Each expert is shown to specialize in a distinct type of reasoning, temporal versus static. At large batch sizes, computational constraints are relaxed, allowing the heterogeneous architecture to optimize more effectively and yielding improved performance. The experiments indicate that Hecto's stability and interpretability across diverse reasoning tasks stem from its architectural diversity. In conclusion, Hecto is positioned as a new benchmark for conditional computation, providing a principled framework for specialized reasoning in resource-constrained environments.
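To make the routing concrete, below is a minimal PyTorch sketch of a Hecto-style layer. The class name, the mean-pooling used for the gate, and all hyperparameters are assumptions of this sketch, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HectoStyleMoE(nn.Module):
    """Heterogeneous two-expert MoE: a GRU expert for temporal
    reasoning and an FFNN expert for static abstraction, routed by
    a Top-1 gate. Illustrative sketch, not the authors' code."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gru = nn.GRU(d_model, d_hidden, batch_first=True)  # temporal expert
        self.ffnn = nn.Sequential(                               # static expert
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_hidden),
        )
        self.gate = nn.Linear(d_model, 2)  # one score per expert

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); gate on the mean-pooled sequence
        pooled = x.mean(dim=1)
        probs = F.softmax(self.gate(pooled), dim=-1)  # (batch, 2)
        top1 = probs.argmax(dim=-1)                   # hard Top-1 routing

        # For clarity both experts run here; a truly sparse implementation
        # would dispatch each example only to its selected expert.
        _, h_n = self.gru(x)                          # h_n: (1, batch, d_hidden)
        gru_out = h_n.squeeze(0)
        ffnn_out = self.ffnn(pooled)

        # Pick one expert per example and scale by its gate probability
        # so the gate stays trainable despite the hard argmax.
        expert_out = torch.where(top1.unsqueeze(-1) == 0, gru_out, ffnn_out)
        return expert_out * probs.gather(1, top1.unsqueeze(-1))

Usage: layer = HectoStyleMoE(d_model=128, d_hidden=256) followed by out = layer(torch.randn(4, 16, 128)) yields a (4, 256) tensor, one expert's output per example. Scaling the selected output by its gate probability is one common way to keep Top-1 routing trainable; the paper's actual training details may differ.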

Takeaways, Limitations

Takeaways:
Overcomes the limitations of existing MoE models, namely a shared inductive bias and static computation paths, and performs effectively across diverse reasoning tasks.
Handles temporal reasoning and static abstraction effectively through the specialization of the GRU and FFNN experts, while improving model interpretability.
Demonstrates performance gains at large batch sizes, showing the efficiency of the heterogeneous architecture.
Presents a novel approach to specialized reasoning in resource-constrained environments.
Limitations:
Potential information loss due to the Top-1 gating mechanism.
Scalability to more complex reasoning tasks, for example by adding further expert types, requires further research.
Generalization to tasks beyond the presented benchmarks remains to be verified.