Daily Arxiv

This page collects and organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Making Language Model a Hierarchical Classifier

Created by
  • Haebom

Authors

Yihong Wang, Zhonglin Jiang, Ningyuan Xi, Yue Zhao, Qingqing Gu, Xiyuan Chen, Hao Wu, Sheng Xu, Hange Zhou, Yong Chen, Luo Ji

Outline

Decoder-only language models such as GPT and LLaMA typically decode only at the last layer. Inspired by hierarchical human reasoning, this study proposes a hierarchical decoder architecture that decodes text at several layers simultaneously. To adapt a pretrained language model to this configuration, the authors copy the language head from the last layer to selected intermediate layers and fine-tune each head with a different task input. Experiments demonstrate that the selected intermediate layers can generate meaningful and coherent content, and that this hierarchical decoding paradigm achieves state-of-the-art performance on multiple tasks, including hierarchical text classification, classification-guided generation, and hierarchical text generation. The resulting model, HdLM, outperforms all baselines on WoS, DBpedia, ESConv, EmpatheticDialogues, and several cognitive tests. The authors also provide a thorough theoretical analysis of the method's convergence and computational savings. Overall, the study suggests the potential of a generalized hierarchical reasoner trained from scratch.
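Below is a minimal sketch of the adaptation step described above: copying the pretrained language head onto a chosen intermediate layer so that layer can decode on its own. This is not the authors' released implementation; the base model (`gpt2`), the split-layer index, and the function names are illustrative assumptions.

```python
# Sketch: give an intermediate transformer layer its own copy of the LM head,
# so a coarse output can be decoded mid-network and a fine output at the top.
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in for a GPT/LLaMA-style decoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

# Copy the last-layer language head to serve a selected intermediate layer.
# The split point is a hyperparameter; layer 6 of GPT-2's 12 is an assumption.
split_layer = 6
intermediate_head = copy.deepcopy(model.lm_head)

@torch.no_grad()
def hierarchical_logits(text: str):
    """Return (coarse_logits, fine_logits) for the last token position."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    outputs = model(input_ids)
    # hidden_states[0] holds the embeddings, so layer k sits at index k.
    mid_hidden = outputs.hidden_states[split_layer][:, -1, :]
    # Reusing the final layer norm here is a design choice, not from the paper.
    mid_hidden = model.transformer.ln_f(mid_hidden)
    coarse_logits = intermediate_head(mid_hidden)   # decoded mid-network
    fine_logits = outputs.logits[:, -1, :]          # usual last-layer decoding
    return coarse_logits, fine_logits

coarse, fine = hierarchical_logits("This paper is about")
print(coarse.shape, fine.shape)  # both (1, vocab_size)
```

In the paper's setup, each head would then be fine-tuned on its own task input (for example, a coarse category for the intermediate head and a fine category for the final head). Plausibly, once the coarse decision is read off at the split layer, the upper layers' computation can be skipped for that pass, which would account for the computational savings the analysis discusses.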

Takeaways, Limitations

Takeaways:
Language model performance can be improved through a hierarchical decoder architecture.
Pre-trained models can be adapted efficiently by attaching decoding heads to selected intermediate layers.
The method achieves SOTA performance across a variety of tasks and is especially effective on hierarchical ones.
The methodology's convergence and computational savings are supported by theoretical analysis.
It suggests the possibility of a generalized hierarchical reasoner trained from scratch.
Limitations:
Due to time and computational resource constraints, the study adapts pre-trained models rather than training a hierarchical model from scratch.
Further research is needed on how to choose which intermediate layers to use and how to fine-tune them.
Further evaluation of the model's generalization ability is needed.