Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding

Created by
  • Haebom

Authors

Ziyin Zhang, Hang Yu, Shijie Li, Peng Di, Jianguo Li, Rui Wang

Outline

This paper presents GALLa (Graph-Aligned Large Language Models), a framework that represents structural information about code (e.g., data flow) as graphs and supplies it to code LLMs, which otherwise learn only from text tokens. Prior approaches that exploit code structure require modifications to the Transformer architecture and therefore scale poorly; GALLa instead uses a graph neural network (GNN) and cross-modal alignment to inject structural information through an auxiliary task during finetuning. The framework is model- and task-agnostic, so it applies to a wide range of code LLMs and downstream tasks, and it requires graph data only at training time, adding no overhead at inference. Experiments on five code tasks with seven LLMs ranging from 350M to 14B parameters show that GALLa consistently outperforms the baselines, even for strong models such as LLaMA3 and Qwen2.5-Coder.
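The core idea of feeding graph structure to a text-only LLM can be sketched as follows. This is a minimal, hypothetical illustration, not GALLa's actual implementation: it assumes a toy data-flow graph, one round of mean-aggregation message passing as the GNN, and a linear adapter that projects node embeddings into the LLM's token-embedding space so they can be prepended to the token sequence as "graph tokens". All dimensions, matrices, and the adapter design here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data-flow graph for a code snippet: nodes are variables/operations,
# directed edges carry data flow. (Illustrative only; GALLa's real graphs
# come from program analysis of the source code.)
num_nodes, node_dim, llm_dim = 4, 8, 16
adj = np.array([[0, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
x = rng.normal(size=(num_nodes, node_dim))      # initial node features

# One round of mean-aggregation message passing (a minimal GNN layer).
W = rng.normal(size=(node_dim, node_dim)) * 0.1
deg = adj.sum(axis=1, keepdims=True) + 1e-9     # avoid division by zero
h = np.tanh(((adj @ x) / deg + x) @ W)          # node embeddings

# Adapter: project node embeddings into the LLM's embedding space so they
# can be prepended to the token sequence as "graph tokens". In training,
# an auxiliary alignment loss would tie these to the code's text tokens.
P = rng.normal(size=(node_dim, llm_dim)) * 0.1
graph_tokens = h @ P                            # shape (num_nodes, llm_dim)

token_embeds = rng.normal(size=(6, llm_dim))    # embeddings of 6 code tokens
llm_input = np.concatenate([graph_tokens, token_embeds], axis=0)
print(llm_input.shape)                          # (10, 16)
```

Because the graph tokens enter only as extra input embeddings during training, the base LLM architecture is untouched, which is what makes the approach model-agnostic and free of inference-time overhead.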

Takeaways, Limitations

Takeaways:
Presents a novel framework that effectively exploits the structural information in code to improve LLM performance.
A model- and task-agnostic approach, applicable to a wide range of LLMs and tasks.
Uses structural information only during training, with no additional cost at inference.
Shows consistent performance improvements across LLMs of various sizes.
Limitations:
The reported results may be limited to the specific datasets and tasks evaluated.
Further study is needed on generalization across programming languages and coding styles.
Evaluation on more complex code structures and on specialized domains remains open.
Generating the structural graphs incurs its own cost and efficiency overhead.