Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation

Created by
  • Haebom

Author

Mengting Pan, Fan Li, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin

Outline

This paper proposes HiTeC, a novel framework for self-supervised learning on text-attributed hypergraphs (TAHGs). The authors highlight the limitations of existing contrastive learning methods, which fail to effectively exploit the textual information in TAHGs, suffer from noise introduced by random data augmentation, and struggle to capture long-range dependencies. HiTeC first pretrains a text encoder with a structure-aware contrastive objective, and then performs hypergraph contrastive learning with semantic-aware augmentation strategies such as prompt-enhanced text augmentation and semantic-aware hyperedge deletion. In addition, a multi-scale contrastive loss with an s-walk-based subgraph-level contrast term better captures long-range dependencies. This two-stage design decouples text encoder pretraining from hypergraph contrastive learning, improving scalability while maintaining representation quality. Extensive experiments demonstrate the effectiveness of HiTeC.
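To make the multi-scale objective more concrete, below is a minimal PyTorch-style sketch of how a node-level InfoNCE term could be combined with a subgraph-level term computed on s-walk-induced subgraph embeddings. The function names, the assumption that subgraph embeddings are already pooled, and the weighting parameter `alpha` are illustrative assumptions for this summary, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Standard InfoNCE between two views; row i of z1 and z2 are paired positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (N, N) cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def multi_scale_contrastive_loss(node_z1, node_z2, sub_z1, sub_z2, alpha=0.5):
    """
    Illustrative multi-scale objective: a node-level contrast plus a
    subgraph-level contrast. In HiTeC the subgraph embeddings would be
    pooled from s-walk-based subgraphs; here they are simply given tensors.
    """
    node_loss = info_nce(node_z1, node_z2)      # node-level view-vs-view contrast
    sub_loss = info_nce(sub_z1, sub_z2)         # subgraph-level contrast (long-range signal)
    return alpha * node_loss + (1.0 - alpha) * sub_loss

# Toy usage: 8 nodes, 3 subgraphs, 16-dim embeddings from two augmented views.
if __name__ == "__main__":
    n1, n2 = torch.randn(8, 16), torch.randn(8, 16)
    s1, s2 = torch.randn(3, 16), torch.randn(3, 16)
    print(multi_scale_contrastive_loss(n1, n2, s1, s2).item())
```

In this sketch the two scales are simply weighted by a hypothetical `alpha`; the actual paper may balance or schedule the terms differently.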

Takeaways, Limitations

Takeaways:
HiTeC is a novel, efficient, and scalable framework for self-supervised learning on text-attributed hypergraphs.
It overcomes the limitations of existing methods through structure-aware contrastive learning and semantic-aware augmentation strategies.
The multi-scale contrastive loss effectively captures long-range dependencies.
The two-stage design, which separates text encoder pretraining from hypergraph contrastive learning, improves scalability.
Limitations:
HiTeC's performance gains may be limited to certain types of TAHGs.
Further research is needed to tune the parameters of the proposed semantic-aware augmentation strategies.
The multi-scale contrastive loss can be computationally expensive.
Further validation of applicability and generalization performance on real-world large-scale datasets is needed.