
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Created by
  • Haebom

Authors

Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert Ross, Shivaram Venkataraman

Outline

Spatio-temporal graph neural networks (ST-GNNs) are effective at modeling dependencies in large-scale spatio-temporal data, but memory constraints have largely limited them to small datasets. This paper presents PyTorch Geometric Temporal Index (PGT-I), an extension of PyTorch Geometric Temporal that integrates distributed data-parallel training with two novel strategies: index-batching and distributed-index-batching. Index-batching exploits the spatio-temporal structure of the data to construct snapshots dynamically at runtime, greatly reducing memory overhead, while distributed-index-batching extends this approach across multiple GPUs for scalable processing. With these techniques, ST-GNNs can be trained on the entire PeMS dataset for the first time without graph partitioning, achieving up to an 89% reduction in peak memory usage and up to an 11.78x speedup over standard DDP training on 128 GPUs.
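To illustrate the core idea, below is a minimal sketch of index-batching as described above: rather than materializing every spatio-temporal snapshot in memory ahead of time, only the raw feature tensor is kept and each training sample is sliced out at runtime from a start index. This is not the PGT-I implementation; the class name IndexBatchedDataset and all parameters are hypothetical, and only standard PyTorch APIs are used.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class IndexBatchedDataset(Dataset):
    """Hypothetical sketch of index-batching: store one copy of the raw
    node-feature series and build each snapshot window on demand from an index,
    instead of pre-materializing all windows."""

    def __init__(self, features, window, horizon):
        # features: [num_timesteps, num_nodes, num_channels] raw series (e.g., PeMS traffic data)
        self.features = features
        self.window = window      # input window length
        self.horizon = horizon    # prediction horizon
        self.num_samples = features.shape[0] - window - horizon + 1

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Only an integer index is stored per sample; the snapshot is sliced
        # at runtime, so peak memory stays close to the size of the raw tensor.
        x = self.features[idx : idx + self.window]                                  # [window, nodes, channels]
        y = self.features[idx + self.window : idx + self.window + self.horizon, :, 0]  # [horizon, nodes]
        return x, y

# Toy usage: 1,000 timesteps, 325 sensors, 2 channels (PeMS-like shape).
data = torch.randn(1000, 325, 2)
loader = DataLoader(IndexBatchedDataset(data, window=12, horizon=12),
                    batch_size=64, shuffle=True)

# In a multi-GPU DDP job, a DistributedSampler would shard the indices
# (not the snapshots) across ranks, which is the rough idea behind
# distributed-index-batching:
# sampler = torch.utils.data.distributed.DistributedSampler(dataset)
# loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```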

Takeaways, Limitations

Takeaways:
We present a novel framework, PGT-I, that enables ST-GNN training on large-scale spatiotemporal datasets.
Improved memory efficiency and training speed through the index-batching and distributed-index-batching strategies.
Performance improvement verified through experiments using the PeMS dataset.
Limitations:
PGT-I depends on PyTorch Geometric Temporal, and its compatibility with other frameworks is uncertain.
The effectiveness of the presented method may be limited to the PeMS dataset, and its generalizability to other types of spatiotemporal datasets requires further study.
The approach depends heavily on a multi-GPU distributed training environment.