Daily Arxiv

This page curates papers on artificial intelligence published worldwide.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training

Created by
  • Haebom

Author

Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang

Outline

Generative Flow Networks (GFlowNets) are effective at sampling diverse, high-reward objects, but in real-world settings where new reward queries are unavailable, they must be trained from offline datasets. Existing proxy-based training methods are vulnerable to error propagation, while existing proxy-free approaches impose coarse-grained constraints that limit exploration. To address these issues, this paper proposes Trajectory-Distilled GFlowNet (TD-GFN), a novel proxy-free training framework. TD-GFN learns dense, transition-level edge rewards from offline trajectories via inverse reinforcement learning, providing rich structural guidance for efficient exploration. Crucially, for robustness, these learned rewards guide the policy only indirectly, through DAG pruning and prioritized backward sampling of training trajectories; the final gradient update therefore depends solely on ground-truth terminal rewards from the dataset, preventing error propagation. Experiments show that TD-GFN significantly outperforms a wide range of existing baselines in both convergence speed and final sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.
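
To make the guidance pipeline concrete, here is a minimal sketch of the first two steps on a small tabular DAG. It is an illustration under assumptions, not the paper's implementation: the inverse-RL step is stood in for by a simple softmax edge-scoring model, and the names `learn_edge_rewards` and `prune_dag` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: fit one scalar reward per DAG edge so that the
# transitions demonstrated in offline trajectories are likely under a
# softmax policy over each state's outgoing edges (a MaxEnt-IRL-style
# stand-in for the paper's inverse-RL procedure), then prune edges whose
# learned reward is low.

def learn_edge_rewards(edges, trajectories, lr=0.1, epochs=200):
    # edges: list of (state, next_state); trajectories: lists of states
    out = {}  # state -> [(edge_index, next_state), ...]
    for i, (s, s2) in enumerate(edges):
        out.setdefault(s, []).append((i, s2))
    theta = np.zeros(len(edges))  # learned reward per edge
    for _ in range(epochs):
        grad = np.zeros_like(theta)
        for traj in trajectories:
            for s, s2 in zip(traj[:-1], traj[1:]):
                idxs = [i for i, _ in out[s]]
                logits = theta[idxs]
                p = np.exp(logits - logits.max())
                p /= p.sum()
                for k, i in enumerate(idxs):
                    # gradient of log-softmax likelihood of the chosen edge
                    grad[i] += (1.0 if edges[i] == (s, s2) else 0.0) - p[k]
        theta += lr * grad / max(len(trajectories), 1)
    return theta

def prune_dag(edges, theta, keep_quantile=0.5):
    # Keep only the top fraction of edges by learned reward; the surviving
    # sub-DAG concentrates exploration on dataset-supported structure.
    cutoff = np.quantile(theta, 1.0 - keep_quantile)
    return [e for e, t in zip(edges, theta) if t >= cutoff]

# Toy usage: a 4-state DAG with two paths from state 0 to state 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
trajs = [[0, 1, 3], [0, 1, 3], [0, 2, 3]]
theta = learn_edge_rewards(edges, trajs)
print(prune_dag(edges, theta))
```

In the toy run, the rarely demonstrated edge (0, 2) receives a negative learned reward and is pruned, which is the kind of structural guidance the dense edge rewards are meant to provide.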

Takeaways, Limitations

Takeaways:
Presents TD-GFN, a novel proxy-free methodology for offline GFlowNet training.
Learns transition-level edge rewards via inverse reinforcement learning, providing structural guidance for efficient exploration.
Prevents error propagation through DAG pruning and prioritized backward sampling (the backward sampling step is sketched after this list).
Achieves improved convergence speed and sample quality compared to existing baselines.
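
Below is a sketch of the prioritized backward sampling step, continuing the toy DAG above. `backward_sample` is a hypothetical name, and the uniform choice among backward parents is an assumption; the point illustrated is that terminal objects are drawn in proportion to their ground-truth dataset rewards and trajectories are reconstructed backward over the pruned DAG, so only dataset rewards enter the training signal.

```python
import numpy as np

# Hypothetical sketch of prioritized backward sampling over a pruned DAG.
# Rewards come straight from the offline dataset; the learned edge rewards
# only shaped which edges survived pruning.

def backward_sample(pruned_edges, terminal_rewards, source=0, rng=None):
    rng = rng or np.random.default_rng()
    parents, children = {}, {}
    for s, s2 in pruned_edges:
        parents.setdefault(s2, []).append(s)
        children.setdefault(s, []).append(s2)
    # Precompute states reachable from `source` so backward walks
    # cannot dead-end on a pruned-away branch.
    reach, stack = {source}, [source]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in reach:
                reach.add(c)
                stack.append(c)
    terminals = [t for t in terminal_rewards if t in reach]
    r = np.array([terminal_rewards[t] for t in terminals], dtype=float)
    # Prioritized draw: terminal x is sampled with probability r(x)/sum(r).
    x = terminals[rng.choice(len(terminals), p=r / r.sum())]
    traj = [x]
    while traj[-1] != source:
        # Uniform backward step among reachable parents (an assumption).
        traj.append(rng.choice([p for p in parents[traj[-1]] if p in reach]))
    return list(reversed(traj)), terminal_rewards[x]

# Toy usage on the pruned DAG from the sketch above.
pruned = [(0, 1), (1, 3), (2, 3)]
traj, reward = backward_sample(pruned, {3: 1.0})
print(traj, reward)  # [0, 1, 3] 1.0
```

The sampled trajectories, paired with their dataset rewards, would then feed a standard GFlowNet objective such as trajectory balance, which is what keeps learned-reward error out of the gradient update.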
Limitations:
The paper does not explicitly state its limitations.