Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Pullback Flow Matching on Data Manifolds

Created by
  • Haebom

Author

Friso de Kruiff, Erik Bekkers, Ozan Öktem, Carola-Bibiane Schönlieb, Willem Diepeveen

Outline

Pullback Flow Matching (PFM) is a novel framework for generative modeling on data manifolds. Unlike existing approaches that assume or learn restrictive closed-form manifold mappings in order to train Riemannian Flow Matching (RFM) models, PFM uses pullback geometry and isometric learning to preserve the geometry of the underlying manifold while enabling efficient generation and precise interpolation in the latent space. This approach not only yields closed-form mappings on the data manifold but also allows for a designable latent space, using assumed metrics on both the data and latent manifolds. By enhancing isometric learning with Neural ODEs and proposing a scalable training objective, the authors obtain a latent space better suited to interpolation, which improves both manifold learning and generative performance. The effectiveness of PFM is demonstrated on synthetic data, protein dynamics, and protein sequence data, including the generation of novel proteins with specific properties. The method shows potential for drug discovery and materials science, where generating novel samples with specific properties is important.
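The pullback idea in the outline can be illustrated with a toy example. The sketch below assumes a Euclidean latent space and uses a hand-picked, exactly invertible map phi (a hypothetical stand-in for the learned diffeomorphism, not the paper's network). Under these assumptions, geodesics of the pullback metric are straight lines in latent space mapped back through phi_inv, pullback distances are Euclidean distances between latent codes, and flow-matching training pairs can be formed by linear interpolation between noise and encoded data. All names (phi, phi_inv, A, latent_fm_pair) are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

# Hypothetical curvature of the toy "banana" map (not from the paper).
A = 0.5


def phi(p: Vec) -> Vec:
    """Toy diffeomorphism R^2 -> R^2, a stand-in for a learned map.

    It sends the parabola y = A * x^2 onto the horizontal line v = 0.
    """
    x, y = p
    return (x, y - A * x * x)


def phi_inv(q: Vec) -> Vec:
    """Exact inverse of phi."""
    u, v = q
    return (u, v + A * u * u)


def pullback_geodesic(p0: Vec, p1: Vec, t: float) -> Vec:
    """Point at time t on the pullback geodesic, assuming a Euclidean latent metric:
    interpolate linearly between the latent codes, then map back with phi_inv."""
    z0, z1 = phi(p0), phi(p1)
    zt = ((1 - t) * z0[0] + t * z1[0], (1 - t) * z0[1] + t * z1[1])
    return phi_inv(zt)


def pullback_distance(p0: Vec, p1: Vec) -> float:
    """Pullback distance: Euclidean distance between the latent codes."""
    return math.dist(phi(p0), phi(p1))


def latent_fm_pair(noise: Vec, data: Vec, t: float) -> Tuple[Vec, Vec]:
    """Conditional flow-matching pair in latent space: (z_t, target velocity z1 - z0).

    A velocity network (not shown) would be regressed onto the target."""
    z1 = phi(data)
    zt = ((1 - t) * noise[0] + t * z1[0], (1 - t) * noise[1] + t * z1[1])
    velocity = (z1[0] - noise[0], z1[1] - noise[1])
    return zt, velocity


if __name__ == "__main__":
    left, right = (-2.0, 2.0), (2.0, 2.0)  # both on y = 0.5 * x^2
    print(pullback_geodesic(left, right, 0.5))  # (0.0, 0.0), still on the parabola
    print(pullback_distance(left, right))
```

For the two points (-2, 2) and (2, 2) on the parabola y = 0.5 x^2, the pullback midpoint is (0, 0), which stays on the parabola, whereas the Euclidean midpoint (0, 2) leaves it. In a full model, samples generated in latent space would be decoded with phi_inv.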

Takeaways, Limitations

Takeaways:
Presents an efficient generative modeling framework that preserves the geometric structure of data manifolds.
Enables precise interpolation in the latent space.
Allows for a designable latent space.
Improves isometric learning using Neural ODEs.
Suggests potential applications in drug discovery and materials science.
Demonstrates the generation of novel proteins with specific properties.
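The Neural ODE takeaway rests on the fact that the time-1 flow of a vector field is invertible: integrating the same ODE backward undoes it. The sketch below is a minimal illustration under stated assumptions: a fixed, hand-written vector field with hypothetical weights w1 and w2 replaces a trained neural network, a fixed-step RK4 integrator replaces an adaptive ODE solver, and isometry_penalty is one generic way to measure how far the map is from preserving pairwise distances, not the training objective proposed in the paper.

```python
import math
from typing import Callable, List, Tuple

Vec = Tuple[float, float]
Field = Callable[[Vec], Vec]


def make_field(w1: float, w2: float) -> Field:
    """Hypothetical autonomous vector field f(x, y) = (tanh(w1*y), tanh(w2*x)).

    A real Neural ODE would use a trained network here."""
    def f(p: Vec) -> Vec:
        x, y = p
        return (math.tanh(w1 * y), math.tanh(w2 * x))
    return f


def rk4_flow(f: Field, p: Vec, t_end: float, steps: int = 100) -> Vec:
    """Integrate dp/dt = f(p) from t = 0 to t = t_end with classic fixed-step RK4.

    A negative t_end integrates backward, which approximately inverts the forward flow."""
    h = t_end / steps
    x, y = p
    for _ in range(steps):
        k1 = f((x, y))
        k2 = f((x + 0.5 * h * k1[0], y + 0.5 * h * k1[1]))
        k3 = f((x + 0.5 * h * k2[0], y + 0.5 * h * k2[1]))
        k4 = f((x + h * k3[0], y + h * k3[1]))
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return (x, y)


def isometry_penalty(f: Field, pairs: List[Tuple[Vec, Vec]]) -> float:
    """Mean squared mismatch between data-space and latent-space distances of
    (nearby) point pairs; zero when the time-1 flow preserves those distances."""
    total = 0.0
    for p, q in pairs:
        d_data = math.dist(p, q)
        d_latent = math.dist(rk4_flow(f, p, 1.0), rk4_flow(f, q, 1.0))
        total += (d_latent - d_data) ** 2
    return total / len(pairs)


if __name__ == "__main__":
    f = make_field(0.5, -0.3)
    p = (0.7, -1.2)
    z = rk4_flow(f, p, 1.0)        # encode: forward flow
    back = rk4_flow(f, z, -1.0)    # decode: backward flow
    print("round-trip error:", math.dist(p, back))
    pairs = [((0.0, 0.0), (0.1, 0.0)), ((1.0, -1.0), (1.0, -0.9))]
    print("isometry penalty:", isometry_penalty(f, pairs))
```

Setting both weights to zero makes the flow the identity map, so the penalty is exactly zero. A training loop (not shown) would adjust the field's parameters to reduce this penalty alongside any other objectives.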
Limitations:
Further research is needed on the generalization performance of the proposed method.
The method's applicability to, and performance on, other data types remains to be evaluated.
Scalability and computational cost on high-dimensional data need to be analyzed.
Further validation of performance and limitations in real-world applications is needed.