This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows
Created by
Haebom
Author
Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Nikita Lyubaykin, Andrei Polubarov, Alexander Derevyagin, Vladislav Kurenkov
Outline
We propose Normalizing Flows in Action (NinA), a novel approach that leverages Normalizing Flows (NFs) for fast inference in Vision-Language-Action (VLA) models. NinA replaces the conventional diffusion-based action decoder, enabling sampling in a single invertible transformation and thereby reducing inference time. Integrated into the FLOWER VLA architecture and evaluated on the LIBERO benchmark, NinA performs on par with diffusion-based decoders while achieving significantly faster inference.
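The core speed argument can be illustrated with a minimal sketch (this is not the paper's architecture): an affine-coupling normalizing flow maps Gaussian noise to an action vector in one forward pass, whereas a diffusion decoder would iterate many denoising steps. The `ACTION_DIM`, layer count, and random weight matrices below are illustrative placeholders standing in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 6  # hypothetical action dimensionality, must be even here

class AffineCoupling:
    """One coupling layer: transform half the vector conditioned on the
    other half, then swap halves so later layers transform both parts."""
    def __init__(self, dim, rng):
        half = dim // 2
        # Stand-ins for small learned scale/shift networks s(.) and t(.).
        self.Ws = 0.1 * rng.standard_normal((half, half))
        self.Wt = 0.1 * rng.standard_normal((half, half))

    def forward(self, x):
        x1, x2 = np.split(x, 2)
        s, t = np.tanh(self.Ws @ x1), self.Wt @ x1
        return np.concatenate([x2 * np.exp(s) + t, x1])

    def inverse(self, y):
        ya, yb = np.split(y, 2)
        x1 = yb
        s, t = np.tanh(self.Ws @ x1), self.Wt @ x1
        return np.concatenate([x1, (ya - t) * np.exp(-s)])

layers = [AffineCoupling(ACTION_DIM, rng) for _ in range(4)]

def flow_forward(z):
    """Single-pass sampling: one trip through the layers, no iteration,
    unlike diffusion sampling which loops over many denoising steps."""
    for layer in layers:
        z = layer.forward(z)
    return z

def flow_inverse(y):
    """Exact inverse, which is what makes maximum-likelihood training of
    normalizing flows tractable."""
    for layer in reversed(layers):
        y = layer.inverse(y)
    return y

z = rng.standard_normal(ACTION_DIM)   # sample base noise
action = flow_forward(z)              # decode an action in one pass
recovered = flow_inverse(action)      # invertibility: recovers z
```

The one-pass `flow_forward` is the source of the inference-time savings the paper reports; exact invertibility is what the flow trades for the iterative refinement a diffusion decoder performs.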
Takeaways, Limitations
•
Takeaways:
◦
NinA shows that replacing the diffusion-based decoder in VLA models can dramatically improve inference speed.
◦
It broadens the applicability of VLA models to real-world settings that require high-frequency control, without sacrificing performance.
•
Limitations:
◦
Whether NinA achieves comparable performance on other VLA architectures and benchmarks remains to be verified.
◦
NinA's generalization and its suitability for more complex tasks require further evaluation.