This paper proposes ManipBench, a novel benchmark for evaluating low-level reasoning in robotic manipulation. While Vision-Language Models (VLMs) are used primarily as high-level planners in robotic manipulation, recent work has begun to study their low-level reasoning, i.e., their ability to determine precise robot actions. ManipBench evaluates these capabilities across a range of dimensions, including object-to-object interaction and manipulation of deformable objects. Thirty-three representative VLMs from ten model families are extensively tested on the benchmark, and the analysis examines performance differences across models as well as correlations with real-world manipulation tasks, revealing a significant gap between current models and human-level understanding.
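As a rough illustration of how such a benchmark can be scored, the sketch below shows a minimal evaluation loop, assuming a multiple-choice format in which the model picks a manipulation decision (e.g., a grasp point) from candidate options. The `BenchmarkItem` fields and the `vlm_answer_fn` interface are hypothetical stand-ins, not the paper's actual API.

```python
# Minimal sketch of a ManipBench-style evaluation loop (hypothetical API).
# Assumes each benchmark item pairs a scene image with a multiple-choice
# question about a low-level manipulation decision.

from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    image_path: str     # scene image of the manipulation task
    question: str       # low-level reasoning question posed to the VLM
    choices: list[str]  # candidate actions / contact points
    answer_index: int   # index of the ground-truth choice


def evaluate(vlm_answer_fn, items: list[BenchmarkItem]) -> float:
    """Return the accuracy of a VLM over the benchmark items.

    `vlm_answer_fn(image_path, question, choices) -> int` is a stand-in
    for whatever model API is under test; it must return the index of
    the chosen option.
    """
    correct = 0
    for item in items:
        predicted = vlm_answer_fn(item.image_path, item.question, item.choices)
        correct += int(predicted == item.answer_index)
    return correct / len(items)
```

Accuracy over multiple-choice items is a natural proxy here because it makes models from different families directly comparable without requiring robot execution.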
Takeaways, Limitations
• Takeaways:
◦ A new benchmark (ManipBench) that comprehensively evaluates the low-level robotic manipulation reasoning capabilities of VLMs.
◦ A comparative analysis of diverse VLMs, including correlations between benchmark performance and real-world task performance.
◦ A clear demonstration of the gap between the current capabilities of VLMs and human-level understanding.
• Limitations:
◦ ManipBench is still an early-stage benchmark; more models and tasks will need to be added in the future.
◦ The benchmark design and evaluation metrics may require further review and refinement.
◦ More complex robotic manipulation tasks beyond the scope of the current benchmark remain to be evaluated.