Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision

Created by
  • Haebom

Authors

Safouane El Ghazouali, Umberto Michelucci

Outline

VisioFirm is an open-source web application that streamlines image labeling through AI-powered automation. It integrates state-of-the-art foundation models, such as CLIP, Ultralytics detection models, and Grounding DINO, to generate initial annotations, using a low confidence threshold to maximize recall. Users can then refine these annotations with interactive tools that support bounding boxes, oriented bounding boxes, and polygons. The tool also offers real-time segmentation using Segment Anything, accelerated by WebGPU. It supports multiple export formats, including YOLO, COCO, Pascal VOC, and CSV, and operates offline once the models have been cached. Benchmarks on several datasets show that it reduces manual effort by up to 90% while maintaining high annotation accuracy.
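
To make the pre-annotation and export steps concrete, the following is a minimal sketch, not VisioFirm's actual code. It assumes detections arrive as pixel-space corner boxes with a confidence score, filters them with a deliberately low threshold (the value 0.2 is an illustrative assumption, not taken from the paper), and converts the survivors to the commonly used box conventions of Pascal VOC, COCO, and YOLO. The names `Detection`, `keep_candidates`, `to_voc`, `to_coco`, and `to_yolo` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One pre-annotation: pixel-space corner box, class id, confidence."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    class_id: int
    score: float


def keep_candidates(dets, threshold=0.2):
    """Keep detections at or above a low threshold (assumed value).

    A low threshold favors recall: more true objects survive, at the
    cost of more false positives for the user to delete.
    """
    return [d for d in dets if d.score >= threshold]


def to_voc(d):
    """Pascal VOC convention: absolute corners (xmin, ymin, xmax, ymax)."""
    return (d.x_min, d.y_min, d.x_max, d.y_max)


def to_coco(d):
    """COCO convention: absolute [x_min, y_min, width, height]."""
    return [d.x_min, d.y_min, d.x_max - d.x_min, d.y_max - d.y_min]


def to_yolo(d, img_w, img_h):
    """YOLO convention: class id, then box center and size normalized to [0, 1]."""
    cx = (d.x_min + d.x_max) / 2 / img_w
    cy = (d.y_min + d.y_max) / 2 / img_h
    w = (d.x_max - d.x_min) / img_w
    h = (d.y_max - d.y_min) / img_h
    return f"{d.class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # Two made-up detections on a 640x480 image.
    raw = [
        Detection(100, 50, 300, 250, class_id=0, score=0.15),
        Detection(0, 0, 64, 48, class_id=1, score=0.90),
    ]
    for det in keep_candidates(raw, threshold=0.1):
        print(to_voc(det), to_coco(det), to_yolo(det, 640, 480))
```

Raising `threshold` in this sketch removes the low-scoring box, which illustrates the trade-off described above: a lower threshold leaves more candidates for the user to confirm or delete, while a higher one risks missing objects entirely.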

Takeaways, Limitations

Takeaways:
AI-based automation can significantly improve the efficiency of image labeling.
It offers high flexibility, supporting several annotation types (bounding box, oriented bounding box, polygon, segmentation) and export formats.
Offline operation improves accessibility.
In the reported benchmarks, it reduces the workload by up to 90% compared to fully manual labeling.
It is open source and can be used by anyone.
Limitations:
Reported performance is based on tests with COCO-type classes; performance on other kinds of datasets requires further validation.
If the initial predictions are inaccurate, the user may need to make significant corrections.
Annotation accuracy may drop for complex images or specialized classes.
Optimal performance is available only in browsers that support WebGPU.