VisioFirm is an open-source web application that streamlines image labeling through AI-powered automation. It integrates state-of-the-art foundation models, such as CLIP, Ultralytics models, and Grounding DINO, to generate initial annotations, using a low confidence threshold to maximize recall. Users then refine these annotations with interactive tools that support bounding boxes, oriented bounding boxes, and polygons; the application also offers real-time segmentation with Segment Anything, accelerated by WebGPU. VisioFirm exports to multiple formats, including YOLO, COCO, Pascal VOC, and CSV, and operates offline once the models are cached. Benchmarks on various datasets show that it reduces manual effort by up to 90% while maintaining high annotation accuracy.
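
To make the low-threshold pre-annotation and export steps concrete, the following is a minimal sketch and not VisioFirm's actual code. The `Detection` type, the function names, and the threshold value of 0.2 are hypothetical choices made for illustration. The only format fact relied upon is that a YOLO label line stores a class index followed by the box center, width, and height, each normalized by the image dimensions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical container for one model prediction."""
    class_id: int
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def keep_candidates(detections: List[Detection], threshold: float = 0.2) -> List[Detection]:
    """Keep every detection at or above a deliberately low threshold.

    A low threshold favors recall: more candidate boxes survive, and the
    annotator deletes false positives instead of drawing missed objects.
    The default of 0.2 is an assumed value, not one taken from the paper.
    """
    return [d for d in detections if d.score >= threshold]


def to_yolo_line(det: Detection, img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_min, y_min, x_max, y_max = det.box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{det.class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    preds = [
        Detection(0, 0.25, (100, 50, 300, 250)),
        Detection(1, 0.10, (10, 10, 20, 20)),  # dropped: below the threshold
    ]
    for det in keep_candidates(preds):
        print(to_yolo_line(det, img_w=400, img_h=500))
```

In this sketch, raising `threshold` trades recall for fewer boxes to review, which is the balance the low-threshold strategy described above is meant to tilt toward recall.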