Accurate and generalizable object segmentation in ultrasound images remains challenging due to anatomical variability, diverse imaging protocols, and limited annotated data. To address this challenge, we propose a prompt-based vision-language model (VLM) that integrates Grounding DINO with SAM2. Using 18 publicly available ultrasound datasets covering the breast, thyroid, liver, prostate, kidney, and paraspinal muscles, Grounding DINO is adapted to the ultrasound domain with Low-Rank Adaptation (LoRA) and fine-tuned and validated on 15 of these datasets; the remaining three datasets are held out to evaluate performance on unseen distributions. Experimental results demonstrate that the proposed method outperforms state-of-the-art segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS, on most of the seen datasets, and it maintains robust performance on the unseen datasets without additional fine-tuning. These results underscore the promise of VLMs for scalable and robust ultrasound image analysis and suggest that they can reduce the reliance on large-scale, organ-specific annotated data. The code will be published at code.sonography.ai after acceptance.
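To make the detection-then-segmentation pipeline concrete, the sketch below shows how a text prompt can drive Grounding DINO to produce bounding boxes that are then passed to SAM2 as box prompts. This is a minimal illustration only: the checkpoint identifiers, config and weight paths, text prompt, and thresholds are assumptions for demonstration, not the authors' released configuration, and the stock Grounding DINO weights stand in for the LoRA-fine-tuned ultrasound model described above.

```python
# Minimal sketch (assumed names/paths): text-prompted detection with Grounding DINO,
# followed by box-prompted segmentation with SAM2. In the paper, Grounding DINO is
# LoRA-fine-tuned on ultrasound data; a stock checkpoint is used here for brevity.
import numpy as np
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Grounding DINO: detect the target structure from a free-text prompt.
det_id = "IDEA-Research/grounding-dino-base"            # assumed public checkpoint
processor = AutoProcessor.from_pretrained(det_id)
detector = AutoModelForZeroShotObjectDetection.from_pretrained(det_id).to(device)

image = Image.open("ultrasound.png").convert("RGB")     # placeholder input image
prompt = "breast tumor."                                # illustrative class prompt

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = detector(**inputs)
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.35, text_threshold=0.25,            # illustrative thresholds
    target_sizes=[image.size[::-1]],
)[0]
boxes = results["boxes"].cpu().numpy()                  # (N, 4) boxes in xyxy pixels

# 2) SAM2: segment each detected box; SAM2 itself needs no extra fine-tuning.
sam2 = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml",   # assumed config path
                  "checkpoints/sam2.1_hiera_large.pt",    # assumed checkpoint path
                  device=device)
predictor = SAM2ImagePredictor(sam2)
predictor.set_image(np.array(image))

masks = []
for box in boxes:
    mask, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(mask[0].astype(bool))                  # one binary mask per detection
```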