Retrieval Augmented Generation (RAG) systems struggle to handle domain-specific knowledge due to the poor performance of pre-trained embeddings on specialized corpora and the excessive computational overhead of large language model (LLM)-based retrievers. While fine-tuning embedding models on augmented data offers a promising direction, its effectiveness is limited by the need for high-quality training data and a reliable chunking strategy that preserves contextual integrity. In this paper, we propose Language Model Augmented Retriever (LMAR), a model-agnostic framework that combines LLM-based data synthesis, contrastive embedding adaptation, and efficient text clustering. LMAR consists of a two-stage pipeline: (1) triplet sampling and synthetic data augmentation, where the LLM acts as a labeler and verifier to ensure high-fidelity supervision throughout the pipeline, and (2) contrastive embedding adaptation, where the embedding model is fine-tuned on the curated triplets and paired with efficient text clustering. Experimental results demonstrate that LMAR consistently outperforms baseline models across multiple domain-specific benchmark datasets while maintaining reasonable hardware requirements and low latency. Its model-agnostic design allows seamless integration with emerging RAG architectures and text embedding models, enabling continuous improvement without pipeline redesign. These results highlight LMAR as a practical and cost-effective solution for scalable domain-specific adaptation.
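To make the two-stage pipeline concrete, the sketch below illustrates the core idea of stage (2): contrastively adapting an off-the-shelf embedding model on LLM-labeled (anchor, positive, negative) triplets. It is a minimal illustration, not the authors' released implementation; the triplet strings, the base model name, and all hyperparameters are placeholder assumptions.

```python
# Minimal sketch: contrastive adaptation of an embedding model on
# LLM-labeled triplets (illustrative data and settings, not LMAR's actual code).
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Placeholder base model; any sentence-embedding model could be substituted,
# reflecting the framework's model-agnostic design.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Triplets assumed to come from stage (1): an LLM labels a positive chunk and a
# hard negative chunk for each query, then verifies the pairs before training.
triplets = [
    InputExample(texts=[
        "What dosage of drug X is recommended for renal patients?",   # anchor (query)
        "For patients with impaired renal function, drug X is ...",   # LLM-verified positive chunk
        "Drug X was first synthesized in 1987 by ...",                 # LLM-verified hard negative
    ]),
    # ... more LLM-labeled triplets ...
]

train_dataloader = DataLoader(triplets, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)  # pulls anchor toward positive, away from negative

# Fine-tune the embedding model on the curated triplets.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)

# The adapted model can then embed domain chunks for clustering and retrieval.
domain_embeddings = model.encode(["chunk 1 text ...", "chunk 2 text ..."])
```

Because the adaptation step only assumes access to triplets and an encoder, the same recipe applies unchanged when the underlying embedding model or RAG architecture is swapped out.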