This paper presents an efficient model optimization technique that addresses the energy consumption, memory usage, and latency issues arising when large-scale AI models are deployed in resource-constrained environments. We propose a systematic ontology matching method that uses a state-of-the-art Transformer-based model, leveraging cosine-based semantic similarity between non-expert medical terms and the UMLS Metathesaurus. We optimize the model with Microsoft Olive, ONNX Runtime, Intel Neural Compressor, and IPEX (Intel Extension for PyTorch), and evaluate it on two tasks of the DEFT 2020 evaluation campaign. On average, we achieve a 20x speedup in inference and a roughly 70% reduction in memory usage while surpassing the previous state-of-the-art performance.
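The cosine-similarity matching step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes term embeddings have already been produced by a Transformer encoder, and the vectors and terms below are toy stand-ins rather than real UMLS embeddings.

```python
import numpy as np

def cosine_match(query_vec, candidate_vecs, candidate_terms):
    """Return the candidate term whose embedding is most cosine-similar
    to the query embedding, along with the similarity score."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of each candidate against the query
    best = int(np.argmax(sims))
    return candidate_terms[best], float(sims[best])

# Hypothetical encoder outputs (toy values, not actual UMLS Metathesaurus data)
terms = ["myocardial infarction", "cephalalgia", "hypertension"]
embs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.95, 0.05],
                 [0.0, 0.2, 0.9]])
query = np.array([0.12, 0.9, 0.1])  # e.g. embedding of the lay term "headache"

print(cosine_match(query, embs, terms))
```

In practice the candidate matrix would hold encoder embeddings of Metathesaurus concepts, and the query would be the embedding of a non-expert term; the matching itself reduces to a normalized dot product as shown.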