Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Search-Optimized Quantization in Biomedical Ontology Alignment

Created by
  • Haebom

Authors

Oussama Bouaggad, Natalia Grabar

Outline

This paper presents an efficient model-optimization technique to address the energy consumption, memory usage, and latency problems that arise when deploying large-scale AI models in resource-constrained environments. The authors propose a systematic ontology-alignment method that uses state-of-the-art Transformer-based models to compute cosine-based semantic similarity between non-expert (lay) medical terms and concepts in the UMLS Metathesaurus. The models are optimized with Microsoft Olive, ONNX Runtime, Intel Neural Compressor, and IPEX (Intel Extension for PyTorch), and evaluated on two tasks from the DEFT 2020 evaluation campaign. The optimized models achieve an average 20x speedup in inference and roughly a 70% reduction in memory usage while surpassing the previous state-of-the-art performance.
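
To make the alignment step concrete, here is a minimal sketch of matching lay medical terms to candidate ontology labels by cosine similarity of Transformer embeddings. The model name, the toy term lists, and the matching loop are illustrative assumptions, not the paper's actual pipeline or its UMLS access.

```python
# Minimal sketch: cosine-similarity alignment of lay terms to ontology labels.
# Model choice and term lists are illustrative placeholders, not the paper's
# actual configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

lay_terms = ["heart attack", "high blood sugar"]          # non-expert terms
umls_labels = ["Myocardial Infarction", "Hyperglycemia",  # candidate concepts
               "Hypertension"]

# Encode both sides and compute the full cosine-similarity matrix.
term_emb = model.encode(lay_terms, convert_to_tensor=True, normalize_embeddings=True)
label_emb = model.encode(umls_labels, convert_to_tensor=True, normalize_embeddings=True)
scores = util.cos_sim(term_emb, label_emb)  # shape: (len(lay_terms), len(umls_labels))

# For each lay term, take the best-scoring ontology label.
for i, term in enumerate(lay_terms):
    j = scores[i].argmax().item()
    print(f"{term} -> {umls_labels[j]} (cos={scores[i][j].item():.3f})")
```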
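
The optimization step can be pictured as export-then-quantize. Below is a minimal sketch using ONNX Runtime's post-training dynamic quantization; the file paths are hypothetical, and the paper's actual toolchain (Microsoft Olive, Intel Neural Compressor, IPEX) is more involved than this single call.

```python
# Minimal sketch: post-training dynamic INT8 quantization with ONNX Runtime.
# Paths are placeholders; an FP32 ONNX export is assumed to exist already.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="encoder.onnx",        # FP32 model exported beforehand
    model_output="encoder.int8.onnx",  # model with INT8 weights
    weight_type=QuantType.QInt8,
)

# Load the quantized model for inference as usual.
import onnxruntime as ort
session = ort.InferenceSession("encoder.int8.onnx")
print([i.name for i in session.get_inputs()])
```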

Takeaways, Limitations

Takeaways:
  • Presents a systematic methodology for efficiently optimizing large-scale AI models.
  • Achieves new state-of-the-art results on biomedical ontology alignment.
  • Demonstrates dramatic efficiency gains: roughly 20x faster inference and about 70% lower memory usage.
  • Shows how to combine the optimization tools (Microsoft Olive, ONNX Runtime, Intel Neural Compressor, IPEX) effectively.
Limitations:
  • The generalizability of the methodology needs further study, given its dependence on specific medical data and models.
  • Applicability to other domains and other types of models remains to be verified.
  • The pipeline depends on specific versions of the optimization tools; compatibility with future versions needs review.
  • A more in-depth analysis of the accuracy degradation that can occur during quantization is needed (a simple check is sketched below).
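
One simple way to probe that last point is to compare FP32 and INT8 outputs on the same inputs. This is a hypothetical sketch, not the paper's evaluation protocol: the tokenizer, the file names (matching the quantization sketch above), and the assumption that output 0 is the last hidden state are all placeholders.

```python
# Minimal sketch: agreement and latency of FP32 vs INT8 ONNX encoders.
# Model/tokenizer names and file paths are hypothetical placeholders.
import time
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder
texts = ["myocardial infarction", "heart attack", "hyperglycemia"]
feed = dict(tok(texts, padding=True, return_tensors="np"))

def run(path):
    sess = ort.InferenceSession(path)
    names = {i.name for i in sess.get_inputs()}
    t0 = time.perf_counter()
    out = sess.run(None, {k: v for k, v in feed.items() if k in names})[0]
    return out, time.perf_counter() - t0

fp32, t_fp32 = run("encoder.onnx")        # assumes output 0 = last hidden state
int8, t_int8 = run("encoder.int8.onnx")

# Cosine agreement of the [CLS] embeddings between the two precisions.
a, b = fp32[:, 0, :], int8[:, 0, :]
cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
print(f"mean cos(FP32, INT8) = {cos.mean():.4f}, speedup = {t_fp32 / t_int8:.1f}x")
```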