This page curates AI-related papers published worldwide. All content here is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results
Created by
Haebom
Author
Dawar Khan, Xinyu Liu, Omar Mena, Donggang Jia, Alexandre Kouyoumdjian, Ivan Viola
Outline
In this paper, we present AIvaluateXR, a comprehensive evaluation framework for deploying large language models (LLMs) on extended reality (XR) devices. We deploy 17 LLMs on four XR platforms: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro, and measure four key metrics: performance consistency, processing speed, memory usage, and battery consumption. We evaluate each of the 68 model-device combinations while varying string length, batch size, and number of threads, and analyze the resulting trade-offs for real-time XR applications. We propose a unified evaluation method based on 3D Pareto optimality theory to select the optimal device-model combination, compare the efficiency of on-device LLMs against client-server and cloud-based setups, and evaluate accuracy on two interactive tasks. This provides valuable insights to guide future optimization efforts for LLM deployment on XR devices, and our evaluation method can serve as a standard foundation for further research and development in this emerging field. Source code and supplementary material are available at www.nanovis.org/AIvaluateXR.html.
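The 3D Pareto-optimal selection mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes three metrics per device-model pair (generation speed to maximize, memory and battery draw to minimize), and the metric values below are hypothetical, not benchmark results from the paper.

```python
def dominates(a, b):
    """Return True if point a dominates point b: at least as good on every
    metric (higher speed, lower memory, lower battery) and strictly better
    on at least one. Metric tuples: (tokens_per_sec, memory_mb, battery_w)."""
    better_or_equal = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return better_or_equal and strictly_better


def pareto_front(combos):
    """Return the device-model combinations not dominated by any other.
    combos: list of (name, metrics) pairs."""
    return [
        (name, m)
        for name, m in combos
        if not any(dominates(m2, m) for _, m2 in combos if m2 != m)
    ]


if __name__ == "__main__":
    # Hypothetical measurements for a few device-model pairs (illustrative only).
    combos = [
        ("Quest3/model-A",     (30.0, 1500, 3.5)),
        ("VisionPro/model-B",  (24.0, 1200, 3.1)),
        ("X100sPro/model-C",   (18.0, 2600, 2.5)),
        ("MagicLeap2/model-D", (10.0, 2900, 4.5)),  # dominated by Quest3/model-A
    ]
    for name, metrics in pareto_front(combos):
        print(name, metrics)
```

With these numbers, the last entry is slower and costlier on every metric than the first, so it drops off the front; the other three each win on at least one metric and are all retained as Pareto-optimal choices.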
Takeaways, Limitations
•
Takeaways:
◦
We provide a comprehensive evaluation framework, AIvaluateXR, for LLM deployment on XR devices.
◦
We present experimental results across a variety of XR devices and LLMs, providing insights for selecting the optimal device-model combination.
◦
The comparison of on-device, client-server, and cloud-based setups helps practitioners choose a practical deployment strategy.
◦
The unified evaluation method based on 3D Pareto optimality theory can serve as a standard basis for future research.
•
Limitations:
◦
The set of LLMs and XR devices covered by the evaluation is limited; further research including a wider variety of models and devices is needed.
◦
The evaluation metrics are limited to performance consistency, speed, memory usage, and battery consumption; other important factors, such as user experience and interaction latency, may not be fully covered.
◦
Concrete guidelines for selecting an optimal LLM for a particular XR application may be lacking.
◦
The evaluation may not fully reflect the complexity of real-world usage environments.