Daily Arxiv

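This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.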

Merge Kernel for Bayesian Optimization on Permutation Space

Demographic-aware fine-grained classification of pediatric wrist fractures

Generative Multi-Target Cross-Domain Recommendation

ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models

A Simple Baseline for Stable and Plastic Neural Networks

WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

How Not to Detect Prompt Injections with an LLM

Critiques of World Models

The role of large language models in UI/UX design: A systematic literature review

LearnLens: LLM-Enabled Personalised, Curriculum-Grounded Feedback with Educators in the Loop

STACK: Adversarial Attacks on LLM Safeguard Pipelines

ZonUI-3B: A Lightweight Vision-Language Model for Cross-Resolution GUI Grounding

Understanding Reasoning in Thinking Language Models via Steering Vectors

Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation

TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis

SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet

Exploring Graph Representations of Logical Forms for Language Modeling

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs

CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors

Hands-On: Segmenting Individual Signs from Continuous Sequences

Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks?

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation

AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results

An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

Evaluating link prediction: New perspectives and recommendations

Learning to Reason at the Frontier of Learnability

Stonefish: Supporting Machine Learning Research in Marine Robotics

Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

On the Transfer of Knowledge in Quantum Algorithms

Code Readability in the Age of Large Language Models: An Industrial Case Study from Atlassian

Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude

ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems

Consistency of Responses and Continuations Generated by Large Language Models on Social Media

From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study

Temporal reasoning for timeline summarisation in social media

Invisible Textual Backdoor Attacks based on Dual-Trigger

Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models

Two-Stage Pretraining for Molecular Property Prediction in the Wild

Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges

An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots

Bridging Local and Global Knowledge via Transformer in Board Games

Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Visual Grounding Methods for Efficient Interaction with Desktop Graphical User Interfaces

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation

SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings

Improved DDIM Sampling with Moment Matching Gaussian Mixtures

Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges

From Roots to Rewards: Dynamic Tree Reasoning with RL

Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light

Instance space analysis of the capacitated vehicle routing problem

Multi-Agent LLMs as Ethics Advocates for AI-Based Systems

GATSim: Urban Mobility Simulation with Generative Agents

Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Strategic Reflectivism In Intelligent Systems

SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator

What the F*ck Is Artificial General Intelligence?

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios

To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization

BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems

UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception

CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis

Toward Temporal Causal Representation Learning with Tensor Decomposition

Kolmogorov Arnold Networks (KANs) for Imbalanced Data -- An Empirical Perspective

NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track

Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment

The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?

DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits

Edge Intelligence with Spiking Neural Networks

VLA-Mark: A cross modal watermark for large vision-language alignment model

Noradrenergic-inspired gain modulation attenuates the stability gap in joint training

A multi-strategy improved snake optimizer for three-dimensional UAV path planning and engineering problems

Photonic Fabric Platform for AI Accelerators

OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

A segmented robot grasping perception neural network for edge AI

Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need

DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation

Exploiting Primacy Effect To Improve Large Language Models

Generalist Forecasting with Frozen Video Models via Latent Diffusion

Convergent transformations of visual representation in brains and models

Preprint: Did I Just Browse A Website Written by LLMs?

The Levers of Political Persuasion with Conversational AI

Political Leaning and Politicalness Classification of Texts

Self-supervised learning on gene expression data

Using LLMs to identify features of personal and professional skills in an open-response situational judgment test

AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results

Created by

Haebom

Authors

Dawar Khan, Xinyu Liu, Omar Mena, Donggang Jia, Alexandre Kouyoumdjian, Ivan Viola

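Overview

This paper presents AIvaluateXR, a comprehensive evaluation framework for deploying large language models (LLMs) on extended reality (XR) devices. The authors deploy 17 LLMs on four XR platforms (Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro) and measure four key metrics: performance consistency, processing speed, memory usage, and battery consumption. For each of the 68 model-device pairs (17 models × 4 devices), they evaluate performance while varying string length, batch size, and thread count, and analyze the trade-offs for real-time XR applications. They propose a unified evaluation method based on 3D Pareto optimality theory for selecting the best device-model pairs, compare the efficiency of on-device LLMs against client-server and cloud-based setups, and evaluate accuracy on two interactive tasks. The work offers insights to guide future optimization of LLM deployment on XR devices, and the evaluation method could serve as a standard foundation for further research and development in this emerging field. Source code and supplementary materials are available at www.nanovis.org/AIvaluateXR.html.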
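To make the idea of Pareto-based selection concrete, here is a minimal Python sketch of a three-objective Pareto front over device-model pairs. It is not the authors' implementation: the `Result` type, the choice of three objectives (quality, speed, efficiency), the placeholder device and model names, and all numbers are assumptions made purely for illustration. Every objective is treated as "higher is better".

```python
from __future__ import annotations

from typing import NamedTuple


class Result(NamedTuple):
    """One hypothetical device-model measurement (illustrative, not from the paper)."""
    device: str
    model: str
    quality: float     # assumed quality score, higher is better
    speed: float       # assumed throughput, higher is better
    efficiency: float  # assumed negated resource cost, higher is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on all objectives and strictly better on one."""
    a_obj, b_obj = a[2:], b[2:]
    at_least_as_good = all(x >= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x > y for x, y in zip(a_obj, b_obj))
    return at_least_as_good and strictly_better


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep every result that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up numbers for placeholder devices and models.
results = [
    Result("device_A", "model_1", quality=0.9, speed=10.0, efficiency=-2.0),
    Result("device_A", "model_2", quality=0.8, speed=20.0, efficiency=-1.0),
    Result("device_B", "model_1", quality=0.7, speed=9.0, efficiency=-3.0),
]

front = pareto_front(results)
for r in front:
    print(r.device, r.model)
```

In this toy data, the `device_B` entry is worse than the first `device_A` entry on all three objectives, so it drops out, while the two `device_A` entries trade quality against speed and both remain on the front. The pairwise check is quadratic in the number of results, which is fine at the scale of 68 pairs.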

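Implications and Limitations

• Implications:
  ◦ Provides AIvaluateXR, a comprehensive evaluation framework for deploying LLMs on XR devices.
  ◦ Reports experimental results across a range of XR devices and LLMs, offering insight into how to choose the best device-model pair.
  ◦ Compares the efficiency of on-device LLMs, client-server, and cloud-based setups, which helps in choosing a practical deployment strategy.
  ◦ The unified evaluation method based on 3D Pareto optimality theory could serve as a standard foundation for future research.

• Limitations:
  ◦ The set of LLMs and XR devices evaluated may be limited. Further studies covering more models and devices are needed.
  ◦ The evaluation metrics are confined to performance, speed, memory, and battery consumption. Other important factors, such as user experience and latency, may not be sufficiently considered.
  ◦ Concrete guidelines for choosing an LLM optimized for a specific XR application may be lacking.
  ◦ The evaluation may not fully reflect the complexity of real-world usage environments.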
