Daily Arxiv

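This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.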

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Quantifying Label-Induced Bias in Large Language Model Self- and Cross-Evaluations

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Language Models and Logic Programs for Trustworthy Financial Reasoning

Occlusion Robustness of CLIP for Military Vehicle Classification

SPGrasp: Spatiotemporal Prompt-driven Grasp Synthesis in Dynamic Scenes

DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model

Dynamic Fusion Multimodal Network for SpeechWellness Detection

Agentic AI for Software: thoughts from Software Engineering community

CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models

LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components

ONG: Orthogonal Natural Gradient Descent

Tri-Accel: Curvature-Aware Precision-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage

GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model

Bridging Generalization and Personalization in Wearable Human Activity Recognition via On-Device Few-Shot Learning

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Adaptively Robust LLM Inference Optimization under Prediction Uncertainty

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

FLAIR: Frequency- and Locality-Aware Implicit Neural Representations

Hierarchical Evaluation Function: A Multi-Metric Approach for Optimizing Demand Forecasting Models

Learning local and global prototypes with optimal transport for unsupervised anomaly detection and localization

Quantum Flow Matching

BConformeR: A Conformer Based on Mutual Sampling for Unified Prediction of Continuous and Discontinuous Antibody Binding Sites

Preacher: Paper-to-Video Agentic System

UQGNN: Uncertainty Quantification of Graph Neural Networks for Multivariate Spatiotemporal Prediction

Grid2Guide: A* Enabled Small Language Model for Indoor Navigation

ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection

MAQuA: Adaptive Question-Asking for Multidimensional Mental Health Screening using Item Response Theory

Class Unbiasing for Generalization in Medical Diagnosis

LLM Serving Optimization with Variable Prefill and Decode Lengths

Grid-Agent: An LLM-Powered Multi-Agent System for Power Grid Control

CF3: Compact and Fast 3D Feature Fields

MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning

A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design

Convergence Analysis of Aggregation-Broadcast in LoRA-enabled Distributed Fine-Tuning

Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring

AR-LIF: Adaptive reset leaky integrate-and-fire neuron for spiking neural networks

A Markov Categorical Framework for Language Modeling

Towards Compute-Optimal Many-Shot In-Context Learning

Benchmarking LLM Privacy Recognition for Social Robot Decision Making

Diffusion Models for Time Series Forecasting: A Survey

GPI-Net: Gestalt-Guided Parallel Interaction Network via Orthogonal Geometric Consistency for Robust Point Cloud Registration

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need

Demographic-aware fine-grained classification of pediatric wrist fractures

Agentic-R1: Distilled Dual-Strategy Reasoning

Driving as a Diagnostic Tool: Scenario-based Cognitive Assessment in Older Drivers from Driving Video

MedVAL: Toward Expert-Level Medical Text Validation with Language Models

NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation

RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms

Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

Towards Efficient and Accurate Spiking Neural Networks via Adaptive Bit Allocation

Flow-Modulated Scoring for Semantic-Aware Knowledge Graph Completion

TPTT: Transforming Pretrained Transformers into Titans

What Is the Point of Equality in Machine Learning Fairness? Beyond Equality of Opportunity

QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety

A theoretical framework for self-supervised contrastive learning for continuous dependent data

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Auto prompt sql: a resource-efficient architecture for text-to-sql translation in constrained environments

Labelling Data with Unknown References

FinS-Pilot: A Benchmark for Online Financial RAG System

Diagnosing Reliability in Text-Guided Medical Image Editing

Speeding Up Hyper-Heuristics With Markov-Chain Operator Selection and the Only-Worsening Acceptance Operator

A versatile foundation model for cine cardiac magnetic resonance image analysis tasks

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation

Multiple LLM Agents Debate for Equitable Cultural Alignment

Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems

Can NeRFs See without Cameras?

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

The challenge of hidden gifts in multi-agent reinforcement learning

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands

From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning

Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

Toward Real-World Cooperative and Competitive Soccer with Quadrupedal Robot Teams

FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction

ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding

One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems

Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

Mask-PINNs: Mitigating Internal Covariate Shift in Physics-Informed Neural Networks

ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling

FairPO: Robust Preference Optimization for Fair Multi-Label Learning

Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer

GenTorrent: Scaling Large Language Model Serving with An Overlay Network

Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation

Progent: Programmable Privilege Control for LLM Agents

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes

Agent-Q: Fine-Tuning Large Language Models for Quantum Circuit Generation and Optimization

A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Disease Detection from Retinal Fundus Images

More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty

LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos

Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound

Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification

Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering

General Table Question Answering via Answer-Formula Joint Generation

Open-World Skill Discovery from Unsegmented Demonstrations

To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging

MOHPER: Multi-objective Hyperparameter Optimization Framework for E-commerce Retrieval System

LLM Serving Optimization with Variable Prefill and Decode Lengths

Created by

Haebom

Authors

Meixuan Wang, Yinyu Ye, Zijie Zhou

Overview

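This paper studies how to serve LLM requests with heterogeneous prefill and decode lengths. In LLM serving, the prefill length is the length of the input prompt and determines the initial memory usage of the KV cache. The decode length is the number of output tokens generated sequentially, and each additional token increases KV cache memory usage by one unit. Given a set of n requests, the goal is to schedule and process them so as to minimize total completion time. The paper shows that this problem is NP-hard because of the interplay of batching, placement constraints, precedence relationships, and linearly increasing memory usage. It analyzes the commonly used FCFS and SF scheduling strategies and proves that their competitive ratios grow sublinearly with the memory limit, a significant drawback in real-world settings where memory demand is large. To address this, it proposes a new algorithm built on a new selection metric that forms batches efficiently over time, and proves that the algorithm achieves a constant competitive ratio. Finally, it develops and evaluates several variants inspired by this approach, including a dynamic programming variant, local search methods, and an LP-based scheduler; comprehensive simulations show that they outperform standard baselines while remaining computationally efficient.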
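The memory model in the overview can be made concrete with a small simulation. The sketch below is not the paper's proposed algorithm; it is a toy illustration of how KV cache usage grows and how request order changes total completion time under a memory limit. The names `Request`, `kv_memory`, and `total_completion_time` are hypothetical, and the admission rule (reserve each request's peak memory, prefill plus decode, and stop at the first request that does not fit) is a conservative assumption made for simplicity. Ordering by decode length as a stand-in for SF is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    prefill: int  # prompt length: KV cache units held from the start
    decode: int   # number of output tokens (>= 1), generated one per step


def kv_memory(req: Request, tokens_done: int) -> int:
    """KV cache usage after `tokens_done` output tokens (one unit per token)."""
    return req.prefill + tokens_done


def total_completion_time(requests, memory_limit: int) -> int:
    """Simulate in-order admission and return the sum of finish times.

    Assumption: a request is admitted only if the peak memory
    (prefill + decode) of all running requests plus its own peak
    fits within `memory_limit`. This is a simplification, not the
    selection metric proposed in the paper.
    """
    waiting = list(requests)
    running = []  # entries are [request, tokens_done]
    t = 0
    total = 0
    while waiting or running:
        reserved = sum(r.prefill + r.decode for r, _ in running)
        # Admit strictly in list order; stop at the first request that does not fit.
        while waiting and reserved + waiting[0].prefill + waiting[0].decode <= memory_limit:
            req = waiting.pop(0)
            reserved += req.prefill + req.decode
            running.append([req, 0])
        if not running:
            raise ValueError("a request's peak memory exceeds the limit")
        t += 1  # one decode step: every running request emits one token
        still_running = []
        for entry in running:
            entry[1] += 1
            if entry[1] == entry[0].decode:
                total += t  # request finishes at time t
            else:
                still_running.append(entry)
        running = still_running
    return total


requests = [Request(4, 6), Request(1, 2), Request(2, 1)]
fcfs_total = total_completion_time(requests, memory_limit=10)
sf_total = total_completion_time(sorted(requests, key=lambda r: r.decode), memory_limit=10)
print(fcfs_total, sf_total)
```

In this example the long first request reserves the whole memory budget under arrival order, so the two short requests wait behind it, while the shortest-first order lets them finish early. This is only a three-request illustration and says nothing about the competitive ratios analyzed in the paper.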

Implications and Limitations

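• Implications: The paper presents a new algorithm that can substantially improve the efficiency of LLM request processing and supports it with both theoretical analysis and experimental results. The new selection metric, which forms batches efficiently over time, offers a practical approach that could be applied to real LLM serving systems. The algorithm variants broaden its applicability to different environments.

• Limitations: The reported performance of the proposed algorithm is based on simulation, so its performance in real LLM serving systems still needs to be verified through further experiments. The algorithm's complexity may need further analysis. Optimal performance may not be guaranteed for certain request distributions.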
