Daily Arxiv

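This page summarizes artificial intelligence papers published around the world.
Summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply credit the source.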

HashAttention: Semantic Sparsity for Faster Inference

Created by

Haebom

Authors

Aditya Desai, Shuo Yang, Alejandro Cuadron, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

Overview

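This paper proposes a way to make attention computation more efficient, since attention is a scalability bottleneck for advanced AI systems that rely on long contexts. Standard scaled dot-product attention (SDPA) exhibits token sparsity: only a few important tokens contribute significantly to the output. Exploiting this sparsity effectively has been difficult, however, because existing methods either degrade quality or consume additional resources. The paper frames identifying the important tokens as a Maximum Inner Product Search (MIPS) problem and points out that existing MIPS solutions are not GPU-friendly and lose accuracy because queries and keys follow different distributions. Instead, it proposes HashAttention, which recasts important-token identification as a recommendation problem. HashAttention uses learned mapping functions to encode keys and queries into Hamming space so that their semantic similarity is captured, then identifies the important tokens efficiently with bitwise operations and computes attention over only those tokens. Trained on generic data, HashAttention reduces the number of tokens used by up to 16x with minimal quality loss while requiring only 32 bits of auxiliary memory per token. Task-specific fine-tuning raises the sparsity to as much as 32x; at 32x sparsity on an A100 GPU, HashAttention reduces attention latency by up to 4.3x in GPT-FAST and up to 2.54x in FlashDecode, and improves GPT-FAST throughput by up to 3.12x.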
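The token-selection step described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' implementation: HashAttention learns its query and key mapping functions, whereas this sketch substitutes a fixed random projection followed by a sign (SimHash-style), so the codes approximate angular similarity rather than a learned semantic similarity. The function names `encode_bits`, `hamming_distance`, and `sparse_attention` are made up for illustration; only the 32-bit code width and the 16x sparsity level come from the summary.

```python
import numpy as np

NUM_BITS = 32  # bits per token code, matching the "32 bits per token" figure

def encode_bits(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Encode vectors of shape (n, d) into packed uint32 codes of shape (n,).

    ASSUMPTION: `proj` is a fixed random projection (SimHash-style stand-in);
    HashAttention instead learns this mapping.
    """
    bits = (x @ proj) > 0                                   # (n, 32) sign bits
    weights = np.left_shift(np.uint64(1), np.arange(NUM_BITS, dtype=np.uint64))
    return (bits.astype(np.uint64) * weights).sum(axis=1).astype(np.uint32)

def hamming_distance(q_code: np.uint32, k_codes: np.ndarray) -> np.ndarray:
    """Number of differing bits between one query code and every key code."""
    xor = np.bitwise_xor(k_codes, np.uint32(q_code))        # (n,) uint32
    # Portable popcount: unpack each 32-bit word into its bits and count the ones.
    return np.unpackbits(xor.view(np.uint8)).reshape(-1, NUM_BITS).sum(axis=1)

def sparse_attention(q, K, V, q_code, k_codes, top_k):
    """Softmax attention restricted to the top_k keys closest in Hamming space."""
    dist = hamming_distance(q_code, k_codes)
    idx = np.argsort(dist, kind="stable")[:top_k]           # most similar codes first
    scores = K[idx] @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())                       # numerically stable softmax
    w /= w.sum()
    return w @ V[idx], idx

# Toy usage: 1024 cached tokens at 16x sparsity, so attention uses 64 tokens.
rng = np.random.default_rng(0)
n, d = 1024, 64
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
q = rng.standard_normal(d)
proj = rng.standard_normal((d, NUM_BITS))

k_codes = encode_bits(K, proj)
q_code = encode_bits(q[None, :], proj)[0]
out, idx = sparse_attention(q, K, V, q_code, k_codes, top_k=n // 16)

# Fraction of the exact top-64 inner-product (MIPS) keys that the hash filter recovered.
exact = set(np.argsort(-(K @ q))[: n // 16].tolist())
recall = len(exact & set(idx.tolist())) / len(exact)
print(f"recall vs exact MIPS top-k: {recall:.2f}")
```

Packing the bits into one `uint32` per token keeps the per-token cost at 4 bytes and turns similarity scoring into an XOR plus a popcount, the kind of cheap bitwise work the summary credits for HashAttention's GPU efficiency. The recall printout is only a way to compare a bitwise filter against the exact MIPS answer on random data; it says nothing about the quality of the paper's learned encoders.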

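Implications and Limitations

• Implications:
  ◦ By substantially reducing the cost of attention, HashAttention can help address the scalability problem of long-context processing.
  ◦ It introduces an efficient, bitwise-operation-based algorithm for identifying important tokens, which speeds up attention on GPUs.
  ◦ Training on generic data, with optional task-specific fine-tuning, suggests the method can be applied across a range of settings.
  ◦ Experiments with GPT-FAST and FlashDecode show concrete reductions in attention latency and gains in throughput.
• Limitations:
  ◦ The reported speedups were measured in specific inference systems (GPT-FAST, FlashDecode) on a specific GPU (A100), so results may differ with other models, software stacks, or hardware.
  ◦ The auxiliary memory of 32 bits per token is relatively small, but because it grows with the number of cached tokens, it may still add a noticeable memory burden for very large models or very long contexts.
  ◦ The mapping functions must be trained on generic data, and reaching the higher 32x sparsity requires additional task-specific fine-tuning; both steps add cost.
  ◦ The Hamming-space encoding may not fully capture semantic similarity, which could degrade attention quality.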
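To put the 32-bits-per-token overhead from the limitations in perspective, the back-of-the-envelope arithmetic below compares it with a standard KV cache. The model shape (32 layers, 8 KV heads, head dimension 128, fp16 cache) and the 128K-token context are hypothetical, and the sketch assumes one 32-bit code is stored per token for every layer and KV head; if fewer codes are stored, the overhead is smaller still.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of keys plus values cached for one token across all layers (fp16 by default)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def hash_code_bytes_per_token(layers: int, kv_heads: int, bits: int = 32) -> int:
    """Auxiliary bytes per token, ASSUMING one `bits`-wide code per layer and KV head."""
    return layers * kv_heads * bits // 8

LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128   # hypothetical model shape
CONTEXT = 128 * 1024                      # hypothetical 128K-token context

kv = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
aux = hash_code_bytes_per_token(LAYERS, KV_HEADS)
print(f"KV cache: {kv * CONTEXT / 2**30:.1f} GiB, hash codes: {aux * CONTEXT / 2**20:.1f} MiB")
# KV cache: 16.0 GiB, hash codes: 128.0 MiB
print(f"relative overhead: {aux / kv:.2%}")
# relative overhead: 0.78%
```

Both quantities grow linearly with context length, so under these assumptions the ratio is fixed at 4 bytes of code versus 2 × 128 × 2 = 512 bytes of keys and values per head, i.e. 1/128, whatever the sequence length.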
