Daily Arxiv

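This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.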

Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Created by

Haebom

Authors

Hong Huang, Dapeng Wu

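Overview

This paper presents a new quantization-based method for improving efficiency, addressing the excessive compute and memory requirements that hinder the deployment of large language models (LLMs) on resource-constrained personal devices. To overcome two limitations of existing quantization methods, the difficulty of balancing performance against overhead and the handling of activation outliers, the authors propose the Outlier Spatial Stability Hypothesis (OSSH). Building on OSSH, they introduce Quaff, a parameter-efficient fine-tuning framework that optimizes low-precision activation representations. Quaff uses lightweight operations to dynamically suppress outliers only in the invariant channels, reducing quantization error without full-precision weight storage or global rescaling. Extensive experiments on 10 benchmarks validate OSSH and the effectiveness of Quaff. In particular, on the GPQA reasoning benchmark, Quaff achieves a 1.73x latency reduction and 30% memory savings relative to full-precision fine-tuning while improving accuracy by 0.6% on the Phi-3 model. This addresses the three-way trade-off among efficiency, performance, and deployability. By enabling fine-tuning on consumer-grade GPUs without sacrificing model utility, Quaff democratizes personalized LLM deployment. The code is available on GitHub.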
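The idea of suppressing outliers only in a fixed set of channels before low-precision quantization can be illustrated with a small toy experiment. The sketch below is not the authors' implementation: it assumes OSSH means that the outlier channels keep the same positions, so their indices can be identified once, and it uses hypothetical helper names (`channel_factors`, `suppress_and_quantize`) and synthetic Gaussian activations. Scaling the values back after dequantization is also an assumption made for illustration; the summary does not say how Quaff handles that step.

```python
import random


def quantize_dequantize(row, bits=8):
    """Symmetric per-token quantization to `bits` bits, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in row) / qmax) or 1.0  # avoid a zero scale
    return [round(v / scale) * scale for v in row]


def mean_abs_error(reference, approx):
    """Mean absolute element-wise error between two token-by-channel matrices."""
    total = sum(abs(x - y)
                for r_row, a_row in zip(reference, approx)
                for x, y in zip(r_row, a_row))
    return total / (len(reference) * len(reference[0]))


def make_activations(tokens, channels, outlier_channels, rng):
    """Synthetic activations whose outliers always sit in the same channels."""
    acts = []
    for _ in range(tokens):
        row = [rng.gauss(0.0, 1.0) for _ in range(channels)]
        for c in outlier_channels:
            row[c] *= 50.0  # assumed outlier magnitude, for illustration only
        acts.append(row)
    return acts


def channel_factors(acts, outlier_channels):
    """Per-channel suppression factors, computed for the outlier channels only."""
    channels = len(acts[0])
    absmax = [max(abs(row[c]) for row in acts) for c in range(channels)]
    normal_max = max(absmax[c] for c in range(channels)
                     if c not in outlier_channels)
    return {c: max(absmax[c] / normal_max, 1.0) for c in outlier_channels}


def suppress_and_quantize(acts, outlier_channels, factors):
    """Shrink outlier channels, quantize each token, then undo the shrink."""
    result = []
    for row in acts:
        scaled = list(row)
        for c in outlier_channels:
            scaled[c] /= factors[c]
        deq = quantize_dequantize(scaled)
        for c in outlier_channels:
            deq[c] *= factors[c]
        result.append(deq)
    return result


rng = random.Random(0)
outliers = [3, 17]  # hypothetical indices, assumed to be found once beforehand
acts = make_activations(tokens=64, channels=32, outlier_channels=outliers, rng=rng)

baseline = [quantize_dequantize(row) for row in acts]
factors = channel_factors(acts, outliers)
suppressed = suppress_and_quantize(acts, outliers, factors)

err_base = mean_abs_error(acts, baseline)
err_supp = mean_abs_error(acts, suppressed)
print(f"plain int8 error:      {err_base:.4f}")
print(f"outlier-suppressed:    {err_supp:.4f}")
```

In this toy setting, the few outlier channels inflate every token's quantization scale, so shrinking only those channels lets the remaining channels use the 8-bit range more finely. Only the listed outlier channels are touched, which is the sense in which the operation stays lightweight.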

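Implications and Limitations

• Implications:
  ◦ Democratizes personalized LLM deployment by enabling LLM fine-tuning on consumer-grade GPUs.
  ◦ Addresses the trade-off between performance and efficiency found in existing quantization methods.
  ◦ On the GPQA reasoning benchmark, achieves a 1.73x latency reduction, 30% memory savings, and a 0.6% accuracy improvement on Phi-3 relative to full-precision fine-tuning.
  ◦ Proposes a new hypothesis, OSSH, and uses it to derive an efficient quantization method.

• Limitations:
  ◦ Further research is needed on how general the OSSH hypothesis is and how well it applies to different models and datasets.
  ◦ The effectiveness of the proposed method may be limited to specific benchmarks and models; evaluation in more diverse settings is needed.
  ◦ Because the work targets consumer-grade GPUs, performance in more tightly resource-constrained environments still needs to be verified.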
