Daily Arxiv

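This page collects and summarizes AI-related papers published around the world.
The summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, you only need to credit the source.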

FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models

Created by Haebom

Authors

Ionut-Vlad Modoranu, Mher Safaryan, Erik Schultheis, Max Ryabinin, Artem Chumachenko, Dan Alistarh

Overview

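This paper studies low-rank optimization, which restricts training to a low-dimensional subspace in order to shorten the running time of large language model (LLM) training and to reduce the memory used by adaptive optimizers. Prior work projects the gradients of linear layers using singular value decomposition (SVD) or QR decomposition. Applying these decompositions to each layer individually is computationally expensive, and storing the resulting projection matrices costs additional memory. This work proposes a simple, computationally efficient two-step procedure that approximates SVD/QR-based gradient projection onto a low-dimensional space. It uses the predefined orthogonal matrix of the discrete cosine transform (DCT). For each layer, the method dynamically selects the DCT columns that best align with that layer's gradient. The effective projection matrix is obtained through a simple matmul with the DCT matrix in O(n³) time, followed by a lightweight sorting step that identifies the most relevant basis vectors. For large layers, the DCT can instead be computed in O(n² log n) time with Makhoul's N-point algorithm, which is based on the fast Fourier transform (FFT). Because the orthogonal basis is predefined, it is computed only once, at the start of training. Experiments on pre-training and fine-tuning tasks show that the method matches the performance of costly SVD/QR-based methods while achieving rank-independent running time. Across model sizes, it delivers faster running time and lower memory usage, by up to 25%.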
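The two-step selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The summary does not specify the exact scoring rule or which side of the gradient is projected. The sketch assumes that an m×n gradient G is projected on the right, and that each DCT column j is scored by the squared L2 norm of column j of G·D. The names `dct_matrix`, `select_columns`, `project`, and `back_project` are hypothetical. The triple-loop `matmul` stands in for a real GEMM, and for an n×n gradient it is the O(n³) step.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis, n x n; column j holds the j-th cosine basis vector."""
    d = [[0.0] * n for _ in range(n)]
    for k in range(n):          # k: sample (row) index
        for j in range(n):      # j: frequency (column) index
            scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
            d[k][j] = scale * math.cos(math.pi * (k + 0.5) * j / n)
    return d


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain triple-loop product standing in for a GEMM kernel."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def select_columns(grad, d, rank):
    """Pick the `rank` DCT columns most aligned with the gradient.

    Assumed scoring rule: squared L2 norm of each column of grad @ d.
    """
    s = matmul(grad, d)  # m x n: coefficient of every gradient row on every basis vector
    scores = [sum(row[j] ** 2 for row in s) for j in range(len(d[0]))]
    # Lightweight sorting step: keep the indices of the highest-scoring columns.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:rank]


def project(grad, d, idx):
    """Return the m x r low-rank gradient and the n x r slice of the fixed basis."""
    p = [[row[j] for j in idx] for row in d]
    return matmul(grad, p), p


def back_project(low, p):
    """Map an m x r update back to the layer's full m x n shape."""
    return matmul(low, transpose(p))


# Usage: build the basis once, then re-select columns as the gradient changes.
d = dct_matrix(8)  # computed once at the start of training
grad = [[0.3 * math.cos(0.7 * (i + k)) for k in range(8)] for i in range(4)]
idx = select_columns(grad, d, rank=2)
low, p = project(grad, d, idx)  # 4 x 2: the shape an Adam-style optimizer would keep moments for
update = back_project(low, p)   # 4 x 8: back in the layer's shape
```

The basis is fixed, so only the r selected indices change from step to step. One plausible reading of the memory saving is that a layer then needs to store `idx` rather than a dense n×r projection matrix. This sketch does not establish how the paper itself stores the projection.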

Implications and Limitations

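• Implications:
  ◦ Proposes a computationally efficient method that approximates SVD/QR-based gradient projection.
  ◦ Uses the DCT to reduce training time and memory usage.
  ◦ Achieves rank-independent running time.
  ◦ Performs comparably to SVD/QR-based methods.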

• Limitations:
  ◦ No specific limitations are stated; none appear in the summary.
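The overview notes that, for large layers, the DCT can be computed with Makhoul's FFT-based N-point algorithm in O(n² log n) time instead of an O(n³) matrix product. Below is a minimal pure-Python sketch of that route for a single gradient row. It reorders the row (even-indexed samples first, then the odd-indexed samples in reverse), takes one length-n FFT, and rotates each coefficient by a phase of e^(-iπj/(2n)). The sketch makes three assumptions: orthonormal DCT-II scaling, n a power of two (required by the simple radix-2 FFT written here), and illustrative helper names `fft`, `dct2_makhoul`, `dct2_direct`, and `dct_rows`. Applied row by row to an m×n gradient, it gives the same coefficients as multiplying by an orthonormal DCT-II basis matrix, at O(m·n log n) cost.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def _scale(j, n):
    """Orthonormal DCT-II normalization for frequency j."""
    return math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)


def dct2_makhoul(x):
    """Orthonormal DCT-II of one row via a single length-n FFT (Makhoul's reordering)."""
    n = len(x)
    v = list(x[0::2]) + list(x[1::2])[::-1]  # evens in order, then odds reversed
    spectrum = fft(v)
    return [
        _scale(j, n) * (cmath.exp(-1j * math.pi * j / (2 * n)) * spectrum[j]).real
        for j in range(n)
    ]


def dct2_direct(x):
    """Reference O(n^2) orthonormal DCT-II, equal to multiplying by the DCT basis matrix."""
    n = len(x)
    return [
        _scale(j, n) * sum(x[k] * math.cos(math.pi * (k + 0.5) * j / n) for k in range(n))
        for j in range(n)
    ]


def dct_rows(grad):
    """Apply the FFT-based DCT to every row of an m x n gradient: O(m n log n)."""
    return [dct2_makhoul(row) for row in grad]


# Usage: the fast and direct transforms should agree up to round-off.
row = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
fast, slow = dct2_makhoul(row), dct2_direct(row)
max_err = max(abs(a - b) for a, b in zip(fast, slow))
```

Squaring and summing the columns of `dct_rows(grad)` would feed the same sorting step as the explicit-matrix version. This is only one way to combine the two pieces. The summary does not describe how the paper switches between them.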
