Daily Arxiv

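This page organizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.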

CoRT: Code-integrated Reasoning within Thinking

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

Understanding Human-AI Trust in Education

ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization

MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Robotic Policy Learning via Human-assisted Action Preference Optimization

LLM-D12: A Dual-Dimensional Scale of Instrumental and Relational Dependencies on Large Language Models

QuantMCP: Grounding Large Language Models in Verifiable Financial Reality

Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce

Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR

Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery

Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment

Sample Complexity and Representation Ability of Test-time Scaling Paradigms

Context Is Not Comprehension

High Performance Space Debris Tracking in Complex Skylight Backgrounds with a Large-Scale Dataset

SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design

iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning

Quantum AIXI: Universal Intelligence via Quantum Information

VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization

QuXAI: Explainers for Hybrid Quantum Machine Learning Models

Convert Language Model into a Value-based Strategic Planner

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

Token-Efficient RL for LLM Reasoning

MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark

Elucidating the Design Space of Multimodal Protein Language Models

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Content ARCs: Decentralized Content Rights in the Age of Generative AI

PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play

Computation Mechanism Behind LLM Position Generalization

CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting

Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges

Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations

An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem

Generative Uncertainty in Diffusion Models

EgoNormia: Benchmarking Physical Social Norm Understanding

Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models

From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs

Object-Centric Latent Action Learning

Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation

TransMLA: Multi-Head Latent Attention Is All You Need

Implicit Language Models are RNNs: Balancing Parallelization and Expressivity

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Prompt-based Depth Pruning of Large Language Models

Great Models Think Alike and this Undermines AI Oversight

Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting

Latent Action Learning Requires Supervision in the Presence of Distractors

SR-Reward: Taking The Path More Traveled

Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs

SoK: Watermarking for AI-Generated Content

Engagement-Driven Content Generation with Large Language Models

PyGen: A Collaborative Human-AI Approach to Python Package Creation

DAWN: Designing Distributed Agents in a Worldwide Network

Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling

Center-fixing of tropical cyclones using uncertainty-aware deep learning applied to high-temporal-resolution geostationary satellite imagery

LLM-Cure: LLM-based Competitor User Review Analysis for Feature Enhancement

Deploying Open-Source Large Language Models: A performance Analysis

Neural Networks Generalize on Low Complexity Data

M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture

Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs

The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation

TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation

Multi-group Uncertainty Quantification for Long-form Text Generation

Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique

Privacy-Aware Spectrum Pricing and Power Control Optimization for LEO Satellite Internet-of-Things

IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language

Incentivizing Quality Text Generation via Statistical Contracts

Visually Descriptive Language Model for Vector Graphics Reasoning

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices

Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance

Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge

IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection

Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation

ConvD: Attention Enhanced Dynamic Convolutional Embeddings for Knowledge Graph Completion

Noise Balance and Stationary Distribution of Stochastic Gradient Descent

The Packing Chromatic Number of the Infinite Square Grid is 15

Reinforcing Multimodal Understanding and Generation with Dual Self-rewards

A Proposal to Extend the Common Model of Cognition with Metacognition

The Optimization Paradox in Clinical AI Multi-Agent Systems

CHANCERY: Evaluating Corporate Governance Reasoning Capabilities in Language Models

DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE

Beamforming and Resource Allocation for Delay Optimization in RIS-Assisted OFDM Systems

Evaluation of LLMs for mathematical problem solving

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

A Heuristic Algorithm Based on Beam Search and Iterated Local Search for the Maritime Inventory Routing Problem

A Vision for Auto Research with LLM Agents

AssistanceZero: Scalably Solving Assistance Games

Don't Lag, RAG: Training-Free Adversarial Detection Using RAG

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Training-Free Safe Denoisers for Safe Use of Diffusion Models

CollabLLM: From Passive Responders to Active Collaborators

Position: Theory of Mind Benchmarks are Broken for Large Language Models

The Remarkable Robustness of LLMs: Stages of Inference?

Created by

Haebom

Authors

Vedang Lad, Wes Gurnee, Max Tegmark

Overview

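This paper investigates how robust large language models (LLMs) are to structural interventions at inference time, specifically deleting layers and swapping adjacent layers. Surprisingly, the models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. The degradation is not uniform across depth: interventions on the early and final layers cause the largest drops, while the models are remarkably robust to the removal of middle layers. This pattern of localized sensitivity motivates a hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task- and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token predictions; and (4) residual sharpening, where irrelevant features are suppressed to finalize the output distribution. By synthesizing behavioral and mechanistic evidence, the paper provides a framework for interpreting depth-dependent computation in LLMs.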
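The two interventions described above, deleting a layer and swapping adjacent layers, can be illustrated on a toy residual stack. The following is a minimal sketch and not the authors' code: the tiny tanh layers, the helper names (`make_layer`, `delete_layer`, `swap_adjacent`, `agreement`), and the use of "top-1 agreement with the unmodified stack" as a stand-in for top-1 accuracy retention are all assumptions made for illustration.

```python
import math
import random


def make_layer(dim, rng, scale=0.1):
    """Hypothetical toy layer: a random linear map followed by tanh."""
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def layer(h):
        return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]

    return layer


def forward(layers, h):
    """Residual forward pass: each layer adds its update to the hidden state."""
    for layer in layers:
        update = layer(h)
        h = [a + b for a, b in zip(h, update)]
    return h


def delete_layer(layers, i):
    """Return a new stack with layer i removed (the original is untouched)."""
    return layers[:i] + layers[i + 1:]


def swap_adjacent(layers, i):
    """Return a new stack with layers i and i+1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def top1(h):
    """Index of the largest entry, standing in for the top-1 prediction."""
    return max(range(len(h)), key=h.__getitem__)


def agreement(base_layers, new_layers, inputs):
    """Fraction of inputs whose top-1 index is unchanged by the intervention."""
    same = sum(
        top1(forward(base_layers, x)) == top1(forward(new_layers, x))
        for x in inputs
    )
    return same / len(inputs)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, depth = 8, 6
    layers = [make_layer(dim, rng) for _ in range(depth)]
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
    for i in range(depth):
        print("delete", i, agreement(layers, delete_layer(layers, i), inputs))
    for i in range(depth - 1):
        print("swap", i, agreement(layers, swap_adjacent(layers, i), inputs))
```

Because every layer only adds an update to the hidden state, both interventions leave the hidden dimension unchanged, which is why they can be applied without retraining. A randomly initialized toy like this says nothing about which depths are sensitive in a real LLM; it only shows the mechanics of the intervention and the per-layer comparison.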

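Implications and Limitations

• Implications:

◦ Offers a new understanding of LLM inference. In particular, dividing the roles of layers into four stages gives insight into the models' internal mechanisms.

◦ Demonstrates the structural robustness of LLMs, suggesting a new perspective on model efficiency and stability.

◦ The proposed four-stage framework may inform future research on LLM design and optimization.

• Limitations:

◦ The types and sizes of LLMs analyzed may be limited; generalization to a wider range of models needs further validation.

◦ Robustness to structural interventions other than layer deletion and swapping requires further study.

◦ Further research is needed to determine whether the four proposed stages form a universal framework that applies to all LLMs.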
