Daily Arxiv

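This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.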

Inductive Moment Matching

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset

Interactive Medical Image Analysis with Concept-based Similarity Reasoning

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation

A Transformer Model for Predicting Chemical Reaction Products from Generic Templates

CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking

The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

Multi-Task Reinforcement Learning Enables Parameter Scaling

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models

Prediction of Frozen Region Growth in Kidney Cryoablation Intervention Using a 3D Flow-Matching Model

Call for Rigor in Reporting Quality of Instruction Tuning Data

KunlunBaize: LLM with Multi-Scale Convolution and Multi-Token Prediction Under TransformerX Framework

(How) Do Language Models Track State?

IterPref: Focal Preference Learning for Code Generation via Iterative Debugging

PaCA: Partial Connection Adaptation for Efficient Fine-Tuning

Forgotten Polygons: Multimodal Large Language Models are Shape-Blind

Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection

FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling

Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics

IMLE Policy: Fast and Sample Efficient Visuomotor Policy Learning via Implicit Maximum Likelihood Estimation

Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models

Equivariant Masked Position Prediction for Efficient Molecular Representation

IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models

Automated Consistency Analysis of LLMs

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Rationalization Models for Text-to-SQL

Reflection-Window Decoding: Text Generation with Selective Refinement

Agentic Bug Reproduction for Effective Automated Program Repair at Google

Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks

TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions

Multi-P$^2$A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models

Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

DMin: Scalable Training Data Influence Estimation for Diffusion Models

MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

RL-MILP Solver: A Reinforcement Learning Approach for Solving Mixed-Integer Linear Programs with Graph Neural Networks

Proto Successor Measure: Representing the Behavior Space of an RL Agent

KinMo: Kinematic-aware Human Motion Understanding and Generation

Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks

OminiControl: Minimal and Universal Control for Diffusion Transformer

MTA: Multimodal Task Alignment for BEV Perception and Captioning

PyGen: A Collaborative Human-AI Approach to Python Package Creation

Fair Summarization: Bridging Quality and Diversity in Extractive Summaries

AtlasSeg: Atlas Prior Guided Dual-U-Net for Cortical Segmentation in Fetal Brain MRI

V-LoRA: An Efficient and Flexible System Boosts Vision Applications with LoRA LMM

Fourier Head: Helping Large Language Models Learn Complex Probability Distributions

Conditional diffusions for neural posterior estimation

LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction

Chemistry-Inspired Diffusion with Non-Differentiable Guidance

Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

CAX: Cellular Automata Accelerated in JAX

Emotion-Aware Embedding Fusion in LLMs (Flan-T5, LLAMA 2, DeepSeek-R1, and ChatGPT 4) for Intelligent Response Generation

What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach

Meta-RTL: Reinforcement-Based Meta-Transfer Learning for Low-Resource Commonsense Reasoning

LEMMo-Plan: LLM-Enhanced Learning from Multi-Modal Demonstration for Planning Sequential Contact-Rich Manipulation Tasks

Training with Differential Privacy: A Gradient-Preserving Noise Reduction Approach with Provable Security

ASMA: An Adaptive Safety Margin Algorithm for Vision-Language Drone Navigation via Scene-Aware Control Barrier Functions

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models

Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection

How Data Inter-connectivity Shapes LLMs Unlearning: A Structural Unlearning Perspective

Value Improved Actor Critic Algorithms

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Adversarial Guided Diffusion Models for Adversarial Purification

M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling

The VampPrior Mixture Model

Regularization by Texts for Latent Diffusion Inverse Solvers

Deep Tensor Network

Identifying the Truth of Global Model: A Generic Solution to Defend Against Byzantine and Backdoor Attacks in Federated Learning (full version)

Generalizable Imitation Learning Through Pre-Trained Representations

Hypergraph Structure Inference From Data Under Smoothness Prior

HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis

SketchOGD: Memory-Efficient Continual Learning

ChatGPT-4 in the Turing Test: A Critical Analysis

Toward an Evaluation Science for Generative AI Systems

WritingBench: A Comprehensive Benchmark for Generative Writing

ToolFuzz -- Automated Agent Tool Testing

Building Interval Type-2 Fuzzy Membership Function: A Deck of Cards based Co-constructive Approach

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Learning to Plan with Personalized Preferences

Exponential Speedups by Rerooting Levin Tree Search

A Unified Framework for Motion Reasoning and Generation in Human Interaction

Agent-Oriented Planning in Multi-Agent Systems

Dynamic Analysis and Adaptive Discriminator for Fake News Detection

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models

AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

Nondeterministic Causal Models

X-SHIELD: Regularization for eXplainable Artificial Intelligence

Categorical semantics of compositional reinforcement learning

(How) Do Language Models Track State?

Created by

Haebom

Authors

Belinda Z. Li, Zifan Carl Guo, Jacob Andreas

Overview

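This paper studies how transformer language models (LMs) carry out behaviors, from storytelling to code generation, that appear to require tracking the unobserved state of an evolving world. The authors study state tracking in LMs trained or fine-tuned to compose permutations (that is, to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (for example, simulating finite automata and evaluating Boolean expressions) can be reduced to permutation composition, which makes it a natural model for state tracking in general. The results show that LMs consistently learn one of two state-tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the output space and then refines the result with an associative scan. The two mechanisms exhibit markedly different robustness properties, and the authors show how to steer LMs toward one or the other through intermediate training tasks that encourage or suppress the heuristic. These findings show that transformer LMs, whether pretrained or fine-tuned, can learn to implement efficient and interpretable state-tracking mechanisms, and that the emergence of these mechanisms can be predicted and controlled.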
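The permutation composition task described above can be made concrete with a short sketch. This is only an illustration of the task and of the two ideas the summary names (composing in an order-preserving tree, as in an associative scan, and using parity as a cheap feature); it is not the authors' code and does not model what happens inside a transformer. All names (`swap`, `compose`, `track_sequential`, `track_tree`, `parity`) are hypothetical and chosen for this example.

```python
from functools import reduce
from typing import Sequence, Tuple

# A state lists which object sits at each position: state[k] = object at position k.
# A permutation p acts on a state by new_state[k] = old_state[p[k]].
Perm = Tuple[int, ...]


def swap(n: int, i: int, j: int) -> Perm:
    """Permutation of n positions that exchanges positions i and j."""
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)


def compose(first: Perm, then: Perm) -> Perm:
    """Single permutation equivalent to applying `first`, then `then`."""
    return tuple(first[k] for k in then)


def track_sequential(perms: Sequence[Perm], n: int) -> Perm:
    """Left-to-right state tracking: one composition per step."""
    return reduce(compose, perms, tuple(range(n)))


def track_tree(perms: Sequence[Perm], n: int) -> Perm:
    """Compose adjacent pairs layer by layer (about log2(len) layers).

    This relies only on associativity of `compose`, which is the property
    an associative scan exploits; order within each pair is preserved.
    """
    layer = list(perms) or [tuple(range(n))]
    while len(layer) > 1:
        merged = [compose(layer[k], layer[k + 1])
                  for k in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:  # odd element out is carried to the next layer
            merged.append(layer[-1])
        layer = merged
    return layer[0]


def parity(p: Perm) -> int:
    """0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(p)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])
    return inversions % 2


if __name__ == "__main__":
    swaps = [swap(3, 0, 1), swap(3, 1, 2), swap(3, 0, 1)]
    print(track_sequential(swaps, 3), track_tree(swaps, 3))
    print(parity(track_tree(swaps, 3)), len(swaps) % 2)
```

Because every swap of two distinct positions flips parity, the parity of the final state equals the number of swaps modulo 2. That makes parity computable without tracking the full state, and knowing it rules out half of the possible final orderings, which is the sense in which it can "partially prune" the output space.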

Implications and Limitations

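• Implications:
  ◦ Shows that transformer LMs can learn efficient and interpretable state-tracking mechanisms.
  ◦ Identifies the two main mechanisms that LMs use for state tracking.
  ◦ Presents a way to control an LM's state-tracking mechanism through intermediate training tasks.
  ◦ Suggests that the simple task of permutation composition can help explain state-tracking mechanisms in a range of more complex tasks.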

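• Limitations:
  ◦ The study is limited to one specific task, permutation composition. Whether the findings generalize to other kinds of tasks requires further research.
  ◦ State-tracking mechanisms other than the two presented may exist.
  ◦ A deeper analysis is needed of how the design of intermediate training tasks affects an LM's state-tracking mechanism.
  ◦ Further research is needed on whether these state-tracking mechanisms apply to complex real-world problems.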
