Daily Arxiv

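This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.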

Language Models are Injective and Hence Invertible

Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models

Latent Diffusion Model without Variational Autoencoder

Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning

CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions

Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion

LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding

MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering

Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts

Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations

Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs

ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding

Max It or Miss It: Benchmarking LLM On Solving Extremal Problems

Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging

When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models

The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers

A Vision for Access Control in LLM-based Agent Systems

Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models

Formally Verified Certification of Unsolvability of Temporal Planning Problems

DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction

MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation

Synthetic Series-Symbol Data Generation for Time Series Foundation Models

SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation

Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy

A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI

Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs

Creative synthesis of kinematic mechanisms

Market-Driven Subset Selection for Budgeted Training

Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations

A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation

TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting

Learning Generalizable Shape Completion with SIM(3) Equivariance

Dolphin v1.0 Technical Report

A Measurement Study of Model Context Protocol Ecosystem

Diffusion Models are Kelly Gamblers

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility

Semantic Representation Attack against Aligned Large Language Models

Chiplet-Based RISC-V SoC with Modular AI Acceleration

Accurate and Efficient Low-Rank Model Merging in Core Space

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

Graph Coloring for Multi-Task Learning

Robust LLM Training Infrastructure at ByteDance

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Communications to Circulations: Real-Time 3D Wind Field Prediction Using 5G GNSS Signals and Deep Learning

Why and How Auxiliary Tasks Improve JEPA Representations

Creativity Benchmark: A benchmark for marketing creativity for large language models

SpikingBrain: Spiking Brain-inspired Large Models

Robust Pan-Cancer Mitotic Figure Detection with YOLOv12

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection

Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI

ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments

Interpretable Decision-Making for End-to-End Autonomous Driving

A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs

Limitations of Normalization in Attention Mechanism

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5

CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models

VGGSounder: Audio-Visual Evaluations for Foundation Models

Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches

CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment

FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches

A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification

From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models

ReDi: Rectified Discrete Flow

Adaptive Policy Synchronization for Scalable Reinforcement Learning

From Sequence to Structure: Uncovering Substructure Reasoning in Transformers

Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation

Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences

DP-Fusion: Token-Level Differentially Private Inference for Large Language Models

AI-Generated Video Detection via Perceptual Straightening

From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging

Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning

ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM

GeNIE: A Generalizable Navigation System for In-the-Wild Environments

From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary

Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Code Execution as Grounded Supervision for LLM Reasoning

Denoising the Future: Top-p Distributions for Moving Through Time

HauntAttack: When Attack Follows Reasoning as a Shadow

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching

KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision

SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

VERINA: Benchmarking Verifiable Code Generation

RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation

The quest for the GRAph Level autoEncoder (GRALE)

Efficient Large Language Model Inference with Neural Block Linearization

DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

Code Execution as Grounded Supervision for LLM Reasoning

Created by

Haebom

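Authors

Dongwon Jung, Wenxuan Zhou, Muhao Chen

Abstract

This paper proposes a scalable method for generating high-quality Chain of Thought (CoT) supervision data by exploiting the determinism of program execution, with the goal of improving the reasoning ability of large language models (LLMs). Rather than relying on human annotation or error-prone LLM-generated CoT, the method extracts verifiable step-by-step reasoning traces from code execution and converts them into natural-language CoT reasoning. Experiments on reasoning benchmarks across multiple domains show that the method effectively improves the transferable reasoning ability of LLMs on a variety of tasks. Ablation studies further confirm that the method produces highly accurate reasoning data and shortens overall token length during inference by reducing meaningless repetition and overthinking.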
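The idea of turning a deterministic execution trace into step-by-step reasoning can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the helper names `trace_execution` and `steps_to_cot`, the example function `sum_of_squares`, and the fixed sentence template are all hypothetical, and the template stands in for whatever trace-to-language conversion the paper actually uses, which the summary does not specify.

```python
import sys


def trace_execution(func, *args):
    """Run func(*args) and record which local variables changed between executed lines."""
    steps = []
    prev = {}
    code = func.__code__

    def tracer(frame, event, arg):
        nonlocal prev
        # Only follow the target function's own frame.
        if frame.f_code is not code:
            return None
        if event in ("line", "return"):
            current = dict(frame.f_locals)
            changed = {k: v for k, v in current.items()
                       if k not in prev or prev[k] != v}
            if changed:
                steps.append(changed)
            prev = current
        return tracer

    previous = sys.gettrace()  # restore any tracer that was already installed
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, steps


def steps_to_cot(steps, result):
    """Render recorded steps with a fixed template (a hypothetical stand-in for a real converter)."""
    lines = []
    for idx, changed in enumerate(steps, start=1):
        desc = ", ".join(f"{name} becomes {value!r}" for name, value in changed.items())
        lines.append(f"Step {idx}: {desc}.")
    lines.append(f"Therefore, the answer is {result!r}.")
    return "\n".join(lines)


def sum_of_squares(n):
    # Hypothetical example program whose execution is traced.
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


result, steps = trace_execution(sum_of_squares, 2)
cot = steps_to_cot(steps, result)
print(cot)
```

Because every step is read from the interpreter's actual state, each intermediate value in the generated text is correct by construction, which is the property the summary attributes to execution-grounded CoT data.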

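Takeaways, Limitations

• Takeaways:
  ◦ Generates reliable, accurate CoT supervision data by exploiting the determinism of program execution.
  ◦ Improves LLM reasoning ability across a variety of reasoning tasks.
  ◦ Validates the accuracy of the generated reasoning data and reduces token length during inference.
  ◦ Offers a scalable method that overcomes the limitations of human annotation and LLM-generated CoT.

• Limitations:
  ◦ The paper's abstract does not mention specific limitations (for example, restrictions on applicable tasks or difficulty with particular kinds of code structure).
  ◦ Information on concrete methodological constraints or factors that degrade performance is lacking.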
