Daily Arxiv

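This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.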

Self-Questioning Language Models

Beyond risk: A proto-framework for assessing the societal impact of AI systems

Supervised Dynamic Dimension Reduction with Deep Neural Network

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering

LLMs Have a Heart of Stone: Demystifying the Soft Thinking Ability of Large Reasoning Models

Industrial LLM-based Code Optimization under Regulation: A Mixture-of-Agents Approach

Reliable Evaluation Protocol for Low-Precision Retrieval

Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery

Tool-integrated Reinforcement Learning for Repo Deep Search

CauKer: classification time series foundation models can be pretrained on synthetic data only

Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment

DMSC: Dynamic Multi-Scale Coordination Framework for Time Series Forecasting

HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents

Entity Representation Learning Through Onsite-Offsite Graph for Pinterest Ads

Evaluating User Experience in Conversational Recommender Systems: A Systematic Review Across Classical and LLM-Powered Approaches

Spatial-Frequency Aware for Object Detection in RAW Image

Learning Pivoting Manipulation with Force and Vision Feedback Using Optimization-based Demonstrations

NCCR: to Evaluate the Robustness of Neural Networks and Adversarial Examples

ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

EcoTransformer: Attention without Multiplication

Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

True Multimodal In-Context Learning Needs Attention to the Visual Context

Gauge Flow Models

Zero-Shot Neural Architecture Search with Weighted Response Correlation

The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting

A Comparative Study of Specialized LLMs as Dense Retrievers

Sign Spotting Disambiguation using Large Language Models

UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields

Thought Anchors: Which LLM Reasoning Steps Matter?

UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions

15,500 Seconds: Lean UAV Classification Using EfficientNet and Lightweight Fine-Tuning

AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale

On the Fundamental Impossibility of Hallucination Control in Large Language Models

Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR

Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators

CAIN: Hijacking LLM-Humans Conversations via Malicious System Prompts

Explain Less, Understand More: Jargon Detection via Personalized Parameter-Efficient Fine-tuning

What Lives? A meta-analysis of diverse opinions on the definition of life

A Generative Neural Annealer for Black-Box Combinatorial Optimization

GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders

CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering

Mjölnir: A Deep Learning Parametrization Framework for Global Lightning Flash Density

RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework

ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning

Beyond Wide-Angle Images: Structure-to-Detail Video Portrait Correction via Unsupervised Spatiotemporal Adaptation

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics

Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models

The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Pull-Based Query Scheduling for Goal-Oriented Semantic Communication

Accelerating Focal Search in Multi-Agent Path Finding with Tighter Lower Bounds

RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks

UltraSTF: Ultra-Compact Model for Large-Scale Spatio-Temporal Forecasting

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

Foundation Model of Electronic Medical Records for Adaptive Risk Estimation

Tool Unlearning for Tool-Augmented LLMs

Vision without Images: End-to-End Computer Vision from Single Compressive Measurements

How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias

3DTTNet: Multimodal Fusion-Based 3D Traversable Terrain Modeling for Off-Road Environments

DOGR: Towards Versatile Visual Document Grounding and Referring

Real-World Offline Reinforcement Learning from Vision Language Model Feedback

Causality-Driven Audits of Model Robustness

AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context

Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection

Pyhgf: A neural network library for predictive coding

Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated

AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity

Parse Trees Guided LLM Prompt Compression

One Model, Any Conjunctive Query: Graph Neural Networks for Answering Queries over Incomplete Knowledge Graphs

A Value Based Parallel Update MCTS Method for Multi-Agent Cooperative Decision Making of Connected and Automated Vehicles

Fairness Definitions in Language Models Explained

CityLight: A Neighborhood-inclusive Universal Model for Coordinated City-scale Traffic Signal Control

Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

Long-Term Visual Object Tracking with Event Cameras: An Associative Memory Augmented Tracker and A Benchmark Dataset

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

From Cluster Assumption to Graph Convolution: Graph-based Semi-Supervised Learning Revisited

Environmental Sound Classification on An Embedded Hardware Platform

Data Dependency Inference for Industrial Code Generation Based on UML Sequence Diagrams

InqEduAgent: Adaptive AI Learning Partners with Gaussian Process Augmentation

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Higher Gauge Flow Models

Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models

IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks

SLR: Automated Synthesis for Scalable Logical Reasoning

The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

Learning to Inference Adaptively for Multimodal Large Language Models

Efficient rule induction by ignoring pointless rules

Why the Agent Made that Decision: Contrastive Explanation Learning for Reinforcement Learning

Evaluating Detection Thresholds: The Impact of False Positives and Negatives on Super-Resolution Ultrasound Localization Microscopy

On the Fundamental Impossibility of Hallucination Control in Large Language Models

Created by

Haebom

Author

Michał P. Karpowicz

Abstract

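This paper presents a fundamental impossibility theorem: no large language model (LLM) capable of non-trivial knowledge aggregation can simultaneously achieve truthful (internally consistent) knowledge representation, semantic information conservation, complete revelation of relevant knowledge, and knowledge-constrained optimality. The impossibility is not an engineering limitation; it arises from the mathematical structure of information aggregation itself. The result is established by describing inference as an auction of ideas, in which distributed components compete, using their partial knowledge, to shape the response. The proof spans three independent mathematical domains: mechanism design theory (Green-Laffont), the theory of proper scoring rules (Savage), and direct architectural analysis of transformers (Log-Sum-Exp convexity). Specifically, the paper shows that in strictly concave settings the score of an aggregate of diverse beliefs strictly exceeds the sum of the individual scores. That gap can quantify the creation of unattributable certainty, or overconfidence, which is the mathematical origin of both hallucination and creativity (or imagination). To support this analysis, the paper introduces two complementary concepts, the semantic information measure and the emergence operator, to model bounded reasoning in a general setting. It proves that bounded reasoning generates accessible information that provides useful insights and inspiration, whereas idealized reasoning strictly preserves semantic content. By showing that hallucination and imagination are mathematically the same phenomenon, grounded in a necessary violation of information conservation, the paper offers a principled foundation for managing these behaviors in advanced AI systems. Finally, it presents several speculative ideas for evaluating and refining the proposed theory.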
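The claim about strictly concave settings can be illustrated numerically. The sketch below is not the paper's construction: it assumes the logarithmic scoring rule as the strictly concave score, assumes beliefs are pooled by a weighted average, and compares the score of the pooled belief with the weighted average of the individual scores (an assumption standing in for the "sum" in the abstract). The function names `log_score`, `jensen_gap`, and `log_sum_exp` are hypothetical. The last part checks midpoint convexity of Log-Sum-Exp on two arbitrary logit vectors.

```python
import math


def log_score(p_true: float) -> float:
    """Logarithmic scoring rule: score of a forecast that assigned
    probability p_true to the outcome that actually occurred."""
    return math.log(p_true)


def jensen_gap(beliefs, weights):
    """Score of the pooled belief minus the weighted average of the
    individual scores. Non-negative by Jensen's inequality because
    log is concave; strictly positive when the beliefs differ."""
    pooled = sum(w * p for w, p in zip(weights, beliefs))
    score_of_aggregate = log_score(pooled)
    average_of_scores = sum(w * log_score(p) for w, p in zip(weights, beliefs))
    return score_of_aggregate - average_of_scores


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(z))) over a list of logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


# Three components with diverse beliefs about the true outcome (illustrative values).
beliefs = [0.9, 0.5, 0.1]
weights = [1 / 3, 1 / 3, 1 / 3]
gap = jensen_gap(beliefs, weights)
print(f"gap with diverse beliefs: {gap:.4f}")

# When every component holds the same belief, the gap vanishes.
no_gap = jensen_gap([0.5, 0.5], [0.5, 0.5])
print(f"gap with identical beliefs: {no_gap:.4f}")

# Midpoint convexity of Log-Sum-Exp on two arbitrary logit vectors.
a = [2.0, 0.0, -1.0]
b = [-1.0, 0.0, 3.0]
mid = [(x + y) / 2 for x, y in zip(a, b)]
lse_mid = log_sum_exp(mid)
lse_avg = (log_sum_exp(a) + log_sum_exp(b)) / 2
print(f"LSE(midpoint) = {lse_mid:.4f}, average of LSEs = {lse_avg:.4f}")
```

Under these assumptions the gap is positive whenever the beliefs disagree and zero when they coincide. This is the kind of surplus the abstract associates with unattributable certainty.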

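Takeaways, Limitations

• Takeaways:

◦ By identifying the mathematical origin of hallucination and creativity in LLMs, the paper provides a principled foundation for understanding and managing these phenomena.

◦ It introduces new concepts, such as the semantic information measure and the emergence operator, and presents a new framework for modeling bounded reasoning.

◦ It models the knowledge aggregation process of LLMs as an auction of ideas, offering a new analytical perspective.

• Limitations:

◦ The proposed theory still has speculative aspects and needs further experimental validation.

◦ Further research is needed on applying the theory to real LLM systems and on how far it generalizes.

◦ Further research is needed on concrete methods, grounded in the proposed theory, for effectively managing hallucination and creativity in LLMs.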
