Daily Arxiv

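This page collects papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.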

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Dynaword: From One-shot to Continuously Developed Datasets

Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems

Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy

BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability

SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy

Managing Escalation in Off-the-Shelf Large Language Models

FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Memorization in Fine-Tuned Large Language Models

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?

Post-Completion Learning for Language Models

Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content

Equivariant Volumetric Grasping

SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation

TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Principled Foundations for Preference Optimization

Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

S2FGL: Spatial Spectral Federated Graph Learning

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints

Causally Steered Diffusion for Automated Video Counterfactual Generation

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

ProRefine: Inference-Time Prompt Refinement with Textual Feedback

SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design

MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

All-optical temporal integration mediated by subwavelength heat antennas

GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

Multilingual Performance Biases of Large Language Models in Education

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis

Efficient Generative Model Training via Embedded Representation Warmup

Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging

Spectral Architecture Search for Neural Network Models

Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

Augmented Adversarial Trigger Learning

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness

PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset

DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping

Entropy-Lens: The Information Signature of Transformer Computations

CAMEF: Causal-Augmented Multi-Modality Event-Driven Financial Forecasting by Integrating Time Series Patterns and Salient Macroeconomic Announcements

Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models

Average-Reward Soft Actor-Critic

Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Effective AGM Belief Contraction: A Journey beyond the Finitary Realm (Technical Report)

Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks

KCR: Resolving Long-Context Knowledge Conflicts via Reasoning in LLMs

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis

The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams

Modeling Deontic Modal Logic in the s(CASP) Goal-directed Predicate Answer Set Programming System

Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization

Curious Causality-Seeking Agents Learn Meta Causal World

Created by

Haebom

Authors

Zhiyu Zhao, Haoxuan Li, Haifeng Zhang, Jun Wang, Francesco Faccio, Jürgen Schmidhuber, Mengyue Yang

Summary

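This paper questions the common assumption in world modeling that the causal laws underlying an environment are single and invariant. In practice, because the observation window is narrow, a fixed underlying mechanism often appears as a changing causal mechanism, so even subtle changes in the policy or the environment state can alter the observed causal mechanism. To address this, the paper proposes a world model called the **Meta-Causal Graph**, a minimal unified representation that efficiently encodes how the causal structure changes with the latent world state. A Meta-Causal Graph consists of multiple causal subgraphs, each activated by a meta state in the latent state space. Building on this representation, the paper presents a **Causality-Seeking Agent** that (1) identifies the meta state that activates each subgraph, (2) discovers the corresponding causal relationships through a curiosity-driven intervention policy, and (3) iteratively refines the Meta-Causal Graph through continued curiosity-driven exploration and the agent's experience. Experiments on synthetic tasks and robot arm manipulation tasks show that the proposed method robustly captures changes in causal dynamics and generalizes effectively to previously unseen contexts.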
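The structure described above, one causal subgraph per meta state, can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the paper's implementation: the class name `MetaCausalGraph`, the threshold rule in `meta_state_of`, the scalar latent state, and the variable names (`gripper_force`, `block_position`, and so on) are all hypothetical, and the curiosity-driven intervention policy is replaced here by hand-written calls to `record_edge`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

Edge = Tuple[str, str]  # (cause, effect)


@dataclass
class MetaCausalGraph:
    """Toy container: one causal subgraph (a set of edges) per meta state."""

    # Hypothetical mapping from a latent state to a meta-state label.
    meta_state_of: Callable[[float], Hashable]
    subgraphs: Dict[Hashable, FrozenSet[Edge]] = field(default_factory=dict)

    def active_subgraph(self, latent_state: float) -> FrozenSet[Edge]:
        """Return the causal edges active under the given latent state."""
        key = self.meta_state_of(latent_state)
        return self.subgraphs.get(key, frozenset())

    def record_edge(self, latent_state: float, cause: str, effect: str) -> None:
        """Add an edge to the subgraph of the meta state that was active
        when the relationship was observed."""
        key = self.meta_state_of(latent_state)
        self.subgraphs[key] = self.subgraphs.get(key, frozenset()) | {(cause, effect)}


def meta_state_of(latent: float) -> str:
    # Assumed rule for illustration: a scalar latent above 0.5 means "contact".
    return "contact" if latent >= 0.5 else "free"


mcg = MetaCausalGraph(meta_state_of)
# Stand-ins for relationships an agent might discover through interventions.
mcg.record_edge(0.9, "gripper_force", "block_position")
mcg.record_edge(0.8, "gripper_force", "block_velocity")
mcg.record_edge(0.1, "gripper_force", "gripper_position")

print(sorted(mcg.active_subgraph(0.7)))
print(sorted(mcg.active_subgraph(0.2)))
```

In this sketch the same intervention variable has different effects depending on the meta state, which is the kind of shift in observed causal structure that the summary attributes to a narrow observation window.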

Takeaways, Limitations

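• Takeaways:

◦ Presents a world model that operates robustly even in environments whose causal relationships change.

◦ The Meta-Causal Graph offers a new way to represent varied causal structures efficiently.

◦ The agent's curiosity-driven exploration allows the world model to be improved continuously.

◦ The effectiveness of the proposed method is validated experimentally on synthetic tasks and robot arm manipulation tasks.

• Limitations:

◦ Further research is needed on how meta states are defined and identified.

◦ The scalability of the Meta-Causal Graph in high-dimensional, complex environments needs to be examined.

◦ Ways to make the agent's curiosity-driven intervention policy more efficient need to be studied.

◦ Because of the constraints of the experimental environments, generalization performance needs further verification.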
