Daily Arxiv
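This page collects artificial intelligence papers published around the world.
The summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.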
Structure Transfer: an Inference-Based Calculus for the Transformation of Representations
Ensemble of Pathology Foundation Models for MIDOG 2025 Track 2: Atypical Mitosis Classification
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
Understanding Space Is Rocket Science - Only Top Reasoning Models Can Solve Spatial Understanding Tasks
DaMoC: Efficiently Selecting the Optimal Large Language Model for Fine-tuning Domain Tasks Based on Data and Model Compression
Modular Techniques for Synthetic Long-Context Data Generation in Language Model Training and Evaluation
EZhouNet: A framework based on graph neural network and anchor interval for the respiratory sound event detection
AImoclips: A Benchmark for Evaluating Emotion Conveyance in Text-to-Music Generation
TimeCopilot
First Order Model-Based RL through Decoupled Backpropagation
Pilot Study on Generative AI and Critical Thinking in Higher Education Classrooms
Beacon: Post-Training Quantization with Integrated Grid Selection
Is Artificial Intelligence Reshaping the Landscape of the International Academic Community of Geosciences?
Vectorized Attention with Learnable Encoding for Quantum Transformer
Transplant Then Regenerate: A New Paradigm for Text Data Augmentation
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
MultiGen: Child-Friendly Multilingual Speech Generator with LLMs
StreetViewAI: Making Street View Accessible Using Context-Aware Multimodal AI
Street-Level AI: Are Large Language Models Ready for Real-World Judgments?
The KG-ER Conceptual Schema Language
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Conditional Video Generation for High-Efficiency Video Compression
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP
Demographic-aware fine-grained classification of pediatric wrist fractures
An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
Stochastic Parameter Decomposition
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
MiniCPM4: Ultra-Efficient LLMs on End Devices
Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling
How Can I Publish My LLM Benchmark Without Giving the True Answers Away?
Optimization of Module Transferability in Single Image Super-Resolution: Universality Assessment and Cycle Residual Blocks
Transferable Mask Transformer: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
RBT4DNN: Requirements-based Testing of Neural Networks
Robust Offline Imitation Learning Through State-level Trajectory Stitching
Beyond holography: the entropic quantum gravity foundations of image processing
KNighter: Transforming Static Analysis with LLM-Synthesized Checkers
FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response
CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection
Rapid Word Learning Through Meta In-Context Learning
Image Embedding Sampling Method for Diverse Captioning
Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases?
Extended Histogram-based Outlier Score (EHBOS)
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models
Breaking the Context Bottleneck on Long Time Series Forecasting
Defending LVLMs Against Vision Attacks through Partial-Perception Supervision
ACING: Actor-Critic for Instruction Learning in Black-Box LLMs
Kolb-Based Experiential Learning for Generalist Agents with Human-Level Kaggle Data Science Performance
Quantifying Calibration Error in Neural Networks Through Evidence-Based Theory
Robust training of implicit generative models for multivariate and heavy-tailed distributions with an invariant statistical loss
Learning from 10 Demos: Generalisable and Sample-Efficient Policy Learning with Oriented Affordance Frames
AutoPETIII: The Tracer Frontier. What Frontier?
Long Input Sequence Network for Long Time Series Forecasting
FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
MTP: A Meaning-Typed Language Abstraction for AI-Integrated Programming
Diffusion on language model encodings for protein sequence generation
Style Transfer to Calvin and Hobbes comics using Stable Diffusion
Autonomation, Not Automation: Activities and Needs of European Fact-checkers as a Basis for Designing Human-Centered AI Systems
Plan Verification for LLM-Based Embodied Task Completion Agents
EigenBench: A Comparative Behavioral Measure of Value Alignment
Oyster-I: Beyond Refusal - Constructive Safety Alignment for Responsible Language Models
Extending FKG.in: Towards a Food Claim Traceability Network
DeepVIS: Bridging Natural Language and Data Visualization Through Step-wise Reasoning
Theory of Mind Using Active Inference: A Framework for Multi-Agent Cooperation
CP-Bench: Evaluating Large Language Models for Constraint Modelling
Axiomatics of Restricted Choices by Linear Orders of Sets with Minimum as Fallback
DMN-Guided Prompting: A Framework for Controlling LLM Behavior
Computational Basis of LLM's Decision Making in Social Simulation
Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers
Enhancing FKG.in: automating Indian food composition analysis
WASP: A Weight-Space Approach to Detecting Learned Spuriousness
Transferable Belief Model on Quantum Circuits
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions
Intelligence Primer
ChronoGraph: A Real-World Graph-Based Multivariate Time Series Dataset
Delta Activations: A Representation for Finetuned Large Language Models
DEXOP: A Device for Robotic Transfer of Dexterous Human Manipulation
Towards a Unified View of Large Language Model Post-Training
No Thoughts Just AI: Biased LLM Recommendations Limit Human Agency in Resume Screening
IPA: An Information-Preserving Input Projection Framework for Efficient Foundation Model Adaptation
SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer
Parking Availability Prediction via Fusing Multi-Source Data with A Self-Supervised Learning Enhanced Spatio-Temporal Inverted Transformer
PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation
AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
From Editor to Dense Geometry Estimator
Decoupled Entity Representation Learning for Pinterest Ads Ranking
Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models
HumAIne-Chatbot: Real-Time Personalized Conversational AI via Reinforcement Learning
Reinforcement Learning for Robust Ageing-Aware Control of Li-ion Battery Systems with Data-Driven Formal Verification
An Empirical Study of Vulnerabilities in Python Packages and Their Detection
How many patients could we save with LLM priors?
Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding
MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions
VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision
Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds
YOLO Ensemble for UAV-based Multispectral Defect Detection in Wind Turbine Components
Attention as an Adaptive Filter
TAGAL: Tabular Data Generation using Agentic LLM Methods
Enhancing Technical Documents Retrieval for RAG
Plan Verification for LLM-Based Embodied Task Completion Agents
Created by
Haebom
Authors
Ananth Hariharan, Vardhan Dongre, Dilek Hakkani-Tür, Gokhan Tur
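Abstract
This paper addresses the problem that large language model (LLM) based task plans for embodied AI, and the corresponding human demonstrations, can contain unnecessary actions, redundant navigation, and logical errors that degrade policy quality. To address this, the authors propose an iterative verification framework in which a Judge LLM critiques the action sequence and a Planner LLM applies the revisions, producing progressively cleaner and more spatially coherent trajectories. Unlike rule-based approaches, the method relies on natural language prompting, which allows it to generalize broadly across error types such as irrelevant actions, contradictions, and missing steps. On a manually annotated set of actions from the TEACh embodied AI dataset, the framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT-4-mini, DeepSeek-R1, Gemini 2.5, LLaMA 4 Scout). The refinement loop converges quickly, with 96.5% of sequences requiring at most three iterations, and it improves both temporal efficiency and the spatial organization of actions. Importantly, the method preserves human error-recovery patterns rather than collapsing them, which supports future research on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and action refinement, the work offers a scalable path to higher-quality training data for imitation learning in embodied AI.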
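The critique-and-revise loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Judge` and `Planner` interfaces, the `refine_plan` function, and the rule-based `toy_judge` and `toy_planner` stand-ins are all hypothetical, used here in place of real LLM calls so the example runs on its own. The cap of three iterations is borrowed from the convergence figure quoted above, purely as a default.

```python
from typing import Callable, List, Tuple

Plan = List[str]
# Hypothetical interfaces (assumptions, not from the paper):
# a judge returns the indices of actions it objects to,
# and a planner returns a revised plan given those indices.
Judge = Callable[[Plan], List[int]]
Planner = Callable[[Plan, List[int]], Plan]


def refine_plan(plan: Plan, judge: Judge, planner: Planner,
                max_iters: int = 3) -> Tuple[Plan, int]:
    """Alternate critique and revision until the judge raises no objections.

    Returns the final plan and the number of revisions applied. If the
    iteration cap is reached, the last revision is returned without being
    judged again.
    """
    current = list(plan)
    for revisions in range(max_iters):
        flagged = judge(current)
        if not flagged:
            return current, revisions
        current = planner(current, flagged)
    return current, max_iters


def toy_judge(plan: Plan) -> List[int]:
    """Stand-in for the Judge LLM: flag any action that repeats the previous one."""
    return [i for i in range(1, len(plan)) if plan[i] == plan[i - 1]]


def toy_planner(plan: Plan, flagged: List[int]) -> Plan:
    """Stand-in for the Planner LLM: drop the flagged actions."""
    drop = set(flagged)
    return [action for i, action in enumerate(plan) if i not in drop]


noisy_plan = [
    "go to fridge", "go to fridge", "open fridge",
    "pick up apple", "pick up apple", "close fridge",
]
refined, revisions = refine_plan(noisy_plan, toy_judge, toy_planner)
print(refined)    # the plan with repeated actions removed
print(revisions)  # how many revision rounds were applied
```

In a real system the two stand-ins would be replaced by prompted LLM calls; keeping them behind plain callables makes the loop itself easy to test in isolation.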
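Takeaways, Limitations
• Takeaways:
◦ An iterative plan verification framework built on LLMs can improve the quality of task plans for embodied AI.
◦ The natural language prompt-based approach generalizes across many types of errors.
◦ It improves temporal efficiency and the spatial organization of actions.
◦ It preserves human error-recovery patterns, which contributes to building robust systems.
◦ It provides a scalable way to generate high-quality training data for imitation learning.
• Limitations:
◦ The framework's performance may depend on the capability of the underlying LLM.
◦ Results are reported only on the TEACh dataset; generalization to other datasets requires further validation.
◦ Performance on complex tasks and exceptional situations requires further study.
◦ The method does not guarantee complete error removal; some errors may remain.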
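The recall and precision figures reported in the summary, and the caveat that some errors may remain, can be made concrete with a small sketch. It assumes that evaluation compares the set of action indices flagged by the judge against a manually annotated set of erroneous actions; the `precision_recall` helper and the example numbers below are illustrative assumptions, not data from the paper.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[int], annotated: Set[int]) -> Tuple[float, float]:
    """Compare flagged action indices with annotated error indices.

    Precision: share of flagged actions that are annotated errors.
    Recall: share of annotated errors that were flagged.
    Returning 1.0 for an empty denominator is a convention chosen here.
    """
    true_positives = len(flagged & annotated)
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(annotated) if annotated else 1.0
    return precision, recall


# Illustrative numbers only: 10 annotated errors, 9 of them flagged,
# and nothing flagged that was not annotated.
annotated_errors = set(range(10))
flagged_by_judge = set(range(9))
precision, recall = precision_recall(flagged_by_judge, annotated_errors)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, 100% precision means every flagged action was a genuine error, while recall below 100% means some annotated errors went unflagged, which matches the last limitation listed above.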