Daily Arxiv
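This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.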
Merge-of-Thought Distillation
OTESGN: Optimal Transport-Enhanced Syntactic-Semantic Graph Networks for Aspect-Based Sentiment Analysis
MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video Models
Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics
Symmetry-Guided Multi-Agent Inverse Reinforcement Learning
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Expert-Guided Explainable Few-Shot Learning for Medical Image Diagnosis
Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference
MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining
Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
Beyond the Pre-Service Horizon: Infusing In-Service Behavior for Improved Financial Risk Forecasting
On Synthesis of Timed Regular Expressions
TinyDef-DETR: A DETR-based Framework for Defect Detection in Transmission Lines from UAV Imagery
LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
From Vision to Validation: A Theory- and Data-Driven Construction of a GCC-Specific AI Adoption Index
A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
The Architecture of AI Transformation: Four Strategic Patterns and an Emerging Frontier
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
Deep Learning-Based Rock Particulate Classification Using Attention-Enhanced ConvNeXt
The Information Dynamics of Generative Diffusion
Data-Augmented Few-Shot Neural Stencil Emulation for System Identification of Computer Models
Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning
Pretrained Conformers for Audio Fingerprinting and Retrieval
Towards Scalable Training for Handwritten Mathematical Expression Recognition
To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA
Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
HiD-VAE: Interpretable Generative Recommendation via Hierarchical and Disentangled Semantic IDs
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
Villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
New Kid in the Classroom: Exploring Student Perceptions of AI Coding Assistants
Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test?
Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound
Uncertainty Estimation by Human Perception versus Neural Models
Persistent Homology of Topic Networks for the Prediction of Reader Curiosity
Task Matters: Knowledge Requirements Shape LLM Responses to Context-Memory Conflict
Crack Path Prediction with Operator Learning using Discrete Particle System data Generation
Diffusion Graph Neural Networks for Robustness in Olfaction Sensors and Datasets
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach
Combating Falsification of Speech Videos with Live Optical Signatures (Extended Version)
Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization
Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review
Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models
Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation
Entropy-Gated Branching for Efficient Test-Time Reasoning
SWI: Speaking with Intent in Large Language Models
Byzantine-Robust Federated Learning Using Generative Adversarial Networks
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue
V-HOP: Visuo-Haptic 6D Object Pose Tracking
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Knowledge-Guided Biomarker Identification for Label-Free Single-Cell RNA-Seq Data: A Reinforcement Learning Perspective
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond
RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
DeepVoting: Learning and Fine-Tuning Voting Rules with Canonical Embeddings
Rethinking Disentanglement under Dependent Factors of Variation
Discovering physical laws with parallel symbolic enumeration
Semantic Augmentation in Images using Language
Algorithmic Collusion by Large Language Models
A minimal coalition logic
Deep Reinforcement Learning for Inventory Networks: Toward Reliable Policy Optimization
Inconsistency Handling in Prioritized Databases with Universal Constraints: Complexity Analysis and Links with Active Integrity Constraints
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
CogGuide: Human-Like Guidance for Zero-Shot Omni-Modal Reasoning
TreeGPT: Pure TreeFFN Encoder-Decoder Architecture for Structured Reasoning Without Attention Mechanisms
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models
Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation
Optimizing Length Compression in Large Reasoning Models
LLMs for sensory-motor control: Combining in-context and iterative learning
Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics
Simulating Human-like Daily Activities with Desire-driven Autonomy
Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for Medicaid Care Management
Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations
Explaining Concept Drift through the Evolution of Group Counterfactuals
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Mechanistic Learning with Guided Diffusion Models to Predict Spatio-Temporal Brain Tumor Growth
Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
ObjectReact: Learning Object-Relative Control for Visual Navigation
Fluent but Unfeeling: The Emotional Blind Spots of Language Models
Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer's Disease Classification
An improved educational competition optimizer with multi-covariance learning operators for global optimization problems
Improving Video Diffusion Transformer Training by Multi-Feature Fusion and Alignment from Self-Supervised Vision Encoders
A modified RIME algorithm with covariance learning and diversity enhancement for numerical optimization
Towards Explainable Job Title Matching: Leveraging Semantic Textual Relatedness and Knowledge Graphs
Explainable AI for Accelerated Microstructure Imaging: A SHAP-Guided Protocol on the Connectome 2.0 scanner
Incorporating AI Incident Reporting into Telecommunications Law and Policy: Insights from India
OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection
Prompt Pirates Need a Map: Stealing Seeds helps Stealing Prompts
Resource-Efficient Glioma Segmentation on Sub-Saharan MRI
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
We're Still Doing It (All) Wrong: Recommender Systems, Fifteen Years Later
LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization
ObjectReact: Learning Object-Relative Control for Visual Navigation
Created by
Haebom
Authors
Sourav Garg, Dustin Craggs, Vineeth Bhat, Lachlan Mares, Stefan Podgorski, Madhava Krishna, Feras Dayoub, Ian Reid
Abstract
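Visual navigation using only a single camera and a topological map has emerged as an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved with an "image-relative" approach, which estimates control from a pair consisting of the current observation image and a subgoal image. However, images are strictly tied to the agent's pose and embodiment, so image-level representations of the world are limited. Objects, in contrast, are a property of the map and therefore provide a world representation that is independent of embodiment and trajectory. This work introduces a new paradigm of "object-relative" control learning that exhibits several desirable characteristics. b) The control prediction problem can be decoupled from solving the image matching problem. c) A high degree of invariance can be achieved to changes between training and testing and between mapping and execution settings. The authors propose a topometric map representation in the form of a "relative" 3D scene graph, which yields more informative object-level global path planning costs. They train a local controller, called "ObjectReact", conditioned on a high-level "WayObject Costmap" representation that requires no explicit RGB input. The advantages of object-relative control learning over image-relative control are shown under sensor height variations and across multiple navigation tasks that challenge basic spatial understanding (for example, navigating a map trajectory in the reverse direction). The authors also show that a simulation-only policy generalizes well to real indoor environments.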
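To make the idea of object-level path planning costs and a costmap without RGB input more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the map is a weighted graph over object nodes, that each currently visible object has already been matched to a map node, and that the costmap is built by painting each object's pixels with its node's shortest-path cost to the goal. The function names `path_costs_to_goal` and `build_costmap`, the toy graph, and the pixel-list mask format are all hypothetical.

```python
import heapq


def path_costs_to_goal(edges, goal):
    """Run Dijkstra from the goal over a weighted object graph.

    edges: dict mapping node -> list of (neighbor, weight) pairs (assumed format).
    Returns a dict mapping each reachable node to its shortest path cost to goal.
    """
    costs = {goal: 0.0}
    queue = [(0.0, goal)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, weight in edges.get(node, []):
            new_cost = cost + weight
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs


def build_costmap(masks, matches, costs, height, width, unknown=float("inf")):
    """Paint each visible object's pixels with the path cost of its matched map node.

    masks: dict mapping a visible object id -> list of (row, col) pixels (assumed format).
    matches: dict mapping a visible object id -> matched map node.
    Pixels with no matched object keep the `unknown` value.
    """
    costmap = [[unknown] * width for _ in range(height)]
    for obj_id, pixels in masks.items():
        node = matches.get(obj_id)
        if node is None or node not in costs:
            continue  # unmatched or unreachable objects carry no cost signal
        for row, col in pixels:
            costmap[row][col] = min(costmap[row][col], costs[node])
    return costmap


# Toy example: three map objects in a chain, with the goal at "door".
edges = {
    "chair": [("table", 1.0)],
    "table": [("chair", 1.0), ("door", 2.0)],
    "door": [("table", 2.0)],
}
costs = path_costs_to_goal(edges, "door")

# Two objects are visible in a 2x3 image and matched to map nodes.
masks = {0: [(0, 0), (0, 1)], 1: [(1, 2)]}
matches = {0: "chair", 1: "table"}
costmap = build_costmap(masks, matches, costs, height=2, width=3)
```

In this sketch a controller would only see `costmap`, so lower-cost regions indicate objects closer to the goal along the map, regardless of the viewpoint from which the map was recorded. How the actual WayObject Costmap encodes costs is not described in this summary.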
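Takeaways, Limitations
• Takeaways:
◦ Presents a new object-relative control paradigm that enables robust visual navigation across varied environments using only a single camera and a topological map.
◦ Decouples the image matching problem from the control prediction problem, which allows a more efficient and robust navigation system.
◦ Generalizes well under conditions such as sensor height changes and reverse-direction navigation.
◦ A policy trained only in simulation transfers well to real indoor environments.
• Limitations:
◦ Performance depends heavily on the accuracy and completeness of the topological map; an inaccurate or incomplete map can degrade navigation performance.
◦ In complex, cluttered environments, difficulties in object recognition and tracking may affect navigation performance.
◦ Generalization to real-world environments needs further testing and evaluation across more environments and lighting conditions.
◦ Code and supplementary material are provided, but detailed descriptions of real-world implementation and deployment may be lacking.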