[공지사항]을 빙자한 안부와 근황
Show more
/
/
Daily Arxiv
Daily Arxiv
世界中で発行される人工知能関連の論文をまとめるページです。
このページはGoogle Geminiを活用して要約し、非営利で運営しています。
論文の著作権は著者および関連機関にあり、共有する際は出典を明記してください。
Merge Kernel for Bayesian Optimization on Permutation Space
Demographic-aware fine-grained classification of pediatric wrist fractures
Generative Multi-Target Cross-Domain Recommendation
ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models
A Simple Baseline for Stable and Plastic Neural Networks
WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
How Not to Detect Prompt Injections with an LLM
Critiques of World Models
The role of large language models in UI/UX design: A systematic literature review
LearnLens: LLM-Enabled Personalised, Curriculum-Grounded Feedback with Educators in the Loop
STACK: Adversarial Attacks on LLM Safeguard Pipelines
ZonUI-3B: A Lightweight Vision-Language Model for Cross-Resolution GUI Grounding
Understanding Reasoning in Thinking Language Models via Steering Vectors
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation
TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Exploring Graph Representations of Logical Forms for Language Modeling
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors
Hands-On: Segmenting Individual Signs from Continuous Sequences
Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks?
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation
AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results
An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model
Evaluating link prediction: New perspectives and recommendations
Learning to Reason at the Frontier of Learnability
Stonefish: Supporting Machine Learning Research in Marine Robotics
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
On the Transfer of Knowledge in Quantum Algorithms
Code Readability in the Age of Large Language Models: An Industrial Case Study from Atlassian
Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
ASTRID - An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Consistency of Responses and Continuations Generated by Large Language Models on Social Media
From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study
Temporal reasoning for timeline summarisation in social media
Invisible Textual Backdoor Attacks based on Dual-Trigger
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Two-Stage Pretraining for Molecular Property Prediction in the Wild
Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges
An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots
Bridging Local and Global Knowledge via Transformer in Board Games
Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Visual Grounding Methods for Efficient Interaction with Desktop Graphical User Interfaces
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation
SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings
Improved DDIM Sampling with Moment Matching Gaussian Mixtures
Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges
From Roots to Rewards: Dynamic Tree Reasoning with RL
Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light
Instance space analysis of the capacitated vehicle routing problem
Multi-Agent LLMs as Ethics Advocates for AI-Based Systems
GATSim: Urban Mobility Simulation with Generative Agents
Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Strategic Reflectivism In Intelligent Systems
SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator
What the F*ck Is Artificial General Intelligence?
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems
UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis
Toward Temporal Causal Representation Learning with Tensor Decomposition
Kolmogorov Arnold Networks (KANs) for Imbalanced Data - An Empirical Perspective
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) トラック
Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits
Edge Intelligence with Spiking Neural Networks
VLA-Mark: A cross modal watermark for large vision-language alignment model
Noradrenergic-inspired gain modulation attenuates the stability gap in joint training
A multi-strategy improved snake optimizer for 3-dimensional UAV path planning and engineering problems
Photonic Fabric Platform for AI Accelerators
OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
A segmented robot grasping perception neural network for edge AI
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation
Exploiting Primacy Effect To Improve Large Language Models
Generalist Forecasting with Frozen Video Models via Latent Diffusion
Convergent transformations of visual representation in brains and models
Preprint: Did I Just Browse A Website Written by LLMs?
The Levers of Political Persuasion with Conversational AI
Political Leaning and Politicalness Classification of Texts
Self-supervised learning on gene expression data
Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
Load more
EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation
Created by
Haebom
作者
Bingqian Lin, Yunshuang Nie, Khun Loun Zai, Ziming Wei, Mingfei Han, Rongtao Xu, Minzhe Niu, Jianhua Han, Liang Lin, Cewu Lu, Xiaodan Liang
概要
本論文は,自然言語命令に従って道を見つけるVision-Language Navigation(VLN)エージェントを構築する研究を扱う。最近の研究は、オープンソース大規模言語モデル(LLM)の推論能力を活用して探索性能を改善し、LLMの訓練データとVLN作業との間のドメインギャップを同時に緩和する可能性を示した。しかしながら、既存のアプローチは主に直接的な入力 - 出力マッピング方式を採用しており、マッピング学習が困難であり、探索決定が説明不可能であるという欠点がある。本論文では、LLMベースのVLNを向上させるための新しい自己改善型実装推論フレームワークであるEvolveNavを提案する。 EvolveNavは、形式化されたChain-of-Thought監督の微調整と自己反射的ポストトレーニングの2段階で構成されています。最初のステップでは、フォーマットされたCoTラベルを使用してモデルのナビゲーション推論能力を有効にし、推論を高速化します。 2番目のステップでは、モデルの自己推論出力を独自に豊富に作成したCoTラベルで繰り返しトレーニングし、監督の多様性を向上させます。誤った推論パターンとは対照的に、正しい推論パターンの学習を促進するために、自己反射的支援作業も導入される。実験結果は、人気のあるVLNベンチマークでは、EvolveNavが以前のLLMベースのVLNアプローチより優れていることを示しています。
Takeaways、Limitations
•
Takeaways:
◦
LLMベースのVLNにおける推論能力の向上とナビゲーション精度の向上に貢献する新しいフレームワーク(EvolveNav)の提示。
◦
形式化されたCoTラベルと自己反復的ポストトレーニングによる効果的な学習戦略の提示
◦
自己反射的補助作業による正しい推論パターン学習の導出
◦
従来のLLMベースのVLNアプローチより優れた性能を実証
•
Limitations:
◦
ナビゲーション操作の複雑さにより、完全なCoTラベルを取得することが困難になる可能性があり、純粋なCoT監督の微調整によって過適合が発生する可能性があります。
◦
提案されたフレームワークの一般化性能の追加検証が必要です。
◦
さまざまな環境と複雑なナビゲーション課題に対するロバースト性評価が必要です。
PDFを見る
Made with Slashpage