Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Created by
  • Haebom

Authors

Liuyi Wang, Xinyuan Xia, Hui Zhao, Hanqing Wang, Tai Wang, Yilun Chen, Chengju Liu, Qijun Chen, Jiangmiao Pang

VLN-PE: Physically Realistic Vision-and-Language Navigation Platform

Outline

To close the gap between existing vision-and-language navigation (VLN) research and real robots, this paper introduces VLN-PE, a physically realistic VLN platform that supports humanoid, quadrupedal, and wheeled robots. Using VLN-PE, the authors systematically evaluate several ego-centric VLN methods: classification models for single-step discrete action prediction, a diffusion model for dense waypoint prediction, and a training-free, map-based large language model (LLM) integrated with path planning.
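
To make the evaluated setup concrete, here is a minimal Python sketch of the episode loop a physics-enabled VLN platform of this kind exposes, with single-step discrete action prediction as the policy interface. Every name in it (VLNEnv, RandomAgent, the four-action set) is a hypothetical illustration, not the actual VLN-PE API.

```python
import random
from dataclasses import dataclass

ACTIONS = ["FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP"]  # discrete action space

@dataclass
class Observation:
    rgb: list        # placeholder for the ego-centric camera frame
    done: bool       # episode over (STOP issued, step budget hit, collision, fall)
    success: bool    # robot stopped within the goal region

class VLNEnv:
    """Toy stand-in for a physics-enabled simulator hosting one embodiment."""

    def __init__(self, embodiment: str):
        self.embodiment = embodiment  # e.g. "humanoid", "quadruped", "wheeled"
        self.steps = 0

    def reset(self, instruction: str) -> Observation:
        self.instruction = instruction
        self.steps = 0
        return Observation(rgb=[], done=False, success=False)

    def step(self, action: str) -> Observation:
        self.steps += 1
        done = action == "STOP" or self.steps >= 50
        # A real platform would simulate physics here and flag collisions or
        # falls; this toy env just ends the episode with a random outcome.
        return Observation(rgb=[], done=done,
                           success=done and random.random() < 0.5)

class RandomAgent:
    """Baseline that ignores its inputs; a real agent conditions on rgb + text."""

    def act(self, obs: Observation, instruction: str) -> str:
        return random.choice(ACTIONS)

def success_rate(agent, env: VLNEnv, instructions) -> float:
    """Run one episode per instruction and return the fraction that succeed."""
    wins = 0
    for text in instructions:
        obs = env.reset(text)
        while not obs.done:
            # Single-step discrete action prediction: one ego-centric view
            # plus the instruction in, one discrete action out, per tick.
            obs = env.step(agent.act(obs, text))
        wins += obs.success
    return wins / len(instructions)

if __name__ == "__main__":
    env = VLNEnv("quadruped")
    print(success_rate(RandomAgent(), env, ["go to the kitchen",
                                            "turn left at the sofa"]))
```

A diffusion-based waypoint predictor or an LLM planner would plug into the same loop by swapping agent.act for a policy that outputs dense waypoints or a planned path rather than one discrete action per tick.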

Takeaways, Limitations

Performance degrades significantly due to the robot's limited observation space, environmental lighting variations, and physical challenges such as collisions and falls.
The experiments also expose the locomotion constraints of legged robots in complex environments.
VLN-PE seamlessly integrates new scenes beyond MP3D, enabling more comprehensive VLN evaluation.
Current models generalize poorly in real-world deployment.
VLN-PE offers a novel way to improve cross-embodiment adaptability.
The results and tools of this study support rethinking the limitations of VLN and developing more robust, practical VLN models.