Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Goal-Based Vision-Language Driving

Created by
  • Haebom

Author

Santosh Patapati, Trisanth Srinivasan

Outline

NovaDrive is a single-branch vision-language architecture for autonomous driving in complex situations that processes front-camera images, HD-map tiles, LiDAR depth, and text-based waypoints in one branch. A lightweight two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Combined with a novel smoothing loss that discourages abrupt steering and velocity changes, this eliminates the need for recurrent memory. The top 15 layers of an 11B LLaMA-3.2 vision-language backbone are fine-tuned to enable real-time inference. On the nuScenes/Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises the success rate to 84% (+4%), improves path efficiency (SPL) to 0.66 (+0.11), and cuts collision frequency from 2.6% to 1.2% (-1.4%). These gains are attributed primarily to the waypoint tokens, partial VLM fine-tuning, and cross-attention fusion.
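The two ideas above can be sketched in code. The following is a minimal, hypothetical NumPy illustration (not the paper's implementation): a two-stage cross-attention fusion in which waypoint tokens first attend to HD-map tokens and the result then attends to image/depth patches, plus a smoothing loss that penalizes abrupt step-to-step changes in predicted steering and velocity. All function names, shapes, and the squared-first-difference form of the loss are assumptions for illustration.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative sketch)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                 # (Q, K) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ values                                # (Q, d) fused output

def two_stage_fusion(waypoint_tokens, map_tokens, image_depth_patches):
    """Stage 1: align waypoint tokens to HD-map tokens.
    Stage 2: refine the aligned tokens against fine-grained image/depth patches.
    """
    aligned = cross_attention(waypoint_tokens, map_tokens, map_tokens)
    refined = cross_attention(aligned, image_depth_patches, image_depth_patches)
    return refined

def smoothness_loss(actions, weight=1.0):
    """Penalize abrupt changes between consecutive predicted actions.

    actions: (T, 2) array of [steering, velocity] over T future steps.
    Squared first-order differences punish large step-to-step jumps;
    the exact form of the paper's smoothing loss may differ.
    """
    diffs = np.diff(actions, axis=0)                       # (T-1, 2) deltas
    return weight * float(np.mean(diffs ** 2))
```

A constant action sequence incurs zero smoothness penalty, while oscillating steering commands are penalized, which is the behavior the paper's loss is described as encouraging.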

Takeaways, Limitations

Takeaways:
Improved autonomous driving safety and efficiency (higher success rate, better route efficiency, fewer collisions).
Reduced fuel or battery usage thanks to shorter routes.
Potential for a lighter, more easily updatable driving stack.
Potential for extension to other embodied-AI domains.
Limitations:
Specific limitations cannot be determined from the information provided; reading the full paper would be required.