Daily Arxiv

This page organizes artificial intelligence papers published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy

Created by
  • Haebom

Authors

Hongze Tan, Jianfei Pan, Jinghao Lin, Tao Chen, Zhihang Zheng, Zhihao Tang, Haihua Yang

Outline

Reinforcement learning (RL) plays a crucial role in improving the reasoning performance of large language models (LLMs). However, existing algorithms use a coarse credit assignment scheme that applies the same reward uniformly to every token in a sequence, a critical flaw in long-chain reasoning tasks. To address this, the paper proposes Dynamic Entropy Weighting, a mechanism that enables fine-grained reward shaping through two new algorithms: Group Token Policy Optimization (GTPO) and Sequence-Level GRPO (GRPO-S). The method is based on the hypothesis that high policy entropy along the reasoning path is a strong heuristic for cognitive effort at critical decision points. By incorporating policy entropy into reward shaping, the method achieves genuinely token-level credit assignment. Experiments show that both algorithms outperform the strong DAPO baseline, confirming that the entropy-weighting mechanism is a key driver of the performance gains.
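
To make the shaping idea concrete, the sketch below illustrates one plausible way to combine a GRPO-style group-relative advantage with entropy weighting at the token level (GTPO-like) and at the sequence level (GRPO-S-like). This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the within-sequence and within-group entropy normalizations, and the epsilon constants are illustrative choices, and the paper's exact formulas may differ.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style scalar advantage per sampled sequence:
    reward normalized against the group mean and std."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

def token_entropy(probs, eps=1e-12):
    """Policy entropy H_t = -sum_v p_t(v) log p_t(v) at each position.
    `probs` has shape (seq_len, vocab_size)."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def gtpo_advantages(rewards, group_probs, eps=1e-8):
    """Token-level shaping (GTPO-like sketch): scale each sequence's
    scalar advantage by that token's entropy, normalized within the
    sequence, so high-entropy 'decision' tokens receive more credit."""
    seq_adv = group_relative_advantages(rewards, eps)
    shaped = []
    for a, probs in zip(seq_adv, group_probs):
        h = token_entropy(probs)
        w = h / (h.mean() + eps)   # mean weight of 1 keeps the overall scale
        shaped.append(a * w)       # per-token advantage
    return shaped

def grpo_s_advantages(rewards, group_probs, eps=1e-8):
    """Sequence-level shaping (GRPO-S-like sketch): weight each sequence's
    scalar advantage by its mean token entropy (normalized across the
    group), then broadcast the same value to every token."""
    seq_adv = group_relative_advantages(rewards, eps)
    mean_h = np.array([token_entropy(p).mean() for p in group_probs])
    w = mean_h / (mean_h.mean() + eps)
    return [a * wi * np.ones(p.shape[0])
            for a, wi, p in zip(seq_adv, w, group_probs)]

# Toy usage: a group of 3 sampled sequences of different lengths,
# with random next-token distributions standing in for the policy.
rng = np.random.default_rng(0)
group_probs = [rng.dirichlet(np.ones(5), size=n) for n in (4, 6, 5)]
rewards = [1.0, 0.0, 0.5]
print(gtpo_advantages(rewards, group_probs))
print(grpo_s_advantages(rewards, group_probs))
```

In both variants the group-relative baseline is the standard GRPO advantage; the entropy weighting only redistributes credit, giving higher-entropy tokens (GTPO) or higher-entropy sequences (GRPO-S) a larger share of it.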

Takeaways, Limitations

Takeaways:
  • Proposes Dynamic Entropy Weighting, a novel mechanism that enables token-level credit assignment in LLM reasoning.
  • Realizes the mechanism through two algorithms, GTPO and GRPO-S.
  • Demonstrates superior performance over the strong DAPO baseline.
  • Presents a novel use of policy entropy as a reward-shaping signal.
Limitations:
  • The paper does not explicitly discuss its limitations.