Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models

Created by
  • Haebom

Authors

Yingming Zheng, Hanqi Li, Kai Yu, Lu Chen

Outline

Large Language Models (LLMs) have demonstrated impressive performance on natural language processing (NLP) tasks. As real-world applications increasingly demand long context windows, continual pretraining on long-context data followed by supervised fine-tuning (SFT) has become a common approach. While the impact of data length has been studied extensively for continual pretraining, its impact on SFT remains unclear. In this study, we systematically investigate how the length of SFT data affects LLM performance on short-context tasks. Counterintuitively, we find that long-context SFT improves short-context performance, contrary to the degradation typically observed with long-context pretraining. To uncover the underlying mechanism, we analyze the two main components, Multi-Head Attention (MHA) and the Feed-Forward Network (FFN), separately, and show that both independently benefit from long-context SFT. We further study their interaction and reveal a knowledge-preference bias: long-context SFT promotes reliance on contextual knowledge, while short-context SFT favors parametric knowledge, implying that relying solely on long-context SFT is suboptimal. Finally, we demonstrate that hybrid training mitigates these biases, providing interpretable guidance for fine-tuning LLMs.
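
The summary does not specify the exact hybrid-training recipe, but the core idea of mixing short- and long-context examples in a single SFT corpus can be sketched as follows. This is a minimal illustration: the function name, example schema, and the 50/50 mixing ratio are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of "hybrid" SFT data mixing: combine long-context and
# short-context examples in one training set so the model is not pushed
# exclusively toward contextual or parametric knowledge.
# The helper name, example schema, and default ratio are illustrative
# assumptions, not details from the paper.
import random
from typing import Dict, List


def build_hybrid_sft_set(
    short_examples: List[Dict[str, str]],
    long_examples: List[Dict[str, str]],
    n_total: int,
    long_fraction: float = 0.5,  # hypothetical ratio; the paper's optimal mix may differ
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample a mixed SFT dataset containing both short- and long-context examples."""
    rng = random.Random(seed)
    n_long = min(int(n_total * long_fraction), len(long_examples))
    n_short = min(n_total - n_long, len(short_examples))
    mixed = rng.sample(long_examples, n_long) + rng.sample(short_examples, n_short)
    rng.shuffle(mixed)  # interleave so each batch sees both context lengths
    return mixed


if __name__ == "__main__":
    # Toy data: short QA pairs vs. QA pairs wrapped in a long document context.
    short_data = [
        {"prompt": f"Q{i}: short question {i}?", "response": f"A{i}"} for i in range(100)
    ]
    long_data = [
        {"prompt": "<long document>\n" + f"Q{i}: answer from the document?", "response": f"A{i}"}
        for i in range(100)
    ]
    hybrid = build_hybrid_sft_set(short_data, long_data, n_total=50)
    n_long_ctx = sum("<long document>" in ex["prompt"] for ex in hybrid)
    print(len(hybrid), "examples,", n_long_ctx, "long-context")
```

In practice the mixed set would then be passed to a standard SFT trainer; the main design choice is the mixing ratio, and the paper's hybrid-training result suggests including both regimes rather than relying on long-context data alone.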

Takeaways, Limitations

Long-context SFT can improve short-context task performance.
Both MHA and FFN independently benefit from long-context SFT.
Long-context SFT biases the model toward contextual knowledge, while short-context SFT biases it toward parametric knowledge.
Hybrid training, mixing long- and short-context data, mitigates these biases.
The tasks evaluated may cover only a narrow range, so generalizability to other task types requires further investigation.