Daily Arxiv

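This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.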

Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions

On The Impact of Merge Request Deviations on Code Review Practices

Societal AI Research Has Become Less Interdisciplinary

Geometric deep learning for local growth prediction on abdominal aortic aneurysm surfaces

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation

KP-PINNs: Kernel Packet Accelerated Physics Informed Neural Networks

Teaching Physical Awareness to LLMs through Sounds

TGRPO :Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization

TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration

Your Agent Can Defend Itself against Backdoor Attacks

Learnable Spatial-Temporal Positional Encoding for Link Prediction

Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length

IGraSS: Learning to Identify Infrastructure Networks from Satellite Imagery by Iterative Graph-constrained Semantic Segmentation

STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation

Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting

Toward Reliable AR-Guided Surgical Navigation: Interactive Deformation Modeling with Data-Driven Biomechanics and Prompts

Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining

Vision Transformers Don't Need Trained Registers

Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations

AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking

Synthesis by Design: Controlled Data Generation via Structural Guidance

MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization

MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

Pre-trained Large Language Models Learn Hidden Markov Models In-context

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

A Reinforcement Learning Approach for RIS-aided Fair Communications

Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR

Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs

Context Is Not Comprehension: Unmasking LLM reasoning blind spots with VLO

HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model

Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025

GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation

Fourier-Modulated Implicit Neural Representation for Multispectral Satellite Image Compression

NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks

DeepMultiConnectome: Deep Multi-Task Prediction of Structural Connectomes Directly from Diffusion MRI Tractography

SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting

Large Language Models Miss the Multi-Agent Mark

Rethinking Text-based Protein Understanding: Retrieval or LLM?

Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models

Discovering Forbidden Topics in Language Models

LIFEBench: Evaluating Length Instruction Following in Large Language Models

Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Reciprocity as the Foundational Substrate of Society: How Reciprocal Dynamics Scale into Social Systems

LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification

Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective

Convert Language Model into a Value-based Strategic Planner

Griffin: Towards a Graph-Centric Relational Database Foundation Model

Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items

Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism

Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment

Assessment of Evolving Large Language Models in Upper Secondary Mathematics

TerraMind: Large-Scale Generative Multimodality for Earth Observation

LEMUR Neural Network Dataset: Towards Seamless AutoML

Style over Substance: Distilled Language Models Reason Via Stylistic Replication

Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition

Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification

FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts

Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring

Revisiting Self-Consistency from Dynamic Distributional Alignment Perspective on Answer Aggregation

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation

Lost in Sequence: Do Large Language Models Understand Sequential Recommendation?

Conformal Prediction as Bayesian Quadrature

On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis

Trustworthy AI: Safety, Bias, and Privacy -- A Survey

NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies

MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms

PatchPilot: A Cost-Efficient Software Engineering Agent with Early Attempts on Formal Verification

Bias Detection via Maximum Subgroup Discrepancy

Irony Detection, Reasoning and Understanding in Zero-shot Learning

TSVC:Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval

An LLM-Empowered Adaptive Evolutionary Algorithm For Multi-Component Deep Learning Systems

Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field

7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement

Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation

Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization

GenJoin: Conditional Generative Plan-to-Plan Query Optimizer that Learns from Subplan Hints

Code-Switching Curriculum Learning for Multilingual Transfer in LLMs

CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis

Phonology-Guided Speech-to-Speech Translation for African Languages

The Causal Information Bottleneck and Optimal Causal Variable Abstractions

Multimodal Pragmatic Jailbreak on Text-to-image Models

Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning

A Survey on Knowledge Organization Systems of Research Fields: Resources and Challenges

LogProber: Disentangling confidence from contamination in LLM responses

Holistic Uncertainty Estimation For Open-Set Recognition

AcTracer: Active Testing of Large Language Model via Multi-Stage Sampling

XMeCap: Meme Caption Generation with Sub-Image Adaptability

CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

The Remarkable Robustness of LLMs: Stages of Inference?

BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection

Length-Induced Embedding Collapse in PLM-based Models

Created by

Haebom

Authors

Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu

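Overview

This paper names the performance degradation that text embeddings from PLM-based models show on long texts "Length Collapse," and presents its cause and a remedy. Length Collapse is the phenomenon in which the embeddings of long texts cluster close to one another; this creates a distribution mismatch between short and long texts and degrades downstream task performance. The paper shows theoretically that the self-attention mechanism acts as a low-pass filter, and that this low-pass filtering strengthens as text length grows, so embeddings retain proportionally more low-frequency components. As a result, input token features become similar to one another, and the embeddings of long texts eventually cluster together, producing Length Collapse. To address this, the paper proposes TempScale, a simple method that reduces the gap in low-pass filtering rate between long and short texts. TempScale yields performance gains of 0.94% on MTEB and 1.10% on LongEmbed.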
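The summary above does not give TempScale's exact formula, so the following is only a minimal sketch under a stated assumption: that the method amounts to dividing the attention logits by a temperature before the softmax. The function names, the toy vectors, and the temperature value 0.5 are all hypothetical and chosen for illustration. The sketch shows that a temperature below 1 sharpens an attention row (lower entropy), which moves it away from the near-uniform averaging associated with low-pass filtering.

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax over a list of scores, with logits divided by `temperature`."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def attention_row(query, keys, temperature=1.0):
    """One row of scaled dot-product attention weights.

    ASSUMPTION: the temperature is applied to the logits; this is an
    illustrative stand-in, not the paper's confirmed TempScale formula.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores, temperature)


# Hypothetical toy data: one query attending over four keys.
query = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]]

base = attention_row(query, keys, temperature=1.0)
sharp = attention_row(query, keys, temperature=0.5)

print("entropy at temperature 1.0:", entropy(base))
print("entropy at temperature 0.5:", entropy(sharp))
```

A fully uniform attention row would replace each token feature with the mean of all tokens, keeping only the lowest-frequency component. Lowering the temperature concentrates weight on fewer tokens, which is one plausible way to counteract that averaging on long inputs.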

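Implications and Limitations

• Implications:

◦ Identifies the Length Collapse phenomenon in text embeddings from PLM-based models and analyzes its cause theoretically.

◦ Proposes TempScale, an effective method for mitigating Length Collapse, and validates its effect experimentally.

◦ Suggests a new direction for improving long-text processing performance.

• Limitations:

◦ The effect of TempScale may be limited to particular datasets and tasks.

◦ Additional experiments on a wider range of PLMs and downstream tasks are needed.

◦ The analysis of the cause of Length Collapse is confined to the self-attention mechanism and may not account for the influence of other factors.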
