Daily Arxiv

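This page collects and organizes AI-related papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.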

Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout

Energy Efficiency in AI for 5G and Beyond: A DeepRx Case Study

A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma

(Almost) Free Modality Stitching of Foundation Models

Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models

SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems

Dually Hierarchical Drift Adaptation for Online Configuration Performance Learning

Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently

EXPO: Stable Reinforcement Learning with Expressive Policies

Reinforcement Learning with Action Chunking

On the Effect of Instruction Tuning Loss on Generalization

Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models

Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams

Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why

DRAGON: Dynamic RAG Benchmark On News

Solar Flare Prediction Using Long Short-term Memory (LSTM) and Decomposition-LSTM with Sliding Window Pattern Recognition

Conversation Forests: The Key to Fine Tuning Large Language Models for Multi-Turn Medical Conversations is Branching

RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism

Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence

Stylometry recognizes human and LLM-generated texts in short samples

QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration

Evaluating Multimodal Large Language Models on Educational Textbook Question Answering

FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation

Alleviating User-Sensitive bias with Fair Generative Sequential Recommendation Model

MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

DeInfoReg: A Decoupled Learning Framework for Better Training Throughput

FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE

ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge

The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products

The Limits of Tractable Marginalization

A quantum semantic framework for natural language processing

ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols

Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Matrix Is All You Need

Temporal Chunking Enhances Recognition of Implicit Sequential Patterns

Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems

PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening

FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning

Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space

Leveraging Large Language Models for Multi-Class and Multi-Label Detection of Drug Use and Overdose Symptoms on Social Media

Rethinking the Foundations for Continual Reinforcement Learning

Compositional Flows for 3D Molecule and Synthesis Pathway Co-design

Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding

Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

Test-time Adaptation for Foundation Medical Segmentation Model without Parametric Updates

Style over Substance: Distilled Language Models Reason Via Stylistic Replication

AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization

Multi-View Node Pruning for Accurate Graph Representation

Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models

Voting or Consensus? Decision-Making in Multi-Agent Debate

Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support

A Generative Approach to LLM Harmfulness Detection with Special Red Flag Tokens

Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities

Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs

Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction

Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors

Few-Shot Radar Signal Recognition through Self-Supervised Learning and Radio Frequency Domain Adaptation

Transfer Learning Analysis of Variational Quantum Circuits

Plancraft: an evaluation dataset for planning with LLM agents

Fully Data-driven but Interpretable Human Behavioural Modelling with Differentiable Discrete Choice Model

A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation

Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?

Searching Latent Program Spaces

The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter

ComFairGNN: Community Fair Graph Neural Network

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

Large Language Models Engineer Too Many Simple Features For Tabular Data

Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

IdeaSynth: Iterative Research Idea Development Through Evolving and Composing Idea Facets with Literature-Grounded Feedback

SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning

Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy

SA-GDA: Spectral Augmentation for Graph Domain Adaptation

The GPT Surprise: Offering Large Language Model Chat in a Massive Coding Class Reduced Engagement but Increased Adopters Exam Performances

State-Constrained Offline Reinforcement Learning

SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection

Unified ODE Analysis of Smooth Q-Learning Algorithms

FairTargetSim: An Interactive Simulator for Understanding and Explaining the Fairness Effects of Target Variable Definition

Fine-grained Stateful Knowledge Exploration: Effective and Efficient Graph Retrieval with Large Language Models

Learning Safe Numeric Planning Action Models

Augmenting End-to-End Steering Angle Prediction with CAN Bus Data

EASTER: Embedding Aggregation-based Heterogeneous Models Training in Vertical Federated Learning

GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks

Acquiring and Adapting Priors for Novel Tasks via Neural Meta-Architectures

VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains

Is Human-Written Data Enough? The Challenge of Teaching Reasoning to LLMs Without RL or Distillation

Working with AI: Measuring the Occupational Implications of Generative AI

Establishing Best Practices for Building Rigorous Agentic Benchmarks

An Agentic Framework for Autonomous Metamaterial Modeling and Inverse Design

Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models

BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking

The Odyssey of the Fittest: Can Agents Survive and Still Be Good?

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models

Created by

Haebom

Authors

Aida Kostikova, Zhipin Wang, Deidamea Bajri, Ole Putz, Benjamin Paaßen, Steffen Eger

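Overview

This paper presents a data-driven, semi-automated survey of research on the limitations of large language models (LLMs), covering 250,000 ACL and arXiv papers from 2022 to 2025. Keyword filtering followed by LLM-based classification extracts 14,648 relevant papers, and topic clustering with HDBSCAN+BERTopic and LlooM yields 7 to 15 major types of LLM limitations. The analysis finds that LLM-related research grew 6-fold at ACL and nearly 15-fold on arXiv between 2022 and 2025, and that research on LLM limitations (LLLMs) grew even faster. Reasoning is the most studied limitation, followed by generalization, hallucination, bias, and security. The topic distribution in the ACL dataset remains relatively stable over time, whereas the arXiv dataset shifts between 2022 and 2025 toward safety and controllability (security risks, alignment, hallucination, knowledge editing, and so on) and toward multimodality. The authors release the annotated abstract dataset and the validated methodology at https://github.com/a-kostikova/LLLMs-Survey.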
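As a rough illustration of the first stage of the pipeline described above (keyword filtering before LLM-based classification), the following is a minimal sketch and not the authors' implementation. The keyword lists and the function names `is_candidate` and `filter_candidates` are hypothetical; the survey's actual keywords and matching rules are not given in this summary.

```python
import re

# Hypothetical keyword lists, for illustration only.
# The survey's real keyword lists are not stated in this summary.
LLM_TERMS = ["large language model", "language model", "llm"]
LIMITATION_TERMS = ["limitation", "hallucination", "bias", "fail", "vulnerab"]


def _compile(terms):
    # Match any term at a word start, case-insensitively.
    # A prefix such as "vulnerab" therefore also matches "vulnerability".
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")"
    return re.compile(pattern, re.IGNORECASE)


LLM_RE = _compile(LLM_TERMS)
LIMITATION_RE = _compile(LIMITATION_TERMS)


def is_candidate(abstract: str) -> bool:
    """Return True if the abstract mentions both an LLM term and a limitation term."""
    return bool(LLM_RE.search(abstract)) and bool(LIMITATION_RE.search(abstract))


def filter_candidates(abstracts):
    """Keep only the abstracts that pass the keyword filter, preserving order."""
    return [a for a in abstracts if is_candidate(a)]
```

A filter like this only narrows the pool. In the pipeline summarized above, the surviving abstracts would then go through LLM-based classification and topic clustering, which this sketch does not attempt to reproduce.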


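Implications and Limitations

• Implications:
  ◦ Shows quantitatively that research on LLM limitations is growing rapidly, with especially active work on safety and controllability and on multimodality.
  ◦ Makes it possible to identify and analyze research trends for the major LLM limitations, including reasoning, generalization, hallucination, bias, and security.
  ◦ Supports follow-up research by releasing the dataset and methodology.

• Limitations:
  ◦ Because the pipeline relies on keyword filtering and LLM-based classification, some relevant papers may have been missed.
  ◦ Interpreting the topic clustering results may involve subjective judgment.
  ◦ The analysis period is limited to 2022 through 2025.
  ◦ The ACL and arXiv datasets may themselves be biased.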
