Daily Arxiv

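This page compiles artificial intelligence papers published around the world.
The summaries on this page are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright in the papers belongs to their authors and institutions; when sharing, you only need to cite the source.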

Dense Video Understanding with Gated Residual Tokenization

Machines are more productive than humans until they aren't, and vice versa

BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching

Exploring Data and Parameter Efficient Strategies for Arabic Dialect Identifications

The threat of analytic flexibility in using large language models to simulate human data: A call to attention

Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study

A Graph-Based Approach to Alert Contextualisation in Security Operations Centres

FunAudio-ASR Technical Report

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering

Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models

Pluralistic Alignment for Healthcare: A Role-Driven Framework

ALIGNS: Unlocking nomological networks in psychological measurement through a large language model

A Survey of Reinforcement Learning for Large Reasoning Models

Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network

Reconstruction Alignment Improves Unified Multimodal Models

Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models

FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes

Dual-Mode Deep Anomaly Detection for Medical Manufacturing: Structural Similarity and Feature Distance

Exploit Tool Invocation Prompt for Tool Behavior Hijacking in LLM-Based Agentic System

Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families

AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting

Ensemble of Pathology Foundation Models for MIDOG 2025 Track 2: Atypical Mitosis Classification

Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning

MovieCORE: COgnitive REasoning in Movies

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Generalized invariants meet constitutive neural networks: A novel framework for hyperelastic materials

Neural Logic Networks for Interpretable Classification

Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation

Controllable Surface Diffusion Generative Model for Neurodevelopmental Trajectories

Deciding how to respond: A deliberative framework to guide policymaker responses to AI systems

SCORPION: Addressing Scanner-Induced Variability in Histopathology

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation

FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation

EnCoBo: Energy-Guided Concept Bottlenecks for Interpretable Generation

T-SYNTH: A Knowledge-Based Dataset of Synthetic Breast Images

MedVAL: Toward Expert-Level Medical Text Validation with Language Models

Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing

DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning

Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion

Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data

Binarized Neural Networks Converge Toward Algorithmic Simplicity: Empirical Support for the Learning-as-Compression Hypothesis

PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models

DisastIR: A Comprehensive Information Retrieval Benchmark for Disaster Management

Preference Isolation Forest for Structure-based Anomaly Detection

Trustless Autonomy: Understanding Motivations, Benefits, and Governance Dilemmas in Self-Sovereign Decentralized AI Agents

GRADA: Graph-based Reranking against Adversarial Documents Attack

Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models

Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection

Read Before You Think: Mitigating LLM Comprehension Failures with Step-by-Step Reading

Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping

Predicting Multi-Agent Specialization via Task Parallelizability

Fine-tuning Vision Language Models with Graph-based Knowledge for Explainable Medical Image Analysis

VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion

METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling

SNaRe: Domain-aware Data Generation for Low-Resource Event Detection

Superpose Task-specific Features for Model Merging

Examining False Positives under Inference Scaling for Mathematical Reasoning

SWAT: Sliding Window Adversarial Training for Gradual Domain Adaptation

Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations

Retrieval-Retro: Retrieval-based Inorganic Retrosynthesis with Expert Knowledge

Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland

Reconstruction of Differentially Private Text Sanitization via Large Language Models

3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection

The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models

Top K Enhanced Reinforcement Learning Attacks on Heterogeneous Graph Node Classification

Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models

EXPLOR: Extrapolatory Pseudo-Label Matching for Out-of-distribution Uncertainty Based Rejection

Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter

Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification

Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification

The Art of Saying "Maybe": A Conformal Lens for Uncertainty Benchmarking in VLMs

Human + AI for Accelerating Ad Localization Evaluation

Statistical Methods in Generative AI

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning

DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework

Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge

Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning

Automatic Mapping of AutomationML Files to Ontologies for Graph Queries and Validation

Explicit Context-Driven Neural Acoustic Modeling for High-Fidelity RIR Generation

FlowRL: Matching Reward Distributions for LLM Reasoning

Orion: Fuzzing Workflow Automation

TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models

Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting

Semi-Supervised 3D Medical Segmentation from 2D Natural Images Pretrained Model

Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models

Exploring How Audio Effects Alter Emotion with Foundation Models

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

The mechanization of science illustrated by the Lean formalization of the multi-graded Proj construction

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning

TextMine: LLM-Powered Knowledge Extraction for Humanitarian Mine Action

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Communication Efficient Split Learning of ViTs with Attention-based Double Compression

PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models

Created by

Haebom

Authors

Chenzhuo Zhao, Ziqian Liu, Xinda Wang, Junting Lu, Chaoyi Ruan

Overview

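This paper focuses on prompt optimization as an alternative to fine-tuning for improving the performance of large language models. Existing prompt optimization methods rely on sampling full outputs and on self-critique or human-annotated preference judgments, which limits their scalability. The paper presents PMPO (Probabilistic Metric Prompt Optimization), a unified framework that uses token-level cross-entropy as a lightweight, direct evaluation signal. PMPO uses masking-based analysis to locate low-quality segments of a prompt and iteratively rewrites them to propose improved variants. Notably, during evaluation PMPO selects among variants by minimizing the loss computed in a single forward pass, which eliminates output sampling and human-based scoring; standard generation is used only to propose the rewrites. This loss-based strategy supports both supervised and preference-based tasks. Across model sizes and datasets, PMPO outperforms existing prompt optimization methods: it achieves the highest average accuracy on BBH, performs strongly on GSM8K and AQUA RAT, and raises the AlpacaEval 2.0 win rate by more than 19%.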
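
The loss-based selection step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_logprob_fn` is a hypothetical stand-in for a language model that returns per-token log-probabilities of a reference output, and the names `LogProbFn`, `prompt_loss`, and `select_best` are illustrative. Each candidate prompt is scored by the mean token-level cross-entropy of the reference outputs, and the lowest-loss candidate is kept; no outputs are sampled.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer interface (not from the paper): given the full context and
# the reference output tokens, return log p(y_t | context, y_<t) for each token.
# With a causal LM these would come from one teacher-forced forward pass.
LogProbFn = Callable[[str, Sequence[str]], List[float]]
Example = Tuple[str, Sequence[str]]  # (task input, reference output tokens)


def token_cross_entropy(logprob_fn: LogProbFn, context: str, target: Sequence[str]) -> float:
    """Mean token-level cross-entropy (negative log-likelihood) of the target."""
    logps = logprob_fn(context, target)
    return -sum(logps) / len(logps)


def prompt_loss(logprob_fn: LogProbFn, prompt: str, examples: Sequence[Example]) -> float:
    """Average cross-entropy of the reference outputs when `prompt` is prepended."""
    return sum(token_cross_entropy(logprob_fn, prompt + "\n" + x, y)
               for x, y in examples) / len(examples)


def select_best(logprob_fn: LogProbFn, candidates: Sequence[str],
                examples: Sequence[Example]) -> str:
    """Return the candidate prompt with the lowest loss (lower is better)."""
    return min(candidates, key=lambda p: prompt_loss(logprob_fn, p, examples))


def toy_logprob_fn(context: str, target: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for an LLM: a reference token gets probability 0.9
    if it appears as a word in the context, otherwise 0.1."""
    words = set(re.findall(r"[a-z0-9]+", context.lower()))
    return [math.log(0.9) if tok.lower() in words else math.log(0.1) for tok in target]


examples = [("What is 2 + 3?", ["final", "answer"])]
candidates = [
    "Solve the problem.",
    "Solve the problem step by step and state the final answer.",
]
print(select_best(toy_logprob_fn, candidates, examples))  # prints the second candidate
```

With a real causal LM, `logprob_fn` would read the reference-token log-probabilities off one forward pass over prompt, input, and reference output, which is why this kind of evaluation is much cheaper than generating full outputs and having them judged.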

Implications and Limitations

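• Implications:
  ◦ Proposes a lightweight prompt optimization method that uses token-level cross-entropy.
  ◦ Enables efficient prompt optimization without output sampling or human evaluation.
  ◦ Supports both supervised and preference-based tasks.
  ◦ Outperforms existing methods across a range of models and datasets (BBH, GSM8K, AQUA RAT, AlpacaEval 2.0).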

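• Limitations:
  ◦ The generalization of PMPO beyond the settings studied in the paper may need further investigation.
  ◦ The method may have been tuned to particular datasets or models, so its applicability to other datasets and models needs further validation.
  ◦ Because masking-based analysis has inherent limits, some low-quality prompt segments may not be identified accurately.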
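
The masking-based analysis from the overview can be illustrated with a similar sketch. As an assumption, "masking" here simply means dropping one prompt segment at a time and measuring how the cross-entropy loss changes; the paper's exact masking scheme may differ. `segment_impacts`, `weak_segments`, and `toy_logprob_fn` are hypothetical names, and the toy scorer stands in for a real model.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

LogProbFn = Callable[[str, Sequence[str]], List[float]]
Example = Tuple[str, Sequence[str]]  # (task input, reference output tokens)


def prompt_loss(logprob_fn: LogProbFn, prompt: str, examples: Sequence[Example]) -> float:
    """Average token-level cross-entropy of the reference outputs given the prompt."""
    losses = []
    for x, y in examples:
        logps = logprob_fn(prompt + "\n" + x, y)
        losses.append(-sum(logps) / len(logps))
    return sum(losses) / len(losses)


def segment_impacts(logprob_fn: LogProbFn, segments: Sequence[str],
                    examples: Sequence[Example]) -> List[float]:
    """For each segment: loss with that segment masked (dropped, by assumption)
    minus loss of the full prompt. Positive means the segment helps."""
    base = prompt_loss(logprob_fn, " ".join(segments), examples)
    impacts = []
    for i in range(len(segments)):
        masked = " ".join(s for j, s in enumerate(segments) if j != i)
        impacts.append(prompt_loss(logprob_fn, masked, examples) - base)
    return impacts


def weak_segments(logprob_fn: LogProbFn, segments: Sequence[str],
                  examples: Sequence[Example], tol: float = 0.0) -> List[int]:
    """Indices of segments whose removal raises the loss by at most `tol`;
    these are treated as rewrite candidates."""
    impacts = segment_impacts(logprob_fn, segments, examples)
    return [i for i, d in enumerate(impacts) if d <= tol]


def toy_logprob_fn(context: str, target: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for an LLM: probability 0.9 if the reference token
    appears as a word in the context, otherwise 0.1."""
    words = set(re.findall(r"[a-z0-9]+", context.lower()))
    return [math.log(0.9) if t.lower() in words else math.log(0.1) for t in target]


segments = ["Be concise.", "State the final answer.", "Use a friendly tone."]
examples = [("What is 2 + 3?", ["final", "answer"])]
print(weak_segments(toy_logprob_fn, segments, examples))  # [0, 2]
```

Segments whose removal leaves the loss unchanged or lowers it are flagged for rewriting. Because the only signal is the measured change in loss, a segment that matters in ways the reference outputs do not reflect can be misjudged, which is one concrete form of the masking limitation listed above.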
