Daily Arxiv

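This page curates artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.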

Plan Verification for LLM-Based Embodied Task Completion Agents

Created by

Haebom

Authors

Ananth Hariharan, Vardhan Dongre, Dilek Hakkani-Tur, Gokhan Tur

Overview

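This paper addresses the problem that large language model (LLM) based task plans for embodied AI, and the corresponding human demonstrations, can degrade policy quality through unnecessary actions, redundant navigation, and logical errors. To address this, it proposes an iterative verification framework in which a Judge LLM critiques action sequences and a Planner LLM applies the revisions, yielding progressively cleaner and more spatially coherent trajectories. Unlike rule-based approaches, the framework relies on natural language prompting, which allows broad generalization across error types, including irrelevant actions, contradictions, and missing steps. On a manually annotated set of actions from the TEACh embodied AI dataset, the framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT-4-mini, DeepSeek-R1, Gemini 2.5, LLaMA 4 Scout). The refinement loop converges quickly: 96.5% of sequences need at most three iterations, and both temporal efficiency and spatial action organization improve. Importantly, the method preserves human error-recovery patterns rather than collapsing them, which supports future work on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and action refinement, it offers a scalable path to higher-quality training data for imitation learning in embodied AI.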
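The judge-and-planner loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `refine_plan`, `toy_judge`, and `toy_planner` are hypothetical, and the two toy functions are rule-based stand-ins for what the paper does with prompted LLM calls. The default budget of three iterations only mirrors the convergence figure reported in the summary.

```python
from typing import Callable, List, Tuple

Plan = List[str]
# Each critique entry is (index of the flagged action, reason); an assumed format.
Critique = List[Tuple[int, str]]


def refine_plan(
    plan: Plan,
    judge: Callable[[Plan], Critique],
    planner: Callable[[Plan, Critique], Plan],
    max_iters: int = 3,
) -> Tuple[Plan, int]:
    """Alternate critique and revision until the judge raises no flags
    or the iteration budget is used up. Returns the plan and the number
    of revision rounds that were applied."""
    current = list(plan)
    for rounds_applied in range(max_iters):
        critique = judge(current)
        if not critique:  # nothing left to fix: converged
            return current, rounds_applied
        current = planner(current, critique)
    return current, max_iters


def toy_judge(plan: Plan) -> Critique:
    """Stand-in for the Judge LLM: flags an action that repeats the previous one."""
    return [
        (i, "repeats previous action")
        for i in range(1, len(plan))
        if plan[i] == plan[i - 1]
    ]


def toy_planner(plan: Plan, critique: Critique) -> Plan:
    """Stand-in for the Planner LLM: drops every flagged action."""
    flagged = {index for index, _ in critique}
    return [action for i, action in enumerate(plan) if i not in flagged]


noisy_plan = [
    "goto fridge",
    "open fridge",
    "open fridge",
    "pickup apple",
    "pickup apple",
    "pickup apple",
]
refined, rounds = refine_plan(noisy_plan, toy_judge, toy_planner)
print(refined, rounds)
```

In a real setting both callables would wrap LLM prompts, so the loop itself stays unchanged while the critique and revision logic lives in natural language.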

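Implications and Limitations

• Implications:
  ◦ Shows that an iterative, LLM-based plan verification framework can improve the quality of task plans for embodied AI.
  ◦ The natural language prompting approach generalizes across many types of errors.
  ◦ Improves temporal efficiency and spatial action organization.
  ◦ Preserves human error-recovery patterns, which contributes to building robust systems.
  ◦ Offers a scalable way to generate high-quality training data for imitation learning.

• Limitations:
  ◦ The framework's performance may depend on the capability of the LLM used.
  ◦ Results are reported only on the TEACh dataset, so generalization to other datasets needs further validation.
  ◦ Performance on complex tasks and exceptional situations needs further study.
  ◦ Complete error removal is not guaranteed, and some errors may remain.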
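To make the precision and recall figures concrete, and to show why high precision with lower recall means some errors remain, here is a small sketch of how flagged actions could be scored against manual annotations. The function name `precision_recall` and the example indices are hypothetical and are not taken from the paper; returning 1.0 when a set is empty is a convention assumed here.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[int], annotated: Set[int]) -> Tuple[float, float]:
    """Score flagged action indices against manually annotated error indices.

    precision = share of flagged actions that are annotated errors
    recall    = share of annotated errors that were flagged
    Empty denominators yield 1.0 (an assumed convention).
    """
    true_positives = len(flagged & annotated)
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(annotated) if annotated else 1.0
    return precision, recall


# Hypothetical example: five annotated errors, four of them flagged, no false alarms.
annotated_errors = {2, 4, 5, 7, 9}
flagged_actions = {2, 4, 5, 7}
precision, recall = precision_recall(flagged_actions, annotated_errors)
print(precision, recall)
```

In this made-up case every flag is correct, yet the error at index 9 is missed, which is the kind of residual error the last limitation refers to.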
