Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis

Characterizing LLM-driven Social Network: The Chirper.ai Case

ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition

Compliance of AI Systems

Advancing MAPF Toward the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription

Estimating Commonsense Plausibility through Semantic Shifts

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline

Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-Tuning

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

A Computational Method for Measuring “Open Codes” in Qualitative Analysis

SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms

SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

Inertia in Moral and Value Judgments of Large Language Models

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment

Machine Unlearning: A Comprehensive Survey

Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach

Integrating Graphs, Large Language Models, and Agents: Reasoning and Retrieval

The World Leaks the Future: Harness Evolution for Future Prediction Agents

From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints

CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations

Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference

QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

On the Complexity of the Discussion-based Semantics in Abstract Argumentation

From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning

EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models

ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

Model Space Reasoning as Search in Feedback Space for Planning Domain Generation

Lightweight LLM Agent Memory with Small Language Models

SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework

ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training

ATANT: An Evaluation Framework for AI Continuity

Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition

SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation

Memory Intelligence Agent

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

Why Agents Compromise Safety Under Pressure

Offline Materials Optimization with CliqueFlowmer

A Model-Free Universal AI

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities

Universal Adversarial Attacks against Closed-Source MLLMs via Target-View Routed Meta Optimization

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems

Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

ACE-Router: Generalizing History-Aware Routing from MCP Tools to the Agent Web

C-World: A Computer Use Agent Environment Creator

Structure-Aware Diversity Pursuit as an AI Safety Strategy against Homogenization

Reinforced Efficient Reasoning via Semantically Diverse Exploration

SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning

The Illusion of Insight in Reasoning Models

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation

Subjective functions

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

ID-PaS+ : Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs

SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models

Multimodal Reinforcement Learning with Adaptive Verifier for AI Agents

OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection

The Impact of Off-Policy Training Data on Probe Generalization

MaLoRA: Gated Modality LoRA for Key-Space Alignment in Multimodal LLM Fine-Tuning

Beyond the Failures: Rethinking Foundation Models in Pathology

End-to-end Listen, Look, Speak and Act

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

ContractEval: A Benchmark for Evaluating Contract-Satisfying Assertions in Code Generation

Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents

NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving

Large Language Models as Nondeterministic Causal Models

Knowledge-Driven Hallucination in Large Language Models: An Empirical Study on Process Modeling

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

BASIL: Bayesian Assessment of Sycophancy in LLMs

ORThought: Benchmarking and Automating Logistics Optimization Modeling

HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks

MIRROR: Converging Cognitive Principles as Computational Mechanisms for AI Reasoning

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Sensorimotor Self-Recognition in Multimodal Large Language Model-Driven Robots

SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs’ Mathematical Problem Solving

PRL: Prompts from Reinforcement Learning

Conversational Process Model Redesign

AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning

NumCoKE: Ordinal-Aware Numerical Reasoning over Knowledge Graphs with Mixture-of-Experts and Contrastive Learning

Generative moderated cognition and artificial intelligence. Thing with things

Plasticity Loss in Deep Reinforcement Learning: A Survey

Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

Neural Operator: Is all data you need to model the world? An insight into the paradigm of data-driven scientific ML

Understanding Prompt Tuning and In-Context Learning via Meta-Learning

Created by

Haebom

Authors

Tim Genewein, Li Kevin Wenliang, Jordi Grau-Moya, Anian Ruoss, Laurent Orseau, Marcus Hutter

Overview

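This paper studies prompting, one of the main ways to adapt a pretrained model to a target task. Beyond manual prompt construction, many prompt optimization methods have been proposed, but they have focused on empirical approaches rather than conceptual understanding. The paper discusses how optimal prompting can be understood through a Bayesian view, which also reveals fundamental limitations of prompting that can only be overcome by tuning weights. It explains in detail how meta-trained neural networks behave like Bayesian predictors over the pretraining distribution, whose hallmark is rapid in-context adaptation. Optimal prompting can then be studied formally as conditioning these Bayesian predictors, which yields criteria for target tasks where optimal prompting is and is not possible. Experiments on LSTMs and Transformers support the theory and compare different versions of prefix tuning and different weight-tuning methods. The authors also confirm that soft prefixes, which are sequences of real-valued vectors, can produce very effective prompts for both trained and untrained networks by manipulating activations in ways that hard tokens cannot. This adds an important mechanistic aspect beyond the conceptual Bayesian theory.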
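The idea of prompting as conditioning a Bayesian predictor can be illustrated with a toy sketch. The example below is not taken from the paper: the two-coin "pretraining distribution", the bias values, and the function names `posterior` and `predict_one` are all assumptions chosen for illustration. It shows only that feeding a prefix (a hard-token prompt) to an exact Bayesian mixture shifts its next-token prediction toward the task the prefix makes more likely.

```python
from fractions import Fraction


def posterior(prior, prefix):
    """Bayes-update a prior over coin biases after observing a 0/1 prefix.

    `prior` maps each hypothetical task (a coin bias, P(token == 1)) to its
    prior probability. `prefix` plays the role of a hard-token prompt.
    """
    weights = {}
    for bias, p in prior.items():
        likelihood = Fraction(1)
        for token in prefix:
            likelihood *= bias if token == 1 else 1 - bias
        weights[bias] = p * likelihood
    total = sum(weights.values())
    return {bias: w / total for bias, w in weights.items()}


def predict_one(prior, prefix):
    """Posterior-predictive probability that the next token is 1."""
    post = posterior(prior, prefix)
    return sum(bias * p for bias, p in post.items())


# Assumed toy pretraining distribution: two equally likely coins.
prior = {Fraction(1, 5): Fraction(1, 2), Fraction(4, 5): Fraction(1, 2)}

no_prompt = predict_one(prior, [])           # prediction with no prompt
prompted = predict_one(prior, [1, 1, 1, 1])  # prediction after a prompt of four 1s

print(float(no_prompt), float(prompted))
```

In this sketch the prompt never changes the predictor itself; it only changes what the predictor is conditioned on. The weights of the mixture (the analogue of network weights) stay fixed, which is the sense in which prompting and weight tuning are different kinds of adaptation.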