Daily Arxiv
This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and affiliated institutions; please cite the original source when sharing.
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
Avoidance Decoding for Diverse Multi-Branch Story Generation
HydroVision: Predicting Optically Active Parameters in Surface Water Using Computer Vision
HodgeFormer: Transformers for Learnable Operators on Triangular Meshes through Data-Driven Hodge Matrices
MSA2-Net: Utilizing Self-Adaptive Convolution Module to Extract Multi-Scale Information in Medical Image Segmentation
Q-Learning-Driven Adaptive Rewiring for Cooperative Control in Heterogeneous Networks
Spotlighter: Revisiting Prompt Tuning from a Representative Mining View
Multimodal Iterative RAG for Knowledge Visual Question Answering
Embodied AI: Emerging Risks and Opportunities for Policy Action
Meta-learning ecological priors from large language models explains human learning and decision making
Scaffold Diffusion: Sparse Multi-Category Voxel Structure Generation with Discrete Diffusion
Locus: Agentic Predicate Synthesis for Directed Fuzzing
Network-Level Prompt and Trait Leakage in Local Research Agents
The Information Dynamics of Generative Diffusion
Arbitrary Precision Printed Ternary Neural Networks with Holistic Evolutionary Approximation
Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery
MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Learning to Select MCP Algorithms: From Traditional ML to Dual-Channel GAT-MLP
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
LanternNet: A Hub-and-Spoke System to Seek and Suppress Spotted Lanternfly Populations
When and Where do Data Poisons Attack Textual Inversion?
Covering a Few Submodular Constraints and Applications
Rethinking Data Protection in the (Generative) Artificial Intelligence Era
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization
Multimodal Medical Image Binding via Shared Text Embeddings
Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning
Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
A theoretical framework for self-supervised contrastive learning for continuous dependent data
Securing AI Agents with Information-Flow Control
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands
Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
When a Reinforcement Learning Agent Encounters Unknown Unknowns
Group-in-Group Policy Optimization for LLM Agent Training
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer
LawFlow: Collecting and Simulating Lawyers' Thought Processes on Business Formation Case Studies
On Developers' Self-Declaration of AI-Generated Code: An Analysis of Practices
WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada
Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond
HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO
TruthLens: Visual Grounding for Universal DeepFake Reasoning
Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning
Efficiently Editing Mixture-of-Experts Models with Compressed Experts
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Investigating a Model-Agnostic and Imputation-Free Approach for Irregularly-Sampled Multivariate Time-Series Modeling
Rapid Word Learning Through Meta In-Context Learning
FedP$^2$EFT: Federated Learning to Personalize PEFT for Multilingual LLMs
Predict, Cluster, Refine: A Joint Embedding Predictive Self-Supervised Framework for Graph Representation Learning
Survey on Hand Gesture Recognition from Visual Input
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning
GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis
Soft-TransFormers for Continual Learning
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Domain Consistency Representation Learning for Lifelong Person Re-Identification
Aligning Machine and Human Visual Representations across Abstraction Levels
Towards Agentic AI on Particle Accelerators
Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Banishing LLM Hallucinations Requires Rethinking Generalization
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games
Explainable Machine Learning-Based Security and Privacy Protection Framework for Internet of Medical Things Systems
From Metrics to Meaning: Time to Rethink Evaluation in Human-AI Collaborative Design
P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
AHELM: A Holistic Evaluation of Audio-Language Models
The Ramon Llull's Thinking Machine for Automated Ideation
Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Integrating Activity Predictions in Knowledge Graphs
Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks
ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP
Deep Research Agents: A Systematic Examination And Roadmap
Gradients: When Markets Meet Fine-tuning - A Distributed Approach to Model Optimisation
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research
Shutdownable Agents through POST-Agency
CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation
PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation
Can Large Language Models Act as Ensembler for Multi-GNNs?
MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
Frugal inference for control
On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios
A Survey on Human-AI Collaboration with Large Foundation Models
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
Created by Haebom
Authors
Zhiting Mei, Christina Zhang, Tenny Yin, Justin Lidard, Ola Shorinwa, Anirudha Majumdar
Overview
This paper focuses on reasoning language models, which achieve multi-step reasoning through reinforcement learning and have reached state-of-the-art performance on many benchmarks, yet, like conventional language models, still hallucinate by confidently presenting incorrect answers. The authors set out to answer three questions: whether reasoning models are well calibrated, how deeper reasoning affects calibration, and whether calibration can be improved by having the model explicitly reason about its own reasoning process. They find that the models are generally overconfident, with self-verbalized confidence estimates often exceeding 85% on incorrect responses; that deeper reasoning tends to deepen this overconfidence; and that calibration can nonetheless improve in some cases through self-reflection.
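To make the calibration issue above concrete, the sketch below compares self-verbalized confidence against answer correctness using Expected Calibration Error (ECE), a standard calibration metric. This is a minimal illustrative example, not code from the paper; the sample confidences, correctness labels, and bin count are assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |confidence - accuracy| gap, weighted by bin occupancy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in this bin
    return ece

# Hypothetical self-verbalized confidences and correctness labels:
# an overconfident model reports ~0.9 confidence even when it is wrong.
conf = [0.95, 0.90, 0.88, 0.92, 0.60, 0.85]
hit  = [1,    0,    0,    1,    1,    0]
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```

A well-calibrated model would yield an ECE near zero; the overconfidence pattern reported in the paper (high verbalized confidence on wrong answers) shows up here as a large gap between mean confidence and accuracy.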
Takeaways and Limitations
•
Takeaways:
◦
Exposes the overconfidence problem of reasoning models and its severity.
◦
Shows that introspective uncertainty quantification (Introspective UQ) can potentially improve model calibration.
◦
Highlights the importance of developing UQ benchmarks to improve the reliability of reasoning models.
•
Limitations:
◦
Calibration gains from introspection do not hold for all models (inconsistent across models).
◦
More comprehensive and rigorous UQ benchmarks are still needed.
View PDF