Daily Arxiv
A page that collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of the papers belongs to their authors and affiliated institutions; please cite the source when sharing.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Dynaword: From One-shot to Continuously Developed Datasets
Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor
Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems
Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy
BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability
SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy
Managing Escalation in Off-the-Shelf Large Language Models
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation
Memorization in Fine-Tuned Large Language Models
From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation
The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?
Post-Completion Learning for Language Models
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
Equivariant Volumetric Grasping
SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation
FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation
TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models
Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs
$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Principled Foundations for Preference Optimization
Evaluating LLMs on Real-World Forecasting Against Expert Forecasters
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking
S2FGL: Spatial Spectral Federated Graph Learning
AI4Research: A Survey of Artificial Intelligence for Scientific Research
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
Causally Steered Diffusion for Automated Video Counterfactual Generation
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
ProRefine: Inference-Time Prompt Refinement with Textual Feedback
SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design
MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
All-optical temporal integration mediated by subwavelength heat antennas
GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
FFCBA: Feature-based Full-target Clean-label Backdoor Attacks
Multilingual Performance Biases of Large Language Models in Education
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis
Efficient Generative Model Training via Embedded Representation Warmup
Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging
Spectral Architecture Search for Neural Network Models
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Augmented Adversarial Trigger Learning
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness
PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
Entropy-Lens: The Information Signature of Transformer Computations
CAMEF: Causal-Augmented Multi-Modality Event-Driven Financial Forecasting by Integrating Time Series Patterns and Salient Macroeconomic Announcements
Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach
AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought
AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges
CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models
Average-Reward Soft Actor-Critic
Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation
From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification
Effective AGM Belief Contraction: A Journey beyond the Finitary Realm (Technical Report)
Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification
TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks
KCR: Resolving Long-Context Knowledge Conflicts via Reasoning in LLMs
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent
Mind the Gap: The Divergence Between Human and LLM-Generated Tasks
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis
The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams
Modeling Deontic Modal Logic in the s(CASP) Goal-directed Predicate Answer Set Programming System
Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study
The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning
Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments
Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory
UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
Do Large Language Models Know How Much They Know?
Created by Haebom
Authors
Gabriele Prato, Jerry Huang, Prasanna Parthasarathi, Shagun Sodhani, Sarath Chandar
Overview
This paper presents a benchmark for evaluating whether large language models (LLMs) are aware of the extent of their own knowledge. To assess a model's ability to gauge its knowledge scope on a given topic, the benchmark analyzes whether the model recalls too much, too little, or exactly the right amount of information. Experiments on LLMs with diverse architectures show that sufficiently large models do exhibit an ability to understand how much they know about a specific topic. However, the rate at which this ability emerges differs across architectures, and further research is needed to confirm these findings and to fully uncover the underlying mechanisms.
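The benchmark's core question, whether a model recalls too much, too little, or exactly the right amount of information about a topic, can be illustrated with a minimal scoring sketch. The data format, function name, and three-way verdict below are illustrative assumptions for exposition only, not the paper's actual evaluation protocol.

```python
# Hypothetical sketch: classifying a model's recall against a reference set of facts.
# The fact-set representation and verdict labels are assumptions, not the benchmark's protocol.

def recall_verdict(recalled: set[str], ground_truth: set[str]) -> str:
    """Compare the facts a model recalled against the reference set for a topic."""
    missing = ground_truth - recalled      # facts the model failed to recall
    extraneous = recalled - ground_truth   # facts outside the reference set
    if not missing and not extraneous:
        return "exact"         # recalled precisely the expected information
    if missing and not extraneous:
        return "under-recall"  # too little information
    if extraneous and not missing:
        return "over-recall"   # too much information
    return "mixed"             # both gaps and extras

# Toy usage with made-up data
reference = {"fact_a", "fact_b", "fact_c"}
model_output = {"fact_a", "fact_b", "fact_c", "fact_d"}
print(recall_verdict(model_output, reference))  # -> "over-recall"
```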
Takeaways and Limitations
• Takeaways:
◦ Suggests that large language models possess the ability to recognize the extent of their own knowledge.
◦ Shows that this self-knowledge ability emerges differently depending on model scale and architecture.
◦ Provides a new indicator for evaluating the intelligence level of LLMs.
• Limitations:
◦ Further research is needed on how well the proposed benchmark generalizes.
◦ The underlying mechanisms behind LLM self-knowledge require further investigation.
◦ Studies covering a wider variety of LLMs and a broader range of topics are needed.
View PDF