Daily Arxiv

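This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.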

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Dynaword: From One-shot to Continuously Developed Datasets

Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems

Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy

BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability

SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy

Managing Escalation in Off-the-Shelf Large Language Models

FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Memorization in Fine-Tuned Large Language Models

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?

Post-Completion Learning for Language Models

Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content

Equivariant Volumetric Grasping

SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation

TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Principled Foundations for Preference Optimization

Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

S2FGL: Spatial Spectral Federated Graph Learning

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints

Causally Steered Diffusion for Automated Video Counterfactual Generation

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

ProRefine: Inference-Time Prompt Refinement with Textual Feedback

SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design

MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

All-optical temporal integration mediated by subwavelength heat antennas

GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

Multilingual Performance Biases of Large Language Models in Education

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis

Efficient Generative Model Training via Embedded Representation Warmup

Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging

Spectral Architecture Search for Neural Network Models

Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

Augmented Adversarial Trigger Learning

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness

PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset

DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping

Entropy-Lens: The Information Signature of Transformer Computations

CAMEF: Causal-Augmented Multi-Modality Event-Driven Financial Forecasting by Integrating Time Series Patterns and Salient Macroeconomic Announcements

Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models

Average-Reward Soft Actor-Critic

Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Effective AGM Belief Contraction: A Journey beyond the Finitary Realm (Technical Report)

Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks

KCR: Resolving Long-Context Knowledge Conflicts via Reasoning in LLMs

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis

The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams

Modeling Deontic Modal Logic in the s(CASP) Goal-directed Predicate Answer Set Programming System

Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization

Prompting Science Report 3: I'll pay you or I'll kill you -- but will you care?

Created by

Haebom

Authors

Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro

Summary

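This paper is the third in a series of short reports intended to help readers understand the technical details of interacting with AI models through rigorous testing. It examines two tactics often used in an attempt to improve AI performance: offering the model a tip (payment) and threatening the model. In experiments on the GPQA and MMLU-Pro benchmarks, neither threatening nor tipping a model had a significant effect on benchmark performance. At the level of individual questions, however, prompt variations can change performance substantially, and it is hard to know in advance whether a given prompting approach will help on a given question. Taken together, the results suggest that simple prompt variations may be less effective than previously assumed, especially for difficult problems.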
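
The contrast in the summary above (no aggregate effect, but large per-question swings) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the question IDs, the trial count, the correct-answer counts, and the helper names are hypothetical and are not data or code from the report.

```python
# Toy illustration: two prompt variants with identical aggregate accuracy
# but very different per-question behaviour.
# All numbers below are made up; they are NOT results from the report.

TRIALS = 10  # hypothetical number of repeated trials per question

# Number of correct answers out of TRIALS for each (hypothetical) question.
baseline = {"q1": 9, "q2": 2, "q3": 6, "q4": 5}
variant = {"q1": 4, "q2": 6, "q3": 4, "q4": 8}


def aggregate_accuracy(correct_counts):
    """Overall accuracy across all questions and trials."""
    return sum(correct_counts.values()) / (TRIALS * len(correct_counts))


def per_question_delta(base, other):
    """Change in correct-answer count per question (other minus base)."""
    return {q: other[q] - base[q] for q in base}


def largest_shift(base, other):
    """Question with the largest absolute change, and that change."""
    deltas = per_question_delta(base, other)
    return max(deltas.items(), key=lambda item: abs(item[1]))


if __name__ == "__main__":
    print("baseline accuracy:", aggregate_accuracy(baseline))
    print("variant accuracy: ", aggregate_accuracy(variant))
    print("per-question delta:", per_question_delta(baseline, variant))
    print("largest shift:", largest_shift(baseline, variant))
```

In this toy data the two variants tie on aggregate accuracy while individual questions move by up to half of the trials in either direction, which is the kind of pattern that an aggregate benchmark score alone would hide.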

Takeaways, Limitations

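• Takeaways: The study confirms empirically that tipping or threatening an AI model has no significant effect on benchmark performance.

• Limitations: The difficulty of predicting how a prompt will affect a specific question is noted as a limitation. The results are limited to specific benchmarks and models, so caution is needed when generalizing to other benchmarks or models.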
