Daily Arxiv
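This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.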
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization
Towards Provable (In)Secure Model Weight Release Schemes
Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance
IndieFake Dataset: A Benchmark Dataset for Audio Deepfake Detection
These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining
In-Context Learning Strategies Emerge Rationally
Fake it till You Make it: Reward Modeling as Discriminative Prediction
Semantic Preprocessing for LLM-based Malware Analysis
PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
TracLLM: A Generic Framework for Attributing Long Context LLMs
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations
Thinkless: LLM Learns When to Think
A3: an Analytical Low-Rank Approximation Framework for Attention
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
AI-Driven Sentiment Analytics: Unlocking Business Value in the E-Commerce Landscape
Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective
Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model
Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent
DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection
Materialist: Physically Based Editing Using Single-Image Inverse Rendering
Representation Learning of Lab Values via Masked AutoEncoders
Lagrangian Index Policy for Restless Bandits with Average Reward
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs
Recall and Refine: A Simple but Effective Source-free Open-set Domain Adaptation Framework
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages
Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery
Rapid Gyroscope Calibration: A Deep Learning Approach
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
A GREAT Architecture for Edge-Based Graph Problems Like TSP
ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis
MockLLM: A Multi-Agent Behavior Collaboration Framework for Online Job Seeking and Recruiting
Is my Data in your AI Model? Membership Inference Test with Application to Face Images
PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks
Continual Learning as Computationally Constrained Reinforcement Learning
Efficient Image Generation with Variadic Attention Heads
Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation
From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning
NFISiS: New Perspectives on Fuzzy Inference Systems for Renewable Energy Forecasting
The State of Large Language Models for African Languages: Progress and Challenges
Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance
Super Co-alignment for Sustainable Symbiotic Society
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Review learning: Real world validation of privacy preserving continual learning across medical institutions
Whole-Body Conditioned Egocentric Video Prediction
MTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
WorldVLA: Towards Autoregressive Action World Model
"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Potemkin Understanding in Large Language Models
SkLEP: A Slovak General Language Understanding Benchmark
Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems
TITAN: Query-Token based Domain Adaptive Adversarial Learning
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation
Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection
Pay Attention to Small Weights
Real-time and personalized product recommendations for large e-commerce platforms
RQdia: Regularizing Q-Value Distributions With Image Augmentation
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection
A Systematic Review of Human-AI Co-Creativity
Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models
On Uniform Weighted Deep Polynomial approximation
Exploring Adapter Design Tradeoffs for Low Resource Music Generation
Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models
Small Encoders Can Rival Large Decoders in Detecting Groundedness
Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution
Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting
$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models
BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models
Task-Aware KV Compression For Cost-Effective Long Video Understanding
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
Created by
Haebom
Authors
Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova
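Abstract
This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation contexts, addressing the need for AI agents that can adapt to diverse human operators and stakeholders. Using the Sotopia simulation environment, two experiments systematically evaluate how personality traits and AI agent characteristics influence the outcomes of LLM-simulated social negotiations, a capability essential to applications such as cross-team coordination and civilian interactions. Experiment 1 uses causal discovery methods to measure the effect of personality traits on price bargaining and finds that Agreeableness and Extraversion significantly affect believability, goal achievement, and knowledge acquisition. Sociocognitive lexical measures extracted from team communications detect fine-grained differences in agents' empathic communication, moral foundations, and opinion patterns, providing actionable insights for agentic AI systems that must operate reliably in high-stakes operational scenarios. Experiment 2 evaluates human-AI job negotiations by manipulating both the simulated human's personality and the AI system's characteristics (specifically transparency, competence, and adaptability), showing how AI agent trustworthiness affects mission effectiveness. These results establish a repeatable evaluation methodology for experimenting with AI agent reliability across diverse operator personalities and human-agent team dynamics, directly supporting operational requirements for trustworthy AI systems. The work advances the evaluation of agentic AI workflows by going beyond standard performance metrics to incorporate the social dynamics essential to mission success in complex operations.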
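Takeaways, Limitations
• Takeaways:
  ◦ Presents a repeatable framework for evaluating the reliability of agentic AI systems in mission-critical negotiation settings.
  ◦ Provides empirical evidence of how personality traits (Agreeableness, Extraversion) and AI agent characteristics (transparency, competence, adaptability) affect negotiation outcomes.
  ◦ Shows that sociocognitive lexical measures can be used to analyze agents' empathic communication, moral foundations, and opinion patterns.
  ◦ Emphasizes the importance of evaluating AI systems on social dynamics, beyond standard performance metrics.
• Limitations:
  ◦ The experiments were run in the Sotopia simulation environment, so further research is needed on how well the findings generalize to real-world settings.
  ◦ Only specific personality traits and AI agent characteristics were considered, so further research is needed on the influence of other factors.
  ◦ Humans were simulated with LLMs, so the results may not fully reflect the complexity of real humans.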