Daily Arxiv

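This page organizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.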

SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization

Towards Provable (In)Secure Model Weight Release Schemes

Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance

IndieFake Dataset: A Benchmark Dataset for Audio Deepfake Detection

These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining

In-Context Learning Strategies Emerge Rationally

Fake it till You Make it: Reward Modeling as Discriminative Prediction

Semantic Preprocessing for LLM-based Malware Analysis

PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling

TracLLM: A Generic Framework for Attributing Long Context LLMs

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation

Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data

Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations

Thinkless: LLM Learns When to Think

A3 : an Analytical Low-Rank Approximation Framework for Attention

Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

AI-Driven Sentiment Analytics: Unlocking Business Value in the E-Commerce Landscape

Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference

Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective

Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model

Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement

PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks

CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent

DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection

Materialist: Physically Based Editing Using Single-Image Inverse Rendering

Representation Learning of Lab Values via Masked AutoEncoders

Lagrangian Index Policy for Restless Bandits with Average Reward

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement

GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs

Recall and Refine: A Simple but Effective Source-free Open-set Domain Adaptation Framework

InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction

Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages

Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery

Rapid Gyroscope Calibration: A Deep Learning Approach

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

A GREAT Architecture for Edge-Based Graph Problems Like TSP

ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis

MockLLM: A Multi-Agent Behavior Collaboration Framework for Online Job Seeking and Recruiting

Is my Data in your AI Model? Membership Inference Test with Application to Face Images

PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks

Continual Learning as Computationally Constrained Reinforcement Learning

Efficient Image Generation with Variadic Attention Heads

Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation

From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown

Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues

Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning

NFISiS: New Perspectives on Fuzzy Inference Systems for Renewable Energy Forecasting

The State of Large Language Models for African Languages: Progress and Challenges

Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance

Super Co-alignment for Sustainable Symbiotic Society

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis

Review learning: Real world validation of privacy preserving continual learning across medical institutions

Whole-Body Conditioned Egocentric Video Prediction

mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation

WorldVLA: Towards Autoregressive Action World Model

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Potemkin Understanding in Large Language Models

skLEP: A Slovak General Language Understanding Benchmark

Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems

TITAN: Query-Token based Domain Adaptive Adversarial Learning

SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture

Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage

Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection

Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation

Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection

Pay Attention to Small Weights

Real-time and personalized product recommendations for large e-commerce platforms

rQdia: Regularizing Q-Value Distributions With Image Augmentation

CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection

A Systematic Review of Human-AI Co-Creativity

Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models

On Uniform Weighted Deep Polynomial approximation

Exploring Adapter Design Tradeoffs for Low Resource Music Generation

Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models

Small Encoders Can Rival Large Decoders in Detecting Groundedness

Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution

Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou

DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster

Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents

From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting

$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models

BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models

Task-Aware KV Compression For Cost-Effective Long Video Understanding

Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues

Created by Haebom

Authors

Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova

Overview

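This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation settings. It addresses the need for AI agents that can adapt to diverse human operators and stakeholders. Using the Sotopia simulation environment, the authors run two experiments that systematically evaluate how personality traits and AI agent characteristics affect the outcomes of LLM-simulated social negotiations, a capability essential to applications such as cross-team coordination and civil-military interaction. Experiment 1 uses causal discovery methods to measure the effect of personality traits on price negotiation and finds that agreeableness and extraversion significantly affect believability, goal achievement, and knowledge acquisition outcomes. Sociocognitive lexical measures extracted from team communications detect fine-grained differences in agents' empathic communication, moral foundations, and opinion patterns, providing actionable insights for agentic AI systems that must operate reliably in high-stakes operational scenarios. Experiment 2 evaluates human-AI job negotiations by manipulating both the simulated human's personality and the AI system's characteristics (specifically transparency, competence, and adaptability), showing how the trustworthiness of an AI agent affects mission effectiveness. Together, these results establish a repeatable evaluation methodology for testing AI agent trustworthiness across diverse operator personalities and human-agent team dynamics, directly supporting operational requirements for trustworthy AI systems. The work advances the evaluation of agentic AI workflows by going beyond standard performance metrics to incorporate the social dynamics essential to mission success in complex operations.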

Implications and Limitations

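• Implications:
  ◦ Presents a repeatable framework for evaluating the trustworthiness of agentic AI systems in mission-critical negotiation settings.
  ◦ Provides empirical evidence of how personality traits (agreeableness, extraversion) and AI agent characteristics (transparency, competence, adaptability) affect negotiation outcomes.
  ◦ Shows that sociocognitive lexical measures can be used to analyze agents' empathic communication, moral foundations, and opinion patterns.
  ◦ Emphasizes the importance of evaluating AI systems with social dynamics in mind, beyond standard performance metrics.

• Limitations:
  ◦ The experiments were run in the Sotopia simulation environment, so further research is needed on how well the findings generalize to real-world situations.
  ◦ Only specific personality traits and AI agent characteristics were considered, so the influence of other factors needs further study.
  ◦ Humans were simulated with LLMs, which may not fully capture the complexity of real people.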
