Daily Arxiv

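This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.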

Exploring Spatiotemporal Emotional Synchrony in Dyadic Interactions: The Role of Speech Conditions in Facial and Vocal Affective Alignment

Created by

Haebom

Authors

Von Ralph Dane Marquez Herbuela, Yukie Nagai

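Overview

This paper starts from the premise that understanding how humans express and synchronize emotions across multiple communication channels, including facial expressions and speech, has significant implications for emotion recognition systems and human-computer interaction. Motivated by the idea that non-overlapping speech promotes clearer emotional coordination while overlapping speech disrupts synchrony, the study investigates how these conversational dynamics affect the spatial and temporal alignment of arousal and valence across the facial and vocal modalities. Using dyadic interactions from the IEMOCAP dataset, the authors extracted continuous emotion estimates with EmoNet (facial video) and a Wav2Vec2-based model (speech audio). Segments were categorized by speech overlap, and emotional alignment was assessed using Pearson correlation, lag-adjusted analysis, and Dynamic Time Warping (DTW). Across analyses, non-overlapping speech was associated with more stable and predictable emotional synchrony than overlapping speech. Zero-lag correlations were low and did not differ significantly between conditions, but non-overlapping speech showed reduced variability, especially for arousal. Lag-adjusted correlations and best-lag distributions revealed clearer and more consistent temporal alignment in these segments. In contrast, overlapping speech exhibited higher variability and flatter lag profiles, although DTW unexpectedly indicated tighter alignment, suggesting distinct coordination strategies. Notably, directionality patterns showed that facial expressions more often preceded speech during turn-taking, while speech led during simultaneous vocalizations. These findings underscore the importance of conversational structure in regulating emotional communication and offer new insight into the spatial and temporal dynamics of multimodal affective alignment in real-world interaction.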
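To make the lag-adjusted analysis more concrete, here is a minimal sketch of how a lagged Pearson correlation and a best lag could be computed for two emotion time series, for example facial arousal and vocal arousal. This is not the authors' pipeline: the function names `pearson`, `lagged_correlations`, and `best_lag`, the `max_lag` parameter, and the sign convention for the lag are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(x)
    if n < 2 or n != len(y):
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def lagged_correlations(face, voice, max_lag):
    """Map each lag k in [-max_lag, max_lag] to corr(face[t], voice[t + k]).

    Assumed convention: a positive k pairs each face sample with a later
    voice sample, so a peak at k > 0 means the face signal leads the voice.
    """
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = face[: len(face) - k], voice[k:]
        else:
            a, b = face[-k:], voice[: len(voice) + k]
        result[k] = pearson(a, b)
    return result


def best_lag(face, voice, max_lag):
    """Return (lag, correlation) for the lag with the highest correlation."""
    corrs = lagged_correlations(face, voice, max_lag)
    valid = {k: r for k, r in corrs.items() if not math.isnan(r)}
    if not valid:
        raise ValueError("no lag produced a defined correlation")
    return max(valid.items(), key=lambda kv: kv[1])


# Synthetic example: the voice series is the face series delayed by 3 samples.
base = [math.sin(0.3 * i) for i in range(63)]
face_arousal = base[3:]
voice_arousal = base[:-3]
lag, r = best_lag(face_arousal, voice_arousal, max_lag=5)
```

In this synthetic case the face series leads by three samples, so the best lag comes out as 3. A flat lag profile, where no lag clearly stands out, would look like the pattern the summary reports for overlapping speech.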

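Implications and Limitations

• Implications:
  ◦ Shows that conversational structure (overlapping vs. non-overlapping speech) substantially affects emotional synchrony.
  ◦ Non-overlapping speech is associated with more stable and predictable emotional synchrony.
  ◦ Establishes that the temporal alignment pattern between facial expressions and speech (which one leads and which one lags) differs by conversational structure.
  ◦ Offers a new understanding of the spatial and temporal dynamics of multimodal emotional alignment in real-world interaction.

• Limitations:
  ◦ The study relies solely on the IEMOCAP dataset, which may limit generalizability.
  ◦ The reliability of the results depends on the accuracy of the EmoNet and Wav2Vec2 models used for emotion estimation.
  ◦ The metrics used in the analysis (Pearson correlation, DTW, and others) may not fully capture the complex aspects of emotional synchrony.
  ◦ Further research covering more diverse conversation types and participants is needed.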
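Since DTW is one of the alignment metrics named above, a minimal sketch of the classic dynamic-programming formulation may help show what it measures. It uses absolute difference as the local cost, which is an assumption; the summary does not say which cost function, windowing, or normalization the authors used, and the name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) DTW with absolute difference as local cost."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] is the cheapest cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: step in x only, step in y only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two series with the same shape but locally different timing.
stretched = dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3])
# Two series separated by a constant offset.
offset = dtw_distance([0, 0, 0], [1, 1, 1])
```

Because DTW lets the time axis stretch locally, series that share a shape but drift in timing can score as closely aligned even when fixed-lag correlation is weak. That is one possible reading of how a single analysis can show both a flat lag profile and a tight DTW alignment.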
