Daily Arxiv

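This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and affiliated institutions; please cite the source when sharing.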

CTA: Cross-Task Alignment for Better Test Time Training

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

What's Making That Sound Right Now? Video-centric Audio-Visual Localization

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Domain Generalizable Portrait Style Transfer

StreamDiT: Real-Time Streaming Text-to-Video Generation

From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

Neural-Network solver of ideal MHD equilibria

RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism

Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria

Hita: Holistic Tokenizer for Autoregressive Image Generation

Empirical Analysis Of Heuristic and Approximation Algorithms for the The Mutual-Visibility Problem

Horus: A Protocol for Trustless Delegation Under Uncertainty

Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-shot Subsurface Understanding

SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling

IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes

Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager

Enhancing Generalization of Spiking Neural Networks Through Temporal Regularization

Instruction Following by Boosting Attention of Large Language Models

Evaluating Logit-Based GOP Scores for Mispronunciation Detection

LLMs on support of privacy and security of mobile apps: state of the art and research directions

On the Fundamental Impossibility of Hallucination Control in Large Language Models

Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting

CuVSLAM: CUDA accelerated visual odometry and mapping

Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge

An empirical study of task and feature correlations in the reuse of pre-trained models

EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

Towards General Continuous Memory for Vision-Language Models

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

Bayesian Hierarchical Invariant Prediction

Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling

Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review

The GenAI Generation: Student Views of Awareness, Preparedness, and Concern

Variational OOD State Correction for Offline Reinforcement Learning

Heat Diffusion Models - Interpixel Attention Mechanism

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Offline Learning and Forgetting for Reasoning with Large Language Models

Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models

PVChat: Personalized Video Chat with One-Shot Learning

Challenges and Trends in Egocentric Vision: A Survey

Eyes on the Environment: AI-Driven Analysis for Fire and Smoke Classification, Segmentation, and Detection

Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model

A Survey on Transformer Context Extension: Approaches and Evaluation

Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance

UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer

The Algorithmic State Architecture (ASA): An Integrated Framework for AI-Enabled Government

A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models

Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records

GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

Fundamental Limits of Hierarchical Secure Aggregation with Cyclic User Association

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

RSPO: Regularized Self-Play Alignment of Large Language Models

Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering

Efficient Risk-sensitive Planning via Entropic Risk Measures

Bayesian Optimization for Controlled Image Editing via LLMs

Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation

Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment

Safe Beyond the Horizon: Efficient Sampling-based MPC with Neural Control Barrier Functions

A Theory for Conditional Generative Modeling on Multiple Data Sources

Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport

Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics

DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning

VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play

ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding

Holistic Construction Automation with Modular Robots: From High-Level Task Specification to Execution

Aria-UI: Visual Grounding for GUI Instructions

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG

Random Walks with Tweedie: A Unified View of Score-Based Diffusion Models

Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning

Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model

An AI Theory of Mind Will Enhance Our Collective Intelligence

Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle

Longitudinal Ensemble Integration for sequential classification with multimodal data

Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs

The Nexus of AR/VR, AI, UI/UX, and Robotics Technologies in Enhancing Learning and Social Interaction for Children with Autism Spectrum Disorders: A Systematic Review

What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning

Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI

Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort

Empirical evidence of Large Language Model's influence on human spoken communication

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics

CoDy: Counterfactual Explainers for Dynamic Graphs

Optimal Transport for Domain Adaptation through Gaussian Mixture Models

Learning Federated Neural Graph Databases for Answering Complex Queries from Distributed Knowledge Graphs

Detecting value-expressive text posts in Russian social media

Deep neural networks have an inbuilt Occam's razor

TT-TFHE: a Torus Fully Homomorphic Encryption-Friendly Neural Network Architecture

SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?

MedGemma Technical Report

Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift

Activation Steering for Chain-of-Thought Compression

Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria

Created by Haebom

Authors

Keita Kiuchi, Yoshikazu Fujimoto, Hideyuki Goto, Tomonori Hosokawa, Makoto Nishimura, Yosuke Sato, Izumi Sezai

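Summary

This study is the first comprehensive evaluation of large language model (LLM) performance across three counseling roles in Japanese-language therapeutic settings. It simultaneously evaluated counselor AI systems (GPT-4-turbo with either zero-shot prompting or structured multi-step dialogue prompts (SMDP), and Claude-3-Opus-SMDP), client AI simulations, and evaluator AI systems (o3, Claude-3.7-Sonnet, Gemini-2.5-pro). Human experts with counseling experience (n = 15) rated the AI-generated dialogues using the Motivational Interviewing Treatment Integrity (MITI) Coding Manual 4.2.1. Compared with zero-shot prompting, SMDP significantly improved counselor AI performance on all MITI global ratings, with no significant difference between GPT-SMDP and Opus-SMDP. The evaluator AIs performed similarly to human raters on Cultivating Change Talk but systematically overestimated Softening Sustain Talk and the overall quality metrics. Model-specific biases emerged: Gemini emphasized power-sharing, o3 emphasized technical proficiency, and Sonnet emphasized emotional expression. The client AI simulations showed a limited emotional range and unnaturally high compliance, indicating a need for greater realism. These findings establish a benchmark for AI-assisted counseling in a non-English language, identify key areas for improvement through advanced prompt engineering, retrieval-augmented generation, and targeted fine-tuning, and carry important implications for developing culturally sensitive AI mental health tools.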
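The comparison between evaluator AIs and human raters can be illustrated with a small sketch. It assumes the four MITI 4.2.1 global ratings (Cultivating Change Talk, Softening Sustain Talk, Partnership, Empathy, each on a 1 to 5 scale) and the manual's Technical and Relational summary scores, each taken as the mean of two globals. The class MITIGlobals, the function rater_bias, and all rating values are hypothetical illustrations, not the authors' analysis pipeline or data; a positive bias here only means the AI scored a session higher than the paired human rater did.

```python
from dataclasses import dataclass
from statistics import mean

# The four MITI 4.2.1 global rating dimensions (each scored 1-5).
DIMENSIONS = (
    "cultivating_change_talk",
    "softening_sustain_talk",
    "partnership",
    "empathy",
)


@dataclass
class MITIGlobals:
    """Hypothetical container for one rater's global ratings of one session."""
    cultivating_change_talk: float
    softening_sustain_talk: float
    partnership: float
    empathy: float

    def technical(self) -> float:
        # Technical summary score: mean of the two change-talk-related globals.
        return (self.cultivating_change_talk + self.softening_sustain_talk) / 2

    def relational(self) -> float:
        # Relational summary score: mean of Partnership and Empathy.
        return (self.partnership + self.empathy) / 2


def rater_bias(ai: list[MITIGlobals], human: list[MITIGlobals]) -> dict[str, float]:
    """Mean paired difference (AI minus human) per dimension and summary score.

    Positive values indicate that the AI evaluator rates sessions higher
    than the human rater, i.e. systematic overestimation.
    """
    if not ai or len(ai) != len(human):
        raise ValueError("ai and human must be non-empty and paired session by session")
    pairs = list(zip(ai, human))
    bias = {dim: mean(getattr(a, dim) - getattr(h, dim) for a, h in pairs) for dim in DIMENSIONS}
    bias["technical"] = mean(a.technical() - h.technical() for a, h in pairs)
    bias["relational"] = mean(a.relational() - h.relational() for a, h in pairs)
    return bias


if __name__ == "__main__":
    # Hypothetical ratings for two sessions, not data from the study.
    ai_ratings = [MITIGlobals(4, 4, 4, 5), MITIGlobals(3, 4, 4, 4)]
    human_ratings = [MITIGlobals(4, 3, 3, 4), MITIGlobals(3, 2, 4, 4)]
    for name, value in rater_bias(ai_ratings, human_ratings).items():
        print(f"{name}: {value:+.2f}")
```

A paired mean difference is only the simplest way to surface systematic overestimation; a fuller analysis would add significance testing and inter-rater agreement statistics.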

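Takeaways, Limitations

• Takeaways:
  ◦ Provides the first comprehensive evaluation of how well LLMs perform counseling roles in Japanese therapeutic settings.
  ◦ Shows that SMDP prompting is effective at improving counselor AI performance.
  ◦ Demonstrates the potential of evaluator AI systems along with their limitations (a tendency to overestimate).
  ◦ Identifies areas needing improvement, such as model-specific biases and the limited realism of client AI simulations.
  ◦ Offers important implications for developing culturally sensitive AI mental health tools.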

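• Limitations:
  ◦ The client AI simulations show a limited emotional range and unrealistically high compliance.
  ◦ The evaluator AIs' ratings diverge from human ratings (in particular, they overestimate Softening Sustain Talk and overall quality).
  ◦ The sample size is small (15 human experts).
  ◦ A more comprehensive examination across diverse counseling types and cultural backgrounds is needed.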
