Daily Arxiv
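This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.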
CTA: Cross-Task Alignment for Better Test Time Training
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
Domain Generalizable Portrait Style Transfer
StreamDiT: Real-Time Streaming Text-to-Video Generation
From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Neural-Network solver of ideal MHD equilibria
RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria
Hita: Holistic Tokenizer for Autoregressive Image Generation
Empirical Analysis Of Heuristic and Approximation Algorithms for the Mutual-Visibility Problem
Horus: A Protocol for Trustless Delegation Under Uncertainty
Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-shot Subsurface Understanding
SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures
WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
Enhancing Generalization of Spiking Neural Networks Through Temporal Regularization
Instruction Following by Boosting Attention of Large Language Models
Evaluating Logit-Based GOP Scores for Mispronunciation Detection
LLMs on support of privacy and security of mobile apps: state of the art and research directions
On the Fundamental Impossibility of Hallucination Control in Large Language Models
Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting
CuVSLAM: CUDA accelerated visual odometry and mapping
Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge
An empirical study of task and feature correlations in the reuse of pre-trained models
EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Towards General Continuous Memory for Vision-Language Models
Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)
Bayesian Hierarchical Invariant Prediction
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling
Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review
The GenAI Generation: Student Views of Awareness, Preparedness, and Concern
Variational OOD State Correction for Offline Reinforcement Learning
Heat Diffusion Models - Interpixel Attention Mechanism
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Offline Learning and Forgetting for Reasoning with Large Language Models
Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models
PVChat: Personalized Video Chat with One-Shot Learning
Challenges and Trends in Egocentric Vision: A Survey
Eyes on the Environment: AI-Driven Analysis for Fire and Smoke Classification, Segmentation, and Detection
Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model
A Survey on Transformer Context Extension: Approaches and Evaluation
Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
The Algorithmic State Architecture (ASA): An Integrated Framework for AI-Enabled Government
A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models
Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records
GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification
Fundamental Limits of Hierarchical Secure Aggregation with Cyclic User Association
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling
RSPO: Regularized Self-Play Alignment of Large Language Models
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
Efficient Risk-sensitive Planning via Entropic Risk Measures
Bayesian Optimization for Controlled Image Editing via LLMs
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment
Safe Beyond the Horizon: Efficient Sampling-based MPC with Neural Control Barrier Functions
A Theory for Conditional Generative Modeling on Multiple Data Sources
Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
Holistic Construction Automation with Modular Robots: From High-Level Task Specification to Execution
Aria-UI: Visual Grounding for GUI Instructions
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG
Random Walks with Tweedie: A Unified View of Score-Based Diffusion Models
Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning
Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model
An AI Theory of Mind Will Enhance Our Collective Intelligence
Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle
Longitudinal Ensemble Integration for sequential classification with multimodal data
Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
The Nexus of AR/VR, AI, UI/UX, and Robotics Technologies in Enhancing Learning and Social Interaction for Children with Autism Spectrum Disorders: A Systematic Review
What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI
Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort
Empirical evidence of Large Language Model's influence on human spoken communication
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics
CoDy: Counterfactual Explainers for Dynamic Graphs
Optimal Transport for Domain Adaptation through Gaussian Mixture Models
Learning Federated Neural Graph Databases for Answering Complex Queries from Distributed Knowledge Graphs
Detecting value-expressive text posts in Russian social media
Deep neural networks have an inbuilt Occam's razor
TT-TFHE: a Torus Fully Homomorphic Encryption-Friendly Neural Network Architecture
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
MedGemma Technical Report
Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift
Activation Steering for Chain-of-Thought Compression
Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria
Created by
Haebom
Authors
Keita Kiuchi, Yoshikazu Fujimoto, Hideyuki Goto, Tomonori Hosokawa, Makoto Nishimura, Yosuke Sato, Izumi Sezai
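Abstract
This study is the first comprehensive evaluation of large language model (LLM) performance across three counseling roles in Japanese-language therapeutic contexts. It simultaneously assessed counselor AI systems (GPT-4-turbo with zero-shot prompting or Structured Multi-step Dialogue Prompts (SMDP), and Claude-3-Opus-SMDP), client AI simulations, and evaluation AI systems (o3, Claude-3.7-Sonnet, Gemini-2.5-pro). Human experts with extensive counseling experience (n = 15) rated the AI-generated dialogues using the Motivational Interviewing Treatment Integrity (MITI) Coding Manual 4.2.1. Compared with zero-shot prompting, SMDP significantly improved counselor AI performance on all MITI global ratings, with no significant difference between GPT-SMDP and Opus-SMDP. The evaluation AIs performed comparably to human raters on Cultivating Change Talk but systematically overestimated Softening Sustain Talk and the overall quality metrics. Model-specific biases emerged: Gemini emphasized power-sharing, o3 focused on technical proficiency, and Sonnet prioritized emotional expression. The client AI simulations showed a limited emotional range and unnaturally high compliance, indicating a need for greater realism. These results establish benchmarks for AI-assisted counseling in non-English contexts and identify key areas for improvement through advanced prompt engineering, retrieval-augmented generation, and targeted fine-tuning, with important implications for developing culturally sensitive AI mental health tools.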
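Takeaways, Limitations
• Takeaways:
◦ Provides the first comprehensive evaluation of how LLMs perform counseling roles in Japanese-language therapeutic contexts.
◦ Shows that SMDP prompting significantly improves counselor AI performance over zero-shot prompting.
◦ Presents both the potential of evaluation AI systems and their limits (a tendency to overestimate).
◦ Identifies areas needing improvement, such as model-specific biases and the limited realism of client AI simulations.
◦ Offers important implications for developing culturally sensitive AI mental health tools.
• Limitations:
◦ Client AI simulations have a limited emotional range and unrealistically high compliance.
◦ Evaluation AIs rate inconsistently with human raters, in particular overestimating Softening Sustain Talk and overall quality.
◦ The sample of human experts is small (n = 15).
◦ A wider range of counseling types and cultural backgrounds still needs to be examined.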