Daily Arxiv

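This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.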

Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization

Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models

Multi-modal Relational Item Representation Learning for Inferring Substitutable and Complementary Items

SourceSplice: Source Selection for Machine Learning Tasks

OneShield -- the Next Generation of LLM Guardrails

RecPS: Privacy Risk Scoring for Recommender Systems

HuiduRep: A Robust Self-Supervised Framework for Learning Neural Representations from Extracellular Recordings

Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain

Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback

A Segmented Robot Grasping Perception Neural Network for Edge AI

Binarizing Physics-Inspired GNNs for Combinatorial Optimization

Disentangling Neural Disjunctive Normal Form Models

The Second Machine Turn: From Checking Proofs to Creating Concepts

EmissionNet: Air Quality Pollution Forecasting for Agriculture

Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations

Evaluating LLMs on Real-World Forecasting Against Human Superforecasters

Sign Spotting Disambiguation using Large Language Models

RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism

Discovering the underlying analytic structure within Standard Model constants using artificial intelligence

MR-CLIP: Efficient Metadata-Guided Learning of MRI Contrast Representations

Curious Causality-Seeking Agents Learn Meta Causal World

Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models

Private GPTs for LLM-driven testing in software development and machine learning

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models

AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

Are Sparse Autoencoders Useful for Java Function Bug Detection?

Credible Plan-Driven RAG Method for Multi-Hop Question Answering

Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories

E2E Parking Dataset: An Open Benchmark for End-to-End Autonomous Parking

Dominated Actions in Imperfect-Information Games

FakeIDet: Exploring Patches for Privacy-Preserving Fake ID Detection

Simultaneous Motion And Noise Estimation with Event Cameras

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation

Novice Developers' Perspectives on Adopting LLMs for Software Development: A Systematic Literature Review

ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning

A Survey on Post-training of Large Language Models

Do Large Language Models Know How Much They Know?

Better Embeddings with Coupled Adam

Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks

An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage

Embracing Large Language Models in Traffic Flow Forecasting

A Large Sensor Foundation Model Pretrained on Continuous Glucose Monitor Data for Diabetes Management

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Un-mixing Test-time Adaptation under Heterogeneous Data Streams

PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

Cobblestone: Iterative Automation for Formal Verification

Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots

Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors

AttnMod: Attention-Based New Art Styles

Loss Landscape Degeneracy and Stagewise Development in Transformers

Tackling Size Generalization of Graph Neural Networks on Biological Data from a Spectral Perspective

Gradient Leakage Defense with Key-Lock Module for Federated Learning

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Semantic Chain-of-Trust: Autonomous Trust Orchestration for Collaborator Selection via Hypergraph-Aided Agentic AI

How Far Are AI Scientists from Changing the World?

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

E.A.R.T.H.: Structuring Creative Evolution through Model Error in Generative AI

On Gradual Semantics for Assumption-Based Argumentation

Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations

Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team

ORFS-agent: Tool-Using Agents for Chip Design Optimization

World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks

The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation

BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking

OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM

Causal Explanations for Image Classifiers

BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination

Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity

Identifying Unique Spatial-Temporal Bayesian Network without Markov Equivalence

Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models

SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation

Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations

A Simple and Effective Method for Uncertainty Quantification and OOD Detection

Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking

Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos

Agentic large language models improve retrieval-based radiology question answering

Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data

How LLMs are Shaping the Future of Virtual Reality

Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems

Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA

Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning

JSON-Bag: A generic game trajectory representation

NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System

Efficient Solution and Learning of Robust Factored MDPs

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

On-Device Diffusion Transformer Policy for Efficient Robot Manipulation

Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications

Advancing Quantum Information Science Pre-College Education: The Case for Learning Sciences Collaboration

Backdoor Attacks on Deep Learning Face Detection

Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data

Prompting Science Report 3: I'll pay you or I'll kill you -- but will you care?

Composable OS Kernel Architectures for Autonomous Intelligence

LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks

Wukong Framework for Not Safe For Work Detection in Text-to-Image systems

OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery

Clinical knowledge in LLMs does not translate to human interactions

Posted by

Haebom

Authors

Andrew M. Bean, Rebecca Payne, Guy Parsons, Hannah Rose Kirk, Juan Ciro, Rafael Mosquera, Sara Hincapie Monsalve, Aruna S. Ekanayaka, Lionel Tarassenko, Luc Rocher, Adam Mahdi

Overview

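This paper reports a controlled study with 1,298 participants that tests how effective large language models (LLMs) are for medical consultation. It compares three LLMs (GPT-4, Llama 3, and Command R+) with a control group in which participants relied on their own judgment, evaluating the ability to identify the condition and to recommend a course of action across ten medical scenarios. When the LLMs worked through the scenarios alone, they identified the condition correctly in 94.9% of cases on average and recommended the correct course of action in 56.3%. When participants used the LLMs, however, condition identification accuracy was below 34.5% and course-of-action accuracy was below 44.2%, with no significant difference from the control group. This points to difficulties in user interaction when LLMs are used for medical consultation. The study also finds that medical-knowledge benchmarks and simulated patient interactions alone do not predict the problems that arise in interactions with real users.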
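To make the size of the drop concrete, the figures quoted in the overview can be compared directly. The sketch below is only illustrative arithmetic on those four numbers; the dictionary keys and the function name `min_gap` are hypothetical labels, not terms from the paper. Because the participant figures are reported as upper bounds ("below"), the result is the smallest possible gap.

```python
# Accuracy figures quoted in the overview (percent of cases correct).
# Keys are hypothetical labels chosen for this sketch.
llm_alone = {"condition": 94.9, "course_of_action": 56.3}

# Participant-with-LLM figures are reported as upper bounds ("below X%").
participant_with_llm_upper = {"condition": 34.5, "course_of_action": 44.2}


def min_gap(alone, assisted_upper):
    """Return the smallest possible gap, in percentage points, per task."""
    return {task: round(alone[task] - assisted_upper[task], 1) for task in alone}


gaps = min_gap(llm_alone, participant_with_llm_upper)
for task, gap in gaps.items():
    print(f"{task}: at least {gap} percentage points lower with participants")
```

Under these numbers the gap is at least 60.4 percentage points for identifying the condition and at least 12.1 for the course of action.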

Implications and Limitations

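• Implications:

◦ Shows that even when LLMs score highly on medical licensing exams, their accuracy can fall sharply in real medical consultation settings.

◦ Emphasizes that deploying LLMs in medicine requires systematic user testing that evaluates interaction with real users, not knowledge assessment alone.

◦ Points out that existing medical-knowledge benchmarks and simulated patient interactions cannot accurately predict LLM performance in real-world settings.

• Limitations:

◦ The limited set of LLMs and scenarios used in the study may make the results hard to generalize.

◦ The effects of user interface design and user training may not have been sufficiently considered.

◦ Further research is needed across a wider range of medical fields and disease types.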
