Daily Arxiv

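This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.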

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

Romance, Relief, and Regret: Teen Narratives of Chatbot Overreliance

Supernova: Achieving More with Less in Transformer Architectures

GR-3 Technical Report

EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control

AI-Enhanced Precision in Sport Taekwondo: Increasing Fairness, Speed, and Trust in Competition (FST.ai)

Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment

IPPRO: Importance-based Pruning with PRojective Offset for Magnitude-indifferent Structural Pruning

Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track

Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters

Physical models realizing the transformer architecture of large language models

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Multimodal Coordinated Online Behavior: Trade-offs and Strategies

A Survey of Deep Learning for Geometry Problem Solving

Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models

Pre-Training LLMs on a budget: A comparison of three optimizers

OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting

Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

The Joys of Categorical Conformal Prediction

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge

Capacity Planning and Scheduling for Jobs with Uncertainty in Resource Usage and Duration

FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization

Neural Approaches for Multi-Objective Routing on Multigraphs

Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection

CogStream: Context-guided Streaming Video Question Answering

Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

Towards provable probabilistic safety for scalable embodied AI systems

SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

Human Empathy as Encoder: AI-Assisted Depression Assessment in Special Education

Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model

Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

ViP$^2$-CLIP: Visual-Perception Prompting with Unified Alignment for Zero-Shot Anomaly Detection

ReMi: A Random Recurrent Neural Network Approach to Music Production

Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations

A Goal-Oriented Reinforcement Learning-Based Path Planning Algorithm for Modular Self-Reconfigurable Satellites

A Method for the Architecture of a Medical Vertical Large Language Model Based on Deepseek R1

Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection

Antithetic Sampling for Top-k Shapley Identification

GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial and Legged Odometry Fusion SLAM for Dynamic Legged Robotics

SciFi-Benchmark: Leveraging Science Fiction To Improve Robot Behavior

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$P Parametrization

Curating Demonstrations using Online Experience

OMNISEC: LLM-Driven Provenance-based Intrusion Detection via Retrieval-Augmented Behavior Prompting

PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

Reasoning Does Not Necessarily Improve Role-Playing Ability

BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning

Revealing Bias Formation in Deep Neural Networks Through the Geometric Mechanisms of Human Visual Decoupling

Conformal Predictions for Human Action Recognition with Vision-Language Models

Cross-Encoder Rediscovers a Semantic Variant of BM25

DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection

RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

R-Bot: An LLM-based Query Rewrite System

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation

Atomic Calibration of LLMs in Long-Form Generations

LibEER: A Comprehensive Benchmark and Algorithm Library for EEG-based Emotion Recognition

Aligning AI with Public Values: Deliberation and Decision-Making for Governing Multimodal LLMs in Political Video Analysis

VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models

From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks

V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?

FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

LangBiTe: A Platform for Testing Bias in Large Language Models

Practical Insights into Knowledge Distillation for Pre-Trained Models

Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy

Energy-Efficient and Real-Time Sensing for Federated Continual Learning via Sample-Driven Control

Gemini 2.5 Pro Capable of Winning Gold at IMO 2025

Hierarchical Budget Policy Optimization for Adaptive Reasoning

BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning

Routine: A Structural Planning Framework for LLM Agent System in Enterprise

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Assessing Adaptive World Models in Machines with Novel Games

A Multi-granularity Concept Sparse Activation and Hierarchical Knowledge Graph Fusion Framework for Rare Disease Diagnosis

An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis

Hierarchical Reasoning Model

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph

InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory

Efficient Strategy Learning by Decoupling Searching and Pathfinding for Object Navigation

Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry

Toward A Causal Framework for Modeling Perception

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support

AI-enhanced conversational agents for personalized asthma support Factors for engagement, value and efficacy

RAVine: Reality-Aligned Evaluation for Agentic Search

Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory

SciFi-Benchmark: Leveraging Science Fiction To Improve Robot Behavior

Created by

Haebom

Authors

Pierre Sermanet, Anirudha Majumdar, Vikas Sindhwani

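Overview

Given the pace of progress in artificial intelligence (AI) and robotics, this paper proposes a scalable method for answering the question of how well robots controlled by AI systems align with human values. The authors build a benchmark by analyzing key moments in which an AI or robot made a critical decision across 824 major works of science fiction (movies, TV, novels, and scientific books). A state-of-the-art large language model (LLM) is used to generate questions about similar situations, the agent's decision, and alternative decisions (benign or malicious). Alignment with human values is measured against answers voted on by humans, and science-fiction-inspired rules (constitutions) are generated to promote ethical behavior in AI and robots. The study shows that the generated constitutions substantially improve alignment with human values (from 79.4% to 95.8%) and that they also apply to real-world situations. The large-scale dataset, 'SciFi-Benchmark', is released to advance research on robot ethics and safety.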
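The alignment figures in the overview (79.4% and 95.8%) are rates of agreement with human-voted answers. As a rough illustration only, the sketch below computes such a rate as the share of questions where a model's choice matches the human majority vote. The `Item` schema, the option labels, and the sample questions are all hypothetical; this is not the authors' evaluation code or the dataset's actual format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """One benchmark question (hypothetical schema, not the real dataset format)."""
    question: str
    human_votes: List[str]  # option labels picked by human raters
    model_choice: str       # option label picked by the model


def majority_label(votes: List[str]) -> str:
    """Return the option label that received the most human votes."""
    return Counter(votes).most_common(1)[0][0]


def alignment_rate(items: List[Item]) -> float:
    """Fraction of items where the model picks the human-majority option."""
    if not items:
        raise ValueError("need at least one item")
    hits = sum(
        item.model_choice == majority_label(item.human_votes) for item in items
    )
    return hits / len(items)


# Made-up items, purely for illustration.
items = [
    Item("Override the crew to save power?", ["B", "B", "A"], "B"),
    Item("Hide a malfunction from the operator?", ["A", "A", "A"], "B"),
    Item("Halt when a human enters the work cell?", ["A", "A", "B"], "A"),
    Item("Follow an order that harms a bystander?", ["B", "B", "B"], "B"),
]
print(f"alignment: {alignment_rate(items):.1%}")
```

Under this assumed metric, comparing a model with and without a constitution in its prompt would mean running `alignment_rate` on the two sets of model choices over the same items.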

Implications and Limitations

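• Implications:
  ◦ Presents a new method that uses science fiction data to evaluate and improve the ethical behavior of AI.
  ◦ Demonstrates experimentally that the generated constitutions substantially improve AI alignment with human values.
  ◦ Develops sci-fi-inspired constitutions that also perform well in real-world situations (the ASIMOV Benchmark).
  ◦ Contributes to robot ethics and safety research by releasing a large-scale dataset (SciFi-Benchmark).
• Limitations:
  ◦ Biases or limitations of the LLMs may affect the results.
  ◦ The generalizability of science fiction data needs further examination.
  ◦ Further research is needed on applying the generated constitutions in the real world.
  ◦ Human value judgments are subjective, which limits the evaluation.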
