Daily Arxiv

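This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.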

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Ada-TransGNN: An Air Quality Prediction Model Based On Adaptive Graph Convolutional Networks

Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery

Consistent Opponent Modeling of Static Opponents in Imperfect-Information Games

Finding Outliers in a Haystack: Anomaly Detection for Large Pointcloud Scenes

Agentic AI for Software: thoughts from Software Engineering community

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Dream to Chat: Model-based Reinforcement Learning on Dialogues with User Belief Modeling

A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

Generative Artificial Intelligence and Agents in Research and Teaching

CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression

Comparative Analysis of UAV Path Planning Algorithms for Efficient Navigation in Urban 3D Environments

Retrieval Enhanced Feedback via In-context Neural Error-book

From Confidence to Collapse in LLM Factual Robustness

On Task Vectors and Gradients

Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

DLLMQuant: Quantizing Diffusion-based Large Language Models

LLM-Enhanced Linear Autoencoders for Recommendation

Leveraging GNN to Enhance MEF Method in Predicting ENSO

Uncertainty-Guided Face Matting for Occlusion-Aware Face Transformation

New Kid in the Classroom: Exploring Student Perceptions of AI Coding Assistants

Large Language Model-Based Framework for Explainable Cyberattack Detection in Automatic Generation Control Systems

SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs

Apple Intelligence Foundation Language Models: Tech Report 2025

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models

Demographic-aware fine-grained classification of pediatric wrist fractures

Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing

Solar Altitude Guided Scene Illumination

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Spectra-to-Structure and Structure-to-Spectra Inference Across the Periodic Table

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models

EVM-Fusion: An Explainable Vision Mamba Architecture with Neural Algorithmic Fusion

RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection

Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing

Concept-Guided Interpretability via Neural Chunking

Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study

An Ontology-Driven Graph RAG for Legal Norms: A Hierarchical, Temporal, and Deterministic Approach

Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models

Video CLIP Model for Multi-View Echocardiography Interpretation

A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Disease Detection from Retinal Fundus Images

M$^2$IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Noise-based reward-modulated learning

Faster Parameter-Efficient Tuning with Token Redundancy Reduction

UniGenX: a unified generative foundation model that couples sequence, structure and function to accelerate scientific design across proteins, molecules and materials

Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems

Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements

TableTalk: Scaffolding Spreadsheet Development with a Language Agent

StagFormer: Time Staggering Transformer Decoding for RunningLayers In Parallel

Provably-Safe Neural Network Training Using Hybrid Zonotope Reachability Analysis

Generative Artificial Intelligence-Supported Pentesting: A Comparison between Claude Opus, GPT-4, and Copilot

Safe Multiagent Coordination via Entropic Exploration

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

Cultural Dimensions of AI Perception: Charting Expectations, Risks, Benefits, Tradeoffs, and Value in Germany and China

CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers

Perception Gaps in Risk, Benefit, and Value Between Experts and Public Challenge Socially Accepted AI

Hierarchical Object-Oriented POMDP Planning for Object Rearrangement

From Intents to Conversations: Generating Intent-Driven Dialogues with Contrastive Learning for Multi-Turn Classification

Secure Reinforcement Learning via Shuffle Privacy Model

Overcoming label shift with target-aware federated learning

Benchmarking XAI Explanations with Human-Aligned Evaluations

HonestCyberEval: An AI Cyber Risk Benchmark for Automated Software Exploitation

Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning

GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration

ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context

Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL

Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis

Learning county from pixels: corn yield prediction with attention-weighted multiple instance learning

Memory augment is All You Need for image restoration

Rethinking Distribution Shifts: Empirical Analysis and Inductive Modeling for Tabular Data

DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models

Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles

Bayesian Deep Learning for Segmentation for Autonomous Safe Planetary Landing

ST-Raptor: LLM-Powered Semi-Structured Table Question Answering

Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment

LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence

Response and Prompt Evaluation to Prevent Parasocial Relationships with Chatbots

Profile-Aware Maneuvering: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld

Multi-Agent LLMs as Ethics Advocates for AI-Based Systems

Feature-Guided Neighbor Selection for Non-Expert Evaluation of Model Predictions

Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation

Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models

The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners

YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models

Consensus in Motion: A Case of Dynamic Rationality of Sequential Learning in Probability Aggregation

Can Large Language Models Act as Ensembler for Multi-GNNs?

Pessimistic Iterative Planning with RNNs for Robust POMDPs

Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding

Integrating Large Language Model for Improved Causal Discovery

A Survey on Causal Discovery: Theory and Practice

Generative Interfaces for Language Models

Interpolating Speaker Identities in Embedding Space for Data Expansion

VibeVoice Technical Report

LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding

Understanding Tool-Integrated Reasoning

Emotions as Ambiguity-aware Ordinal Representations

Real-Time Model Checking for Closed-Loop Robot Reactive Planning

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

Created by

Haebom

Authors

Yihe Tang, Wenlong Huang, Yingke Wang, Chengshu Li, Roy Yuan, Ruohan Zhang, Jiajun Wu, Li Fei-Fei

Overview

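This paper emphasizes the importance of fine-grained understanding of object affordances for robotic manipulation in unstructured environments. Existing visual affordance prediction methods either depend on manually annotated data or are restricted to a predefined set of tasks. To address this, the paper proposes UAD (Unsupervised Affordance Distillation), a method that distills affordance knowledge from foundation models into a task-conditioned affordance model without any manual annotation. By leveraging the complementary strengths of large vision models and vision-language models, UAD automatically annotates a large-scale dataset of <instruction, visual affordance> pairs. By training only a lightweight task-conditioned decoder on top of frozen features, UAD shows notable generalization to real-world robot scenes and to a variety of human activities, even though it is trained only on rendered objects in simulation. Using the affordances provided by UAD as the observation space, the paper presents an imitation learning policy that, after training on as few as 10 demonstrations, shows promising generalization to unseen object instances, unseen object categories, and variations in task instructions.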
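The idea of training only a lightweight task-conditioned decoder on top of frozen features can be illustrated with a toy sketch. This is not the paper's architecture: the bilinear scoring rule, the class name `TaskConditionedDecoder`, the hand-written two-dimensional features, the one-hot instruction embeddings, and the labels are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. It only shows the general pattern, in which per-pixel features stay fixed and a small decoder learns to turn an instruction into an affordance map.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TaskConditionedDecoder:
    """Hypothetical lightweight decoder; its weights are the only trainable parameters."""

    def __init__(self, feat_dim, text_dim, seed=0):
        rng = random.Random(seed)
        # W projects an instruction embedding into the frozen feature space.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(text_dim)]
                  for _ in range(feat_dim)]
        self.b = 0.0

    def predict(self, pixel_feats, text_emb):
        # Build a query vector for this instruction, then score every pixel.
        query = [sum(w * t for w, t in zip(row, text_emb)) for row in self.W]
        return [sigmoid(sum(f * q for f, q in zip(feat, query)) + self.b)
                for feat in pixel_feats]

    def train_step(self, pixel_feats, text_emb, targets, lr=0.5):
        """One gradient step on mean binary cross-entropy; returns the loss."""
        preds = self.predict(pixel_feats, text_emb)
        n = len(pixel_feats)
        loss = 0.0
        for feat, p, y in zip(pixel_feats, preds, targets):
            p_safe = min(max(p, 1e-9), 1.0 - 1e-9)
            loss -= y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)
            err = (p - y) / n  # gradient of the mean loss w.r.t. this pixel's logit
            for i, f in enumerate(feat):
                for j, t in enumerate(text_emb):
                    self.W[i][j] -= lr * err * f * t
            self.b -= lr * err
        return loss / n


# Frozen per-pixel features (assumed stand-ins for foundation-model features).
# They are never updated during training.
PIXEL_FEATS = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Hypothetical <instruction, visual affordance> pairs:
# instruction name -> (instruction embedding, per-pixel affordance labels).
DATASET = {
    "grasp": ([1.0, 0.0], [1, 1, 0, 0]),
    "pour": ([0.0, 1.0], [0, 0, 1, 1]),
}

decoder = TaskConditionedDecoder(feat_dim=2, text_dim=2)
losses = []
for _ in range(200):
    epoch_loss = 0.0
    for text_emb, targets in DATASET.values():
        epoch_loss += decoder.train_step(PIXEL_FEATS, text_emb, targets)
    losses.append(epoch_loss / len(DATASET))

grasp_map = decoder.predict(PIXEL_FEATS, DATASET["grasp"][0])
pour_map = decoder.predict(PIXEL_FEATS, DATASET["pour"][0])
print([round(p, 2) for p in grasp_map])
print([round(p, 2) for p in pour_map])
```

The same frozen features yield different affordance maps depending on the instruction, which is the sense in which the decoder is task-conditioned. In a realistic setting the features and instruction embeddings would presumably come from pretrained models and the decoder would be a small neural network.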

Implications and Limitations

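• Implications:

◦ Enables affordance learning by automatically annotating a large-scale dataset without manual annotation.

◦ Uses foundation models to achieve generalization to real environments from simulation data alone.

◦ Shows generalization to new objects and task instructions from only a small number of demonstrations.

◦ Suggests applicability to real robot manipulation through combination with an imitation learning policy.

• Limitations:

◦ Reliance on simulation data may leave a domain gap with real environments.

◦ Depends on the performance of the foundation models, whose limitations can affect UAD's performance.

◦ The limits of generalization across diverse objects and tasks require further study.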
