Daily Arxiv

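This page compiles artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.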

VisPlay: Self-Evolving Vision-Language Models from Images

CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking

Multimodal Evaluation of Russian-language Architectures

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition

Finetuning LLMs for Automatic Form Interaction on Web-Browser in Selenium Testing Framework

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Can Artificial Intelligence Accelerate Technological Progress? Researchers' Perspectives on AI in Manufacturing and Materials Science

Node-Level Uncertainty Estimation in LLM-Generated SQL

Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy

Mesh-based Super-resolution of Detonation Flows with Multiscale Graph Transformers

STAMP: Spatial-Temporal Adapter with Multi-Head Pooling

Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging

Statistically Assuring Safety of Control Systems using Ensembles of Safety Filters and Conformal Prediction

LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

CaberNet: Causal Representation Learning for Cross-Domain HVAC Energy Prediction

Kaggle Chronicles: 15 Years of Competitions, Community and Data Science Innovation

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

TabDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification

Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation

GAPO: Robust Advantage Estimation for Real-World Code LLMs

Steering Evaluation-Aware Language Models to Act Like They Are Deployed

Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents

HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models

Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks

Interpretability as Alignment: Making Internal Understanding a Design Principle

How many patients could we save with LLM priors?

CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples

Interpreting the Effects of Quantization on LLMs

From Confidence to Collapse in LLM Factual Robustness

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models

MAQuA: Adaptive Question-Asking for Multidimensional Mental Health Screening using Item Response Theory

LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations

Efficient Solution and Learning of Robust Factored MDPs

To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks

Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems

HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models

When concept-based XAI is imprecise: Do people distinguish between generalisations and misrepresentations?

Eliciting Reasoning in Language Models with Cognitive Tools

Beyond Bias Scores: Unmasking Vacuous Neutrality in Small Language Models

From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks

Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems

Fast-DataShapley: Neural Modeling for Training Data Valuation

An Iterative Question-Guided Framework for Knowledge Base Question Answering

A survey of using EHR as real-world evidence for discovering and validating new drug indications

CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing

A Distributionally Robust Framework for Nuisance in Causal Effect Estimation

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Vector Quantized-Elites: Unsupervised and Problem-Agnostic Quality-Diversity Optimization

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners

A Closer Look at Adversarial Suffix Learning for Jailbreaking LLMs: Augmented Adversarial Trigger Learning

CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation

LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation

Securing Smart Contract Languages with a Unified Agentic Framework for Vulnerability Repair in Solidity and Move

TRADES: Generating Realistic Market Simulations with Diffusion Models

Recent Advances in Discrete Speech Tokens: A Review

Oracular Programming: A Modular Foundation for Building LLM-Enabled Software

KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

RAPID: Robust and Agile Planner Using Inverse Reinforcement Learning for Vision-Based Drone Navigation

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

Efficient Architectures for High Resolution Vision-Language Models

MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions

Automatically Detecting Online Deceptive Patterns

LEARNER: Contrastive Pretraining for Learning Fine-Grained Patient Progression from Coarse Inter-Patient Labels

Atomic Calibration of LLMs in Long-Form Generations

TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks

Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise

Introducing DEFORMISE: A deep learning framework for dementia diagnosis in the elderly using optimized MRI slice selection

KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer

Provably Robust Pre-Trained Ensembles for Biomarker-Based Cancer Classification

DiffuSyn Bench: Evaluating Vision-Language Models on Real-World Complexities with Diffusion-Generated Synthetic Benchmarks

Property-guided Inverse Design of Metal-Organic Frameworks Using Quantum Natural Language Processing

Can LLMs Replace Economic Choice Prediction Labs? The Case of Language-based Persuasion Games

LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space

As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files

SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models

Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models

Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions

Towards Efficient Multimodal Unified Reasoning Model via Model Merging

FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

Evaluating Multimodal Large Language Models with Daily Composite Tasks in Home Environments

Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems

Constraint-Guided Prediction Refinement via Deterministic Diffusion Trajectories

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

PoE-World: Compositional World Modeling with Products of Programmatic Experts

Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation

Evolution Strategies at the Hyperscale

Faster Certified Symmetry Breaking Using Orders With Auxiliary Variables

Stabilizing Policy Gradient Methods via Reward Profiling

SAM 3D: 3Dfy Anything in Images

Improving Long-Tailed Object Detection with Balanced Group Softmax and Metric Learning

Generative AI for Enhanced Wildfire Detection: Bridging the Synthetic-Real Domain Gap

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

Created by

Haebom

Authors

Irmak Guzey, Haozhi Qi, Julen Urain, Changhao Wang, Jessica Yin, Krishna Bodduluri, Mike Lambeta, Lerrel Pinto, Akshara Rai, Jitendra Malik, Tingfan Wu, Akash Sharma, Homanga Bharadhwaj

Overview

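The AINA framework aims to learn multi-fingered robot policies from data of humans performing everyday tasks in in-the-wild environments. Using data collected with Aria Gen 2 glasses, it learns 3D point-based multi-fingered policies that are robust to background changes and can be deployed directly without any robot data. The framework outperforms existing human-to-robot policy learning approaches, and results are presented for nine everyday manipulation tasks.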

Implications and Limitations

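• Implications:

◦ Presents a new approach that learns multi-fingered robot policies from human data alone.

◦ Makes data collection easy by relying on lightweight hardware (Aria Gen 2 glasses).

◦ Learns policies that are robust to background changes and require no robot data.

◦ Reports successful results across a variety of everyday manipulation tasks.

• Limitations:

◦ The paper does not directly state specific limitations.

◦ The range of tasks used for performance validation may be limited.

◦ Robot execution results must be viewed on the project website.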
