haebom
Daily Arxiv
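This page collects and organizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.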
VisPlay: Self-Evolving Vision-Language Models from Images
CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking
Multimodal Evaluation of Russian-language Architectures
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition
Finetuning LLMs for Automatic Form Interaction on Web-Browser in Selenium Testing Framework
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
Can Artificial Intelligence Accelerate Technological Progress? Researchers' Perspectives on AI in Manufacturing and Materials Science
Node-Level Uncertainty Estimation in LLM-Generated SQL
Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy
Mesh-based Super-resolution of Detonation Flows with Multiscale Graph Transformers
STAMP: Spatial-Temporal Adapter with Multi-Head Pooling
Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
Statistically Assuring Safety of Control Systems using Ensembles of Safety Filters and Conformal Prediction
LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging
CaberNet: Causal Representation Learning for Cross-Domain HVAC Energy Prediction
Kaggle Chronicles: 15 Years of Competitions, Community and Data Science Innovation
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
TabDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification
Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation
GAPO: Robust Advantage Estimation for Real-World Code LLMs
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents
HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models
Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks
Interpretability as Alignment: Making Internal Understanding a Design Principle
How many patients could we save with LLM priors?
CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples
Interpreting the Effects of Quantization on LLMs
From Confidence to Collapse in LLM Factual Robustness
PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
MAQuA: Adaptive Question-Asking for Multidimensional Mental Health Screening using Item Response Theory
LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations
Efficient Solution and Learning of Robust Factored MDPs
To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems
HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
When concept-based XAI is imprecise: Do people distinguish between generalisations and misrepresentations?
Eliciting Reasoning in Language Models with Cognitive Tools
Beyond Bias Scores: Unmasking Vacuous Neutrality in Small Language Models
From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks
Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems
Fast-DataShapley: Neural Modeling for Training Data Valuation
An Iterative Question-Guided Framework for Knowledge Base Question Answering
A survey of using EHR as real-world evidence for discovering and validating new drug indications
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing
A Distributionally Robust Framework for Nuisance in Causal Effect Estimation
BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
Vector Quantized-Elites: Unsupervised and Problem-Agnostic Quality-Diversity Optimization
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
A Closer Look at Adversarial Suffix Learning for Jailbreaking LLMs: Augmented Adversarial Trigger Learning
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation
Securing Smart Contract Languages with a Unified Agentic Framework for Vulnerability Repair in Solidity and Move
TRADES: Generating Realistic Market Simulations with Diffusion Models
Recent Advances in Discrete Speech Tokens: A Review
Oracular Programming: A Modular Foundation for Building LLM-Enabled Software
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
RAPID: Robust and Agile Planner Using Inverse Reinforcement Learning for Vision-Based Drone Navigation
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Efficient Architectures for High Resolution Vision-Language Models
MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions
Automatically Detecting Online Deceptive Patterns
LEARNER: Contrastive Pretraining for Learning Fine-Grained Patient Progression from Coarse Inter-Patient Labels
Atomic Calibration of LLMs in Long-Form Generations
TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks
Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise
Introducing DEFORMISE: A deep learning framework for dementia diagnosis in the elderly using optimized MRI slice selection
KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer
Provably Robust Pre-Trained Ensembles for Biomarker-Based Cancer Classification
DiffuSyn Bench: Evaluating Vision-Language Models on Real-World Complexities with Diffusion-Generated Synthetic Benchmarks
Property-guided Inverse Design of Metal-Organic Frameworks Using Quantum Natural Language Processing
Can LLMs Replace Economic Choice Prediction Labs? The Case of Language-based Persuasion Games
LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space
As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files
SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models
Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models
Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions
Towards Efficient Multimodal Unified Reasoning Model via Model Merging
FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Evaluating Multimodal Large Language Models with Daily Composite Tasks in Home Environments
Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems
Constraint-Guided Prediction Refinement via Deterministic Diffusion Trajectories
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
PoE-World: Compositional World Modeling with Products of Programmatic Experts
Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations
Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation
Evolution Strategies at the Hyperscale
Faster Certified Symmetry Breaking Using Orders With Auxiliary Variables
Stabilizing Policy Gradient Methods via Reward Profiling
SAM 3D: 3Dfy Anything in Images
Improving Long-Tailed Object Detection with Balanced Group Softmax and Metric Learning
Generative AI for Enhanced Wildfire Detection: Bridging the Synthetic-Real Domain Gap
Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations
Created by
Haebom
Authors
Irmak Guzey, Haozhi Qi, Julen Urain, Changhao Wang, Jessica Yin, Krishna Bodduluri, Mike Lambeta, Lerrel Pinto, Akshara Rai, Jitendra Malik, Tingfan Wu, Akash Sharma, Homanga Bharadhwaj
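Overview
The AINA framework aims to learn multi-fingered robot policies from data of humans performing everyday tasks in natural environments. Using data collected with Aria Gen 2 glasses, it learns 3D point-based multi-fingered policies that are robust to background changes and can be deployed directly without any robot data. The framework outperforms existing human-to-robot policy learning approaches, and the paper reports results on nine everyday manipulation tasks.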
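Implications and Limitations
• Implications:
◦ Presents a new approach to learning multi-fingered robot policies from human data alone.
◦ Makes data collection easier by using lightweight hardware (Aria Gen 2 glasses).
◦ Learns policies that are robust to background changes and require no robot data.
◦ Reports successful results across a variety of everyday manipulation tasks.
• Limitations:
◦ The paper does not explicitly state specific limitations.
◦ The range of tasks used to validate performance may be limited.
◦ Robot execution results must be viewed on the project website.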