Daily Arxiv
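This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; please cite the source when sharing.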
HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and Cooling
NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commons VQA Tasks
Interact-Custom: Customized Human Object Interaction Image Generation
A Self-Supervised Mixture-of-Experts Framework for Multi-behavior Recommendation
MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation
From Tabula Rasa to Emergent Abilities: Discovering Robot Skills via Real-World Unsupervised Quality-Diversity
Dynamic Triangulation-Based Graph Rewiring for Graph Neural Networks
STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit Reasoning
Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework
Humans Perceive Wrong Narratives from AI Reasoning Texts
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
Pareto Actor-Critic for Communication and Computation Co-Optimization in Non-Cooperative Federated Learning Services
Learning to Drive Ethically: Embedding Moral Reasoning into Autonomous Driving
Generative AI Against Poaching: Latent Composite Flow Matching for Wildlife Conservation
Privacy-Aware Detection of Fake Identity Documents: Methodology, Benchmark, and Improved Algorithms (FakeIDet2)
Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics
Steering Towards Fairness: Mitigating Political Bias in LLMs
Dynamic Context Compression for Efficient RAG
Irredundant $k$-Fold Cross-Validation
Prompt Engineering and the Effectiveness of Large Language Models in Enhancing Human Productivity
A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
The Joys of Categorical Conformal Prediction
Adversarial Manipulation of Reasoning Models using Internal Representations
Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models
A Hybrid Artificial Intelligence Method for Estimating Flicker in Power Systems (Changes are marked)
GLProtein: Global-and-Local Structure Aware Protein Representation Learning
Program Semantic Inequivalence Game with Large Language Models
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Improving Quantization with Post-Training Model Expansion
Safe and Efficient Social Navigation through Explainable Safety Regions Based on Topological Features
A Simple Approach to Constraint-Aware Imitation Learning with Application to Autonomous Racing
Federated nnU-Net for Privacy-Preserving Medical Image Segmentation
ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation
Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models
RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis
Categorical Data Clustering via Value Order Estimated Distance Metric Learning
Application of AI to formal methods - an analysis of current trends
Reconsidering the Performance of GAE in Link Prediction
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
SoAy: A Solution-based LLM API-using Methodology for Academic Information Seeking
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study
Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off
Network Formation and Dynamics Among Multi-LLMs
NetGPT: Generative Pretrained Transformer for Network Traffic
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
Explainability of Text Processing and Retrieval Methods: A Survey
The Ramon Llull's Thinking Machine for Automated Ideation
RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
MSARL: Decoupling Reasoning and Tool Use with Multi-Small-Agent Reinforcement Learning
Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search
Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
Technology as uncharted territory: Contextual integrity and the notion of AI as new ethical ground
Possible Principles for Aligned Structure Learning Agents
OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale
Prompt-to-Product: Generative Assembly via Bimanual Manipulation
OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
Mixture of Contexts for Long Video Generation
FakeParts: a New Family of AI-Generated DeepFakes
Enabling Equitable Access to Trustworthy Financial Reasoning
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
Understanding, Protecting, and Augmenting Human Cognition with Generative AI: A Synthesis of the CHI 2025 Tools for Thought Workshop
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
ChainReaction! Structured Approach with Causal Chains as Intermediate Representations for Improved and Explainable Causal Video Question Answering
Train-Once Plan-Anywhere Kinodynamic Motion Planning via Diffusion Trees
ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts
WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents
Research Challenges in Relational Database Management Systems for LLM Queries
Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant
AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning
JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring
Learning Primitive Embodied World Models: Towards Scalable Robotic Learning
Multi-Agent Penetration Testing AI for the Web
Uncertainty Aware-Predictive Control Barrier Functions: Safer Human Robot Interaction through Probabilistic Motion Forecasting
Exploring Machine Learning and Language Models for Multimodal Depression Detection
Speech Emotion Recognition via Entropy-Aware Score Selection
Surfel-based 3D Registration with Equivariant SE(3) Features
Evaluating Compositional Generalisation in VLMs and Diffusion Models
Safer Skin Lesion Classification with Global Class Activation Probability Map Evaluation and SafeML
Unleashing Uncertainty: Efficient Machine Unlearning for Generative AI
Signs of Struggle: Spotting Cognitive Distortions across Language and Register
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding
SKGE-SWIN: End-To-End Autonomous Vehicle Waypoint Prediction and Navigation Using Skip Stage Swin Transformer
Occlusion Robustness of CLIP for Military Vehicle Classification
SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding
Provable Benefits of In-Tool Learning for Large Language Models
${C}^{3}$-GS: Learning Context-aware, Cross-dimension, Cross-scale Feature for Generalizable Gaussian Splatting
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol
EEGDM: Learning EEG Representation with Latent Diffusion Model
Generative Annotation for ASR Named Entity Correction
MobileCLIP2: Improving Multi-Modal Reinforced Training
Task Allocation for Autonomous Machines using Computational Intelligence and Deep Reinforcement Learning
FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
Created by
Haebom
Authors
Qian Feng, Jianxiang Feng, Zhaopeng Chen, Rudolph Triebel, Alois Knoll
Abstract
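Synthesizing diverse, uncertainty-aware grasps for multi-fingered hands from partial observations remains a key challenge in robot learning. Earlier generative models struggle to capture the complex grasp distribution of dexterous hands and often ignore the shape uncertainty inherent in partial point clouds, so they produce grasps that are unreliable or overly conservative. This paper proposes FFHFlow, a flow-based variational framework that generates diverse, robust multi-finger grasps while explicitly quantifying perceptual uncertainty in the partial point cloud. The method uses a deep latent variable model based on normalizing flows to learn a hierarchical grasp manifold, which overcomes the mode collapse and fixed-prior limitations of conditional variational autoencoders (cVAEs). By exploiting the invertibility and exact likelihoods of flows, FFHFlow introspects shape uncertainty in partial observations and identifies novel object structures, enabling risk-aware grasp synthesis. To further improve reliability, the flow likelihoods are combined with a discriminative grasp evaluator to form an uncertainty-aware ranking strategy that prioritizes grasps robust to shape ambiguity. Extensive experiments in simulation and in real-world setups show that FFHFlow outperforms state-of-the-art baselines (including diffusion models) in grasp diversity and success rate while sampling efficiently at run time. The paper also demonstrates practical value in cluttered, confined environments, where diversity-driven sampling performs well by mitigating collisions (project page:
_____
T178176_____ ).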
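The two ideas the abstract leans on, exact likelihoods from an invertible flow and a ranking that mixes those likelihoods with an evaluator score, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `AffineFlow1D`, `rank_grasps`, the `weight` parameter, the scalar "grasps", and the additive log-score rule are hypothetical and are not taken from the paper, which works with point-cloud-conditioned flows over full hand configurations.

```python
import math
from dataclasses import dataclass


@dataclass
class AffineFlow1D:
    """Toy invertible map x = scale * z + shift with a standard-normal base.

    Hypothetical stand-in for a learned normalizing flow.
    """
    scale: float
    shift: float

    def log_prob(self, x: float) -> float:
        # Invert the flow, then apply the change-of-variables formula:
        # log p(x) = log N(z; 0, 1) - log|scale|
        z = (x - self.shift) / self.scale
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        return log_base - math.log(abs(self.scale))


def rank_grasps(candidates, flow, evaluator_score, weight=1.0):
    """Sort candidates from best to worst.

    Assumed scoring rule (not the paper's): log of the evaluator score
    plus a weighted flow log-likelihood. evaluator_score must return a
    value greater than zero.
    """
    def combined(grasp):
        return math.log(evaluator_score(grasp)) + weight * flow.log_prob(grasp)

    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    flow = AffineFlow1D(scale=2.0, shift=1.0)
    # Hypothetical evaluator: success probabilities for three toy candidates.
    scores = {0.0: 0.9, 1.0: 0.1, 5.0: 0.9}
    print(rank_grasps([0.0, 1.0, 5.0], flow, scores.get))
```

In this toy setup a candidate that the evaluator likes but that sits far from the flow's high-density region is ranked below one that both components agree on, which is the qualitative behaviour an uncertainty-aware ranking is meant to have.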
Takeaways, Limitations
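• Takeaways:
◦ Presents a new method for efficiently generating diverse, uncertainty-aware multi-finger grasps from partial observations.
◦ Uses a flow-based model to overcome the mode collapse and fixed prior that limit earlier methods such as cVAEs.
◦ Explicitly accounts for uncertainty to produce more reliable grasps.
◦ Achieves state-of-the-art performance in simulation and in real-world settings.
◦ Works effectively in cluttered, confined environments.
• Limitations:
◦ Performance may depend on the dataset used and on model complexity.
◦ Generalization in real-world environments needs further improvement.
◦ Computational cost may be relatively high.
◦ Generalization across varied object shapes and materials needs further study.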