haebom
Daily Arxiv
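This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.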
Language Models are Injective and Hence Invertible
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Latent Diffusion Model without Variational Autoencoder
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging
When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
A Vision for Access Control in LLM-based Agent Systems
Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models
Formally Verified Certification of Unsolvability of Temporal Planning Problems
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Creative synthesis of kinematic mechanisms
Market-Driven Subset Selection for Budgeted Training
Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Learning Generalizable Shape Completion with SIM(3) Equivariance
Dolphin v1.0 Technical Report
A Measurement Study of Model Context Protocol Ecosystem
Diffusion Models are Kelly Gamblers
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Semantic Representation Attack against Aligned Large Language Models
Chiplet-Based RISC-V SoC with Modular AI Acceleration
Accurate and Efficient Low-Rank Model Merging in Core Space
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Graph Coloring for Multi-Task Learning
Robust LLM Training Infrastructure at ByteDance
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Communications to Circulations: Real-Time 3D Wind Field Prediction Using 5G GNSS Signals and Deep Learning
Why and How Auxiliary Tasks Improve JEPA Representations
Creativity Benchmark: A benchmark for marketing creativity for large language models
SpikingBrain: Spiking Brain-inspired Large Models
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI
ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments
Interpretable Decision-Making for End-to-End Autonomous Driving
A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs
Limitations of Normalization in Attention Mechanism
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models
VGGSounder: Audio-Visual Evaluations for Foundation Models
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
ReDi: Rectified Discrete Flow
Adaptive Policy Synchronization for Scalable Reinforcement Learning
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
AI-Generated Video Detection via Perceptual Straightening
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Code Execution as Grounded Supervision for LLM Reasoning
Denoising the Future: Top-p Distributions for Moving Through Time
HauntAttack: When Attack Follows Reasoning as a Shadow
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
VERINA: Benchmarking Verifiable Code Generation
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
The quest for the GRAph Level autoEncoder (GRALE)
Efficient Large Language Model Inference with Neural Block Linearization
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Created by
Haebom
Authors
Blake Werner, Lizhi Yang, Aaron D. Ames
Summary
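This paper presents a control architecture for stable robot locomotion on irregular terrain. It proposes a layered control architecture (LCA) that combines a fast proprioceptive stabilizer with a slow perception-based policy, and shows that this layered design outperforms single (monolithic) architectures. The authors demonstrate the effectiveness of LCA with a two-stage training scheme. In experiments on a Unitree G1 humanoid robot, LCA succeeds more often than a single policy on challenging tasks such as stairs and ledges.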
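The time-scale separation behind LCA can be illustrated with a minimal two-rate control loop. This is a sketch under stated assumptions, not the authors' implementation: the class `LayeredController`, the toy policies, and the 10:1 rate ratio (`decimation`) are all hypothetical, since the summary only says that the stabilizer is fast and the perception-based policy is slow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = List[float]


@dataclass
class LayeredController:
    """Hypothetical two-rate loop: a slow perceptive layer feeding a fast stabilizer."""

    # (perception, proprioception) -> command, e.g. a desired base velocity
    perceptive_policy: Callable[[Sequence[float], Sequence[float]], Vector]
    # (proprioception, command) -> joint targets
    stabilizer: Callable[[Sequence[float], Sequence[float]], Vector]
    # fast ticks per slow tick (assumed, e.g. 500 Hz / 50 Hz = 10)
    decimation: int = 10
    command: Optional[Vector] = None
    tick: int = 0

    def step(self, proprio: Sequence[float], perception: Sequence[float]) -> Vector:
        # Slow layer: recompute the command only every `decimation` ticks;
        # in between, the last command is held (zero-order hold).
        if self.tick % self.decimation == 0:
            self.command = self.perceptive_policy(perception, proprio)
        self.tick += 1
        # Fast layer: runs on every tick with fresh proprioceptive input.
        return self.stabilizer(proprio, self.command)


# Usage with toy (hypothetical) policies that count how often they are called.
calls = {"slow": 0, "fast": 0}


def toy_perceptive_policy(perception, proprio):
    calls["slow"] += 1
    # Hypothetical rule: command a slower speed when the terrain feature is large.
    return [0.5 if max(perception) < 0.1 else 0.2]


def toy_stabilizer(proprio, command):
    calls["fast"] += 1
    # Hypothetical proportional correction around the commanded value.
    return [command[0] - 0.1 * p for p in proprio]


ctrl = LayeredController(toy_perceptive_policy, toy_stabilizer, decimation=10)
for _ in range(100):
    targets = ctrl.step(proprio=[0.0, 0.0], perception=[0.05])

print(calls)    # {'slow': 10, 'fast': 100}
print(targets)  # [0.5, 0.5]
```

Holding the slow layer's command between updates is one common way to couple layers that run at different rates: the stabilizer still reacts to new proprioceptive input on every tick, while the more expensive perception-based policy is evaluated only occasionally.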
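Takeaways, Limitations
• Takeaways:
◦ Identifies time-scale separation within the control architecture as a key factor in achieving stable, robust locomotion.
◦ Shows that robust performance is achievable even with a simple architecture and a minimal perception encoder.
◦ Demonstrates that the two-stage training scheme contributes to LCA's performance gains.
◦ Confirms the practicality of the proposed method through experiments on real hardware.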
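• Limitations:
◦ The paper may not give enough detail on the specific architecture design and training procedure.
◦ Generalization to a wider variety of environments requires further study.
◦ Applicability to other types of humanoid robots needs further validation.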
Made with Slashpage