haebom
Daily Arxiv
A page that collects papers on artificial intelligence published around the world.
This page is summarized using Google Gemini and operated on a non-profit basis.
Copyright of the papers belongs to their authors and institutions; please cite the source when sharing.
Language Models are Injective and Hence Invertible
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Latent Diffusion Model without Variational Autoencoder
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging
When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
A Vision for Access Control in LLM-based Agent Systems
Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models
Formally Verified Certification of Unsolvability of Temporal Planning Problems
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Creative synthesis of kinematic mechanisms
Market-Driven Subset Selection for Budgeted Training
Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Learning Generalizable Shape Completion with SIM(3) Equivariance
Dolphin v1.0 Technical Report
A Measurement Study of Model Context Protocol Ecosystem
Diffusion Models are Kelly Gamblers
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Semantic Representation Attack against Aligned Large Language Models
Chiplet-Based RISC-V SoC with Modular AI Acceleration
Accurate and Efficient Low-Rank Model Merging in Core Space
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Graph Coloring for Multi-Task Learning
Robust LLM Training Infrastructure at ByteDance
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Communications to Circulations: Real-Time 3D Wind Field Prediction Using 5G GNSS Signals and Deep Learning
Why and How Auxiliary Tasks Improve JEPA Representations
Creativity Benchmark: A benchmark for marketing creativity for large language models
SpikingBrain: Spiking Brain-inspired Large Models
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI
ZeST: An LLM-Based Zero-Shot Traversability Navigation for Unknown Environments
Interpretable Decision-Making for End-to-End Autonomous Driving
A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs
Limitations of Normalization in Attention Mechanism
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
The GPT-4o Shock: Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models
VGGSounder: Audio-Visual Evaluations for Foundation Models
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
ReDi: Rectified Discrete Flow
Adaptive Policy Synchronization for Scalable Reinforcement Learning
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
AI-Generated Video Detection via Perceptual Straightening
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Code Execution as Grounded Supervision for LLM Reasoning
Denoising the Future: Top-p Distributions for Moving Through Time
HauntAttack: When Attack Follows Reasoning as a Shadow
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
VERINA: Benchmarking Verifiable Code Generation
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
The quest for the GRAph Level autoEncoder (GRALE)
Efficient Large Language Model Inference with Neural Block Linearization
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Authors
Leander Diaz-Bone, Marco Bagatella, Jonas Hübotter, Andreas Krause
DISCOVER: Directed Sparse-Reward Goal-Conditioned RL
Overview
DISCOVER is a method for tackling very hard tasks in sparse-reward reinforcement learning (RL). The idea is to solve easier tasks that are highly relevant to the hard target task, i.e., tasks that teach the skills needed to solve it. DISCOVER extracts directional information from an existing RL algorithm and selects exploration goals that lie in the direction of the target task. This connects to principled exploration in bandit problems, and bounds the time until the target task becomes achievable in terms of the agent's initial distance to the target. Evaluations in high-dimensional environments show that DISCOVER solves exploration problems that prior state-of-the-art RL exploration methods cannot.
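The goal-selection idea in the overview can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2-D goal space, the `reachability` proxy standing in for a learned value estimate, and the UCB-style visit-count bonus are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D goal space: the agent starts at the origin and the
# sparse-reward target goal sits far away.
start = np.zeros(2)
target = np.array([10.0, 10.0])

# Candidate intermediate goals the agent could practice on.
candidates = rng.uniform(0.0, 10.0, size=(50, 2))
visit_counts = np.ones(len(candidates))  # pseudo-counts for a UCB-style bonus

def reachability(goal, frontier_radius=4.0):
    # Stand-in for a value-function estimate of how reachable a goal is
    # from the start state (1 = easy, ~0 = currently out of reach).
    return np.exp(-max(0.0, np.linalg.norm(goal - start) - frontier_radius))

def directedness(goal):
    # Cosine similarity between the goal direction and the target
    # direction: prefer goals that lie toward the target task.
    g, t = goal - start, target - start
    return g @ t / (np.linalg.norm(g) * np.linalg.norm(t) + 1e-8)

def score(i, beta=0.5):
    # Directed exploration: progress toward the target, gated by
    # estimated reachability, plus a bandit-style uncertainty bonus.
    bonus = beta / np.sqrt(visit_counts[i])
    return directedness(candidates[i]) * reachability(candidates[i]) + bonus

chosen = max(range(len(candidates)), key=score)
print("selected exploration goal:", candidates[chosen])
```

The key design choice this mirrors is that exploration is not uniform: candidate goals are ranked by how much they advance toward the target task, discounted by how reachable they currently are, so the agent practices on the frontier between "already solvable" and "target direction".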
Takeaways, Limitations
•
Takeaways:
◦
DISCOVER provides a new approach to efficient exploration in sparse-reward environments.
◦
It leverages an existing RL algorithm to obtain directional information for exploring toward the target task.
◦
The connection to bandit problems yields theoretical guarantees.
◦
It outperforms prior methods in high-dimensional environments.
•
Limitations:
◦
Further analysis is needed of the specific environments and tasks to which DISCOVER applies.
◦
Scalability to real-world problems remains to be studied.
◦
Tuning the hyperparameters that affect DISCOVER's efficiency requires further investigation.