haebom
Daily Arxiv
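This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.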
Language Models are Injective and Hence Invertible
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Latent Diffusion Model without Variational Autoencoder
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging
When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
A Vision for Access Control in LLM-based Agent Systems
Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models
Formally Verified Certification of Unsolvability of Temporal Planning Problems
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Creative synthesis of kinematic mechanisms
Market-Driven Subset Selection for Budgeted Training
Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Learning Generalizable Shape Completion with SIM(3) Equivariance
Dolphin v1.0 Technical Report
A Measurement Study of Model Context Protocol Ecosystem
Diffusion Models are Kelly Gamblers
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Semantic Representation Attack against Aligned Large Language Models
Chiplet-Based RISC-V SoC with Modular AI Acceleration
Accurate and Efficient Low-Rank Model Merging in Core Space
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Graph Coloring for Multi-Task Learning
Robust LLM Training Infrastructure at ByteDance
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Communications to Circulations: Real-Time 3D Wind Field Prediction Using 5G GNSS Signals and Deep Learning
Why and How Auxiliary Tasks Improve JEPA Representations
Creativity Benchmark: A benchmark for marketing creativity for large language models
SpikingBrain: Spiking Brain-inspired Large Models
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI
ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments
Interpretable Decision-Making for End-to-End Autonomous Driving
A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs
Limitations of Normalization in Attention Mechanism
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models
VGGSounder: Audio-Visual Evaluations for Foundation Models
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
ReDi: Rectified Discrete Flow
Adaptive Policy Synchronization for Scalable Reinforcement Learning
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
AI-Generated Video Detection via Perceptual Straightening
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Code Execution as Grounded Supervision for LLM Reasoning
Denoising the Future: Top-p Distributions for Moving Through Time
HauntAttack: When Attack Follows Reasoning as a Shadow
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
VERINA: Benchmarking Verifiable Code Generation
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
The quest for the GRAph Level autoEncoder (GRALE)
Efficient Large Language Model Inference with Neural Block Linearization
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Created by
Haebom
Authors
Jane Luo, Xin Zhang, Steven Liu, Jie Wu, Jianfeng Liu, Yiming Huang, Yangyu Huang, Chengyu Yin, Ying Xin, Yuefeng Zhan, Hao Sun, Qi Chen, Scarlett Li, Mao Yang
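Abstract
Large language models (LLMs) excel at code generation, but generating an entire repository from scratch remains difficult. This work introduces the Repository Planning Graph (RPG) for building coherent software systems from high-level specifications. RPG encodes capabilities, file structures, data flows, and functions in a single unified graph, which enables consistent long-horizon planning for repository generation. The authors develop ZeroRepo, an RPG-based framework that generates code through proposal, implementation, and test validation. On the RepoCraft benchmark, ZeroRepo achieves substantially higher code volume and test accuracy than existing models.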
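To make the idea of a unified planning graph concrete, here is a minimal sketch of how capabilities, files, functions, and data flows could live in one structure and yield a build order. It is not ZeroRepo's implementation: the class `PlanningGraph`, its methods, the node kinds, and the example node names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Node:
    name: str
    kind: str  # hypothetical labels: "capability", "file", or "function"


@dataclass
class PlanningGraph:
    """Illustrative stand-in for a repository planning graph (assumed design)."""
    nodes: dict = field(default_factory=dict)
    # data_flow[consumer] = set of producers that must exist first
    data_flow: dict = field(default_factory=dict)
    # contains[parent] = children (capability -> file -> function)
    contains: dict = field(default_factory=dict)

    def add_node(self, name, kind, parent=None):
        self.nodes[name] = Node(name, kind)
        self.data_flow.setdefault(name, set())
        if parent is not None:
            self.contains.setdefault(parent, []).append(name)

    def add_data_flow(self, producer, consumer):
        # Record that `consumer` depends on the output of `producer`.
        self.data_flow[consumer].add(producer)

    def build_order(self, kind="function"):
        # Topologically sort along data-flow edges, then keep one node kind.
        order = TopologicalSorter(self.data_flow).static_order()
        return [n for n in order if self.nodes[n].kind == kind]


# Usage: two capabilities, one file each, one function each.
graph = PlanningGraph()
graph.add_node("data_loading", "capability")
graph.add_node("loader.py", "file", parent="data_loading")
graph.add_node("load_csv", "function", parent="loader.py")
graph.add_node("training", "capability")
graph.add_node("train.py", "file", parent="training")
graph.add_node("fit_model", "function", parent="train.py")
graph.add_data_flow("load_csv", "fit_model")

print(graph.build_order())  # ['load_csv', 'fit_model']
```

Keeping the containment hierarchy and the data-flow edges in the same object is what lets a generator decide both where a function belongs and when it can be implemented. The paper's actual graph schema may differ.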
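Takeaways, Limitations
• Takeaways:
◦ Uses RPG, a structured representation, to improve planning for repository generation.
◦ Demonstrates strong performance with the ZeroRepo framework on a benchmark built from real projects.
◦ RPG models complex dependencies and supports scalability.
◦ Improves agents' understanding of the repository and shortens troubleshooting time.
• Limitations:
◦ The paper does not directly discuss limitations.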