Daily Arxiv

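This page collects artificial intelligence papers published around the world.
The summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.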

HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and Cooling

NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks

Interact-Custom: Customized Human Object Interaction Image Generation

A Self-Supervised Mixture-of-Experts Framework for Multi-behavior Recommendation

MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation

From Tabula Rasa to Emergent Abilities: Discovering Robot Skills via Real-World Unsupervised Quality-Diversity

Dynamic Triangulation-Based Graph Rewiring for Graph Neural Networks

STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems

LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions

Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit Reasoning

Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework

Humans Perceive Wrong Narratives from AI Reasoning Texts

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning

Pareto Actor-Critic for Communication and Computation Co-Optimization in Non-Cooperative Federated Learning Services

Learning to Drive Ethically: Embedding Moral Reasoning into Autonomous Driving

Generative AI Against Poaching: Latent Composite Flow Matching for Wildlife Conservation

Privacy-Aware Detection of Fake Identity Documents: Methodology, Benchmark, and Improved Algorithms (FakeIDet2)

Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics

Steering Towards Fairness: Mitigating Political Bias in LLMs

Dynamic Context Compression for Efficient RAG

Irredundant $k$-Fold Cross-Validation

Prompt Engineering and the Effectiveness of Large Language Models in Enhancing Human Productivity

A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task

Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs

The Joys of Categorical Conformal Prediction

Adversarial Manipulation of Reasoning Models using Internal Representations

Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models

A Hybrid Artificial Intelligence Method for Estimating Flicker in Power Systems (Changes are marked)

GLProtein: Global-and-Local Structure Aware Protein Representation Learning

Program Semantic Inequivalence Game with Large Language Models

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

Improving Quantization with Post-Training Model Expansion

Safe and Efficient Social Navigation through Explainable Safety Regions Based on Topological Features

A Simple Approach to Constraint-Aware Imitation Learning with Application to Autonomous Racing

Federated nnU-Net for Privacy-Preserving Medical Image Segmentation

ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation

Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models

RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis

Categorical Data Clustering via Value Order Estimated Distance Metric Learning

Application of AI to formal methods - an analysis of current trends

Reconsidering the Performance of GAE in Link Prediction

See then Tell: Enhancing Key Information Extraction with Vision Grounding

Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference

SoAy: A Solution-based LLM API-using Methodology for Academic Information Seeking

Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Network Formation and Dynamics Among Multi-LLMs

NetGPT: Generative Pretrained Transformer for Network Traffic

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

Explainability of Text Processing and Retrieval Methods: A Survey

The Ramon Llull's Thinking Machine for Automated Ideation

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing

LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence

MSARL: Decoupling Reasoning and Tool Use with Multi-Small-Agent Reinforcement Learning

Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

Technology as uncharted territory: Contextual integrity and the notion of AI as new ethical ground

Possible Principles for Aligned Structure Learning Agents

OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Prompt-to-Product: Generative Assembly via Bimanual Manipulation

OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models

Mixture of Contexts for Long Video Generation

FakeParts: a New Family of AI-Generated DeepFakes

Enabling Equitable Access to Trustworthy Financial Reasoning

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Understanding, Protecting, and Augmenting Human Cognition with Generative AI: A Synthesis of the CHI 2025 Tools for Thought Workshop

Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

ChainReaction! Structured Approach with Causal Chains as Intermediate Representations for Improved and Explainable Causal Video Question Answering

Train-Once Plan-Anywhere Kinodynamic Motion Planning via Diffusion Trees

ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts

WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations

ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents

Research Challenges in Relational Database Management Systems for LLM Queries

Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant

AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning

JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring

Learning Primitive Embodied World Models: Towards Scalable Robotic Learning

Multi-Agent Penetration Testing AI for the Web

Uncertainty Aware-Predictive Control Barrier Functions: Safer Human Robot Interaction through Probabilistic Motion Forecasting

Exploring Machine Learning and Language Models for Multimodal Depression Detection

Speech Emotion Recognition via Entropy-Aware Score Selection

Surfel-based 3D Registration with Equivariant SE(3) Features

Evaluating Compositional Generalisation in VLMs and Diffusion Models

Safer Skin Lesion Classification with Global Class Activation Probability Map Evaluation and SafeML

Unleashing Uncertainty: Efficient Machine Unlearning for Generative AI

Signs of Struggle: Spotting Cognitive Distortions across Language and Register

Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

SKGE-SWIN: End-To-End Autonomous Vehicle Waypoint Prediction and Navigation Using Skip Stage Swin Transformer

Occlusion Robustness of CLIP for Military Vehicle Classification

SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding

Provable Benefits of In-Tool Learning for Large Language Models

${C}^{3}$-GS: Learning Context-aware, Cross-dimension, Cross-scale Feature for Generalizable Gaussian Splatting

Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol

EEGDM: Learning EEG Representation with Latent Diffusion Model

Generative Annotation for ASR Named Entity Correction

MobileCLIP2: Improving Multi-Modal Reinforced Training

Task Allocation for Autonomous Machines using Computational Intelligence and Deep Reinforcement Learning

Program Semantic Inequivalence Game with Large Language Models

Created by

Haebom

Authors

Antonio Valerio Miceli-Barone, Vaishak Belle, Ali Payani

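Summary

This paper presents a new method for improving the ability of large language models (LLMs) to reason about complex code. LLMs perform well on everyday coding tasks, but they can fail on complex tasks that require non-trivial reasoning about program semantics. To address this, the study explores a way to synthetically generate code-reasoning training data based on a semantic inequivalence game (SInQ). A generator agent produces semantically distinct program variants derived from a dataset of real-world programming tasks, and an evaluator agent identifies input examples on which the original program and the generated variant behave differently. The two agents train each other semi-adversarially, and the authors prove that this setup allows theoretically unlimited improvement through self-play in the limit of infinite computational resources. The method is evaluated on several code generation and understanding benchmarks, including cross-language vulnerability detection and the Python builtin identifier swap benchmark. Although trained only on Python code, it improves vulnerability detection in C/C++ code, and it also improves results on the Python builtin identifier swap benchmark, on which existing LLMs have struggled. The code needed to reproduce the experiments and the generated synthetic data are released so that other researchers can use them to fine-tune LLMs.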
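To make the game in the summary concrete, here is a minimal Python sketch of the check at its core: given an original program and a variant, find an input on which their behavior diverges. This is an illustration under stated assumptions, not the authors' implementation. The names `original`, `variant`, `run`, and `find_diverging_input` are hypothetical, and a brute-force search over a fixed candidate list stands in for the LLM evaluator agent, which in the paper proposes the diverging input itself.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def original(xs: Sequence[int]) -> int:
    """Reference program (hypothetical): sum of the strictly positive entries."""
    return sum(x for x in xs if x > 0)


def variant(xs: Sequence[int]) -> int:
    """Hypothetical generator output: the threshold changed from 0 to 1,
    so the two programs differ only on inputs that contain the value 1."""
    return sum(x for x in xs if x > 1)


def run(program: Callable[[Any], Any], arg: Any) -> Tuple[str, Any]:
    """Run a program and record either its return value or the exception type,
    so that 'raises vs. returns' also counts as a behavioral difference."""
    try:
        return ("ok", program(arg))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_diverging_input(
    p: Callable[[Any], Any],
    q: Callable[[Any], Any],
    candidates: Sequence[Any],
) -> Optional[Any]:
    """Return the first candidate on which p and q behave differently, else None.

    Assumption: exhaustive search over candidates replaces the learned evaluator.
    """
    for arg in candidates:
        if run(p, arg) != run(q, arg):
            return arg
    return None


candidates = [[], [0], [2, 3], [1, 5], [-4, 1]]
witness = find_diverging_input(original, variant, candidates)
print(witness)
```

In this sketch the evaluator succeeds if it returns a witness input, and the generator succeeds if its variant really is inequivalent yet no witness is found. Returning `None` here only means that none of the listed candidates separates the two programs; it does not prove they are equivalent.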

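Takeaways, Limitations

• Takeaways:
◦ Shows that synthetic data generated with the semantic inequivalence game (SInQ) can improve the complex code reasoning of LLMs.
◦ Shows that, even with limited training data, performance can improve across multiple languages and different kinds of code-reasoning problems.
◦ Contributes to LLM research by releasing the generated synthetic data.
◦ Suggests that self-play can yield continued performance improvement.

• Limitations:
◦ The theoretical proof assumes infinite computational resources, so its applicability in practical settings needs to be examined.
◦ Further study is needed on the quality and diversity of the generated synthetic data.
◦ It remains to be confirmed whether gains on specific benchmarks generalize to all other kinds of code-reasoning problems.
◦ Generalization performance still needs to be evaluated on complex and diverse real-world code-reasoning problems.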
