Daily Arxiv

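This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.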

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Dynaword: From One-shot to Continuously Developed Datasets

Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems

Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy

BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability

SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy

Managing Escalation in Off-the-Shelf Large Language Models

FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Memorization in Fine-Tuned Large Language Models

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?

Post-Completion Learning for Language Models

Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content

Equivariant Volumetric Grasping

SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation

TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Principled Foundations for Preference Optimization

Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

S2FGL: Spatial Spectral Federated Graph Learning

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints

Causally Steered Diffusion for Automated Video Counterfactual Generation

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

ProRefine: Inference-Time Prompt Refinement with Textual Feedback

SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design

MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

All-optical temporal integration mediated by subwavelength heat antennas

GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

Multilingual Performance Biases of Large Language Models in Education

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis

Efficient Generative Model Training via Embedded Representation Warmup

Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging

Spectral Architecture Search for Neural Network Models

Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

Augmented Adversarial Trigger Learning

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness

PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset

DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping

Entropy-Lens: The Information Signature of Transformer Computations

CAMEF: Causal-Augmented Multi-Modality Event-Driven Financial Forecasting by Integrating Time Series Patterns and Salient Macroeconomic Announcements

Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models

Average-Reward Soft Actor-Critic

Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Effective AGM Belief Contraction: A Journey beyond the Finitary Realm (Technical Report)

Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks

KCR: Resolving Long-Context Knowledge Conflicts via Reasoning in LLMs

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis

The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams

Modeling Deontic Modal Logic in the s(CASP) Goal-directed Predicate Answer Set Programming System

Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization

Private GPTs for LLM-driven testing in software development and machine learning

Created by

Haebom

Authors

Jakub Jagielski, Consuelo Rojas, Markus Abel

Abstract

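This paper investigates the ability of private GPTs to automatically generate executable test code from requirements. Specifically, acceptance criteria formulated as part of epics or stories, as is common in modern development processes, are used as input, so that product owners or business intelligence staff can produce testable criteria directly through an LLM. The quality of the generated tests is evaluated for two approaches: one in which the LLM generates code directly from the requirements, and one that passes through an intermediate step using Gherkin syntax. The two-step procedure gave better results in terms of human readability and coding best practices (judged by the number of lines of code and the use of additional libraries commonly used in testing). Prompt effectiveness is evaluated concretely in two scenarios, a "Hello World" program and a digit classification model, showing that structured prompts lead to higher-quality test output.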
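The two-step procedure described above can be illustrated with a minimal sketch for the "Hello World" scenario. Everything named here is an assumption made for illustration and is not taken from the paper: the acceptance criterion wording, the Gherkin scenario, the prompt templates, and the helper names `build_gherkin_prompt`, `build_test_prompt`, `hello`, and `run_hello`. No LLM is called; the Gherkin text and the test simply stand in for what each step might produce.

```python
import io
from contextlib import redirect_stdout

# Hypothetical acceptance criterion, phrased as a product owner might write it.
ACCEPTANCE_CRITERION = "When the program runs, it prints 'Hello World'."

# Step 1 output (illustrative): a Gherkin scenario derived from the criterion.
GHERKIN_SCENARIO = """\
Feature: Greeting
  Scenario: Program prints the greeting
    Given the hello program is available
    When the program is run
    Then the output is "Hello World"
"""


def build_gherkin_prompt(criterion: str) -> str:
    """Assumed prompt template for step 1: requirement -> Gherkin."""
    return (
        "Rewrite the following acceptance criterion as a Gherkin scenario "
        "using Given/When/Then steps.\n\n"
        f"Acceptance criterion:\n{criterion}\n"
    )


def build_test_prompt(gherkin: str) -> str:
    """Assumed prompt template for step 2: Gherkin -> executable test."""
    return (
        "Write an executable Python test that implements every step of "
        "this Gherkin scenario.\n\n"
        f"{gherkin}"
    )


def hello() -> None:
    """The program under test."""
    print("Hello World")


def run_hello() -> str:
    """Run the program and capture what it writes to standard output."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        hello()
    return buffer.getvalue().strip()


def test_program_prints_greeting() -> None:
    """Step 2 output (illustrative): a test mirroring the Gherkin steps."""
    # Given the hello program is available / When the program is run
    output = run_hello()
    # Then the output is "Hello World"
    assert output == "Hello World"
```

In the direct approach, a single prompt would go from the acceptance criterion straight to test code; here the Gherkin scenario sits between the two prompts and can be reviewed by a non-developer before any code is generated.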

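Takeaways, Limitations

• Takeaways:
    ◦ Shows that LLMs can be used to automatically generate requirement-based test code.
    ◦ Suggests that a two-step procedure using Gherkin syntax is effective for improving test code quality.
    ◦ Confirms that structured prompts have a significant effect on the quality of generated test code.
    ◦ Points to the possibility of simplifying how product owners and business intelligence staff produce test criteria.

• Limitations:
    ◦ The scenarios used for evaluation are limited (a simple "Hello World" program and a digit classification model).
    ◦ Further research is needed on generalization to different types of requirements and to complex systems.
    ◦ The characteristics and constraints of the private GPTs are not described in detail.
    ◦ No quantitative metrics are given for "better results"; the assessment relies on subjective judgments of human readability and coding best practices.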
