Daily Arxiv
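This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.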
Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization
Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models
Multi-modal Relational Item Representation Learning for Inferring Substitutable and Complementary Items
SourceSplice: Source Selection for Machine Learning Tasks
OneShield - the Next Generation of LLM Guardrails
RecPS: Privacy Risk Scoring for Recommender Systems
HuiduRep: A Robust Self-Supervised Framework for Learning Neural Representations from Extracellular Recordings
Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain
Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback
A Segmented Robot Grasping Perception Neural Network for Edge AI
Binarizing Physics-Inspired GNNs for Combinatorial Optimization
Disentangling Neural Disjunctive Normal Form Models
The Second Machine Turn: From Checking Proofs to Creating Concepts
EmissionNet: Air Quality Pollution Forecasting for Agriculture
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Evaluating LLMs on Real-World Forecasting Against Human Superforecasters
Sign Spotting Disambiguation using Large Language Models
RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Discovering the underlying analytic structure within Standard Model constants using artificial intelligence
MR-CLIP: Efficient Metadata-Guided Learning of MRI Contrast Representations
Curious Causality-Seeking Agents Learn Meta Causal World
Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models
Private GPTs for LLM-driven testing in software development and machine learning
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Credible Plan-Driven RAG Method for Multi-Hop Question Answering
Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories
E2E Parking Dataset: An Open Benchmark for End-to-End Autonomous Parking
Dominated Actions in Imperfect-Information Games
FakeIDet: Exploring Patches for Privacy-Preserving Fake ID Detection
Simultaneous Motion And Noise Estimation with Event Cameras
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Novice Developers' Perspectives on Adopting LLMs for Software Development: A Systematic Literature Review
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
A Survey on Post-training of Large Language Models
Do Large Language Models Know How Much They Know?
Better Embeddings with Coupled Adam
Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks
An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage
Embracing Large Language Models in Traffic Flow Forecasting
A Large Sensor Foundation Model Pretrained on Continuous Glucose Monitor Data for Diabetes Management
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Un-mixing Test-time Adaptation under Heterogeneous Data Streams
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Cobblestone: Iterative Automation for Formal Verification
Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots
Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors
AttnMod: Attention-Based New Art Styles
Loss Landscape Degeneracy and Stagewise Development in Transformers
Tackling Size Generalization of Graph Neural Networks on Biological Data from a Spectral Perspective
Gradient Leakage Defense with Key-Lock Module for Federated Learning
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Semantic Chain-of-Trust: Autonomous Trust Orchestration for Collaborator Selection via Hypergraph-Aided Agentic AI
How Far Are AI Scientists from Changing the World?
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
EARTH: Structuring Creative Evolution through Model Error in Generative AI
On Gradual Semantics for Assumption-Based Argumentation
Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations
Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team
ORFS-agent: Tool-Using Agents for Chip Design Optimization
World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation
BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM
Causal Explanations for Image Classifiers
BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination
Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity
Identifying Unique Spatial-Temporal Bayesian Network without Markov Equivalence
Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation
MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
A Simple and Effective Method for Uncertainty Quantification and OOD Detection
Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos
Agentic large language models improve retrieval-based radiology question answering
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data
How LLMs are Shaping the Future of Virtual Reality
Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems
Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA
Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning
JSON-Bag: A generic game trajectory representation
NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System
Efficient Solution and Learning of Robust Factored MDPs
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Advancing Quantum Information Science Pre-College Education: The Case for Learning Sciences Collaboration
Backdoor Attacks on Deep Learning Face Detection
Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data
Prompting Science Report 3: I'll pay you or I'll kill you -- but will you care?
Composable OS Kernel Architectures for Autonomous Intelligence
LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks
Wukong Framework for Not Safe For Work Detection in Text-to-Image systems
OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Created by
Haebom
Authors
Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve
Summary
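This paper proposes Astrogator, a system that addresses errors made by large language models (LLMs) when they generate code from natural-language descriptions. It introduces a formal query language to make the user's intent explicit and verifies the correctness of the generated code. Astrogator targets the Ansible programming language and consists of a formal query language, a calculus for representing the behavior of Ansible programs, and a symbolic interpreter used for verification. On a benchmark of 21 code-generation tasks, it verified correct code in 83% of cases and identified incorrect code in 92% of cases.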
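Takeaways, Limitations
• Takeaways:
◦ Presents a new approach to improving the accuracy of LLM-based code generation.
◦ Formal verification can confirm that the generated code matches the user's intent.
◦ May make natural-language programming possible for users with limited programming knowledge.
◦ Improves the efficiency of code generation and verification for specific languages such as Ansible.
• Limitations:
◦ Astrogator is specialized for Ansible, so its extensibility to other programming languages may be limited.
◦ The benchmark is relatively small, and further research on generalization is needed.
◦ The usability of the formal query language and the learning cost for users need to be considered.
◦ It may not detect every kind of code error (the 83% and 92% rates do not mean perfect accuracy).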
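To make the last limitation concrete, the short sketch below computes what the two reported rates leave uncovered. It assumes, as an interpretation of the summary rather than a statement from the paper, that 83% is the share of correct programs that were verified and 92% is the share of incorrect programs that were identified. The variable and function names are illustrative only.

```python
# Rates reported in the summary, read as per-class rates (an assumption).
verified_correct_rate = 0.83    # share of correct programs that were verified
flagged_incorrect_rate = 0.92   # share of incorrect programs that were identified


def complement(rate: float) -> float:
    """Return the share of cases not covered by the given rate."""
    return round(1.0 - rate, 2)


unverified_correct_rate = complement(verified_correct_rate)
missed_incorrect_rate = complement(flagged_incorrect_rate)

print(f"Correct code left unverified: {unverified_correct_rate:.0%}")
print(f"Incorrect code not identified: {missed_incorrect_rate:.0%}")
```

Under that reading, roughly one in six correct programs would still need manual review, and a small share of incorrect programs would pass unnoticed.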