Daily Arxiv
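This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.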
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Dynaword: From One-shot to Continuously Developed Datasets
Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor
Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems
Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy
BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability
SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy
Managing Escalation in Off-the-Shelf Large Language Models
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation
Memorization in Fine-Tuned Large Language Models
From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation
The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?
Post-Completion Learning for Language Models
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
Equivariant Volumetric Grasping
SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation
FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation
TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models
Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs
$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Principled Foundations for Preference Optimization
Evaluating LLMs on Real-World Forecasting Against Expert Forecasters
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking
S2FGL: Spatial Spectral Federated Graph Learning
AI4Research: A Survey of Artificial Intelligence for Scientific Research
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
Causally Steered Diffusion for Automated Video Counterfactual Generation
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
ProRefine: Inference-Time Prompt Refinement with Textual Feedback
SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design
MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
All-optical temporal integration mediated by subwavelength heat antennas
GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
FFCBA: Feature-based Full-target Clean-label Backdoor Attacks
Multilingual Performance Biases of Large Language Models in Education
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis
Efficient Generative Model Training via Embedded Representation Warmup
Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging
Spectral Architecture Search for Neural Network Models
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Augmented Adversarial Trigger Learning
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness
PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
Entropy-Lens: The Information Signature of Transformer Computations
CAMEF: Causal-Augmented Multi-Modality Event-Driven Financial Forecasting by Integrating Time Series Patterns and Salient Macroeconomic Announcements
Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach
AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought
AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges
CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models
Average-Reward Soft Actor-Critic
Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation
From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification
Effective AGM Belief Contraction: A Journey beyond the Finitary Realm (Technical Report)
Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification
TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks
KCR: Resolving Long-Context Knowledge Conflicts via Reasoning in LLMs
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent
Mind the Gap: The Divergence Between Human and LLM-Generated Tasks
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis
The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams
Modeling Deontic Modal Logic in the s(CASP) Goal-directed Predicate Answer Set Programming System
Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study
The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning
Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments
Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory
UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Created by
Haebom
Authors
Rui Melo, Claudia Mamede, Andre Catarino, Rui Abreu, Henrique Lopes Cardoso
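Summary
This paper points out the limitations of existing methods for detecting software vulnerabilities such as buffer overflows and SQL injection (high false-positive rates, scalability problems, and reliance on manual effort) and turns to AI-based approaches. Specifically, to overcome the interpretability and deployment difficulties of those approaches, it presents the Sparse Autoencoder (SAE) as a lightweight, interpretable alternative. The authors apply SAEs to representations generated by GPT-2 Small and Gemma 2B, evaluate them on Java function bug detection, and report better performance than conventionally fine-tuned transformer-based models (an F1 score of up to 89%). This is the first study to show empirically that SAEs can detect software bugs from the internal representations of pretrained LLMs without fine-tuning or task-specific supervision. The source code is available on GitHub.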
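To make the pipeline described above more concrete, the following is a minimal sketch of how an SAE could turn per-token LLM activations into one feature vector per Java function. It is not the authors' code. The encoder form `ReLU(W_enc (x - b_dec) + b_enc)` is one common SAE formulation, and the mean pooling over tokens, the function names, and the toy weights are all assumptions made for illustration. How the paper turns such features into a bug/no-bug decision is not described in this summary, so that step is left out.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sae_encode(activation: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Encode one hidden activation into sparse features.

    Assumed formulation: f = ReLU(W_enc (x - b_dec) + b_enc).
    """
    centered = [x - b for x, b in zip(activation, b_dec)]
    pre_activation = [
        sum(w * c for w, c in zip(row, centered)) + bias
        for row, bias in zip(w_enc, b_enc)
    ]
    # ReLU zeroes out negative pre-activations, which is what makes the code sparse.
    return [max(0.0, v) for v in pre_activation]


def mean_pool(token_features: List[Vector]) -> Vector:
    """Average per-token feature vectors into one vector per function (assumed pooling)."""
    count = len(token_features)
    return [sum(column) / count for column in zip(*token_features)]


# Toy example: 2-dimensional activations, 3 SAE features (hypothetical weights).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0]
B_DEC = [0.0, 0.0]

# Stand-ins for the LLM's activations at two tokens of one Java function.
token_activations = [[1.0, -2.0], [3.0, 2.0]]
per_token = [sae_encode(a, W_ENC, B_ENC, B_DEC) for a in token_activations]
function_features = mean_pool(per_token)
```

In practice the activations would come from a chosen layer of GPT-2 Small or Gemma 2B and the weights from a trained SAE; both are replaced by hand-written values here so the sketch runs without any model download.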
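Takeaways, Limitations
• Takeaways:
◦ Presents a new method for detecting software bugs using the internal representations of pretrained LLMs.
◦ Shows that lightweight, interpretable bug detection models can be built with SAEs.
◦ Achieves high performance (an F1 score of up to 89%) without fine-tuning.
◦ Contributes to overcoming the limitations of existing AI-based vulnerability detection methods.
• Limitations:
◦ The evaluation covers only Java functions; further research is needed on generalization to other programming languages.
◦ Further analysis is needed of how performance and generalization vary with the LLM used.
◦ Further research is needed on applicability and scalability in real-world settings.
◦ A more detailed analysis and explanation of SAE interpretability may be needed.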
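Since the headline result is reported as an F1 score, a short sketch of how that metric is computed for binary bug detection may help when reading the number. The labels below are made up for illustration and are not data from the paper; `1` is assumed to mean "buggy" and `0` "not buggy".

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive (buggy) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five Java functions.
example_true = [1, 1, 1, 0, 0]
example_pred = [1, 1, 0, 1, 0]
example_f1 = f1_score(example_true, example_pred)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is the F1 score.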