Daily Arxiv
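This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.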
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Sea-ing Through Scattered Rays: Revisiting the Image Formation Model for Realistic Underwater Image Generation
DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting
MeanFlowSE: one-step generative speech enhancement via conditional mean flow
Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support
Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework
AToken: A Unified Tokenizer for Vision
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
Hardness, Structural Knowledge, and Opportunity: An Analytical Framework for Modular Performance Modeling
Benchmark of stylistic variation in LLM-generated texts
SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints
Structure Matters: Brain Graph Augmentation via Learnable Edge Masking for Data-efficient Psychiatric Diagnosis
DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge
Riemannian Batch Normalization: A Gyro Approach
On the Security of Tool-Invocation Prompts for LLM-Based Agentic Systems: An Empirical Risk Assessment
MIDOG 2025: Mitotic Figure Detection with Attention-Guided False Positive Correction
Do Retrieval Augmented Language Models Know When They Don't Know?
LongCat-Flash Technical Report
MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages
Subjective Behaviors and Preferences in LLM: Language of Browsing
Using Natural Language for Human-Robot Collaboration in the Real World
RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding
Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models
VLA-Mark: A cross modal watermark for large vision-language alignment model
Deformable Dynamic Convolution for Accurate yet Efficient Spatio-Temporal Traffic Prediction
Deep Reinforcement Learning with Gradient Eligibility Traces
Generating Moving 3D Soundscapes with Latent Diffusion Models
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
Discrete Diffusion in Large Language and Multimodal Models: A Survey
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
Algorithmic Fairness: Not a Purely Technical but Socio-Technical Property
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
LLMs Can Compensate for Deficiencies in Visual Representations
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data
Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation
Cross-Attention Speculative Decoding
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert
SEMMA: A Semantic Aware Knowledge Graph Foundation Model
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
A Survey of Large Language Models for Data Challenges in Graphs
CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation
Creative Preference Optimization
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning
Space Group Equivariant Crystal Diffusion
Schreier-Coset Graph Propagation
Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
AttentionDrop: A Novel Regularization Method for Transformer Models
MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions
Hybrid Temporal Differential Consistency Autoencoder for Efficient and Sustainable Anomaly Detection in Cyber-Physical Systems
Who is Responsible When AI Fails? Mapping Causes, Entities, and Consequences of AI Privacy and Ethical Incidents
No Black Box Anymore: Demystifying Clinical Predictive Modeling with Temporal-Feature Cross Attention Mechanism
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes - Insights from Urban Studies
MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
Pruning the Paradox: How CLIP's Most Informative Heads Enhance Performance While Amplifying Bias
KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
"It Felt Like I Was Left in the Dark": Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models
A Layered Multi-Expert Framework for Long-Context Mental Health Assessments
Efficient Real-time Refinement of Language Model Text Generation
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Dynamic Neural Curiosity Enhances Learning Flexibility for Autonomous Goal Discovery
Bayesian Concept Bottleneck Models with LLM Priors
G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
DiRW: Path-Aware Digraph Learning for Heterophily
Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework
DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition
CrackSCF: Lightweight Cascaded Fusion Network for Robust and Efficient Structural Crack Segmentation
ConfReady: A RAG based Assistant and Dataset for Conference Checklist Responses
FOVAL: Calibration-Free and Subject-Invariant Fixation Depth Estimation Across Diverse Eye-Tracking Datasets
Assessing invariance to affine transformations in image quality metrics
The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing
Database-Augmented Query Representation for Information Retrieval
Two Is Better Than One: Aligned Representation Pairs for Anomaly Detection
BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation
Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models
Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter
Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Online Robust Planning under Model Uncertainty: A Sample-Based Approach
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Created by
Haebom
Authors
Yuzhou Nie, Zhun Wang, Yu Yang, Ruizhe Jiang, Yuheng Tang, Xander Davies, Yarin Gal, Bo Li, Wenbo Guo, Dawn Song
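Abstract
This paper addresses the limitations of existing benchmarks for evaluating the security risks and capabilities (such as vulnerability detection) of code-generating large language models (LLMs): limited coverage of risks and capabilities, reliance on static evaluation metrics, and a trade-off between data quality and benchmark scale. To address them, it presents a benchmark construction framework that starts from high-quality seed examples and expands them. The approach provides a comprehensive collection of artifacts that supports thorough risk assessment and security capability evaluation using dynamic metrics. By combining expert insight with automated generation, it balances manual effort, data quality, and benchmark scale. Applying the framework to Python, C/C++, and Java, the authors build SeCodePLT, a dataset of more than 5,900 samples covering 44 CWE-based risk categories and three security capabilities. Compared with existing state-of-the-art benchmarks, SeCodePLT offers broader coverage, higher data fidelity, and substantially larger scale. The authors use it to evaluate leading code LLMs and agents, demonstrating their strengths and weaknesses in secure code generation and in identifying or fixing vulnerabilities.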
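To make the contrast between static and dynamic metrics concrete, the sketch below runs a candidate function against executable functional and security test cases instead of pattern-matching its source. It is only an illustration under stated assumptions, not SeCodePLT's actual harness: the `Sample` structure, the `evaluate` function, the path-resolution task, and the choice of CWE-22 (path traversal) as the example category are all hypothetical.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, Dict, List

# A candidate is a function (base_dir, file_name) -> resolved path.
Candidate = Callable[[str, str], str]
# A check executes the candidate and reports whether it behaved acceptably.
Check = Callable[[Candidate], bool]


@dataclass
class Sample:
    """Hypothetical benchmark sample: one task plus its dynamic test cases."""
    cwe_id: str
    functional_tests: List[Check]
    security_tests: List[Check]


def run_all(candidate: Candidate, tests: List[Check]) -> bool:
    """Return True only if every test passes; an unexpected crash counts as failure."""
    for test in tests:
        try:
            if not test(candidate):
                return False
        except Exception:
            return False
    return True


def evaluate(candidate: Candidate, sample: Sample) -> Dict[str, object]:
    """Score functionality and security separately by executing the code."""
    return {
        "cwe": sample.cwe_id,
        "functional": run_all(candidate, sample.functional_tests),
        "secure": run_all(candidate, sample.security_tests),
    }


BASE = "/srv/data"  # hypothetical base directory


def insecure_resolve(base: str, name: str) -> str:
    """Joins the paths without checking that the result stays under base."""
    return posixpath.normpath(posixpath.join(base, name))


def secure_resolve(base: str, name: str) -> str:
    """Rejects any resolved path that escapes the base directory."""
    path = posixpath.normpath(posixpath.join(base, name))
    if not path.startswith(base.rstrip("/") + "/"):
        raise ValueError("path escapes base directory")
    return path


def resolves_plain_file(fn: Candidate) -> bool:
    """Functional check: an ordinary file name resolves inside base."""
    return fn(BASE, "a.txt") == "/srv/data/a.txt"


def blocks_traversal(fn: Candidate) -> bool:
    """Security check: a '../' payload is rejected or stays inside base."""
    try:
        result = fn(BASE, "../etc/passwd")
    except ValueError:
        return True
    return result.startswith(BASE + "/")


SAMPLE = Sample("CWE-22", [resolves_plain_file], [blocks_traversal])

if __name__ == "__main__":
    print(evaluate(insecure_resolve, SAMPLE))
    print(evaluate(secure_resolve, SAMPLE))
```

Scoring the two properties separately is a design choice of this sketch: it distinguishes code that works but is exploitable from code that is safe only because it does nothing useful.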
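Takeaways, Limitations
• Takeaways:
◦ Provides a more comprehensive, extensible, and accurate benchmark framework for evaluating the security risks and capabilities of code-generating LLMs.
◦ Overcomes the limitations of existing benchmarks, enabling more refined and practical evaluation.
◦ The SeCodePLT dataset spans multiple programming languages and risk categories and can be used for a wide range of research and development.
◦ Provides a detailed analysis of the security performance of leading code LLMs and agents.
• Limitations:
◦ The framework's generalizability needs further validation when it is extended to other programming languages or security domains.
◦ Because it relies on manual validation, the benchmark's scalability may be limited.
◦ Continuous updates are needed to keep pace with new threats and vulnerabilities.
◦ Results may vary with the type and version of the LLMs and agents evaluated.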