Daily Arxiv

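This page organizes artificial intelligence papers published around the world.
Summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.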

Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

Saturation Self-Organizing Map

PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis

Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications

Intra-Trajectory Consistency for Reward Modeling

Foundation Models in Medical Imaging -- A Review and Outlook

Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces

Improving Large Language Models with Concept-Aware Fine-Tuning

Consistent Video Editing as Flow-Driven Image-to-Video Generation

Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

DURA-CPS: A Multi-Role Orchestrator for Dependability Assurance in LLM-Enabled Cyber-Physical Systems

Cartridges: Lightweight and general-purpose long context representations via self-study

Fast-DataShapley: Neural Modeling for Training Data Valuation

NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics

Training RL Agents for Multi-Objective Network Defense Tasks

Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations

RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English

Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu

Table-R1: Region-based Reinforcement Learning for Table Understanding

Spatiotemporal Field Generation Based on Hybrid Mamba-Transformer with Physics-informed Fine-tuning

Graph-Based Floor Separation Using Node Embeddings and Clustering of WiFi Trajectories

Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Quantitative Analysis of Performance Drop in DeepSeek Model Quantization

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services

Towards Personalized Conversational Sales Agents: Contextual User Profiling for Strategic Action

PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices

Beyond the Visible: Multispectral Vision-Language Learning for Earth Observation

Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts

V-Max: A Reinforcement Learning Framework for Autonomous Driving

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior

MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Understanding the Emergence of Multimodal Representation Alignment

Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

BalanceBenchmark: A Survey for Multimodal Imbalance Learning

AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers

Vision-Language Models for Edge Networks: A Comprehensive Survey

Foundation Models for Anomaly Detection: Vision and Challenges

Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search

Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation

Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights

Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

Bi-directional Mapping of Morphology Metrics and 3D City Blocks for Enhanced Characterization and Generation of Urban Form

One Diffusion to Generate Them All

Entropy Controllable Direct Preference Optimization

TrajAgent: An LLM-based Agent Framework for Automated Trajectory Modeling via Collaboration of Large and Small Models

Control Industrial Automation System with Large Language Model Agents

How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python?

Self-interpreting Adversarial Images

Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

Ad Auctions for LLMs via Retrieval Augmented Generation

Dynamic and Adaptive Feature Generation with LLM

MiniMaxAD: A Lightweight Autoencoder for Feature-Rich Anomaly Detection

PLeak: Prompt Leaking Attacks against Large Language Model Applications

HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

Evolution Guided Generative Flow Networks

Manipulating Feature Visualizations with Gradient Slingshots

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Decomposability-Guaranteed Cooperative Coevolution for Large-Scale Itinerary Planning

DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDEs

Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints

Epistemic Artificial Intelligence is Essential for Machine Learning Models to Truly `Know When They Do Not Know'

Enhancing multimodal analogical reasoning with Logic Augmented Generation

Beating Transformers using Synthetic Cognition

HypRL: Reinforcement Learning of Control Policies for Hyperproperties

From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development -- An Opinion Paper

Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines

Dynamic Policy Fusion for User Alignment Without Re-Interaction

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

An overview of domain-specific foundation model: key technologies, applications and challenges

Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

code_transformed: The Influence of Large Language Models on Code

Reimagining Dance: Real-time Music Co-creation between Dancers and AI

Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

VGR: Visual Grounded Reasoning

Technical Evaluation of a Disruptive Approach in Homomorphic AI

SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies

Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?

Today's Cat Is Tomorrow's Dog: Accounting for Time-Based Changes in the Labels of ML Vulnerability Detection Approaches

How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python?

Created by

Haebom

Authors

Jianian Gong, Nachuan Duan, Ziheng Tao, Zhaohui Gong, Yuan Yuan, Minlie Huang

Overview

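This paper presents a systematic investigation of how well large language models (LLMs), including GPT-3.5 and GPT-4, generate secure code. It analyzes 4,900 pieces of code generated by four popular LLMs (GPT-3.5, GPT-4, Code Llama, and CodeGeeX2) to evaluate their ability to identify and repair vulnerabilities. The results show that LLMs lack awareness of scenario-relevant security risks: they produce more than 75% vulnerable code on the SecurityEval benchmark and cannot accurately identify vulnerabilities in code they generated themselves. GPT-3.5 and GPT-4 repaired insecure code produced by other LLMs with success rates of 33.2% to 59.6%, but performed poorly when repairing their own code. To address these limitations, the paper develops a lightweight tool that helps LLMs generate more secure source code through an iterative repair procedure. Supported by a semantic analysis engine, the tool raises repair success rates substantially, to between 65.9% and 85.5%.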
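The iterative repair procedure mentioned above can be pictured as an analyze-then-repair loop. The following is only a minimal sketch under stated assumptions, not the paper's tool: the function names (`iterative_repair`, `toy_analyze`, `toy_repair`), the `max_rounds` parameter, and the toy `eval()` rule are all hypothetical. In a real setup the analyzer would be a semantic analysis engine and the repairer would prompt an LLM with the code and the reported findings.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical interfaces: an analyzer returns a list of finding messages;
# a repairer returns revised code given the code and those findings.
Analyzer = Callable[[str], List[str]]
Repairer = Callable[[str, List[str]], str]

# Matches a bare eval( call, but not ast.literal_eval( or obj.eval(.
BARE_EVAL = re.compile(r"(?<![\w.])eval\(")


def iterative_repair(code: str, analyze: Analyzer, repair: Repairer,
                     max_rounds: int = 3) -> Tuple[str, List[str], int]:
    """Repeat analyze -> repair until no findings remain or rounds run out.

    Returns (final_code, remaining_findings, rounds_used).
    """
    findings = analyze(code)
    rounds = 0
    while findings and rounds < max_rounds:
        code = repair(code, findings)
        findings = analyze(code)
        rounds += 1
    return code, findings, rounds


def toy_analyze(code: str) -> List[str]:
    # Toy stand-in for a semantic analysis engine (assumption):
    # it only flags bare eval() calls.
    if BARE_EVAL.search(code):
        return ["eval() called on possibly untrusted input"]
    return []


def toy_repair(code: str, findings: List[str]) -> str:
    # Toy stand-in for an LLM repair step (assumption): a real repairer
    # would send the code and the findings to a model and return its answer.
    return "import ast\n" + BARE_EVAL.sub("ast.literal_eval(", code)


if __name__ == "__main__":
    fixed, remaining, used = iterative_repair(
        "x = eval(user_input)", toy_analyze, toy_repair
    )
    print(fixed)
    print("remaining findings:", remaining, "rounds used:", used)
```

The loop stops early once the analyzer reports nothing, and `max_rounds` bounds the cost when a repair keeps failing. Whatever findings remain are returned so the caller can decide whether to reject the code.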

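Implications and Limitations

• Implications:

◦ Confirms the potential of LLMs for secure code generation while clearly laying out their limits.

◦ Demonstrates that LLMs are weak at self-repair and that an iterative repair tool is effective at improving this.

◦ Shows that combining an LLM with a semantic analysis engine can raise its code repair success rate.

• Limitations:

◦ The limited set of LLMs and benchmarks used in the study may make the findings hard to generalize.

◦ The lightweight tool's performance depends on the performance of the semantic analysis engine.

◦ Further research is needed on applicability in real-world software development environments.

◦ The analysis may not comprehensively cover the full variety of vulnerability types.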
