Daily Arxiv

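This page collects and summarizes artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.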

Saffron-1: Safety Inference Scaling

Created by

Haebom

Authors

Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong

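Overview

Existing LLM safety research has focused mainly on instilling safe behavior during training, but recent studies show that these methods remain vulnerable to a variety of jailbreak attacks. Meanwhile, inference scaling has substantially improved LLM reasoning capabilities, yet it has not been explored for safety assurance. This work pioneers inference scaling for LLM safety that is robust and effective against emerging threats. The authors show that conventional inference scaling techniques, despite their success on reasoning tasks, perform poorly in safety contexts and even fall short of basic approaches such as Best-of-N sampling. They attribute this inefficiency to a newly identified challenge, the exploration-efficiency dilemma, which arises from the high computational overhead associated with process reward model (PRM) evaluations. To overcome this dilemma, the work proposes SAFFRON, a new inference scaling paradigm designed specifically for safety assurance. At its core is a multifurcation reward model (MRM) that greatly reduces the number of reward model evaluations. To make the paradigm practical, the authors further propose (i) a partial supervision training objective for the MRM, (ii) a conservative exploration constraint that prevents out-of-distribution exploration, and (iii) a Trie-based key-value caching strategy that facilitates cache sharing across sequences during tree search. Extensive experiments validate the method's effectiveness. To accelerate future LLM safety research, the authors also release the trained multifurcation reward model (Saffron-1) together with a token-level safety reward dataset (Safety4M). Code, models, and data are available at https://github.com/q-rz/saffron, and the project homepage is https://q-rz.github.io/p/saffron.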
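To make the evaluation-count argument concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a toy four-token vocabulary and a hypothetical `toy_reward` scoring function, and all class and function names are illustrative. The interface shown for the MRM (one call returning a score for every possible next token) is an assumption about how such a model could cut the number of reward model evaluations during tree search.

```python
from typing import Dict, Sequence, Tuple

Prefix = Tuple[str, ...]
VOCAB = ("ok", "sure", "refuse", "harmful")  # toy vocabulary (assumption)


def toy_reward(seq: Prefix) -> float:
    """Hypothetical safety score: fraction of tokens that are not 'harmful'."""
    return sum(tok != "harmful" for tok in seq) / len(seq)


class ProcessRewardModel:
    """Stand-in PRM: each call scores exactly one candidate sequence."""

    def __init__(self) -> None:
        self.calls = 0

    def score(self, seq: Prefix) -> float:
        self.calls += 1
        return toy_reward(seq)


class MultifurcationRewardModel:
    """Stand-in MRM: one call scores every possible next token of a prefix."""

    def __init__(self) -> None:
        self.calls = 0

    def score_children(self, prefix: Prefix) -> Dict[str, float]:
        self.calls += 1
        return {tok: toy_reward(prefix + (tok,)) for tok in VOCAB}


def expand_with_prm(prm: ProcessRewardModel, prefix: Prefix,
                    candidates: Sequence[str]) -> Dict[str, float]:
    # One reward-model call per candidate child.
    return {tok: prm.score(prefix + (tok,)) for tok in candidates}


def expand_with_mrm(mrm: MultifurcationRewardModel, prefix: Prefix,
                    candidates: Sequence[str]) -> Dict[str, float]:
    # A single reward-model call covers all candidate children.
    children = mrm.score_children(prefix)
    return {tok: children[tok] for tok in candidates}


prm, mrm = ProcessRewardModel(), MultifurcationRewardModel()
prefix = ("sure",)
prm_scores = expand_with_prm(prm, prefix, VOCAB)
mrm_scores = expand_with_mrm(mrm, prefix, VOCAB)
print(prm.calls, mrm.calls)  # prints: 4 1
```

Both expansions produce the same scores in this toy setting; the only difference is how many reward-model calls each one needs per expanded node.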


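Implications and Limitations

• Implications:
◦ Moves beyond the limits of existing LLM safety research by proposing inference scaling as a new route to improved safety.
◦ Identifies a new challenge, the exploration-efficiency dilemma, and proposes SAFFRON as a solution.
◦ Introduces new techniques: the multifurcation reward model (MRM), partial supervision training, a conservative exploration constraint, and a Trie-based caching strategy.
◦ Supports follow-up research by releasing the Saffron-1 model and the Safety4M dataset.

• Limitations:
◦ SAFFRON's effectiveness rests on experiments with specific datasets and models; generalization to other settings needs further study.
◦ The design and training of the multifurcation reward model are complex, which may make it hard to implement and use.
◦ The full diversity of jailbreak attacks may not have been covered, so the method may remain vulnerable to new attack types.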
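The Trie-based caching idea listed among the implications above can be pictured with a small Python sketch. It is not the caching code from the q-rz/saffron repository. It assumes a hypothetical `TrieKVCache` class in which each trie node stands for one token position along a shared prefix, and a placeholder string stands in for the real key-value tensors. The point is only that sequences explored during tree search which share a prefix can reuse the entries already computed for that prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class TrieNode:
    # Placeholder for the key-value entry of the token at this position.
    kv: Optional[str] = None
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


class TrieKVCache:
    """Hypothetical prefix-sharing cache keyed by the token path from the root."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.computed = 0  # number of per-token entries actually computed

    def get(self, tokens: Sequence[str]) -> List[str]:
        """Return one entry per token, computing only the missing ones."""
        node, entries = self.root, []
        for depth, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                # Cache miss: "compute" the entry and store it in the trie.
                child = TrieNode(kv=f"kv[{depth}:{tok}]")
                node.children[tok] = child
                self.computed += 1
            entries.append(child.kv)
            node = child
        return entries


cache = TrieKVCache()
first = cache.get(("how", "to", "stay", "safe"))
after_first = cache.computed
second = cache.get(("how", "to", "stay", "calm"))
print(after_first, cache.computed)  # prints: 4 5
```

The second sequence shares its first three tokens with the first, so only one new entry is computed. In a real model the entry at each position depends on the entire preceding prefix, which is why the path through the trie, and not the token alone, serves as the cache key.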
