Daily Arxiv

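This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.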

Exploring Diffusion Transformer Designs via Grafting

Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

ECoRAG: Evidentiality-guided Compression for Long Context RAG

Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective

Does It Make Sense to Speak of Introspection in Large Language Models?

Sparse Autoencoders, Again?

Feature-Based Lie Group Transformer for Real-World Applications

TracLLM: A Generic Framework for Attributing Long Context LLMs

SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels

Labelling Data with Unknown References

Tug-of-war between idiom's figurative and literal meanings in LLMs

State-Covering Trajectory Stitching for Diffusion Planners

Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System

DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction

SageAttention2++: A More Efficient Implementation of SageAttention2

Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis

Decoupling Representation and Learning in Genetic Programming: the LaSER Approach

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models

How can Diffusion Models Evolve into Continual Generators?

Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction

m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training

FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models

Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP

GENIUS: A Generative Framework for Universal Multimodal Search

TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

ARMOR: Empowering Multimodal Understanding Model with Interleaved Multimodal Generation Capability

A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models

Knowledge Retention for Continual Model-Based Reinforcement Learning

Adversarial Tokenization

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

SAGE: A Framework of Precise Retrieval for RAG

Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids

SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models

Improving Customer Service with Automatic Topic Detection in User Emails

DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model

Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Maximum Entropy Reinforcement Learning with Diffusion Policy

TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking

Relational Conformal Prediction for Correlated Time Series

RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models

UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control

The Complexity of Learning Sparse Superposed Features with Feedback

Peri-LN: Revisiting Normalization Layer in the Transformer Architecture

Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus

ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting

The Bakers and Millers Game with Restricted Locations

Diving into Self-Evolving Training for Multimodal Reasoning

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation

A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks

CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data

Understanding Memorization in Generative Models via Sharpness in Probability Landscapes

A Cognac shot to forget bad memories: Corrective Unlearning in GNNs

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

The Impact of Inference Acceleration on Bias of LLMs

pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2

Simmering: Sufficient is better than optimal for training neural networks

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction

DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models

Proximal Policy Distillation

BoA: Attention-aware Post-training Quantization without Backpropagation

Certification for Differentially Private Prediction in Gradient-Based Training

Multi-Agent Collaboration via Cross-Team Orchestration

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models

Mirage: A Multi-Level Superoptimizer for Tensor Programs

Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion

Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Structure Guided Large Language Model for SQL Generation

GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Graph Deep Learning for Time Series Forecasting

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

Rethinking Machine Unlearning in Image Generation Models

The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

Adversarial Tokenization

Created by

Haebom

Authors

Renato Lui Geh, Zilei Shao, Guy Van den Broeck

Overview

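This paper examines a vulnerability that arises in the tokenization step of large language models (LLMs). Existing LLM pipelines consider only a single tokenization of a given string, but in practice many tokenizations exist. For example, the word "penguin" may be tokenized as "[p, enguin]", but "[peng, uin]" is also possible. The paper shows that LLMs retain semantic understanding of alternative tokenizations even though they were trained on only one, and asks what this means for LLM safety. In particular, it tests experimentally whether adversarially tokenizing a malicious string can evade safety and alignment restrictions. It finds that adversarial tokenization is competitive with existing state-of-the-art adversarial approaches and is an effective attack that leaves the text of the harmful request unchanged. The vulnerability is validated experimentally on three state-of-the-art LLMs and adversarial datasets.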
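The claim that one string admits many tokenizations can be illustrated with a minimal sketch. It assumes a toy vocabulary invented for this example (`TOY_VOCAB` is hypothetical and not taken from any real tokenizer), and the helper `all_tokenizations` is likewise an illustration, not the authors' method. It only enumerates the ways to split a string into vocabulary tokens.

```python
from functools import lru_cache

# Hypothetical toy vocabulary, made up for illustration only.
# A real subword vocabulary has tens of thousands of entries.
TOY_VOCAB = frozenset(
    {"p", "e", "n", "g", "u", "i", "pen", "peng", "uin", "guin", "enguin"}
)


def all_tokenizations(text, vocab=TOY_VOCAB):
    """Return every way to split `text` into tokens drawn from `vocab`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Base case: the whole string is consumed, so there is exactly
        # one (empty) continuation.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                # Prepend this token to every tokenization of the remainder.
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(tokens) for tokens in split_from(0)]


segmentations = all_tokenizations("penguin")
for tokens in segmentations:
    print(tokens)
```

Every printed list concatenates back to the same string "penguin", so the text is unchanged while the token sequence differs. Under this toy vocabulary, both `['p', 'enguin']` and `['peng', 'uin']` appear among the results. A standard pipeline would pass only one of these sequences to the model.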

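Implications and Limitations

• Implications:
◦ Identifies "adversarial tokenization", a previously unknown vulnerability in the subword models of LLMs.
◦ Adversarial tokenization is competitive with existing state-of-the-art adversarial attacks and can bypass safety and alignment restrictions without changing the text.
◦ Presents a new threat to LLM safety and alignment.
◦ Highlights the need to re-examine and improve the LLM tokenization process.

• Limitations:
◦ The study is limited to specific LLMs and datasets, so further work is needed to establish how well the findings generalize.
◦ Further research is needed on defense mechanisms against adversarial tokenization attacks.
◦ Further analysis is needed of the internal mechanisms by which LLMs retain semantic understanding across different tokenizations.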
