Daily Arxiv

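This page curates artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.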

Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

Created by

Haebom

Authors

Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas

Overview

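GenAI providers use watermarking to verify whether content was generated by their models. A watermark is a hidden signal in the content whose presence can be detected with a secret watermarking key. A core security threat is the forgery attack, in which an adversary inserts the provider's watermark into content the provider did not generate, damaging the provider's reputation and undermining trust. Existing defenses resist forgery by embedding multiple watermarks with multiple keys into the same content, which can degrade model utility. Even so, forgery remains a threat when the adversary can collect sufficiently many watermarked samples. This paper proposes a provable defense against forgery that holds regardless of how many watermarked samples the adversary collects, provided the adversary cannot easily distinguish watermarks produced with different keys. The proposed scheme does not further degrade model utility. It randomizes the watermark key selection for each query and accepts content as genuine only if the watermark is detected with exactly one key. Although the paper focuses on the image and text modalities, the defense is modality-agnostic because it treats the underlying watermarking method as a black box. The method provably bounds the attacker's success rate, and empirically the authors observe it dropping from near-perfect to only 2% at negligible computational overhead.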
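
The two rules described above (pick a random key per query, accept only when exactly one key detects a watermark) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `watermark_with_random_key` and `is_authentic` are hypothetical, and the `toy_embed` / `toy_detect` pair is a deliberately trivial stand-in for whatever black-box watermarking method a provider actually uses.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Sequence

# Black-box interface assumed for the underlying watermarking method.
Embed = Callable[[str, bytes], str]
Detect = Callable[[str, bytes], bool]


def watermark_with_random_key(content: str, keys: Sequence[bytes], embed: Embed) -> str:
    """Watermark one query's output with a key chosen uniformly at random."""
    key = secrets.choice(keys)
    return embed(content, key)


def is_authentic(content: str, keys: Sequence[bytes], detect: Detect) -> bool:
    """Accept content only if exactly one of the provider's keys detects a watermark."""
    hits = sum(1 for key in keys if detect(content, key))
    return hits == 1


# --- Toy stand-in for a real watermark (assumption, for illustration only) ---
def _toy_tag(key: bytes) -> str:
    # Short key-dependent marker; a real scheme would hide a signal in the content.
    return hmac.new(key, b"toy-watermark", hashlib.sha256).hexdigest()[:8]


def toy_embed(content: str, key: bytes) -> str:
    return f"{content} [{_toy_tag(key)}]"


def toy_detect(content: str, key: bytes) -> bool:
    return f"[{_toy_tag(key)}]" in content


if __name__ == "__main__":
    keys = [b"key-0", b"key-1", b"key-2", b"key-3"]
    genuine = watermark_with_random_key("model output", keys, toy_embed)
    # A forgery that mixes signals from two keys, e.g. built from pooled samples.
    forged = toy_embed(toy_embed("attacker text", keys[0]), keys[1])
    print(is_authentic(genuine, keys, toy_detect))         # one key matches
    print(is_authentic(forged, keys, toy_detect))          # two keys match
    print(is_authentic("plain text", keys, toy_detect))    # no key matches
```

In this sketch, an attacker who pools watermarked samples without being able to tell the keys apart would tend to reproduce signals from several keys at once, which the exactly-one-key check rejects. Because `embed` and `detect` are passed in as callables, the same wrapper would apply unchanged to an image or text watermark.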

Implications and Limitations

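• Implications:
◦ Proposes a forgery defense that is independent of the number of watermarked samples the attacker collects.
◦ Does not further degrade model utility.
◦ Applies to both the image and text modalities and is modality-agnostic.
◦ Provably bounds the attacker's success rate.
◦ Empirically reduces the forgery success rate substantially, from near-perfect to 2%.

• Limitations:
◦ The defense is effective only when the attacker cannot easily distinguish watermarks produced with different keys.
◦ Because it relies on the underlying watermarking method, the overall security of the system depends on the security of that method.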
