Daily Arxiv
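This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; when sharing, please cite the source.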
Self-Questioning Language Models
Beyond risk: A proto-framework for assessing the societal impact of AI systems
Supervised Dynamic Dimension Reduction with Deep Neural Network
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
LLMs Have a Heart of Stone: Demystifying the Soft Thinking Ability of Large Reasoning Models
Industrial LLM-based Code Optimization under Regulation: A Mixture-of-Agents Approach
Reliable Evaluation Protocol for Low-Precision Retrieval
Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery
Tool-integrated Reinforcement Learning for Repo Deep Search
CauKer: classification time series foundation models can be pretrained on synthetic data only
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
DMSC: Dynamic Multi-Scale Coordination Framework for Time Series Forecasting
HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents
Entity Representation Learning Through Onsite-Offsite Graph for Pinterest Ads
Evaluating User Experience in Conversational Recommender Systems: A Systematic Review Across Classical and LLM-Powered Approaches
Spatial-Frequency Aware for Object Detection in RAW Image
Learning Pivoting Manipulation with Force and Vision Feedback Using Optimization-based Demonstrations
NCCR: to Evaluate the Robustness of Neural Networks and Adversarial Examples
ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions
From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation
EcoTransformer: Attention without Multiplication
Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
True Multimodal In-Context Learning Needs Attention to the Visual Context
Gauge Flow Models
Zero-Shot Neural Architecture Search with Weighted Response Correlation
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
A Comparative Study of Specialized LLMs as Dense Retrievers
Sign Spotting Disambiguation using Large Language Models
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Thought Anchors: Which LLM Reasoning Steps Matter?
UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions
15,500 Seconds: Lean UAV Classification Using EfficientNet and Lightweight Fine-Tuning
AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale
On the Fundamental Impossibility of Hallucination Control in Large Language Models
Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR
Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators
CAIN: Hijacking LLM-Humans Conversations via Malicious System Prompts
Explain Less, Understand More: Jargon Detection via Personalized Parameter-Efficient Fine-tuning
What Lives? A meta-analysis of diverse opinions on the definition of life
A Generative Neural Annealer for Black-Box Combinatorial Optimization
GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders
CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
Mjölnir: A Deep Learning Parametrization Framework for Global Lightning Flash Density
RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework
ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning
Beyond Wide-Angle Images: Structure-to-Detail Video Portrait Correction via Unsupervised Spatiotemporal Adaptation
CITRAS: Covariate-Informed Transformer for Time Series Forecasting
Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics
Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Pull-Based Query Scheduling for Goal-Oriented Semantic Communication
Accelerating Focal Search in Multi-Agent Path Finding with Tighter Lower Bounds
RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks
UltraSTF: Ultra-Compact Model for Large-Scale Spatio-Temporal Forecasting
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation
Tool Unlearning for Tool-Augmented LLMs
Vision without Images: End-to-End Computer Vision from Single Compressive Measurements
How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias
3DTTNet: Multimodal Fusion-Based 3D Traversable Terrain Modeling for Off-Road Environments
DOGR: Towards Versatile Visual Document Grounding and Referring
Real-World Offline Reinforcement Learning from Vision Language Model Feedback
Causality-Driven Audits of Model Robustness
AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context
Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection
Pyhgf: A neural network library for predictive coding
Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
Parse Trees Guided LLM Prompt Compression
One Model, Any Conjunctive Query: Graph Neural Networks for Answering Queries over Incomplete Knowledge Graphs
A Value Based Parallel Update MCTS Method for Multi-Agent Cooperative Decision Making of Connected and Automated Vehicles
Fairness Definitions in Language Models Explained
CityLight: A Neighborhood-inclusive Universal Model for Coordinated City-scale Traffic Signal Control
Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting
Long-Term Visual Object Tracking with Event Cameras: An Associative Memory Augmented Tracker and A Benchmark Dataset
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
From Cluster Assumption to Graph Convolution: Graph-based Semi-Supervised Learning Revisited
Environmental Sound Classification on An Embedded Hardware Platform
Data Dependency Inference for Industrial Code Generation Based on UML Sequence Diagrams
InqEduAgent: Adaptive AI Learning Partners with Gaussian Process Augmentation
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Higher Gauge Flow Models
Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
SLR: Automated Synthesis for Scalable Logical Reasoning
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
Learning to Inference Adaptively for Multimodal Large Language Models
Efficient rule induction by ignoring pointless rules
Why the Agent Made that Decision: Contrastive Explanation Learning for Reinforcement Learning
Evaluating Detection Thresholds: The Impact of False Positives and Negatives on Super-Resolution Ultrasound Localization Microscopy
Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
Created by
Haebom
Authors
Jaechul Roh, Zachary Novack, Yuefeng Peng, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Amir Houmansadr
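Summary
This paper shows that memorization in generative models goes beyond verbatim reproduction: it appears as non-literal patterns, semantic associations, and, surprisingly, across modalities (for example, in lyrics-to-song and text-to-video generation). In particular, it identifies a new kind of cross-modality memorization in which copyrighted content leaks through indirect phonetic pathways, and it proposes Adversarial PhoneTic Prompting (APT) as an attack that exploits this. APT replaces iconic phrases with alternatives that sound similar but mean something different (for example, "mom's spaghetti" becomes "Bob's confetti"), preserving the acoustic form while substantially changing the semantic content. Experiments show that lyrics that are phonetically similar but semantically unrelated can induce models to reproduce memorized songs. Despite the semantic change, black-box models such as SUNO and open-source models such as YuE generate outputs that are strikingly similar to the original songs in melody, rhythm, and vocals, and that score highly on AudioJudge, CLAP, and CoverID. These effects persist across genres and languages. More surprisingly, the authors find that phonetic prompts alone can trigger visual memorization in text-to-video models: given altered lyrics from "Lose Yourself", Veo 3 generates scenes that mirror the original music video, such as a rapper in a hoodie and dark urban backdrops, even though the prompt contains no explicit visual cues. This cross-modality leakage represents an unprecedented threat and renders existing safety measures such as copyright filters ineffective. The study exposes a fundamental vulnerability in transcript-conditioned generative models and raises urgent concerns about copyright, provenance, and the safe deployment of multimodal generation systems.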
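To make the idea of "same sound, different meaning" concrete, the following is a minimal Python sketch that compares two phrases at the phoneme level. It is only an illustration and not the authors' method or evaluation pipeline: the `LEXICON` transcriptions are hand-written, ARPAbet-style assumptions (stress marks omitted), and the helper names `to_phonemes`, `vowel_skeleton`, `edit_distance`, and `phoneme_similarity` are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical, hand-written ARPAbet-style transcriptions (stress omitted).
# A real system would look these up in a pronunciation dictionary instead.
LEXICON: Dict[str, List[str]] = {
    "mom's": ["M", "AA", "M", "Z"],
    "spaghetti": ["S", "P", "AH", "G", "EH", "T", "IY"],
    "bob's": ["B", "AA", "B", "Z"],
    "confetti": ["K", "AH", "N", "F", "EH", "T", "IY"],
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
VOWELS = {"AA", "AH", "EH", "ER", "IY", "OW"}


def to_phonemes(phrase: str) -> List[str]:
    """Concatenate the phonemes of each word (every word must be in LEXICON)."""
    return [p for word in phrase.lower().split() for p in LEXICON[word]]


def vowel_skeleton(phrase: str) -> List[str]:
    """Keep only the vowels, a rough proxy for the rhyme and rhythm of a line."""
    return [p for p in to_phonemes(phrase) if p in VOWELS]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def phoneme_similarity(p: str, q: str) -> float:
    """1.0 for identical phoneme sequences, lower as more edits are needed."""
    a, b = to_phonemes(p), to_phonemes(q)
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    original, swapped, unrelated = "mom's spaghetti", "Bob's confetti", "hello world"
    print(vowel_skeleton(original), vowel_skeleton(swapped))
    print(phoneme_similarity(original, swapped))
    print(phoneme_similarity(original, unrelated))
```

Under these assumed transcriptions, "mom's spaghetti" and "Bob's confetti" share the same vowel skeleton even though most consonants differ, and the pair scores higher on phoneme similarity than the unrelated phrase does. This is the kind of acoustic overlap that a text-based filter comparing words or meanings would not see.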
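Takeaways, Limitations
• Takeaways:
◦ Shows that memorization in generative models appears in many forms beyond verbatim reproduction.
◦ Identifies a new threat: leakage of copyrighted content through cross-modality memorization.
◦ Shows that existing safety measures, such as copyright filters, can be rendered ineffective.
◦ Raises the need for new safety measures for the safe deployment of multimodal generation systems.
◦ Demonstrates that adversarial attacks using phonetic prompts are feasible.
• Limitations:
◦ Further research is needed on how well the APT attack generalizes to other models and datasets.
◦ Further research is needed on defenses against the proposed APT attack.
◦ Broader experiments across a variety of generative models and datasets are needed.
◦ Further research is needed on the connection to real-world copyright infringement cases.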