haebom
Daily Arxiv
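This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.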
WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation
The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
BUSTED at AraGenEval Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
DB-FGA-Net: Dual Backbone Frequency Gated Attention Network for Multi-Class Brain Tumor Classification with Grad-CAM Interpretability
Assessing the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset
On the Structure of Stationary Solutions to McKean-Vlasov Equations with Applications to Noisy Transformers
ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models
Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in OmniModels
ADPO: Anchored Direct Preference Optimization
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
MIN-Merging: Merge the Important Neurons for Model Merging
When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking
From Flows to Words: Can Zero-/Few-Shot LLMs Detect Network Intrusions? A Grammar-Constrained, Calibrated Evaluation on UNSW-NB15
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection
UNDREAM: Bridging Differentiable Rendering and Photorealistic Simulation for End-to-end Adversarial Attacks
The Chameleon Nature of LLMs: Quantifying Multi-Turn Stance Instability in Search-Enabled Language Models
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
Evidence Without Injustice: A New Counterfactual Test for Fair Algorithms
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
Automatic Music Sample Identification with Multi-Track Contrastive Learning
DiffHeads: Differential Analysis and Inference-Time Masking of Bias Heads in Large Language Models
Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
Uncovering Singularities in Feynman Integrals via Machine Learning
Beyond Fertility: Analyzing STRR as a Metric for Multilingual Tokenization Evaluation
Token Is All You Price
LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology
IKNet: Interpretable Stock Price Prediction via Keyword-Guided Integration of News and Technical Indicators
Smartphone-based iris recognition through high-quality visible-spectrum iris image capture.V2
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices
Feasibility-Aware Decision-Focused Learning for Predicting Parameters in the Constraints
Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Holistic Order Prediction in Natural Scenes
Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
Aligning LLMs for Multilingual Consistency in Enterprise Applications
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Automatic Discovery of One Parameter Subgroups of $SO(n)$
Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization's Impact on CLIP Beyond Accuracy
WolBanking77: Wolof Banking Speech Intent Classification Dataset
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
EvoBrain: Dynamic Multi-Channel EEG Graph Modeling for Time-Evolving Brain Networks
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers
Membership Inference Attacks on Recommender System: A Survey
Reconstruction Alignment Improves Unified Multimodal Models
Deriving Transformer Architectures as Implicit Multinomial Regression
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management
ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation
The Role of AI in Facilitating Interdisciplinary Collaboration: Evidence from AlphaFold
Score-informed Neural Operator for Enhancing Ordering-based Causal Discovery
TaoSR1: The Thinking Model for E-commerce Relevance Search
A Data-driven ML Approach for Maximizing Performance in LLM-Adapter Serving
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing
The ISLab Solution to the Algonauts Challenge 2025: A Multimodal Deep Learning Approach to Brain Response Prediction
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective
BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints
Trusted Knowledge Extraction for Operations and Maintenance Intelligence
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports
DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
A Lightweight Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes
Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Rethinking and Exploring String-Based Malware Family Classification in the Era of LLMs and RAG
Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies
ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization
Echo State Transformer: Attention Over Finite Memories
Reasoning as an Adaptive Defense for Safety
Curious Causality-Seeking Agents Learn Meta Causal World
DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE
FlightKooba: A Fast Interpretable FTP Model
Thought Anchors: Which LLM Reasoning Steps Matter?
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Identifiability of Deep Polynomial Neural Networks
Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment
Distributional Training Data Attribution: What do Influence Functions Sample?
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
Unsupervised Document and Template Clustering using Multimodal Embeddings
Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
Created by
Haebom
Authors
Rui Chen, Bin Liu, Changtao Miao, Xinghao Wang, Yi Li, Tao Gong, Qi Chu, Nenghai Yu
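Abstract
Advances in image tampering techniques pose serious security threats, underscoring the need for effective image manipulation localization (IML). Supervised IML performs strongly, but pixel-level annotation is costly. Existing weakly supervised or unsupervised alternatives often underperform and lack interpretability. This paper proposes the In-Context Forensic Chain (ICFC), a training-free framework that uses multimodal large language models (MLLMs) for interpretable IML. ICFC combines objectified rule construction with adaptive filtering to build a reliable knowledge base, and it builds a multi-step progressive reasoning pipeline that mimics an expert forensic workflow, moving from coarse proposals to fine-grained forensic results. This design makes it possible to use MLLM reasoning systematically for image-level classification, pixel-level localization, and text-level interpretability. On several benchmarks, ICFC not only outperforms state-of-the-art training-free methods but also achieves competitive or superior performance compared with weakly supervised and fully supervised approaches.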
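Takeaways, Limitations
• Addresses image manipulation localization with a training-free framework.
• Uses multimodal large language models (MLLMs) to provide interpretable results.
• Builds a reliable knowledge base through objectified rule construction and adaptive filtering.
• Improves performance with a multi-step progressive reasoning pipeline that mimics an expert forensic workflow.
• Demonstrates better performance than existing methods on multiple benchmarks.
• The paper does not state specific limitations. (However, because the framework is training-free, its results may vary with the capability of the underlying MLLM, which is a potential limitation.)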
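The summary describes ICFC only at a high level, so the following is a minimal sketch of the control flow it names: filter a rule knowledge base, classify the image, propose a coarse region, then refine it and produce an explanation. Everything here is an assumption for illustration and not the authors' implementation: the names `Rule`, `Finding`, `filter_rules`, and `forensic_chain` are hypothetical, the mean-score filter merely stands in for the paper's adaptive filtering, the three stage callables stand in for MLLM calls, and a bounding box stands in for the pixel-level mask.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical region format (x0, y0, x1, y1); the paper localizes at pixel level.
Box = Tuple[int, int, int, int]


@dataclass
class Rule:
    text: str     # forensic rule stated in natural language
    score: float  # hypothetical reliability score in [0, 1]


@dataclass
class Finding:
    manipulated: bool      # image-level classification
    region: Optional[Box]  # localization result, None for clean images
    explanation: str       # text-level interpretation


def filter_rules(rules: List[Rule], min_score: Optional[float] = None) -> List[Rule]:
    """Keep the more reliable rules.

    Assumed criterion: score >= the mean score, unless an explicit
    threshold is given. This is a placeholder for adaptive filtering.
    """
    if not rules:
        return []
    threshold = mean(r.score for r in rules) if min_score is None else min_score
    return [r for r in rules if r.score >= threshold]


def forensic_chain(
    image: Any,
    rules: List[Rule],
    classify: Callable[[Any, List[Rule]], bool],
    propose: Callable[[Any, List[Rule]], Box],
    refine: Callable[[Any, Box, List[Rule]], Tuple[Box, str]],
) -> Finding:
    """Run the coarse-to-fine stages in order; each callable stands in for an MLLM call."""
    knowledge_base = filter_rules(rules)
    # Stage 1: image-level decision; stop early for images judged authentic.
    if not classify(image, knowledge_base):
        return Finding(False, None, "No manipulation evidence found.")
    # Stage 2: coarse region proposal.
    coarse = propose(image, knowledge_base)
    # Stage 3: fine-grained localization plus a textual justification.
    region, explanation = refine(image, coarse, knowledge_base)
    return Finding(True, region, explanation)
```

Passing the stages in as callables keeps the sketch independent of any particular MLLM client, so each stage can be swapped for a real model call or a stub when testing.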