Daily Arxiv

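This page collects papers on artificial intelligence published around the world.
The summaries on this page are produced with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.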

WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation

The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection

Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge

BUSTED at AraGenEval Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection

Steering Evaluation-Aware Language Models to Act Like They Are Deployed

DB-FGA-Net: Dual Backbone Frequency Gated Attention Network for Multi-Class Brain Tumor Classification with Grad-CAM Interpretability

Evaluating the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset

On the Structure of Stationary Solutions to McKean-Vlasov Equations with Applications to Noisy Transformers

ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models

Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs

What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning

Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients

UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in OmniModels

ADPO: Anchored Direct Preference Optimization

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

MIN-Merging: Merge the Important Neurons for Model Merging

When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking

From Flows to Words: Can Zero-/Few-Shot LLMs Detect Network Intrusions? A Grammar-Constrained, Calibrated Evaluation on UNSW-NB15

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection

UNDREAM: Bridging Differentiable Rendering and Photorealistic Simulation for End-to-end Adversarial Attacks

The Chameleon Nature of LLMs: Quantifying Multi-Turn Stance Instability in Search-Enabled Language Models

ESCA: Contextualizing Embodied Agents via Scene-Graph Generation

Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs

Evidence Without Injustice: A New Counterfactual Test for Fair Algorithms

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

Automatic Music Sample Identification with Multi-Track Contrastive Learning

DiffHeads: Differential Analysis and Inference-Time Masking of Bias Heads in Large Language Models

Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization

Uncovering Singularities in Feynman Integrals via Machine Learning

Beyond Fertility: Analyzing STRR as a Metric for Multilingual Tokenization Evaluation

Token Is All You Price

LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology

IKNet: Interpretable Stock Price Prediction via Keyword-Guided Integration of News and Technical Indicators

Smartphone-based iris recognition through high-quality visible-spectrum iris image capture.V2

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

Feasibility-Aware Decision-Focused Learning for Predicting Parameters in the Constraints

Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus

Holistic Order Prediction in Natural Scenes

Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation

LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models

Aligning LLMs for Multilingual Consistency in Enterprise Applications

Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning

Automatic Discovery of One Parameter Subgroups of $SO(n)$

Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization's Impact on CLIP Beyond Accuracy

WolBanking77: Wolof Banking Speech Intent Classification Dataset

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

EvoBrain: Dynamic Multi-Channel EEG Graph Modeling for Time-Evolving Brain Networks

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning

Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers

Membership Inference Attacks on Recommender System: A Survey

Reconstruction Alignment Improves Unified Multimodal Models

Deriving Transformer Architectures as Implicit Multinomial Regression

The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

The Role of AI in Facilitating Interdisciplinary Collaboration: Evidence from AlphaFold

Score-informed Neural Operator for Enhancing Ordering-based Causal Discovery

TaoSR1: The Thinking Model for E-commerce Relevance Search

A Data-driven ML Approach for Maximizing Performance in LLM-Adapter Serving

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

The ISLab Solution to the Algonauts Challenge 2025: A Multimodal Deep Learning Approach to Brain Response Prediction

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering

PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective

BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints

Trusted Knowledge Extraction for Operations and Maintenance Intelligence

CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models

ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports

DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning

Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries

PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors

A Lightweight Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation

The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora

Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

Rethinking and Exploring String-Based Malware Family Classification in the Era of LLMs and RAG

Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies

ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization

Echo State Transformer: Attention Over Finite Memories

Reasoning as an Adaptive Defense for Safety

Curious Causality-Seeking Agents Learn Meta Causal World

DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE

FlightKooba: A Fast Interpretable FTP Model

Thought Anchors: Which LLM Reasoning Steps Matter?

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation

Identifiability of Deep Polynomial Neural Networks

Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment

Distributional Training Data Attribution: What do Influence Functions Sample?

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Unsupervised Document and Template Clustering using Multimodal Embeddings

Neon: Negative Extrapolation From Self-Training Improves Image Generation

Created by

Haebom

Authors

Sina Alemohammad, Zhangyang Wang, Richard G. Baraniuk

Summary

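The scarcity of high-quality training data is making it difficult to keep scaling generative AI models. Earlier work tried to improve performance by using a generative model to produce synthetic data and fine-tuning on it alongside real data, but this led to model collapse, degrading both sample quality and diversity. To address this problem, this paper proposes Neon (Negative Extrapolation frOm self-traiNing), a new learning method that uses the degradation caused by self-training as a self-improvement signal. Neon first fine-tunes the base model on its own generated data, then applies a reversed gradient update that moves the model away from the degraded weights. Neon corrects the predictable gradient misalignment between real and synthetic data, steering the model closer to the real data distribution. The method is implemented as a simple post-hoc merge that requires no new real data, works with fewer than 1,000 synthetic samples, and costs less than 1% additional training compute. The authors demonstrate Neon's generality across architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, FFHQ). On ImageNet 256x256 in particular, Neon improves the FID of the xAR-L model to 1.02 with only 0.36% additional training compute.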
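The post-hoc merge can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the merge is a linear extrapolation theta_neon = theta_base - w * (theta_self - theta_base), where theta_self are the weights after fine-tuning on self-generated data and `w` is a hypothetical extrapolation weight. The helper `self_train_gaussian` is a hypothetical toy standing in for a real generator: a 1-D Gaussian N(mu, sigma^2) sampled at a temperature below 1, which mimics a mode-seeking sampler that shrinks diversity.

```python
from typing import Dict

Params = Dict[str, float]


def neon_merge(theta_base: Params, theta_self: Params, w: float) -> Params:
    """Negative extrapolation: reverse the self-training update, scaled by w.

    Assumed merge form (a sketch, not the authors' implementation):
        theta_neon = theta_base - w * (theta_self - theta_base)
    """
    return {k: theta_base[k] - w * (theta_self[k] - theta_base[k]) for k in theta_base}


def self_train_gaussian(theta: Params, temperature: float, lr: float) -> Params:
    """Hypothetical toy 'self-training' of a 1-D Gaussian model N(mu, sigma^2).

    The model samples with a mode-seeking sampler: temperature < 1 shrinks the
    spread to temperature * sigma. Fitting those samples (infinite-sample limit)
    yields mu and temperature * sigma; we take one step of size lr toward that fit.
    """
    target = {"mu": theta["mu"], "sigma": temperature * theta["sigma"]}
    return {k: theta[k] + lr * (target[k] - theta[k]) for k in theta}


if __name__ == "__main__":
    # Assumption: real data is N(0, 1) and the base model is slightly under-dispersed.
    base = {"mu": 0.0, "sigma": 0.9}
    degraded = self_train_gaussian(base, temperature=0.8, lr=0.5)
    neon = neon_merge(base, degraded, w=1.0)
    print("self-trained:", degraded)  # sigma shrinks: diversity collapses further
    print("neon merged: ", neon)      # sigma moves back up, toward the real value 1.0
```

In this toy, self-training shrinks sigma from 0.9 to about 0.81, while extrapolating with w = 1.0 moves it to about 0.99, closer to the assumed real value of 1.0. A real model would merge full weight tensors rather than two scalars, and w would need to be tuned; the sketch only shows the direction of the update.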

Takeaways, Limitations

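• Takeaways:

◦ Presents a new learning method that overcomes the model collapse caused by self-training.

◦ Applicable to a wide range of generative AI models and datasets.

◦ Delivers effective performance gains with a small amount of synthetic data and little additional compute.

◦ Achieves a new state of the art (SOTA) on ImageNet 256x256.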

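• Limitations:

◦ Further research is needed on the generalizability and long-term effects of the proposed method.

◦ Neon may depend on the model collapse phenomenon itself, rather than being a way to overcome or prevent it.

◦ Additional experiments on specific architectures and datasets are needed.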
