Daily Arxiv

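This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.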

Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning

Sea-ing Through Scattered Rays: Revisiting the Image Formation Model for Realistic Underwater Image Generation

DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting

MeanFlowSE: one-step generative speech enhancement via conditional mean flow

Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support

Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework

AToken: A Unified Tokenizer for Vision

TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning

Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training

Hardness, Structural Knowledge, and Opportunity: An Analytical Framework for Modular Performance Modeling

Benchmark of stylistic variation in LLM-generated texts

SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints

Structure Matters: Brain Graph Augmentation via Learnable Edge Masking for Data-efficient Psychiatric Diagnosis

DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge

Riemannian Batch Normalization: A Gyro Approach

On the Security of Tool-Invocation Prompts for LLM-Based Agentic Systems: An Empirical Risk Assessment

MIDOG 2025: Mitotic Figure Detection with Attention-Guided False Positive Correction

Do Retrieval Augmented Language Models Know When They Don't Know?

LongCat-Flash Technical Report

MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning

OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages

Subjective Behaviors and Preferences in LLM: Language of Browsing

Using Natural Language for Human-Robot Collaboration in the Real World

RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding

Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models

VLA-Mark: A cross modal watermark for large vision-language alignment model

Deformable Dynamic Convolution for Accurate yet Efficient Spatio-Temporal Traffic Prediction

Deep Reinforcement Learning with Gradient Eligibility Traces

Generating Moving 3D Soundscapes with Latent Diffusion Models

Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

Discrete Diffusion in Large Language and Multimodal Models: A Survey

DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models

Algorithmic Fairness: Not a Purely Technical but Socio-Technical Property

OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization

Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward

AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition

LLMs Can Compensate for Deficiencies in Visual Representations

Spatial Understanding from Videos: Structured Prompts Meet Simulation Data

Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation

Cross-Attention Speculative Decoding

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert

SEMMA: A Semantic Aware Knowledge Graph Foundation Model

AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection

Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains

A Survey of Large Language Models for Data Challenges in Graphs

CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation

Creative Preference Optimization

MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language

Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning

Space Group Equivariant Crystal Diffusion

Schreier-Coset Graph Propagation

Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning

AttentionDrop: A Novel Regularization Method for Transformer Models

MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions

Hybrid Temporal Differential Consistency Autoencoder for Efficient and Sustainable Anomaly Detection in Cyber-Physical Systems

Who is Responsible When AI Fails? Mapping Causes, Entities, and Consequences of AI Privacy and Ethical Incidents

No Black Box Anymore: Demystifying Clinical Predictive Modeling with Temporal-Feature Cross Attention Mechanism

Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes - Insights from Urban Studies

MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling

Pruning the Paradox: How CLIP's Most Informative Heads Enhance Performance While Amplifying Bias

KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning

Sparsity May Be All You Need: Sparse Random Parameter Adaptation

Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data

"It Felt Like I Was Left in the Dark": Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models

A Layered Multi-Expert Framework for Long-Context Mental Health Assessments

Efficient Real-time Refinement of Language Model Text Generation

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Dynamic Neural Curiosity Enhances Learning Flexibility for Autonomous Goal Discovery

Bayesian Concept Bottleneck Models with LLM Priors

G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving

SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

DiRW: Path-Aware Digraph Learning for Heterophily

Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework

DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition

CrackSCF: Lightweight Cascaded Fusion Network for Robust and Efficient Structural Crack Segmentation

ConfReady: A RAG based Assistant and Dataset for Conference Checklist Responses

FOVAL: Calibration-Free and Subject-Invariant Fixation Depth Estimation Across Diverse Eye-Tracking Datasets

Assessing invariance to affine transformations in image quality metrics

The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing

Database-Augmented Query Representation for Information Retrieval

Two Is Better Than One: Aligned Representation Pairs for Anomaly Detection

BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation

Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models

Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter

Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions

Online Robust Planning under Model Uncertainty: A Sample-Based Approach

HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

Creative Preference Optimization

Created by

Haebom

Authors

Mete Ismayilzada, Antonio Laverghetta Jr., Simone A. Luchini, Reet Patel, Antoine Bosselut, Lonneke van der Plas, Roger Beaty

Summary

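This paper proposes Creative Preference Optimization (CrPO), a new method for improving the creative content generation of large language models (LLMs). Whereas existing methods focus on diversity or on specific tasks, CrPO integrates multiple dimensions of creativity, including novelty, diversity, surprise, and quality, into the preference optimization objective in a modular fashion. The authors train and evaluate several models with CrPO using MuCE, a large-scale human preference dataset containing more than 200,000 human-generated responses and ratings from more than 30 psychological creativity assessments. In both automated and human evaluations, the resulting models generate more novel, diverse, and surprising outputs than strong baselines, including GPT-4o, while maintaining high output quality. Additional evaluation on NoveltyBench further supports the generalizability of the approach. The results suggest that directly optimizing for creativity within a preference framework is a promising direction for improving the creative capabilities of LLMs without degrading output quality.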
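The summary does not give the exact form of the CrPO objective, so the following is only a rough sketch of how several creativity dimensions might be folded into a preference loss. It assumes a DPO-style pairwise loss scaled by a composite creativity weight. The function names, the weighting scheme, and the `beta` value are all hypothetical and are not taken from the paper.

```python
import math


def creativity_weight(scores, weights):
    """Combine per-dimension creativity scores (assumed to lie in [0, 1])
    into one scalar by a weighted average. Dimension names are illustrative."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


def weighted_dpo_loss(logp_chosen, logp_rejected,
                      ref_logp_chosen, ref_logp_rejected,
                      weight, beta=0.1):
    """DPO-style pairwise loss for one preference pair, scaled by `weight`.

    The log-probabilities are sequence-level values under the policy and
    under a frozen reference model. The scaling by a creativity weight is
    an assumption made for illustration only.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); computed stably.
    nll = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return weight * nll


if __name__ == "__main__":
    # Hypothetical scores for the preferred response of one pair.
    scores = {"novelty": 0.8, "diversity": 0.6, "surprise": 0.7, "quality": 0.9}
    # Equal weights; changing them is what makes the objective "modular" here.
    weights = {"novelty": 1.0, "diversity": 1.0, "surprise": 1.0, "quality": 1.0}
    w = creativity_weight(scores, weights)
    print(weighted_dpo_loss(-12.0, -15.0, -13.0, -14.0, weight=w))
```

In this sketch, setting a dimension's weight to zero removes it from the objective, which is one simple way to read the "modular" description in the summary.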

Takeaways, Limitations

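• Takeaways:

◦ Presents CrPO, a new method for improving the creativity of LLMs.

◦ Adopts a modular approach that accounts for multiple dimensions of creativity.

◦ Makes use of MuCE, a large-scale human preference dataset.

◦ Demonstrates better creative content generation than existing models.

◦ Shows that creativity can be improved without degrading output quality.

◦ Confirms the generalizability of the approach on NoveltyBench.

• Limitations:

◦ The composition and scope of the MuCE dataset are not described in detail.

◦ The computational cost and scalability of CrPO are not analyzed.

◦ Further research is needed on generalization to different types of LLMs.

◦ The subjective aspects of defining and measuring "creativity" are not discussed.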
