Daily Arxiv
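This page collects papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of the papers belongs to their authors and affiliated institutions; please cite the source when sharing.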
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Sea-ing Through Scattered Rays: Revisiting the Image Formation Model for Realistic Underwater Image Generation
DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting
MeanFlowSE: one-step generative speech enhancement via conditional mean flow
Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support
Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework
AToken: A Unified Tokenizer for Vision
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
Hardness, Structural Knowledge, and Opportunity: An Analytical Framework for Modular Performance Modeling
Benchmark of stylistic variation in LLM-generated texts
SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints
Structure Matters: Brain Graph Augmentation via Learnable Edge Masking for Data-efficient Psychiatric Diagnosis
DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge
Riemannian Batch Normalization: A Gyro Approach
On the Security of Tool-Invocation Prompts for LLM-Based Agentic Systems: An Empirical Risk Assessment
MIDOG 2025: Mitotic Figure Detection with Attention-Guided False Positive Correction
Do Retrieval Augmented Language Models Know When They Don't Know?
LongCat-Flash Technical Report
MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages
Subjective Behaviors and Preferences in LLM: Language of Browsing
Using Natural Language for Human-Robot Collaboration in the Real World
RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding
Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models
VLA-Mark: A cross modal watermark for large vision-language alignment model
Deformable Dynamic Convolution for Accurate yet Efficient Spatio-Temporal Traffic Prediction
Deep Reinforcement Learning with Gradient Eligibility Traces
Generating Moving 3D Soundscapes with Latent Diffusion Models
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
Discrete Diffusion in Large Language and Multimodal Models: A Survey
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
Algorithmic Fairness: Not a Purely Technical but Socio-Technical Property
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
LLMs Can Compensate for Deficiencies in Visual Representations
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data
Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation
Cross-Attention Speculative Decoding
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert
SEMMA: A Semantic Aware Knowledge Graph Foundation Model
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
A Survey of Large Language Models for Data Challenges in Graphs
CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation
Creative Preference Optimization
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning
Space Group Equivariant Crystal Diffusion
Schreier-Coset Graph Propagation
Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
AttentionDrop: A Novel Regularization Method for Transformer Models
MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions
Hybrid Temporal Differential Consistency Autoencoder for Efficient and Sustainable Anomaly Detection in Cyber-Physical Systems
Who is Responsible When AI Fails? Mapping Causes, Entities, and Consequences of AI Privacy and Ethical Incidents
No Black Box Anymore: Demystifying Clinical Predictive Modeling with Temporal-Feature Cross Attention Mechanism
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes - Insights from Urban Studies
MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
Pruning the Paradox: How CLIP's Most Informative Heads Enhance Performance While Amplifying Bias
KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
"It Felt Like I Was Left in the Dark": Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models
A Layered Multi-Expert Framework for Long-Context Mental Health Assessments
Efficient Real-time Refinement of Language Model Text Generation
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Dynamic Neural Curiosity Enhances Learning Flexibility for Autonomous Goal Discovery
Bayesian Concept Bottleneck Models with LLM Priors
G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
DiRW: Path-Aware Digraph Learning for Heterophily
Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework
DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition
CrackSCF: Lightweight Cascaded Fusion Network for Robust and Efficient Structural Crack Segmentation
ConfReady: A RAG based Assistant and Dataset for Conference Checklist Responses
FOVAL: Calibration-Free and Subject-Invariant Fixation Depth Estimation Across Diverse Eye-Tracking Datasets
Assessing invariance to affine transformations in image quality metrics
The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing
Database-Augmented Query Representation for Information Retrieval
Two Is Better Than One: Aligned Representation Pairs for Anomaly Detection
BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation
Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models
Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter
Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Online Robust Planning under Model Uncertainty: A Sample-Based Approach
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Assessing invariance to affine transformations in image quality metrics
Created by Haebom
Authors
Nuria Alabau-Bosque, Paula Daudén-Oliver, Jorge Vila-Tomás, Valero Laparra, Jesús Malo
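Summary
This paper points out the limitations of evaluating image quality metrics against human subjective ratings of digital-media distortions, and proposes a new evaluation methodology that accounts for invariance to affine transformations (rotation, translation, scaling, and changes in spectral illumination), which better reflect the image changes that occur in natural environments. Conventional approaches concentrate on digital distortions and overlook the fact that humans are invariant to affine transformations. The proposed methodology has two components: (1) determining visibility thresholds in a subjective representation common to all metrics, and (2) transforming each metric's distance values into this common representation. The thresholds in the common representation are determined with accurate psychophysics, and the transformation scheme can be applied easily to any metric. Experiments show that existing metrics do not exhibit human-like invisibility thresholds. The data and code are publicly available.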
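The two components described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the common scale, the calibration pairs, the piecewise-linear mapping, RMSE as the metric under test, and the one-pixel shift used as a stand-in for a human visibility threshold are all hypothetical assumptions chosen for illustration, as are the function names `make_common_map`, `affine_warp`, and `threshold_ratio`. The idea is to measure the metric's response to a transformation of just-visible magnitude, map that distance into the common representation, and compare it with the visibility threshold there.

```python
import math
from bisect import bisect_left


def make_common_map(calibration):
    """Hypothetical step (2): a monotone, piecewise-linear map from a metric
    distance to a shared 'common' scale, built from (distance, value) pairs."""
    pts = sorted(calibration)
    xs = [d for d, _ in pts]
    ys = [v for _, v in pts]

    def to_common(d):
        if d <= xs[0]:
            return ys[0]
        if d >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, d)  # guarantees xs[i - 1] < d <= xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (d - x0) / (x1 - x0)

    return to_common


def affine_warp(img, angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Rotate and scale a grayscale image (list of rows) about its centre, then
    translate it, using nearest-neighbour inverse mapping; samples that fall
    outside the source image are set to 0."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation and scaling, then apply the inverse rotation.
            dx, dy = (x - cx - tx) / scale, (y - cy - ty) / scale
            sx = cos_a * dx + sin_a * dy + cx
            sy = -sin_a * dx + cos_a * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def rmse(a, b):
    """Root-mean-square error, used here as a stand-in image quality metric."""
    n = sum(len(row) for row in a)
    sq = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return math.sqrt(sq / n)


def threshold_ratio(metric, to_common, img, common_threshold, **warp):
    """Metric response to a just-visible transformation, mapped to the common
    scale and divided by the common visibility threshold (near 1 = human-like)."""
    d = metric(img, affine_warp(img, **warp))
    return to_common(d) / common_threshold


# Hypothetical inputs: a 16x16 horizontal ramp, made-up calibration pairs, a
# common threshold of 1.0, and a one-pixel shift assumed to be just visible.
img = [[x / 15 for x in range(16)] for _ in range(16)]
to_common = make_common_map([(0.0, 0.0), (0.05, 1.0), (0.2, 3.0), (1.0, 5.0)])
ratio = threshold_ratio(rmse, to_common, img, common_threshold=1.0, tx=1.0)
print(f"ratio at the assumed threshold: {ratio:.3f}")
```

Under these made-up inputs, a ratio well above 1 would suggest the metric reacts to a change humans cannot yet see, and a ratio well below 1 the opposite. Only the geometric transformations are covered; spectral illumination changes are left out of this sketch.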
Takeaways, Limitations
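• Takeaways:
◦ Identifies the limitations of existing approaches to evaluating image quality metrics and presents a new evaluation methodology that accounts for invariance to affine transformations.
◦ Points toward more accurate image quality assessment that accounts for properties of human visual perception (invariance and invisibility thresholds).
◦ The proposed methodology can be applied to a wide range of image quality metrics and can be verified with the publicly released data and code.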
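• Limitations:
◦ Further research is needed on the scope and effectiveness of the proposed methodology for general image quality assessment.
◦ The robustness of the methodology to combinations of different distortion types and affine transformations needs further validation.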
Made with Slashpage