Daily Arxiv

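This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.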

Arbitrary Precision Printed Ternary Neural Networks with Holistic Evolutionary Approximation

Invited Paper: Feature-to-Classifier Co-Design for Mixed-Signal Smart Flexible Wearables for Healthcare at the Extreme Edge

Robustness is Important: Limitations of LLMs for Data Fitting

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

CE-RS-SBCIT A Novel Channel Enhanced Hybrid CNN Transformer with Residual, Spatial, and Boundary-Aware Learning for Brain Tumor MRI Analysis

PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science

THEME: Enhancing Thematic Investing with Semantic Stock Representations and Temporal Dynamics

Trust but Verify! A Survey on Verification Design for Test-time Scaling

Quantized Neural Networks for Microcontrollers: A Comprehensive Review of Methods, Platforms, and Applications

Documenting Deployment with Fabric: A Repository of Real-World AI Governance

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Region-Level Context-Aware Multimodal Understanding

ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism

Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention

Adaptive Duration Model for Text Speech Alignment

SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs

Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback

Dually Hierarchical Drift Adaptation for Online Configuration Performance Learning

Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement

Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models

Scientifically-Interpretable Reasoning Network (ScIReN): Discovering Hidden Relationships in the Carbon Cycle and Beyond

A Hybrid Artificial Intelligence Method for Estimating Flicker in Power Systems

Beyond Frequency: The Role of Redundancy in Large Language Model Memorization

TrueGL: A Truthful, Reliable, and Unified Engine for Grounded Learning in Full-Stack Search

Unified Path Planner with Adaptive Safety and Optimality

FedSEA-LLaMA: A Secure, Efficient and Adaptive Federated Splitting Framework for Large Language Models

WebInject: Prompt Injection Attack to Web Agents

Towards Embodiment Scaling Laws in Robot Locomotion

SPIN-ODE: Stiff Physics-Informed Neural ODE for Chemical Reaction Rate Estimation

DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation

Latent Adaptive Planner for Dynamic Manipulation

MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness

SAGA: A Security Architecture for Governing AI Agentic Systems

Towards Understanding Camera Motions in Any Video

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

DeepTrans: Deep Reasoning Translation via Reinforcement Learning

A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Disease Detection from Retinal Fundus Images

Decentralized Domain Generalization with Style Sharing: Formal Model and Convergence Analysis

FROG: Fair Removal on Graphs

DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis

LLM Test Generation via Iterative Hybrid Program Analysis

Toxicity Begets Toxicity: Unraveling Conversational Chains in Political Podcasts

Retrieval-Augmented Machine Translation with Unstructured Knowledge

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning

RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis

Categorical Data Clustering via Value Order Estimated Distance Metric Learning

Guiding a diffusion model using sliding windows

A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement

Mamba State-Space Models Are Lyapunov-Stable Learners

Alice's Adventures in a Differentiable Wonderland - Volume I, A Tour of the Land

COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty

Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation

What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge

QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges

AI Simulation by Digital Twins: Systematic Survey, Reference Framework, and Mapping to a Standardized Architecture

Compression versus Accuracy: A Hierarchy of Lifted Models

TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness

Transforming Wearable Data into Personal Health Insights using Large Language Model Agents

Policy Expansion for Bridging Offline-to-Online Reinforcement Learning

The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning

DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers

TMUAD: Enhancing Logical Capabilities in Unified Anomaly Detection Models with a Text Memory Bank

MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction

Going over Fine Web with a Fine-Tooth Comb: Technical Report of Indexing Fine Web for Problematic Content Search and Retrieval

PiCSAR: Probabilistic Confidence Selection And Ranking

Benchmarking GPT-5 in Radiation Oncology: Measurable Gains, but Persistent Need for Expert Oversight

Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering

Reasoning-Intensive Regression

Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL

Developer Insights into Designing AI-Based Computer Perception Tools

CAD2DMD-SET: Synthetic Generation Tool of Digital Measurement Device CAD Model Datasets for fine-tuning Large Vision-Language Models

OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

Harnessing IoT and Generative AI for Weather-Adaptive Learning in Climate Resilience Education

QZhou-Embedding Technical Report

Physics-Informed Spectral Modeling for Hyperspectral Imaging

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning

A Survey on Current Trends and Recent Advances in Text Anonymization

NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration

Limitations of Physics-Informed Neural Networks: a Study on Smart Grid Surrogation

EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting

What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems

Complete Gaussian Splats from a Single Image with Denoising Diffusion Models

On the Hardness of Learning GNN-based SAT Solvers: The Role of Graph Ricci Curvature

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning

HSFN: Hierarchical Selection for Fake News Detection building Heterogeneous Ensemble

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards

Controllable 3D Molecular Generation for Structure-Based Drug Design Through Bayesian Flow Networks and Gradient Integration

Diffusion-based Multi-modal Synergy Interest Network for Click-through Rate Prediction

MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation

The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models

Benchmarking the State of Networks with a Low-Cost Method Based on Reservoir Computing

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction

EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting

Created by

Haebom

Authors

Yujin Park, Haejun Chung, Ikbeom Jang

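Abstract

In subjective or difficult annotation tasks, pairwise comparison is often preferred over absolute rating or ordinal classification because it yields more reliable labels. Exhaustive pairwise comparison, however, requires a very large number of annotations (O(n^2)). Recent work has greatly reduced this burden (to O(n log n)) by actively sampling pairwise comparisons with a sorting algorithm. This paper improves annotation efficiency further by (1) roughly pre-ordering items with the Contrastive Language-Image Pre-training (CLIP) model, applied hierarchically and without any training, and (2) replacing easy, obvious human comparisons with automated ones. The proposed EZ-Sort first produces a CLIP-based zero-shot pre-ordering, then initializes bucket-aware Elo scores, and finally runs an uncertainty-guided human-in-the-loop MergeSort. The method was validated on several datasets: FGNET (face age estimation), DHCI (historical image chronology), and EyePACS (retinal image quality assessment). EZ-Sort reduced human annotation cost by 90.5% compared with exhaustive pairwise comparison and by 19.8% compared with prior work (at n = 100), while maintaining or improving inter-rater reliability. These results show that combining CLIP-based priors with uncertainty-aware sampling yields an efficient and scalable solution for pairwise ranking.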
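The three-stage idea in the abstract (prior ordering, Elo initialization, uncertainty-guided merge sort) can be illustrated with a minimal sketch. This is not the authors' implementation: the bucket ratings (1200/1500/1800), the `margin` threshold, the `ask_human` callback, and the toy age data are all hypothetical, and the CLIP step is replaced by hand-assigned buckets. Only the Elo expected-score formula is the standard one.

```python
from typing import Callable, Dict, List, Tuple


def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of a against b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def hybrid_merge_sort(
    items: List[str],
    ratings: Dict[str, float],
    ask_human: Callable[[str, str], bool],
    margin: float = 0.25,  # hypothetical confidence threshold, not from the paper
) -> Tuple[List[str], int]:
    """Sort ascending by rating; return (sorted_items, human_query_count).

    ask_human(a, b) must return True when a should come before b.
    """
    human_queries = 0

    def comes_before(a: str, b: str) -> bool:
        nonlocal human_queries
        # Probability that b outranks a, i.e. that a sorts first.
        p_a_first = elo_win_probability(ratings[b], ratings[a])
        if abs(p_a_first - 0.5) >= margin:
            return p_a_first > 0.5  # confident: decide automatically
        human_queries += 1
        return ask_human(a, b)  # uncertain: defer to the annotator

    def sort(seq: List[str]) -> List[str]:
        if len(seq) <= 1:
            return list(seq)
        mid = len(seq) // 2
        left, right = sort(seq[:mid]), sort(seq[mid:])
        merged: List[str] = []
        i = j = 0
        while i < len(left) and j < len(right):
            if comes_before(left[i], right[j]):
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(items), human_queries


# Toy usage: true ages stand in for the human annotator,
# and three coarse buckets stand in for a zero-shot prior.
true_age = {"a": 5, "b": 7, "c": 30, "d": 33, "e": 70, "f": 72}
bucket_rating = {"a": 1200, "b": 1200, "c": 1500, "d": 1500, "e": 1800, "f": 1800}
order, queries = hybrid_merge_sort(
    ["e", "a", "d", "b", "f", "c"],
    bucket_rating,
    ask_human=lambda x, y: true_age[x] < true_age[y],
)
print(order, queries)
```

In this toy setup, comparisons across buckets are resolved from the ratings alone, and only pairs within the same bucket reach the annotator. A fuller version would also update the Elo scores after each human answer; that step is omitted here.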

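Takeaways, Limitations

• Takeaways:
  ◦ Combining CLIP-based zero-shot pre-ordering with uncertainty-based sampling substantially improves the efficiency of pairwise comparison annotation.
  ◦ Human annotation cost is reduced by 90.5% relative to exhaustive pairwise comparison and by 19.8% relative to prior work (at n = 100).
  ◦ Efficiency improves while inter-rater reliability is maintained or improved.
  ◦ Validation on several datasets (FGNET, DHCI, EyePACS) supports the generalization of the proposed method.

• Limitations:
  ◦ The method depends on the CLIP model, so limitations of CLIP may affect EZ-Sort's performance.
  ◦ The accuracy of automated comparisons is not always guaranteed, so errors are possible.
  ◦ The method may be tuned to particular kinds of datasets, and performance may degrade on others.
  ◦ Further study of scalability to large datasets may be needed.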
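To put the reported 90.5% reduction in absolute terms, the short calculation below works through the n = 100 case. It assumes that "exhaustive pairwise comparison" means judging each unordered pair exactly once, n(n-1)/2, which the summary does not state explicitly. The n log2 n value is only an order-of-magnitude reference with constants ignored. The 19.8% figure cannot be converted the same way because the prior work's comparison count is not given here.

```python
import math


def exhaustive_pairs(n: int) -> int:
    """Number of unordered pairs among n items (assumed baseline)."""
    return n * (n - 1) // 2


n = 100
baseline = exhaustive_pairs(n)
sorting_scale = n * math.log2(n)        # scale of sorting-based sampling
remaining = baseline * (1 - 0.905)      # after the reported 90.5% reduction

print(baseline)              # 4950
print(round(sorting_scale))  # 664
print(round(remaining))      # 470
```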
