Daily Arxiv

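This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.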

Emotions as Ambiguity-aware Ordinal Representations

From Tabula Rasa to Emergent Abilities: Discovering Robot Skills via Real-World Unsupervised Quality-Diversity

Enhancing Model Privacy in Federated Learning with Random Masking and Quantization

Scaling Laws for Task-Stratified Knowledge in Post-Training Quantized Large Language Models

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

Vocoder-Projected Feature Discriminator

ControlEchoSynth: Boosting Ejection Fraction Estimation Models via Controlled Video Diffusion

Explain Before You Answer: A Survey on Compositional Visual Reasoning

Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives

GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

Input-Time Scaling

LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models

StreetViewAI: Making Street View Accessible Using Context-Aware Multimodal AI

Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs

From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Human-Centered Human-AI Interaction (HC-HAII): A Human-Centered AI Perspective

GTPO: Trajectory-Based Policy Optimization in Large Language Models

Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery

A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics

Invisible Architectures of Thought: Toward a New Science of AI as Cognitive Infrastructure

Revisiting Pre-trained Language Models for Vulnerability Detection

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Scaling Decentralized Learning with FLock

SegQuant: A Semantics-Aware and Generalizable Quantization Framework for Diffusion Models

Apple Intelligence Foundation Language Models: Tech Report 2025

Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning

PyVision: Agentic Vision with Dynamic Tooling

DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust

MEraser: An Effective Fingerprint Erasure Approach for Large Language Models

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

Pseudo-Simulation for Autonomous Driving

BinConv: A Neural Architecture for Ordinal Encoding in Time-Series Forecasting

FaceEditTalker: Controllable Talking Head Generation with Facial Attribute Editing

EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents

X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real

Heat Diffusion Models - Interpixel Attention Mechanism

Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation

Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts

Pricing AI Model Accuracy

Evaluating the Fitness of Ontologies for the Task of Question Generation

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation

PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis

Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models

An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

Efficient PINNs via Multi-Head Unimodular Regularization of the Solutions Space

Statistical learning does not always entail knowledge

Score-based Generative Diffusion Models for Social Recommendations

PromptKeeper: Safeguarding System Prompts for LLMs

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Understanding Fairness-Accuracy Trade-offs in Machine Learning Models: Does Promoting Fairness Undermine Performance?

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning

Training with Explanations Alone: A New Paradigm to Prevent Shortcut Learning

Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints

TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes

HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models

StepWiser: Stepwise Generative Judges for Wiser Reasoning

AniME: Adaptive Multi-Agent Planning for Long Animation Generation

AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance

AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots

Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science

General agents contain world models

Approximate Lifted Model Construction

Fitness Landscape of Large Language Model-Assisted Automated Algorithm Search

Synthesizing High-Quality Programming Tasks with LLM-based Expert and Student Agents

Preference Elicitation for Multi-objective Combinatorial Optimization with Active Learning and Maximum Likelihood Estimation

Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents

Demonstrating specification gaming in reasoning models

AirRAG: Autonomous Strategic Planning and Reasoning Steer Retrieval Augmented Generation

Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

From Evidence to Decision: Exploring Evaluative AI

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning

Patch Progression Masked Autoencoder with Fusion CNN Network for Classifying Evolution Between Two Pairs of 2D OCT Slices

DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

Large Language Models (LLMs) for Electronic Design Automation (EDA)

Symphony: A Decentralized Multi-Agent Framework for Scalable Collective Intelligence

HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and Cooling

Decomposing Behavioral Phase Transitions in LLMs: Order Parameters for Emergent Misalignment

Cross-Platform E-Commerce Product Categorization and Recategorization: A Multimodal Hierarchical Classification Approach

Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation

MathBuddy: A Multimodal System for Affective Math Tutoring

Diffusion Language Models Know the Answer Before Decoding

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation

WaveHiT-SR: Hierarchical Wavelet Network for Efficient Image Super-Resolution

The Next Layer: Augmenting Foundation Models with Structure-Preserving and Attention-Guided Learning for Local Patches to Global Context Awareness in Computational Pathology

Logical Reasoning with Outcome Reward Models for Test-Time Scaling

The Information Dynamics of Generative Diffusion

AI-Powered Detection of Inappropriate Language in Medical School Curricula

Generative AI for Testing of Autonomous Driving Systems: A Survey

Multispectral LiDAR data for extracting tree points in urban and suburban areas

Input-Time Scaling

Created by

Haebom

Authors

Rapheal Huang (Yuming), Weilong Guo

Abstract

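This paper presents Input-Time Scaling, a new scaling paradigm that complements the existing ways of scaling large language models (LLMs): scaling data and training, and scaling inference time. The method uses meta-knowledge to refine inputs with a variety of strategies. The authors also report a phenomenon they call "training-testing co-design", in which the strategies are applied during both training and testing. Interestingly, lower-quality datasets can yield better performance, and the best performance was reached with 1,000 randomly selected examples, which runs counter to the common "garbage in, garbage out" assumption. Training on higher-quality data does not always improve performance, which is also consistent with the "Less is More" phenomenon, where as few as 1,000 examples are enough to elicit high-level reasoning ability. In experiments with Qwen2.5-32B-Instruct, the method achieved state-of-the-art performance (76.7%) on AIME24 and AIME25, and a majority vote over three models reached 80% on AIME25. With DeepSeek-R1-Distill-Qwen-32B, it achieved 86.7% on AIME24 and 76.7% on AIME25. The authors plan to open-source the datasets, data pipelines, evaluation results, and checkpoints.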
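The majority vote over three models mentioned above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `majority_vote` and `ensemble_accuracy`, the string answer format, and the tie-breaking rule (falling back to the earliest-listed model when no answer has a majority) are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the models' answers.

    Tie-breaking is an assumption of this sketch: when several answers
    share the top count, the one given by the earliest-listed model wins.
    """
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


def ensemble_accuracy(per_model_answers, gold):
    """Fraction of problems where the majority-voted answer matches gold.

    per_model_answers: one list of answers per model, aligned with `gold`.
    """
    n_correct = 0
    for i, gold_answer in enumerate(gold):
        votes = [model_answers[i] for model_answers in per_model_answers]
        if majority_vote(votes) == gold_answer:
            n_correct += 1
    return n_correct / len(gold)
```

For example, `majority_vote(["204", "204", "113"])` selects `"204"`, since two of the three hypothetical models agree on it.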

Takeaways, Limitations

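• Takeaways:

◦ Presents Input-Time Scaling, a new paradigm that complements the existing scaling of data and training and the scaling of inference time

◦ Identifies the importance of training-testing co-design

◦ Finds that lower-quality datasets can outperform higher-quality ones (a counterexample to "garbage in, garbage out")

◦ Confirms consistency with the "Less is More" phenomenon (high-level reasoning is possible even with a small amount of data)

◦ Achieves SOTA performance on AIME24 and AIME25

◦ Plans to open-source the datasets, code, and related artifacts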

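• Limitations:

◦ Results are currently reported only for specific models (Qwen2.5-32B-Instruct, DeepSeek-R1-Distill-Qwen-32B); further research on generalizability is needed

◦ Whether the effect of Input-Time Scaling carries over to all LLMs needs further verification

◦ The specific mechanism behind training-testing co-design needs further analysis

◦ The open-source release is not yet complete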
