Daily Arxiv
A page that collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.
CEHR-XGPT: A Scalable Multi-Task Foundation Model for Electronic Health Records
Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
Adaptive Learning Strategies for Mitotic Figure Classification in MIDOG2025 Challenge
MitoDetect++: A Domain-Robust Pipeline for Mitosis Detection and Atypical Subtyping
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
Fantastic Pretraining Optimizers and Where to Find Them
Towards Interpretable Geo-localization: a Concept-Aware Global Image-GPS Alignment Framework
TECP: Token-Entropy Conformal Prediction for LLMs
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management
Train-Once Plan-Anywhere Kinodynamic Motion Planning via Diffusion Trees
Skill-Aligned Fairness in Multi-Agent Learning for Collaboration in Healthcare
Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection
HuggingGraph: Understanding the Supply Chain of LLM Ecosystem
Food safety trends across Europe: insights from the 392-million-entry CompreHensive European Food Safety (CHEFS) database
Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification
BayesSDF: Surface-Based Laplacian Uncertainty Estimation for 3D Geometry with Neural Signed Distance Fields
Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework
The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
AI-Assisted Rapid Crystal Structure Generation Towards a Target Local Environment
First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
Cutting Through Privacy: A Hyperplane-Based Data Reconstruction Attack in Federated Learning
AutoPDL: Automatic Prompt Optimization for LLM Agents
RailGoerl24: Görlitz Rail Test Center CV Dataset 2024
Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model
PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
Spoof Trace Discovery for Deep Learning Based Explainable Face Anti-Spoofing
The Information Security Awareness of Large Language Models
Automatically Detecting Online Deceptive Patterns
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Automated detection of underdiagnosed medical conditions via opportunistic imaging
Selective Preference Optimization via Token-Level Reward Function Estimation
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation
PersonaGym: Evaluating Persona Agents and LLMs
CFaults: Model-Based Diagnosis for Fault Localization in C Programs with Multiple Test Cases
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Demystifying Chains, Trees, and Graphs of Thoughts
Survival Analysis with Adversarial Regularization
Net2Brain: A Toolbox to compare artificial vision models with human brain responses
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Dynamic Speculative Agent Planning
AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
Graph RAG as Human Choice Model: Building a Data-Driven Mobility Agent with Preference Chain
MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design
Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning
Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Epistemic Skills: Reasoning about Knowledge and Oblivion
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
GUI Agents: A Survey
Neural Network Verification with PyRAT
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation
MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
SpikingBrain Technical Report: Spiking Brain-inspired Large Models
Scaling Performance of Large Language Model Pretraining
Recomposer: Event-roll-guided generative audio editing
COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization
Uncertain but Useful: Leveraging CNN Variability into Data Augmentation
CURE: Controlled Unlearning for Robust Embeddings - Mitigating Conceptual Shortcuts in Pre-Trained Language Models
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
RapidGNN: Energy and Communication-Efficient Distributed Training on Large-Scale Graph Neural Networks
Enhancing 3D Point Cloud Classification with ModelNet-R and Point-SkipNet
AI Agents for Web Testing: A Case Study in the Wild
Accuracy-Constrained CNN Pruning for Efficient and Reliable EEG-Based Seizure Detection
Exploring Situated Stabilities of a Rhythm Generation System through Variational Cross-Examination
GenAI-based test case generation and execution in SDV platform
ICR: Iterative Clarification and Rewriting for Conversational Search
ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions
Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization
Pointing-Guided Target Estimation via Transformer-Based Attention
Adversarial Augmentation and Active Sampling for Robust Cyber Anomaly Detection
LLM Enabled Multi-Agent System for 6G Networks: Framework and Method of Dual-Loop Edge-Terminal Collaboration
High-Resolution Global Land Surface Temperature Retrieval via a Coupled Mechanism-Machine Learning Framework
Exploring an implementation of quantum learning pipeline for support vector machines
DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation
Artificial intelligence for representing and characterizing quantum systems
PLaMo 2 Technical Report
SpiderNets: Estimating Fear Ratings of Spider-Related Images with Vision Models
The Paradox of Doom: Acknowledging Extinction Risk Reduces the Incentive to Prevent It
A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing
REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution
AI-Driven Fronthaul Link Compression in Wireless Communication Systems: Review and Method Design
Toward Accessible Dermatology: Skin Lesion Classification Using Deep Learning Models on Mobile-Acquired Images
Graph Unlearning: Efficient Node Removal in Graph Neural Networks
Enhancing Diversity in Large Language Models via Determinantal Point Processes
VARMA-Enhanced Transformer for Time Series Forecasting
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
Fantastic Pretraining Optimizers and Where to Find Them
Created by
Haebom
Authors
Kaiyue Wen, David Hall, Tengyu Ma, Percy Liang
Overview
This paper presents a systematic study of claims that alternative optimizers can accelerate large language model pretraining relative to AdamW. Arguing that prior comparisons were skewed by unfair hyperparameter tuning and narrow evaluation setups, the authors benchmark ten optimizers across four model sizes and data-to-model ratios. They show that a fair comparison requires rigorous hyperparameter tuning for every optimizer, together with end-of-training evaluation across a range of model sizes and data-to-model ratios. Under this protocol, the speedups claimed in earlier work turn out to be smaller in practice and to shrink as model size grows. In particular, the fastest optimizers, such as Muon and Soap, use matrices as preconditioners, yet their speedup over AdamW declines in inverse proportion to model scale.
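For context on what "matrices as preconditioners" means here, the sketch below shows a Muon-style update in NumPy: instead of AdamW's element-wise second-moment scaling, the momentum matrix is approximately orthogonalized with a Newton-Schulz iteration before being applied. This is an illustrative sketch, not the paper's implementation; the coefficients and hyperparameters are the commonly published Muon defaults and should be treated as assumptions.

import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Approximately map G onto the nearest semi-orthogonal matrix using the
    # quintic Newton-Schulz iteration popularized with Muon.
    a, b, c = 3.4445, -4.7750, 2.0315       # commonly cited quintic coefficients
    X = G / (np.linalg.norm(G) + eps)       # scale so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                          # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    # One Muon-style update for a 2-D weight matrix: accumulate momentum,
    # then apply the orthogonalized (matrix-preconditioned) direction in
    # place of AdamW's element-wise scaling.
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz_orthogonalize(momentum)
    return W, momentum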
Takeaways, Limitations
• Takeaways:
◦ Casts doubt on the reliability of prior claims of optimizer speedups for large language model pretraining.
◦ Presents a methodology for fair optimizer comparison: rigorous hyperparameter tuning and comprehensive end-of-training evaluation (see the sketch after this list).
◦ Confirms that the speedup of optimizers with matrix-based preconditioners diminishes as model size grows.
◦ Experimentally demonstrates that speedups over AdamW become marginal at larger model scales.
• Limitations:
◦ The optimizers, model sizes, and data-to-model ratios examined may be limited in scope.
◦ Further research is needed on generalization to other types of language models and tasks.
◦ A broader hyperparameter search may be needed for a more refined comparison.
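As a concrete reading of the fair-comparison protocol described above, the following sketch tunes each optimizer over its own hyperparameter grid and compares only end-of-training loss, never intermediate checkpoints. The helpers train_to_completion and final_eval_loss are hypothetical placeholders, not functions from the paper.

import itertools

def fair_compare(optimizers, grids, train_to_completion, final_eval_loss):
    # grids[name] maps each hyperparameter to the list of values to sweep.
    best = {}
    for name in optimizers:
        keys = list(grids[name])
        results = []
        for values in itertools.product(*(grids[name][k] for k in keys)):
            hp = dict(zip(keys, values))
            model = train_to_completion(name, hp)   # full budget, no early peeking
            results.append((final_eval_loss(model), hp))
        # Each optimizer is represented by its best end-of-training loss.
        best[name] = min(results, key=lambda t: t[0])
    return best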
View the PDF