Daily Arxiv

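This page collects and organizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing a summary, simply cite the source.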
This service is supported by Google Gemini.

How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Behavior-Consistent Deep Reinforcement Learning

Fine-grained Claim-level RAG Benchmark for Law

ProcBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration

Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces

Identifiable Token Correspondence for World Models

When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search

ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization

IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation

Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

Bridging Silicon and the Hippocampus: Algebro-Deterministic Memory "VaCoAl" as a Substrate for Vector-HaSH and TEM

Pelican-Unify 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

Holder Policy Optimisation

Quantifying Rodda and Graham Gait Classification from 3D Makerless Kinematics derived from a Single-view Video in a Heterogeneous Pediatric Clinical Cohort

Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization

Decoupling Endpoint and Semantic Transition Learning for Zero-Shot Composed Image Retrieval

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

Governed Metaprogramming for Intelligent Systems: Reclassifying Eval as a Governed Effect

On the Wasserstein Gradient Flow Interpretation of Drifting Models

EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation

Towards Open World Sound Event Detection

OptiLookUp: An Optical ROM-Based Lookup Table Engine for Photonic Accelerators

MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

Fair Dataset Distillation via Cross-Group Barycenter Alignment

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

Prototype-Grounded Concept Models for Verifiable Concept Alignment

TIP: Token Importance in On-Policy Distillation

Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

Black-Box Optimization From Small Offline Datasets via Meta Learning with Synthetic Tasks

Beyond LLMs, Sparse Distributed Memory, and Neuromorphics <A Hyper-Dimensional SRAM-CAM "VaCoAl" for Ultra-High Speed, Ultra-Low Power, and Low Cost>

Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation

Robust Reasoning Benchmark

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

Learning Without Losing Identity: Capability Evolution for Embodied Agents

Energy-based Tissue Manifolds for Longitudinal Multiparametric MRI Analysis

The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading

Rethinking Forward Processes for Score-Based Nonlinear Data Assimilation in High Dimensions

Rule-State Inference (RSI): A Bayesian Framework for Compliance Monitoring in Rule-Governed Domains

MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models

PenTiDef: Decentralized Federated Intrusion Detection System with Differential Privacy and Latent-Space Defense via Blockchain Coordination in IIoT

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

Transporting Task Vectors across Different Architectures without Training

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging

HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction

Billion-Scale Graph Foundation Models

FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU

Unifying Masked Diffusion Models with Various Generation Orders and Beyond

LiteCoOp: Lightweight Multi-LLM Shared-Tree Reasoning for Model-Serving Compiler Optimizations

VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

MonoScale: Scaling Multi-Agent System with Monotonic Improvement

StreetDesignAI: Broadening Designer Perspectives Through Multi-Persona Evaluation of Cycling Infrastructure

Training-Trajectory-Aware Token Selection

AutoBaxBuilder: Bootstrapping Code Security Benchmarking

Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation

CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding

Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models

Twice Sequential Monte Carlo for Tree Search

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

Atom-anchored LLMs speak Chemistry: A Retrosynthesis Demonstration

Event-Aware Prompt Learning for Dynamic Graphs

CacheClip: Accelerating RAG with Effective KV Cache Reuse

LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

A KL-regularization Framework for Learning to Plan with Adaptive Priors

Decision Potential Surface: A Theoretical and Practical Approximation of Large Language Model Decision Boundary

Go witheFlow: Real-time Emotion Driven Audio Effects Modulation

DecepChain: Inducing Deceptive Reasoning in Large Language Models

Exploring How Audio Effects Alter Emotion with Foundation Models

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

Memory-Efficient LLM Pretraining via Minimalist Optimizer Design

SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

Self-orthogonalizing attractor neural networks emerging from the free energy principle

Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models

When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output

Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior

FP4 All the Way: Fully Quantized Training of LLMs

Posted by

Haebom

Category

Empty

Authors

Brian Chmiel, Maxim Fishman, Ron Banner, Daniel Soudry

Overview

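This paper demonstrates, for the first time, fully quantized training (FQT) of large language models (LLMs) that uses predominantly 4-bit floating-point (FP4) precision for weights, activations, and gradients, on datasets of up to 200 billion tokens. The authors extensively investigate the key design choices for FP4, including block size, scaling format, and rounding method, and show that the NVFP4 format, in which each block of 16 FP4 values (E2M1) shares a scale represented in E4M3, gives the best results. To improve stability, they use stochastic rounding in the backward and update passes and round-to-nearest in the forward pass. They also identify a theoretical and empirical threshold for effective quantized training: once the gradient norm falls below approximately $\sqrt{3}$ times the quantization noise, quantized training becomes less effective. Building on these insights, they successfully train a 7-billion-parameter model on 256 Intel Gaudi2 accelerators. The resulting FP4-trained model achieves downstream task performance comparable to a standard BF16 baseline, confirming that FP4 training is a practical and highly efficient approach to large-scale LLM training. A reference implementation is available at https://github.com/Anonymous1252022/fp4-all-the-way.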
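To make the block format and the two rounding modes more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names (`round_to_grid`, `quantize_block`, `quantize`) are hypothetical, the shared scale is kept as an ordinary Python float rather than being encoded in E4M3, and the scale is chosen by simply mapping the largest magnitude in the block onto the largest E2M1 value, which is an assumption made for illustration.

```python
import bisect
import random

# Non-negative magnitudes representable in FP4 E2M1 (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # values per shared scale, as described in the summary


def round_to_grid(mag, stochastic, rng):
    """Round a non-negative magnitude to one of the E2M1 grid points."""
    mag = min(mag, E2M1_GRID[-1])  # clip to the largest representable value
    hi_idx = bisect.bisect_left(E2M1_GRID, mag)
    if E2M1_GRID[hi_idx] == mag:
        return mag  # already exactly representable
    lo, hi = E2M1_GRID[hi_idx - 1], E2M1_GRID[hi_idx]
    if stochastic:
        # Round up with probability proportional to the distance from lo,
        # so the rounded value equals mag in expectation.
        p_up = (mag - lo) / (hi - lo)
        return hi if rng.random() < p_up else lo
    # Round to nearest; ties go to the upper neighbor (an arbitrary choice here).
    return hi if (mag - lo) >= (hi - mag) else lo


def quantize_block(block, stochastic=False, rng=None):
    """Fake-quantize one block; returns (dequantized values, shared scale)."""
    if rng is None:
        rng = random.Random(0)
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block), 1.0
    # Assumption: the scale maps the block's largest magnitude onto 6.0.
    # It stays a Python float here instead of being stored in E4M3.
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        q = round_to_grid(abs(x) / scale, stochastic, rng)
        out.append((q if x >= 0 else -q) * scale)
    return out, scale


def quantize(values, stochastic=False, seed=0):
    """Fake-quantize a flat list in consecutive blocks of BLOCK_SIZE values."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        deq, _ = quantize_block(values[start:start + BLOCK_SIZE], stochastic, rng)
        out.extend(deq)
    return out
```

Under the scheme the summary describes, a forward-pass tensor would go through `quantize(x, stochastic=False)` and a gradient tensor through `quantize(g, stochastic=True)`. Stochastic rounding is unbiased in expectation, whereas round-to-nearest introduces no randomness, which is consistent with the summary's use of the former for the backward and update passes and the latter for the forward pass.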