Daily Arxiv

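This page curates artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.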

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

Created by Haebom

Authors

Jonathan Li, Nasim Farahini, Evgenii Iuliugin, Magnus Vesterlund, Christian Haggstrom, Guangtao Wang, Shubhangi Upasani, Ayush Sachdeva, Rui Li, Faline Fu, Chen Wu, Ayesha Siddiqua, John Long, Tuowen Zhao, Matheen Musaddiq, Hakan Zeffer, Yun Du, Mingran Wang, Qinghua Li, Bo Li, Urmish Thakker, Raghu Prabhakar

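Overview

The spread of large language models (LLMs) with more than 100 billion parameters and support for context lengths above 100k has increased the demand for on-chip memory to hold large KV caches. Techniques such as StreamingLLM and SnapKV show how to control KV cache size while maintaining model accuracy. However, these techniques are not commonly used in industrial deployments built on frameworks such as vLLM or SGLang. This paper explores the accuracy impact of such techniques on Llama-3.1-8B-Instruct and DeepSeek-R1 and develops SnapStream, a KV cache compression method that can be deployed at scale. The authors demonstrate SnapStream's effectiveness in a 16-way tensor-parallel deployment of DeepSeek-671B on SambaNova SN40L accelerators, running in a real production environment at a 128k context length and up to 1832 tokens per second. SnapStream improves on-chip memory usage by 4x and shows minimal accuracy degradation on LongBench-v2, AIME24, and LiveCodeBench.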
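To give a sense of the memory pressure described above, the following back-of-the-envelope sketch estimates the KV cache size for a single sequence. The formula (one key tensor and one value tensor per layer) is standard. The configuration values are those of the public Llama-3.1-8B model (32 layers, 8 KV heads, head dimension 128), stored in 16-bit precision. The function name `kv_cache_bytes` is hypothetical, and applying the 4x figure directly to the cache size is an illustrative assumption, not a result taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 1024 ** 3

# Assumed configuration: Llama-3.1-8B, 16-bit cache, 128k-token context.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128 * 1024)

print(f"full cache: {full / GIB:.1f} GiB")      # full cache: 16.0 GiB
print(f"4x smaller: {full / 4 / GIB:.1f} GiB")  # 4x smaller: 4.0 GiB
```

Even for an 8B-parameter model, one full-length sequence needs a cache of about the same size as the 16-bit weights, which is why bounding the cache matters for batched serving.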

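Implications and Limitations

• Implications:
  ◦ SnapStream is a KV cache compression technique that improves on-chip memory usage by 4x.
  ◦ It shows minimal accuracy degradation on LongBench-v2, AIME24, and LiveCodeBench.
  ◦ It is presented as the first implementation of sparse KV attention techniques in a production inference system that uses static graphs and continuous batching.
• Limitations:
  ◦ The paper does not state specific limitations.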
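For readers unfamiliar with the sparse KV attention techniques mentioned above, here is a minimal sketch of a StreamingLLM-style retention policy, which keeps a few initial "attention sink" tokens plus a sliding window of the most recent tokens. This is not SnapStream's algorithm, which the summary does not describe. The function name `kept_positions` and the default sizes are hypothetical, chosen only for illustration.

```python
def kept_positions(seq_len, num_sink=4, window=8):
    """Token positions retained by a sink-plus-recent-window policy."""
    # Short sequences fit entirely within the budget, so nothing is evicted.
    if seq_len <= num_sink + window:
        return list(range(seq_len))
    sinks = list(range(num_sink))                    # earliest tokens
    recent = list(range(seq_len - window, seq_len))  # most recent tokens
    return sinks + recent


print(kept_positions(20))
# [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Because the retained set never exceeds `num_sink + window` entries, the cache has a fixed upper bound on its size regardless of sequence length, a property that may be convenient for systems that rely on static shapes.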
