Daily Arxiv
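This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.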
Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges
From Roots to Rewards: Dynamic Tree Reasoning with RL
Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light
Instance space analysis of the capacitated vehicle routing problem
Multi-Agent LLMs as Ethics Advocates for AI-Based Systems
GATSim: Urban Mobility Simulation with Generative Agents
Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Strategic Reflectivism In Intelligent Systems
SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator
What the F*ck Is Artificial General Intelligence?
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems
UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis
Toward Temporal Causal Representation Learning with Tensor Decomposition
Kolmogorov Arnold Networks (KANs) for Imbalanced Data - An Empirical Perspective
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) Track
Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits
Edge Intelligence with Spiking Neural Networks
VLA-Mark: A cross modal watermark for large vision-language alignment model
Noradrenergic-inspired gain modulation attenuates the stability gap in joint training
A multi-strategy improved snake optimizer for 3-dimensional UAV path planning and engineering problems
Photonic Fabric Platform for AI Accelerators
OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
A segmented robot grasping perception neural network for edge AI
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation
Exploiting Primacy Effect To Improve Large Language Models
Generalist Forecasting with Frozen Video Models via Latent Diffusion
Convergent transformations of visual representation in brains and models
Preprint: Did I Just Browse A Website Written by LLMs?
The Levers of Political Persuasion with Conversational AI
Political Leaning and Politicalness Classification of Texts
Self-supervised learning on gene expression data
Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
Real-Time Fusion of Visual and Chart Data for Enhanced Maritime Vision
When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection
Scalable Submodular Policy Optimization via Pruned Submodularity Graph
RAG-based Architectures for Drug Side Effect Retrieval in LLMs
Team of One: Cracking Complex Video QA with Model Synergy
Food safety trends across Europe: insights from the 392-million-entry CompreHensive European Food Safety (CHEFS) database
One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion
Localized FNO for Spatiotemporal Hemodynamic Upsampling in Aneurysm MRI
Learning Spectral Diffusion Prior for Hyperspectral Image Reconstruction
Search-Optimized Quantization in Biomedical Ontology Alignment
SamGoG: A Sampling-Based Graph-of-Graphs Framework for Imbalanced Graph Classification
Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental Learning
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework
Point of Interest Recommendation: Pitfalls and Viable Solutions
Binarizing Physics-Inspired GNNs for Combinatorial Optimization
LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors
When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework
Improved particle swarm optimization algorithm: multi-target trajectory optimization for swarm drones
A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design
Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models
BreastSegNet: Multi-label Segmentation of Breast MRI
GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention
Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
Apple Intelligence Foundation Language Models: Tech Report 2025
Change of Thought: Adaptive Test-Time Computation
Time Series Forecastability Measures
Reading Between the Lines: Combining Pause Dynamics and Semantic Coherence for Automated Assessment of Thought Disorder
Loss-Complexity Landscape and Model Structure Functions
Acoustic Index: A Novel AI-Driven Parameter for Cardiac Disease Risk Stratification Using Echocardiography
Humans learn to prefer trustworthy AI over human partners
PHASE: Passive Human Activity Simulation Evaluation
AI-Assisted Fixes to Code Review Comments at Scale
Neural Architecture Search with Mixed Bio-inspired Learning Rules
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
Graph Neural Network Surrogates for Contacting Deformable Bodies with Necessary and Sufficient Contact Detection
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
CaSTFormer: Causal Spatio-Temporal Transformer for Driving Intention Prediction
Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity
AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery
Soft-ECM: An extension of Evidential C-Means for complex data
Single- to multi-fidelity history-dependent learning with uncertainty quantification and disentanglement: application to data-driven constitutive modeling
SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
Gauge Flow Models
Aligning Knowledge Graphs and Language Models for Factual Accuracy
Causal Language Control in Multilingual Transformers via Sparse Feature Steering
A Deep Learning-Based Ensemble System for Automated Shoulder Fracture Detection in Clinical Radiographs
IConMark: Robust Interpretable Concept-Based Watermark For AI Images
Mitigating Stylistic Biases of Machine Translation Systems via Monolingual Corpora Only
TopicImpact: Improving Customer Feedback Analysis with Opinion Units for Topic Modeling and Star-Rating Prediction
Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Persona-Based Synthetic Data Generation Using Multi-Stage Conditioning with Large Language Models for Emotion Recognition
Smart Routing for Multimodal Video Retrieval: When to Search What
Enhancing Breast Cancer Detection with Vision Transformers and Graph Neural Networks
Transformer-Based Framework for Motion Capture Denoising and Anomaly Detection in Medical Rehabilitation
High-Throughput LLM inference on Heterogeneous Clusters
Created by
Haebom
Authors
Yi Xiong, Jinqi Huang, Wenjie Huang, Xuebing Yu, Entong Li, Zhixiong Ning, Jinhua Zhou, Li Zeng, Xin Chen
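Abstract
This paper proposes a high-throughput inference system for serving large language models (LLMs) on heterogeneous clusters. First, the system models the available resources and the expected throughput, then uses exhaustive search to optimize the deployment configuration. Second, it introduces a request scheduling mechanism that fully accounts for the differing processing capabilities of the instances. In experiments, the proposed scheduler improved throughput on two heterogeneous clusters by 122.5% and 33.6%, respectively. The main challenges it addresses are that performance varies across the possible deployment configurations of a heterogeneous cluster, and that differences in per-instance processing capability make efficient request scheduling difficult.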
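To make the first step concrete, here is a minimal sketch of choosing a deployment configuration by exhaustive search over a modeled throughput table. It is not the paper's implementation: the inventory, the throughput estimates, and the names `INVENTORY`, `EST_THROUGHPUT`, `cluster_throughput`, and `exhaustive_search` are all hypothetical, and the model assumes each GPU type is split into equally sized instances.

```python
from itertools import product

# Hypothetical inventory: GPU type -> number of devices in the cluster.
INVENTORY = {"A": 4, "B": 8}

# Hypothetical throughput estimates (requests/s) for one instance,
# keyed by (GPU type, devices per instance). Numbers are illustrative only.
EST_THROUGHPUT = {
    ("A", 1): 10.0, ("A", 2): 22.0, ("A", 4): 40.0,
    ("B", 2): 6.0, ("B", 4): 14.0, ("B", 8): 26.0,
}


def candidate_sizes(gpu_type):
    """Instance sizes for which a throughput estimate exists."""
    return [size for (t, size) in EST_THROUGHPUT if t == gpu_type]


def cluster_throughput(config):
    """Sum the modeled throughput of every instance in a configuration.

    `config` maps each GPU type to the number of devices per instance.
    """
    total = 0.0
    for gpu_type, size in config.items():
        instances = INVENTORY[gpu_type] // size
        total += instances * EST_THROUGHPUT[(gpu_type, size)]
    return total


def exhaustive_search():
    """Evaluate every combination of instance sizes and keep the best one."""
    gpu_types = sorted(INVENTORY)
    best_config, best_tp = None, float("-inf")
    for sizes in product(*(candidate_sizes(t) for t in gpu_types)):
        config = dict(zip(gpu_types, sizes))
        tp = cluster_throughput(config)
        if tp > best_tp:
            best_config, best_tp = config, tp
    return best_config, best_tp


if __name__ == "__main__":
    print(exhaustive_search())
```

The loop visits every combination, so the number of evaluations is the product of the candidate counts per GPU type; with more GPU types or more candidate sizes this product grows quickly.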
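Takeaways, Limitations
• Takeaways:
◦ Presents an effective system that significantly improves LLM inference serving throughput on heterogeneous clusters.
◦ Optimizing the deployment configuration by exhaustive search, combined with a scheduling mechanism that accounts for each instance's processing capability, yields substantial performance gains.
◦ The proposed system can help reduce the cost of LLM inference serving and speed up request processing.
• Limitations:
◦ The computational cost of exhaustive search may grow exponentially as the cluster size increases.
◦ The scheduling mechanism's performance depends on accurately predicting each instance's processing capability; prediction errors can degrade performance.
◦ Further study is needed on how well the approach generalizes to different kinds of heterogeneous cluster environments.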
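The summary does not say how the scheduler weighs instance capability, so the following is only a generic sketch of capacity-aware dispatch, not the paper's mechanism. Each request goes to the instance with the lowest estimated completion time, computed from queued work and an assumed per-instance capacity. The `Instance` class, `dispatch`, `simulate`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    capacity: float       # assumed processing rate, e.g. tokens per second
    pending: float = 0.0  # work already queued on this instance

    def eta(self, cost):
        """Estimated time to finish the queue plus a new request of size `cost`."""
        return (self.pending + cost) / self.capacity


def dispatch(instances, cost):
    """Send one request to the instance with the smallest estimated finish time."""
    target = min(instances, key=lambda inst: inst.eta(cost))
    target.pending += cost
    return target.name


def simulate(costs, capacities):
    """Dispatch a burst of requests; queues are not drained in this sketch."""
    instances = [Instance(name, cap) for name, cap in capacities.items()]
    return [dispatch(instances, c) for c in costs]


if __name__ == "__main__":
    assigned = simulate([100.0] * 8, {"fast": 300.0, "slow": 100.0})
    print(assigned.count("fast"), assigned.count("slow"))
```

Because the choice depends directly on `capacity`, a wrong estimate shifts load toward the wrong instance, which is the sensitivity to prediction error noted in the limitations.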