Daily Arxiv
This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for the papers belongs to their authors and affiliated institutions; please cite the source when sharing.
VarCoNet: A variability-aware self-supervised framework for functional connectome extraction from resting-state fMRI
KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting
SingMOS-Pro: A Comprehensive Benchmark for Singing Quality Assessment
Pack and Force Your Memory: Long-form and Consistent Video Generation
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
Analyzing Latent Concepts in Code Language Models
Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving
DM-Bench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management
YOLO-Based Defect Detection for Metal Sheets
Jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Putnam-like dataset summary: LLMs as mathematical competition contestants
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Observation-Free Attacks on Online Learning to Rank
MTRec: Learning to Align with User Preferences via Mental Reward Models
MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Flow-Induced Diagonal Gaussian Processes
Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs
Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization
STORI: A Benchmark and Taxonomy for Stochastic Environments
A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI
Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Quantum-RAG and PunGPT2: Advancing Low-Resource Language Generation and Retrieval for the Punjabi Language
Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective
SBP-YOLO: A Lightweight Real-Time Model for Detecting Speed Bumps and Potholes toward Intelligent Vehicle Suspension Systems
An Architecture for Spatial Networking
A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges
First Hallucination Tokens Are Different from Conditional Ones
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Model Parallelism With Subnetwork Data Parallelism
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
A Survey of Pun Generation: Datasets, Evaluations and Methodologies
Controlled Generation with Equivariant Variational Flow Matching
CAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Semantic Preprocessing for LLM-based Malware Analysis
Manipulating 3D Molecules in a Fixed-Dimensional E(3)-Equivariant Latent Space
Permissioned LLMs: Enforcing Access Control in Large Language Models
Efficient Preimage Approximation for Neural Network Certification
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Pre-training Limited Memory Language Models with Internal and External Knowledge
OT Score: An OT based Confidence Score for Source Free Unsupervised Domain Adaptation
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments
A Survey of Deep Learning for Complex Speech Spectrograms
Continuous Thought Machines
CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
Activated LoRA: Fine-tuned LLMs for Intrinsics
Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Rethinking the Vulnerability of Concept Erasure and a New Method
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations
Inferring Pluggable Types with Machine Learning
Optimizing Container Loading and Unloading through Dual-Cycling and Dockyard Rehandle Reduction Using a Hybrid Genetic Algorithm
LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing
Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders
RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
Unified Domain Adaptive Semantic Segmentation
Do AI Models Perform Human-like Abstract Reasoning Across Modalities?
Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Learning to Interact in World Latent for Team Coordination
Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
GUI-PRA: Process Reward Agent for GUI Tasks
PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Efficient & Correct Predictive Equivalence for Decision Trees
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Gala: Global LLM Agents for Text-to-Model Translation
Disentangling Multiplex Spatial-Temporal Transition Graph Representation Learning for Socially Enhanced POI Recommendation
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving
MIRROR: Modular Internal Processing for Personalized Safety in LLM Dialogue
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
ViLBias: Detecting and Reasoning about Bias in Multimodal Content
OML: A Primitive for Reconciling Open Access with Owner Control in AI Model Distribution
Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Created by
Haebom
Authors
Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan, Xinzhu Ma, Shutao Xia, Zhi Wang, Wenwu Zhu
Overview
To address the high computational cost and low execution frequency of Vision-Language-Action (VLA) models, this paper proposes SP-VLA, a unified framework that accelerates VLA models by combining model scheduling with token pruning. Specifically, it reduces temporal redundancy through action-aware model scheduling and removes visual redundancy through spatial-semantic dual-aware token pruning. SP-VLA dynamically switches between the VLA model and a lightweight generator to adjust the execution frequency, steering computation toward critical actions and salient visual information, which yields effective acceleration while preserving accuracy. Experiments show 1.5x lossless acceleration on LIBERO and 2.4x on SimplerEnv, with up to a 6% gain in average performance. Inference frequency and latency improve by 2.2x on SimplerEnv and 1.4x on LIBERO.
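The paper's code is not reproduced here; the following is a minimal, hypothetical Python sketch of the scheduling idea described above, where a simple action-change criterion decides per step whether to run the full VLA model or a lightweight generator. The function names (run_full_vla, lightweight_generator), the delta-based rule, and the threshold are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def action_delta(prev_action, curr_action):
    """Proxy for how quickly the commanded action is changing between steps."""
    return float(np.linalg.norm(np.asarray(curr_action) - np.asarray(prev_action)))

def scheduled_control_loop(obs_stream, run_full_vla, lightweight_generator,
                           delta_threshold=0.05):
    """Route each control step to the full VLA model or a cheap generator.

    `run_full_vla(obs)` and `lightweight_generator(obs, a_prev, a_curr)` are
    assumed user-supplied callables; the delta-based criterion is an
    illustrative stand-in for the paper's action-aware scheduling rule.
    """
    prev_action = curr_action = None
    for obs in obs_stream:
        if prev_action is None or action_delta(prev_action, curr_action) > delta_threshold:
            # Warm-up or fast-changing ("critical") phase: full VLA inference.
            action = run_full_vla(obs)
        else:
            # Smooth phase: extrapolate cheaply with the lightweight generator.
            action = lightweight_generator(obs, prev_action, curr_action)
        prev_action, curr_action = curr_action, action
        yield action
```

In this sketch the expensive model is only invoked when the trajectory changes quickly, which is one plausible way to reduce the temporal redundancy the paper targets.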
Takeaways and Limitations
• Takeaways:
◦ Presents a new framework for improving the efficiency of VLA models.
◦ Addresses both temporal and spatial redundancy by combining model scheduling with token pruning (a pruning sketch follows after this list).
◦ Demonstrates experimentally that high acceleration can be achieved while preserving accuracy.
◦ Makes VLA models applicable to real-time tasks such as robot control and autonomous navigation.
• Limitations:
◦ Performance may vary depending on the capability and generalization ability of the lightweight generator.
◦ Further research is needed on optimal parameter settings for model scheduling and token pruning.
◦ Generalization to other VLA models and environments still needs to be validated.
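As referenced in the Takeaways above, the spatial-semantic dual-aware token pruning can be pictured as keeping only the highest-scoring visual tokens before they enter the expensive transformer layers. The sketch below is a hedged illustration under that assumption; how the `saliency` scores combine spatial and semantic cues, and the `keep_ratio` parameter, are hypothetical choices and not taken from the paper.

```python
import torch

def prune_visual_tokens(tokens, saliency, keep_ratio=0.5):
    """Keep only the most important visual tokens.

    tokens:   (B, N, D) patch embeddings
    saliency: (B, N) per-token importance scores; mixing spatial and
              semantic cues into this score is an assumption of this sketch.
    """
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))
    topk = saliency.topk(k, dim=1).indices                       # (B, k)
    # Gather the kept tokens so downstream attention runs on k << N tokens.
    kept = torch.gather(tokens, 1, topk.unsqueeze(-1).expand(B, k, D))
    return kept, topk

# Example: 196 patch tokens per image, keep half of them.
tokens = torch.randn(2, 196, 768)
saliency = torch.rand(2, 196)   # e.g. attention- or gradient-based scores
kept, idx = prune_visual_tokens(tokens, saliency)
```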
View PDF