Daily Arxiv
A page collecting artificial-intelligence papers published around the world.
The page is summarized using Google Gemini and operated on a non-profit basis.
Copyright of the papers belongs to the authors and their institutions; please cite the source when sharing.
End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation
Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models
LAG: Logic-Augmented Generation from a Cartesian Perspective
Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms
FDC-Net: Rethinking the association between EEG artifact removal and multi-dimensional affective computing
Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory
DS$^2$Net: Detail-Semantic Deep Supervision Network for Medical Image Segmentation
LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay
When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models
HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation
SpectrumFM: Redefining Spectrum Cognition via Foundation Modeling
Dynamic Robot-Assisted Surgery with Hierarchical Class-Incremental Semantic Segmentation
A novel language model for predicting serious adverse event results in clinical trials from their prospective registrations
A Bit of Freedom Goes a Long Way: Classical and Quantum Algorithms for Reinforcement Learning under a Generative Model
ALLoyM: A large language model for alloy phase diagram prediction
Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation
Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration?
SystolicAttention: Fusing FlashAttention within a Single Systolic Array
RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening
AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model
Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching
Speckle2Self: Self-Supervised Ultrasound Speckle Reduction Without Clean Data
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Addressing The Devastating Effects Of Single-Task Data Poisoning In Exemplar-Free Continual Learning
Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition
Probabilistic Optimality for Inference-time Scaling
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation
Exploring Adapter Design Tradeoffs for Low Resource Music Generation
CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
Robust Anomaly Detection in Network Traffic: Evaluating Machine Learning Models on CICIDS2017
Robust Behavior Cloning Via Global Lipschitz Regularization
Granular-Ball-Induced Multiple Kernel K-Means
DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving
MMET: A Multi-Input and Multi-Scale Transformer for Efficient PDEs Solving
A Two-stage Optimization Method for Wide-range Single-electron Quantum Magnetic Sensing
Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting
AI-Generated Compromises for Coalition Formation
MLOps with Microservices: A Case Study on the Maritime Domain
Winner-takes-all for Multivariate Probabilistic Time Series Forecasting
Leaps Beyond the Seen: Reinforced Reasoning Augmented Generation for Clinical Notes
Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification
HERGC: Heterogeneous Experts Representation and Generative Completion for Multimodal Knowledge Graphs
Verbal Werewolf: Engage Users with Verbalized Agentic Werewolf Game Framework
MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning
FP4 All the Way: Fully Quantized Training of LLMs
Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration
Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization
RIDGECUT: Learning Graph Partitioning with Rings and Wedges
Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning
Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?
A Multimodal Deep Learning Approach for White Matter Shape Prediction in Diffusion MRI Tractography
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Bidirectional Hierarchical Protein Multi-Modal Representation Learning
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
$\mu$KE: Matryoshka Unstructured Knowledge Editing of Large Language Models
Learning 3D-Gaussian Simulators from RGB Videos
Learning Adaptive Dexterous Grasping from Single Demonstrations
A Theory of Learning with Autoregressive Chain of Thought
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
ElementaryNet: A Non-Strategic Neural Network for Predicting Human Behavior in Normal-Form Games
Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth
Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA
Predicting Depression in Screening Interviews from Interactive Multi-Theme Collaboration
Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
Mitigating Traffic Oscillations in Mixed Traffic Flow with Scalable Deep Koopman Predictive Control
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint
Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
Ehrenfeucht-Haussler Rank and Chain of Thought
WebWalker: Benchmarking LLMs in Web Traversal
Generative AI for Cel-Animation: A Survey
Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense
MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
POEX: Towards Policy Executable Jailbreak Attacks Against the LLM-based Robots
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes
Steering AI-Driven Personalization of Scientific Text for General Audiences
Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
EfficientEQA: An Efficient Approach to Open-Vocabulary Embodied Question Answering
UoMo: A Universal Model of Mobile Traffic Forecasting for Wireless Network Optimization
Exploring Spatial Representation to Enhance LLM Reasoning in Aerial Vision-Language Navigation
A Closer Look at Machine Unlearning for Large Language Models
In-Situ Fine-Tuning of Wildlife Models in IoT-Enabled Camera Traps for Efficient Adaptation
EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping
A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio
Reward-Directed Score-Based Diffusion Models via q-Learning
Chain of Thought Still Thinks Fast: APriCoT Helps with Thinking Slow
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
AI-AI Bias: large language models favor communications generated by large language models
LVBench: An Extreme Long Video Understanding Benchmark
From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks
Fractured Glass, Failing Cameras: Simulating Physics-Based Adversarial Samples for Autonomous Driving Systems
Runtime Monitoring and Enforcement of Conditional Fairness in Generative AIs
On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Created by
Haebom
Authors
Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang
Overview
This paper proposes ReCalKV, a novel post-training KV cache compression method that reduces the memory footprint of the Key-Value (KV) cache to make long-context inference more efficient. To address the shortcomings of existing methods, namely extra runtime computation and performance degradation at high compression ratios, ReCalKV applies distinct compression strategies to Keys and Values, reflecting their different roles and sensitivities. For Keys, Head-wise Similarity-aware Reordering (HSR) clusters similar attention heads and applies grouped SVD, preserving accuracy without additional computation. For Values, Offline Calibration and Matrix Fusion (OCMF) likewise preserves accuracy with no extra computation. Experiments show that ReCalKV outperforms existing low-rank compression methods, achieving high compression ratios with minimal performance loss.
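The Key-side pipeline above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: head similarity is measured by cosine similarity of flattened key projection matrices, heads are greedily paired, and each group's stacked projections are factored once by truncated SVD into a shared low-rank basis. All shapes, the grouping heuristic, and the fixed rank are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical shapes): 8 key heads in one attention layer,
# each with a d_model x d_head projection matrix.
n_heads, d_model, d_head = 8, 64, 16
W_k = rng.standard_normal((n_heads, d_model, d_head))

# --- Head-wise similarity-aware reordering (simplified) ---
# Compare flattened, normalized head projections and greedily group the
# most similar heads so a shared basis loses as little accuracy as possible.
flat = W_k.reshape(n_heads, -1)
flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
sim = flat @ flat.T

group_size = 2
unassigned = set(range(n_heads))
groups = []
while unassigned:
    h = min(unassigned)
    unassigned.remove(h)
    # join h with its most similar remaining heads
    partners = sorted(unassigned, key=lambda j: -sim[h, j])[: group_size - 1]
    for p in partners:
        unassigned.remove(p)
    groups.append([h] + partners)

# --- Grouped SVD: one shared low-rank factorization per group ---
rank = 8
compressed = []
for g in groups:
    # stack the group's projections along the feature axis, factor once
    stacked = np.concatenate([W_k[h] for h in g], axis=1)  # d_model x (|g|*d_head)
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # d_model x rank   (down-projection)
    B = Vt[:rank]                # rank x (|g|*d_head)  (up-projection)
    compressed.append((g, A, B))

# Relative reconstruction error per group: lower when grouped heads
# really are similar, which is the point of the reordering step.
for g, A, B in compressed:
    stacked = np.concatenate([W_k[h] for h in g], axis=1)
    err = np.linalg.norm(stacked - A @ B) / np.linalg.norm(stacked)
    print(f"group {g}: relative error {err:.3f}")
```

Because the factorization is computed once offline, inference only multiplies by the small factors A and B; no per-token SVD is needed, which matches the paper's "no additional computation" claim in spirit.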
Takeaways and Limitations
•
Takeaways:
◦
Presents a new approach to efficient long-context inference by applying different compression strategies to Keys and Values.
◦
Effectively resolves the limitations of existing methods: extra computation and performance degradation at high compression ratios.
◦
Achieves high compression ratios with minimal performance loss, substantially reducing memory usage.
◦
Ensures reproducibility through publicly released code.
•
Limitations:
◦
The performance gains of ReCalKV may be limited to specific LLM architectures or datasets.
◦
Deeper comparative analysis against other compression methods is needed.
◦
As a post-training method, it does not address efficiency during the initial training process.
View PDF