Daily Arxiv
This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the original source when sharing.
Dehazing Light Microscopy Images with Guided Conditional Flow Matching: finding a sweet spot between fidelity and realism
EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework
Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
Potemkin Understanding in Large Language Models
OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs
How to Retrieve Examples in In-context Learning to Improve Conversational Emotion Recognition using Large Language Models?
Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track
Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis
Improving Student-AI Interaction Through Pedagogical Prompting: An Example in Computer Science Education
GLIMPSE: Gradient-Layer Importance Mapping for Prompted Visual Saliency Explanation for Generative LVLMs
Automatic Depression Assessment using Machine Learning: A Comprehensive Survey
Generalizing vision-language models to novel domains: A comprehensive survey
Comparative Evaluation of ChatGPT and DeepSeek Across Key NLP Tasks: Strengths, Weaknesses, and Domain-Specific Performance
AI-Generated Song Detection via Lyrics Transcripts
KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation
Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning
Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion
Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs
Value-Free Policy Optimization via Reward Partitioning
VFEFL: Privacy-Preserving Federated Learning against Malicious Clients via Verifiable Functional Encryption
Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders
Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
StepProof: Step-by-step verification of natural language mathematical proofs
Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment
Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting
SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code
Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
Bridging Subjective and Objective QoE: Operator-Level Aggregation Using LLM-Based Comment Analysis and Network MOS Comparison
Quantum computing and artificial intelligence: status and perspectives
Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization
A Large Language Model-Enabled Control Architecture for Dynamic Resource Capability Exploration in Multi-Agent Manufacturing Systems
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
WeatherEdit: Controllable Weather Editing with 4D Gaussian Field
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks
Super-Resolution Generative Adversarial Networks based Video Enhancement
Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix
INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models
SConU: Selective Conformal Uncertainty in Large Language Models
MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation
Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization
Achieving binary weight and activation for LLMs using Post-Training Quantization
A Consequentialist Critique of Binary Classification Evaluation Practices
Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models
Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments
Automating Adjudication of Cardiovascular Events Using Large Language Models
ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism
Visual Position Prompt for MLLM-based Visual Grounding
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Privacy Ethics Alignment in AI: A Stakeholder-Centric Framework for Ethical AI
Characterizing GPU Resilience and Impact on AI/HPC Systems
Explainable Sentiment Analysis with DeepSeek-R1: Performance, Efficiency, and Few-Shot Learning
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
The Problem of the Priors, or Posteriors?
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
What can large language models do for sustainable food?
Enough Coin Flips Can Make LLMs Act Bayesian
How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects
Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
Space-Time Graphs of Convex Sets for Multi-Robot Motion Planning
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention
No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning
A general language model for peptide identification
Cluster and Predict Latent Patches for Improved Masked Image Modeling
Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks
KMI: A Dataset of Korean Motivational Interviewing Dialogues for Psychotherapy
Mechanistic Interpretability of Emotion Inference in Large Language Models
Multimodal Medical Code Tokenizer
Time to Rethink AI for Combinatorial Optimization: Classical Algorithms Remain Tough to Match
Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models
Environment-Driven Online LiDAR-Camera Extrinsic Calibration
Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
DReSS: Data-driven Regularized Structured Streamlining for Large Language Models
Towards Automated Self-Supervised Learning for Truly Unsupervised Graph Anomaly Detection
Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models
DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection
An Investigation into Seasonal Variations in Energy Forecasting for Student Residences
Efficiently Serving Large Multimodal Models Using EPD Disaggregation
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
A Library for Learning Neural Operators
ZipAR: Parallel Auto-regressive Image Generation through Spatial Locality
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters
Foundation Models for Wearable Movement Data in Mental Health Research
GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs
Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements
Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems
Created by Haebom
Authors
Abdulmomen Ghalkha, Zhuojun Tian, Chaouki Ben Issaid, Mehdi Bennis
Overview
This paper addresses large-scale communication systems in which edge devices collecting sensory data of different modalities collaborate intelligently to improve environmental understanding and decision-making accuracy. Conventional federated learning (FL) algorithms typically assume single-modality datasets and identical model architectures, so they cannot exploit the rich information inherent in multimodal data and are difficult to apply to real-world scenarios with diverse modalities and heterogeneous client capabilities. To address this, the paper proposes Sheaf-DMFL, a novel decentralized multimodal learning framework that leverages sheaf theory to enhance collaboration among devices with different modalities. Each client maintains a set of local feature encoders, one per modality, whose outputs are concatenated before being passed through a task-specific layer. Encoders of the same modality are trained jointly across clients, while a sheaf structure captures the intrinsic correlations among the clients' task-specific layers. To further improve learning capability, the paper also proposes an enhanced algorithm, Sheaf-DMFL-Att, which adds an attention mechanism within each client to capture correlations across modalities. A rigorous convergence analysis of Sheaf-DMFL-Att is provided, establishing theoretical guarantees. Extensive simulations on real-world link blockage prediction and mmWave beamforming scenarios demonstrate the superiority of the proposed algorithms in such heterogeneous wireless communication systems.
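To make the per-client architecture described above concrete, the following is a minimal PyTorch sketch of a single client: one local encoder per modality, an optional cross-modal attention step (standing in for the Sheaf-DMFL-Att variant), and a task-specific head over the concatenated features. All class names, layer sizes, and the exact fusion scheme are assumptions for illustration, not the authors' reference implementation; the sheaf-based coupling of task layers across clients happens during aggregation and is not shown here.

```python
# Hypothetical sketch of one Sheaf-DMFL-style client (illustrative, not the paper's code).
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Local feature encoder for a single sensing modality (e.g., camera or mmWave)."""

    def __init__(self, in_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SheafDMFLClient(nn.Module):
    """Per-modality encoders whose features are fused and fed to a task-specific head.
    Encoders of the same modality would be shared/averaged across clients, while the
    task heads are coupled only through the sheaf structure during aggregation
    (that server/neighbor step is not modeled here)."""

    def __init__(self, modality_dims: dict, feat_dim: int = 64,
                 num_classes: int = 2, use_attention: bool = False):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(dim, feat_dim) for name, dim in modality_dims.items()})
        self.use_attention = use_attention  # toggles the Sheaf-DMFL-Att-style variant
        if use_attention:
            # Self-attention over the modality axis to weight cross-modal correlations.
            self.attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=4, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(feat_dim * len(modality_dims), 128), nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, inputs: dict) -> torch.Tensor:
        feats = [self.encoders[name](x) for name, x in inputs.items()]  # one feature per modality
        z = torch.stack(feats, dim=1)                                   # (batch, n_modalities, feat_dim)
        if self.use_attention:
            z, _ = self.attn(z, z, z)                                   # cross-modal attention
        return self.head(z.flatten(start_dim=1))                        # concatenate, then classify


# Usage: a client with assumed 32-dim camera and 16-dim mmWave features.
client = SheafDMFLClient({"camera": 32, "mmwave": 16}, use_attention=True)
logits = client({"camera": torch.randn(8, 32), "mmwave": torch.randn(8, 16)})
print(logits.shape)  # torch.Size([8, 2])
```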
Takeaways, Limitations
•
Takeaways:
◦
Proposes Sheaf-DMFL, a novel decentralized multimodal learning framework applicable to real-world scenarios with diverse modalities and client capabilities.
◦
Leverages sheaf theory to improve collaboration among devices with different modalities.
◦
Uses an attention mechanism (Sheaf-DMFL-Att) to effectively capture correlations between modalities.
◦
Provides theoretical convergence guarantees for the proposed algorithm.
◦
Validates the algorithms' superior performance in real-world scenarios (link blockage prediction and mmWave beamforming).
•
Limitations:
◦
Additional experiments and analysis are needed to deploy the proposed algorithms in real-world environments.
◦
Robustness to differences in data size and distribution across modalities requires further analysis.
◦
The complexity and computational cost of the sheaf-theory-based model need to be evaluated.
◦
A more thorough comparison with other decentralized learning algorithms is needed.
View PDF