Daily Arxiv
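This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.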
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
Comparative Analysis of Transformer Models in Disaster Tweet Classification for Public Safety
Emergent Social Dynamics of LLM Agents in the El Farol Bar Problem
The Good, the Bad and the Constructive: Automatically Measuring Peer Review's Utility for Authors
Energy Landscapes Enable Reliable Abstention in Retrieval-Augmented Large Language Models for Healthcare
DEXOP: A Device for Robotic Transfer of Dexterous Human Manipulation
Reinforcement Learning for Robust Ageing-Aware Control of Li-ion Battery Systems with Data-Driven Formal Verification
RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models
Gravity Well Echo Chamber Modeling With An LLM-Based Confirmation Bias Model
Insights from Gradient Dynamics: Gradient Autoscaled Normalization
Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
DCPO: Dynamic Clipping Policy Optimization
DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
Can AI be Auditable?
Robotic Fire Risk Detection based on Dynamic Knowledge Graph Reasoning: An LLM-Driven Approach with Graph Chain-of-Thought
Navigating the EU AI Act: Foreseeable Challenges in Qualifying Deep Learning-Based Automated Inspections of Class III Medical Devices
Complementary Learning System Empowers Online Continual Learning of Vehicle Motion Forecasting in Smart Cities
MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts
QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
MovieCORE: COgnitive REasoning in Movies
Automatic Prompt Optimization with Prompt Distillation
Membership Inference Attacks on LLM-based Recommender Systems
Leveraging Large Language Models for Accurate Sign Language Translation in Low-Resource Scenarios
Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning
Convergence and Generalization of Anti-Regularization for Parametric Models
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Bridging Generalization and Personalization in Human Activity Recognition via On-Device Few-Shot Learning
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
Using Artificial Intuition in Distinct, Minimalist Classification of Scientific Abstracts for Management of Technology Portfolios
Semantic Discrepancy-aware Detector for Image Forgery Identification
Quantum-Efficient Reinforcement Learning Solutions for Last-Mile On-Demand Delivery
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning
Real-Time Analysis of Unstructured Data with Machine Learning on Heterogeneous Architectures
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
An Efficient Continuous-Time MILP for Integrated Aircraft Hangar Scheduling and Layout
DIRF: A Framework for Digital Identity Protection and Clone Governance in Agentic AI Systems
COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning
Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA
Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning
LanternNet: A Hub-and-Spoke System to Seek and Suppress Spotted Lanternfly Populations
RecPS: Privacy Risk Scoring for Recommender Systems
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
Role-Playing LLM-Based Multi-Agent Support Framework for Detecting and Addressing Family Communication Bias
PLAME: Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings
Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles
Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model
Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
Precise Bayesian Neural Networks
Transit for All: Mapping Equitable Bike2Subway Connection using Region Representation Learning
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems
SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies
Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Bipedal Balance Control with Whole-body Musculoskeletal Standing and Falling Simulations
Scaling Laws of Motion Forecasting and Planning - Technical Report
Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems
Unsupervised Evolutionary Cell Type Matching via Entropy-Minimized Optimal Transport
Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
Steering LLM Reasoning Through Bias-Only Adaptation
MetaSTH-Sleep: Towards Effective Few-Shot Sleep Stage Classification for Health Management with Spatial-Temporal Hypergraph Enhanced Meta-Learning
InterFeat: A Pipeline for Finding Interesting Scientific Features
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Advancing Scientific Text Classification: Fine-Tuned Models with Dataset Expansion and Hard-Voting
Test It Before You Trust It: Applying Software Testing for Trustworthy In-context Learning
Action Flow Matching for Continual Robot Learning
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Byzantine-Robust Federated Learning Using Generative Adversarial Networks
Beyond SHAP and Anchors: A large-scale experiment on how developers struggle to design meaningful end-user explanations
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation
Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning
Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support
Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models
CHIRLA: Comprehensive High-resolution Identification and Re-identification for Large-scale Analysis
Kolmogorov-Arnold Fourier Networks
Position: LLMs Can be Good Tutors in English Education
Predicting Steady-State Behavior in Complex Networks with Graph Neural Networks
Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models
Motion-enhanced Cardiac Anatomy Segmentation via an Insertable Temporal Attention Module
Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
DispFormer: A Pretrained Transformer Incorporating Physical Constraints for Dispersion Curve Inversion
Integrating Evidence into the Design of XAI and AI-based Decision Support Systems: A Means-End Framework for End-users in Construction
Revealing the impact of synthetic native samples and multi-tasking strategies in Hindi-English code-mixed humour and sarcasm detection
Neural Port-Hamiltonian Differential Algebraic Equations for Compositional Learning of Electrical Networks
Sequential Controlled Langevin Diffusions
Privacy-Preserving Federated Learning via Homomorphic Adversarial Networks
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Lessons from Studying Two-Hop Latent Reasoning
HierTOD: A Task-Oriented Dialogue System Driven by Hierarchical Goals
Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs
FACEGroup: Feasible and Actionable Counterfactual Explanations for Group Fairness
ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
When Language Overrules: Revealing Text Dominance in Multimodal Large Language Models
Created by
Haebom
Authors
Huyu Wu, Meng Tang, Xinhan Zheng, Haiyun Jiang
Summary
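This paper systematically analyzes the "text dominance" phenomenon in multimodal large language models (MLLMs) that process diverse modalities (images, video, audio, time series, and graphs). Text dominance refers to an MLLM relying excessively on text while underusing the other modalities. The study proposes two evaluation metrics, the Modality Dominance Index (MDI) and the Attention Efficiency Index (AEI), and shows that text dominance appears widely across modalities. It attributes the phenomenon to attention dilution caused by redundant non-text tokens, the influence of fusion architecture design, and task formulations that favor text input, and it shows that a simple method, token compression, can effectively rebalance the model's attention (the example given is the MDI of LLaVA-7B, whose values are truncated in the source: "0.8.3"). This study provides a foundation for developing more balanced and comprehensive multimodal language models.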
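To make the idea of attention imbalance concrete, here is a minimal Python sketch. This summary does not give the paper's exact definitions of MDI or AEI, so `dominance_ratio` is a hypothetical stand-in: it assumes a ratio of mean attention per text token to mean attention per non-text token. Likewise, `average_pool_tokens` is plain average pooling, swapped in as one simple form of token compression; it is not claimed to be the authors' method. All function names and the toy numbers are illustrative.

```python
from typing import List, Sequence


def dominance_ratio(attention: Sequence[float], is_text: Sequence[bool]) -> float:
    """Assumed, illustrative ratio (not the paper's MDI definition):
    mean attention per text token / mean attention per non-text token.
    A value above 1 means a text token gets more attention on average."""
    if len(attention) != len(is_text):
        raise ValueError("attention and is_text must have the same length")
    text = [a for a, t in zip(attention, is_text) if t]
    other = [a for a, t in zip(attention, is_text) if not t]
    if not text or not other:
        raise ValueError("need at least one text and one non-text token")
    other_mean = sum(other) / len(other)
    if other_mean == 0:
        raise ValueError("non-text tokens receive zero attention")
    return (sum(text) / len(text)) / other_mean


def average_pool_tokens(tokens: List[List[float]], group_size: int) -> List[List[float]]:
    """Compress a token sequence by averaging consecutive groups of
    `group_size` feature vectors (the last group may be shorter)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    pooled = []
    for start in range(0, len(tokens), group_size):
        group = tokens[start:start + group_size]
        dim = len(group[0])
        pooled.append([sum(vec[i] for vec in group) / len(group) for i in range(dim)])
    return pooled


# Toy attention row: two text tokens followed by four non-text tokens.
toy_attention = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]
toy_is_text = [True, True, False, False, False, False]
toy_ratio = dominance_ratio(toy_attention, toy_is_text)

# Toy non-text features: three 2-dimensional tokens pooled in pairs.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_pooled = average_pool_tokens(toy_tokens, group_size=2)
```

In this toy case each text token receives twice the attention of each non-text token, and pooling shortens the non-text sequence from three tokens to two. Whether shortening the sequence actually rebalances attention depends on the model; the sketch only shows the bookkeeping.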
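Takeaways, Limitations
• Takeaways:
◦ The first systematic demonstration of how severe and widespread text dominance is in multimodal large language models.
◦ Analyzes the causes of text dominance from multiple angles and proposes a remedy.
◦ The proposed metrics (MDI, AEI) and the token compression method can inform the development and evaluation of future multimodal models.
◦ Marks an important milestone toward more balanced and comprehensive multimodal language models.
• Limitations:
◦ Further research is needed on the generality of the proposed token compression method and its applicability to other models and datasets.
◦ The analysis of the causes of text dominance needs to be supplemented by deeper study.
◦ The analysis of different fusion architectures and task formulations may not be comprehensive.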