Daily Arxiv
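This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.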
Interleaving Reasoning for Better Text-to-Image Generation
Barycentric Neural Networks and Length-Weighted Persistent Entropy Loss: A Green Geometric and Topological Framework for Function Approximation
Signal-Based Malware Classification Using 1D CNNs
Toward a Metrology for Artificial Intelligence: Hidden-Rule Environments and Reinforcement Learning
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
No Thoughts Just AI: Biased LLM Hiring Recommendations Alter Human Decision Making and Limit Human Autonomy
What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?
HodgeFormer: Transformers for Learnable Operators on Triangular Meshes through Data-Driven Hodge Matrices
CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
Pilot Study on Generative AI and Critical Thinking in Higher Education Classrooms
ZkLoRA: Fine-Tuning Large Language Models with Verifiable Security via Zero-Knowledge Proofs
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Ultra-Low-Latency Spiking Neural Networks with Temporal-Dependent Integrate-and-Fire Neuron Model for Objects Detection
Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models
A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
Trust but Verify! A Survey on Verification Design for Test-time Scaling
Research on Conversational Recommender System Considering Consumer Types
A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges
Grid-Agent: An LLM-Powered Multi-Agent System for Power Grid Control
Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM
A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app
Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs
MoRPI-PINN: A Physics-Informed Framework for Mobile Robot Pure Inertial Navigation
Conditional Video Generation for High-Efficiency Video Compression
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges
Grounding DINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting
From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations
HueManity: Probing Fine-Grained Visual Perception in MLLMs
Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments
Localizing Persona Representations in LLMs
Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition
SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning
Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
Visuospatial Cognitive Assistant
Overflow Prevention Enhances Long-Context Recurrent LLMs
GRADA: Graph-based Reranking against Adversarial Documents Attack
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?
Llama-Nemotron: Efficient Reasoning Models
Tripartite-GraphRAG via Plugin Ontologies
DMS-Net:Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification
Enhancing Traffic Incident Response through Sub-Second Temporal Localization with HybridMamba
Audio-centric Video Understanding Benchmark without Text Shortcut
The Model Hears You: Audio Language Model Deployments Should Consider the Principle of Least Privilege
Involution and BSConv Multi-Depth Distillation Network for Lightweight Image Super-Resolution
DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification
Cardiverse: Harnessing LLMs for Novel Card Game Prototyping
TrojanRobot: Physical-world Backdoor Attacks Against VLM-based Robotic Manipulation
Automatically Detecting Online Deceptive Patterns
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning
CTourLLM: Enhancing LLMs with Chinese Tourism Knowledge
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
MSRFormer: Road Network Representation Learning using Multi-scale Feature Fusion of Heterogeneous Spatial Interactions
Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts
EvoEmo: Towards Evolved Emotional Policies for LLM Agents in Multi-Turn Negotiation
AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
MaRVL-QA: A Benchmark for Mathematical Reasoning over Visual Landscapes
Benchmarking for Domain-Specific LLMs: A Case Study on Academia and Beyond
CountQA: How Well Do MLLMs Count in the Wild?
ASP-FZN: A Translation-based Constraint Answer Set Solver
MedGellan: LLM-Generated Medical Guidance to Support Physicians
Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs
Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs
GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
Automatic Reward Shaping from Confounded Offline Data
Visualizing Thought: Conceptual Diagrams Enable Robust Combinatorial Planning in LMMs
COMMA: A Communicative Multimodal Multi-Agent Benchmark
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism
Self-Emotion-Mediated Exploration in Artificial Intelligence Mirrors: Findings from Cognitive Psychology
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
ACE and Diverse Generalization via Selective Disagreement
Bringing Multi-Modal Multi-Task Federated Foundation Models to Education Domain: Prospects and Challenges
ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation
Breaking Android with AI: A Deep Dive into LLM-Powered Exploitation
Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s
GENUINE: Graph Enhanced Multi-level Uncertainty Estimation for Large Language Models
Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation
Uncovering Scaling Laws for Large Language Models via Inverse Problems
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Deep Learning-Based Burned Area Mapping Using Bi-Temporal Siamese Networks and AlphaEarth Foundation Datasets
Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost
Forecasting Russian Equipment Losses Using Time Series and Deep Learning Models
Enhanced SegNet with Integrated Grad-CAM for Interpretable Retinal Layer Segmentation in OCT Images
Individual utilities of life satisfaction reveal inequality aversion unrelated to political alignment
XSRD-Net: EXplainable Stroke Relapse Detection
Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning
What Were You Thinking? An LLM-Driven Large-Scale Study of Refactoring Motivations in Open-Source Projects
Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks
Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review
Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems
MSRFormer: Road Network Representation Learning using Multi-scale Feature Fusion of Heterogeneous Spatial Interactions
Created by
Haebom
Authors
Jian Yang, Jiahui Wu, Li Fang, Hongchao Fan, Bianying Zhang, Huijie Zhao, Guangyi Yang, Rui Xin, Xiong You
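Overview
This paper proposes MSRFormer, a new road network representation learning framework that integrates multi-scale spatial interactions to overcome the limitations of existing deep learning methods for converting road network data into vector representations. To account for the heterogeneity and hierarchical nature of road networks, it uses spatial flow convolution to extract small-scale features from large trajectory datasets, together with a method for identifying scale-dependent spatial interaction regions. A graph transformer captures complex spatial dependencies across scales, and the spatial interaction features are fused through residual connections to produce the final road network representation. In validation on two real-world datasets, MSRFormer outperformed existing methods on two road network analysis tasks. The results indicate that integrating trajectory data is more beneficial for traffic-related tasks, and they highlight interaction patterns between scale effects and the heterogeneity of spatial interaction flows.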
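The residual fusion step mentioned in the overview can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `residual_fuse`, the plain element-wise sum, and all numbers are assumptions chosen only to show the general idea of adding per-scale interaction features onto a base embedding for each road segment.

```python
from typing import List, Sequence

Vector = List[float]


def residual_fuse(
    base: Sequence[Vector],
    scale_features: Sequence[Sequence[Vector]],
) -> List[Vector]:
    """Add each scale's per-segment features onto the base embeddings.

    Hypothetical helper: a plain residual sum, assumed here for illustration.
    """
    # Copy the base embeddings so the caller's data is not modified.
    fused = [list(vec) for vec in base]
    for features in scale_features:
        if len(features) != len(fused):
            raise ValueError("every scale must provide one vector per road segment")
        for seg_idx, vec in enumerate(features):
            if len(vec) != len(fused[seg_idx]):
                raise ValueError("feature dimensions must match the base embedding")
            for dim, value in enumerate(vec):
                fused[seg_idx][dim] += value
    return fused


# Two road segments with 3-dimensional embeddings (made-up numbers).
base_embeddings = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
small_scale = [[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]]
large_scale = [[1.0, 1.0, 1.0], [0.5, -0.5, 0.25]]

fused = residual_fuse(base_embeddings, [small_scale, large_scale])
print(fused)
```

In a trained model the per-scale features would come from learned layers rather than fixed lists; the sketch only shows that a residual connection keeps the base representation and adds the interaction features on top of it.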
Takeaways, Limitations
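• Takeaways:
◦ Presents MSRFormer, a road network representation learning framework that accounts for multi-scale spatial interactions.
◦ Uses trajectory data to improve performance on traffic-related tasks.
◦ Improves performance over existing methods on complex road network structures (by up to 16%).
◦ Offers insight into the interaction patterns between scale effects and flow heterogeneity.
◦ Provides a practical framework for developing task-agnostic road network representation models.
• Limitations:
◦ Generalization performance needs to be validated on datasets beyond the two real-world datasets presented.
◦ Further analysis of MSRFormer's computational complexity and efficiency is needed.
◦ Further study is needed on its applicability to other types of road network analysis tasks.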