Daily Arxiv
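This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.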
Dense Video Understanding with Gated Residual Tokenization
Machines are more productive than humans until they aren't, and vice versa
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
Exploring Data and Parameter Efficient Strategies for Arabic Dialect Identifications
The threat of analytic flexibility in using large language models to simulate human data: A call to attention
Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
A Graph-Based Approach to Alert Contextualisation in Security Operations Centres
FunAudio-ASR Technical Report
Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models
Pluralistic Alignment for Healthcare: A Role-Driven Framework
ALIGNS: Unlocking nomological networks in psychological measurement through a large language model
A Survey of Reinforcement Learning for Large Reasoning Models
Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network
Reconstruction Alignment Improves Unified Multimodal Models
Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models
FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes
Dual-Mode Deep Anomaly Detection for Medical Manufacturing: Structural Similarity and Feature Distance
Exploit Tool Invocation Prompt for Tool Behavior Hijacking in LLM-Based Agentic System
Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families
AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting
Ensemble of Pathology Foundation Models for MIDOG 2025 Track 2: Atypical Mitosis Classification
Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
MovieCORE: COgnitive REasoning in Movies
ASE: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Generalized invariants meet constitutive neural networks: A novel framework for hyperelastic materials
Neural Logic Networks for Interpretable Classification
Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation
Controllable Surface Diffusion Generative Model for Neurodevelopmental Trajectories
Deciding how to respond: A deliberative framework to guide policymaker responses to AI systems
SCORPION: Addressing Scanner-Induced Variability in Histopathology
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation
EnCoBo: Energy-Guided Concept Bottlenecks for Interpretable Generation
T-SYNTH: A Knowledge-Based Dataset of Synthetic Breast Images
MedVAL: Toward Expert-Level Medical Text Validation with Language Models
Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems
"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation
An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning
Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion
Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data
Binarized Neural Networks Converge Toward Algorithmic Simplicity: Empirical Support for the Learning-as-Compression Hypothesis
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models
DisastIR: A Comprehensive Information Retrieval Benchmark for Disaster Management
Preference Isolation Forest for Structure-based Anomaly Detection
Trustless Autonomy: Understanding Motivations, Benefits, and Governance Dilemmas in Self-Sovereign Decentralized AI Agents
GRADA: Graph-based Reranking against Adversarial Documents Attack
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models
Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection
Read Before You Think: Mitigating LLM Comprehension Failures with Step-by-Step Reading
Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping
Predicting Multi-Agent Specialization via Task Parallelizability
Fine-tuning Vision Language Models with Graph-based Knowledge for Explainable Medical Image Analysis
VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
SNaRe: Domain-aware Data Generation for Low-Resource Event Detection
Superpose Task-specific Features for Model Merging
Examining False Positives under Inference Scaling for Mathematical Reasoning
SWAT: Sliding Window Adversarial Training for Gradual Domain Adaptation
Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations
Retrieval-Retro: Retrieval-based Inorganic Retrosynthesis with Expert Knowledge
Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland
Reconstruction of Differentially Private Text Sanitization via Large Language Models
3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection
The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models
Top K Enhanced Reinforcement Learning Attacks on Heterogeneous Graph Node Classification
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
EXPLOR: Extrapolatory Pseudo-Label Matching for Out-of-distribution Uncertainty Based Rejection
Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter
Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification
Heterogeneous Directed Hypergraph Neural Network over Abstract syntax tree (AST) for Code Classification
The Art of Saying "Maybe": A Conformal Lens for Uncertainty Benchmarking in VLMs
Human + AI for Accelerating Ad Localization Evaluation
Statistical Methods in Generative AI
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning
DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework
Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge
Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
Automatic Mapping of AutomationML Files to Ontologies for Graph Queries and Validation
Explicit Context-Driven Neural Acoustic Modeling for High-Fidelity RIR Generation
FlowRL: Matching Reward Distributions for LLM Reasoning
Orion: Fuzzing Workflow Automation
TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Semi-Supervised 3D Medical Segmentation from 2D Natural Images Pretrained Model
Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
Exploring How Audio Effects Alter Emotion with Foundation Models
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
The mechanization of science illustrated by the Lean formalization of the multi-graded Proj construction
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
TextMine: LLM-Powered Knowledge Extraction for Humanitarian Mine Action
Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs
Communication Efficient Split Learning of ViTs with Attention-based Double Compression
Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion
Created by
Haebom
Authors
Xiaoyang Zhan, Shixin Zhou, Qianqian Yang, Yixuan Zhao, Hao Liu, Srinivas Chowdary Ramineni, Kenji Shimada
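Abstract
This paper presents a system that uses a ground robot equipped with a LiDAR-panoramic camera suite to perform autonomous semantic exploration and dense semantic target mapping in complex, unknown environments. Existing approaches often struggle to balance collecting high-quality observations from multiple viewpoints against avoiding unnecessary repeated traversal. To address this, the authors propose a complete system that combines mapping and planning. First, they redefine the task as completing both geometric coverage and semantic viewpoint observation. Next, they manage semantic and geometric viewpoints separately and propose a new priority-driven decoupled local sampler that generates local viewpoint sets. This enables explicit multi-view semantic inspection and voxel coverage without unnecessary repetition. Building on this, they develop a hierarchical planner to ensure efficient global coverage. They also propose a safe aggressive exploration state machine that allows aggressive exploration behavior while keeping the robot safe. The system includes a plug-and-play semantic target mapping module that integrates seamlessly with state-of-the-art SLAM algorithms and performs dense semantic target mapping at the point-cloud level. The approach is validated through extensive experiments in realistic simulations and in complex real-world environments. Simulation results show that the proposed planner achieves faster exploration and shorter travel distances while guaranteeing a specified number of multi-view inspections. Real-world experiments further confirm the system's effectiveness in achieving accurate dense semantic object mapping in unstructured environments.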
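The abstract does not give the sampler's algorithm, so the following is only a toy sketch of the general idea of handling semantic and geometric viewpoints in two separate, prioritized passes. Everything in it is an assumption made for illustration: the `Viewpoint` type, the `sample_local_viewpoints` function, the greedy selection rule, and the choice to serve semantic inspection before voxel coverage are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Viewpoint:
    """Hypothetical candidate pose together with what it would observe."""
    name: str
    voxels: frozenset = field(default_factory=frozenset)   # voxel ids it would cover
    targets: frozenset = field(default_factory=frozenset)  # semantic target ids it would see


def sample_local_viewpoints(candidates, required_views):
    """Pick viewpoints in two decoupled passes: semantic first, then geometric.

    required_views maps each semantic target id to the number of distinct
    viewpoints that should observe it (an assumed requirement format).
    Returns the chosen viewpoints and the inspections still unmet per target.
    """
    remaining = dict(required_views)  # inspections still owed per target
    chosen, covered = [], set()
    pool = list(candidates)

    # Pass 1 (assumed higher priority): greedy multi-view semantic inspection.
    while any(n > 0 for n in remaining.values()):
        def semantic_gain(vp):
            return sum(1 for t in vp.targets if remaining.get(t, 0) > 0)

        best = max(pool, key=semantic_gain, default=None)
        if best is None or semantic_gain(best) == 0:
            break  # no remaining candidate helps; the requirement is infeasible
        pool.remove(best)
        chosen.append(best)
        covered |= best.voxels
        for t in best.targets:
            if remaining.get(t, 0) > 0:
                remaining[t] -= 1

    # Pass 2 (assumed lower priority): greedy cover of voxels not yet seen.
    all_voxels = set().union(*(vp.voxels for vp in candidates))
    while covered != all_voxels:
        best = max(pool, key=lambda vp: len(vp.voxels - covered), default=None)
        if best is None or not (best.voxels - covered):
            break  # nothing left adds new voxels
        pool.remove(best)
        chosen.append(best)
        covered |= best.voxels

    return chosen, remaining


# Made-up example data: one target ("chair") that needs two distinct views.
candidates = [
    Viewpoint("a", frozenset({1, 2}), frozenset({"chair"})),
    Viewpoint("b", frozenset({2, 3}), frozenset({"chair"})),
    Viewpoint("c", frozenset({1, 2, 3}), frozenset()),
    Viewpoint("d", frozenset({4}), frozenset()),
]
chosen, unmet = sample_local_viewpoints(candidates, {"chair": 2})
print([vp.name for vp in chosen], unmet)
```

In this example the two semantic viewpoints `a` and `b` already cover voxels 1 to 3, so the geometric pass skips the redundant viewpoint `c` and only adds `d`. That is the sense in which a decoupled, prioritized selection can avoid repeated visits, under the assumptions stated above.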
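Takeaways, Limitations
• Takeaways:
◦ Presents a new system for efficient autonomous exploration and semantic mapping in complex environments.
◦ Develops an effective planning algorithm that accounts for multi-view observation and geometric coverage at the same time.
◦ Implements a state machine that allows aggressive exploration while maintaining safety.
◦ Validates system performance through simulation and real-world experiments.
◦ Improves mapping accuracy through seamless integration with state-of-the-art SLAM algorithms.
• Limitations:
◦ The performance of the proposed system may depend on the performance of the LiDAR-panoramic camera system used.
◦ The system's robustness to varying environmental conditions (lighting, weather, etc.) needs further validation.
◦ Further study is needed on the system's stability and durability during long-term operation in real environments.
◦ The computational cost may be high, which could make real-time processing difficult.
◦ Further study is needed on the system's applicability to particular kinds of environments (for example, extremely narrow or crowded spaces).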