Daily Arxiv

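This page compiles artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.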

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Horus: A Protocol for Trustless Delegation Under Uncertainty

Reducing Variability of Multiple Instance Learning Methods for Digital Pathology

Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions

DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues

Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs

BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining

TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Towards Safety Evaluations of Theory of Mind in Large Language Models

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks

Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information

Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers

BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning

On the Fundamental Impossibility of Hallucination Control in Large Language Models

Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems

Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach

Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling

Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization

Pre-training Large Memory Language Models with Internal and External Knowledge

Towards Universal Semantics With Large Language Models

Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations

Enhancing Robustness to Missing Modalities through Clustered Federated Learning

Perceiving Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models

LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?

Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review

SFO: Piloting VLM Feedback for Offline RL

Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks

KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

Distribution Matching for Self-Supervised Transfer Learning

A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior

SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation

AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks

A Framework for Mining Collectively-Behaving Bots in MMORPGs

Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection

A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions

A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation

GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs

There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication

Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization

NegMerge: Sign-Consensual Weight Merging for Machine Unlearning

Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Backdooring Bias (B^2) into Stable Diffusion Models

Embodied Instruction Following in Unknown Environments

Improving Consistency Models with Generator-Augmented Flows

OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors

Divergent Creativity in Humans and Large Language Models

SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Embedded AI Systems

Squat: Quant Small Language Models on the Edge

Dataset Distillation via the Wasserstein Metric

The Boolean Solution Problem from the Perspective of Predicate Logic -- Extended Version

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

World-aware Planning Narratives Enhance Large Vision-Language Model Planner

Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?

MMLU-Reason: Benchmarking Multi-Task Multi-modal Language Understanding and Reasoning

Adapting Probabilistic Risk Assessment for AI

Beating Transformers using Synthetic Cognition

MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow

Using Large Language Models to Categorize Strategic Situations and Decipher Motivations Behind Human Behaviors

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification

DREAMS: A python framework for Training Deep Learning Models on EEG Data with Model Card Reporting for Medical Applications

Human Mobility Modeling with Household Coordination Activities under Limited Information via Retrieval-Augmented LLMs

Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection

End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Towards Foundation Auto-Encoders for Time-Series Anomaly Detection

Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents

mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

MILP-SAT-GNN: Yet Another Neural SAT Solver

Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

How Do Vision-Language Models Process Conflicting Information Across Modalities?

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Probing Evaluation Awareness of Language Models

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class Classification

GPU-based complete search for nonlinear minimization subject to bounds

Enhanced Generative Model Evaluation with Clipped Density and Coverage

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America

GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction

Created by

Haebom

Authors

Ke Song, Yunhe Wu, Chunchit Siu, Huiyuan Xiong

Overview

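This paper addresses 3D semantic occupancy prediction for autonomous driving and targets three main problems in existing 3D Gaussian Splatting (3DGS) methods: (1) unified feature aggregation that ignores semantic correlations among similar categories and across regions, (2) boundary ambiguity caused by the lack of geometric constraints in MLP iterative optimization, and (3) bias in the joint optimization of dynamic and static objects. To this end, it proposes GraphGSOcc, a new framework that combines semantic and geometric graph Transformers and decouples the optimization of dynamic and static objects. Dual Gaussians Graph Attention dynamically constructs a geometric graph and a semantic graph to encode feature aggregation and semantic relationships, and a multi-scale graph attention framework optimizes boundary details and object-level topology. In addition, the model uses semantic probability distributions to separate dynamic from static objects and designs a Dynamic-Static Decoupled Gaussian Attention mechanism to improve prediction for both dynamic objects and static scenes. GraphGSOcc achieves state-of-the-art performance on the SurroundOcc-nuScenes, Occ3D-nuScenes, OpenOcc, and KITTI occupancy benchmarks. On the SurroundOcc dataset it reaches 25.20% mIoU and reduces GPU memory to 6.8GB, a 1.97% mIoU improvement and a 13.7% memory reduction compared with GaussianWorld.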
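To make the overview's description more concrete, the following is a minimal sketch of the two ideas it names: building a geometric graph and a semantic graph over a set of Gaussians, and splitting Gaussians into dynamic and static groups from their semantic probabilities. It is not the authors' implementation. The function names (`knn_graph`, `semantic_graph`, `split_dynamic_static`), the use of k-nearest neighbors and cosine similarity, and the `threshold` value are all assumptions chosen for illustration; the summary does not specify how the graphs are constructed or how the split is decided.

```python
import math

def knn_graph(centers, k):
    """Geometric graph (assumed form): link each Gaussian to its k nearest centers."""
    edges = {}
    for i, p in enumerate(centers):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(centers) if j != i)
        edges[i] = [j for _, j in dists[:k]]
    return edges

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_graph(features, k):
    """Semantic graph (assumed form): link each Gaussian to its k most similar features."""
    edges = {}
    for i, f in enumerate(features):
        sims = [(cosine(f, g), j) for j, g in enumerate(features) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first
        edges[i] = [j for _, j in sims[:k]]
    return edges

def split_dynamic_static(class_probs, dynamic_classes, threshold=0.5):
    """Mark a Gaussian dynamic when its probability mass on dynamic classes exceeds threshold."""
    dynamic, static = [], []
    for i, probs in enumerate(class_probs):
        mass = sum(probs[c] for c in dynamic_classes)
        (dynamic if mass > threshold else static).append(i)
    return dynamic, static

# Toy data: four Gaussians with 3D centers, 2D features, and two-class probabilities
# (class 0 is treated as a dynamic class such as "car", class 1 as static such as "road").
centers = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (10, 10, 0)]
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
class_probs = [(0.8, 0.2), (0.3, 0.7), (0.6, 0.4), (0.1, 0.9)]

geo = knn_graph(centers, k=2)
sem = semantic_graph(features, k=1)
dynamic_ids, static_ids = split_dynamic_static(class_probs, dynamic_classes=[0])
```

In this toy setup the two graphs generally differ: Gaussians 0 and 2 are geometric neighbors but are not semantic neighbors, which is the kind of distinction a dual-graph attention scheme could exploit. Attention would then be computed along these edges, possibly separately for the dynamic and static groups.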

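Implications and Limitations

• Implications:
◦ Presents a new framework that makes effective use of semantic and geometric information in 3D Gaussian Splatting-based occupancy prediction.
◦ Improves prediction performance by decoupling the optimization of dynamic and static objects.
◦ Models boundary details and object-level topology accurately through multi-scale graph attention.
◦ Achieves state-of-the-art performance on several benchmarks while improving GPU memory efficiency.

• Limitations:
◦ The complexity of the proposed model may increase computational cost.
◦ Because performance is optimized for specific datasets, generalization needs further validation.
◦ Robustness across diverse environments and conditions requires further study.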
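The results in the overview are reported as mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch that computes it over flat lists of per-voxel class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; each benchmark defines its own evaluation protocol (for example, which classes and voxels are counted), so this is not the evaluation code behind the quoted numbers.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example: six voxels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5, which would be reported as 50% mIoU.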
