Daily Arxiv
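This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.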
VarCoNet: A variability-aware self-supervised framework for functional connectome extraction from resting-state fMRI
KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting
SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment
Pack and Force Your Memory: Long-form and Consistent Video Generation
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
Analyzing Latent Concepts in Code Language Models
Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving
DM-Bench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management
YOLO-Based Defect Detection for Metal Sheets
Jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Putnam-like dataset summary: LLMs as mathematical competition contestants
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Observation-Free Attacks on Online Learning to Rank
MTRec: Learning to Align with User Preferences via Mental Reward Models
MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Flow-Induced Diagonal Gaussian Processes
Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs
Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization
STORI: A Benchmark and Taxonomy for Stochastic Environments
A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI
Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Quantum-RAG and PunGPT2: Advancing Low-Resource Language Generation and Retrieval for the Punjabi Language
Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective
SBP-YOLO:A Lightweight Real-Time Model for Detecting Speed Bumps and Potholes toward Intelligent Vehicle Suspension Systems
An Architecture for Spatial Networking
A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges
First Hallucination Tokens Are Different from Conditional Ones
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Model Parallelism With Subnetwork Data Parallelism
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
A Survey of Pun Generation: Datasets, Evaluations and Methodologies
Controlled Generation with Equivariant Variational Flow Matching
CAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Semantic Preprocessing for LLM-based Malware Analysis
Manipulating 3D Molecules in a Fixed-Dimensional E(3)-Equivariant Latent Space
Permissioned LLMs: Enforcing Access Control in Large Language Models
Efficient Preimage Approximation for Neural Network Certification
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Pre-training Limited Memory Language Models with Internal and External Knowledge
OT Score: An OT based Confidence Score for Source Free Unsupervised Domain Adaptation
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments
A Survey of Deep Learning for Complex Speech Spectrograms
Continuous Thought Machines
CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
Activated LoRA: Fine-tuned LLMs for Intrinsics
Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Rethinking the Vulnerability of Concept Erasure and a New Method
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations
Inferring Pluggable Types with Machine Learning
Optimizing Container Loading and Unloading through Dual-Cycling and Dockyard Rehandle Reduction Using a Hybrid Genetic Algorithm
LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing
Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders
RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
Unified Domain Adaptive Semantic Segmentation
Do AI Models Perform Human-like Abstract Reasoning Across Modalities?
Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Learning to Interact in World Latent for Team Coordination
Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
GUI-PRA: Process Reward Agent for GUI Tasks
PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Efficient & Correct Predictive Equivalence for Decision Trees
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Gala: Global LLM Agents for Text-to-Model Translation
Disentangling Multiplex Spatial-Temporal Transition Graph Representation Learning for Socially Enhanced POI Recommendation
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving
MIRROR: Modular Internal Processing for Personalized Safety in LLM Dialogue
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
ViLBias: Detecting and Reasoning about Bias in Multimodal Content
OML: A Primitive for Reconciling Open Access with Owner Control in AI Model Distribution
Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
Unified Domain Adaptive Semantic Segmentation
Created by
Haebom
Authors
Zhe Zhang, Gaochang Wu, Jing Zhang, Xiatian Zhu, Dacheng Tao, Tianyou Chai
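Abstract
Unsupervised domain adaptive semantic segmentation (UDA-SS) aims to transfer supervision from a labeled source domain to an unlabeled target domain. This work unifies UDA-SS research across image and video scenarios, enabling a more comprehensive understanding, synergistic progress, and efficient knowledge sharing. To that end, it explores unified UDA-SS from a general data augmentation perspective and presents a unifying conceptual framework that allows improved generalization and cross-pollination of ideas. Specifically, it proposes Quad-directional Mixup (QuadMix), a method that addresses distinct point attributes and feature inconsistencies through four-directional paths for intra-domain and inter-domain mixing in feature space. To handle temporal shifts in video, it incorporates optical-flow-based feature aggregation across the spatial and temporal dimensions to perform fine-grained domain alignment.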
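The four mixing directions named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each path pastes a masked region of one map onto another, that the four paths are source-onto-source, target-onto-target, source-onto-target, and target-onto-source, and that the maps are plain 2D grids standing in for feature maps. The names `mix` and `quad_directional_mix` and the shared binary mask are hypothetical choices made only for illustration.

```python
from typing import Dict, List

Grid = List[List[float]]   # stand-in for a 2D feature map (assumption)
Mask = List[List[int]]     # 1 = take the pasted map, 0 = keep the base map


def mix(base: Grid, paste: Grid, mask: Mask) -> Grid:
    """Return a grid that takes `paste` where mask is 1 and `base` elsewhere."""
    return [
        [p if m else b for b, p, m in zip(row_b, row_p, row_m)]
        for row_b, row_p, row_m in zip(base, paste, mask)
    ]


def quad_directional_mix(
    src_a: Grid, src_b: Grid, tgt_a: Grid, tgt_b: Grid, mask: Mask
) -> Dict[str, Grid]:
    """Build four mixed grids: two intra-domain and two inter-domain.

    The key "X->Y" means "region from domain X pasted onto a base from
    domain Y". This labeling is a hypothetical convention for this sketch.
    """
    return {
        "S->S": mix(src_a, src_b, mask),  # intra-domain, source
        "T->T": mix(tgt_a, tgt_b, mask),  # intra-domain, target
        "S->T": mix(tgt_a, src_a, mask),  # inter-domain, source onto target
        "T->S": mix(src_a, tgt_a, mask),  # inter-domain, target onto source
    }


if __name__ == "__main__":
    # Constant-valued toy maps make it easy to see where each cell came from.
    src_a = [[1.0, 1.0], [1.0, 1.0]]
    src_b = [[2.0, 2.0], [2.0, 2.0]]
    tgt_a = [[3.0, 3.0], [3.0, 3.0]]
    tgt_b = [[4.0, 4.0], [4.0, 4.0]]
    mask = [[1, 0], [0, 1]]
    for name, grid in quad_directional_mix(src_a, src_b, tgt_a, tgt_b, mask).items():
        print(name, grid)
```

In a real pipeline the mask would typically come from class labels or pseudo-labels and the mixing would operate on learned feature tensors; both details are outside what the summary states and are left out here.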
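Takeaways, Limitations
• Presents a unified approach to UDA-SS for both image and video scenarios, addressing the fragmentation of the research field and promoting knowledge sharing.
• Introduces Quad-directional Mixup (QuadMix), a new approach to intra-domain and inter-domain mixing in feature space.
• Handles temporal shifts in video effectively through optical-flow-based feature aggregation.
• Outperforms state-of-the-art methods on four challenging UDA-SS benchmarks.
• Releases code and models, supporting reproducibility and further research.
• Limitations are not explicitly stated in the paper.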