Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution

Created by
  • Haebom

Author

Haosong Liu, Xiancheng Zhu, Huanqiang Zeng, Jianqing Zhu, Jiuwen Cao, Junhui Hou

Outline

This paper improves on Mamba-based methods, which offer long-range information modeling with linear complexity, to optimize the computational cost and performance of light field super-resolution (LFSR). To address the inefficient and redundant feature extraction of conventional multi-directional scanning strategies when applied to complex light field (LF) data, the authors design a Subspace Simple Mamba Block (SSMB) built on a Subspace Simple Scanning (Sub-SS) strategy, achieving more efficient and accurate feature extraction. Furthermore, to overcome the limitations of the state space in preserving spatial-angular and disparity information, a two-stage modeling strategy is proposed to more comprehensively explore non-local spatial-angular correlations: in the first stage, a Spatial-Angular Residual Subspace Mamba Block (SA-RSMB) extracts shallow spatial-angular features; in the second stage, a dual-branch parallel architecture combining an Epipolar Plane Mamba Block (EPMB) and an Epipolar Plane Transformer Block (EPTB) refines deep epipolar features. Building on these modules and strategies, the authors propose LFMT, a hybrid Mamba-Transformer framework that integrates the strengths of both model families and enables comprehensive information exploration across the spatial, angular, and epipolar domains. Experimental results show that LFMT significantly outperforms existing state-of-the-art LFSR methods on both real and synthetic LF datasets while maintaining low computational complexity.
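To make the three domains concrete, here is a minimal NumPy sketch (not the paper's code; array shapes are illustrative) of how one 4D light field yields the spatial, angular, and epipolar-plane views that LFMT's modules operate on:

```python
import numpy as np

# A light field is a 4D array indexed by angular coordinates (u, v)
# and spatial coordinates (h, w). Shapes below are arbitrary examples.
U, V, H, W = 5, 5, 32, 32                  # 5x5 angular views, 32x32 pixels each
lf = np.random.rand(U, V, H, W)

# Spatial subspace: one sub-aperture image per angular position (u, v).
spatial_views = lf.reshape(U * V, H, W)                          # (25, 32, 32)

# Angular subspace: one "macro-pixel" per spatial position (h, w).
angular_views = lf.transpose(2, 3, 0, 1).reshape(H * W, U, V)    # (1024, 5, 5)

# Epipolar-plane images (EPIs): fix one angular and one spatial axis;
# disparity appears as the slope of lines in these 2D slices.
horizontal_epis = lf.transpose(0, 2, 1, 3).reshape(U * H, V, W)  # (160, 5, 32)
vertical_epis = lf.transpose(1, 3, 0, 2).reshape(V * W, U, H)    # (160, 5, 32)
```

The Sub-SS strategy and the EPMB/EPTB branches each scan one of these subspace views rather than flattening the full 4D volume, which is the source of the efficiency gain the summary describes.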

Takeaways, Limitations

Takeaways:
By improving the efficiency of the Mamba-based method, we reduce the computational cost of LFSR and improve its performance.
Sub-SS strategy and SSMB enable more efficient and accurate feature extraction.
The two-stage modeling strategy improves the preservation of spatial-angular and disparity information.
We propose an LFMT framework that combines the strengths of Mamba and Transformer, resulting in improved performance.
We achieve performance that outperforms existing state-of-the-art techniques on real and synthetic datasets.
Limitations:
Further validation of the generalization performance of the proposed method may be required.
Optimization may have been done for a specific dataset, and performance evaluation on other types of LF data is needed.
Further analysis is needed to determine the extent of the reduction in computational complexity and its effectiveness in practical applications.
The paper may lack a detailed explanation of the Sub-SS strategy's parameter settings and a discussion of how they were optimized.