Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation

Input Time Scaling

CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description

STEM: Efficient Relative Capability Evaluation of LLMs through Structured Transition Samples

AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition

Biased AI improves human decision-making but reduces trust

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model

MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving

ETA: Energy-based Test-time Adaptation for Depth Completion

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Reinitializing weights vs units for maintaining plasticity in neural networks

Each to Their Own: Exploring the Optimal Embedding in RAG

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

TolerantECG: A Foundation Model for Imperfect Electrocardiogram

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization

Enhancing Temporal Sensitivity of Large Language Model for Recommendation with Counterfactual Tuning

Multi-agent Auditory Scene Analysis

MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis

AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale

Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting

Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Neural Restoration of Greening Defects in Historical Autochrome Photographs Based on Purely Synthetic Data

Security Concerns for Large Language Models: A Survey

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

One-Layer Transformers are Provably Optimal for In-context Reasoning and Distributional Association Learning in Next-Token Prediction Tasks

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models

A Conceptual Framework for AI-based Decision Systems in Critical Infrastructures

Dominated Actions in Imperfect-Information Games

Hands-On: Segmenting Individual Signs from Continuous Sequences

PathGPT: Reframing Path Recommendation as a Natural Language Generation Task with Retrieval-Augmented Language Models

Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement

JudgeLRM: Large Reasoning Models as a Judge

Generative AI in K-12 Education: The CyberScholar Initiative

Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions

Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving

Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?

Action Engine: Automatic Workflow Generation in FaaS

The importance of visual modeling languages in generative software engineering

Identity Preserving 3D Head Stylization with Multiview Score Distillation

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

Testing Components of the Attention Schema Theory in Artificial Neural Networks

A Little Human Data Goes A Long Way

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Social Debiasing for Fair Multi-modal LLMs

A Comprehensive Benchmark on Spectral GNNs: The Impact on Efficiency, Memory, and Effectiveness

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Enhancing Depression-Diagnosis-Oriented Chat with Psychological State Tracking

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning

Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users

Nash Convergence of Mean-Based Learning Algorithms in First-Price Auctions

TASER: Table Agents for Schema-guided Extraction and Recommendation

Modeling Relational Logic Circuits for And-Inverter Graph Convolutional Network

EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations

EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN)

The NordDRG AI Benchmark for Large Language Models

Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Unsupervised Learning for Quadratic Assignment

Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents

Benchmarking graph construction by large language models for coherence-driven inference

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Graph Structure Learning with Temporal Graph Information Bottleneck for Inductive Representation Learning

$TIME[t] \subseteq SPACE[O(\sqrt{t})]$ via Tree Height Compression

Long Chain-of-Thought Reasoning Across Languages

From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning

Evaluating Retrieval-Augmented Generation vs. Long-Context Input for Clinical Reasoning over EHRs

TransLight: Image-Guided Customized Lighting Control with Generative Decoupling

DINOv3 with Test-Time Training for Medical Image Registration

MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow

TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use

Cross-Modality Controlled Molecule Generation with Diffusion Language Model

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

AFABench: A Generic Framework for Benchmarking Active Feature Acquisition

Emerson-Lei and Manna-Pnueli Games for LTLf+ and PPLTL+ Synthesis

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation

Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

ELATE: Evolutionary Language model for Automated Time-series Engineering

OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service

Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems

Mamba2 Meets Silence: Robust Vocal Source Separation for Sparse Regions

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

Created by

Haebom

Author

Suchisrit Gangopadhyay, Jung-Hee Kim, Xien Chen, Patrick Rim, Hyoungseob Park, Alex Wong

Outline

This paper proposes a method for adapting foundational monocular depth estimators (FMDEs), trained on conventional perspective images, to fisheye images. Despite being trained on tens of millions of images, FMDEs are susceptible to the covariate shift introduced by changes in camera calibration (intrinsic and distortion) parameters, which leads to incorrect depth estimates. The proposed method aligns the distribution of latent embeddings encoding fisheye images with that of perspective images, so that FMDEs can be reused on fisheye cameras without retraining or fine-tuning. To achieve this, the authors introduce a set of calibration tokens, a lightweight adaptation mechanism that adjusts the latent embeddings to bring them into alignment. The hypothesis is that, by leveraging the already expressive latent space of FMDEs, the method avoids the negative effects of conventional recalibration or map projection from image space to a standard reference frame. Training is self-supervised and uses a large, publicly available perspective image dataset, so no fisheye images are required: perspective images are recalibrated into fisheye images, and consistency between the estimates for the two is enforced during training. The approach is evaluated in both indoor and outdoor environments with multiple FMDEs and shows consistent improvements over state-of-the-art methods with just a single set of tokens. The code is available at https://github.com/JungHeeKim29/calibration-token
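The step of recalibrating perspective images into fisheye images can be pictured as a per-pixel coordinate lookup. The sketch below is only an illustration under stated assumptions: it uses the equidistant fisheye model (r = f·θ) and a pinhole perspective model (r = f·tan θ) with a shared principal point. The summary does not say which distortion model the paper uses, and the function name and parameters here are hypothetical.

```python
import math


def fisheye_to_perspective_pixel(u, v, cx, cy, f_fisheye, f_perspective):
    """Find the perspective pixel that sees the same ray as fisheye pixel (u, v).

    Assumption (not from the paper summary): equidistant fisheye, r = f * theta,
    and pinhole perspective, r = f * tan(theta), sharing principal point (cx, cy).
    Returns None when the ray is 90 degrees or more off the optical axis,
    because a pinhole camera cannot image it.
    """
    dx, dy = u - cx, v - cy
    r_fisheye = math.hypot(dx, dy)
    if r_fisheye == 0.0:
        return (cx, cy)  # the optical axis maps to itself
    theta = r_fisheye / f_fisheye  # angle between the ray and the optical axis
    if theta >= math.pi / 2:
        return None
    r_perspective = f_perspective * math.tan(theta)
    scale = r_perspective / r_fisheye  # radial stretch, direction unchanged
    return (cx + dx * scale, cy + dy * scale)
```

To synthesize a fisheye image, one would evaluate this lookup for every output pixel and sample the perspective image at the returned location, leaving pixels that return None empty.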

Takeaways, Limitations

• Takeaways:
  ◦ Extending existing monocular depth estimation models to fisheye images broadens the range of applications that can use fisheye cameras.
  ◦ Lightweight calibration tokens adapt the model to fisheye images without retraining or fine-tuning.
  ◦ Operating in latent space, rather than transforming the image, makes adaptation efficient and reduces artifacts.
  ◦ Self-supervised training removes the need for a fisheye image dataset.

• Limitations:
  ◦ Further research is needed to establish how well the calibration tokens generalize, including across different fisheye camera models and distortion levels.
  ◦ Performance may depend on the FMDEs and perspective image datasets used.
  ◦ Additional evaluation on real fisheye image datasets may be required.
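The idea of adapting a frozen model through a few learnable tokens can be shown with a deliberately tiny toy. Everything in this sketch is an assumption made for illustration and is not the paper's architecture or loss: scalars stand in for embedding vectors, the "encoder" is mean pooling followed by a fixed weight, and a single token is prepended to the fisheye sequence and trained by gradient descent so that its embedding matches the perspective one. The encoder weight is never updated.

```python
def encode(sequence, weight):
    """Frozen toy encoder: mean-pool the sequence, then scale by a fixed weight."""
    return weight * sum(sequence) / len(sequence)


def fit_calibration_token(fisheye_patches, perspective_patches, weight,
                          steps=500, lr=0.5):
    """Learn one scalar token that aligns the fisheye embedding with the
    perspective embedding. Only the token changes; the encoder stays frozen."""
    target = encode(perspective_patches, weight)  # reference embedding
    token = 0.0
    n = len(fisheye_patches) + 1  # sequence length once the token is prepended
    for _ in range(steps):
        error = encode([token] + fisheye_patches, weight) - target
        grad = 2.0 * error * weight / n  # derivative of error**2 w.r.t. token
        token -= lr * grad
    return token


# Hypothetical patch values: the fisheye sequence is a shifted version
# of the perspective one, standing in for calibration-induced covariate shift.
perspective = [1.0, 2.0, 3.0, 4.0]
fisheye = [0.5, 1.5, 2.0, 3.0]
weight = 0.8

token = fit_calibration_token(fisheye, perspective, weight)
gap_before = abs(encode([0.0] + fisheye, weight) - encode(perspective, weight))
gap_after = abs(encode([token] + fisheye, weight) - encode(perspective, weight))
```

In this toy the embedding gap shrinks after fitting while `weight` is untouched, which mirrors the claim that adaptation happens in latent space without retraining the estimator.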
