Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Modeling Saliency Dataset Bias

Created by
  • Haebom

Author

Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge

Outline

Despite recent advances in image-based saliency prediction, predicting visual fixations across multiple datasets remains challenging, and the paper demonstrates that this is due to dataset bias: models trained on one dataset degrade significantly when applied to others. Increasing dataset diversity does not close this inter-dataset gap, and roughly 60% of it is attributable to dataset-specific biases. To address this generalization gap, the authors propose a novel architecture built on a dataset-independent encoder-decoder, extended with fewer than 20 dataset-specific parameters that control interpretable mechanisms such as multi-scale structure, center bias, and fixation spread. Adapting these parameters alone to new data closes more than 75% of the generalization gap, with substantial improvements from as few as 50 samples. The model sets a new state of the art on three datasets of the MIT/Tübingen Saliency Benchmark (MIT300, CAT2000, and COCO-Freeview) and also generalizes well to unrelated datasets. In addition, it reveals nontrivial multi-scale effects combining absolute and relative image size, offering new insights into the nature of spatial saliency.
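
To make the mechanism concrete, below is a minimal PyTorch sketch of the idea as summarized above: a shared, dataset-independent encoder-decoder plus a handful of per-dataset scalars controlling multi-scale weighting, center bias, and fixation spread. All names, shapes, and the exact parameterization (`DatasetParams`, `SaliencyModel`, `scale_logits`, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the summarized design; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DatasetParams(nn.Module):
    """The small per-dataset parameter set (< 20 scalars) described above."""
    def __init__(self, num_scales: int = 3):
        super().__init__()
        self.scale_logits = nn.Parameter(torch.zeros(num_scales))  # multi-scale weighting
        self.center_bias_weight = nn.Parameter(torch.tensor(1.0))  # strength of center bias
        self.center_bias_sigma = nn.Parameter(torch.tensor(0.5))   # width of center bias
        self.spread_log_sigma = nn.Parameter(torch.tensor(0.0))    # fixation-spread blur width

def center_bias(h: int, w: int, sigma: torch.Tensor) -> torch.Tensor:
    """Isotropic Gaussian prior over normalized image coordinates (log space)."""
    ys = torch.linspace(-1, 1, h, device=sigma.device).view(h, 1)
    xs = torch.linspace(-1, 1, w, device=sigma.device).view(1, w)
    return -(ys ** 2 + xs ** 2) / (2 * sigma ** 2)

def gaussian_blur(x: torch.Tensor, sigma: torch.Tensor, radius: int = 7) -> torch.Tensor:
    """Separable Gaussian blur modeling fixation spread."""
    t = torch.arange(-radius, radius + 1, dtype=x.dtype, device=x.device)
    k = torch.exp(-(t ** 2) / (2 * sigma ** 2))
    k = k / k.sum()
    x = F.conv2d(x, k.view(1, 1, 1, -1), padding=(0, radius))
    return F.conv2d(x, k.view(1, 1, -1, 1), padding=(radius, 0))

class SaliencyModel(nn.Module):
    """Shared encoder-decoder; everything dataset-specific lives in DatasetParams."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # maps an image to a 1-channel saliency readout

    def forward(self, image: torch.Tensor, p: DatasetParams) -> torch.Tensor:
        b, _, h, w = image.shape
        weights = torch.softmax(p.scale_logits, dim=0)
        readout = 0.0
        for i, wgt in enumerate(weights):  # evaluate the shared backbone at several scales
            scaled = F.interpolate(image, scale_factor=1 / 2 ** i,
                                   mode="bilinear", align_corners=False)
            m = self.backbone(scaled)
            readout = readout + wgt * F.interpolate(m, size=(h, w),
                                                    mode="bilinear", align_corners=False)
        logits = readout + p.center_bias_weight * center_bias(h, w, p.center_bias_sigma)
        density = torch.softmax(logits.flatten(1), dim=1).view(b, 1, h, w)
        density = gaussian_blur(density, torch.exp(p.spread_log_sigma))
        return torch.log(density + 1e-12)  # log fixation density
```

The point of this structure is that everything dataset-dependent is isolated in `DatasetParams`, so adapting to a new dataset never needs to touch the shared backbone.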

Takeaways, Limitations

  • Confirms that dataset bias is the main cause of the poor cross-dataset generalization of saliency prediction models.
  • Presents a novel architecture that effectively closes the performance gap between datasets using only a small number of dataset-specific parameters.
  • Demonstrates that model performance improves substantially with only a handful of samples (see the adaptation sketch after this list).
  • Achieves SOTA on three datasets of the MIT/Tübingen Saliency Benchmark.
  • Provides new insights into the nature of spatial saliency.
  • The paper does not specify detailed architectural or implementation limitations.
  • Generalization performance on additional datasets requires further validation.
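
Continuing the sketch above (and reusing its `SaliencyModel` and `DatasetParams`), a plausible few-shot adaptation loop would freeze the shared weights and fit only the per-dataset parameters on a small set of image/fixation pairs; the loss, optimizer, and step count here are illustrative assumptions.

```python
# Continues the hypothetical sketch above; reuses SaliencyModel and DatasetParams.
import torch

def adapt_to_new_dataset(model: SaliencyModel, samples, steps: int = 200) -> DatasetParams:
    """samples: iterable of (image, fixation_map) pairs, e.g. ~50 of them."""
    for q in model.parameters():
        q.requires_grad_(False)        # the shared encoder-decoder stays frozen
    params = DatasetParams()           # < 20 freshly initialized trainable scalars
    opt = torch.optim.Adam(params.parameters(), lr=0.05)
    for _ in range(steps):
        for image, fixations in samples:
            log_density = model(image, params)
            # Negative log-likelihood of the observed fixations under the
            # predicted density -- a common saliency training objective.
            loss = -(fixations * log_density).sum() / fixations.sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return params
```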