Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models

Created by
  • Haebom

Authors

Konstantin Donhauser, Kristina Ulicna, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Cian Eastwood, Jason Hartford

Outline

This paper explores whether sparse dictionary learning (DL), which has emerged as a powerful method for extracting semantically meaningful concepts from large language models (LLMs) trained on text, can also be applied to scientific data that are difficult for humans to interpret, such as vision foundation models trained on cell microscopy images. The authors propose combining a sparse DL algorithm, Iterative Codebook Feature Learning (ICFL), with a PCA whitening preprocessing step derived from control data. The combined approach retrieves biologically meaningful concepts, such as cell types and genetic perturbations, and reveals subtle morphological changes caused by human-interpretable interventions, suggesting a promising direction for scientific discovery through mechanistic interpretability of biological images.
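The PCA whitening step derived from control data can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes model embeddings are stored as rows of a 2-D array, and the function names `fit_control_whitening` and `apply_whitening`, the `eps` regularizer, and the synthetic data are all hypothetical. ICFL itself is not implemented here.

```python
import numpy as np


def fit_control_whitening(control, eps=1e-6):
    """Fit PCA whitening statistics on control embeddings only.

    control: array of shape (n_control, d).
    Returns the control mean (d,) and a whitening matrix W of shape (d, d).
    """
    mean = control.mean(axis=0)
    centered = control - mean
    # Sample covariance of the control embeddings.
    cov = centered.T @ centered / (control.shape[0] - 1)
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Divide each eigenvector (column) by the square root of its eigenvalue
    # so every principal direction has unit variance on the controls.
    # eps (an assumed regularizer) guards against tiny eigenvalues.
    W = eigvecs / np.sqrt(eigvals + eps)
    return mean, W


def apply_whitening(x, mean, W):
    """Center with the control mean and project with the control whitening matrix."""
    return (x - mean) @ W


# Synthetic stand-ins for embeddings (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
scales = np.linspace(0.5, 3.0, 8)
mixing = np.eye(8) + 0.3
control_emb = (rng.normal(size=(500, 8)) * scales) @ mixing
perturbed_emb = (rng.normal(size=(200, 8)) * scales) @ mixing + 0.5

mean, W = fit_control_whitening(control_emb)
control_white = apply_whitening(control_emb, mean, W)
perturbed_white = apply_whitening(perturbed_emb, mean, W)
```

In this sketch the statistics come only from the control samples, so after the transform the controls have approximately identity covariance, and differences in the perturbed samples are measured relative to control variation. The whitened vectors would then be the input to a sparse dictionary learning algorithm.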

Takeaways, Limitations

Takeaways:
  • Demonstrates that sparse dictionary learning (DL) can extract meaningful concepts from scientific data that is difficult for humans to interpret (e.g., cell microscopy images).
  • Retrieves biologically meaningful concepts (cell types, genetic perturbations, etc.) by combining ICFL with a PCA whitening preprocessing step.
  • Opens up new possibilities for scientific discovery through mechanistic interpretability of biological images by revealing subtle morphological changes resulting from human-interpretable interventions.
Limitations:
  • Further study is needed on how well the proposed method generalizes and whether it applies to other types of scientific data.
  • A sensitivity analysis is needed for the choice of control data and the PCA whitening parameters.
  • The biological meaning of the extracted concepts needs deeper interpretation and validation.