Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Celler: A Genomic Language Model for Long-Tailed Single-Cell Annotation

Created by
  • Haebom

Authors

Huan Zhao, Yiming Liu, Jina Yao, Ling Xiong, Zexin Zhou, Zixing Zhang

Outline

In this paper, we present Celler, a state-of-the-art generative dictionary learning model for efficient annotation of single-cell data related to human diseases. Celler utilizes the Gaussian Inflation (GInf) loss function and a Hard Data Mining (HDM) strategy to enhance learning of rare categories and reduce the risk of overfitting to common categories. Furthermore, we build Celler-75, a large-scale single-cell dataset containing 40 million cells across 80 human tissues and 75 specific diseases, providing crucial support for exploring the potential of single-cell technology. The source code is available on GitHub.
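The summary does not give the formulas behind the GInf loss or the HDM strategy, so the following is only a generic sketch of the two ideas they are described as serving: weighting rare categories more heavily, and focusing on the samples the model currently gets most wrong. It swaps in plain inverse-frequency weighting and top-k loss selection, which are standard long-tail techniques and not the paper's method. All function names, labels, and probabilities are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class count.

    Assumption: a simple stand-in for rare-class reweighting,
    NOT the GInf loss from the paper. Weights are scaled so the
    average weight over all samples equals 1.
    """
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    scale = len(labels) / sum(raw[y] for y in labels)
    return {cls: w * scale for cls, w in raw.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Per-sample negative log-likelihood, scaled by the class weight."""
    return [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]


def hardest_indices(losses, k):
    """Indices of the k samples with the largest loss.

    Assumption: top-k selection as a generic hard-example mining
    step, NOT the HDM strategy from the paper.
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy long-tailed batch: four common cells and one rare cell.
    labels = ["common"] * 4 + ["rare"]
    probs = [{"common": 0.9, "rare": 0.1}] * 4 + [{"common": 0.6, "rare": 0.4}]

    weights = inverse_frequency_weights(labels)
    losses = weighted_cross_entropy(probs, labels, weights)
    print(weights)
    print(hardest_indices(losses, k=1))
```

In this toy batch the rare class receives a larger weight than the common class, so its misclassified sample dominates the loss and is the one selected for further training.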

Takeaways, Limitations

Takeaways:
Presents a novel method for effectively annotating disease-related single-cell data, including rare categories.
Improves model performance with the GInf loss function and the HDM strategy.
Enables further research by releasing the large-scale single-cell dataset Celler-75.
Limitations:
Further validation of the balance and representativeness of the Celler-75 dataset is needed.
Further research is needed on the generalization performance of the GInf loss function and the HDM strategy.
The applicability of the model to other single-cell datasets needs to be evaluated.