Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

CiteBART: Learning to Generate Citations for Local Citation Recommendations

Created by
  • Haebom

Author

Ege Yiğit Çelik, Selma Tekir

Outline

This paper presents a novel generative approach to local citation recommendation (LCR), performing citation-specific pretraining within an encoder-decoder architecture. Two variants are proposed, both trained to reconstruct masked author-date citation tokens. The first, CiteBART-Base, uses only the local context surrounding the citation; the second, CiteBART-Global, enriches the training signal by adding the titles and abstracts of the citing papers. CiteBART-Global achieves state-of-the-art performance on most LCR benchmarks, with the model trained on the RefSeer benchmark performing best. The paper also provides detailed experiments and analyses of CiteBART-Global's generalization ability and hallucination tendency.
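The core pretraining objective is simple: mask an author-date citation token in its local context and train an encoder-decoder model to generate it back. Below is a minimal sketch of that idea using Hugging Face BART; the masking regex, example text, and checkpoint choice are illustrative assumptions, not the authors' released code or exact setup.

```python
import re
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def mask_citation(context: str) -> str:
    """Replace an author-date citation like '(Lewis et al., 2020)' with <mask>.
    (Hypothetical helper; the paper's actual token handling may differ.)"""
    return re.sub(r"\([A-Z][^()]*\d{4}\)", tokenizer.mask_token, context, count=1)

# Local context around a citation. CiteBART-Global would additionally
# prepend the citing paper's title and abstract to this input.
context = ("Denoising sequence-to-sequence pretraining has proven effective "
           "for text generation (Lewis et al., 2020), motivating our approach.")
target = "(Lewis et al., 2020)"

inputs = tokenizer(mask_citation(context), return_tensors="pt")
labels = tokenizer(text_target=target, return_tensors="pt").input_ids

# Training step: the decoder learns to reconstruct the masked citation.
loss = model(**inputs, labels=labels).loss
loss.backward()

# At inference, the generated text serves as the citation recommendation.
pred_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```

Generating the citation string directly, rather than ranking a fixed candidate list, is also why the paper's hallucination analysis matters: nothing constrains the decoder to emit only citations that actually exist in the corpus.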

Takeaways, Limitations

Takeaways:
Generative pretraining improves LCR performance.
CiteBART-Global achieves SOTA on most benchmarks.
The model trained on the RefSeer benchmark performs best.
Cross-dataset generalization is verified.
The model exhibits a low hallucination rate (MaHR).
Limitations:
The benefit of generative pretraining is difficult to verify on small datasets such as FullTextPeerRead.
Further research on the hallucination phenomenon is needed.