Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, simply cite the source.

Fine-Grained Detection of AI-Generated Text Using Sentence-Level Segmentation

Created by
  • Haebom

Authors

Lekkala Sai Teja, Annepaka Yadagiri, Partha Pakray, Chukhu Chunka, Mangadoddi Srikar Vardhan

Outline

This paper proposes a sentence-level sequence labeling model to overcome the limitations of existing AI text detectors. Document-level classifiers struggle with text that mixes human and AI writing or that has been lightly edited after generation. The proposed model instead uses subtle linguistic cues across sentences to detect transitions between AI-generated and human-written text. It combines recent Transformer models with neural network (NN) layers and conditional random fields (CRFs) to segment AI-generated text at token-level granularity. The authors run experiments on two public benchmark datasets and validate the model through comparisons with existing state-of-the-art models and through ablation studies. The source code and processed datasets are available in the authors' GitHub repository.
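The CRF step in a sequence labeler of this kind can be illustrated with a small decoding sketch. The summary above does not describe the authors' implementation, so what follows is generic linear-chain CRF (Viterbi) decoding in plain Python, not their code. The label set `LABELS`, the function names `viterbi_decode` and `segments`, and all score values are hypothetical. In a full model, the emission scores would come from the Transformer and NN layers; here they are hand-written numbers.

```python
from typing import List, Sequence, Tuple

# Assumed two-class label set; the paper's actual tag scheme may differ.
LABELS = ("human", "ai")


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][k]   : score of label k at position t (from an encoder)
    transitions[j][k] : score of moving from label j to label k
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            best_prev = max(
                range(n_labels), key=lambda j: score[j] + transitions[j][k]
            )
            new_score.append(score[best_prev] + transitions[best_prev][k] + emit[k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


def segments(path: Sequence[int]) -> List[Tuple[int, int, str]]:
    """Collapse a label path into (start, end_exclusive, label) runs."""
    runs: List[Tuple[int, int, str]] = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, LABELS[path[start]]))
            start = i
    return runs


# Illustrative scores only: position 2 weakly favors "ai" on its own.
emissions = [[2.0, 0.0], [2.0, 0.0], [0.4, 0.6], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
# Staying in the same label is rewarded, switching is penalized.
transitions = [[0.5, -0.5], [-0.5, 0.5]]

path = viterbi_decode(emissions, transitions)
print(path)            # [0, 0, 0, 0, 1, 1]
print(segments(path))  # [(0, 4, 'human'), (4, 6, 'ai')]
```

Taking the highest-scoring label at each position independently would give `[0, 0, 1, 0, 1, 1]`, with a spurious one-position "ai" span. The transition scores smooth that out and leave a single human-to-AI boundary. This is the usual motivation for placing a CRF on top of per-position classifiers.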

Takeaways, Limitations

Takeaways:
Addresses the low accuracy of existing AI text detection models on mixed or lightly modified texts.
Distinguishes AI-generated from human-written passages accurately through sentence-level analysis.
Presents an effective model architecture combining Transformer, NN, and CRF components.
Supports reproducibility and follow-up research by releasing the code and datasets openly.
Limitations:
The proposed model's performance may depend on the datasets used.
Performance may degrade as new AI text generation models emerge.
Further research is needed on generalization performance across texts in different languages and styles.