Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Diversity Boosts AI-Generated Text Detection

Created by
  • Haebom

Authors

Advik Raj Basani, Pin-Yu Chen

Outline

AI-generated text detection is becoming increasingly important for preventing misuse of LLMs in education, business compliance, journalism, and social media. Previous detectors often rely on token-level likelihoods or opaque black-box classifiers, which struggle against high-quality generations and offer poor interpretability. In this study, we propose DivEye, a novel detection framework that uses surprisal-based features to capture how unpredictability varies across a text. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM output, DivEye captures this signal through a set of interpretable statistical features. The proposed method outperforms existing zero-shot detectors by up to 33.2% and is competitive with fine-tuned baselines across multiple benchmarks. DivEye is robust against paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye provides interpretable insights into why a text is flagged, pointing to rhythmic unpredictability as a powerful and understudied signal for LLM detection.
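To make the idea of surprisal-based variability features concrete, here is a minimal sketch. It is not DivEye's actual feature set, which this summary does not list. It assumes the per-token probabilities have already been obtained from some language model, and the function name `surprisal_features` and the four statistics it returns are hypothetical choices made for illustration only.

```python
import math
from statistics import mean, pstdev


def surprisal_features(token_probs):
    """Summarize how per-token surprisal varies across a text.

    token_probs: probabilities in (0, 1] that a language model assigned
    to each observed token (assumed to be computed elsewhere).
    The statistics below are illustrative, not the paper's feature set.
    """
    if len(token_probs) < 2:
        raise ValueError("need at least two tokens")
    # Surprisal of a token is -log p(token | context), here in nats.
    surprisal = [-math.log(p) for p in token_probs]
    # First differences describe how abruptly unpredictability changes
    # from one token to the next.
    diffs = [b - a for a, b in zip(surprisal, surprisal[1:])]
    return {
        "mean": mean(surprisal),
        "std": pstdev(surprisal),
        "diff_std": pstdev(diffs),
        "max_jump": max(abs(d) for d in diffs),
    }


# Toy inputs: a uniformly predictable sequence and a "bursty" one.
flat = surprisal_features([0.5, 0.5, 0.5, 0.5])
bursty = surprisal_features([0.9, 0.1, 0.9, 0.1])
```

In this toy setup the flat sequence has no spread in surprisal, while the bursty one has a large spread and large token-to-token jumps. A detector in this spirit would feed such statistics to a simple classifier; which statistics and which classifier work well is an empirical question that this sketch does not answer.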

Takeaways, Limitations

Takeaways:
DivEye outperforms existing zero-shot detectors by up to 33.2% and is competitive with fine-tuned baselines.
Its interpretable statistical features provide insight into why a text is flagged.
It is robust against paraphrasing and adversarial attacks.
It generalizes well across domains and models.
It improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal.
It shows that rhythmic unpredictability is a powerful and understudied signal for LLM detection.
Limitations:
The paper does not state any specific limitations.