Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Benchmark of stylistic variation in LLM-generated texts

Created by
  • Haebom

Authors

Jiří Milička, Anna Marklová, Václav Cvrček

Outline

This study investigates register variation between texts written by humans and comparable texts generated by large language models (LLMs). We apply Biber's multidimensional analysis (MDA) to samples of human-written texts and their AI-generated counterparts to identify the dimensions of variation along which LLMs differ most significantly and most systematically from humans. The textual material is AI-Brown, a new LLM-generated corpus comparable to BE-21, a Brown family corpus representing contemporary British English. Because all languages other than English are underrepresented in the training data of frontier LLMs, we replicate the analysis for Czech using the AI-Koditex corpus and a Czech multidimensional model. We examine 16 frontier models across a variety of settings and prompts, with emphasis on the differences between base models and instruction-tuned models. The result is a benchmark that allows models to be compared with one another and ranked on interpretable dimensions.
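To make the idea of ranking models on an interpretable dimension concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes a single hypothetical dimension, hypothetical feature names, and made-up feature rates, and it follows the general MDA convention of scoring a text by summing the standardized rates of features with positive loadings and subtracting those with negative loadings. Standardizing against the human texts is also an assumption made for this illustration.

```python
from statistics import mean, pstdev

# Hypothetical feature rates per 1,000 words; NOT data from the paper.
TEXTS = [
    {"source": "human",   "first_person": 30.0, "contractions": 20.0, "nominalizations": 10.0},
    {"source": "human",   "first_person": 10.0, "contractions": 4.0,  "nominalizations": 30.0},
    {"source": "model_a", "first_person": 25.0, "contractions": 16.0, "nominalizations": 15.0},
    {"source": "model_a", "first_person": 15.0, "contractions": 8.0,  "nominalizations": 25.0},
    {"source": "model_b", "first_person": 5.0,  "contractions": 0.0,  "nominalizations": 40.0},
    {"source": "model_b", "first_person": 10.0, "contractions": 4.0,  "nominalizations": 35.0},
]

# Hypothetical dimension: which features load positively and negatively.
POSITIVE = ["first_person", "contractions"]
NEGATIVE = ["nominalizations"]


def dimension_scores(texts, positive, negative, reference="human"):
    """Score each text on one dimension using z-scores computed
    relative to the reference (human) texts."""
    ref = [t for t in texts if t["source"] == reference]
    stats = {}
    for feat in positive + negative:
        values = [t[feat] for t in ref]
        stats[feat] = (mean(values), pstdev(values))

    def z(text, feat):
        mu, sigma = stats[feat]
        return (text[feat] - mu) / sigma if sigma else 0.0

    return [
        sum(z(t, f) for f in positive) - sum(z(t, f) for f in negative)
        for t in texts
    ]


def rank_by_gap(texts, scores, reference="human"):
    """Rank non-reference sources by how far their mean dimension
    score lies from the reference mean (smallest gap first)."""
    by_source = {}
    for text, score in zip(texts, scores):
        by_source.setdefault(text["source"], []).append(score)
    ref_mean = mean(by_source.pop(reference))
    gaps = {src: abs(mean(vals) - ref_mean) for src, vals in by_source.items()}
    return sorted(gaps.items(), key=lambda item: item[1])


scores = dimension_scores(TEXTS, POSITIVE, NEGATIVE)
ranking = rank_by_gap(TEXTS, scores)
for source, gap in ranking:
    print(f"{source}: gap from human mean = {gap:.2f}")
```

In this toy setup a smaller gap means the model's texts sit closer to the human texts on that dimension. A real benchmark would use the dimensions and loadings of an established multidimensional model rather than hand-picked features.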

Takeaways, Limitations

Takeaways: We quantify the register differences between LLM-generated and human-written text and provide a benchmark on which models can be compared. By covering two languages (English and Czech) and a range of model settings and prompts, we give a broad picture of how well LLMs control register. We compare base models with instruction-tuned models to suggest directions for model improvement.
Limitations: The generalizability of the results may be limited by the size and diversity of the corpora used in the analysis. The analysis covers only the models available at the time of the study, so the benchmark needs ongoing updates as new models appear. Linguistic properties other than register variation were not analyzed. The results may also depend on the particular MDA models used.