Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection

Created by
  • Haebom

Authors

Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov

Outline

In this paper, we present $\textbf{\texttt{DroidCollection}}$, the most extensive open data suite for training and evaluating machine-generated code detectors. $\texttt{DroidCollection}$ comprises over a million code samples spanning seven programming languages, outputs from 43 coding models, and at least three real-world coding domains. In addition to fully AI-generated samples, it includes human-AI co-authored code and adversarial samples explicitly crafted to evade detection. We then develop $\textbf{\texttt{DroidDetect}}$, a suite of encoder-only detectors trained with a multi-task objective on $\texttt{DroidCollection}$. Our experiments show that existing detectors fail to generalize beyond their narrow training data to diverse coding domains and programming languages. Furthermore, most detectors are easily compromised when the output distribution is humanized through superficial prompting and alignment approaches, but training on a small amount of adversarial data readily fixes this. Finally, we demonstrate that metric learning and uncertainty-based resampling improve detector training on potentially noisy distributions.
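The summary does not say how uncertainty-based resampling is implemented, so the following is only a minimal sketch of one plausible reading: score each training sample by the entropy of a detector's predicted class distribution and resample so that confidently predicted samples are drawn more often than uncertain (possibly mislabeled) ones. The function names, the entropy-based weighting, and the weight floor of `1e-6` are all assumptions made for illustration, not the authors' procedure.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def resample_by_uncertainty(samples, probs_per_sample, k, seed=0):
    """Draw k samples with replacement, favouring low-entropy predictions.

    Assumption: down-weighting uncertain samples is one possible way to
    handle noisy labels; the paper may weight or filter differently.
    """
    num_classes = len(probs_per_sample[0])
    max_entropy = math.log(num_classes)  # entropy of a uniform prediction
    # Weight is 1.0 for a fully confident prediction and is floored at
    # 1e-6 for a uniform (maximally uncertain) one.
    weights = [
        max(1e-6, 1.0 - predictive_entropy(p) / max_entropy)
        for p in probs_per_sample
    ]
    rng = random.Random(seed)
    return rng.choices(samples, weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical snippets with hypothetical (human, AI) probabilities.
    snippets = ["snippet_a", "snippet_b", "snippet_c"]
    predictions = [[0.99, 0.01], [0.55, 0.45], [0.10, 0.90]]
    print(resample_by_uncertainty(snippets, predictions, k=5))
```

In this sketch, a sample predicted at `[0.55, 0.45]` receives a weight close to zero and is rarely drawn, while the two confident samples dominate the resampled set. Inverting the weights would instead prioritize hard examples, which is an equally valid design choice depending on whether the uncertainty is attributed to label noise or to genuinely difficult inputs.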

Takeaways, Limitations

Takeaways:
Provides a large open dataset ($\texttt{DroidCollection}$) for training and evaluating machine-generated code detectors.
Presents a new detector suite ($\texttt{DroidDetect}$) that improves generalization across coding domains and programming languages.
Shows that training on adversarial examples improves detector robustness.
Shows that metric learning and uncertainty-based resampling improve detector performance.
Limitations:
Despite the data diversity of $\texttt{DroidCollection}$, it may not completely cover all real-world coding domains and programming languages.
The proposed detectors' performance depends on the training dataset, and they may be vulnerable to new types of code generation models or adversarial attacks.
Further research is needed on adversarial example generation and defense strategies.
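
The metric-learning takeaway listed above can be illustrated with a triplet margin loss, a common metric-learning objective. The summary does not state which objective the authors use, so this choice, the Euclidean distance, the margin value, and the function names are assumptions for illustration only. The idea is that an embedding of a code sample (the anchor) should sit closer to another sample of the same class (the positive) than to a sample of a different class (the negative).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero when the negative is farther from the anchor than the
    positive by at least `margin`; otherwise grows with the violation."""
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: the same-class sample is near the
    # anchor and the other-class sample is far away, so the loss is zero.
    print(triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
```

A detector trained with such a term alongside its classification loss is pushed to separate human-written and machine-generated code in embedding space, which is one way a multi-task objective could be assembled.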