In this paper, we present DroidCollection, the most comprehensive open dataset for training and evaluating machine-generated code detectors. DroidCollection comprises over a million code samples spanning seven programming languages, outputs from 43 coding models, and at least three real-world coding domains. Beyond fully AI-generated samples, it also includes code co-written by humans and AI, as well as adversarial samples explicitly crafted to evade detection. Building on DroidCollection, we develop DroidDetect, a suite of encoder-based detectors trained with multi-task objectives. Our experiments show that existing detectors fail to generalize beyond their narrow training distributions to diverse coding domains and programming languages. Furthermore, while most detectors are easily evaded by humanizing model outputs through superficial prompting and alignment-based approaches, we show that training on a small amount of adversarial data readily addresses this issue. Finally, we demonstrate that metric learning and uncertainty-based resampling are effective for improving detector training under potentially noisy data distributions.
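To make the multi-task training mentioned above more concrete, the following is a minimal illustrative sketch (in PyTorch) of a detection objective that pairs a standard cross-entropy classification loss with a simple pairwise contrastive metric-learning term. This is not the exact DroidDetect formulation; the function and parameter names (multitask_detection_loss, metric_weight, margin) and the specific contrastive loss are assumptions made only for illustration.

```python
# Illustrative sketch, not the authors' implementation: a multi-task objective
# combining origin classification with a metric-learning term over embeddings.
import torch
import torch.nn.functional as F


def multitask_detection_loss(embeddings: torch.Tensor,
                             logits: torch.Tensor,
                             labels: torch.Tensor,
                             metric_weight: float = 0.5,
                             margin: float = 1.0) -> torch.Tensor:
    """Cross-entropy detection loss plus a pairwise contrastive term.

    embeddings: (batch, dim) encoder representations of code samples
    logits:     (batch, num_classes) classifier outputs (e.g., human / AI / co-written)
    labels:     (batch,) integer class labels
    """
    # Task 1: classify the origin of each code sample.
    ce_loss = F.cross_entropy(logits, labels)

    # Task 2 (metric learning): pull same-class embeddings together and push
    # different-class embeddings at least `margin` apart.
    emb = F.normalize(embeddings, dim=-1)
    dists = torch.cdist(emb, emb)                              # pairwise distances
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=labels.device)
    pos_pairs = same - eye                                     # same class, excluding self
    neg_pairs = 1.0 - same
    pos_loss = (pos_pairs * dists.pow(2)).sum() / pos_pairs.sum().clamp(min=1)
    neg_loss = (neg_pairs * F.relu(margin - dists).pow(2)).sum() / neg_pairs.sum().clamp(min=1)
    metric_loss = pos_loss + neg_loss

    return ce_loss + metric_weight * metric_loss


if __name__ == "__main__":
    # Toy batch: 8 samples, 3 classes (human, AI-generated, human-AI co-written).
    emb = torch.randn(8, 256)
    logits = torch.randn(8, 3)
    labels = torch.randint(0, 3, (8,))
    print(multitask_detection_loss(emb, logits, labels))
```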