In this paper, we present \textbf{\texttt{DroidCollection}}, the most extensive open dataset for training and evaluating machine-generated code detectors. $\texttt{DroidCollection}$ comprises over a million code samples spanning seven programming languages, outputs from 43 coding models, and at least three real-world coding domains. In addition to fully AI-generated samples, it includes human-AI co-authored code, as well as adversarial samples explicitly crafted to evade detection. Building on this resource, we develop \textbf{\texttt{DroidDetect}}, a suite of encoder-only detectors trained on $\texttt{DroidCollection}$ with a multi-task objective. Our experiments show that existing detectors, trained on narrow data, fail to generalize to diverse coding domains and programming languages. Furthermore, while most detectors are easily circumvented by humanizing the output distribution through superficial prompting and alignment approaches, we demonstrate that this weakness can be largely addressed by training on a small amount of adversarial data. Finally, we show the effectiveness of metric learning and uncertainty-based resampling for improving detector training on potentially noisy distributions.
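
For concreteness, the sketch below illustrates one plausible form such a multi-task objective could take: a standard cross-entropy classification loss combined with a supervised-contrastive (metric-learning) term over the encoder embeddings. The function names, the SupCon-style formulation, and the weighting parameter \texttt{alpha} are illustrative assumptions rather than the exact training recipe of \texttt{DroidDetect}.

\begin{verbatim}
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    # Metric-learning term: pull same-class embeddings together and push
    # different-class embeddings apart (SupCon-style; assumed here).
    z = F.normalize(embeddings, dim=1)            # unit-norm embeddings
    sim = z @ z.t() / temperature                 # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask.float()).sum(dim=1).div(pos_counts).mean()

def multi_task_loss(logits, embeddings, labels, alpha=0.5):
    # Weighted sum of the classification loss and the metric-learning loss;
    # alpha is a hypothetical trade-off weight, not a value from the paper.
    ce = F.cross_entropy(logits, labels)
    con = supervised_contrastive_loss(embeddings, labels)
    return ce + alpha * con
\end{verbatim}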