This paper argues that Transformer-based models still lack the generality and adaptability required for human-AI coordination. By examining failure modes on the ARC-AGI benchmark, we uncover gaps in constructive generalization and in adaptation to novel rules, and argue that closing these gaps requires rethinking both the inference pipeline and how it is evaluated. We propose three research directions: a symbolic representation pipeline for constructive generality, an interactive feedback-based inference loop for adaptability, and test-time task augmentation that balances the two. Finally, we show how ARC-AGI's evaluation tools can be used to track progress in symbolic generality, feedback-driven adaptability, and task-level robustness, guiding future research on reliable human-AI coordination.