This paper focuses on Builder Action Prediction (BAP), a subtask of the Minecraft Collaborative Building Task (MCBT) that aims to improve AI agents' language understanding, environmental perception, and ability to act in the physical world. To address the evaluation, training-data, and modeling challenges of the existing BAP task, we present BAP v2. BAP v2 provides an improved evaluation benchmark with fairer and more insightful metrics, and it highlights spatial reasoning as a key performance bottleneck. To address data scarcity, we generate several types of synthetic MCBT data and use them to strengthen the model's spatial reasoning. We present a new state-of-the-art model, Llama-CRAFTS, which leverages improved input representations to achieve an F1 score of 53.0 on BAP v2. Although this is a 6-point improvement over previous work, the score still highlights the difficulty of the task, and this work lays the foundation for future research.