Plancraft is a multimodal evaluation dataset for LLM agents. It provides both a text-only and a multimodal interface, based on the Minecraft crafting GUI. It includes the Minecraft wiki to evaluate tool use and Retrieval Augmented Generation (RAG), as well as a hand-crafted planner and an Oracle Retriever to ablate the individual components of modern agent architectures. To evaluate decision-making, it also includes a subset of examples that are intentionally unsolvable, providing realistic tasks that require the agent not only to complete the task but also to decide whether it is solvable at all. We benchmark open-source and closed-source LLMs and compare their performance and efficiency to the hand-crafted planner. Overall, we find that LLMs and VLMs struggle with the planning problems presented in Plancraft, and we offer suggestions on how to improve their capabilities.