This paper addresses the growing availability of open Earth observation (EO) and agricultural datasets, which hold great potential for supporting sustainable land management but remain difficult for non-expert users to access because of high technical barriers to entry. To address this gap, this study presents an open-source conversational assistant that integrates multimodal retrieval with a large language model (LLM). The proposed architecture combines orthoimagery, Sentinel-2 vegetation indices, and user-provided documents through retrieval-augmented generation (RAG), allowing the system to flexibly draw on multimodal evidence, textual knowledge, or both when constructing answers. To assess response quality, we employ an LLM-as-a-judge methodology using Qwen3-32B in a zero-shot, unsupervised setting, scoring responses directly within a multidimensional quantitative evaluation framework. Preliminary results show that the system generates clear, relevant, and context-aware responses to agricultural questions, and that the approach is reproducible and scalable across geographic regions. Key contributions include an architecture that fuses multimodal EO and textual knowledge sources, a demonstration of how natural language interaction can lower barriers to expert agricultural information, and an open, reproducible design.
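As a rough illustration of the retrieval-and-fusion step described above, the minimal sketch below assembles a RAG prompt from retrieved Sentinel-2 index summaries and document chunks. It is not the paper's actual implementation: the `Evidence` record, the source labels, the example values, and the prompt template are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


# Hypothetical evidence record; the real system's retrieval schema is not
# specified in the abstract.
@dataclass
class Evidence:
    source: str   # e.g. "sentinel2_ndvi", "orthoimage_caption", "user_doc"
    content: str  # textual summary of the retrieved item
    score: float  # retrieval relevance score in [0, 1]


def build_prompt(question: str, evidence: list[Evidence], top_k: int = 4) -> str:
    """Fuse the top-k multimodal evidence items into a single RAG prompt."""
    ranked = sorted(evidence, key=lambda e: e.score, reverse=True)[:top_k]
    context = "\n".join(f"[{e.source}] {e.content}" for e in ranked)
    return (
        "Answer the agricultural question using only the evidence below.\n"
        f"Evidence:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Illustrative, fabricated-for-the-sketch evidence items.
    evidence = [
        Evidence("sentinel2_ndvi",
                 "Mean NDVI for parcel 12 in June: 0.71 (dense, healthy canopy).", 0.92),
        Evidence("user_doc",
                 "Local guidance recommends nitrogen top-dressing when NDVI < 0.5.", 0.78),
    ]
    print(build_prompt("Does parcel 12 need additional nitrogen this month?", evidence))
```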