This paper addresses referring expression understanding in remote sensing, a task that requires reasoning about object-context relationships. Supervised fine-tuning (SFT) performs well when large labeled datasets are available, but its generalization degrades in data-scarce settings. To address this limitation, we propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 first generates an explicit, interpretable reasoning chain that decomposes the referring expression, then uses this rationale to localize the target object. This "reason-then-act" process makes more effective use of limited annotations, improves generalization, and provides interpretability. Across three few-shot settings on geospatial referring benchmarks, Geo-R1 consistently outperforms SFT-based models and demonstrates strong cross-dataset generalization.
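
As a rough illustration of the two-stage output described above (an explicit reasoning chain followed by a localized target), the sketch below scores a model response in the way an RFT reward might. Everything specific here is an assumption rather than something the abstract states: the `<think>`/`<answer>` tags, the `[x1, y1, x2, y2]` box format, the use of intersection-over-union (IoU) as the reward, and the helper names `parse_response`, `iou`, and `reward`.

```python
import re

# Hypothetical response template: a reasoning chain, then a box as [x1, y1, x2, y2].
# Neither the tags nor the box format are specified in the abstract.
RESPONSE_PATTERN = re.compile(
    r"<think>(?P<reasoning>.+?)</think>\s*"
    r"<answer>\s*\[(?P<box>[^\]]+)\]\s*</answer>",
    re.DOTALL,
)


def parse_response(text):
    """Return (reasoning, box) if the response follows the template, else None."""
    match = RESPONSE_PATTERN.search(text)
    if match is None:
        return None
    try:
        coords = [float(v) for v in match.group("box").split(",")]
    except ValueError:
        return None
    if len(coords) != 4:
        return None
    return match.group("reasoning").strip(), tuple(coords)


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reward(text, gt_box):
    """Assumed reward: 0 if the reasoning or box is missing, else IoU with the ground truth."""
    parsed = parse_response(text)
    if parsed is None:
        return 0.0
    _, box = parsed
    return iou(box, gt_box)


if __name__ == "__main__":
    response = (
        "<think>The expression refers to the storage tank closest to the pier."
        "</think><answer>[0, 0, 10, 10]</answer>"
    )
    print(reward(response, (5.0, 0.0, 15.0, 10.0)))
```

Gating the reward on the presence of a reasoning chain is one simple way to encourage reasoning before localization; the actual reward design used by Geo-R1 may differ.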