This paper highlights the challenges of financial time series forecasting and the limitations of existing approaches: information loss caused by data standardization, fixed numbers of input variables and fixed historical window lengths, limited interpretability, and inadequate handling of forecast uncertainty. To address these challenges, we construct a diverse financial image-text dataset (FVLDB) and develop an uncertainty-adjusted group-relative policy optimization (UARPO) method that lets a model both produce forecasts and analyze their uncertainty. We propose FinZero, a multimodal pre-trained model fine-tuned with UARPO, to perform reasoning, forecasting, and analytical understanding on the FVLDB financial time series. Experimental results demonstrate strong adaptability and scalability. In particular, FinZero improves prediction accuracy by approximately 13.48% over GPT-4o in the high-confidence group, demonstrating the effectiveness of reinforcement learning fine-tuning for large multimodal models.