In this paper, we present the first publicly available dataset for slide animation generation and show how it can be used to improve a Vision-Language Model (VLM). The dataset comprises 12,000 natural language descriptions, animation JSON files, and rendered videos; using it, we fine-tune Qwen-2.5-VL-7B with Low-Rank Adaptation (LoRA) and outperform GPT-4.1 and Gemini-2.5-Pro on BLEU-4, ROUGE-L, SPICE, and the newly proposed CODA metric, which evaluates the motion coverage, temporal order, and detail fidelity of the generated animations. We further demonstrate that the LoRA-adapted model offers reliable temporal reasoning and generalizes beyond synthetic data. Together, the dataset, the LoRA-tuned model, and the CODA metric constitute a rigorous benchmark and a foundation for future research on VLM-based dynamic slide generation.
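
As a rough illustration of the LoRA fine-tuning setup mentioned above, the sketch below shows how low-rank adapters might be attached to Qwen-2.5-VL-7B using the Hugging Face PEFT library. The rank, scaling factor, dropout, and target modules here are illustrative assumptions, not the configuration reported in the paper, and the class name requires a recent version of the transformers library.

```python
# Minimal sketch: attaching LoRA adapters to Qwen-2.5-VL-7B with Hugging Face PEFT.
# All hyperparameters below are assumptions for illustration only.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,                      # assumed low-rank dimension
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed adapter dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# Wrap the base model so that only the small LoRA adapter weights are trained.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```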