StorySim is a programmable framework for synthetically generating stories to evaluate the theory of mind (ToM) and world modeling (WM) abilities of large language models (LLMs). To mitigate the pretraining data contamination that affects existing benchmarks, StorySim produces novel, compositional story prompts grounded in a highly controllable storyboard, enabling precise manipulation of character perspectives and events. Using this framework, we designed first- and second-order ToM tasks alongside WM tasks that control for the ability to track and model mental states. Experiments with state-of-the-art LLMs revealed that most models performed better on WM tasks than on ToM tasks, and that models tended to reason more accurately about humans than about inanimate objects. We also found evidence of heuristic behaviors, such as recency bias and an over-reliance on earlier events in the story. All code for data generation and evaluation is publicly available.