We present a novel framework for studying the formation and adaptation of internal world models using human neural organoids. We train these biological agents in three scalable, closed-loop virtual environments and investigate the synaptic mechanisms underlying learning, namely long-term potentiation (LTP) and long-term depression (LTD). Furthermore, we propose a meta-learning approach that leverages a large language model to automate the design and optimization of experimental protocols. Finally, we present a multimodal evaluation strategy that directly measures synaptic plasticity, going beyond task performance to quantify the physical correlates of the learned world model.