Infants learn to recognize objects from different viewpoints with little guidance. During this learning process, infants perform frequent eye and head movements that shape their visual experience. It is currently unclear how these behaviors contribute to the emergence of infants' object recognition abilities. To answer this question, this study combines head-mounted eye tracking during interactive play with unsupervised machine learning. We approximate infants' central visual experience by cropping the image region of a head-mounted camera around the current gaze position estimated via eye tracking. This visual stream is fed into an unsupervised computational model of infant learning that constructs visual representations changing slowly over time. Experimental results show that infants' gaze strategies support the learning of invariant object representations. Furthermore, our analysis shows that the limited size of the central visual field, where acuity is high, is important for this learning. Overall, this study sheds light on how infants' gaze behavior can support the development of viewpoint-invariant object recognition.
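The two core ingredients of this pipeline, gaze-centered cropping and a temporal-slowness objective, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the crop size, the clamping behavior at frame borders, and the particular loss form (mean squared difference between temporally adjacent embeddings) are assumptions made here for concreteness.

```python
import numpy as np

def gaze_centered_crop(frame, gaze_xy, crop_size=128):
    """Crop a square region around the gaze point (x, y), approximating
    the central visual field; the gaze center is clamped so the crop
    stays within the frame. crop_size is a hypothetical parameter."""
    h, w = frame.shape[:2]
    half = crop_size // 2
    cx = int(np.clip(gaze_xy[0], half, w - half))
    cy = int(np.clip(gaze_xy[1], half, h - half))
    return frame[cy - half:cy + half, cx - half:cx + half]

def slowness_loss(embeddings):
    """One simple slowness objective: penalize change between
    temporally adjacent embeddings (rows are time steps)."""
    diffs = np.diff(embeddings, axis=0)
    return float(np.mean(diffs ** 2))

# Example: a blank 640x480 head-camera frame, gaze near a corner.
frame = np.zeros((480, 640, 3))
crop = gaze_centered_crop(frame, (10, 10))
print(crop.shape)           # (128, 128, 3) -- clamped to frame bounds
print(slowness_loss(np.ones((5, 8))))  # 0.0 -- constant embedding is maximally slow
```

Minimizing such a loss over the gaze-cropped stream encourages representations that are stable while the infant fixates an object, which is one way temporal slowness can yield viewpoint invariance.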