Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Toddlers' Active Gaze Behavior Supports Self-Supervised Object Learning

Created by
  • Haebom

Authors

Zhengyang Yu, Arthur Aubret, Marcel C. Raabe, Jane Yang, Chen Yu, Jochen Triesch

Outline

Toddlers learn to recognize objects from different viewpoints with little guidance. During this learning process, they make frequent eye and head movements that shape their visual experience. It is currently unclear how this gaze behavior contributes to their emerging ability to recognize objects. To answer this question, the study combines head-mounted eye tracking during interactive play with unsupervised machine learning. The authors approximate toddlers’ central visual experience by cropping the region of the head-mounted camera image around the current gaze position estimated from eye tracking. This visual stream is fed into an unsupervised computational model of toddlers’ learning, which constructs visual representations that change slowly over time. The experiments show that toddlers’ gaze strategies support the learning of invariant object representations. The analysis further shows that the limited size of the central visual field, where acuity is high, is important for this result. Overall, the study sheds light on how toddlers’ gaze behavior can support the development of viewpoint-invariant object recognition.
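
The two steps described above, cropping the camera frame around the gaze position and learning representations that change slowly over time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gaze_crop` and `slowness_loss`, the use of plain Python lists as stand-ins for image arrays, and the choice of a mean squared difference between consecutive embeddings are all assumptions made for illustration. The paper's actual model and crop size are not specified in this summary.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in type: a grayscale frame as rows of pixel values.
Frame = List[List[float]]


def gaze_crop(frame: Frame, gaze_xy: Tuple[float, float], size: int) -> Frame:
    """Return a size x size window centered on the gaze point.

    The window is shifted inward when the gaze point is near a border,
    so the crop always has the requested size (an assumption; padding
    would be another reasonable choice).
    """
    height, width = len(frame), len(frame[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds frame dimensions")
    gaze_x, gaze_y = gaze_xy
    # Top-left corner of the window, clamped to stay inside the frame.
    left = min(max(int(round(gaze_x)) - size // 2, 0), width - size)
    top = min(max(int(round(gaze_y)) - size // 2, 0), height - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def slowness_loss(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between embeddings of consecutive frames.

    A low value means the representation changes slowly over time.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two time steps")
    total = 0.0
    for previous, current in zip(embeddings, embeddings[1:]):
        total += sum((a - b) ** 2 for a, b in zip(previous, current))
    return total / (len(embeddings) - 1)
```

In this sketch, a smaller `size` corresponds to a smaller simulated central visual field. Minimizing a slowness term alone has a trivial solution in which every frame maps to the same embedding, so a practical model would pair it with a term that keeps embeddings of different inputs apart.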

Takeaways, Limitations

Takeaways: Shows that toddlers' gaze strategies support the development of viewpoint-invariant object recognition. Suggests that the limited size of the central visual field, where acuity is high, is important for learning invariant representations. Presents an approach that models toddlers' visual learning by combining head-mounted eye tracking with unsupervised machine learning.
Limitations: The study subjects were limited to toddlers, so further research is needed on generalizability to other age groups, environments, and situations. The data may be constrained by the technical limitations of head-mounted eye tracking and cameras. Because the model is simplified, it may not fully reflect the learning process of actual toddlers.