This paper discusses the integration of simultaneous localization and mapping (SLAM) and multi-object tracking (MOT), both of which play a crucial role in autonomous driving. Conventionally, SLAM and MOT are performed independently, which limits their accuracy: SLAM assumes a static environment, while MOT typically relies on accurate ego-vehicle pose information. To address these issues, our previous study proposed a LiDAR-based SLAMMOT that considers multiple motion models (IMM-SLAMMOT). In this paper, we extend this approach to a vision-based system and propose a visual SLAMMOT. The goal of this paper is to verify the feasibility and advantages of a visual SLAMMOT that considers multiple motion models.