This paper addresses a challenge in human behavior monitoring with smartphones and wearable sensors: moving beyond simple activity recognition (e.g., physical activity classification) to derive high-level, context-aware insights. We developed Vital Insight (VI), an LLM-based prototype system that supports and visualizes the sensemaking process over multimodal passive sensing data through human-AI interaction, and evaluated it in three user studies with 21 experts. By observing how experts interacted with VI, we derived an expert sensemaking model that characterizes how they transition between direct data representations and AI-assisted inferences. Finally, we present design implications for AI-augmented visualization systems that better support expert sensemaking over multimodal health sensing data.