Smart home environments differ in sensor modalities, layouts, and activities of interest, so a human activity recognition (HAR) system that operates across them needs a zero-shot recognition method. Existing zero-shot HAR methods describe sensor data in natural language and feed the descriptions to a large language model (LLM) for classification. However, these methods carry risks such as privacy violations, dependence on external services, and inconsistent predictions due to version changes. In this paper, we propose an alternative that performs zero-shot HAR without LLM prompting: it models both sensor data and activities in natural language and performs zero-shot classification using the resulting embeddings. Through detailed case studies on six datasets, we demonstrate how natural language modeling enhances HAR systems for zero-shot recognition.
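To make the embedding-based idea concrete, the following is a minimal sketch, not the method evaluated in this paper. It assumes that a sensor window and each candidate activity have already been rendered as short text descriptions, and it assigns the window to the activity whose description embedding is most similar. The `embed` function is a toy bag-of-words stand-in for a locally run text encoder, and the activity descriptions, the sensor sentence, and all function names are hypothetical illustrations.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a text encoder (assumption): bag-of-words counts.

    A real system would use a locally hosted sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(sensor_description, activity_descriptions):
    """Return the best-matching activity label and all similarity scores.

    No labeled sensor data is needed: adding a new activity only requires
    adding a new text description to the dictionary.
    """
    query = embed(sensor_description)
    scores = {
        label: cosine(query, embed(description))
        for label, description in activity_descriptions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical activity descriptions (not taken from the paper's datasets).
ACTIVITIES = {
    "cooking": "the resident prepares a meal in the kitchen using the stove",
    "sleeping": "the resident lies in bed in the bedroom at night",
    "bathing": "the resident takes a shower in the bathroom",
}

# Hypothetical natural-language rendering of one window of sensor events.
window = "motion sensor in the kitchen fired and the stove sensor turned on"
label, scores = classify(window, ACTIVITIES)
print(label, scores)
```

Because classification reduces to a nearest-neighbor lookup over locally computed embeddings, this design sends no sensor data to an external service and involves no prompt. A toy encoder like this one is dominated by shared function words, so it only illustrates the control flow and says nothing about achievable accuracy.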