The growing demand for machine intelligence that can process continuous, long-context inputs on local devices motivates research beyond the existing Transformer architecture, whose quadratic complexity and memory requirements limit its efficiency and practicality at scale. State Space Models (SSMs) and hybrid models, which offer linear scalability with sequence length, have emerged as promising alternatives. In this paper, we present a comprehensive comparative benchmarking of Transformer, SSM, and hybrid models for long-context inference on real consumer and embedded GPUs. We show that SSMs are better suited to long-context processing, handling contexts of up to 220K tokens on a consumer GPU, and that they run up to 4x faster than Transformers at long context lengths. We further find that the hardware-aware SSM kernel accounts for more than 55% of inference runtime, making it a key target for future hardware acceleration. Finally, we provide detailed device-specific characterization results to inform edge system co-design, and we will open-source our benchmarking framework to support further research.
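As a rough illustration of the kind of long-context measurement described above, the sketch below times a single prefill pass at several context lengths on a GPU using CUDA events. It is a minimal sketch, not the paper's benchmarking framework: the model identifier, the sequence lengths, and the `prefill_latency_ms` helper are illustrative assumptions, and the same loop would be repeated across Transformer, SSM, and hybrid checkpoints to compare scaling.

```python
# Minimal long-context prefill timing sketch (illustrative; not the paper's framework).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "state-spaces/mamba-2.8b-hf"   # placeholder; swap in a Transformer or hybrid to compare
SEQ_LENS = [4_096, 32_768, 131_072]       # placeholder context lengths

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def prefill_latency_ms(seq_len: int) -> float:
    """Time one forward (prefill) pass over `seq_len` random tokens using CUDA events."""
    input_ids = torch.randint(0, tokenizer.vocab_size, (1, seq_len), device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        model(input_ids)               # warm-up pass (kernel selection, allocator)
        torch.cuda.synchronize()
        start.record()
        model(input_ids)               # timed pass
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end)     # milliseconds

for L in SEQ_LENS:
    print(f"{MODEL_ID} | context {L:>7} tokens | prefill {prefill_latency_ms(L):8.1f} ms")
```

A per-kernel breakdown such as the reported share of the SSM scan kernel would additionally require a profiler (e.g., `torch.profiler` or Nsight Systems) rather than end-to-end timing alone.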