This paper proposes R-Stitch, a novel method for reducing the computational cost of chain-of-thought (CoT) inference. CoT reasoning enhances the problem-solving ability of large language models (LLMs), but it is computationally expensive because long token sequences must be decoded autoregressively. Existing acceleration strategies either shorten the sequence through early stopping or compression-based compensation schemes, or speed up decoding through speculative decoding with a small model. However, speculative decoding yields limited speedup when the agreement between the small and large models is low, and it fails to exploit the potential of small models to produce concise intermediate reasoning. R-Stitch is a token-level, confidence-based hybrid decoding framework that switches between a small language model (SLM) and an LLM: the SLM generates tokens by default, and the LLM is invoked only when the SLM's confidence falls below a threshold, preserving both efficiency and accuracy. The method is model-agnostic, requires no training, and is compatible with standard decoding pipelines. Experiments on mathematical reasoning benchmarks show that R-Stitch reduces inference latency by up to 85% with negligible accuracy degradation.
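
To make the switching rule concrete, the sketch below illustrates one plausible form of confidence-gated hybrid decoding consistent with the description above. The model interfaces (slm_step, llm_step), the confidence measure (maximum next-token probability), greedy selection, and the threshold value are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of confidence-gated hybrid decoding (assumed interfaces).
from typing import Callable, List

import numpy as np


def hybrid_decode(
    prompt_ids: List[int],
    slm_step: Callable[[List[int]], np.ndarray],  # hypothetical: next-token probs from the SLM, shape (vocab,)
    llm_step: Callable[[List[int]], np.ndarray],  # hypothetical: next-token probs from the LLM, shape (vocab,)
    eos_id: int,
    threshold: float = 0.8,        # assumed confidence threshold
    max_new_tokens: int = 512,
) -> List[int]:
    """Decode with the SLM by default; defer to the LLM when SLM confidence is low."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = slm_step(ids)                # cheap proposal from the small model
        confidence = float(probs.max())      # confidence = max next-token probability (assumption)
        if confidence < threshold:           # low confidence: delegate this token to the LLM
            probs = llm_step(ids)
        next_id = int(probs.argmax())        # greedy decoding for simplicity
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```

Unlike speculative decoding, this scheme involves no draft verification or rollback: each token is kept from whichever model produced it, so the LLM is consulted only on the steps where the SLM is uncertain.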