This paper proposes a novel hybrid framework, CE-RS-SBCIT, for early diagnosis and accurate classification of brain tumors. Existing CNN and Transformer models suffer from high computational cost and sensitivity to subtle contrast changes, and they struggle with the structural heterogeneity and tissue inconsistency seen in MRI images. To address these limitations, we integrate residual- and spatial-learning-based CNNs with Transformer-based modules. Key innovations include (i) a smoothing- and edge-based CNN-integrated Transformer (SBCIT), (ii) a customized residual- and spatial-learning CNN, (iii) a channel enhancement (CE) strategy, and (iv) a novel spatial attention mechanism. SBCIT uses stem convolution and contextual interaction transformer blocks for efficient global feature modeling, and the residual- and spatial-learning CNNs enrich the representational space with transfer-learned feature maps. The CE module amplifies discriminative channels and mitigates redundancy, whereas the spatial attention mechanism selectively emphasizes subtle contrast and tissue changes. In experiments on MRI datasets from Kaggle and Figshare, the framework achieves 98.30% accuracy, 98.08% sensitivity, 98.43% precision, and a 98.25% F1-score.
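For readers who want to relate the four reported metrics to one another, the following is a minimal sketch of how accuracy, sensitivity, precision, and F1-score can be derived from a multi-class confusion matrix. It is not the evaluation code used in this work: the helper names `macro_metrics` and `harmonic_mean`, the choice of macro averaging, and the 3-class `toy_confusion` matrix are all illustrative assumptions. The final line only checks that the reported F1-score is consistent with the harmonic mean of the reported precision and sensitivity.

```python
from statistics import mean


def macro_metrics(confusion):
    """Accuracy plus macro-averaged precision, sensitivity, and F1.

    confusion[i][j] = number of samples of true class i predicted as class j.
    Macro averaging is an assumption; the abstract does not state the scheme.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, sensitivities, f1_scores = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other, predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        precisions.append(precision)
        sensitivities.append(sensitivity)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total,
        "precision": mean(precisions),
        "sensitivity": mean(sensitivities),
        "f1": mean(f1_scores),
    }


def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers (the F1 formula)."""
    return 2 * a * b / (a + b)


# Hypothetical 3-class confusion matrix, for illustration only (not paper data).
toy_confusion = [
    [48, 2, 0],
    [1, 47, 2],
    [0, 1, 49],
]
scores = macro_metrics(toy_confusion)

# Consistency check on the reported figures: precision 98.43%, sensitivity 98.08%.
reported_f1_check = round(harmonic_mean(98.43, 98.08), 2)  # 98.25
```

On the toy matrix the accuracy is 144/150 = 0.96. For the reported figures, the harmonic mean of 98.43% precision and 98.08% sensitivity rounds to 98.25%, which matches the reported F1-score; under macro averaging over classes this equality need not hold exactly, so the check is indicative rather than a verification of the evaluation protocol.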