In this paper, we propose a full-parameter pre-training and fine-tuning framework based on Block Coordinate Descent (BCD) for small- and medium-sized teams that struggle to train large-scale language models because of GPU memory and budget constraints. Through engineering optimizations, the framework efficiently trains large-scale models on RTX 4090, A100, and A800 GPU clusters. Under the same hardware environment, it reduces the training cost of a 7B model by 33% on A100/A800 clusters and by 2.6% on RTX 4090 clusters compared with standard full-parameter training. Moreover, it enables models previously trainable only on A100 clusters to be trained on RTX 4090 clusters without performance degradation. In most cases, BCD matches or exceeds the accuracy of full-parameter training and fine-tuning methods while reducing GPU usage and improving hardware utilization.
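To illustrate the core idea behind BCD (not the paper's actual training pipeline), the sketch below minimizes a simple least-squares objective by cycling over parameter blocks: at each step only one block is updated while the others stay frozen, which is what limits the per-step memory footprint for gradients and optimizer state. The objective, block split, learning rate, and iteration count are all illustrative assumptions.

```python
import numpy as np

# Illustrative least-squares problem; stands in for a model's training loss.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)

def loss(x):
    r = A @ x - b
    return 0.5 * r @ r

def grad(x):
    return A.T @ (A @ x - b)

x = np.zeros(20)
blocks = [np.arange(0, 10), np.arange(10, 20)]  # two parameter blocks (assumed split)
lr = 1e-2

for epoch in range(200):
    for blk in blocks:            # cycle over blocks (Gauss-Seidel style)
        g = grad(x)
        x[blk] -= lr * g[blk]     # update only the active block; others frozen

print(loss(np.zeros(20)), loss(x))
```

In a large-model setting, the same pattern means only the active block's gradients and optimizer state must reside in GPU memory at any given time, which is the source of the memory savings the framework exploits.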