This paper presents Conservative Discrete Quantile Actor-Critic (CDQAC), a novel offline reinforcement learning algorithm for learning efficient scheduling policies for the Job Shop Scheduling Problem (JSP) and the Flexible Job Shop Scheduling Problem (FJSP). To overcome the drawbacks of existing online reinforcement learning-based methods, which require numerous simulation interactions and suffer from low sample efficiency, CDQAC learns directly from existing data and updates its policy by estimating the return distribution of each machine-operation pair. Experimental results demonstrate that CDQAC can learn from data generated by diverse sources, outperforming existing heuristics and state-of-the-art online and offline reinforcement learning algorithms, and that it achieves high performance with only a small amount of training data, demonstrating high sample efficiency. Interestingly, CDQAC performs best when trained on data generated by a random heuristic, rather than on data from genetic algorithms or priority dispatching rules.
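To make the core idea concrete, the sketch below illustrates one way the critic update described above could look: a quantile critic that outputs a return distribution per discrete (machine, operation) action, trained with a quantile Huber loss plus a conservative (CQL-style) penalty suitable for offline data. This is a minimal illustration under our own assumptions, not the authors' implementation; all names (QuantileCritic, cql_weight, N_QUANTILES) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_QUANTILES = 32  # number of return quantiles per action (assumed)

class QuantileCritic(nn.Module):
    """Maps a state to N_QUANTILES return quantiles for each discrete (machine, operation) action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions * N_QUANTILES),
        )

    def forward(self, state):
        return self.net(state).view(-1, self.n_actions, N_QUANTILES)

def critic_loss(critic, target_critic, batch, gamma=0.99, cql_weight=1.0):
    # batch holds offline transitions from a pre-collected dataset
    s, a, r, s2, done = batch
    taus = (torch.arange(N_QUANTILES) + 0.5) / N_QUANTILES  # quantile midpoints

    # Quantiles of the action actually taken in the dataset.
    q_all = critic(s)                                                   # (B, A, N)
    q_sa = q_all.gather(1, a.view(-1, 1, 1).expand(-1, 1, N_QUANTILES)).squeeze(1)

    # Distributional TD target, bootstrapping from the greedy next action (mean over quantiles).
    with torch.no_grad():
        q2 = target_critic(s2)
        a2 = q2.mean(dim=2).argmax(dim=1)
        q2_sa = q2.gather(1, a2.view(-1, 1, 1).expand(-1, 1, N_QUANTILES)).squeeze(1)
        target = r.unsqueeze(1) + gamma * (1 - done.unsqueeze(1)) * q2_sa

    # Quantile Huber loss between predicted and target quantiles.
    td = target.unsqueeze(1) - q_sa.unsqueeze(2)                        # (B, N, N)
    huber = F.smooth_l1_loss(q_sa.unsqueeze(2).expand_as(td),
                             target.unsqueeze(1).expand_as(td), reduction="none")
    qh_loss = (torch.abs(taus.view(1, -1, 1) - (td.detach() < 0).float()) * huber).mean()

    # Conservative penalty: lower the estimated value of out-of-distribution actions
    # relative to the actions that appear in the offline dataset.
    q_mean = q_all.mean(dim=2)                                          # (B, A)
    cql_penalty = torch.logsumexp(q_mean, dim=1).mean() - q_mean.gather(1, a.view(-1, 1)).mean()
    return qh_loss + cql_weight * cql_penalty
```

In this reading, the quantile output gives the return distribution that the abstract refers to, while the conservative term keeps the critic from overvaluing machine-operation assignments that never occur in the offline data.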