This paper presents a gene panel selection strategy for identifying the most informative genomic biomarkers in unlabeled genomic datasets. Existing methods, which rely on expert knowledge, machine learning models, or heuristic-based iterative optimization, suffer from biases and inefficiencies that can cause important biological signals to be missed. This study proposes an iterative gene panel selection strategy with two components. First, it draws on ensemble knowledge from existing gene selection algorithms to establish prior knowledge that guides the initial search space. Second, it integrates reinforcement learning with a reward function formed by expert actions. The strategy thereby leverages the probabilistic adaptability of reinforcement learning while mitigating biases arising from the initial boundary. Comprehensive comparative experiments, case studies, and subsequent analyses demonstrate the efficiency and accuracy of the proposed method, highlighting its potential to advance single-cell genomic data analysis.