To meet the growing demand for efficient and intelligent embedded AI systems, this paper proposes MoSE, a novel Mixture-of-Experts (MoE) method. Existing MoE models require massive training data and complex optimization; to address this, MoSE mimics human learning and inference, acquiring skills one at a time and reasoning step by step. It enables skill-by-skill learning by defining and annotating specific skills, allowing experts to identify the competencies required for various scenarios and inference tasks. It builds a hierarchical skill dataset and pretrains routers to encourage step-by-step inference, integrating auxiliary tasks, such as perception-prediction-planning for autonomous driving (AD) and high- and low-level planning for robots, into a single forward pass without additional computational overhead. MoSE scales diverse expertise effectively with fewer than 3 billion sparsely activated parameters, outperforming existing models on both AD corner-case and robot inference tasks while using less than 40% of their parameters.
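The abstract does not specify implementation details; the following is a minimal sketch, assuming a PyTorch setting, of how a skill-annotated router might condition sparse top-k expert selection on per-token skill labels. The class name `SkillRoutedMoE`, the hyperparameters, and the integer skill ids are illustrative assumptions, not the authors' released code or API.

```python
# Illustrative sketch (assumed, not the authors' implementation): a skill-
# annotated top-k router for a sparse MoE feed-forward layer. Each token is
# assumed to carry an integer skill id from the skill annotation; a learned
# skill embedding is added to the router input so gating can specialize per
# skill (e.g. perception, prediction, planning).

import torch
import torch.nn as nn


class SkillRoutedMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, n_skills=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.skill_emb = nn.Embedding(n_skills, d_model)   # skill conditioning
        self.router = nn.Linear(d_model, n_experts)         # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x, skill_ids):
        # x: (batch, seq, d_model); skill_ids: (batch, seq) integer skill labels
        logits = self.router(x + self.skill_emb(skill_ids))          # (B, S, E)
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)   # sparse gate
        weights = weights / weights.sum(-1, keepdim=True)            # renormalize
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SkillRoutedMoE()
    tokens = torch.randn(2, 10, 512)
    skills = torch.randint(0, 16, (2, 10))   # hypothetical per-token skill ids
    print(layer(tokens, skills).shape)        # torch.Size([2, 10, 512])
```

Only `top_k` experts run per token, so capacity can grow with the number of skills while the activated parameter count stays small, which is the property the abstract attributes to MoSE's sparse activation.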