To address the vulnerability of policies to environmental changes in partially observable Markov decision processes (POMDPs), which model sequential decision-making under uncertainty, this paper proposes the hidden-model POMDP (HM-POMDP). An HM-POMDP represents a set of environment models (POMDPs) that share common action and observation spaces: the true environment is assumed to be one of these candidate models, but which one is unknown at runtime. To compute a robust policy that achieves sufficient performance in every POMDP of the set, the paper combines (1) a deductive formal verification technique that makes robust policy evaluation tractable by computing a worst-case POMDP within the HM-POMDP, and (2) subgradient ascent to optimize the candidate policy against that worst-case POMDP. Experimental results demonstrate that the proposed method produces policies that are more robust and generalize better to unseen POMDPs than existing methods, and that it scales to HM-POMDPs comprising over 100,000 environments.
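
To make the robust objective concrete, the max-min problem targeted by this alternation between worst-case evaluation and policy improvement can be sketched as follows (standard notation, not quoted from the paper): with $\mathcal{M}$ the finite set of candidate POMDPs in the HM-POMDP, $\Pi$ the policy class, and $V^{\pi}_{M}$ the expected discounted return of policy $\pi$ in POMDP $M$,

\[
\max_{\pi \in \Pi} \; \min_{M \in \mathcal{M}} V^{\pi}_{M},
\qquad
V^{\pi}_{M} = \mathbb{E}^{\pi}_{M}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right].
\]

Under this reading, the verification step computes the inner minimum (the worst-case POMDP) for a fixed policy, and the subgradient-ascent step pushes the policy toward a larger worst-case value.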