This paper points out that existing preference-based selection techniques, such as Best-of-N (BoN) sampling, only rank candidate responses relative to one another and therefore cannot assess whether any of them is actually acceptable, which can lead to the selection of unacceptable responses. To address this, the paper proposes a reward model trained on preference data augmented with an external option, inspired by the outside option in discrete choice models. This model can identify not only which responses are better but also which ones are good enough. Building on it, the paper develops an adaptive inference strategy, best-of-mini-N in-loop, that balances reliability and efficiency. Experimental results show that the technique reduces reliability failures by 70% when used as an alignment guardrail and improves average inference speed by more than 22% when used as an inference accelerator.
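
The summary above describes the best-of-mini-N in-loop strategy only at a high level. The following is a minimal sketch of one way such a loop could work, assuming the reward model yields a score for the external option that serves as an acceptability threshold; `generate_responses`, `score`, and `accept_threshold` are hypothetical stand-ins, not the paper's actual interface.

```python
from typing import Callable, List, Optional, Tuple


def best_of_mini_n_in_loop(
    prompt: str,
    generate_responses: Callable[[str, int], List[str]],  # samples k candidate responses
    score: Callable[[str, str], float],                    # reward-model score for (prompt, response)
    accept_threshold: float,  # score of the external/outside option ("good enough" bar)
    mini_n: int = 4,          # candidates drawn per loop iteration
    max_rounds: int = 8,      # overall sampling budget in rounds
) -> Tuple[Optional[str], float]:
    """Sample small batches of candidates and stop as soon as one beats the
    external option, instead of always drawing a full Best-of-N batch."""
    best_response: Optional[str] = None
    best_score = float("-inf")

    for _ in range(max_rounds):
        for response in generate_responses(prompt, mini_n):
            s = score(prompt, response)
            if s > best_score:
                best_response, best_score = response, s
        if best_score >= accept_threshold:
            # An acceptable response was found: stop early (efficiency gain).
            return best_response, best_score

    # No candidate cleared the acceptability bar within the budget: abstain
    # (reliability guardrail) rather than return an unacceptable response.
    return None, best_score
```

For comparison, a plain BoN baseline would sample all `mini_n * max_rounds` candidates up front and always return the top-scoring one, with no way to detect that even the best candidate falls short of the acceptability bar.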