This paper addresses the risk that publicly available large language model (LLM) benchmarks may be used, unintentionally or intentionally, in future LLM training or model selection, leading to data contamination. Existing countermeasures, such as keeping benchmarks secret or requiring participants to submit their models or predictions, rely on trust in a specific institution and still leave room for overfitting through repeated queries. This paper proposes a method for releasing benchmarks publicly so that LLMs can be evaluated openly without the full ground-truth answers being revealed. The core idea is to inject randomness into the answers: each question is given multiple logically valid answers, and only one of them is randomly designated as the official correct answer. This lowers the Bayesian accuracy of the benchmark, which both protects the ground truth and yields a statistical test for data contamination. Since even a perfect model cannot exceed the Bayesian accuracy, an observed accuracy significantly above it is strong evidence of contamination. Experimental results demonstrate that this method accurately detects data contamination across a variety of benchmarks, models, and training methods.
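
As a rough illustration of the mechanism (a minimal sketch, not the authors' released code), the following Python example randomly designates one of several logically valid answers per question as the published label, computes the resulting Bayesian accuracy, and runs a one-sided binomial test for scores that exceed it. All names (`Question`, `randomize_benchmark`, `contamination_p_value`) are hypothetical, and the binomial test is exact only when every question has the same number of valid answers.

```python
import random
from dataclasses import dataclass
from math import comb

@dataclass
class Question:
    prompt: str
    valid_answers: list[str]    # all logically correct answers
    published_answer: str = ""  # the single answer released as "correct"

def randomize_benchmark(questions: list[Question], seed: int = 0) -> None:
    """Pick one valid answer per question uniformly at random as the official label."""
    rng = random.Random(seed)
    for q in questions:
        q.published_answer = rng.choice(q.valid_answers)

def bayes_accuracy(questions: list[Question]) -> float:
    """Best expected accuracy for a model that knows every valid answer but not
    which one was randomly designated: 1/k per question with k valid answers."""
    return sum(1.0 / len(q.valid_answers) for q in questions) / len(questions)

def contamination_p_value(n_correct: int, questions: list[Question]) -> float:
    """One-sided binomial tail: probability that an uncontaminated model capped at
    the Bayesian accuracy gets at least n_correct of the published labels right.
    (Exact if all questions share the same number of valid answers; approximate otherwise.)"""
    n = len(questions)
    p = bayes_accuracy(questions)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n_correct, n + 1))

if __name__ == "__main__":
    # Toy benchmark: each question has two interchangeable correct answers,
    # so the Bayesian accuracy is 0.5 and scoring far above it is suspicious.
    qs = [Question(f"Q{i}", [f"answer-{i}-a", f"answer-{i}-b"]) for i in range(100)]
    randomize_benchmark(qs, seed=42)
    print("Bayes accuracy:", bayes_accuracy(qs))          # 0.5
    print("p-value at 75/100 correct:", contamination_p_value(75, qs))
```

Under this sketch, a clean model that merely knows all valid answers cannot beat the 0.5 ceiling in expectation, so 75 correct out of 100 yields a vanishingly small p-value and would be flagged as contamination.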