This paper points out the limitations of existing benchmarks in the software engineering field, especially the SWE-bench dataset, and proposes a new benchmark, SWE-MERA, to address them. The authors argue that SWE-bench suffers from serious data contamination (direct solution leakage and inadequate test cases), which undermines its reliability; SWE-MERA aims to solve this by automatically collecting real GitHub issues and subjecting them to rigorous quality verification. The benchmark currently provides about 10,000 potential tasks, with roughly 300 samples available. Using the Aider coding agent, the authors evaluate more than a dozen state-of-the-art LLMs on tasks collected between September 2024 and June 2025, and the results clearly expose performance differences among them.
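To make the collection step more concrete, below is a minimal sketch of how such a pipeline might query GitHub for candidate tasks. It uses GitHub's public issue-search REST API, which is real; however, the date cutoff, the "linked to a pull request" qualifier, and the word-count quality gate are illustrative assumptions, not SWE-MERA's actual selection criteria.

```python
"""Illustrative sketch of an automated issue-collection step for a
SWE-bench-style benchmark. Filtering thresholds and helper names are
assumptions for demonstration, not SWE-MERA's real pipeline."""
import requests

GITHUB_SEARCH_API = "https://api.github.com/search/issues"


def collect_candidate_issues(repo: str, since: str = "2024-09-01") -> list[dict]:
    # Search for closed issues linked to a pull request, restricted to
    # issues closed after a cutoff date -- a simple way to reduce the
    # chance that the fix already appears in model training data.
    query = f"repo:{repo} is:issue is:closed linked:pr closed:>{since}"
    resp = requests.get(GITHUB_SEARCH_API, params={"q": query, "per_page": 50})
    resp.raise_for_status()

    candidates = []
    for item in resp.json().get("items", []):
        # Basic quality gate (assumed, not from the paper): require a
        # non-trivial issue description so the task is well-specified.
        body = item.get("body") or ""
        if len(body.split()) < 30:
            continue
        candidates.append({
            "repo": repo,
            "issue_number": item["number"],
            "title": item["title"],
            "url": item["html_url"],
        })
    return candidates


if __name__ == "__main__":
    # Example run against an arbitrary public repository.
    for task in collect_candidate_issues("psf/requests"):
        print(task["issue_number"], task["title"])
```

A production pipeline would additionally need authentication (for rate limits), pagination, and the execution-based checks the paper describes, such as verifying that the linked fix makes previously failing tests pass.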