Epoch AI published the MirrorCode benchmark, which tests models' ability to reconstruct complete programs. Claude Opus 4.7 led with a 56 percent solve rate and rebuilt a 16,000‑line toolkit in about 14 hours, but models still failed the most complex tasks.