This paper presents an open test based on algorithmic probability that avoids benchmark contamination in the quantitative evaluation of state-of-the-art models with respect to claims of artificial general intelligence (AGI) and superintelligence (ASI). Unlike existing tests, it does not rely on statistical compression methods such as GZIP or LZW, which are closely related to Shannon entropy and cannot test for more than simple pattern matching. The test challenges AI systems, and LLMs in particular, on fundamental features of intelligence such as synthesis and model generation in the context of inverse problems. We argue that metrics of predictive planning based on model abstraction and induction (optimal Bayesian inference) can provide a robust framework for testing intelligence, including natural intelligence (human and animal), narrow AI, AGI, and ASI. We find that LLM versions are fragile and incremental, mainly as a result of memorization, and that their progress tends to be driven primarily by the size of the training data. We compare our results with a hybrid neurosymbolic approach that theoretically guarantees universal intelligence based on the principles of algorithmic probability and Kolmogorov complexity. In a proof of concept on short binary sequences, we show that this method outperforms LLMs. We also show that compression is directly proportional to a system's predictive power: the better the system can predict, the better it can compress, and the better it can compress, the better it can predict. These results reinforce suspicions about the fundamental limitations of LLMs, exposing them as systems optimized for proficiency in recognizing human language.
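For readers who want the compression–prediction link made explicit, the display below is a minimal sketch of the standard relation from algorithmic information theory (the Solomonoff–Levin coding theorem), given only as an illustration; the symbols m, U, K, and |p| are standard notation assumed here, not taken from the abstract above.

\[
m(x) \;=\; \sum_{p \,:\, U(p)=x} 2^{-|p|} \;=\; 2^{-K(x)+O(1)},
\qquad
m(x_{n+1} \mid x_1 \cdots x_n) \;=\; \frac{m(x_1 \cdots x_n\, x_{n+1})}{m(x_1 \cdots x_n)},
\]

where U is a prefix-free universal Turing machine, |p| is the length of program p, and K(x) is the prefix Kolmogorov complexity of x. A shorter description of x (better compression, smaller K(x)) corresponds to higher algorithmic probability m(x), and the conditional m drives Solomonoff's optimal sequence prediction, which is why better compression and better prediction go hand in hand.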