This paper addresses the challenge of causal inference with large language models (LLMs). Recent advances in LLM reasoning have spurred active research into whether state-of-the-art reasoning models can perform causal discovery robustly, unlike earlier models that are vulnerable to data perturbations and prone to overfitting. On the Corr2Cause benchmark, the authors find that reasoning-first architectures, exemplified by OpenAI's o-series and DeepSeek-R1, significantly outperform prior approaches. To exploit this strength, they propose a modular in-context pipeline inspired by the Tree-of-Thoughts and Chain-of-Thought methodologies, achieving an approximately threefold performance improvement over existing baselines. An analysis of reasoning-chain length and complexity, together with qualitative and quantitative comparisons against earlier models, suggests that although advanced reasoning models mark substantial progress, a carefully constructed in-context framework is essential to maximize their capabilities and offers a generalizable blueprint for causal discovery across a wide range of domains.
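As a rough illustration of the kind of modular in-context pipeline described above, the sketch below combines Chain-of-Thought stepwise reasoning with Tree-of-Thoughts-style branching and beam selection. This is not the authors' implementation: `query_model`, `score_step`, and all prompts are hypothetical placeholders standing in for calls to a reasoning model (e.g., an o-series or DeepSeek-R1 endpoint) and a self-evaluation prompt, respectively.

```python
from dataclasses import dataclass


@dataclass
class Thought:
    steps: list          # chain of reasoning steps accumulated so far
    score: float = 0.0   # heuristic plausibility of this branch


def query_model(prompt: str) -> list[str]:
    """Placeholder for an LLM call that proposes candidate next
    reasoning steps; a real pipeline would send `prompt` to a
    reasoning model and parse its continuations."""
    return [f"{prompt} -> hypothesis A", f"{prompt} -> hypothesis B"]


def score_step(step: str) -> float:
    """Placeholder branch scorer; a real pipeline might ask the
    model to self-evaluate each candidate step."""
    return float(len(step))


def tree_of_thought(question: str, depth: int = 2, beam: int = 2) -> Thought:
    """Expand a beam of reasoning chains and return the highest-scoring one."""
    frontier = [Thought(steps=[question])]
    for _ in range(depth):
        candidates = []
        for t in frontier:
            for nxt in query_model(t.steps[-1]):
                candidates.append(Thought(steps=t.steps + [nxt],
                                          score=t.score + score_step(nxt)))
        # beam search: retain only the top-`beam` branches
        frontier = sorted(candidates, key=lambda t: -t.score)[:beam]
    return frontier[0]


best = tree_of_thought("Does X cause Y given corr(X, Y) > 0?")
print(len(best.steps))  # the question plus `depth` reasoning steps
```

The branching-plus-pruning structure is what distinguishes a Tree-of-Thoughts pipeline from a single linear Chain-of-Thought trace: weak reasoning branches are discarded before they can derail the final causal verdict.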