This paper is the first to systematically investigate the robustness of the deductive reasoning capabilities of large language models (LLMs). To evaluate the robustness of formal and informal LLM-based inference methods, we propose a framework that generates seven transformed datasets using two types of perturbations: adversarial noise and counterfactual statements. We categorize LLM reasoners by their inference format, formalization syntax, and use of feedback for error recovery, and analyze the strengths and weaknesses of each method. Experimental results show that adversarial noise primarily impairs automatic formalization, whereas counterfactual statements degrade the performance of all approaches. Detailed feedback reduces syntactic errors but does not improve overall accuracy, highlighting the limited ability of LLM-based methods to self-correct effectively.