This paper proposes Variational Inference with Neural Speech Prior (VINP), a method for jointly estimating anechoic speech and the room impulse response (RIR) from reverberant speech. VINP builds a probabilistic signal model in the time-frequency domain and uses a variational Bayesian inference (VBI) framework in which a neural network estimates the prior of the anechoic speech. Unlike conventional single-channel dereverberation methods, VINP is effective for automatic speech recognition (ASR) systems. It recovers the waveforms of the anechoic speech and the RIR through maximum a posteriori (MAP) and maximum likelihood (ML) estimation, respectively. Experimental results demonstrate state-of-the-art performance in mean opinion score (MOS) and word error rate (WER), as well as superior performance in estimating the reverberation time at 60 dB (RT60) and the direct-to-reverberation ratio (DRR). Code and audio samples are available online.
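For readers unfamiliar with the two RIR metrics named above, the following minimal sketch shows one common way to compute RT60 and DRR from an impulse response. It is not the evaluation code of this work. The function names, the Schroeder backward integration with a line fit between -5 dB and -25 dB, and the 2.5 ms direct-path window are all assumptions chosen for illustration.

```python
import numpy as np


def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    # Remaining energy from each sample to the end of the response.
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])


def estimate_rt60(rir, fs, db_start=-5.0, db_end=-25.0):
    """RT60 extrapolated from a line fit to the decay curve.

    The fit range (-5 dB to -25 dB) is an assumed convention.
    """
    edc = schroeder_edc_db(rir)
    t = np.arange(len(rir)) / fs
    mask = (edc <= db_start) & (edc >= db_end)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second, negative
    return -60.0 / slope


def estimate_drr(rir, fs, direct_ms=2.5):
    """DRR in dB: energy near the strongest peak versus all remaining energy.

    The +/- 2.5 ms direct-path window is an assumed convention.
    """
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(rir), peak + half + 1)
    direct = np.sum(rir[lo:hi] ** 2)
    reverb = np.sum(rir ** 2) - direct
    return 10.0 * np.log10(direct / reverb)


def synthetic_rir(rt60, fs, duration=1.0, seed=0):
    """Toy RIR: white noise with an exponential envelope that decays 60 dB in rt60 s."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * fs)) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    return rng.standard_normal(t.size) * envelope
```

On a synthetic response such as `synthetic_rir(0.5, 16000)`, `estimate_rt60` should return a value close to the 0.5 s used to build the envelope. Measured or estimated RIRs contain a noise floor and early reflections, so the fit range and the direct-path window usually need to be chosen with more care than in this sketch.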