REACT is a benchmark designed to rigorously evaluate the inference capabilities of large language models (LLMs) on responsible, high-stakes decision-making tasks in healthcare and law. Unlike existing benchmarks that focus on predictive accuracy, REACT emphasizes transparent and interpretable inference, requiring models to align their logic closely with expert-derived procedures. To assess how closely LLM inferences align with those of human experts, we annotated 511 clinical cases in the healthcare domain and 86 legal cases in the law domain with detailed expert-derived rationales supporting each step of the inference process. These annotations were guided by carefully constructed inference graphs that explicitly encode domain-specific inference structures and decision criteria elicited from domain experts. The inference graphs serve not only as a standard for expert annotation but also as structured guidance for models to make transparent, step-by-step inferences. To address the scalability limits of manual annotation, we developed a semi-automatic annotation pipeline that efficiently generates new graphs from expert-defined inference graph templates, demonstrating the potential to extend our approach to other important domains. Our experimental results show that inference graphs substantially improve both the interpretability and the accuracy of LLM inference over existing baselines, yet a considerable gap remains between model and expert-level inference performance.