In this paper, we propose Encounter Generation via Reinforcement Learning (NTRL), a novel approach that automates the encounter balancing that Dungeon Masters (DMs) perform by hand in Dungeons & Dragons (D&D). NTRL frames encounter design as a contextual bandit problem and generates encounters from real-time party member attributes. Compared with existing DM heuristics, it intensifies encounters: fights last longer (+200%), party members take more damage, the party's remaining health after a fight is lower (-16.67%), and more player characters die, while total party kills (TPKs) remain rare. The generated encounters still maintain a high win rate (70%). NTRL also outperforms encounters designed by human DMs, enhancing strategic depth and increasing difficulty while preserving game fairness.
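
To make the bandit framing concrete, the following is a minimal sketch of a tabular epsilon-greedy contextual bandit that picks an encounter given a summary of the party's current state. It is an illustration under stated assumptions, not the NTRL implementation: the encounter names, the context features (`level`, `hp`, `max_hp`), the class `EpsilonGreedyBandit`, and the helper `make_context` are all hypothetical, and epsilon-greedy is simply one common bandit strategy chosen for brevity.

```python
import random
from collections import defaultdict

# Hypothetical arms: encounter templates the agent can choose from.
ENCOUNTERS = ["goblin_ambush", "ogre_duo", "young_dragon"]


class EpsilonGreedyBandit:
    """Tabular contextual bandit: one running-mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental update of the mean reward for this (context, arm) pair.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def make_context(party):
    """Discretize party attributes into a hashable context (assumed features)."""
    avg_level = sum(m["level"] for m in party) / len(party)
    avg_hp_ratio = sum(m["hp"] / m["max_hp"] for m in party) / len(party)
    return (round(avg_level), len(party), round(avg_hp_ratio, 1))
```

In such a setup, each fight would be one bandit round: the context is computed from the party before the encounter, the chosen arm is the generated encounter, and the reward is a scalar score of the outcome. How that reward weighs fight duration, damage taken, and whether the party wins is a design decision this sketch leaves open.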