This paper studies the joint beamforming and resource allocation problem for minimizing the average delay in a downlink reconfigurable intelligent surface (RIS)-assisted orthogonal frequency division multiplexing (OFDM) system. Since each user's data packets arrive at the base station (BS) stochastically, the resulting sequential optimization problem is essentially a Markov decision process (MDP) and thus falls within the scope of reinforcement learning. To handle the mixed action space effectively and reduce the state-space dimensionality, a hybrid deep reinforcement learning (DRL) method is proposed: proximal policy optimization (PPO)-Theta optimizes the RIS phase-shift design, while PPO-N is responsible for the subcarrier allocation decisions. The active beamforming at the BS is then derived from the jointly optimized RIS phase shifts and subcarrier allocation. To further alleviate the curse of dimensionality associated with subcarrier allocation, a multi-agent strategy is introduced to optimize the subcarrier allocation indices more efficiently. In addition, to achieve more adaptive resource allocation and accurately capture the network dynamics, we integrate key factors closely related to the average delay, such as the number of packets waiting in the buffer and the current packet arrivals, into the state space. We further introduce a transfer learning framework to improve training efficiency and accelerate convergence. Simulation results show that the proposed algorithm significantly reduces the average delay, improves resource allocation efficiency, and achieves superior robustness and fairness compared with the baseline methods.
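To make the hybrid action structure concrete, the sketch below shows one plausible way to realize the two policy heads in PyTorch: a continuous Gaussian head for the RIS phase shifts (the PPO-Theta role) and a per-user categorical head for subcarrier selection (the PPO-N role), sharing a common state encoder. This is a minimal illustration under assumed interfaces, not the authors' implementation; the network sizes and the names state_dim, num_elements, num_users, and num_subcarriers are hypothetical.

```python
# Minimal sketch of a hybrid PPO actor with a continuous head (RIS phase
# shifts) and a discrete head (subcarrier indices). Architecture and
# dimensions are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn
from torch.distributions import Normal, Categorical

class HybridPPOActor(nn.Module):
    def __init__(self, state_dim, num_elements, num_users, num_subcarriers):
        super().__init__()
        # Shared encoder over the state (buffer occupancy, arrivals, CSI, ...).
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
        )
        # PPO-Theta-style head: Gaussian over one phase per RIS element.
        self.phase_mean = nn.Linear(128, num_elements)
        self.phase_log_std = nn.Parameter(torch.zeros(num_elements))
        # PPO-N-style head: one categorical over subcarriers per user.
        self.subcarrier_logits = nn.Linear(128, num_users * num_subcarriers)
        self.num_users, self.num_subcarriers = num_users, num_subcarriers

    def forward(self, state):
        h = self.encoder(state)
        # Sample continuous phase shifts and wrap them into [0, 2*pi).
        phase_dist = Normal(self.phase_mean(h), self.phase_log_std.exp())
        raw_phases = phase_dist.sample()
        phases = torch.remainder(raw_phases, 2 * torch.pi)
        # Sample a subcarrier index for each user.
        logits = self.subcarrier_logits(h).view(-1, self.num_users,
                                                self.num_subcarriers)
        sc_dist = Categorical(logits=logits)
        subcarriers = sc_dist.sample()
        # Joint log-probability, as used in PPO's clipped surrogate objective.
        log_prob = (phase_dist.log_prob(raw_phases).sum(-1)
                    + sc_dist.log_prob(subcarriers).sum(-1))
        return phases, subcarriers, log_prob

# Usage: one forward pass on a dummy batch of 8 states.
actor = HybridPPOActor(state_dim=32, num_elements=64,
                       num_users=4, num_subcarriers=16)
phases, subcarriers, log_prob = actor(torch.randn(8, 32))
```

Keeping both heads on one encoder lets a single critic and a joint log-probability drive the PPO update over the mixed action; in the paper's multi-agent variant, the discrete head would instead be split across agents to shrink each agent's subcarrier action space.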