This paper addresses the dynamic task allocation problem of optimally assigning resources to tasks in business processes. Although deep reinforcement learning (DRL) has recently been proposed as a state-of-the-art method for this problem, representing states and possible assignments as inputs and outputs of a policy neural network (NN) remains difficult when tasks or resources have features that can take infinitely many values. This paper presents a method for representing and solving the assignment problem with infinite state and action spaces, and makes three contributions: a graph-based feature representation called the assignment graph; a mapping from labeled colored Petri nets to the assignment graph; and an adaptation of the proximal policy optimization (PPO) algorithm that learns to solve the assignment problem represented through the assignment graph. We evaluate the proposed method by modeling three typical assignment problems with near-infinite state and action space dimensions, and show that it is suitable for representing and learning near-optimal task allocation policies regardless of the dimensions of the state and action spaces.