To address the practical yet challenging task of emotion recognition in conversation (ERC), this paper proposes a novel multimodal approach, the long-short distance graph neural network (LSDGNN). Based on a directed acyclic graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To allow the two modules to influence each other while keeping the long-distance and short-distance representations as distinct as possible, we apply a differential regularizer and incorporate a BiAffine module to facilitate feature interaction. In addition, we propose an improved curriculum learning (ICL) strategy to address the data imbalance problem. By computing the similarity between different emotions and emphasizing shifts between similar emotions, we design a "weighted emotion change" metric and develop a difficulty measurer so that training proceeds from easy samples to harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that the proposed model outperforms existing benchmarks.
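To make the feature-interaction idea concrete, the following is a minimal illustrative sketch (not the authors' released code) of a BiAffine interaction between long-distance and short-distance utterance features together with a differential regularizer that discourages the two branches from encoding the same information. The module layout, dimensions, and loss form are assumptions made for illustration.

```python
# Hypothetical sketch: BiAffine cross-branch interaction plus a differential
# regularizer. Shapes and module structure are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiAffineInteraction(nn.Module):
    """Lets long-distance and short-distance features attend to each other."""

    def __init__(self, dim: int):
        super().__init__()
        # Bilinear maps score the compatibility between the two branches.
        self.long_to_short = nn.Bilinear(dim, dim, dim)
        self.short_to_long = nn.Bilinear(dim, dim, dim)

    def forward(self, h_long: torch.Tensor, h_short: torch.Tensor):
        # Each branch is updated with information from the other branch.
        h_long_new = h_long + torch.tanh(self.short_to_long(h_short, h_long))
        h_short_new = h_short + torch.tanh(self.long_to_short(h_long, h_short))
        return h_long_new, h_short_new


def differential_regularizer(h_long: torch.Tensor, h_short: torch.Tensor) -> torch.Tensor:
    """Penalizes similarity so the two branches stay distinct."""
    cos = F.cosine_similarity(h_long, h_short, dim=-1)
    return cos.pow(2).mean()


# Usage on a batch of utterance features (batch, seq_len, dim).
h_long = torch.randn(8, 20, 256)
h_short = torch.randn(8, 20, 256)
interact = BiAffineInteraction(256)
h_long, h_short = interact(h_long, h_short)
loss_reg = differential_regularizer(h_long, h_short)
```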
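Likewise, a minimal sketch of the "weighted emotion change" difficulty measure used for curriculum learning is given below: dialogues whose consecutive utterances shift between similar emotions are treated as harder and scheduled later. The emotion similarity matrix, label set, and easy-first ordering policy are assumptions for illustration, not the paper's exact definitions.

```python
# Hypothetical sketch of a weighted emotion-change difficulty measurer for
# easy-to-hard curriculum scheduling. Similarity values are placeholders.
import numpy as np

EMOTIONS = ["happy", "sad", "neutral", "angry", "excited", "frustrated"]

# Assumed pairwise emotion similarity in [0, 1]; in practice this would be
# estimated from the data (e.g., label embeddings or class confusions).
SIM = np.eye(len(EMOTIONS))
SIM[EMOTIONS.index("happy"), EMOTIONS.index("excited")] = 0.8
SIM[EMOTIONS.index("excited"), EMOTIONS.index("happy")] = 0.8
SIM[EMOTIONS.index("angry"), EMOTIONS.index("frustrated")] = 0.7
SIM[EMOTIONS.index("frustrated"), EMOTIONS.index("angry")] = 0.7


def dialogue_difficulty(labels):
    """Average weighted emotion change over consecutive utterances."""
    shifts = []
    for prev, curr in zip(labels, labels[1:]):
        if prev == curr:
            shifts.append(0.0)  # no emotion change: easy
        else:
            i, j = EMOTIONS.index(prev), EMOTIONS.index(curr)
            # Changes between similar emotions get a larger weight (harder).
            shifts.append(1.0 + SIM[i, j])
    return float(np.mean(shifts)) if shifts else 0.0


# Easy-first schedule: order dialogues by difficulty before training.
dialogues = [["happy", "happy", "excited"], ["neutral", "sad", "angry", "frustrated"]]
order = sorted(range(len(dialogues)), key=lambda k: dialogue_difficulty(dialogues[k]))
```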