This paper presents a method for early prediction of bed-exit intention, aimed at preventing patient falls in hospitals and long-term care facilities, using a single low-cost load cell installed under a bed leg. The load-cell signal is converted into complementary image representations: RGB line graphs, recurrence plots, Markov transition fields, and Gramian angular fields, which are then fused. These images are processed in parallel by ViFusionTST, a dual-stream Swin Transformer, and combined via cross-attention to learn data-driven modality weights. To reflect real-world conditions, ViFusionTST is evaluated on data collected over six months from 95 beds in a long-term care facility. It achieves an accuracy of 0.885 and an F1 score of 0.794, outperforming existing time-series models.
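To illustrate one of the signal-to-image transforms mentioned above, the following is a minimal sketch of a Gramian angular (summation) field computed with NumPy. This is a generic textbook formulation, not the paper's implementation; the function name, rescaling choice, and example signal are illustrative assumptions.

```python
import numpy as np

def gramian_angular_field(x):
    # Illustrative sketch of a Gramian angular summation field (GASF),
    # not the paper's exact preprocessing pipeline.
    x = np.asarray(x, dtype=float)
    # Rescale the series to [-1, 1] so arccos is well defined.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # angular encoding of each sample
    # GASF entry (i, j) = cos(phi_i + phi_j), yielding a symmetric image.
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 64-sample sinusoid becomes a 64x64 texture image.
signal = np.sin(np.linspace(0, 4 * np.pi, 64))
gaf = gramian_angular_field(signal)
print(gaf.shape)  # (64, 64)
```

Each of the four representations yields a 2-D image of this kind, which is what allows an image backbone such as a Swin Transformer to process the original 1-D load-cell signal.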