This paper presents a novel method for early prediction of a patient's bed-departure intention using a single, inexpensive force sensor. We propose ViFusionTST, a dual-stream Swin Transformer model that transforms the signal from a force sensor installed under a bed leg into four image representations: an RGB line graph, a recurrence plot, a Markov transition field, and a Gramian angular field. These images are processed in parallel and fused via cross-attention. We evaluate the model on data collected from 95 beds in a real nursing home over a six-month period. The model achieves an accuracy of 0.885 and an F1 score of 0.794, outperforming existing 1D and 2D time-series models. These results demonstrate that image-based fusion of force-sensor signals can serve as an effective, privacy-preserving, real-time solution for fall prevention.
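To illustrate one of the image representations mentioned above, the following is a minimal pure-Python sketch of a Gramian angular (summation) field: the series is rescaled to [-1, 1], mapped to angles via the arccosine, and a matrix of pairwise angular sums is taken. This is only an illustrative example; the paper's actual preprocessing pipeline (scaling, windowing, resolution) may differ.

```python
import math

def gramian_angular_field(series):
    """Compute a Gramian angular summation field (GASF) for a 1-D series.

    Illustrative sketch only; not the authors' exact preprocessing.
    """
    lo, hi = min(series), max(series)
    # Rescale the series to [-1, 1] so the arccosine is defined.
    x = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]
    # Clamp against floating-point drift slightly outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    n = len(series)
    # GASF[i][j] = cos(phi_i + phi_j), a symmetric "texture" matrix.
    return [[math.cos(phi[i] + phi[j]) for j in range(n)] for i in range(n)]

gaf = gramian_angular_field([0.0, 1.0, 2.0, 3.0])
```

In practice each sensor window would be converted this way (and analogously into a recurrence plot and a Markov transition field) and rendered as an image for the transformer streams.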