This paper addresses autonomous navigation in marine environments with spatially varying fluid flow and both static and dynamic obstacles. To address these challenges, we present a method that incorporates local fluid flow measurements into the navigation policy. We argue that using fluid flow data alone is not enough: it must be effectively fused with the other sensor inputs, such as the agent's own state and the states of the obstacles. To this end, we propose MarineFormer, a Transformer-based policy architecture that combines two complementary attention mechanisms: spatial attention, which performs sensor fusion, and temporal attention, which captures the dynamics of the environment. MarineFormer is trained with reinforcement learning in a 2D simulation environment. Compared with existing baselines, including state-of-the-art models, it improves the episode completion rate by about 23% and produces shorter paths. Ablation studies further highlight the importance of fluid flow measurements and the effectiveness of the proposed architecture.
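The idea of spatial attention followed by temporal attention can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the MarineFormer implementation. Each sensor reading (self-state, each obstacle state, the local flow measurement) is taken to be an already-embedded vector of the same dimension. The sketch also omits the learned query/key/value projections, multiple heads, and feed-forward layers that a real Transformer block would include. The hypothetical helpers `spatial_fusion` and `temporal_fusion` show only where each attention step sits. Spatial attention lets the self-state token query all sensor tokens at one timestep. Temporal attention lets the most recent fused vector query the history of fused vectors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention (identity projections, an assumption)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors, one output component at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]


def spatial_fusion(self_state: Vector, obstacles: List[Vector], flow: Vector) -> Vector:
    """Hypothetical spatial step: fuse all sensor tokens at a single timestep."""
    tokens = [self_state] + obstacles + [flow]
    return attend(self_state, tokens, tokens)


def temporal_fusion(history: List[Vector]) -> Vector:
    """Hypothetical temporal step: the latest fused vector attends over the history."""
    return attend(history[-1], history, history)


def fused_observation(frames: List[Tuple[Vector, List[Vector], Vector]]) -> Vector:
    """Apply spatial attention per frame, then temporal attention across frames."""
    history = [spatial_fusion(s, obs, f) for s, obs, f in frames]
    return temporal_fusion(history)


if __name__ == "__main__":
    frames = [
        ([0.0, 1.0], [[1.0, 0.0], [0.5, 0.5]], [0.2, -0.1]),  # timestep t-1
        ([0.1, 0.9], [[0.9, 0.1], [0.4, 0.6]], [0.3, -0.2]),  # timestep t
    ]
    print(fused_observation(frames))
```

Applying spatial attention before temporal attention means that each timestep is first reduced to a single fused vector before any reasoning over time happens. This is one plausible ordering consistent with the description above. A learned policy head (not shown) would map the final vector to an action.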