This paper proposes a novel task, Question-based Sign Language Translation (QB-SLT). Whereas conventional SLT relies on gloss (vocabulary) annotations, QB-SLT leverages the conversational context of questions, which are easier to annotate than glosses and better reflect natural communication settings, to assist translation. To improve translation performance by aligning multimodal features and exploiting question context, we propose a cross-modal self-supervised learning fusion method with sigmoid self-attention weighting (SSL-SSAW). Multimodal features are first aligned through contrastive learning, after which the SSAW module adaptively weights and extracts features from the question and sign sequences. SSL-SSAW achieves state-of-the-art performance on the newly constructed CSL-Daily-QA and PHOENIX-2014T-QA datasets, showing that question-based assistance matches or surpasses gloss-based assistance. Visualization results further confirm that integrating dialogue context effectively improves translation quality.
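To make the fusion and alignment steps concrete, the sketch below shows one way sigmoid self-attention weighting over concatenated question and sign features, together with a contrastive alignment loss, could be realized in PyTorch. The module name `SSAWFusion`, the feature dimension, the single-layer gating network, and the mean-pooling used for the contrastive loss are illustrative assumptions; the abstract does not specify the exact architecture.

```python
# Minimal sketch of sigmoid self-attention weighting (SSAW) fusion plus a
# contrastive alignment loss. All module names and hyperparameters here are
# hypothetical, assuming question and sign features share one embedding size.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SSAWFusion(nn.Module):
    """Fuses question and sign-sequence features with sigmoid attention gates."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token relevance score

    def forward(self, sign_feats: torch.Tensor, question_feats: torch.Tensor) -> torch.Tensor:
        # sign_feats:     (B, T_s, D)  frame-level sign features
        # question_feats: (B, T_q, D)  token-level question features
        fused = torch.cat([sign_feats, question_feats], dim=1)  # (B, T_s + T_q, D)
        gates = torch.sigmoid(self.score(fused))                # (B, T_s + T_q, 1), values in (0, 1)
        return fused * gates                                    # adaptively re-weighted features


def contrastive_alignment_loss(sign_emb: torch.Tensor,
                               question_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style loss aligning pooled sign and question embeddings (B, D)."""
    sign_emb = F.normalize(sign_emb, dim=-1)
    question_emb = F.normalize(question_emb, dim=-1)
    logits = sign_emb @ question_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(sign_emb.size(0), device=sign_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    B, T_s, T_q, D = 2, 64, 12, 512
    sign = torch.randn(B, T_s, D)       # stand-in for encoded sign video frames
    question = torch.randn(B, T_q, D)   # stand-in for encoded question tokens
    fused = SSAWFusion(D)(sign, question)                        # (B, T_s + T_q, D)
    loss = contrastive_alignment_loss(sign.mean(1), question.mean(1))
    print(fused.shape, loss.item())
```

In this reading, the sigmoid gates let the model down-weight uninformative question tokens or sign frames without forcing a softmax-style competition across positions, while the contrastive loss encourages paired sign and question representations to land close together in the shared space before fusion.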