This paper systematically evaluates the synergy of state-of-the-art self-supervised learning (SSL) models for sound event detection (SED) and presents guidelines for optimal model selection and integration. We propose a framework that combines various SSL representations (e.g., BEATs, HuBERT, and WavLM) through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and global aggregation. Experimental results on the DCASE 2023 Task 4 Challenge demonstrate that dual-modal fusion (e.g., CRNN+BEATs+WavLM) yields complementary performance gains, while the CRNN+BEATs combination achieves the best performance among individual SSL configurations. Furthermore, we introduce regularized sound event bounding boxes (nSEBBs), an adaptive postprocessing method that dynamically adjusts event boundary predictions and improves the PSDS1 of standalone SSL models by up to 4%. These results highlight the compatibility and complementarity of SSL architectures and provide guidance for task-specific fusion and robust SED system design.
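To make the fusion idea concrete, the sketch below illustrates one plausible form of dual-modal fusion: frame-level CRNN features are concatenated with projected embeddings from two frozen SSL encoders before a recurrent classification head. This is a minimal, hedged example; the module name, dimensions, frame-rate alignment by interpolation, and concatenation-based fusion are assumptions for illustration and not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualModalFusionSED(nn.Module):
    """Illustrative late-fusion head: CRNN features + two frozen SSL embeddings.

    All shapes and layer sizes are hypothetical; they only demonstrate the
    general fuse-then-classify pattern described in the abstract.
    """

    def __init__(self, crnn_dim=128, ssl_dims=(768, 768), fused_dim=256, n_classes=10):
        super().__init__()
        self.crnn_proj = nn.Linear(crnn_dim, fused_dim)
        self.ssl_projs = nn.ModuleList(nn.Linear(d, fused_dim) for d in ssl_dims)
        self.rnn = nn.GRU(fused_dim * (1 + len(ssl_dims)), fused_dim,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * fused_dim, n_classes)

    @staticmethod
    def _align(x, n_frames):
        # Resample the SSL frame rate to the CRNN frame rate: (B, T, D) -> (B, n_frames, D).
        return F.interpolate(x.transpose(1, 2), size=n_frames,
                             mode="linear", align_corners=False).transpose(1, 2)

    def forward(self, crnn_feats, ssl_embeds):
        # crnn_feats: (B, T, crnn_dim); ssl_embeds: list of (B, T_i, ssl_dim_i) tensors.
        n_frames = crnn_feats.shape[1]
        streams = [self.crnn_proj(crnn_feats)]
        for proj, emb in zip(self.ssl_projs, ssl_embeds):
            streams.append(proj(self._align(emb, n_frames)))
        fused, _ = self.rnn(torch.cat(streams, dim=-1))
        # Frame-level multi-label event probabilities.
        return torch.sigmoid(self.classifier(fused))


# Example usage with random tensors standing in for CRNN and SSL features.
model = DualModalFusionSED()
probs = model(torch.randn(2, 156, 128),
              [torch.randn(2, 496, 768), torch.randn(2, 496, 768)])
```

Under this reading, each SSL encoder can be swapped or dropped independently, which is what allows the individual-embedding, dual-modal, and global-aggregation variants to share one classification head.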