Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content

Created by
  • Haebom

Authors

Ran Tong, Songtao Wei, Jiaqi Liu, Lanruo Wang

Outline

This paper addresses the problem of hateful memes targeting the LGBTQ+ community evading detection systems after only minor edits to their captions or images. Using the PrideMM dataset, the authors build the first robustness benchmark for this setting by combining four realistic caption attacks with three common image corruptions. Taking two state-of-the-art detectors, MemeCLIP and MemeBLIP2, as case studies, they also present a lightweight Text Denoising Adapter (TDA) that improves the resilience of MemeBLIP2. Experimental results show that MemeCLIP degrades more gracefully, while MemeBLIP2 is particularly sensitive to caption edits that disrupt its language processing. Adding the TDA not only addresses this weakness but also makes MemeBLIP2 the most robust model overall. Further analysis reveals that all systems rely heavily on text, and that architecture choice and pretraining data strongly influence robustness. The benchmark exposes vulnerabilities in current multimodal safety models and shows that targeted, lightweight modules such as the TDA are an effective path toward stronger defenses.
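
The summary does not give implementation details for the caption attacks or the TDA, so the Python sketch below is only an illustration under stated assumptions: a simple character-swap perturbation stands in for the paper's caption attacks, and the TDA is assumed to be a small residual bottleneck that nudges noisy caption embeddings back toward their clean counterparts. The names `perturb_caption`, `TextDenoisingAdapter`, and the MSE training signal are hypothetical, not the paper's actual design.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


def perturb_caption(text: str, swap_prob: float = 0.1) -> str:
    """Illustrative character-swap noise; the paper's four caption
    attacks are likely more varied (this is only a stand-in)."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and random.random() < swap_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


class TextDenoisingAdapter(nn.Module):
    """Hypothetical lightweight adapter: a residual bottleneck over the
    text embedding that learns to undo caption noise."""

    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        # Residual correction: only the offset needed to counter the noise
        # is learned, so clean embeddings pass through largely unchanged.
        return self.norm(text_emb + self.up(self.act(self.down(text_emb))))


def denoising_loss(adapter: TextDenoisingAdapter,
                   noisy_emb: torch.Tensor,
                   clean_emb: torch.Tensor) -> torch.Tensor:
    # One plausible training signal: pull the adapted noisy embedding
    # toward the embedding of the unperturbed caption.
    return F.mse_loss(adapter(noisy_emb), clean_emb)


if __name__ == "__main__":
    print(perturb_caption("love is love, no matter who you are"))
    adapter = TextDenoisingAdapter(dim=512)
    clean = torch.randn(4, 512)          # stand-in for clean caption embeddings
    noisy = clean + 0.1 * torch.randn(4, 512)
    print(denoising_loss(adapter, noisy, clean).item())
```

Because such an adapter only touches the text branch, it could in principle be inserted in front of a frozen detector like MemeBLIP2 without retraining the backbone, which matches the "lightweight" framing in the summary; the actual integration point in the paper may differ.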

Takeaways, Limitations

Takeaways:
  • Highlights the difficulty of detecting hateful memes targeting the LGBTQ+ community and the need for robust models to counter them.
  • Compares the strengths and weaknesses of MemeCLIP and MemeBLIP2, suggesting directions for future model development.
  • Demonstrates that the robustness of multimodal safety models can be improved with a lightweight TDA module.
  • Underscores the importance of architecture choice and pretraining data for the robustness of multimodal models.
Limitations:
  • Because the evaluation depends on the PrideMM dataset, further research is needed to establish generalizability to other datasets.
  • The benchmark covers a specific set of attacks and corruptions rather than a comprehensive assessment of all attack types.
  • The effectiveness of the TDA may be limited to the models and dataset studied; its generalizability to other models and datasets requires further research.