Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada

Created by
  • Haebom

Authors

Braeden Sherritt, Isar Nejadgholi, Efstratios Aivaliotis, Khaled Mslmani, Marzieh Amini

Outline

This paper highlights the need for real-time information about wildfire situations in Canada and uses social media data to overcome the limitations of existing data sources. Specifically, we present WildFireCan-MMD, a multimodal (text and image) wildfire social media dataset, a resource that has been lacking for the Canadian context. The dataset annotates recent Canadian wildfire-related posts from X (formerly Twitter) into 12 key themes. We compare a zero-shot Vision-Language Model (VLM), a custom-trained model, and a baseline classifier, and show that when labeled data is available, the custom-trained model outperforms both the zero-shot VLM and the baseline classifier, reaching an 84.48% F-score. We also propose a method for identifying wildfire trends in large-scale unlabeled datasets and emphasize the importance of region-specific datasets.
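The 84.48% figure above is an F-score. As a rough illustration of how such a score can be computed for a multi-label theme classifier, here is a from-scratch sketch of macro-averaged F1. It assumes each post carries a set of theme labels and that scores are averaged equally over themes; the summary does not state which averaging the paper uses, and the theme names are placeholders, not the dataset's actual 12 themes.

```python
# Minimal sketch: macro-averaged F1 for multi-label theme classification.
# Assumption: each post has a set of theme labels; macro averaging is illustrative only.


def per_label_f1(y_true, y_pred, labels):
    """Return {label: F1} computed from per-post label sets."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label appears in neither set.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Placeholder theme names, not the dataset's actual 12 themes.
    themes = ["evacuation", "smoke", "donation"]
    gold = [{"smoke"}, {"evacuation", "smoke"}, {"donation"}]
    pred = [{"smoke"}, {"evacuation"}, {"smoke"}]
    print(per_label_f1(gold, pred, themes))
    print(macro_f1(gold, pred, themes))  # 0.5
```

Macro averaging weights rare and common themes equally, so a model that ignores a rare theme is penalized noticeably; a weighted average would instead favor the frequent themes.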

Takeaways, Limitations

Takeaways:
We provide WildFireCan-MMD, a multimodal dataset specialized for Canadian wildfire situations, which can support future wildfire response research.
We experimentally demonstrate that the custom-trained model outperforms both the zero-shot VLM and the baseline classifier.
We propose a method for identifying wildfire trends through analysis of large-scale unlabeled datasets.
We emphasize the importance of region-specific datasets and offer insights for developing disaster response strategies.
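The summary does not describe how the trend-identification method works. One simple way to surface trends in unlabeled posts, assumed here purely for illustration, is to run a trained classifier over them and track the share of posts assigned to a theme per day. The function name, input format, dates, and theme names below are hypothetical.

```python
from collections import Counter
from datetime import date


def daily_theme_share(posts, theme):
    """Fraction of posts per day whose predicted themes include `theme`.

    posts: iterable of (day, set_of_predicted_themes) pairs, e.g. the output of
    running a trained classifier over unlabeled posts (hypothetical format).
    """
    totals = Counter()  # number of posts seen per day
    hits = Counter()    # number of posts per day tagged with `theme`
    for day, themes in posts:
        totals[day] += 1
        if theme in themes:
            hits[day] += 1
    # Keys are sorted by day so the result reads as a time series.
    return {day: hits[day] / totals[day] for day in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical classifier output for three posts.
    predicted = [
        (date(2023, 6, 1), {"smoke"}),
        (date(2023, 6, 1), {"evacuation", "smoke"}),
        (date(2023, 6, 2), {"donation"}),
    ]
    print(daily_theme_share(predicted, "smoke"))
```

A theme whose share rises over consecutive days would be one signal worth inspecting, though any such trend inherits the errors of the classifier that produced the labels.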
Limitations:
Further discussion may be needed regarding the size and diversity of the WildFireCan-MMD dataset.
Obtaining the labeled data needed to train custom models remains a challenge.
Further research is needed to determine generalizability across different regions and contexts.