Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques

Created by
  • Haebom

Author

Lang Xiong, Raina Gao, Alyssa Jeong, Yicheng Fu, Sean O'Brien, Vasu Sharma, Kevin Zhu

Outline

This paper addresses sarcasm classification and generation with large language models (LLMs). To move beyond binary sarcasm detection, the authors present Sarc7, a benchmark that annotates entries of the MUStARD dataset with seven types of sarcasm: self-deprecating, brooding, deadpan, polite, obnoxious, raging, and manic. Classification performance is evaluated with zero-shot, few-shot, chain-of-thought (CoT), and a novel emotion-based prompting technique. For generation, the authors identify key elements of sarcasm, namely incongruity, shock value, and context dependency, and propose an emotion-based generation method built on them. Experimentally, Gemini 2.5 with emotion-based prompting achieved an F1 score of 0.3664, outperforming the other prompting setups, and human evaluators judged emotion-based prompting to generate sarcasm 38.46% more successfully than zero-shot prompting.
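
The paper itself does not include code, but the sketch below illustrates what emotion-based prompting for seven-way sarcasm classification could look like. The prompt wording, model name, and label parsing are illustrative assumptions rather than the paper's actual setup; only the google-generativeai SDK calls are standard.

```python
# Minimal sketch of emotion-informed prompting for 7-way sarcasm classification.
# The prompt text and model name are illustrative assumptions, not the paper's.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder key
model = genai.GenerativeModel("gemini-2.5-pro")    # placeholder model name

SARC7_LABELS = [
    "self-deprecating", "brooding", "deadpan", "polite",
    "obnoxious", "raging", "manic",
]

def classify_sarcasm(utterance: str, context: str) -> str:
    # Emotion-informed prompt: ask the model to reason about the speaker's
    # emotion and its clash with the literal meaning (incongruity) before
    # committing to one of the seven sarcasm types.
    prompt = (
        "You are annotating sarcasm in dialogue.\n"
        f"Context: {context}\n"
        f"Utterance: {utterance}\n\n"
        "Step 1: Describe the speaker's underlying emotion and whether it "
        "conflicts with the literal meaning of the utterance.\n"
        "Step 2: Based on that emotion, pick exactly one label from: "
        + ", ".join(SARC7_LABELS) + ".\n"
        "Write the final answer on the last line as: LABEL: <label>"
    )
    response = model.generate_content(prompt)
    # Naive parsing of the final label line (illustrative only).
    for line in reversed(response.text.strip().splitlines()):
        if line.upper().startswith("LABEL:"):
            return line.split(":", 1)[1].strip().lower()
    return "unknown"

print(classify_sarcasm(
    utterance="Oh great, another Monday. Exactly what I was hoping for.",
    context="The speaker just spilled coffee on their laptop.",
))
```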

Takeaways, Limitations

Takeaways:
• Proposes Sarc7, a new benchmark that classifies seven types of sarcasm, contributing to sarcasm detection research.
• Suggests that emotion-based prompting can improve the sarcasm classification and generation performance of large language models.
• Verifies the effectiveness of emotion-based prompting through experiments with the Gemini 2.5 model.
Limitations:
• Even accounting for the difficulty of fine-grained sarcasm classification, the F1 score of 0.3664 is low; further research is needed to reach higher performance (see the evaluation sketch after this list).
• The generalization of emotion-based prompting needs further validation.
• The results depend on a single model (Gemini 2.5); experiments with other models are needed.
• The benchmark depends on the MUStARD dataset, so performance should also be validated on other datasets.
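
As a rough illustration of the reported metric (not the paper's evaluation code), a macro-averaged F1 over the seven classes can be computed as below; the gold and predicted labels are invented placeholders.

```python
# Illustrative macro F1 computation for 7-way sarcasm classification.
# The gold/pred labels are invented placeholders, not data from the paper.
from sklearn.metrics import f1_score

labels = ["self-deprecating", "brooding", "deadpan", "polite",
          "obnoxious", "raging", "manic"]

gold = ["deadpan", "polite", "raging", "manic", "brooding"]
pred = ["deadpan", "obnoxious", "raging", "deadpan", "brooding"]

# Macro F1 averages the per-class F1 scores, so rare classes count as much
# as frequent ones; a score around 0.37 means many classes are still missed.
print(f1_score(gold, pred, labels=labels, average="macro", zero_division=0))
```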