Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Comparative Evaluation of ChatGPT and DeepSeek Across Key NLP Tasks: Strengths, Weaknesses, and Domain-Specific Performance

Created by
  • Haebom

Author

Wael Etaiwi, Bushra Alhijawi

Outline

In this paper, we evaluate the performance of ChatGPT and DeepSeek, two large language models (LLMs), across five key natural language processing (NLP) tasks: sentiment analysis, topic classification, text summarization, machine translation, and textual entailment. To ensure fairness and minimize variability, we use a structured experimental protocol in which both models are evaluated on two benchmark datasets per task with identical neutral prompts. Our experiments show that DeepSeek excels in classification stability and logical reasoning, while ChatGPT performs better on tasks requiring nuanced understanding and flexibility. These results provide valuable guidance for selecting the appropriate LLM based on task requirements.
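The protocol described above (the same neutral prompt per task, two benchmark datasets per task, both models queried identically) can be pictured as a simple evaluation loop. The sketch below is an illustrative assumption, not the authors' code: the model interface, prompt templates, dataset format, and scoring function are all hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: any function mapping a prompt string to a response string.
# In practice this would wrap the ChatGPT and DeepSeek APIs behind the same signature.
ModelFn = Callable[[str], str]

# Illustrative neutral prompt templates, one per task (not the paper's exact wording).
PROMPTS = {
    "sentiment_analysis": "Classify the sentiment of the following text as positive, negative, or neutral:\n{text}",
    "topic_classification": "Assign a topic label to the following text:\n{text}",
    "summarization": "Summarize the following text in one sentence:\n{text}",
    "machine_translation": "Translate the following text into English:\n{text}",
    "textual_entailment": "Does the premise entail the hypothesis? Answer yes or no.\n{text}",
}


def evaluate(model: ModelFn, task: str, dataset: List[Dict[str, str]],
             score: Callable[[str, str], float]) -> float:
    """Run one model over one benchmark dataset using the task's neutral prompt
    and return the mean score of predictions against reference answers."""
    total = 0.0
    for example in dataset:
        prompt = PROMPTS[task].format(text=example["input"])
        prediction = model(prompt)
        total += score(prediction, example["reference"])
    return total / len(dataset)


def compare_models(models: Dict[str, ModelFn],
                   benchmarks: Dict[str, List[List[Dict[str, str]]]],
                   score: Callable[[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Evaluate every model on every task, averaging over the (two) benchmark
    datasets provided per task, mirroring the structured protocol."""
    results: Dict[str, Dict[str, float]] = {}
    for model_name, model in models.items():
        results[model_name] = {}
        for task, datasets in benchmarks.items():
            per_dataset = [evaluate(model, task, ds, score) for ds in datasets]
            results[model_name][task] = sum(per_dataset) / len(per_dataset)
    return results
```

Here `score` would be the task-appropriate metric (e.g., accuracy for classification, ROUGE for summarization, BLEU for translation); the key point of the protocol is that both models see exactly the same prompts and datasets.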

Takeaways, Limitations

Takeaways:
Provides guidance for choosing the right LLM for a specific NLP task.
Clarifies the respective strengths and weaknesses of ChatGPT and DeepSeek.
Comparatively analyzes LLM performance across various NLP tasks, improving understanding of their practical applications.
Provides insight into the domain-specific capabilities of LLMs.
Limitations:
Generalizability may be limited because only ChatGPT and DeepSeek were evaluated.
The type and number of benchmark datasets used may be limited.
The experimental protocol is not described in detail, so reproducibility may require further review.
Additional evaluation on more diverse and complex NLP tasks is needed.