This paper presents MMT4NL, a software-testing-based framework for evaluating the reliability of in-context learning (ICL) in large language models (LLMs). MMT4NL leverages adversarial examples and software testing techniques to identify vulnerabilities in ICL. It treats LLMs as software under test and generates perturbed adversarial examples from a test set to quantify and pinpoint bugs in ICL prompts. Experiments on sentiment analysis and question-answering tasks reveal various linguistic bugs in state-of-the-art LLMs.
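To make the testing idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of an invariance-style check in the spirit described above: test-set inputs are perturbed, and an ICL prompt is flagged as buggy when the model's prediction changes under a meaning-preserving edit. The helper names `perturb_typo` and the `query_llm` callable are hypothetical placeholders for a perturbation operator and an LLM API wrapper.

```python
from typing import Callable, Iterable

def perturb_typo(text: str) -> str:
    """Hypothetical perturbation: swap two adjacent characters to inject a typo."""
    if len(text) < 2:
        return text
    mid = len(text) // 2
    chars = list(text)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)

def run_invariance_test(
    query_llm: Callable[[str], str],   # assumed wrapper around an LLM API
    icl_prompt: str,                   # few-shot prompt under test, with an {input} slot
    test_inputs: Iterable[str],
) -> list[dict]:
    """Collect test cases whose predicted label changes after perturbation."""
    failures = []
    for text in test_inputs:
        original = query_llm(icl_prompt.format(input=text))
        perturbed_text = perturb_typo(text)
        perturbed = query_llm(icl_prompt.format(input=perturbed_text))
        # A label flip under a meaning-preserving edit is recorded as a bug.
        if original.strip().lower() != perturbed.strip().lower():
            failures.append({
                "input": text,
                "perturbed": perturbed_text,
                "original_label": original,
                "perturbed_label": perturbed,
            })
    return failures
```

The fraction of flagged cases over the test set then gives one simple way to quantify how fragile a given ICL prompt is; the actual framework applies a broader set of perturbations and test types than this sketch shows.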