This paper addresses the problem that large language models (LLMs) learn word embeddings with an undesirable property called anisotropy. The authors argue that the second moment of the Adam optimizer is a cause of anisotropic embeddings and propose a modified optimizer, Coupled Adam, to mitigate the problem. Experimental results show that Coupled Adam significantly improves embedding quality and yields better upstream and downstream performance on sufficiently large datasets.
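
To make the reviewed idea concrete, the sketch below shows one plausible way a "coupled" second moment could look for an embedding matrix: the per-parameter second-moment estimate is averaged over the vocabulary axis, so all embedding vectors share the same adaptive scaling. This is an illustrative assumption, not the authors' implementation; the function name, hyperparameters, and the exact coupling rule are placeholders.

```python
# Minimal sketch (assumed, not the paper's code): one Adam-style update on an
# embedding matrix E of shape (vocab_size, dim) where the second moment v is
# shared ("coupled") across the vocabulary by averaging over axis 0.
import numpy as np

def coupled_adam_step(E, grad, m, v, t,
                      lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one optimizer step to the embedding matrix E."""
    # Standard Adam first- and second-moment updates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2

    # Coupling (assumed rule): average the second moment over the vocabulary
    # axis so every embedding row sees the same adaptive scaling per dimension.
    v_coupled = v.mean(axis=0, keepdims=True)   # shape (1, dim), broadcasts

    # Bias correction, as in vanilla Adam
    m_hat = m / (1 - beta1 ** t)
    v_hat = v_coupled / (1 - beta2 ** t)

    # Parameter update uses the shared (coupled) second moment
    E = E - lr * m_hat / (np.sqrt(v_hat) + eps)
    return E, m, v
```

Under this assumed rule, the effective step size for an embedding vector no longer depends on how often its token occurs, which is one intuitive way the second moment could stop pushing rare-token embeddings into a common direction and thereby reduce anisotropy.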