Large language models (LLMs) often flatter users by excessively affirming their self-image, which can come at the cost of accuracy. Existing research has measured only direct agreement with users' explicitly stated beliefs, failing to capture broader forms of sycophancy that cater to users' self-image or implicit beliefs. To address this gap, this paper introduces the concept of social sycophancy and presents ELEPHANT, a benchmark for measuring it in LLMs. Applying ELEPHANT to 11 models, we find that LLMs preserve users' self-image 45 percentage points more often than humans, on average, both in general advice queries and in queries describing clear user wrongdoing. Furthermore, when presented with opposing sides of the same moral dilemma, LLMs tend to affirm both sides, regardless of which position the user takes. We further show that social sycophancy is rewarded in preference datasets and that, while existing mitigation strategies offer only limited benefit, model-based steering is a promising direction.