This paper analyzes and compares two major paradigms in self-supervised learning (SSL): reconstruction and joint embedding. Reconstruction methods learn by recovering the original sample in the input space, whereas joint-embedding methods learn by aligning the representations of different views in the latent space. This study elucidates the core mechanism of each paradigm and precisely characterizes how the view-generation (augmentation) process shapes the learned representations. Furthermore, it demonstrates that both SSL paradigms require only minimal alignment conditions between the augmentations and the task-irrelevant features. When the dimension of the irrelevant features is large, joint-embedding methods are preferable to reconstruction-based methods because they succeed under weaker alignment conditions.
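Schematically, writing $f$ for an encoder, $g$ for a decoder, and $x_1, x_2$ for two augmented views of an input $x$ (notation introduced here for illustration only, not taken from the paper), the two objectives can be contrasted as
\begin{align}
\mathcal{L}_{\mathrm{rec}} &= \mathbb{E}\,\bigl\| g\bigl(f(x_1)\bigr) - x \bigr\|_2^2, \\
\mathcal{L}_{\mathrm{JE}} &= \mathbb{E}\,\bigl\| f(x_1) - f(x_2) \bigr\|_2^2 + \mathcal{R}(f),
\end{align}
where $\mathcal{R}$ denotes some collapse-preventing regularizer (e.g., a contrastive or covariance term). The reconstruction loss is measured in the input space, so the encoder must retain enough information to rebuild $x$, including its irrelevant features; the joint-embedding loss is measured in the latent space, so irrelevant features can be discarded as long as the views remain aligned, which is consistent with the weaker alignment conditions claimed above.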