This paper presents a novel pretraining method that improves the accuracy of protein-ligand interaction prediction by accounting for solvent-dependent structural changes and jointly learning multiple correlated tasks. An ensemble of ligand structures generated under various solvent conditions serves as augmented input, so that the model learns from both structural flexibility and environmental context. We combine molecular reconstruction, interatomic distance prediction, and contrastive learning to build a solvent-invariant molecular representation. The method improves binding affinity prediction by 3.7%, reaches an 82% success rate on the PoseBusters Astex docking benchmark, and attains an AUC of 97.1% in virtual screening. It also achieves a root mean square deviation (RMSD) of 0.157 angstroms, providing insight into atomic-level binding mechanisms.
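
To make the joint objective concrete, the following is a minimal sketch of how the three pretraining tasks might be combined into a single loss. It is not the implementation used in this work. The function names (`info_nce`, `joint_loss`), the loss weights, the temperature, the use of mean squared error for the reconstruction and distance terms, and the InfoNCE form of the contrastive term are all assumptions made for illustration. The contrastive term treats two embeddings of the same ligand under different solvent conditions as a positive pair, which is one plausible way to encourage solvent invariance.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form, not taken from the paper).

    Row i of z_a and row i of z_b are embeddings of the same ligand
    under two different solvent conditions (the positive pair); all
    other rows of z_b act as negatives for row i of z_a.
    """
    # L2-normalize so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

def joint_loss(x, x_rec, d, d_pred, z_a, z_b,
               w_rec=1.0, w_dist=1.0, w_con=1.0):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    l_rec = float(np.mean((x - x_rec) ** 2))    # molecular reconstruction (MSE assumed)
    l_dist = float(np.mean((d - d_pred) ** 2))  # interatomic distance prediction (MSE assumed)
    l_con = info_nce(z_a, z_b)                  # solvent-invariance contrastive term
    return w_rec * l_rec + w_dist * l_dist + w_con * l_con
```

In this sketch, `x` and `x_rec` stand for input and reconstructed atom features, `d` and `d_pred` for reference and predicted interatomic distance matrices, and `z_a` and `z_b` for ligand embeddings obtained under two solvent conditions. In practice the three terms would likely need to be balanced, since they are on different scales.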