This paper addresses the limitations of previous studies on the evaluation of Link Prediction (LP) methods, an important problem in network science and machine learning, and proposes a more rigorous and controlled experimental setup. Prior evaluations have typically used a single uniform setting, ignoring factors such as network type, problem type, geodesic distance between node pairs, the characteristics and applicability of individual LP methods, and class imbalance. We present an experimental setup that accounts for these factors and conduct extensive experiments on a variety of real-world network datasets. Based on the results, we provide insights into how these factors interact to affect the performance of LP methods and suggest best practices for their evaluation.