In this paper, we propose Hypernetwork Model Alignment (Hyma) to reduce the computational cost of building a multi-modal model by combining pre-trained uni-modal models. Existing methods for building multi-modal models incur a high computational cost in training the connector modules that link the uni-modal models. Hyma addresses this by using a hypernetwork to select the best combination of uni-modal models and train the connector modules at the same time. Through the hypernetwork's parameter prediction, it jointly trains the connector modules for all N × M combinations of uni-modal models, and so finds the best model combination efficiently.
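To make the parameter-prediction idea concrete, the following is a minimal sketch of how one hypernetwork could emit connector weights for every model pair. It is not the paper's implementation. The names `ConnectorHypernetwork`, `predict_connector`, `connect`, and `all_connectors` are hypothetical, and the design choices (one learned embedding per candidate model, a single linear head, a linear connector without bias) are assumptions made for illustration. Only the forward pass is shown; training is omitted.

```python
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ConnectorHypernetwork:
    """Toy hypernetwork mapping a model pair (i, j) to linear connector weights.

    Assumed setup: n_a candidate models for one modality, n_b for the other.
    """

    def __init__(self, n_a, n_b, embed_dim, d_in, d_out, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out = d_in, d_out
        # One embedding per candidate model (hypothetical conditioning scheme).
        self.embed_a = make_matrix(n_a, embed_dim, rng)
        self.embed_b = make_matrix(n_b, embed_dim, rng)
        # Shared head: predicts the flattened d_out x d_in connector matrix.
        self.head = make_matrix(d_out * d_in, 2 * embed_dim, rng)

    def predict_connector(self, i, j):
        """Predict the connector weight matrix for model pair (i, j)."""
        cond = self.embed_a[i] + self.embed_b[j]  # list concatenation
        flat = matvec(self.head, cond)
        return [flat[r * self.d_in:(r + 1) * self.d_in] for r in range(self.d_out)]

    def connect(self, i, j, features):
        """Map features of size d_in through the predicted connector."""
        return matvec(self.predict_connector(i, j), features)


def all_connectors(hyper, n_a, n_b):
    """Predict connectors for all n_a * n_b pairs from the shared parameters."""
    return {(i, j): hyper.predict_connector(i, j)
            for i in range(n_a) for j in range(n_b)}


# Example: 3 x 2 = 6 model pairs, all served by one set of shared parameters.
hyper = ConnectorHypernetwork(n_a=3, n_b=2, embed_dim=4, d_in=5, d_out=6)
connectors = all_connectors(hyper, 3, 2)
projected = hyper.connect(0, 1, [1.0, 0.5, -0.5, 0.0, 2.0])
```

In this sketch every pair shares the embeddings and the head, so a loss computed on any sampled pair would update parameters that also serve the other pairs. That sharing is what would let a single training run cover all N × M connectors, under the assumptions stated above.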