This paper presents a multi-faceted approach to two challenges in pediatric wrist lesion diagnosis: the inherent difficulty of the task and the limited size of available datasets. We adopt a fine-grained recognition strategy to identify subtle X-ray lesions that conventional CNNs overlook, and we improve network performance by fusing patient metadata with the X-ray images. In addition, we initialize the network with weights pre-trained on a fine-grained dataset rather than on a general-purpose dataset such as ImageNet. We show a 2% improvement in diagnostic accuracy on a limited dataset and an improvement of more than 10% on a larger fracture-focused dataset. This work is a novel attempt to integrate metadata into wrist lesion diagnosis.
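To make the idea of fusing patient metadata with image features concrete, the following is a minimal sketch, not the method used in this paper. It assumes the simplest form of fusion, concatenating a per-image feature vector with an encoded metadata vector before a linear classifier. The function names `fuse_features` and `classify`, the metadata fields (a scaled age and a binary sex flag), the feature dimensions, and the random stand-in features are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_features(image_features, metadata):
    """Concatenate image features with encoded metadata, row by row.

    Assumption: fusion by simple concatenation; the paper's actual
    fusion design is not specified in the abstract.
    """
    image_features = np.asarray(image_features, dtype=np.float32)
    metadata = np.asarray(metadata, dtype=np.float32)
    if image_features.shape[0] != metadata.shape[0]:
        raise ValueError("image_features and metadata must have the same batch size")
    return np.concatenate([image_features, metadata], axis=1)


def classify(fused, weights, bias):
    """Apply a linear layer followed by a numerically stable softmax."""
    logits = fused @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)

# Stand-in for CNN embeddings of 4 X-ray images (hypothetical 8-dim features).
image_features = rng.normal(size=(4, 8))

# Hypothetical metadata per patient: [age scaled to 0..1, binary sex flag].
metadata = np.array([[0.5, 1.0], [0.3, 0.0], [0.8, 1.0], [0.6, 0.0]])

fused = fuse_features(image_features, metadata)

# Randomly initialized classifier over 3 hypothetical lesion classes.
weights = rng.normal(size=(fused.shape[1], 3)).astype(np.float32)
bias = np.zeros(3, dtype=np.float32)
probs = classify(fused, weights, bias)
```

In this sketch each fused row has the image feature dimensions followed by the metadata dimensions, so the classifier can weight both sources jointly. Other fusion schemes (for example, gating or attention over the metadata) are equally plausible and would replace `fuse_features`.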