This paper presents WildFit, a novel adaptation framework that addresses the poor accuracy of deep learning models deployed on resource-constrained IoT devices, using a wildlife camera trap as the example application. WildFit synthesizes training data on the device, focusing on background variations, and applies a drift-aware fine-tuning technique that updates the model only when necessary. Together, these allow the device to maintain accurate species classification despite limited connectivity and a tight energy budget. Background-aware synthesis is more efficient than existing methods, and drift-aware fine-tuning improves accuracy while reducing the number of updates. As a result, WildFit outperforms existing domain adaptation methods by 20-35% and consumes only 11.2 Wh of energy over 37 days.
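
The idea of updating the model only when necessary can be illustrated with a minimal sketch. It is not WildFit's actual drift detector, which the summary above does not specify. It assumes each captured frame is reduced to a single scalar background feature (for example, mean brightness) and that drift is flagged when the recent average moves more than a chosen number of reference standard deviations away from the reference mean. The names `DriftMonitor`, `count_updates`, `window`, and `threshold` are hypothetical.

```python
from collections import deque
from statistics import fmean, pstdev


class DriftMonitor:
    """Hypothetical drift trigger over a scalar per-frame background feature."""

    def __init__(self, reference, window=20, threshold=2.0):
        if len(reference) < 2:
            raise ValueError("need at least two reference values")
        self.ref_mean = fmean(reference)
        # Guard against a zero spread so the score below stays finite.
        self.ref_std = pstdev(reference) or 1e-9
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Record the feature value of one newly captured frame."""
        self.window.append(value)

    def drift_score(self):
        """Shift of the recent mean, in units of the reference spread."""
        if not self.window:
            return 0.0
        return abs(fmean(self.window) - self.ref_mean) / self.ref_std

    def should_fine_tune(self):
        # Decide only on a full window, to avoid reacting to a few odd frames.
        full = len(self.window) == self.window.maxlen
        return full and self.drift_score() > self.threshold

    def rebase(self):
        """After an update, adopt the recent mean as the new reference.

        The original spread is kept (an assumption of this sketch).
        """
        self.ref_mean = fmean(self.window)
        self.window.clear()


def count_updates(stream, monitor):
    """Count how many fine-tuning rounds a stream of features would trigger."""
    updates = 0
    for value in stream:
        monitor.observe(value)
        if monitor.should_fine_tune():
            updates += 1  # a real system would fine-tune the model here
            monitor.rebase()
    return updates


monitor = DriftMonitor([0.50, 0.52, 0.48, 0.51, 0.49], window=5, threshold=3.0)
stable_updates = count_updates([0.50, 0.51, 0.49, 0.50, 0.50], monitor)
drifted_updates = count_updates([0.80] * 10, monitor)
```

In this sketch a stable background triggers no update, while a sustained shift in the feature does. Raising `threshold` or `window` trades responsiveness for fewer updates, which is the kind of trade-off a drift-aware scheme has to balance against the energy budget.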