This paper highlights that increasing model size is a key driver of performance in image-based deep reinforcement learning, and presents an improvement to the existing Impala-CNN, a 15-layer ResNet-based image encoder. Instead of flattening the final output feature map of Impala-CNN, we propose Impoola-CNN, which applies global average pooling. We experimentally demonstrate on the Procgen benchmark that Impoola-CNN outperforms existing models, particularly in generalization. The improvement is most pronounced in games without agent-centric observations, and we speculate that this stems from the network's reduced sensitivity to translations. In conclusion, we emphasize the importance of efficient network design, not merely increasing model size.
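The architectural change can be illustrated with a minimal sketch. The shapes below are hypothetical, not taken from the paper; the point is that flattening ties the dense layer's input size to the spatial resolution of the feature map, while global average pooling reduces it to the channel count:

```python
import numpy as np

# Hypothetical final stage of an Impala-style encoder (illustrative shapes).
# Suppose the last conv block emits a feature map of shape (C=32, H=8, W=8).
C, H, W = 32, 8, 8
feat = np.random.randn(C, H, W)

# Impala-CNN style: flatten the feature map, then a dense layer to a
# 256-d embedding. The dense layer's input dimension depends on H and W.
flat = feat.reshape(-1)            # 32 * 8 * 8 = 2048 values
impala_params = flat.size * 256    # dense-layer weight count: 524288

# Impoola-CNN style: global average pooling first (one value per channel),
# then the same dense layer. Input dimension is independent of H and W.
pooled = feat.mean(axis=(1, 2))    # 32 values
impoola_params = pooled.size * 256 # dense-layer weight count: 8192

print(flat.size, pooled.size)      # 2048 32
```

Besides shrinking the dense layer, averaging over spatial positions discards location information in the final embedding, which is consistent with the reduced translation sensitivity discussed above.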