We present a deep learning framework that enables precise, adaptive grasping for quadruped robots equipped with robotic arms. To minimize reliance on real-world data collection, we adopt a sim-to-real approach: we developed a pipeline that generates a synthetic dataset of grasping attempts on common objects within the Genesis simulation environment. Thousands of interactions were simulated from varied viewpoints, producing pixel-wise annotated grasp-quality maps that serve as ground truth for the model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multimodal input from onboard RGB and depth cameras and outputs a grasp-quality heatmap identifying optimal grasp points. We validated the entire framework on a quadruped robot. The system successfully performed a complete loco-manipulation task: autonomously navigating to a target object, detecting it with onboard sensors, predicting the optimal grasp pose with the trained model, and executing a precise grasp.
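As a concrete illustration of the described architecture, the sketch below shows a minimal U-Net-like network that takes a 4-channel RGB-D input and predicts a single-channel grasp-quality heatmap, from which the best grasp pixel is read off as the argmax. The layer widths, network depth, and grasp-point selection are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (assumed layer sizes, not the paper's exact architecture):
# a U-Net-like CNN mapping a 4-channel RGB-D image to a pixel-wise
# grasp-quality heatmap.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the standard U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class GraspUNet(nn.Module):
    def __init__(self, in_ch=4, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)         # full resolution
        self.enc2 = conv_block(base, base * 2)      # 1/2 resolution
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)  # skip connection doubles channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)           # 1-channel quality map

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))         # per-pixel grasp quality in [0, 1]

# Usage: stack RGB and depth into one 4-channel tensor and take the
# highest-quality pixel as the candidate grasp point.
model = GraspUNet()
rgbd = torch.rand(1, 4, 128, 128)                   # dummy RGB-D frame
heatmap = model(rgbd)                               # shape (1, 1, 128, 128)
flat_idx = heatmap.view(-1).argmax().item()
v, u = divmod(flat_idx, heatmap.shape[-1])          # pixel coordinates (row, col)
print(f"best grasp pixel: ({u}, {v})")
```

In a pipeline like the one described, the selected pixel would then be back-projected through the depth camera's intrinsics to obtain a 3D grasp pose for the arm; that step is omitted here for brevity.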