Deploying quantized neural networks (QNNs) on resource-constrained devices such as microcontrollers requires balancing model performance, computational complexity, and memory footprint. Tiny Machine Learning (TinyML) addresses these challenges by combining advances in machine learning algorithms, hardware acceleration, and software optimization to execute deep neural networks efficiently on embedded systems. This paper introduces quantization from a hardware-centric perspective and systematically reviews the key quantization techniques used to accelerate deep learning models in embedded applications, focusing on the critical tradeoffs between model performance and hardware capabilities. We further evaluate existing software frameworks and hardware platforms designed to support QNN execution on microcontrollers, and highlight current challenges and promising directions in this rapidly evolving field.