Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network

Created by
  • Haebom

Authors

Liangjin Liu, Haoyang Zheng, Pei Zhou

Outline

This paper proposes Dual-SignLanguageNet (DSLNet) for Isolated Sign Language Recognition (ISLR), where existing methods struggle to distinguish gestures that are morphologically similar but semantically distinct. DSLNet uses a dual-reference, dual-stream architecture that models hand shape and movement trajectory in separate coordinate systems: a wrist-centered coordinate system for viewpoint-independent shape analysis, and a face-centered coordinate system for context-aware trajectory modeling. The shape stream uses topology-aware graph convolution, the trajectory stream uses a Finsler geometry-based encoder, and the two streams are integrated through a geometry-based optimal transport fusion mechanism. In the reported experiments, DSLNet achieves accuracies of 93.70%, 89.97%, and 99.79% on the WLASL-100, WLASL-300, and LSA64 datasets, respectively, which the authors describe as state-of-the-art performance with significantly fewer parameters than competing models.
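The dual-reference idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2D keypoint format, the choice of index 0 for the wrist, and the scale normalization are all assumptions made for illustration. The sketch only shows how the same skeleton data can be re-expressed in a wrist-centered frame (for shape) and a face-centered frame (for trajectory). Note that subtracting the wrist and rescaling removes only position and apparent size; the viewpoint independence described in the summary would need more than this.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def wrist_centered_shape(hand: Sequence[Point], wrist_index: int = 0) -> List[Point]:
    """Express hand keypoints relative to the wrist, scaled by hand size.

    Hypothetical helper: wrist_index=0 is an assumption about the keypoint
    layout, not something stated in the paper summary.
    """
    wx, wy = hand[wrist_index]
    # Subtracting the wrist removes where the hand sits in the image.
    centered = [(x - wx, y - wy) for x, y in hand]
    # Dividing by the largest wrist-to-joint distance removes apparent size.
    scale = max((dx * dx + dy * dy) ** 0.5 for dx, dy in centered)
    if scale == 0.0:
        return centered
    return [(dx / scale, dy / scale) for dx, dy in centered]


def face_centered_trajectory(
    wrists: Sequence[Point], faces: Sequence[Point]
) -> List[Point]:
    """Express the per-frame wrist position relative to a face reference point.

    Hypothetical helper: `faces` holds one reference point per frame
    (for example a nose keypoint), which is an assumption of this sketch.
    """
    if len(wrists) != len(faces):
        raise ValueError("wrists and faces must have one entry per frame")
    return [(wx - fx, wy - fy) for (wx, wy), (fx, fy) in zip(wrists, faces)]


# Usage: the same hand pose at two image locations yields the same shape input.
hand = [(10.0, 10.0), (13.0, 14.0), (10.0, 15.0)]
shifted = [(x + 100.0, y - 20.0) for x, y in hand]
same_shape = wrist_centered_shape(hand) == wrist_centered_shape(shifted)
```

In this sketch the shape stream would receive the output of `wrist_centered_shape` and the trajectory stream the output of `face_centered_trajectory`, so where the hand is relative to the face and what the hand looks like are kept as separate inputs.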

Takeaways, Limitations

Takeaways:
Presents a novel approach that models hand shape and movement trajectories separately.
Achieves robustness to viewpoint changes by using a dual-reference coordinate system.
Makes effective use of topology-aware graph convolution and a Finsler geometry-based encoder.
Achieves state-of-the-art performance with fewer parameters than existing models.
Limitations:
The generalization performance of the proposed model requires further research.
Applicability to a wider range of sign languages and datasets needs to be verified.
Real-time processing performance still needs to be evaluated.