Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations

Created by
  • Haebom

Author

Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

Outline

To overcome the limitations of existing spoken language models, this paper proposes UniCodec, a unified speech token that encodes both linguistic and paralinguistic information. UniCodec aims to capture the full meaning of speech and to generate natural, expressive speech. It uses a low-bitrate neural codec to learn discrete representations that disentangle meaning at the global and local scales. Experiments on datasets in multiple languages demonstrate the effectiveness of UniCodec.
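The summary does not specify UniCodec's architecture, so as a rough illustration only: low-bitrate neural codecs typically discretize continuous speech features with (residual) vector quantization, mapping each frame to a small set of token ids. The sketch below shows that general mechanism with made-up frame dimensions, codebook sizes, and stage counts — none of these come from the paper, and the actual UniCodec design may differ substantially.

```python
import numpy as np

def rvq_tokenize(frames, codebooks):
    """Residual vector quantization sketch: each stage picks the nearest
    codeword for the residual left by the previous stage, producing one
    discrete token per stage per frame (hypothetical, not UniCodec's design)."""
    residual = frames.astype(float)
    tokens = []
    for cb in codebooks:  # cb has shape (codebook_size, feature_dim)
        # squared distance from every residual frame to every codeword
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)        # nearest codeword id per frame
        tokens.append(idx)
        residual = residual - cb[idx]     # next stage quantizes what is left
    return np.stack(tokens, axis=1), residual  # (n_frames, n_stages) token ids

rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))                          # 6 frames, 4-dim features
codebooks = [rng.normal(size=(8, 4)) for _ in range(2)]   # 2 stages, 8 codes each
tokens, residual = rvq_tokenize(frames, codebooks)
print(tokens.shape)  # (6, 2): two token ids per frame
```

With 2 stages of 8 codewords each, every frame is compressed to 6 bits, which is the sense in which such codecs are "low-bitrate"; the paper's separation of global and local scales would add structure on top of this basic quantization step.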

Takeaways, Limitations

Takeaways:
Suggests that integrating linguistic and paralinguistic information can improve the performance of spoken language models.
The proposed approach supports natural speech generation while preserving paralinguistic properties.
Its effectiveness was validated on datasets in multiple languages.
Limitations:
The summary lacks specific technical details (e.g., model architecture, training method).
Comparative analysis against existing models may not be presented in sufficient depth.
There is little discussion of issues that may arise in practical use (e.g., computational cost, data requirements).