To overcome the limitations of existing spoken language models, this paper proposes UniCodec, a unified token representation that encompasses both linguistic and sublinguistic information. By capturing the full meaning of speech, UniCodec enables the generation of natural and expressive speech. It employs a low-bitrate neural codec to learn discrete representations that disentangle information at the global and local scales. Experiments on datasets in multiple languages demonstrate the effectiveness of UniCodec.
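The idea of separating information at global and local scales can be sketched as a simple two-codebook quantizer: one codebook quantizes an utterance-level (mean-pooled) embedding into a single global token, while another quantizes each frame into a sequence of local tokens. This is a minimal illustrative sketch only; the function names, codebook sizes, feature dimension, and pooling choice are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

def quantize(vectors, codebook):
    """Assign each vector to its nearest codebook entry (L2 distance)."""
    # Pairwise squared distances: (num_vectors, codebook_size)
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d.argmin(axis=1)
    return ids, codebook[ids]

def encode_utterance(frames, local_codebook, global_codebook):
    """Produce one global token (utterance level) and a sequence of
    local tokens (frame level) for a single utterance."""
    # Global token: quantize the mean-pooled embedding of all frames.
    g_id, _ = quantize(frames.mean(axis=0, keepdims=True), global_codebook)
    # Local tokens: quantize each frame independently.
    l_ids, _ = quantize(frames, local_codebook)
    return int(g_id[0]), l_ids

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16))      # 50 frames of 16-dim features
local_cb = rng.normal(size=(256, 16))   # 256 local (frame-level) codes
global_cb = rng.normal(size=(32, 16))   # 32 global (utterance-level) codes
g_token, l_tokens = encode_utterance(frames, local_cb, global_cb)
print(g_token, l_tokens.shape)          # one global token, 50 local tokens
```

In this toy form, the bitrate saving comes from the codebook sizes: each frame costs only log2(256) = 8 bits, and the whole utterance adds a single 5-bit global token.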