This paper presents a complementary postprocessing step for a conversation transcription pipeline that improves grammar, punctuation, and readability by leveraging a large language model (LLM). Our approach further enriches the transcripts with metadata tags, such as the speaker's age, gender, and sentiment. Some tags are global to the entire conversation, while others are time-varying. We present an approach that combines a frozen audio-based model, such as Whisper or WavLM, with a frozen LLaMA language model to infer speaker attributes without task-specific fine-tuning of either model. Using a lightweight, efficient connector that bridges audio and linguistic representations, we achieve competitive performance on speaker profiling tasks while maintaining modularity and speed. Furthermore, we demonstrate that the frozen LLaMA model achieves an equal error rate (EER) of 8.8% in some scenarios by directly comparing x-vectors.
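As a rough illustration of the connector idea described above, the sketch below projects frozen audio-encoder frame embeddings into the LLM's token-embedding space so they can serve as soft-prompt tokens. All dimensions, weights, and the single-linear-layer design are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

AUDIO_DIM = 768   # hypothetical audio-encoder hidden size (e.g. a WavLM variant)
LLM_DIM = 4096    # hypothetical LLM embedding size (e.g. a LLaMA variant)
N_FRAMES = 50     # number of audio frames for one utterance

# Output of the frozen audio encoder: one embedding per frame (random stand-in).
audio_feats = rng.standard_normal((N_FRAMES, AUDIO_DIM))

# Lightweight connector: a single linear projection mapping audio features
# into the LLM embedding space. In this setup only these weights would be
# trained; both the audio model and the LLM stay frozen.
W = rng.standard_normal((AUDIO_DIM, LLM_DIM)) * 0.02
b = np.zeros(LLM_DIM)

def connect(feats: np.ndarray) -> np.ndarray:
    """Project audio frames into the LLM's embedding space."""
    return feats @ W + b

# The projected frames act as soft-prompt tokens that can be prepended to
# the embeddings of a text prompt (e.g. a question about speaker attributes)
# before the frozen LLM decodes its answer.
soft_prompt = connect(audio_feats)
print(soft_prompt.shape)  # (50, 4096)
```

Keeping both backbones frozen and training only this small projection is what preserves the modularity and speed mentioned above: either backbone can be swapped without retraining the rest of the pipeline.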