When Metrics Disagree: Automatic Similarity vs. LLM-as-a-Judge for Clinical Dialogue Evaluation
Created by Haebom