When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
Author
Haebom
Category
Empty
Authors
Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler