Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Combining Cost-Constrained Runtime Monitors for AI Safety

Created by
  • Haebom

Authors

Tim Tian Hua, James Baskerville, Henri Lemoine, Mia Hopman, Aryan Bhatt, Tyler Tracy

Outline

This paper studies how to efficiently combine multiple runtime monitors into a single monitoring protocol. The objective is to maximize recall, that is, the probability of applying a safety measure to a misaligned output. Because both running monitors and applying safety measures incur costs, the protocol must satisfy an average-case budget constraint. We develop an algorithm that finds the optimal protocol given the performance and cost of the available monitors. The algorithm exhaustively searches over when and which monitors to invoke, and allocates safety measures according to the Neyman-Pearson lemma. By focusing on likelihood ratios and strategically trading off spending on monitors against spending on safety measures, we more than double recall compared to baselines in a code review setting. We also show that combining two monitors yields a Pareto improvement over using a single monitor. The framework provides a principled methodology for combining existing monitors to detect undesirable behavior in cost-sensitive environments.
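To illustrate the Neyman-Pearson step described above, here is a minimal Python sketch. It is not the authors' implementation. It assumes a single monitor whose score falls into a few discrete bins, and it assumes the budget has already been converted into the maximum fraction of aligned outputs that may receive the safety measure (called an audit here). The function name, the bin probabilities, and the budget are all hypothetical.

```python
def neyman_pearson_audit(p_aligned, p_misaligned, audit_budget):
    """Choose per-bin audit probabilities that maximize recall.

    p_aligned[i] / p_misaligned[i]: probability that the monitor score
    lands in bin i for an aligned / misaligned output (assumed known).
    audit_budget: largest fraction of aligned outputs we may audit.
    Returns (audit_prob, recall).
    """
    def likelihood_ratio(i):
        # Bins never produced by aligned outputs cost nothing to audit.
        if p_aligned[i] == 0:
            return float("inf")
        return p_misaligned[i] / p_aligned[i]

    # Audit the bins with the highest likelihood ratio first.
    order = sorted(range(len(p_aligned)), key=likelihood_ratio, reverse=True)

    audit_prob = [0.0] * len(p_aligned)
    remaining = audit_budget
    for i in order:
        if p_aligned[i] <= remaining:
            audit_prob[i] = 1.0          # audit this bin every time
            remaining -= p_aligned[i]
        else:
            # Randomize on the boundary bin to spend the budget exactly.
            audit_prob[i] = remaining / p_aligned[i]
            break

    recall = sum(a * q for a, q in zip(audit_prob, p_misaligned))
    return audit_prob, recall


# Made-up score distributions over four bins (low to high suspicion).
audit_prob, recall = neyman_pearson_audit(
    p_aligned=[0.5, 0.3, 0.15, 0.05],
    p_misaligned=[0.05, 0.15, 0.3, 0.5],
    audit_budget=0.1,
)
print(audit_prob, recall)
```

The sketch always audits the most suspicious bin and randomizes on the next one, which is the form of decision rule the Neyman-Pearson lemma identifies as optimal for a fixed false-positive budget.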

Takeaways, Limitations

Takeaways:
We present a novel algorithm that efficiently integrates multiple runtime monitors to maximize the recall of safety measures.
We provide a principled methodology for designing optimal monitoring protocols under cost constraints by leveraging the Neyman-Pearson lemma.
We experimentally demonstrate performance improvements over existing methods in code review settings.
We show that combining multiple monitors can achieve better performance than using a single monitor.
Limitations:
Since the algorithm relies on exhaustive search, its computational cost may grow quickly as the number of monitors increases.
Since we only evaluated performance in a code review setting, further research is needed to determine its generalizability to other application areas.
Since the performance and cost of each monitor are assumed to be known, further research is needed to account for uncertainty in these values.
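The two-monitor result and the exhaustive-search limitation can both be illustrated with a toy Python sketch. It is a simplification under stated assumptions, not the paper's algorithm. Monitor A is always run and its cost is already deducted from the budget. Monitor B is run only when A's score lands in a chosen set of bins. Whatever budget remains is spent on audits using the likelihood-ratio rule. All names, distributions, and costs are hypothetical.

```python
from itertools import combinations


def np_recall(outcomes, audit_fraction):
    """Best recall when at most audit_fraction of aligned outputs are audited.

    outcomes: list of (p_aligned, p_misaligned) pairs, one per outcome.
    """
    def ratio(o):
        return float("inf") if o[0] == 0 else o[1] / o[0]

    recall, remaining = 0.0, audit_fraction
    for p0, p1 in sorted(outcomes, key=ratio, reverse=True):
        if p0 <= remaining:
            recall += p1
            remaining -= p0
        else:
            recall += p1 * remaining / p0  # randomize on the boundary outcome
            break
    return recall


def best_two_monitor_protocol(joint_aligned, joint_misaligned,
                              cost_b, cost_audit, budget):
    """Try every set of monitor-A bins that could trigger monitor B.

    joint_*[i][j]: probability of (A bin i, B bin j) for that output type.
    Returns (best_recall, bins_of_A_that_trigger_B).
    """
    n_a = len(joint_aligned)
    best = (-1.0, None)
    for size in range(n_a + 1):
        for call_b in combinations(range(n_a), size):
            # Average cost of B on aligned outputs under this rule.
            spend_b = cost_b * sum(sum(joint_aligned[i]) for i in call_b)
            if spend_b > budget:
                continue
            audit_fraction = (budget - spend_b) / cost_audit
            outcomes = []
            for i in range(n_a):
                if i in call_b:   # both scores are observed
                    outcomes.extend(zip(joint_aligned[i], joint_misaligned[i]))
                else:             # only A's score is observed
                    outcomes.append((sum(joint_aligned[i]),
                                     sum(joint_misaligned[i])))
            recall = np_recall(outcomes, audit_fraction)
            if recall > best[0]:
                best = (recall, call_b)
    return best


# Made-up joint distributions: 2 bins for A (rows), 2 bins for B (columns).
aligned = [[0.72, 0.08], [0.18, 0.02]]
misaligned = [[0.04, 0.16], [0.16, 0.64]]
print(best_two_monitor_protocol(aligned, misaligned,
                                cost_b=1.0, cost_audit=20.0, budget=1.0))
```

With these made-up numbers, the search prefers calling B only when A's score is in its suspicious bin, which beats spending the whole budget on audits after A alone. The outer loop visits every subset of A's bins, so even this simplified search doubles in size with each additional bin.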