Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Avoiding Catastrophe in Online Learning by Asking for Help

Created by
  • Haebom

Authors

Benjamin Plaut, Hanlin Zhu, Stuart Russell

Outline

This paper points out a limitation of existing online learning algorithms: they assume that every mistake can be recovered from. It proposes a new online learning problem in which some mistakes are irreparable, 'catastrophic' ones. The payoff in each round is the probability of avoiding catastrophe in that round, and the goal is to maximize the product of these payoffs (the overall probability of avoiding catastrophe) while making only a limited number of queries to a mentor. Assuming that knowledge can be transferred between similar inputs, the authors prove that, in general, any algorithm either queries the mentor at a linear rate or is nearly guaranteed to cause catastrophe. However, when the mentor's policy class is learnable in the standard online model, they present an algorithm whose mentor query rate and regret both converge to 0 as the time horizon grows. Although the focus is on the product of payoffs, a bound on standard additive regret is also provided. In essence, the paper shows that any policy class learnable in the absence of catastrophic risk is also learnable in its presence, provided the learner can ask a mentor for help.
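To make the "asking for help" idea concrete, below is a minimal illustrative sketch, not the authors' algorithm: a learner that queries the mentor only on inputs that are far from anything the mentor has already labeled, imitates the nearest labeled input otherwise, and tracks the product of per-round catastrophe-avoidance probabilities. The toy environment, the `mentor_action` policy, and the distance-threshold query rule are all assumptions made purely for illustration.

```python
"""Illustrative sketch (assumptions only, not the paper's algorithm):
an online learner that asks a mentor for help on unfamiliar inputs and
otherwise transfers knowledge from the most similar queried input."""

import numpy as np

rng = np.random.default_rng(0)


def mentor_action(x: np.ndarray) -> int:
    # Hypothetical mentor policy: a simple linear rule standing in for the
    # unknown policy class the learner tries to imitate.
    return int(x.sum() > 0)


def avoidance_prob(x: np.ndarray, action: int) -> float:
    # Hypothetical environment: the mentor's action is always safe; the other
    # action carries a catastrophe risk that grows with the input norm.
    if action == mentor_action(x):
        return 1.0
    return max(0.0, 1.0 - 0.05 * float(np.linalg.norm(x)))


def run(T: int = 1000, dim: int = 2, query_radius: float = 0.5):
    queried_inputs, queried_actions = [], []
    total_avoidance = 1.0  # product of per-round avoidance probabilities
    num_queries = 0

    for _ in range(T):
        x = rng.uniform(-1.0, 1.0, size=dim)

        if not queried_inputs:
            nearest, dist = None, np.inf
        else:
            dists = [np.linalg.norm(x - q) for q in queried_inputs]
            nearest, dist = int(np.argmin(dists)), float(min(dists))

        if dist > query_radius:
            # Too dissimilar from anything the mentor has labeled: ask for help.
            a = mentor_action(x)
            queried_inputs.append(x)
            queried_actions.append(a)
            num_queries += 1
        else:
            # Transfer knowledge from the most similar queried input.
            a = queried_actions[nearest]

        total_avoidance *= avoidance_prob(x, a)

    return total_avoidance, num_queries


if __name__ == "__main__":
    p, q = run()
    print(f"overall catastrophe-avoidance probability: {p:.3f}, queries: {q}")
```

In this toy setup the query count stays far below the number of rounds because nearby inputs reuse earlier mentor answers, which is the intuition behind a query rate that vanishes as the horizon grows.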

Takeaways, Limitations

Takeaways: Introduces a new online learning framework that accounts for irreparable, catastrophic mistakes. Demonstrates that learning with the help of a mentor is possible in this setting and gives an efficient learning strategy even under a limited query budget. Provides guarantees for both multiplicative and additive regret (see the sketch after this list).
Limitations: Assumes the mentor's policy class is learnable in the standard online model. Whether the bound on the number of mentor queries is realistic in practical settings requires further study. Performance has not been evaluated in real applications. The possibility of inaccurate mentor responses is not considered.
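As a rough guide to the two regret notions mentioned above, one illustrative formalization is sketched below, assuming that \(\mu_t^{\pi}\) denotes the probability of avoiding catastrophe in round \(t\) under policy \(\pi\) and \(\pi^M\) denotes the mentor's policy; the paper's exact definitions may differ.

```latex
% Illustrative notation (assumed): \mu_t^{\pi} is the probability of avoiding
% catastrophe in round t under policy \pi, and \pi^M is the mentor's policy.
\[
  \text{multiplicative regret:}\quad
  \prod_{t=1}^{T} \mu_t^{\pi^M} \;-\; \prod_{t=1}^{T} \mu_t^{\pi_t},
  \qquad
  \text{additive regret:}\quad
  \sum_{t=1}^{T} \mu_t^{\pi^M} \;-\; \sum_{t=1}^{T} \mu_t^{\pi_t}.
\]
```

The multiplicative form compares overall probabilities of never causing a catastrophe, while the additive form is the standard sum-of-payoffs comparison used in ordinary online learning.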