Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Created by
  • Haebom

Authors

Jobst Heitzig, Ram Potham

Outline

This paper examines "power," a central concern in AI safety: power-seeking as an instrumental goal of AI, the sudden or gradual disempowerment of humans, and the balance of power in human-AI interaction and international AI governance. At the same time, power, understood as the ability to pursue a wide range of goals, is essential to human well-being. The paper explores the idea of promoting both safety and well-being by having AI agents explicitly enhance human power and manage the balance of power between humans and AI agents in a desirable way. Using a principled and partially axiomatic approach, we design a parameterizable and decomposable objective function that aggregates human power over the long term in an inequality-averse and risk-averse way. This objective function takes bounded human rationality and social norms into account and, importantly, the diversity of human goals. We derive an algorithm for computing this metric from a given world model via backward induction or a form of multi-agent reinforcement learning, illustrate the results of (softly) maximizing it in various situations, and explain which instrumental subgoals it entails. A careful evaluation suggests that softly maximizing a suitable aggregate measure of human power may be a more beneficial goal for safe agentic AI systems than direct, utility-based goals.
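The summary does not give the paper's exact functional forms, but the structure it describes (per-human power values combined in an inequality-averse way, then aggregated risk-aversely over time) can be sketched roughly. The Python sketch below is only an illustration under assumed forms: a power-mean (CES-style) aggregator across humans and a concave exponential utility with discounting over time. The function names, parameters, and formulas are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def aggregate_over_humans(powers, inequality_aversion=2.0):
    """Combine per-human power values into one number.

    Hypothetical CES/power-mean aggregator: for inequality_aversion > 1 the
    aggregate is pulled toward the least-powerful individuals, so raising a
    low power value helps more than raising an already-high one.
    Assumes strictly positive power values.
    """
    powers = np.asarray(powers, dtype=float)
    exponent = 1.0 - inequality_aversion
    if np.isclose(exponent, 0.0):
        # Limit case (inequality_aversion == 1): geometric mean.
        return float(np.exp(np.mean(np.log(powers))))
    return float(np.mean(powers ** exponent) ** (1.0 / exponent))

def aggregate_over_time(per_step_aggregates, discount=0.95, risk_aversion=1.0):
    """Risk-averse long-term aggregation of per-step human-power aggregates.

    Applies a concave (exponential) utility to each step's aggregate before
    discounting, so trajectories with occasional power collapses score much
    worse than steadier ones. Again a stand-in, not the paper's formula.
    """
    total, weight = 0.0, 0.0
    for t, x in enumerate(per_step_aggregates):
        u = (1.0 - np.exp(-risk_aversion * x)) / risk_aversion
        total += (discount ** t) * u
        weight += discount ** t
    return total / weight

# Example: two humans, one of whom gradually loses power.
trajectory = [[1.0, 1.0], [1.0, 0.8], [1.0, 0.5], [1.0, 0.2]]
per_step = [aggregate_over_humans(p) for p in trajectory]
print(aggregate_over_time(per_step))
```

With inequality aversion above 1, the falling power of the second human drags the aggregate down much faster than a simple average would, which is the qualitative behavior such an objective is meant to have.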

Takeaways, Limitations

Takeaways:
  • Setting the enhancement of human power as an AI objective offers a new approach that can promote AI safety and human well-being at the same time.
  • It offers a more realistic way to design AI objective functions, one that accounts for diverse human goals, bounded rationality, and social norms.
  • An algorithm is presented for computing the proposed objective function via backward induction or multi-agent reinforcement learning (a rough sketch appears at the end of this post).
  • It points to the possibility of designing AI systems that are safer than those based on direct utility maximization.
Limitations:
  • 'Human power' still lacks a clear definition and objective metrics for quantifying and measuring it.
  • There is little concrete guidance on how to set the parameters of the proposed objective function or how to optimize it.
  • Experimental validation of practical applicability and effectiveness across diverse situations is lacking.
  • The precise definition of 'soft' maximization and its effects need further analysis.
  • Long-term safety and unpredictability require more thorough consideration.
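As noted in the takeaways, the metric is said to be computable from a given world model by backward induction and then maximized softly rather than greedily. Below is a minimal sketch of what that could look like, assuming a small tabular world model with known transition probabilities, a given per-state human-power score, and a Boltzmann (softmax) step in place of a hard argmax. The temperature, discounting, and toy setup are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def soft_backward_induction(transitions, power_score, horizon,
                            temperature=1.0, discount=0.95):
    """Backward induction with soft (Boltzmann) maximization.

    transitions: array [n_states, n_actions, n_states] of transition probabilities
    power_score: array [n_states], the (assumed given) human-power metric per state
    Returns per-step softmax policies and the resulting state values.
    """
    value = power_score.copy()          # terminal value = power metric at the horizon
    policies = []
    for _ in range(horizon):
        # Q[s, a]: immediate power plus discounted expected future value.
        q = power_score[:, None] + discount * transitions @ value
        # Soft maximization: a Boltzmann policy instead of a hard argmax,
        # which tempers extreme optimization of the metric.
        logits = q / temperature
        logits -= logits.max(axis=1, keepdims=True)
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)
        value = (policy * q).sum(axis=1)
        policies.append(policy)
    return policies[::-1], value

# Toy example: 3 states, 2 actions, with state 2 representing low human power.
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(3), size=(3, 2))
power = np.array([1.0, 0.7, 0.1])
policies, values = soft_backward_induction(T, power, horizon=5)
print(values)
```

Lowering the temperature makes the policy approach hard maximization of the power metric, while higher temperatures keep it closer to uniform; the paper's point, as summarized above, is that the softened version may be the safer objective.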