Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.

Utility-Learning Tension in Self-Modifying Agents

Created by
  • Haebom

Authors

Charles L. Wang, Keir Dorchen, Peter Jin

Abstract

Looking ahead to future AI systems, we hypothesize agents that can self-improve across every aspect of their own design. We formalize this through a five-axis decomposition and a decision hierarchy, separating incentives from learning behavior so that each axis can be analyzed in isolation. Our main result identifies a structural conflict we call the utility-learning tension: utility-driven self-modifications that improve immediate or expected performance can erode the statistical preconditions for reliable learning and generalization. We show that distribution-independent guarantees are preserved only when the model family reachable through self-modification is uniformly capacity-bounded; when capacity can grow without limit, utility-rational self-modification can render previously learnable tasks unlearnable. Under mild assumptions, the axes collapse onto the same capacity criterion, yielding a single boundary for safe self-modification. We validate the theory with numerical experiments across multiple axes, comparing a destructive utility policy against two gated policies that preserve learnability.
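One way to ground the capacity criterion in standard PAC-learning terms (the notation below is ours, illustrative rather than the paper's): if $\mathcal{H}_t$ denotes the hypothesis class the agent can reach after $t$ self-modifications, a uniform capacity bound keeps the distribution-free sample complexity finite across the whole modification sequence.

```latex
% Illustrative PAC-style reading of the capacity criterion
% (\mathcal{H}_t, d, and m are our notation, not the paper's).
\sup_{t \ge 0} \mathrm{VCdim}(\mathcal{H}_t) \le d < \infty
\quad\Longrightarrow\quad
m(\varepsilon, \delta) = O\!\left(\frac{d + \log(1/\delta)}{\varepsilon^{2}}\right)
\text{ uniformly over all reachable } \mathcal{H}_t .
```

If instead $\sup_t \mathrm{VCdim}(\mathcal{H}_t) = \infty$, no distribution-independent sample-complexity bound holds across the sequence, which is the sense in which a learnable task can become unlearnable.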

Takeaways, Limitations

Takeaways:
  • Utility-learning tension: immediate performance gains can undermine learnability.
  • Capacity bound: a uniform bound on model capacity is needed to preserve learnability.
  • Safe self-modification: establishes a single boundary for safe self-modification.
  • Gated policies: proposes gated policies that preserve learnability (see the sketch after this list).
Limitations:
  • The paper may not detail concrete implementation methods or algorithms.
  • Experimental results may hold only under specific conditions and may not generalize to broader settings.
  • The five-axis decomposition and decision hierarchy add complexity that can make the framework hard to follow.
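To make the comparison above concrete, here is a minimal Python sketch of a capacity-gated acceptance rule contrasted with a destructive utility policy. The paper specifies no such algorithm and compares two gated variants; `utility()`, `capacity()`, and `CAPACITY_BOUND` are hypothetical stand-ins for its abstract quantities, and only one representative gated rule is shown.

```python
import random

# Minimal sketch of destructive vs. capacity-gated self-modification.
# All names here are hypothetical; the paper gives no concrete algorithm.

CAPACITY_BOUND = 100  # uniform bound on reachable model-family capacity

def utility(m):
    """Hypothetical utility score of a candidate model (higher is better)."""
    return m["utility"]

def capacity(m):
    """Hypothetical capacity measure (e.g., a VC-dimension proxy)."""
    return m["capacity"]

def destructive_step(current, proposal):
    """Destructive utility policy: accept any utility improvement,
    even when capacity grows without bound."""
    return proposal if utility(proposal) > utility(current) else current

def gated_step(current, proposal):
    """Gated policy: accept only proposals that both improve utility
    (gate 1) and keep capacity uniformly bounded (gate 2)."""
    if utility(proposal) <= utility(current):
        return current  # gate 1: must improve utility
    if capacity(proposal) > CAPACITY_BOUND:
        return current  # gate 2: must preserve learnability
    return proposal

def run(step, steps=200, seed=0):
    """Apply one acceptance rule to a stream of random proposals."""
    rng = random.Random(seed)
    model = {"utility": 0.0, "capacity": 10}
    for _ in range(steps):
        proposal = {
            "utility": model["utility"] + rng.uniform(-0.1, 0.3),
            "capacity": model["capacity"] + rng.randint(0, 10),
        }
        model = step(model, proposal)
    return model

print("destructive:", run(destructive_step))  # capacity grows without bound
print("gated:      ", run(gated_step))        # capacity stays <= CAPACITY_BOUND
```

The design point mirrors the paper's comparison: the second gate is what separates a utility-rational but destructive policy from one that preserves the statistical preconditions for learning.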