Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation

Created by
  • Haebom

Authors

Phurinut Srisawad, Juergen Branke, Long Tran-Thanh

Outline

This paper studies payoff manipulation in repeated multi-objective Stackelberg games: a strategy by which the leader can strategically influence the follower's optimal deterministic best response, for example by sharing part of its own payoff. The follower's utility function, which represents its preferences over multiple objectives, is assumed to be linear but unknown, and its weight parameters must be inferred through repeated interaction. This presents the leader with a sequential decision-making task: balancing immediate utility maximization against preference elicitation. The paper formalizes this problem and proposes two manipulation policies, based on expected utility (EU) and long-term expected utility (longEU), which guide the leader in choosing actions and offering incentives that trade off short-term gains against long-term impact. The authors prove that longEU converges to the optimal manipulation strategy under infinitely repeated interactions. Empirical results in benchmark environments show that the approach improves cumulative leader utility and promotes mutually beneficial outcomes, even without explicit negotiation or prior knowledge of the follower's utility function.
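To make the setup concrete, below is a minimal Python sketch of the EU idea under illustrative assumptions: the follower's utility is linear (w · payoff), the leader offers an extra payoff m for one target follower action, and the unknown weights w are tracked with a simple particle set filtered against observed best responses. The payoff matrices, incentive grid, particle update, and all names here are invented for illustration; this is not the authors' implementation, and longEU is only described in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy game (illustrative sizes): 3 leader actions, 4 follower actions,
# 2 follower objectives.
N_L, N_F, N_OBJ = 3, 4, 2
leader_payoff = rng.uniform(0.0, 1.0, (N_L, N_F))           # leader's scalar payoff
follower_payoff = rng.uniform(0.0, 1.0, (N_L, N_F, N_OBJ))  # follower's vector payoff
incentive_grid = np.linspace(0.0, 0.4, 5)                   # hypothetical payoff shares
true_w = np.array([0.7, 0.3])                               # hidden preference weights

def best_response(l, target, m, w):
    """Follower's optimal deterministic response to leader action l when
    an extra payoff m is promised for playing action `target`."""
    utils = follower_payoff[l] @ w  # linear utility over objectives
    utils[target] += m
    return int(np.argmax(utils))

# Belief about the unknown weights: particles on the 2-objective simplex.
t = rng.uniform(0.0, 1.0, 300)
particles = np.column_stack([t, 1.0 - t])

def expected_utility(l, target, m):
    """EU of an (action, incentive target, incentive size) triple:
    average leader payoff over the belief, net of incentives paid."""
    total = 0.0
    for w in particles:
        f = best_response(l, target, m, w)
        total += leader_payoff[l, f] - (m if f == target else 0.0)
    return total / len(particles)

for step in range(15):
    # EU policy: greedily pick the triple with the highest expected
    # utility under the current belief. (longEU would additionally
    # credit the information an offer reveals about the weights.)
    triples = [(l, tgt, m) for l in range(N_L)
               for tgt in range(N_F) for m in incentive_grid]
    l, tgt, m = max(triples, key=lambda c: expected_utility(*c))
    f = best_response(l, tgt, m, true_w)
    # Preference inference: discard weight hypotheses inconsistent with
    # the observed deterministic best response.
    keep = [w for w in particles if best_response(l, tgt, m, w) == f]
    if keep:
        particles = np.array(keep)

print("estimated weights:", particles.mean(axis=0).round(3), "true:", true_w)
```

Even this greedy EU rule narrows the weight estimate as a side effect of play; the paper's longEU policy differs in that it values such information gain explicitly when scoring actions.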

Takeaways, Limitations

Takeaways:
Provides a novel approach to the payoff manipulation problem in multi-objective Stackelberg games.
Shows that effective payoff manipulation is possible without prior knowledge of the follower's utility function.
Proposes manipulation policies based on expected utility (EU) and long-term expected utility (longEU) and verifies their effectiveness.
Proves that longEU converges to optimal manipulation under infinitely repeated interactions.
Presents a payoff manipulation strategy that promotes mutually beneficial outcomes.
Limitations:
The follower's utility function is assumed to be linear.
The convergence guarantee assumes infinitely repeated interactions, whereas real interactions are finite.
Other types of follower behavior (e.g., irrational behavior) are not considered.
Further research is needed for real-world applications.