Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation

Created by
  • Haebom

Author

Phurinut Srisawad, Juergen Branke, Long Tran-Thanh

Outline

We study payoff manipulation by the leader in repeated multi-objective Stackelberg games, where the leader can strategically influence the follower's deterministic best response, for example by offering a share of its own reward. The follower's utility function, which represents its preferences over multiple objectives, is assumed to be linear but unknown, and its weight parameters must be inferred through interaction. This creates a sequential decision-making task for the leader, who must balance preference elicitation against immediate utility maximization. We formalize this problem and propose two manipulation policies, based on expected utility (EU) and long-term expected utility (longEU), which guide the leader's choice of action and incentive by trading off short-term gains against long-term impact. We prove that longEU converges to the optimal manipulation strategy under infinitely repeated interactions. Experimental results in benchmark environments show that the proposed approach improves cumulative leader utility and promotes mutually beneficial outcomes, all without explicit negotiation or prior knowledge of the follower's utility function.
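The sketch below is a minimal toy illustration of the expected-utility idea, not the authors' implementation: the leader keeps a set of candidate weight vectors for the follower's linear utility, picks the action and incentive with the highest expected utility under that set, and prunes candidates that are inconsistent with the follower's observed response. The payoff matrices, incentive levels, and candidate grid are all illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy game: 2 leader actions, 3 follower actions, 2 follower objectives (illustrative values).
LEADER_PAYOFF = np.array([[3.0, 1.0, 0.0],
                          [0.0, 2.0, 4.0]])              # leader utility for (leader action, follower action)
FOLLOWER_PAYOFF = rng.uniform(0.0, 3.0, size=(2, 3, 2))  # follower's vector payoff, one value per objective
INCENTIVES = [0.0, 0.25, 0.5]                            # share of the leader's reward offered to the follower

TRUE_W = np.array([0.7, 0.3])                            # follower's hidden preference weights


def follower_best_response(a_leader, incentive, w):
    """Deterministic best response: maximize scalarized payoff plus the offered reward share."""
    scores = FOLLOWER_PAYOFF[a_leader] @ w + incentive * LEADER_PAYOFF[a_leader]
    return int(np.argmax(scores))


def expected_leader_utility(a_leader, incentive, candidates):
    """Average the leader's post-incentive utility over the surviving weight candidates."""
    total = 0.0
    for w in candidates:
        a_follower = follower_best_response(a_leader, incentive, w)
        total += (1.0 - incentive) * LEADER_PAYOFF[a_leader, a_follower]
    return total / len(candidates)


# Candidate weight vectors on the simplex; observations prune the inconsistent ones.
candidates = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 21)]

cumulative = 0.0
for t in range(20):
    # Greedy EU policy: choose the (action, incentive) pair with highest expected utility.
    a_leader, inc = max(
        itertools.product(range(LEADER_PAYOFF.shape[0]), INCENTIVES),
        key=lambda pair: expected_leader_utility(pair[0], pair[1], candidates),
    )
    a_follower = follower_best_response(a_leader, inc, TRUE_W)
    cumulative += (1.0 - inc) * LEADER_PAYOFF[a_leader, a_follower]
    # Preference elicitation: keep only weights consistent with the observed response.
    consistent = [w for w in candidates
                  if follower_best_response(a_leader, inc, w) == a_follower]
    candidates = consistent or candidates

print(f"cumulative leader utility after 20 rounds: {cumulative:.2f}")
```

The longEU policy described in the paper additionally values the information an action and incentive reveal about the follower's weights; the greedy EU rule above ignores that look-ahead.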

Takeaways, Limitations

Takeaways:
We present a method by which leaders can effectively manipulate followers' behavior without prior knowledge of the followers' utility functions.
We show that manipulation policies based on expected utility and long-term expected utility can achieve optimal manipulation by taking into account both short-term benefits and long-term impacts.
The proposed method promotes mutually beneficial outcomes without explicit negotiation or prior knowledge.
We mathematically prove that, under infinitely repeated interactions, the longEU policy converges to the optimal manipulation strategy.
Limitations:
The assumption that the follower's utility function is linear does not always hold true in real-world situations.
Because the convergence guarantee assumes infinitely repeated interactions, performance may degrade when the number of interactions is finite.
Experimental results are limited to specific benchmark environments, and performance may vary in other settings.
The possibility that followers recognize and strategically respond to the leader's manipulation attempts is not considered.