Harry Mead, Clarissa Costen, Bruno Lacerda, Nick Hawes
Outline
When optimizing the conditional value-at-risk (CVaR) with policy gradients (PG), existing methods suffer from poor sample efficiency because they discard a large fraction of sampled trajectories. In this paper, we reformulate the CVaR optimization problem by capping the total return of the trajectories used in training, and show that with an appropriately chosen cap the reformulated problem is equivalent to the original. Experimental results in a variety of environments show that this reformulation consistently improves performance over baselines. All code is available at https://github.com/HarryMJMead/cvar-return-capping .
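To make the contrast concrete, here is a minimal, hypothetical sketch in PyTorch. It is not the authors' code; the function names, the `cap` parameter, and the mean baseline are illustrative assumptions (see the linked repository for the actual implementation). It contrasts a standard CVaR policy-gradient surrogate, which only uses the worst alpha-fraction of trajectories, with a capped-return surrogate that keeps every trajectory but clips returns at the cap.

```python
import torch

def cvar_pg_loss(log_probs: torch.Tensor, returns: torch.Tensor, alpha: float) -> torch.Tensor:
    """Standard CVaR-PG surrogate: only the worst alpha-fraction of
    trajectories carries a gradient; the rest of the batch is discarded."""
    var = torch.quantile(returns, alpha)   # empirical VaR_alpha of the batch
    tail = returns <= var                  # worst alpha-fraction of trajectories
    # REINFORCE-style surrogate on the tail only, with advantage R - VaR.
    return -(log_probs[tail] * (returns[tail] - var).detach()).mean()

def capped_return_pg_loss(log_probs: torch.Tensor, returns: torch.Tensor, cap: float) -> torch.Tensor:
    """Capped-return surrogate: every trajectory contributes, but returns are
    clipped at `cap`, so pushing already-good trajectories past the cap yields
    no gradient signal and learning focuses on the lower tail."""
    capped = torch.clamp(returns, max=cap)
    baseline = capped.mean()               # simple variance-reducing baseline (an assumption)
    return -(log_probs * (capped - baseline).detach()).mean()
```

With, say, alpha = 0.1, the first loss draws gradient from roughly 10% of each batch, while the second uses all of it; the paper's claim, as summarized above, is that with an appropriately chosen cap, optimizing the capped objective still optimizes the original CVaR.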
• Takeaways: Reformulating the CVaR optimization problem by capping the total return of trajectories significantly improves sample efficiency over existing methods, with consistent performance gains across a wide range of environments.
• Limitations: There is little clear guidance on how to set the cap; a suitable cap for a given environment may need to be determined empirically, and further research is needed on generalization across diverse environments.