Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

Created by
  • Haebom

Author

Maxence Boels, Harry Robertshaw, Thomas C Booth, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

Outline

This paper compares imitation learning (IL) and reinforcement learning (RL) for surgical action planning, i.e., predicting future surgical actions (instrument-verb-target triplets) in laparoscopic surgery. Using the CholecT50 dataset, the authors evaluate an IL-based Dual-task Autoregressive Imitation Learning (DARIL) model against three RL variants: world-model-based RL, direct video RL, and an inverse-RL-enhanced variant. All RL approaches underperform DARIL (e.g., world-model-based RL achieved only 3.1% mAP at the 10-second planning horizon), and the authors argue that evaluation on expert-annotated test sets structurally favors methods that match the expert's action distribution, i.e., imitation learning. These findings challenge the common assumption that RL is superior for sequential decision-making.
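To make the imitation-learning side concrete, here is a minimal sketch of autoregressive next-triplet prediction trained by behavior cloning on expert trajectories. This is not the paper's DARIL implementation: DARIL is dual-task and operates on video features, whereas this toy version consumes only triplet-ID histories, and every name and dimension (`TripletPlanner`, `NUM_TRIPLETS`, the GRU backbone) is a hypothetical assumption for illustration.

```python
import torch
import torch.nn as nn

NUM_TRIPLETS = 100   # CholecT50 defines 100 instrument-verb-target triplet classes
EMBED_DIM = 256      # hypothetical model width

class TripletPlanner(nn.Module):
    """Toy autoregressive planner: given a history of observed triplets,
    predict the triplet at each next time step (teacher-forced IL)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_TRIPLETS, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, EMBED_DIM, batch_first=True)
        self.head = nn.Linear(EMBED_DIM, NUM_TRIPLETS)

    def forward(self, history):            # history: (batch, time) triplet ids
        hidden, _ = self.rnn(self.embed(history))
        return self.head(hidden)           # next-triplet logits at each step

model = TripletPlanner()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One teacher-forced training step on a dummy "expert" trajectory batch.
expert = torch.randint(0, NUM_TRIPLETS, (8, 32))   # (batch, time)
logits = model(expert[:, :-1])                     # predict step t+1 from each prefix
loss = loss_fn(logits.reshape(-1, NUM_TRIPLETS), expert[:, 1:].reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

An RL variant would instead roll the planner out and optimize a reward signal, which is exactly where its action distribution can drift away from the expert-annotated test distribution.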

Takeaways, Limitations

Takeaways:
  • Experimentally demonstrates the superiority of imitation learning over reinforcement learning for surgical action planning.
  • Attributes the RL performance gap to distribution-matching bias in the expert-annotated dataset.
  • Provides important insights for the development of surgical AI.
  • Suggests that conventional assumptions about RL's superiority in sequential decision-making should be reconsidered.
Limitations:
  • Reliance on a single dataset (CholecT50) may limit generalizability.
  • The choice of evaluation metric (mAP) may warrant further discussion (see the sketch after this list).
  • Further work is needed on a broader range of RL algorithms and hyperparameter tuning.
  • New approaches are needed to overcome distribution-matching bias in expert-annotated datasets.
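On the metric point above, the sketch below shows how multi-label mAP is typically computed for triplet prediction, here using scikit-learn's `average_precision_score` on synthetic placeholder arrays (shapes and data are assumptions, not taken from the paper). Since mAP averages per-class rankings against expert annotations, a model whose predictions closely match the expert's action distribution is naturally well placed on this metric.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Synthetic stand-ins: y_true marks which triplet classes occur in each
# future frame, y_score holds the model's per-class confidences.
rng = np.random.default_rng(0)
num_frames, num_classes = 200, 100
y_true = rng.integers(0, 2, size=(num_frames, num_classes))
y_score = rng.random((num_frames, num_classes))

# mAP = mean of per-class average precision; classes with no positive
# frames are skipped because AP is undefined for them.
aps = [average_precision_score(y_true[:, c], y_score[:, c])
       for c in range(num_classes) if y_true[:, c].any()]
print(f"mAP: {np.mean(aps):.3f}")
```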