Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

MIRROR: Modular Internal Processing for Personalized Safety in LLM Dialogue

Created by
  • Haebom

Author

Nicole Hsing

Outline

MIRROR is a modular architecture that retains the user's safety-related context across personalized multi-turn conversations, suppresses sycophancy, and prevents harmful recommendations by prioritizing user safety. Inspired by dual-process theory, it separates immediate response generation (the Talker) from asynchronous deliberative processing (the Thinker). On the CuRaTe safety benchmark, MIRROR achieved a 21% relative improvement across a range of models, with open-source models outperforming commercial ones.
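The Talker/Thinker split described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `InternalState`, `talker`, `thinker`, `converse`, and `SAFETY_KEYWORDS` are hypothetical, the bounded fact list is an assumption about how safety context might be stored, and simple keyword matching stands in for the language-model calls a real system would make.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical keyword list standing in for a model-based safety detector.
SAFETY_KEYWORDS = ("allergic", "allergy", "pregnant", "diabetic")


@dataclass
class InternalState:
    """Bounded store of safety-relevant user statements (an assumption)."""
    max_items: int = 5
    facts: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)
        # Keep the state bounded by dropping the oldest entries.
        self.facts = self.facts[-self.max_items:]


def talker(state: InternalState, user_message: str) -> str:
    """Immediate response: reply right away, consulting the stored context."""
    if state.facts:
        context = "; ".join(state.facts)
        return f"Keeping in mind ({context}): reply to {user_message!r}"
    return f"Reply to {user_message!r}"


async def thinker(state: InternalState, user_message: str) -> None:
    """Deliberative pass: runs in the background after the reply is produced."""
    await asyncio.sleep(0)  # stand-in for a slower model call
    if any(word in user_message.lower() for word in SAFETY_KEYWORDS):
        state.remember(user_message)


async def converse(messages: list[str]) -> tuple[list[str], InternalState]:
    state = InternalState()
    replies: list[str] = []
    pending = None
    for message in messages:
        if pending is not None:
            await pending  # previous deliberation must finish before this turn
        replies.append(talker(state, message))
        # Deliberation for this turn starts only after the reply exists.
        pending = asyncio.create_task(thinker(state, message))
    if pending is not None:
        await pending
    return replies, state


if __name__ == "__main__":
    turns = ["I am allergic to peanuts.", "Suggest a snack for the party."]
    replies, _ = asyncio.run(converse(turns))
    for reply in replies:
        print(reply)
```

In this sketch the first reply is produced before any deliberation has run, while the second reply sees the allergy statement that the background pass stored between turns. That ordering is the point of separating the two components: the slow pass does not delay the immediate answer.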

Takeaways, Limitations

Takeaways:
Presents a modular architecture that effectively reduces harmful recommendations in personalized conversations.
Improves the safety of open-source models, narrowing the gap with commercial models.
Improves AI accessibility safely and at low cost.
Uses a modular design that allows flexible deployment.
Limitations:
The paper does not explicitly state specific limitations.