This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper proposes Adversarial Trigger Learning with Augmented Objectives (ATLA) to overcome the limitations of existing adversarial trigger learning methods. ATLA replaces the standard negative log-likelihood loss with a weighted loss, so that the learned adversarial triggers are optimized more strongly toward response-format tokens. This allows ATLA to learn an adversarial trigger from just a single question-response pair, and the learned trigger generalizes well to other similar queries. ATLA further improves trigger optimization by adding an auxiliary loss that suppresses evasive responses. Experimental results show that ATLA outperforms existing state-of-the-art techniques, achieving a nearly 100% success rate while requiring 80% fewer queries. The learned adversarial triggers also generalize well to new queries and LLMs. The source code is available on GitHub (https://github.com/QData/ALTA_Augmented_Adversarial_Trigger_Learning).
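The two loss terms described in the summary can be illustrated with a small numeric sketch. This is not the authors' implementation: the function names, the normalization by total weight, the `-log(1 - p)` form of the suppression term, and the `beta` mixing coefficient are all assumptions made for illustration, and the summary does not give the exact formulas. The inputs are plain numbers standing in for the log-probabilities a model would assign to target tokens.

```python
import math

def weighted_nll(token_logprobs, weights):
    """Weighted negative log-likelihood over the target response tokens.

    token_logprobs: log-probability of each target token (hypothetical inputs).
    weights: one non-negative weight per token; response-format tokens would
             receive larger weights than content tokens (assumed scheme).
    """
    if len(token_logprobs) != len(weights):
        raise ValueError("need exactly one weight per target token")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the total weight is an assumption, chosen so that
    # uniform weights reduce to the ordinary mean NLL.
    return -sum(w * lp for w, lp in zip(weights, token_logprobs)) / total_weight

def evasion_penalty(evasive_token_probs, eps=1e-9):
    """Hypothetical auxiliary term that grows as evasive tokens become likely.

    evasive_token_probs: probability assigned to each evasive token.
    """
    # Clamp so that a probability of exactly 1.0 does not produce log(0).
    return -sum(math.log(1.0 - min(p, 1.0 - eps)) for p in evasive_token_probs)

def augmented_objective(token_logprobs, weights, evasive_token_probs, beta=1.0):
    """Weighted NLL plus beta times the suppression term (beta is assumed)."""
    return (weighted_nll(token_logprobs, weights)
            + beta * evasion_penalty(evasive_token_probs))
```

With uniform weights the first term is the usual mean negative log-likelihood; raising the weight on selected tokens shifts the objective toward fitting those tokens, which is the effect the summary attributes to the weighted loss.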