Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Created by
  • Haebom

Authors

Xiaonan Si, Meilin Zhu, Simeng Qin, Lijia Yu, Lijun Zhang, Shuaitong Liu, Xinfeng Li, Ranjie Duan, Yang Liu, Xiaojun Jia

Outline

This paper proposes SeCon-RAG, a two-stage semantic filtering and conflict-free framework that addresses the vulnerability of retrieval-augmented generation (RAG) systems, which rely on external knowledge, to corpus poisoning and contamination attacks. In the first stage, an entity-intent-relation extractor (EIRE) guides joint semantic and cluster-based filtering: the semantic relevance between the user query and the filtered documents is scored, and only useful documents are added to the retrieval database. In the second stage, an EIRE-based conflict-aware filtering module analyzes the semantic consistency among the query, the candidate answers, and the retrieved knowledge, removing internal and external contradictions that could mislead the model. Through this two-stage process, SeCon-RAG preserves useful knowledge while mitigating contradictions introduced by poisoned content, which improves both the robustness of generation and the reliability of the output.
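The two-stage flow described above can be illustrated with a toy sketch. This is not the paper's implementation: `Fact`, `Doc`, `stage_one`, `stage_two`, and `max_duplicates` are hypothetical names, the EIRE output is assumed to be a single pre-extracted (entity, relation, value) triple per document, exact-text grouping stands in for clustering, and the candidate answer is simply assumed to be trustworthy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical stand-in for EIRE output: one (entity, relation, value) triple."""
    entity: str
    relation: str
    value: str


@dataclass(frozen=True)
class Doc:
    text: str
    fact: Fact  # assumed to be extracted already; the extractor itself is not modeled


def stage_one(docs, entity, relation, max_duplicates=2):
    """Toy stage 1: keep query-relevant docs, then drop oversized duplicate clusters."""
    relevant = [d for d in docs if d.fact.entity == entity and d.fact.relation == relation]
    # Exact-text grouping stands in for real clustering (assumption).
    cluster_size = Counter(d.text for d in relevant)
    return [d for d in relevant if cluster_size[d.text] <= max_duplicates]


def stage_two(docs, candidate):
    """Toy stage 2: drop docs whose fact contradicts the candidate answer."""
    def contradicts(fact):
        same_slot = (fact.entity, fact.relation) == (candidate.entity, candidate.relation)
        return same_slot and fact.value != candidate.value
    return [d for d in docs if not contradicts(d.fact)]


corpus = [
    Doc("Paris is the capital of France.", Fact("France", "capital", "Paris")),
    Doc("Berlin is the capital of Germany.", Fact("Germany", "capital", "Berlin")),
    Doc("Marseille is the capital of France.", Fact("France", "capital", "Marseille")),
] + [Doc("Lyon is the capital of France.", Fact("France", "capital", "Lyon"))] * 3

filtered = stage_one(corpus, entity="France", relation="capital")
candidate = Fact("France", "capital", "Paris")  # e.g. the model's own draft answer
trusted = stage_two(filtered, candidate)
print([d.text for d in trusted])
```

In this toy corpus, the repeated Lyon passage is dropped in stage one as a suspiciously large cluster, and the single Marseille passage is dropped in stage two because it contradicts the candidate answer, so only the Paris passage remains.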

Takeaways, Limitations

Takeaways:
Improved reliability and integrity of RAG systems: Presents a defense mechanism against corpus poisoning attacks that enhances the robustness of the model.
Minimal knowledge loss: Avoids aggressive filtering and preserves useful information based on semantic relevance.
Use of EIRE: Performs finer-grained filtering by extracting entities, intents, and relations.
Consistency checking: Removes contradictions and improves answer accuracy by analyzing the consistency among queries, candidate answers, and retrieved knowledge.
SOTA performance: Outperforms existing defense methods across a variety of LLMs and datasets.
Limitations:
Dependence on EIRE: The performance of the overall framework may be affected by the accuracy of EIRE.
Computational cost: The framework can be computationally expensive because it involves two stages of filtering.
Dataset dependence: Results may be specific to the datasets evaluated, and generalization to other datasets requires further research.
Model generalizability: Further analysis is needed to determine how well the proposed method generalizes to different LLMs.