Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

On the Consistency of GNN Explanations for Malware Detection

Created by
  • Haebom

Authors

Hossein Shokouhinejad, Griffin Higgins, Roozbeh Razavi-Far, Hesamodin Mohammadian, Ali A. Ghorbani

Outline

This paper proposes a malware detection framework based on control flow graphs (CFGs). CFG node features are embedded with a hybrid approach that combines rule-based encoding and autoencoder-based embedding, and a GNN-based classifier then detects malicious behavior. To make the model interpretable, the authors apply GNNExplainer, PGExplainer, and CaptumExplainer (with Integrated Gradients, Guided Backpropagation, and Saliency), and they improve explanation quality with RankFusion, a new method for aggregating the outputs of these explainers. They also propose Greedy Edge-wise Composition (GEC), a new subgraph extraction strategy, and evaluate the framework with accuracy, fidelity, and consistency metrics.
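The summary does not give RankFusion's formula. As a minimal sketch, assuming RankFusion combines explainers by rank rather than by raw score, the Python below uses reciprocal rank fusion (RRF), a standard rank-aggregation scheme, as a stand-in. Converting each explainer's edge scores to ranks makes outputs on different scales (for example, mask probabilities versus saliency magnitudes) comparable. The function names, the edge-tuple representation, the constant k = 60, and the toy scores are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

Edge = tuple[int, int]  # (source node, target node) in the CFG


def rank_edges(scores: dict[Edge, float]) -> dict[Edge, int]:
    """Map each edge to its 1-based rank, most important first (ties broken by edge)."""
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return {edge: i + 1 for i, edge in enumerate(ordered)}


def rank_fusion(explanations: list[dict[Edge, float]], k: int = 60) -> list[Edge]:
    """Stand-in for RankFusion: reciprocal rank fusion over explainer outputs.

    Each explainer contributes 1 / (k + rank) per edge; edges are returned
    sorted by fused score, highest first. The paper's method may differ.
    """
    fused: dict[Edge, float] = defaultdict(float)
    for scores in explanations:
        for edge, rank in rank_edges(scores).items():
            fused[edge] += 1.0 / (k + rank)
    return sorted(fused, key=lambda e: (-fused[e], e))


# Made-up edge-importance scores on very different scales.
gnnexplainer_scores = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.7}
pgexplainer_scores = {(0, 1): 0.8, (1, 2): 0.6, (2, 3): 0.1}
saliency_scores = {(0, 1): 12.0, (1, 2): 3.0, (2, 3): 5.0}

print(rank_fusion([gnnexplainer_scores, pgexplainer_scores, saliency_scores]))
# [(0, 1), (2, 3), (1, 2)]
```

Edge (0, 1) is ranked first by every explainer, and (2, 3) takes second place in two of the three rankings, so it edges out (1, 2) even though the raw saliency values are an order of magnitude larger than the mask scores.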

Takeaways, Limitations

Takeaways:
Improved CFG-based malware detection performance through a hybrid approach combining rule-based and learning-based embeddings.
Enhanced explainability using GNNExplainer, PGExplainer, CaptumExplainer, and RankFusion.
Improved structural consistency of explanations through GEC, a new subgraph extraction strategy.
Validation of the framework's effectiveness through evaluation with accuracy, fidelity, and consistency metrics.
Limitations:
Potential performance bias toward specific malware types.
Further research is needed to determine whether the new explanation techniques generalize.
The computational complexity of GEC needs to be taken into account.
Generalization may be limited by the dataset used.
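GEC is only named in the summary. One plausible reading, labeled here as an assumption, is a greedy procedure that builds the explanation subgraph one edge at a time from an importance ranking, accepting only edges that stay connected to what has already been selected, so the result is a connected CFG fragment rather than scattered top-k edges. The Python sketch below implements that hypothetical version, together with Jaccard overlap of edge sets as one simple consistency measure; the procedure, the metric, the function names, and the toy ranking are all assumptions, not the paper's definitions.

```python
Edge = tuple[int, int]


def greedy_edge_composition(ranked_edges: list[Edge], budget: int) -> list[Edge]:
    """Hypothetical GEC: grow a weakly connected subgraph from ranked edges.

    Seeds with the top-ranked edge, then repeatedly adds the highest-ranked
    remaining edge sharing a node with the subgraph (edge direction ignored).
    """
    if not ranked_edges or budget <= 0:
        return []
    chosen = [ranked_edges[0]]
    nodes = set(ranked_edges[0])
    remaining = list(ranked_edges[1:])
    while len(chosen) < budget:
        # First (i.e. highest-ranked) remaining edge touching the subgraph.
        nxt = next((e for e in remaining if e[0] in nodes or e[1] in nodes), None)
        if nxt is None:  # nothing left connects to the current subgraph
            break
        chosen.append(nxt)
        nodes.update(nxt)
        remaining.remove(nxt)
    return chosen


def jaccard(a: list[Edge], b: list[Edge]) -> float:
    """Edge-set overlap between two explanations (1.0 means identical)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Toy ranking: (5, 6) and (6, 7) share no nodes with (0, 1), (1, 2), (2, 3).
ranked = [(0, 1), (5, 6), (1, 2), (2, 3), (6, 7)]
gec = greedy_edge_composition(ranked, budget=3)
top_k = ranked[:3]
print(gec)                  # [(0, 1), (1, 2), (2, 3)]
print(top_k)                # [(0, 1), (5, 6), (1, 2)]
print(jaccard(gec, top_k))  # 0.5
```

Plain top-k selection mixes edges from two unrelated regions, while the greedy version returns a single connected path. Each greedy step scans the remaining edges, so this naive version costs roughly O(budget × |E|), which is one concrete way the computational cost of GEC could grow with graph size.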