Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Signal-Based Malware Classification Using 1D CNNs

Created by
  • Haebom

Authors

Jack Wilkie, Hanan Hindy, Ivan Andonovic, Christos Tachtatzis, Robert Atkinson

Outline

This paper proposes a novel malware classification method that converts malware binaries into 1D signals, aiming to overcome the limitations of existing static and dynamic analysis methods. Existing approaches that convert binaries into 2D images lose information through quantization noise and introduce artificial 2D dependencies; the 1D signal conversion avoids both problems. We show that conventional 2D CNN architectures can be adapted to 1D signal classification, and we develop a custom 1D CNN based on the ResNet architecture and squeeze-and-excitation layers. Evaluated on the MalNet dataset, the proposed method achieves state-of-the-art F1 scores of 0.874, 0.503, and 0.507 for binary, type, and family-level classification, respectively.
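The byte-to-signal idea can be sketched as follows. This is a minimal illustration, not the authors' published pipeline: the function names `bytes_to_signal` and `quantize_8bit`, the scaling to [-1, 1], and the use of linear interpolation for resampling are all assumptions made for the example. It also shows one plausible source of the quantization noise mentioned above, namely rounding resampled values back to 256 integer levels as an 8-bit image would.

```python
from typing import List


def bytes_to_signal(data: bytes, length: int) -> List[float]:
    """Map raw bytes to a fixed-length 1D float signal in [-1, 1] (assumed scaling)."""
    if not data:
        raise ValueError("empty binary")
    if length < 1:
        raise ValueError("length must be positive")
    # Scale each byte (0..255) to a float in [-1.0, 1.0].
    samples = [b / 127.5 - 1.0 for b in data]
    if len(samples) == 1 or length == 1:
        return [samples[0]] * length
    # Linearly interpolate onto `length` evenly spaced positions.
    step = (len(samples) - 1) / (length - 1)
    signal = []
    for i in range(length):
        pos = i * step
        lo = min(int(pos), len(samples) - 2)
        frac = pos - lo
        signal.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return signal


def quantize_8bit(signal: List[float]) -> List[float]:
    """Round each sample to the nearest of 256 levels, as 8-bit pixel storage would."""
    return [round((s + 1.0) * 127.5) / 127.5 - 1.0 for s in signal]


# Hypothetical two-byte "binary", resampled to three samples.
signal = bytes_to_signal(bytes([0, 255]), 3)  # -> [-1.0, 0.0, 1.0]
# Largest deviation introduced by rounding to 8-bit levels.
noise = max(abs(a - b) for a, b in zip(signal, quantize_8bit(signal)))
```

Keeping the resampled signal in floating point means the interpolated values are passed to the network unchanged, whereas an 8-bit image representation would round each of them to the nearest pixel level.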

Takeaways, Limitations

Takeaways:
We address the information loss of existing 2D image conversion methods by converting malware binaries into 1D signals.
We demonstrated the applicability of existing 2D CNN architectures to 1D signal classification.
We achieved state-of-the-art performance on the MalNet dataset using a custom 1D CNN.
We present a new malware classification modality, opening up possibilities for future research.
Limitations:
Performance was evaluated only on the MalNet dataset, so further research is needed to determine how well the method generalizes to other datasets.
There is a lack of in-depth analysis of the feature extraction and classification performance of 1D signal transformation methods.
Additional research is needed to apply this method to real-world malware detection systems (e.g., real-time processing performance and scalability).
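For readers unfamiliar with the squeeze-and-excitation layers named in the outline, the following is a rough sketch of how such a layer reweights the channels of a 1D feature map. The summary does not specify the reduction ratio, where the layers sit in the network, or any weights, so the function name `squeeze_excite_1d` and the tiny weight matrices are hypothetical, and biases are omitted for brevity.

```python
import math
from typing import List


def squeeze_excite_1d(
    x: List[List[float]], w1: List[List[float]], w2: List[List[float]]
) -> List[List[float]]:
    """Apply a squeeze-and-excitation gate to a 1D feature map.

    x:  C channels, each a list of L samples.
    w1: bottleneck weights, shape (C_reduced, C).
    w2: expansion weights, shape (C, C_reduced).
    """
    # Squeeze: global average pooling along the length axis, one value per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: multiply every sample of a channel by that channel's gate in (0, 1).
    return [[g * v for v in ch] for g, ch in zip(s, x)]


# Hypothetical 2-channel feature map of length 2, reduced to 1 hidden unit.
features = [[1.0, 3.0], [2.0, 2.0]]
w_reduce = [[0.5, 0.5]]
w_expand = [[0.0], [1.0]]
gated = squeeze_excite_1d(features, w_reduce, w_expand)
```

The output has the same shape as the input; only the relative weighting of the channels changes, which is what lets the network emphasize informative channels.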