Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Learning Temporal Invariance in Android Malware Detectors

Created by
  • Haebom

Authors

Xinran Zheng, Shuo Yang, Edith CH Ngai, Suman Jana, Lorenzo Cavallaro

Outline

This paper addresses a weakness of Android malware detectors trained with Empirical Risk Minimization (ERM): their performance degrades over time because malware variants and new families shift the data distribution. The authors attribute this to existing detectors' inability to learn stable discriminative features. To address it, they propose TIF, a framework that applies invariant learning theory to learn temporally invariant features. TIF partitions training data into environments according to application observation dates to expose temporal shifts, and it combines specialized multi-proxy contrastive learning with invariant gradient alignment to produce and align high-quality, stable representations. Experiments show that TIF outperforms state-of-the-art methods and performs especially well in the early deployment phase.
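The environment construction step can be pictured with a short sketch. This is a hypothetical grouping scheme, not the paper's exact procedure (the summary does not give the window size): it assumes each sample is an (app_id, observation_date, label) record and that each environment covers a fixed number of calendar months counted from the earliest observation. The function name `build_time_environments` and the `months_per_env` parameter are illustrative.

```python
from collections import defaultdict
from datetime import date


def build_time_environments(samples, months_per_env=3):
    """Group (app_id, observation_date, label) records into time-based environments.

    Hypothetical scheme: each environment covers `months_per_env` calendar
    months, counted from the month of the earliest observation date.
    """
    if not samples:
        return []
    earliest = min(observed for _, observed, _ in samples)
    start_index = earliest.year * 12 + (earliest.month - 1)
    buckets = defaultdict(list)
    for app_id, observed, label in samples:
        month_index = observed.year * 12 + (observed.month - 1)
        env_id = (month_index - start_index) // months_per_env
        buckets[env_id].append((app_id, label))
    # Empty windows are skipped; the remaining environments stay in chronological order.
    return [buckets[env_id] for env_id in sorted(buckets)]


samples = [
    ("app_a", date(2019, 1, 15), 0),
    ("app_b", date(2019, 2, 3), 1),
    ("app_c", date(2019, 4, 20), 0),
    ("app_d", date(2019, 7, 1), 1),
    ("app_e", date(2019, 8, 30), 0),
]
envs = build_time_environments(samples, months_per_env=3)
# envs[0] holds app_a and app_b, envs[1] holds app_c, envs[2] holds app_d and app_e
```

Grouping by observation date rather than at random is what lets later training steps compare environments that differ in time, which is the shift the detector must be robust to.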

Takeaways, Limitations

Takeaways:
The paper systematically analyzes how time-dependent distribution shift degrades the performance of Android malware detectors.
It proposes TIF, a new detection framework based on invariant learning theory, and verifies its effectiveness experimentally.
TIF can be easily integrated into existing learning-based detectors.
TIF addresses real-world deployment needs and performs particularly well in the early deployment phase.
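Integration into an existing detector can be pictured as adding a penalty term to the detector's own training loss. The sketch below is not TIF's actual objective: it assumes a logistic-regression stand-in for the "existing detector" and uses a simple penalty on the spread of per-environment gradients as a hypothetical substitute for the paper's invariant gradient alignment. The names `logistic_loss_and_grad`, `invariant_objective`, and `lam` are illustrative.

```python
import math


def logistic_loss_and_grad(params, data):
    """Stand-in 'existing detector': logistic regression with params = weights + [bias].

    Returns the mean log-loss and its gradient over one environment's
    (features, label) pairs.
    """
    *w, b = params
    loss = 0.0
    grad = [0.0] * len(params)
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        err = p - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
        grad[-1] += err  # bias gradient
    n = len(data)
    return loss / n, [g / n for g in grad]


def invariant_objective(params, envs, loss_and_grad, lam=1.0):
    """ERM loss averaged over environments plus a gradient-alignment penalty.

    Penalty: mean squared distance of each environment's gradient from the
    average gradient, so it is zero only when all environments agree.
    """
    results = [loss_and_grad(params, env) for env in envs]
    erm = sum(loss for loss, _ in results) / len(results)
    grads = [g for _, g in results]
    mean_grad = [sum(col) / len(grads) for col in zip(*grads)]
    penalty = sum(
        sum((gi - mi) ** 2 for gi, mi in zip(g, mean_grad)) for g in grads
    ) / len(grads)
    return erm + lam * penalty, erm, penalty


env_a = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
env_b = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]  # feature-label relation flipped in a later period
total, erm, penalty = invariant_objective([0.0, 0.0, 0.0], [env_a, env_b], logistic_loss_and_grad)
# penalty is 0.125 here; it would be 0.0 if both environments were identical
```

Because the detector is only accessed through `loss_and_grad`, any model exposing a per-environment loss and gradient could be wrapped the same way. In practice the combined objective would be minimized with an autodiff framework, since the penalty's own gradient involves second derivatives; this sketch only evaluates it.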
Limitations:
Applying invariant learning to malware detection raises challenges: environment labels are not available in advance, drift arises from multiple factors, and the diversity of malware families can lead to low-quality representations.
Experimental results are reported only on long-term datasets, so further research is needed to establish generalizability to other datasets.