Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation

Created by
  • Haebom

Author

Chiming Duan, Minghua He, Pei Xiao, Tong Jia, Xin Zhang, Zhewei Zhong, Xiang Luo, Yan Niu, Lingzhe Zhang, Yifan Wu, Siyu Yu, Weijie Hong, Ying Li, Gang Huang

Outline

Log-based anomaly detection is a key task for ensuring the stability and performance of software systems. Existing methods rely heavily on labeled data, but labeling a large volume of logs is extremely costly. Transfer learning and active learning approaches have been proposed to address this issue, but their effectiveness is limited: transfer learning suffers from differences between the source and target systems' data distributions, and active learning suffers from the cold-start problem. This paper proposes LogAction, a novel log-based anomaly detection model based on active domain adaptation, which integrates transfer learning and active learning. It trains a base model on labeled data from a mature system, which addresses the cold-start problem of active learning. It then uses free energy-based sampling and uncertainty-based sampling to select logs on the distribution boundary for manual labeling, which addresses the distribution gap in transfer learning with minimal manual labeling effort. Experimental results on six dataset combinations show that LogAction achieves an average F1 score of 93.01% with only 2% of manual labels, outperforming some state-of-the-art methods by 26.28%.
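The sampling step can be illustrated with a minimal sketch. This summary does not give the paper's exact scoring formulas or how the two criteria are combined, so everything here is an assumption: free energy is taken as the standard energy score over classifier logits, uncertainty as the entropy of the softmax output, and the two are chained in a simple two-stage filter. The names `free_energy`, `entropy`, `select_for_labeling`, and `energy_pool_ratio` are hypothetical.

```python
import math


def free_energy(logits, temperature=1.0):
    """Energy score F(x) = -T * log(sum_k exp(logit_k / T)), computed stably.

    Assumption: higher free energy means the sample looks less like the
    data the base (source-system) model was trained on.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over the logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(batch_logits, budget, energy_pool_ratio=0.5):
    """Pick `budget` sample indices to send for manual labeling.

    Stage 1 keeps the samples with the highest free energy.
    Stage 2 picks, within that pool, the samples with the highest entropy.
    This two-stage combination is an illustrative assumption, not the
    procedure confirmed by the paper.
    """
    n = len(batch_logits)
    pool_size = max(budget, int(n * energy_pool_ratio))
    by_energy = sorted(
        range(n), key=lambda i: free_energy(batch_logits[i]), reverse=True
    )
    pool = by_energy[:pool_size]
    by_uncertainty = sorted(
        pool, key=lambda i: entropy(batch_logits[i]), reverse=True
    )
    return sorted(by_uncertainty[:budget])


# Hypothetical logits (normal vs. anomalous) for four target-system log sequences.
example_logits = [[8.0, -8.0], [0.2, 0.1], [-7.5, 7.0], [0.0, 0.3]]
chosen = select_for_labeling(example_logits, budget=2)
```

In this toy batch, the two confidently classified sequences are skipped and the two with nearly flat logits are chosen, which matches the idea of spending the labeling budget on logs near the distribution boundary.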

Takeaways, Limitations

Takeaways:
Achieves high performance with limited labeled data through active domain adaptation.
Addresses both the cold-start problem and data distribution differences by integrating transfer learning and active learning.
Demonstrates a high average F1 score (93.01%) in experiments on six dataset combinations.
Limitations:
The paper does not explicitly discuss its limitations.