Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Online hierarchical partitioning of the output space in extreme multi-label data stream

Created by
  • Haebom

Authors

Lara Neves, Afonso Lourenço, Alberto Cano, Goreti Marreiros

Outline

This paper proposes iHOMER (Incremental Hierarchy Of Multi-label Classifiers), a framework for mining data streams with multi-label outputs, a setting complicated by evolving distributions, high-dimensional label spaces, sparse label occurrences, complex label dependencies, and concept drift. iHOMER is an online multi-label learning framework that incrementally partitions the label space into disjoint clusters of correlated labels without relying on a predefined hierarchy. It leverages online divisive-agglomerative clustering based on Jaccard similarity and a global tree-based learner driven by a multivariate Bernoulli process to guide instance partitioning. To handle non-stationarity, it integrates drift detection at both the global and local levels, enabling dynamic restructuring of label partitions and subtrees. In experiments on 23 real-world datasets, iHOMER outperforms existing state-of-the-art global and local methods by 23% and 32%, respectively.
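To make the Jaccard-based label similarity mentioned above concrete, here is a minimal sketch of how pairwise label similarity could be maintained incrementally over a stream. It is not the authors' implementation: the class name `OnlineJaccard`, its methods, and the toy stream are hypothetical, and the clustering step that would consume these similarities is left out.

```python
from collections import defaultdict
from itertools import combinations


class OnlineJaccard:
    """Hypothetical helper: tracks pairwise Jaccard similarity between labels."""

    def __init__(self):
        self.count = defaultdict(int)     # label -> instances carrying it
        self.co_count = defaultdict(int)  # frozenset({a, b}) -> co-occurrences

    def update(self, labels):
        """Consume the label set of one incoming instance."""
        labels = set(labels)
        for a in labels:
            self.count[a] += 1
        for a, b in combinations(sorted(labels), 2):
            self.co_count[frozenset((a, b))] += 1

    def similarity(self, a, b):
        """Jaccard = |A and B| / |A or B| over the instances seen so far."""
        if a == b:
            return 1.0 if self.count.get(a, 0) else 0.0
        inter = self.co_count.get(frozenset((a, b)), 0)
        union = self.count.get(a, 0) + self.count.get(b, 0) - inter
        return inter / union if union else 0.0


# Toy stream of label sets (illustrative data only).
stream = [{"cat", "pet"}, {"cat", "pet"}, {"cat"}, {"car"}]
tracker = OnlineJaccard()
for y in stream:
    tracker.update(y)

print(tracker.similarity("cat", "pet"))  # labels that often co-occur
print(tracker.similarity("cat", "car"))  # labels that never co-occur
```

Because only counts are stored, each update costs time quadratic in the number of labels on the instance rather than in the size of the full label space, which is one plausible reason a count-based similarity suits sparse, extreme multi-label streams.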

Takeaways, Limitations

Takeaways:
We present iHOMER, an effective online learning framework for multi-label data streams.
We present a dynamic label space partitioning and model adaptation strategy that is robust to concept drift.
Experiments on 23 real-world datasets show superior performance over existing state-of-the-art methods.
We present an efficient clustering and tree-based learning strategy based on Jaccard similarity and a multivariate Bernoulli process.
Limitations:
Lack of detailed analysis of the computational complexity of the proposed method.
Further experiments are needed to investigate robustness to different types of concept drift.
Potential performance degradation for certain types of data or label distributions.
Lack of evaluation of scalability and real-time processing performance required for practical applications.
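As an illustration of what drift detection on a stream involves, the sketch below implements the classic Page-Hinkley test on a sequence of per-instance errors. This summary does not say which detector iHOMER uses, so Page-Hinkley is swapped in here purely as a well-known example; the class name, the `delta` and `threshold` values, and the synthetic error stream are all assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


# Synthetic error stream: always correct, then always wrong (abrupt drift).
errors = [0.0] * 100 + [1.0] * 100
detector = PageHinkley()
alarms = [i for i, e in enumerate(errors) if detector.update(e)]
print("first alarm at index:", alarms[0] if alarms else None)
```

An abrupt change like this one is the easiest case to detect. Gradual or recurring drift would call for different parameter settings or a different detector, which relates to the limitation above about robustness across drift types.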