Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Highly Imbalanced Regression with Tabular Data in SEP and Other Applications

Created by
  • Haebom

Authors

Josias K. Moukpe, Philip K. Chan, Ming Zhang

Outline

This paper addresses highly imbalanced regression on tabular data, where the imbalance ratio exceeds 1,000. Accurately estimating the target values of rare instances matters in applications such as forecasting the intensity of rare, hazardous solar energetic particle (SEP) events. The paper identifies three weaknesses in common practice: the conventional MSE loss does not account for the correlation between predicted and actual values, typical inverse importance functions allow only convex shapes, and uniform sampling can produce mini-batches that contain no rare instances. To address these, the paper proposes CISIR, which combines a correlation component, monotonically decreasing involution (MDI) importance, and stratified sampling. Experiments on five datasets indicate that CISIR can achieve lower error and higher correlation than several recent methods, that adding the correlation component to other recent methods can improve their performance, and that MDI importance can outperform other importance functions. The source code is available at https://github.com/Machine-Earning/CISIR
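The three ingredients named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that), and every concrete choice in it is an assumption made for illustration: the `1 - r` correlation penalty and its weight `lam`, the rational form `(1 - d) / (1 + a * d)` used as one example of a monotonically decreasing involution, and the equal-width binning used to draw from every target-value bin (stratified sampling).

```python
import numpy as np


def pearson(y_true, y_pred, eps=1e-12):
    """Pearson correlation between targets and predictions."""
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    denom = np.sqrt((yt ** 2).sum() * (yp ** 2).sum()) + eps
    return float((yt * yp).sum() / denom)


def mse_plus_correlation_loss(y_true, y_pred, weights=None, lam=0.5):
    """Weighted MSE plus a penalty for low correlation.

    Assumption: the penalty is lam * (1 - r). The paper's exact
    formulation may differ.
    """
    if weights is None:
        weights = np.ones_like(y_true, dtype=float)
    mse = np.average((y_true - y_pred) ** 2, weights=weights)
    return float(mse + lam * (1.0 - pearson(y_true, y_pred)))


def involution_importance(density, a=1.0):
    """Example decreasing involution on [0, 1] (hypothetical form).

    f(d) = (1 - d) / (1 + a * d) satisfies f(f(d)) = d for a > -1.
    It is convex for a > 0, linear for a = 0, concave for -1 < a < 0.
    `density` is assumed to be normalized to [0, 1].
    """
    return (1.0 - density) / (1.0 + a * density)


def stratified_batches(y, batch_size, n_bins, rng):
    """Yield index batches that draw from every non-empty target bin.

    Assumption: strata are equal-width bins over the target range.
    Small strata are sampled with replacement so they are never skipped.
    """
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    strata = [np.flatnonzero(bin_ids == b) for b in range(n_bins)]
    strata = [s for s in strata if s.size > 0]
    per_stratum = max(1, batch_size // len(strata))
    n_batches = int(np.ceil(len(y) / batch_size))
    for _ in range(n_batches):
        parts = [
            rng.choice(s, size=per_stratum, replace=s.size < per_stratum)
            for s in strata
        ]
        yield rng.permutation(np.concatenate(parts))
```

In a training loop, each batch from `stratified_batches` would index into the feature matrix and targets, with per-instance weights obtained by applying `involution_importance` to a normalized density estimate of each target value. Because every non-empty bin contributes to every batch, rare high-intensity targets appear in each gradient step, which uniform sampling does not guarantee.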

Takeaways and Limitations

Takeaways:
Proposes CISIR, an algorithm designed for highly imbalanced regression problems.
Compensates for a limitation of the MSE loss function by taking the correlation between predicted and actual values into account.
Shows that the MDI importance function can outperform other importance functions, and that adding the correlation component can improve other recent methods.
Improves reproducibility and usability by releasing the source code.
Limitations:
The experiments cover only five datasets.
Generalizability still needs to be tested across a wider range of imbalance ratios and data characteristics.
Further comparison with other advanced regression techniques is needed.