Getting Started

Background

Class imbalance (also known as the long-tail problem) refers to classification problems in which the classes are not represented equally, which is quite common in practice. Examples include fraud detection, prediction of rare adverse drug reactions, and prediction of gene families. Failing to account for class imbalance often degrades the predictive performance of many classification algorithms.
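To make the problem concrete, here is a minimal sketch on a made-up label vector (990 negatives, 10 positives). The helper names are illustrative and not part of any library. It shows that a model which always predicts the majority class reaches high accuracy while detecting no minority example at all.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return the majority-to-minority class count ratio."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)

# Toy labels (assumed for illustration): 990 negatives and 10 positives.
y_true = [0] * 990 + [1] * 10
# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

ratio = imbalance_ratio(y_true)   # 99 majority examples per minority example
acc = accuracy(y_true, y_pred)    # high, despite the model being useless
rec = recall(y_true, y_pred)      # no positive example is ever found

print(f"imbalance ratio={ratio:.0f}, accuracy={acc:.2f}, minority recall={rec:.2f}")
```

This is why accuracy alone is a poor yardstick on skewed data; per-class metrics such as recall expose the failure.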

Imbalanced learning (IL) aims to tackle the class imbalance problem and learn an unbiased model from imbalanced data. This is usually achieved by changing the training data distribution through resampling or reweighting. However, naive resampling or reweighting may introduce bias or variance into the training data, especially when the classes overlap or the data contains noise.
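As a rough illustration of the two strategies, the sketch below implements random under-sampling and inverse-frequency class weights in plain Python. The function names and toy data are assumptions made for this example; they are not the duplebalance API.

```python
import random
from collections import Counter, defaultdict

def random_undersample(X, y, seed=0):
    """Resampling: keep a random subset of every class, sized to the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

def balanced_class_weights(y):
    """Reweighting: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

# Toy data (assumed for illustration): 90 majority and 10 minority examples.
X = list(range(100))
y = [0] * 90 + [1] * 10

X_res, y_res = random_undersample(X, y)
weights = balanced_class_weights(y)

print(Counter(y_res))   # both classes now have the same number of examples
print(weights)          # the minority class receives the larger weight
```

Under-sampling discards most majority examples, so which ones survive depends on the random seed; that dependence is one source of the variance mentioned above.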

Ensemble imbalanced learning (EIL) is known to effectively improve typical IL solutions by combining the outputs of multiple classifiers, thereby reducing the variance introduced by resampling/reweighting.
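The idea can be sketched as a small under-sampling ensemble: each member is trained on a different balanced random subset, and predictions are combined by majority vote. This is a generic illustration, not the DupleBalance algorithm; the nearest-centroid base learner, the 1-D toy data, and all function names are assumptions chosen to keep the example dependency-free.

```python
import random
from collections import defaultdict

def balanced_subset(X, y, rng):
    """Draw a class-balanced random subset by under-sampling every class."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = [i for idx in by_class.values() for i in rng.sample(idx, n_min)]
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_centroids(X, y):
    """Toy base learner: store the mean feature value of each class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label in zip(X, y):
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_one(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def fit_ensemble(X, y, n_estimators=11, seed=0):
    """Train each member on its own balanced subset."""
    rng = random.Random(seed)
    return [fit_centroids(*balanced_subset(X, y, rng)) for _ in range(n_estimators)]

def predict_ensemble(models, x):
    """Combine the members by majority vote."""
    votes = defaultdict(int)
    for model in models:
        votes[predict_one(model, x)] += 1
    return max(votes, key=votes.get)

# Toy 1-D data (assumed): 500 majority points near 0, 20 minority points near 4.
data_rng = random.Random(42)
X = [data_rng.gauss(0.0, 1.0) for _ in range(500)] + [data_rng.gauss(4.0, 1.0) for _ in range(20)]
y = [0] * 500 + [1] * 20

models = fit_ensemble(X, y)
pred_low = predict_ensemble(models, -1.0)
pred_high = predict_ensemble(models, 5.0)
print(pred_low, pred_high)
```

Because every member sees a different slice of the majority class, the vote averages out the randomness of any single under-sampled training set.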

About duplebalance

Learning a classifier from skewed datasets is an important and common problem in machine learning. Unfortunately, in practice, existing methods often suffer from unsatisfactory performance, high computational cost, or lack of adaptability. They ignore that two kinds of imbalance both need to be considered: the difference in quantity between examples from different classes (inter-class imbalance), and the difference in quantity between easy and hard examples within a single class (intra-class imbalance). To this end, we present DupleBalance, a general ensemble learning approach that takes both inter-class and intra-class imbalance into account. Specifically, our method achieves inter-class balancing via progressive hybrid sampling, and intra-class balancing by computing and harmonizing the prediction error distribution. We stress that the proposed framework is computationally efficient because it involves neither distance computation nor training data expansion, both of which are widely used by existing methods. Extensive experiments over synthetic and real-world datasets demonstrate the effectiveness of DupleBalance. Code, documentation, and examples are available at github.com/NeurIPS2021AnonSub/duplebalance.
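To give a feel for what intra-class balancing could look like, the sketch below buckets the examples of one class by their prediction error and samples evenly across the buckets, so that the few hard examples are not swamped by the many easy ones. This is only one possible reading of "harmonizing the prediction error distribution", not the DupleBalance implementation. The function name harmonized_sample, the bin count, and the assumption that errors lie in [0, 1] are all hypothetical.

```python
import random

def harmonized_sample(errors, n_samples, n_bins=5, seed=0):
    """Pick indices so that each non-empty error bin contributes about equally.

    `errors` holds one prediction error per example, assumed to lie in [0, 1].
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for i, err in enumerate(errors):
        b = min(int(err * n_bins), n_bins - 1)  # clamp err == 1.0 into the last bin
        bins[b].append(i)
    non_empty = [b for b in bins if b]
    per_bin = max(1, n_samples // len(non_empty))
    chosen = []
    for b in non_empty:
        # A bin with fewer than per_bin examples contributes all of them.
        chosen.extend(rng.sample(b, min(per_bin, len(b))))
    return chosen

# Toy errors for one class (assumed): 90 easy examples and 10 hard ones.
errors = [0.05] * 90 + [0.9] * 10

chosen = harmonized_sample(errors, n_samples=20)
n_hard = sum(1 for i in chosen if errors[i] > 0.5)
print(len(chosen), n_hard)
```

Note that the sketch only bins and samples; it computes no pairwise distances and creates no synthetic examples, in line with the efficiency argument above.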