Active learning algorithms automatically identify the most informative samples in large amounts of unlabeled data and can substantially reduce the human annotation effort needed to train a robust machine learning model. Real-world data often exhibit heavily skewed class distributions, in which samples from some classes greatly outnumber those from others. Although active learning has been studied extensively, few research efforts have targeted active learning algorithms designed specifically for class-imbalanced applications. In this paper, we propose a novel framework to address this research challenge. We pose active sample selection as a constrained optimization problem and derive a linear programming relaxation to select a batch of samples. Unlike existing algorithms, our framework is generic and applies to both binary and multi-class problems, where the imbalance may exist across multiple classes. Our extensive empirical studies on four vision datasets spanning three application domains (face, facial expression, and handwritten digit recognition) with varied degrees of class imbalance demonstrate the promise of the method for real-world imbalanced data applications.
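As a purely illustrative sketch of what a batch selection problem of this kind and its linear programming relaxation can look like, consider the generic form below. It is not the formulation derived in the paper, and every symbol is an assumption introduced here: $n$ unlabeled samples, a selection indicator $z_i$, an informativeness score $u_i$ (for example, predictive entropy), the current model's predicted probability $p_{ic}$ that sample $i$ belongs to class $c$, a batch size $k$, and a minimum expected share $\rho_c$ for class $c$ with $\sum_c \rho_c \le 1$.

```latex
% Hypothetical integer program: choose k samples that maximize total
% informativeness while keeping the expected share of each class c
% in the batch at or above rho_c.
\begin{align}
  \max_{z} \quad & \sum_{i=1}^{n} u_i z_i \\
  \text{s.t.} \quad & \sum_{i=1}^{n} z_i = k, \\
  & \sum_{i=1}^{n} p_{ic}\, z_i \ge \rho_c\, k, \qquad c = 1, \dots, C, \\
  & z_i \in \{0, 1\}, \qquad i = 1, \dots, n.
\end{align}

% Linear programming relaxation: the objective and the first two
% constraints are unchanged; only the integrality constraint is relaxed.
\begin{equation}
  0 \le z_i \le 1, \qquad i = 1, \dots, n.
\end{equation}
```

Under these assumptions the relaxed problem is an ordinary linear program, and a batch could be recovered from its solution by, for instance, keeping the $k$ samples with the largest $z_i$. The class-share constraints are what steer the batch toward minority classes that a purely score-based selection might overlook.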