Aided Selection of Sampling Methods for Imbalanced Data Classification

Published in 8th ACM IKDD CODS and 26th COMAD.
Deep Sahni, Satya Jayadev Pappu, Nirav Bhatt

Building an effective classifier for imbalanced data is challenging because most classifiers work on the assumption of balanced data. Several sampling methods have therefore been devised to bridge this gap by re-sampling imbalanced datasets. Although sampling methods are abundant, no single method is best suited to all kinds of datasets and applications, and building classifiers for every sampling method and comparing the results using appropriate performance metrics is computationally inefficient. In this work, we propose a framework that relates datasets to sampling methods via a set of meta-features that characterize the distribution of the data. We also take into account the effect of the probability threshold on the choice of sampling method. The main objective is to develop an approach that aids the selection of one or more sampling methods, together with a probability threshold, for building a suitable classifier for a given dataset. The approach is based on mapping functions learned between classifier performance and datasets after re-sampling. Extensive experiments on synthetic as well as KEEL benchmark datasets are performed to validate the framework.
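To illustrate why the probability threshold matters alongside the sampling method, the following is a minimal sketch and not the authors' implementation. It sweeps a few thresholds over predicted minority-class probabilities and reports the F1 score at each. The function names `f1_at_threshold` and `best_threshold`, the choice of F1 as the metric, and the toy labels and probabilities are all assumptions made for illustration.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score of the positive (minority) class at a given probability threshold."""
    # A sample is predicted positive when its probability reaches the threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: F1 is defined as 0 here to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, thresholds):
    """Return the candidate threshold with the highest F1 (first one wins on ties)."""
    return max(thresholds, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Hypothetical toy data: 2 minority samples among 10, with made-up probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.35, 0.30, 0.45]
candidates = [0.2, 0.3, 0.4, 0.5]

for t in candidates:
    print(f"threshold={t:.1f}  F1={f1_at_threshold(y_true, y_prob, t):.3f}")
print("best threshold:", best_threshold(y_true, y_prob, candidates))
```

On this toy data the default threshold of 0.5 labels no sample as minority, so F1 is 0, while a lower threshold recovers both minority samples. This is the kind of interaction that motivates choosing the threshold jointly with the sampling method instead of fixing it in advance.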