Classification on imbalanced data sets, taking advantage of errors to improve performance

Jair Cervantes Canales; Farid García Lamont; ASDRUBAL LOPEZ CHAU

Please use this identifier to cite or link to this item: http://ri.uaemex.mx/handle/20.500.11799/41187

Title:	Classification on imbalanced data sets, taking advantage of errors to improve performance
Keywords:	Imbalanced;Classification;Synthetic instances;info:eu-repo/classification/cti/7
Publisher:	Springer
Project:	10.1007/978-3-319-22053-6_8;
Description:	Classification methods usually perform poorly when applied to imbalanced data sets. Several algorithms have been proposed over the last decade to overcome this problem. Most of them generate synthetic instances to balance the data set, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting. In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects instances that are difficult for classification methods to predict correctly. These instances are then categorized as “noisy” or “secure”, where the former are instances most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each instance that is difficult to predict correctly. After applying our method to data sets, the AUC of classifiers improves dramatically. We compare our method with other state-of-the-art methods, using more than 10 data sets.
Other Identifiers:	http://hdl.handle.net/20.500.11799/41187
Rights:	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-nd/4.0
Appears in Collections:	Producción
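
The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which classifier detects the difficult instances, how many neighbors are used, or how the synthetic instances are generated. The sketch assumes leave-one-out 1-NN errors as the detector, k = 5 for the noisy/secure split, and SMOTE-style interpolation toward same-class neighbors as the generator. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def nearest(X, i, k):
    """Indices of the k points closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def hard_instances(X, y):
    """Phase 1 (assumed detector): instances misclassified by leave-one-out 1-NN."""
    return [i for i in range(len(X)) if y[nearest(X, i, 1)[0]] != y[i]]


def categorize(X, y, hard, k=5):
    """'noisy' if most of the k nearest neighbors belong to the opposite class, else 'secure'."""
    labels = {}
    for i in hard:
        opposite = sum(y[j] != y[i] for j in nearest(X, i, k))
        labels[i] = "noisy" if opposite > k / 2 else "secure"
    return labels


def synthesize(X, y, hard, per_instance=2, k=3, seed=0):
    """Phase 2 (assumed generator): interpolate each hard instance toward same-class neighbors."""
    rng = random.Random(seed)
    new_X, new_y = [], []
    for i in hard:
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: math.dist(X[i], X[j]))
        same = same[:k]
        if not same:
            continue  # no same-class neighbor to interpolate toward
        for _ in range(per_instance):
            j = rng.choice(same)
            t = rng.random()  # position along the segment from X[i] to X[j]
            new_X.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
            new_y.append(y[i])
    return new_X, new_y


# Toy data (hypothetical): class 0 is the majority, class 1 the minority.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [2, 1.8],
     [3, 3], [3, 4], [4, 3], [1.1, 1.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

hard = hard_instances(X, y)
labels = categorize(X, y, hard)
new_X, new_y = synthesize(X, y, hard)
print(hard, labels, len(new_X))
```

On this toy set, the minority point at (1.1, 1.1) sits inside the majority cluster, so all of its five nearest neighbors belong to the opposite class and it is labeled "noisy". The abstract does not state whether noisy and secure instances are treated differently in the second phase, so the sketch generates synthetic instances for every hard instance.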
