Segmentation of images by color features: A survey

Image segmentation is an important stage for object recognition. Many methods have been proposed in the last few years for grayscale and color images. In this paper, we present a comprehensive review of the state of the art in color image segmentation methods; we cover techniques based on edge detection, thresholding, histogram thresholding, regions, feature clustering and neural networks. Because color spaces play a key role in the methods reviewed, we also explain in detail the color spaces most commonly used to represent and process colors. In addition, we present some important applications that use the image segmentation methods reviewed. Finally, we show a set of metrics frequently used to evaluate segmented images quantitatively.


Introduction
Image segmentation is one of the most important object recognition stages in artificial vision systems. A segmented image is defined as the union of sets of pixel coordinates that share a specific feature; in other words, let I_s = ∪_{i=1}^{n} R_i be the segmented image, such that ∩_{i=1}^{n} R_i = ∅, where n is the number of segments and R_k = {(i, j) ∈ N² | I(i, j) = δ_k}, where I(i, j) is the value of the pixel located at (i, j) of the input image I and δ_k is the kth threshold value [25]. That is, segmentation consists of grouping the pixels according to specific features of the object to recognize, such as texture, shape or color, among others [173]. Segmentation of images by color features has been studied extensively in recent years. Algorithms for color image segmentation have been developed because color features may provide relevant data about the objects within the image. These algorithms have been applied in different areas such as medicine [9,55,127,160,193] and food analysis [47,108,111], among others [7,14,33,74,137,169,174]. Many of the techniques developed for grayscale image segmentation have been extended to color images [34,95,114,147,158,172,197]; however, such techniques cannot always be applied successfully, because they are designed to process mainly the intensity of the colors without considering the chromaticity. Therefore, algorithms for color image segmentation must be developed taking into account the characteristics of color. For color processing, it is important to select an adequate mathematical representation of color, such that all the features of color can be processed independently; the most important features of color are intensity and chromaticity [49]. There are different color spaces to represent color; selecting a color space depends on its characteristics, the way the color is to be processed and the nature of the method employed for color processing.
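The set-based definition above can be illustrated with a minimal sketch (the image values and the list of threshold values δ_k are made up for illustration): each region R_k collects the coordinates of the pixels whose value equals the kth threshold value.

```python
def segment_by_values(image, deltas):
    """Partition pixel coordinates into regions R_k = {(i, j) | I(i, j) = delta_k}.

    The regions are pairwise disjoint, matching the definition
    that their intersection is empty.
    """
    regions = {k: [] for k in range(len(deltas))}
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            for k, delta in enumerate(deltas):
                if value == delta:
                    regions[k].append((i, j))
                    break
    return regions

image = [[0, 0, 255],
         [0, 255, 255]]
print(segment_by_values(image, [0, 255]))
# {0: [(0, 0), (0, 1), (1, 0)], 1: [(0, 2), (1, 1), (1, 2)]}
```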
For instance, the RGB color space is adequate for image displaying, but not for color processing, because the intensity is not decoupled from the chromaticity. Thus, in this paper we present the color spaces most often used in related works, describe their characteristics and compare their advantages and disadvantages. An important part of the segmentation stage is the quantitative evaluation of the segmented image. So far, no standard metrics have been defined for quantitative evaluation of color image segmentation. Therefore, in this survey we present a set of metrics often employed to evaluate the segmentation of color images quantitatively. The rest of this paper is divided as follows: in Section 2 the most common color spaces employed for image segmentation are presented, and their characteristics, advantages and disadvantages are described. In Section 3 the segmentation techniques for color images of previous works are introduced and reviewed. In Section 4 a set of metrics widely used for quantitative evaluation of color image segmentation is presented. Section 5 shows some applications of color image segmentation. Finally, Section 6 closes the paper with conclusions.

Color spaces
The goal of a color space is to ease the specification of colors within a three-dimensional coordinate system, or a subspace of the system, where every color is represented by a unique point. Most of the color spaces employed are oriented to hardware devices, such as monitors and printers, or to applications for color manipulation, like the creation of graphics for animation. The usual hardware-oriented models are RGB (red, green, blue) for monitors and video cameras; CMY (cyan, magenta, yellow) for printers; and YIQ (where Y is brightness and I and Q are chromatic components), which is the standard for television [49].
In the literature, the color spaces for color image processing are the following: RGB, HSV (hue, saturation, value), HSI (hue, saturation, intensity), L*a*b*, L*u*v*, YUV and YCbCr. Table 1 shows the color spaces employed in different previous works to represent and process colors.
The RGB space is adequate for color displaying; for instance, it is widely employed in television systems and image acquisition. Although this space is often employed for color recognition, the RGB space is not suitable for segmentation or color processing, because of the high correlation between the components R, G and B. There are other spaces that do not have this problem, but they have their own disadvantages. Next we present the features of the RGB, HSV, HSI, L*a*b*, L*u*v*, YUV and YCbCr color spaces, as well as the equations to map colors between the RGB space and the color spaces mentioned.

RGB space
In this space every color is represented with the spectral components of red, green and blue. The origin of this model can be found in television technology, and it can be considered the fundamental representation of color for computers, digital cameras and scanners, as well as for image storage. Most of the software developed for image processing and graphics employs this model. In the RGB model the combination of colors is based on the addition of the individual components, with black as the base color. The process can be considered as the combination of three rays of red, green and blue light. The intensity of the different color components determines both the hue and the brightness of the resulting color [49]. The shape of the RGB space is a cube, whose coordinate axes correspond to the three basic colors: red (r), green (g) and blue (b). The values of each component are in the range [0, 255] ⊂ ℤ, where every possible color corresponds to a point within the cube; but usually the range of values of each color component is normalized to the range [0, 1]; hence, the color space is represented as the unit cube shown in Fig. 1.
The colors red, green and blue constitute the coordinate axis; different colors are obtained by combining the values of the coordinate axis. Table 2 shows the combination values of the axis for usual colors.
The RGB space is a simple model, in several studies it is employed for color processing or when it is necessary to transform colors to a different color space.

HSV space
In the HSV space, the color is represented with the components hue (h), saturation (s) and value (v). Hue is the chromatic feature that describes a pure color; for instance, yellow, orange or red.

Table 1. Color spaces employed for color image segmentation in different works.
The hue is in the range [0, 2π] ⊂ ℝ; saturation is in the real range [0, 1], while value is often in the range [0, 255] ⊂ ℤ. The HSV space is cone shaped, as shown in Fig. 2. Geometrically, the radius and the height of the cone represent the saturation and value components, respectively. Note that for the colors black, white and gray, the hue parameter is undefined, because these colors are considered singularities within this color space; they do not have a specific chromaticity.

HSI space
The HSI space, represented with the components hue (h), saturation (s) and intensity (i), is very similar to the HSV space. Despite the similarity, the hue, saturation and intensity components are computed differently from how the corresponding components are obtained in the HSV space. The ranges of the components are the same as those of the HSV space.
That is, the hue is in the range [0, 2π] ⊂ ℝ; saturation is in the real range [0, 1], while intensity is in the range [0, 255] ⊂ ℤ. The HSI space is double-cone shaped, as shown in Fig. 3. Geometrically, the radius and the height of the cones represent the saturation and intensity components, respectively [49].
Both the HSV and HSI spaces employ the same hue and saturation parameters to represent colors; they differ only in the parameter used to represent brightness.

CIE XYZ space
The standard XYZ model, developed by the Commission Internationale de l'Éclairage (CIE), is the basis of most of the calibrated color models employed nowadays. Calibrated color models are employed to reproduce colors independently of the display devices. Several problems arise because there is a strong dependence between the device and the reproduction of images; all the color spaces described before are related to the physical characteristics of the output devices employed to display images, for instance, the configurable parameters of a laser printer. The XYZ model was developed from several measurements performed under strict conditions. The model consists of three basic colors X, Y and Z, selected such that all colors and combinations can be described through positive components. This space is perceived as nonlinear by humans; that is, in some parts of the space, small position variations produce large color changes, while in other parts of the space the opposite happens: large position changes produce only small color changes [49]. Thus, variations of the CIE model have been developed for different kinds of applications, or to represent colors in a way that mimics the human perception of color. Examples of CIE variations are the spaces YUV, YCbCr, L*u*v* and L*a*b*; in this study we address the spaces L*u*v*, L*a*b*, YUV and YCbCr because they are often employed for color image processing.
CIE L*a*b* space
This color space was developed to linearize the tonality changes; the colors are defined by three variables: L* is the intensity, and a* and b* are the tonality components [49,66]. The value of a* defines the distance along the red-green axis, while the value of b* defines the distance along the blue-yellow axis. Usually a* and b* are in the range [−127, 128] ⊂ ℤ and L* in the range [0, 100] ⊂ ℤ.
The shape of this space is similar to the RGB space, but the location of colors is different, see Fig. 4 .
Note that in this space, similarly to the HSV and HSI spaces, the intensity is decoupled from the chromaticity, mimicking the way humans perceive colors [69]. Moreover, because the tonality changes are linear, chromaticity differences can be computed using the Euclidean distance.
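Because tonality changes in L*a*b* are approximately linear, the perceptual difference between two colors reduces to the Euclidean distance over the three components (the classic CIE76 ΔE). A minimal sketch, with made-up L*a*b* triplets for illustration:

```python
import math

def delta_e(lab1, lab2):
    """Euclidean distance between two L*a*b* colors (CIE76 Delta-E)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

# Two similar greens: a small Delta-E means they are perceptually close.
print(delta_e((50.0, -40.0, 30.0), (52.0, -38.0, 31.0)))  # 3.0
```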

CIE L*u*v* space
This space is similar to the CIE L*a*b* space; the colors are represented by the intensity component L* and the chromatic components u* and v* [49]. Usually u* and v* are in the range [−175, 175] ⊂ ℤ and L* in the range [0, 100] ⊂ ℤ. Table 6 shows the luminance and chromaticity parameters for usual colors.

YUV and YCbCr color spaces
The YUV and YCbCr color spaces are employed to standardize images for television. The YUV space is the basis for color coding in the NTSC system, while YCbCr is the standard for digital television. These models are defined by three planes: the luminance (Y) and two chrominance components (UV and CbCr for the YUV and YCbCr spaces, respectively) [49]. Table 7 shows the luminance and chrominance parameters for usual colors in the YUV space. Because humans cannot distinguish sharpness in colors with high precision, and are more sensitive to brightness, the bandwidth assigned to the chrominance components can be reduced considerably; this feature is exploited by these color spaces for television signal coding. Table 8 shows the luminance and chrominance parameters for usual colors in the YCbCr space.

Transformations between color spaces
As mentioned before, the hardware devices employed for image acquisition and display use the RGB space to represent colors. Therefore, it is necessary to map the colors to the color spaces presented above in order to process them using the features of those spaces, and then to map the resulting colors back to the RGB space so as to display the result of the processing. In this section, we present the equations to map RGB colors to the color spaces mentioned before, as well as the inverse mappings [25,49].

Mapping between RGB and HSV spaces
Mapping an RGB color φ = [r, g, b], with components normalized to [0, 1], to the HSV space involves the following operations. Let max = max(r, g, b) and min = min(r, g, b); the value and saturation are

v = max,    s = (max − min)/max if max ≠ 0, s = 0 otherwise,    (1)

and the hue is computed piecewise, depending on which component equals max:

h = (π/3)·(((g − b)/(max − min)) mod 6), if max = r,
h = (π/3)·((b − r)/(max − min) + 2), if max = g,    (2)
h = (π/3)·((r − g)/(max − min) + 4), if max = b.

The inverse operation, in other words, mapping an HSV color vector back to the RGB space, is performed with the corresponding piecewise equations.

Mapping between RGB and HSI spaces

Mapping an RGB color to the HSI space employs the hue computed with Eqs. (1) and (2), while saturation and intensity are computed with:

s = 1 − 3·min(r, g, b)/(r + g + b),    i = (r + g + b)/3.

The inverse operation, in other words, mapping an HSI color vector ψ = [h, s, i] to the RGB space, involves piecewise operations according to the sector in which h lies.

Mapping between RGB and L*a*b* spaces

Mapping an RGB color to the L*a*b* space requires first mapping it to the XYZ space:

[X, Y, Z]ᵀ = M·[r, g, b]ᵀ,    (27)

where M is the RGB-to-XYZ conversion matrix. Then

L* = 116·f(Y/Y_ref) − 16,
a* = 500·(f(X/X_ref) − f(Y/Y_ref)),
b* = 200·(f(Y/Y_ref) − f(Z/Z_ref)),

where δ = 6/29, X_ref, Y_ref and Z_ref are the tristimulus values of the reference white, and

f(t) = t^(1/3), if t > δ³,
f(t) = t/(3δ²) + 4/29, otherwise.

The inverse operation is performed with the inverse function:

f⁻¹(t) = t³, if t > δ,
f⁻¹(t) = 3δ²·(t − 4/29), t ≤ δ.
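The forward RGB-to-HSV mapping can be sketched in a few lines of code. This is a sketch of the standard conversion (not the paper's exact pseudocode); the hue is returned in [0, 2π] as used in this paper, and is set to 0 for the gray-axis singularities where it is undefined:

```python
import math

def rgb_to_hsv(r, g, b):
    """Map an RGB color (components in [0, 1]) to HSV with h in [0, 2*pi]."""
    v = max(r, g, b)
    mn = min(r, g, b)
    delta = v - mn
    s = 0.0 if v == 0 else delta / v
    if delta == 0:          # black/white/gray: hue undefined, set to 0
        h = 0.0
    elif v == r:
        h = (math.pi / 3) * (((g - b) / delta) % 6)
    elif v == g:
        h = (math.pi / 3) * ((b - r) / delta + 2)
    else:
        h = (math.pi / 3) * ((r - g) / delta + 4)
    return h, s, v

print(rgb_to_hsv(1.0, 0.0, 0.0))   # pure red -> (0.0, 1.0, 1.0)
print(rgb_to_hsv(0.0, 1.0, 0.0))   # pure green -> h = 2*pi/3
```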

Mapping between RGB and L * u * v spaces
Mapping an RGB color to the L*u*v* space is obtained with the following equations. Let φ = [r, g, b] be a color represented in the RGB space; the values of X, Y and Z are computed employing Eq. (27), and

L* = 116·f(Y/Y_ref) − 16,
u* = 13·L*·(u′ − u_ref),    v* = 13·L*·(v′ − v_ref),

where f is the function defined above with δ = 6/29, u′ = 4X/(X + 15Y + 3Z), v′ = 9Y/(X + 15Y + 3Z), and the quantities Y_ref, u_ref and v_ref are obtained by substituting the tristimulus values of the reference white. The inverse operation solves these equations for X, Y and Z; the r, g and b values are then obtained using Eq. (33).

Mapping between RGB and YUV spaces
The luminance Y is defined by the RGB components, assuming the RGB values have already been corrected by the gamma factor. The U and V components are linear differences between the luminance and the blue and red planes of the RGB model, respectively. Mapping an RGB color to the YUV space is computed with:

Y = 0.299·r + 0.587·g + 0.114·b,
U = 0.492·(b − Y),    V = 0.877·(r − Y).

The inverse operation, mapping a YUV color to the RGB space, is obtained by solving these equations for r, g and b.
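The forward and inverse YUV mappings can be sketched as follows, using the standard NTSC luminance weights (the inverse simply solves the forward equations for r, g and b):

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV with NTSC luminance weights (gamma-corrected RGB assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)    # chrominance: blue minus luminance
    v = 0.877 * (r - y)    # chrominance: red minus luminance
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse mapping: solve the forward equations for r, g, b."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

print(rgb_to_yuv(1.0, 1.0, 1.0))   # white -> approximately (1.0, 0.0, 0.0)
```

For achromatic colors (white, gray) both chrominance components are zero, which is why the chrominance bandwidth can be reduced without affecting perceived brightness.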

Mapping between RGB and YCbCr spaces
Mapping an RGB color to the YCbCr space is computed with the following equations, where the chrominance components are offset so that they remain non-negative:

Y = 0.299·r + 0.587·g + 0.114·b,
Cb = 0.564·(b − Y) + 128,    Cr = 0.713·(r − Y) + 128.

The inverse operation, mapping a YCbCr color to the RGB space, is obtained by solving these equations for r, g and b.
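A sketch of the forward mapping, using the common ITU-R BT.601 full-range (JPEG-style) coefficients; the exact constants used in the paper's equation may differ slightly:

```python
def rgb_to_ycbcr(r, g, b):
    """RGB (components in [0, 255]) -> YCbCr, BT.601 full-range coefficients."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128   # offset keeps Cb >= 0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128   # offset keeps Cr >= 0
    return y, cb, cr

print(rgb_to_ycbcr(255, 255, 255))  # white -> approximately (255.0, 128.0, 128.0)
```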

Discussion
The quality of image segmentation by color features depends, to some extent, on the color space employed to represent colors. One of the most employed color spaces is the RGB space, because it is widely used by color displaying devices, for instance monitors and video cameras. RGB images are obtained by combining three independent images, or planes, each one with a basic color. Although several previous works use the RGB space to represent and process colors, this space is not suitable for color processing because:
1. The Euclidean distance cannot be employed to compute differences between colors [125]; that is, color changes within the RGB space are not linear.
2. It is sensitive to illumination, due to the high correlation between the components [49]; in other words, two colors with the same chromaticity can be recognized as different if their intensities are not the same.
For example, Fig. 5 shows two squares with the same chromaticity, green, but with different intensity. In the RGB space, although the intensity difference between the squares (a) and (b) of Fig. 5 is small, the two colors are identified as different.
Usually, color images in the RGB space are processed by applying the processing technique to each color channel, because many of the methods developed for color image processing are extended versions of techniques for gray scale images. With such methods the colors are processed by intensity, and the chromaticity is not processed adequately. For instance, histogram equalization is an adequate technique for improving the contrast of an image. Because the three planes are independent and histogram equalization employs only the intensity values, the obvious approach is to apply the technique to each plane independently; the result is an image where the chromatic features are altered, due to the correlation between the components. On the other hand, an important feature of the HSV, HSI, L*a*b*, L*u*v*, YUV and YCbCr spaces is that the intensity is decoupled from the chromaticity. Under these color spaces, the contrast of an image is improved by applying histogram equalization only to the intensity channel. Fig. 6 shows the images obtained by processing the original images with histogram equalization in the RGB, HSV, HSI, L*a*b*, L*u*v*, YUV and YCbCr spaces.
It is easy to appreciate from Fig. 6 that the chromaticity of several colors of the RGB images is modified, while for the images obtained using the other spaces only the brightness is modified, not the chromaticity. The images obtained using the HSI and YCbCr spaces are brighter than those obtained using the HSV, L*a*b*, L*u*v* and YUV spaces, but in none of them is the chromaticity modified.
In order to process RGB images without separating the color channels, the colors of the pixels are represented as vectors (color vectors), as explained in Section 2.1. The magnitude and orientation of the color vector characterize the intensity and the chromaticity, respectively. As stated before, the RGB space is sensitive to illumination; the negative effects of non-uniform illumination are reduced by normalizing the color vectors, although they are not eliminated totally.
For instance, reference [42] presents a recognition method for Mexican banknotes, where the discriminative colors of the banknotes are selected so as to characterize each denomination more precisely. The experiments are performed using the RGB and HSV spaces in order to compare which space is more suitable. According to the results reported, the highest recognition rates are obtained using the HSV space. In the resulting images, the row RGB stands for images processed using the RGB color vectors; the row RGB* stands for normalized RGB color vectors, used to reduce the intensity effects and to process only the chromaticity; and the row HSV stands for images using the HSV space for color representation. The pixels in black are the ones selected as non-discriminative.
The 20 pesos banknote can be recognized by its dominant color, blue; hence, the pixels in blue must be selected. It is easy to appreciate that in the 20 pesos banknote image of the row RGB there are many pixels in black; that is, the colors of those pixels do not provide important data about the denomination, including pixels in blue; in addition, there are colors other than blue that are selected as discriminative. In the row RGB*, the resulting image is improved: the image is cleaner, the pixels are not scattered and colors other than blue are selected as non-discriminative. However, the effect of the intensity of the colors is not avoided, because some parts in blue are selected as non-discriminative. For example, the hair of the banknote character is blue, but because of its intensity difference with respect to the rest of the banknote it is selected as a different color.
In contrast, in the resulting image using the HSV space, more parts in blue are kept than in the RGB images, despite the different blue intensities. Also, Fig. 8 shows the resulting images after applying the same color processing to the 50 pesos banknote; note the similar appearance of the images with respect to the ones obtained with the 20 pesos banknote. The HSV, HSI, L*a*b*, L*u*v*, YUV and YCbCr spaces are more suitable for color processing because: they are robust against non-uniform illumination, since the chromaticity is decoupled from the intensity; and the color changes within such spaces are linear, so the color differences can be computed with the Euclidean distance. However, in the HSV and HSI spaces, it is not possible to compute the chromaticity differences with the Euclidean distance for the tonalities whose values are close to 0 or 2π. In other words, the tonalities whose values are h ≈ 0 or h ≈ 2π are chromatically very similar but numerically very different; therefore, if the comparison is performed using just the scalar value, the tonalities are classified as different when the opposite is true. This problem is overcome by modeling the chromaticity as a two-element unit vector, whose orientation defines the chromaticity, as proposed in [39-41,43,44]. That is, let φ = [h, s, v] be an HSV color; the chromaticity is modeled as ψ = [cos h, sin h], and thus the problem mentioned before is solved. A drawback of the HSV and HSI spaces is that white, black and gray do not have a specific chromaticity; these colors are considered singularities [25,49] and are therefore difficult to recognize. Table 9 summarizes the advantages and disadvantages of the RGB, HSV, HSI, L*a*b*, L*u*v*, YUV and YCbCr color spaces.
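The hue wraparound problem and the unit-vector remedy described above can be sketched as follows: two hues near 0 and 2π look very different as scalars, but their [cos h, sin h] representations are nearly identical.

```python
import math

def hue_distance(h1, h2):
    """Chromaticity difference between two hues modeled as unit vectors
    [cos h, sin h], so that h ~ 0 and h ~ 2*pi compare as similar."""
    v1 = (math.cos(h1), math.sin(h1))
    v2 = (math.cos(h2), math.sin(h2))
    return math.dist(v1, v2)

near_zero, near_two_pi = 0.05, 2 * math.pi - 0.05
print(abs(near_zero - near_two_pi))          # scalar difference: ~6.18 (wrongly "different")
print(hue_distance(near_zero, near_two_pi))  # vector difference: ~0.1 (correctly similar)
```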
The color space must be selected depending on the purpose of the color image segmentation. As we have stated before, the RGB space is not suitable for color image processing, while in the HSV, HSI, L*a*b*, L*u*v*, YUV and YCbCr spaces the color processing is more precise; but it is important to consider their respective disadvantages, as presented in Table 9.

Segmentation techniques for color images
Image segmentation remains one of the major challenges in computer vision. It is a critical and essential step in a pattern recognition system that aims at high-level image analysis and understanding, and it determines the quality of the pattern recognition system. Moreover, in many cases good image preprocessing improves the quality of the segmentation; advanced preprocessing techniques have been used in many works [147,148,158,198].

Table 9. Advantages and disadvantages of color spaces.

RGB — Advantages: convenient for image acquisition and displaying. Disadvantages: sensitive to non-uniform illumination; differences between colors are not linear.
HSV, HSI — Advantages: based on human color perception; robust against non-uniform illumination; the chromaticity is decoupled from the intensity. Disadvantages: non-removable singularities.
L*a*b*, L*u*v* — Advantages: efficient in measuring small color differences; the chromaticity is decoupled from the intensity. Disadvantages: singularity problem, as in other nonlinear transformations.
YUV, YCbCr — Advantages: efficient coding of color information for TV signals. Disadvantages: due to the linear transformation, correlation between the component channels exists, although not as high as in the RGB space.

Image segmentation is based on measurements taken from the image, such as brightness, color, depth, gray-level pixel value or texture. There are many papers and several surveys on monochrome image segmentation techniques. Over the last several years, however, the growing power of personal computers (PCs) has allowed the use of more complex algorithms to solve problems in real time. Color image segmentation is very important in vision systems, mainly because color images provide more information than gray level images, and the power of PCs is now sufficient to process color images [25].
Many color image segmentation techniques have been proposed; they can be categorized as the following methods: edge detection, threshold, histogram-thresholding, region, feature clustering and neural network based methods. We present works that employ the methods mentioned, as well as related works that use other techniques.

Edge detection
Edge detectors have been used to find brightness discontinuities in images. An edge is a boundary between two pixels with significantly different brightness values; this variation usually occurs because an edge represents a physical boundary between two objects having different intensities. A successful edge-based segmentation relies on three key steps: detecting edges, eliminating irrelevant edges, and connecting or grouping edges. The general procedure is as follows:
1. The image is first smoothed using a Gaussian low-pass filter. This preliminary step is taken to reduce the image noise. Large values of σ will suppress much of the noise, at the expense of weakening potentially relevant edges.
2. The local gradient (magnitude and direction) is computed for each point in the smoothed image.
3. The algorithm thins the wide ridges of the gradient magnitude, leaving only the pixels at the top of each ridge, in a process known as nonmaximal suppression.
4. The ridge pixels are then thresholded using two thresholds T_low and T_high: ridge pixels with values greater than T_high are considered strong edge pixels; ridge pixels with values between T_low and T_high are said to be weak pixels. This process is known as hysteresis thresholding.
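The hysteresis step of the procedure above can be sketched as a breadth-first search: strong pixels seed the edge map, and weak pixels are kept only if they are 8-connected to a strong pixel (the gradient values below are made up for illustration).

```python
from collections import deque

def hysteresis(grad, t_low, t_high):
    """Hysteresis thresholding: keep strong pixels (> t_high) and weak
    pixels (t_low..t_high) only if 8-connected to a strong pixel."""
    rows, cols = len(grad), len(grad[0])
    out = [[0] * cols for _ in range(rows)]
    queue = deque((i, j) for i in range(rows) for j in range(cols)
                  if grad[i][j] > t_high)
    for i, j in queue:          # mark all strong pixels first
        out[i][j] = 1
    while queue:                # grow edges into connected weak pixels
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols and not out[ni][nj]
                        and grad[ni][nj] > t_low):
                    out[ni][nj] = 1
                    queue.append((ni, nj))
    return out

grad = [[0, 10, 60, 0],
        [0, 90, 20, 0],
        [0,  0,  0, 5]]
print(hysteresis(grad, 15, 50))
# [[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Note that the weak pixel with value 20 survives because it touches a strong pixel, while the isolated values 10 and 5 are discarded.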
The principal disadvantages of edge-based segmentation are that it is more sensitive to noise than other techniques and that its performance is not appropriate for images in which the edges are ill defined or there are many edges. In [86] and [121] the authors proposed novel procedures for detecting meaningful discontinuities in color images. Kibria et al. [84] proposed a new measure for defining homogeneous regions, stated in terms of visible color difference; the authors incorporate edge information by using the Canny detector, splitting regions for proper segmentation. In [58] the authors computed the Sobel operator on each of the three RGB planes and then summed the results to obtain the resultant edges. This is an adequate technique for edge detection when colors and objects are well defined; however, this approach would probably be inadequate for more complex color images. Carron et al. [20] applied the Sobel operator to each component of the HSI space and combined the individual results using a trade-off parameter between hue and intensity; an interesting feature of this trade-off parameter is its dependence on the saturation level. Other edge-based color segmentation techniques can be found in [37,50,89,176]. In [108] a system for quality control of citrus fruits was presented. In citrus manufacturing industries, caliper and color are successfully used for the automatic classification of fruits using artificial vision, while the detection of flaws in the citrus surface is carried out by means of human inspection. The proposal consists of a computer vision system capable of detecting defects in the citrus peel and also classifying the type of flaw. The segmentation of faulty zones is performed by applying the Sobel gradient to the image. Color and texture features of the flaw are extracted considering different color spaces, some of them related to high-order statistics.
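The per-plane Sobel approach of [58] can be sketched as follows: the 3x3 Sobel operator is applied to each RGB plane and the magnitudes are summed (the |Gx| + |Gy| approximation of the magnitude and the tiny test image are illustrative choices, not necessarily those of [58]).

```python
def sobel_magnitude(plane):
    """Approximate gradient magnitude |Gx| + |Gy| of one color plane
    using the 3x3 Sobel operator (border pixels left at 0)."""
    rows, cols = len(plane), len(plane[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = (plane[i-1][j+1] + 2*plane[i][j+1] + plane[i+1][j+1]
                  - plane[i-1][j-1] - 2*plane[i][j-1] - plane[i+1][j-1])
            gy = (plane[i+1][j-1] + 2*plane[i+1][j] + plane[i+1][j+1]
                  - plane[i-1][j-1] - 2*plane[i-1][j] - plane[i-1][j+1])
            out[i][j] = abs(gx) + abs(gy)
    return out

def color_edges(r, g, b):
    """Edge map of a color image: Sobel on each RGB plane, results summed."""
    er, eg, eb = sobel_magnitude(r), sobel_magnitude(g), sobel_magnitude(b)
    rows, cols = len(r), len(r[0])
    return [[er[i][j] + eg[i][j] + eb[i][j] for j in range(cols)]
            for i in range(rows)]

# A vertical step edge in the red plane only.
r = [[0, 0, 255, 255]] * 3
z = [[0, 0, 0, 0]] * 3
print(color_edges(r, z, z))
```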

Threshold
Most of the image segmentation techniques found in the literature are for binary or gray scale images; there are very few threshold-based segmentation techniques for color images. In this subsection we present the basic threshold techniques (global thresholding, adaptive thresholding and the Otsu method) and the most important research in this field.

Global threshold
Intensity thresholding is the most basic idea in segmentation; this technique can be thought of as an extreme form of gray level quantization. Thresholding creates a binary image b(x, y) from an image I(x, y) according to a simple criterion:

b(x, y) = 1 if I(x, y) > δ, and b(x, y) = 0 otherwise,

where δ is the threshold. This technique is called global thresholding; it is a fast segmentation technique with low computational cost. In the case of color segmentation, the thresholding rule becomes a comparison per color component, with a threshold vector δ = [δ_1, δ_2, δ_3].
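A minimal sketch of global thresholding extended to color images, assuming the per-channel rule described above (the image pixels and threshold vector are made up for illustration):

```python
def global_threshold(image, delta):
    """Binary mask b(x, y) = 1 where every color component exceeds
    its corresponding threshold in the vector delta."""
    return [[1 if all(c > d for c, d in zip(pixel, delta)) else 0
             for pixel in row] for row in image]

image = [[(200, 30, 40), (10, 10, 10)],
         [(180, 90, 70), (90, 200, 20)]]
print(global_threshold(image, (128, 0, 0)))  # keeps strongly red pixels
# [[1, 0], [1, 0]]
```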

Adaptive threshold
A major concern in global threshold technique is setting the threshold level appropriately. Usually these levels are chosen manually by trial and error. However, many image processing tasks require full automation, and there is often a need for some criterion for selecting a threshold automatically.
Sonka [153] gives details of an adaptive method for automatic thresholding, summarized in Algorithm 1. To begin, an initial guess of the threshold is made, typically by computing the mean gray level of the whole image. Then the threshold is refined iteratively according to Algorithm 1, until the threshold value stops changing; at that point, the threshold has reached a best-guess value. The algorithm can be used on grayscale images.
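The iterative scheme described above (the classic isodata-style refinement) can be sketched as follows; the stopping tolerance and the sample pixel values are illustrative choices:

```python
def iterative_threshold(pixels, eps=0.5):
    """Adaptive threshold: start from the mean gray level, then repeatedly
    split the pixels at the threshold and average the two class means,
    until the threshold stops changing."""
    t = sum(pixels) / len(pixels)          # initial guess: global mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:            # degenerate split: give up
            return t
        t_new = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

pixels = [10, 12, 14, 11, 200, 210, 205, 198]
print(iterative_threshold(pixels))  # 107.5, between the two modes
```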

Otsu method
Otsu's thresholding chooses the threshold that minimizes the intra-class variance of the thresholded black and white pixels; detailed treatments of the Otsu method can be found in [92,126]. The Otsu method can be extended to multilevel thresholding by maximizing the between-class variance

f(t_1, ..., t_k) = σ_1 + σ_2 + ... + σ_k,

where the terms σ_i of each class are obtained as

σ_i = ω_i (μ_i − μ_T)²,

ω_i being the probability of class i, μ_i its mean intensity, and μ_T the mean intensity of the whole input image. The class probabilities sum to one, ω_1 + ω_2 + ... + ω_k = 1. The optimal threshold values for color image segmentation are those that maximize f for each color channel. Generally, the Otsu method [126], the Kapur method [78], Tsallis entropy [13] and minimum cross entropy [96] are the best thresholding methods based on optimizing an objective function. The goal of these methods is to find the optimal thresholds of an image by maximizing the between-class variance (Otsu method), by using the entropy of the histogram (Kapur and Tsallis methods), or by minimizing the cross entropy between the original image and its segmented image (minimum cross entropy). These techniques can be extended to multilevel thresholding segmentation; however, the computational complexity increases severely when they are extended to multilevel thresholding.
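The bi-level case of the Otsu criterion can be sketched as an exhaustive search over the histogram (the 8-level bimodal histogram is made up for illustration); the multilevel extension searches over tuples of thresholds in the same way, which is why its cost grows so quickly:

```python
def otsu_threshold(hist):
    """Bi-level Otsu: pick the threshold maximizing the between-class
    variance w0*w1*(mu0 - mu1)^2 of the gray-level histogram."""
    total = sum(hist)
    total_mean = sum(i * h for i, h in enumerate(hist)) / total
    best_t, best_var = 0, -1.0
    w0 = cum = 0.0
    for t in range(len(hist) - 1):
        w0 += hist[t] / total              # probability of class 0 (levels <= t)
        cum += t * hist[t] / total         # cumulative first moment
        w1 = 1.0 - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum / w0                     # mean of class 0
        mu1 = (total_mean - cum) / w1      # mean of class 1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Bimodal histogram: pixel counts at gray levels 0..7.
hist = [10, 25, 10, 0, 0, 8, 30, 9]
print(otsu_threshold(hist))  # 2: the valley between the two modes
```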
Many algorithms based on Otsu method, Kapur entropy, Tsallis entropy and minimum cross entropy are used for multilevel thresholding problems [10,36,96,128,141,178] .
In 2015, Sarkar et al. [141] proposed a novel multilevel thresholding method for unsupervised segmentation of natural color images using the concept of minimum cross entropy and differential evolution. The evolutionary algorithm is used to improve the computation time and robustness.
In [36] the authors proposed an automatic segmentation method using saliency combined with the Otsu threshold. Saliency is used to enhance lesion objects, and the Otsu threshold method is improved to segment the images correctly, obtaining robust segmentation results and removing spots and holes in the image.
Pare et al. [128] used three well-known objective functions (Kapur's entropy, between-class variance, and Tsallis entropy) with different parameter analyses for solving the color image multilevel thresholding problem. In their experiments, the authors consider the spatial contextual information of the image.
In [88] Kurban et al. presented a hybrid algorithm to solve the multilevel color image thresholding problem. The authors use several swarm based and evolutionary computational techniques in their exhaustive experiments, with Kapur's entropy as the fitness function to be maximized. The authors argue that swarm based algorithms are much more precise for multilevel thresholding problems.
Bhandari et al. [13] presented a study of multilevel thresholding for colored satellite image segmentation. Hybrid algorithms are used in their experiments and Tsallis entropy is the fitness function to be maximized.
In [120] a new multi-thresholding algorithm for the selection of optimum thresholds for color image segmentation was proposed. The authors proposed a five-step algorithm using the concept of the A-IFS histon, obtained from the Atanassov's Intuitionistic Fuzzy Set (A-IFS) representation of the image. In a rough set theoretic sense, the A-IFS histon and the histogram can be correlated to upper and lower approximations. The proposed algorithm, based on rough sets, detects the optimum threshold values, as it exploits the hesitancy in determining the pixel intensities near region borders and also takes into account the spatial correlation and the correlation among the pixels in all three color components. In [19] the authors proposed an unsupervised segmentation algorithm based on graph theory: the image is mapped into a weighted undirected graph, the pixels are considered as nodes, and the best thresholding is obtained by maximizing a weighted entropy objective function to achieve unsupervised segmentation. Harrabi and Braiek [55] combined different data sources associated with the same color image to increase the information quality and to obtain a more reliable and accurate segmentation. The proposed segmentation approach is conceptually different and explores a novel strategy: instead of considering only one image for each application, the method combines many realizations of the same image in order to increase the information quality and to obtain a better segmented image. The segmentation method proposed in [77] is based on the observation that, in general, human attention focuses first on three or four major color objects in the image. In order to determine these objects, three intensity distributions are constructed by sampling randomly and sufficiently from the R, G and B channel images. Three means are computed from the intensity distributions, and this procedure is repeated to obtain three sets of mean distributions.
By the central limit theorem, each of these mean distributions is approximately normal. For object segmentation, each normal distribution is divided into four sections according to the standard deviation. Sections with similar representative values are merged based on a threshold. The threshold is not constant; it varies with the difference between the representative values of the sections, so as to reflect the varying features of different images.
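The sampling scheme of [77] can be illustrated with a minimal sketch (the function names, parameters and toy data below are ours, not taken from the original work): repeated random samples are drawn from one channel, their means are collected, and the resulting approximately normal distribution of means is split into four sections by its standard deviation.

```python
import numpy as np

def mean_distribution(channel, n_samples=500, sample_size=200, seed=None):
    """Draw repeated random pixel samples from one color channel and
    collect the sample means; by the central limit theorem the
    collected means are approximately normally distributed."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(channel, dtype=float).ravel()
    return np.array([rng.choice(flat, size=sample_size).mean()
                     for _ in range(n_samples)])

def section_boundaries(means):
    """Split the (approximately normal) mean distribution into four
    sections delimited by mu - sigma, mu and mu + sigma."""
    mu, sigma = means.mean(), means.std()
    return [mu - sigma, mu, mu + sigma]

# toy channel with two intensity populations
rng = np.random.default_rng(0)
channel = np.concatenate([rng.normal(60, 5, 5000), rng.normal(180, 5, 5000)])
means = mean_distribution(channel, seed=0)
bounds = section_boundaries(means)
```

Even for a bimodal channel, the distribution of sample means concentrates around the global mean, which is what makes the four-section split stable across images.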

Histogram-thresholding based methods
According to He and Huang [57], thresholding is the simplest and most widely used method for image segmentation. Thresholding techniques can be classified into two types: bi-level and multilevel thresholding. If the objects are clearly distinguished from the background of an image by a single threshold value, the technique is termed bi-level thresholding, while dividing an image into several different segments by multiple threshold values is known as multilevel thresholding. Over the years numerous thresholding techniques have been reported in the literature [31]. Kapur et al. [78] used the entropy of the histogram to find optimal thresholds; this technique, known as the Kapur entropy method, has been widely used for image thresholding segmentation problems. The minimum cross entropy method finds the optimal threshold by minimizing the cross entropy between the original image and its segmented image [96]. These techniques can be easily extended to multilevel thresholding segmentation. However, the computational time increases quickly when they are extended to multilevel thresholding, since they exhaustively search for the optimal threshold values that optimize the objective functions.
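As a concrete illustration of bi-level entropy thresholding, the following sketch implements the core of Kapur's criterion: the threshold t is the one maximizing the sum of the entropies of the two histogram segments it induces. This is a simplified illustration; the function name and the toy data are ours, not from [78].

```python
import numpy as np

def kapur_threshold(gray):
    """Bi-level Kapur criterion: choose the threshold t maximizing the
    sum of the entropies of the two normalized histogram segments."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        p0 = p[:t][p[:t] > 0] / w0
        p1 = p[t:][p[t:] > 0] / w1
        h = -(p0 * np.log(p0)).sum() - (p1 * np.log(p1)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# toy bimodal data: intensities around 50 and around 200
rng = np.random.default_rng(1)
gray = np.clip(np.concatenate([rng.normal(50, 10, 4000),
                               rng.normal(200, 10, 4000)]), 0, 255)
t = kapur_threshold(gray)
```

For well-separated bimodal data the maximizing threshold falls between the two modes; the exhaustive scan over all 254 candidate thresholds is exactly what makes the multilevel extension expensive.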
The exploitation of meta-heuristic computing algorithms has been very successful throughout the last few years. To achieve optimum multilevel thresholds, many heuristic optimization techniques have been applied to multilevel image segmentation tasks. Over the years, numerous works based on swarm systems, such as the firefly algorithm [57,134], cuckoo search algorithm [12], differential evolution (DE), wind driven optimization (WDO) and particle swarm optimization (PSO), have been reported to tackle multilevel image segmentation problems for the determination of optimum thresholds [12]. He and Huang [57] proposed a modified firefly algorithm to find the optimal multilevel threshold values for color images; Kapur's entropy, minimum cross entropy and the between-class variance method are used as the objective functions. Rajinikanth and Couceiro [134] considered the RGB histogram of the color image to solve the multilevel thresholding problem. The maximization of Otsu's between-class variance function is chosen as the objective function. The proposed segmentation procedure employs heuristic methods such as the Brownian search based firefly algorithm, the Lévy flight based firefly algorithm and the conventional firefly algorithm, and was implemented and validated on standard color images. A novel approach for colored satellite image segmentation using cuckoo search and other optimization algorithms (DE, PSO and WDO) based on multilevel thresholding was presented in [12]. The method is based on the segmentation of subsets of bands using multilevel thresholding, followed by the fusion of the resulting segmentation channels. For color images, the band subsets are chosen as RGB pairs, whose two-dimensional histograms are processed via a peak-picking algorithm to effect multilevel thresholding. In [85] multi-dimensional histograms were employed for the segmentation of images of chronic wounds.
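The role of the optimizer in these methods can be sketched by replacing the swarm heuristic with plain random search over threshold tuples, keeping Otsu's between-class variance as the objective. This is a deliberately simplified stand-in for the firefly/PSO/cuckoo optimizers discussed above; all function names and parameters are ours.

```python
import numpy as np

def between_class_variance(p, levels, thresholds):
    """Otsu's multilevel objective: weighted variance of the class
    means around the global mean."""
    edges = [0] + list(thresholds) + [len(p)]
    mu_total = (p * levels).sum()
    sigma = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        w = p[a:b].sum()
        if w > 0:
            mu = (p[a:b] * levels[a:b]).sum() / w
            sigma += w * (mu - mu_total) ** 2
    return sigma

def random_search_thresholds(gray, k=2, iters=2000, seed=0):
    """Toy metaheuristic: sample random ordered threshold tuples and
    keep the one that maximizes the between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    rng = np.random.default_rng(seed)
    best, best_val = None, -1.0
    for _ in range(iters):
        t = np.sort(rng.choice(np.arange(1, 256), size=k, replace=False))
        val = between_class_variance(p, levels, t)
        if val > best_val:
            best, best_val = t, val
    return best

# toy trimodal data: intensities around 40, 128 and 220
rng = np.random.default_rng(2)
gray = np.clip(np.concatenate([rng.normal(40, 8, 3000),
                               rng.normal(128, 8, 3000),
                               rng.normal(220, 8, 3000)]), 0, 255)
th = random_search_thresholds(gray, k=2)
```

Swarm methods replace the blind sampling loop with guided moves (attraction between fireflies, particle velocities, Lévy flights), but the objective function evaluated at each candidate tuple is the same.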
That work showed that color histograms of higher dimensions provide a better cue for robust separation of classes in the feature space. An important condition for the segmentation is an efficient sampling of the multidimensional histograms; a multi-dimensional histogram sampling technique is proposed for the generation of input feature vectors for a support vector machine classifier. Aghbari and Al-Haj [5] proposed an approach called the hill-manipulation algorithm. It starts by segmenting the 3D color histogram into hills according to the number of local maxima found, and then each hill is checked against defined criteria for possible splitting into more homogeneous smaller hills. Fine details of the image are thus distinguished and captured in the segmentation.

Region based methods
In these methods the pixels are grouped into larger regions based on their similarity according to predefined similarity criteria, considering the spatial adjacency relationships between pixels. Simple examples of similarity criteria are [152]: (1) the absolute intensity difference between a candidate pixel and the seed pixel must lie within a specified range; (2) the absolute intensity difference between a candidate pixel and the running average intensity of the growing region must lie within a specified range; (3) the difference between the standard deviation in intensity over a specified local neighborhood of the candidate pixel and that over a local neighborhood of the seed pixel must (or must not) exceed a certain threshold; this is a basic roughness/smoothness criterion.
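Criterion (2) translates directly into a breadth-first growing loop; the sketch below is our own illustrative code, not from [152]: a 4-connected neighbor is admitted whenever its intensity is within a tolerance of the running mean of the region grown so far.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10.0):
    """Grow a region from `seed`: a 4-connected neighbor joins if its
    intensity lies within `tol` of the running mean intensity of the
    region grown so far (similarity criterion 2)."""
    h, w = img.shape
    visited = np.zeros((h, w), dtype=bool)
    region = np.zeros((h, w), dtype=bool)
    total, count = float(img[seed]), 1
    q = deque([seed])
    visited[seed] = region[seed] = True
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx]:
                visited[ny, nx] = True
                if abs(img[ny, nx] - total / count) <= tol:
                    region[ny, nx] = True
                    total += float(img[ny, nx])
                    count += 1
                    q.append((ny, nx))
    return region

# toy image: a bright 4x4 square on a dark background
img = np.zeros((10, 10))
img[3:7, 3:7] = 200.0
mask = region_grow(img, seed=(4, 4), tol=10.0)
```

Using the running mean instead of the fixed seed intensity lets the region adapt to gentle gradients; with criterion (1) the comparison would be against `img[seed]` throughout.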
Segmentation can also be based on spatial coherence. This process normally includes two steps: growing regions from seed points, and dividing or merging existing regions of the image. For simple images, the segmentation process is clear and effective due to the small pixel variations; for complex images, it becomes a much tougher process. Formally, region-based segmentation methods can be described as the partition of an image I into n regions R_1, R_2, ..., R_n satisfying the following properties [49]: (1) R_1 ∪ R_2 ∪ ... ∪ R_n = I; (2) each R_i is a connected region; (3) R_i ∩ R_j = ∅ for all i ≠ j; (4) P(R_i) = TRUE for i = 1, ..., n; and (5) P(R_i ∪ R_j) = FALSE for any pair of adjacent regions R_i and R_j; where P(R_i) is the logical predicate defined over the points in set R_i and ∅ is the empty set. Nock and Nielsen [123] presented an approach for image segmentation by region merging following a particular order in the choice of regions. The blend of algorithmics and statistics bounds the segmentation error from both the qualitative and quantitative standpoints. The approach runs in approximately linear time and space, leading to fast segmentation.
Mignotte [116] estimated a segmentation map into regions from a boundary representation. The author defined a non-stationary Markov random field (MRF) model with long-range pairwise interactions whose potentials are estimated from the probability of the presence of an edge at each pair of pixels. That paper shows that an efficient and interesting alternative to complex region-based segmentation models consists in averaging soft contour maps and using the MRF reconstruction model to achieve an accurate segmentation map into regions. An algorithm based on the theory of gravity, called stochastic feature based gravitational image segmentation, was presented in [136]. The proposed algorithm employs color, texture and spatial data to partition the image. The algorithm is equipped with an operator called escape, inspired by the concept of escape velocity in physics. A stochastic characteristic is incorporated into the algorithm, giving it the ability to search the image for the fittest pixels that are suitable for merging. Salah et al. [140] proposed a multi-region graph cut image partitioning via kernel mapping of the image data. The image data are transformed by the kernel function, so that the piecewise constant model of the graph cut becomes applicable; an objective function contains an original data term to evaluate the deviation of the transformed data, within each segmentation region, from the piecewise constant model. A common kernel function is employed, and the energy is minimized by iterated graph-cut partitioning. In [24] the authors proposed a color image segmentation by a simplified pulse-coupled neural network (NN); the segmentation is obtained by transforming a color image into channels with low intensity, integrating the normalized RGB color space with an opponent color space. Lu et al.
[110] developed a region-based color modelling method to perform joint crop and maize tassel segmentation, which mainly consists of two stages: region proposal generation and color model prediction. Concretely, the efficient graph-based segmentation algorithm [38] and simple linear iterative clustering [1] are first employed to generate region proposals, which have the effect of region smoothing and edge preserving. Next, each region proposal is passed to the neural-network-based ensemble models, called the neural network intensity color model, in order to attach the semantic meaning of crop, tassel or background to these regions. The objective of [60] is to propose an improved principal component analysis (PCA)-based multichannel selection Chan-Vese model to segment wheat leaf lesions using color features. In the proposed scheme, three channels are adaptively selected by PCA. Then, a k-means initial segmentation is used to obtain the initial curve and label the lesions as the object region and the rest of the leaf as the background region. Lee et al. [93] presented a method that can find the lip area using its shape feature, regardless of the influence of light and background, with over 94% accuracy and over 98% precision. The method finds the face area in an input image, divides the face image in half, and applies sliding window detection to the bottom half of the image. Then, it obtains the histogram of oriented gradients (HOG) feature vector from the image that corresponds to the window, and uses it as the input to a pre-trained support vector machine (SVM). HOG and SVM are used for coarse detection. If the SVM determines that the image is not the lip, sliding window detection is reapplied; otherwise, the image is used as input to a convolutional neural network (CNN), which is employed for fine detection to determine whether the image is the lip.
If the CNN determines that the image is the lip, Canny edge detection is applied to the image to obtain the mouth contour. Liver segmentation on non-contrast images has been achieved by using a conditional statistical shape model; however, this method still encounters difficulties when the morphology of the liver is abnormal [164]. To address this problem, Yamaguchi et al. [184] presented a 3D regional segmentation method for non-contrast abdominal CT (Computed Tomography) images based on a correlation map of the locoregional histogram and a probabilistic atlas.

Feature clustering based
Clustering methods are among the most used algorithms in image segmentation. The fuzzy c-means (FCM) algorithm is one of the most widely used fuzzy clustering algorithms in image segmentation. The conventional FCM algorithm works well on most images [171,179]; however, it fails to segment images corrupted by noise, outliers and other imaging artifacts. The K-means algorithm iterates between assigning each pixel to the nearest cluster center and recomputing each center as the mean of its assigned pixels [46]. The fuzzy c-means algorithm [11] is an important tool for image processing in the segmentation of color images. It is an iterative clustering algorithm in which a pixel can belong to more than one cluster, and a set of membership levels is associated with each pixel; the algorithm alternates between updating the memberships and recomputing the cluster centers until convergence. Two improved FCM clustering algorithms with spatial constraints for color image segmentation were presented in [119]. The rank M-type and L-estimators were used in order to obtain spatial data about the pixels. With these estimators, the local data of every color component in the RGB model are incorporated; the proposed approach is applied in the chromatic subspace of the IJK color space in order to overcome some limitations related to the RGB model. Such estimators are incorporated into the FCM algorithm to provide robustness to the proposed segmentation techniques. In [66] a clustering algorithm that maintains the coherence of data in feature space was introduced; the algorithm works under the clustering-then-labeling paradigm. Applied in the L*a*b* color space, the image is segmented by assigning each pixel to its corresponding cluster. The algorithm is based on the theory of minimum description length, which is an effective approach to automatically select the parameters of the proposed segmentation method. Tan and Isa [161] presented an approach based on histogram thresholding; this approach can be applied in pattern recognition, particularly for color image segmentation.
The approach employs the histogram thresholding technique to obtain all possible uniform regions in the color image; the compactness of the clusters forming the uniform regions is then improved with FCM. Guo and Sengur [51] applied the neutrosophic set, which studies the origin, nature and scope of neutralities. A directional α-mean operation is proposed to reduce the indeterminacy of the set; the FCM algorithm is improved by integrating it with the neutrosophic set and is employed to segment the color image, with the membership computation and the clustering termination redefined accordingly. In [185] the segmentation of color images was addressed as a problem of clustering texture features as multivariate mixed data. The distribution of the texture features is modeled using a mixture of Gaussian distributions. The mixture distribution is segmented with an agglomerative clustering algorithm derived from a lossy data compression approach; the algorithm employs either 2D texture filter banks or simple fixed-size windows to obtain the texture features. In [170] a multi-level low-rank approximation-based spectral clustering method is proposed to segment high-resolution images. The proposed method is a graph-theoretic approach that finds natural groups in a given data set. By computing multi-level low-rank approximations to the affinity matrix and its subspace, as well as to the Laplacian matrix and the Laplacian subspace, it gains computational and spatial efficiency. An algorithm where bilateral filtering is employed as a kernel function to form a pixonal image was proposed in [122]. The bilateral filtering is a preprocessing step that eliminates unnecessary details of the image and results in a small number of pixons; the computed pixonal image is then segmented using FCM. Most of the reviewed works employ cluster-based methods; as mentioned before, the drawback of these methods is that the number of clusters must be defined a priori.
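The FCM iteration that most of these works build on alternates two closed-form updates: cluster centers computed as membership-weighted means, and memberships inversely proportional to relative distances. A minimal sketch follows (the function name and toy data are ours; real variants add spatial constraints as in [119]):

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: alternate the center update (membership-
    weighted means) and the membership update (inverse relative
    distances) for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# toy data: two well-separated color blobs in RGB space
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([200, 30, 30], 5, (100, 3)),
               rng.normal([30, 30, 200], 5, (100, 3))])
centers, U = fcm(X, c=2)
labels = U.argmax(axis=1)  # hard segmentation from the soft memberships
```

The fuzzifier m controls how soft the memberships are; note that c, the number of clusters, must be supplied up front, which is precisely the a priori drawback noted above.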
Mignotte [115] introduced a segmentation approach based on a Markov random field fusion model that combines several segmentation results associated with simple clustering methods. The fusion model is based on the probabilistic Rand measure for comparing one segmentation result to one or more manual segmentations of the same image. This non-parametric measure allows deriving an appealing fusion model of label fields expressed as a Gibbs distribution; this Gibbs energy model encodes the binary constraints given by the segmentation results to be fused. In many cases, the development of clustering-based segmentation methods is limited by the initially chosen cluster centers and also by the number of chosen cluster centers. This problem is solved by using evolutionary computing, whose approaches avoid the less desirable solutions. Initialization has a significant effect on the final partitions obtained by the iterative c-means clustering approaches; genetically guided clustering attempts to achieve both avoidance of local extrema and minimal sensitivity to initialization. Garcia-Lamont et al. [44] presented an approach to automatically compute the number of clusters so as to segment the images using FCM. A competitive neural network and a self-organizing map are trained with chromaticity samples of different colors; the neural networks process each pixel of the image to segment, and the activation occurrences of each neuron are collected in a histogram. The number of clusters is set by counting the most activated neurons and is then adjusted by comparing the similarity of colors. In [139] a multiobjective optimization algorithm was introduced; the segmentation is addressed as a clustering problem by grouping the image features, and the multiobjective optimization algorithm is combined with seeded region growing.
The main features of an image are color, texture and gradient magnitudes, which are measured using local homogeneity, Gabor filters and color spaces. The seeded region growing employs the extracted feature vector to classify the pixels spatially. The optimization algorithm determines the coordinates of the seed points and the similarity difference of each region by optimizing a set of cluster validity indices, so as to improve the quality of the segmentation. The segmentation is completed by merging small and similar regions. In reference [81] the segmentation of color images was addressed as a clustering problem solved with a fixed-length genetic algorithm. An objective function was proposed to evaluate the quality of the segmentation and the fitness of a chromosome. A self-organizing map was used to determine the number of segments, in order to set the length of a chromosome automatically. The initialization of the population was performed with an opposition-based strategy.
Khan et al. [83] apply a spatial fuzzy genetic algorithm for the segmentation of color images; the performance of the algorithm is influenced by the number of clusters and the initialization of the cluster centers. Rajaby et al. [133] include hue and intensity components in the objective function of FCM for color image segmentation. Each of these components is weighted such that if it is approximately constant, or if it has a high variation caused by noise, its weight is low. This method is faster than NWFCM_rgb [190], FGFCM_rgb [18] and HTFCM [161]; however, it requires the number of segments to discover as a parameter. Instead of analyzing one pixel at a time, Ji et al. [71] replace each pixel with its corresponding image patch and assign a weight to each pixel in the patch. The algorithm introduced in [71] is faster than other ones; however, it still requires the number of clusters as an input parameter. Dominant sets clustering based methods have been effective for image segmentation; an advantage of these methods is that the number of clusters can be discovered automatically. However, the dominant sets algorithm is very sensitive to the similarity matrix. How et al. [59] proposed to reduce this sensitivity to the similarity measures by applying histogram equalization to the similarity matrix. Because the similarity matrix occupies a large amount of memory, the method proposed in [59] is applicable only to small images (120 × 80 pixels).

Neural networks based segmentation
Neural networks (NN) are a computational approach, based on a large collection of neural units also known as artificial neurons, that loosely models the way a biological brain solves problems with large groups of neurons connected by axons [61,62]. Neural networks offer important advantages in machine learning tasks [32-34,44,63-65,196,197]: their high degree of parallelism allows very fast computation, making them suitable for real-time applications, and they show good robustness to disturbances. Another important advantage in the case of image segmentation is that neural networks permit accounting for spatial information. However, in most cases the final number of segments within an image must be known beforehand, and a preliminary learning phase must be run to train the network to recognize patterns. Several algorithms have been proposed for segmenting color images by means of neural networks. Ong et al. [125] presented a two-stage hierarchical NN for color image segmentation based on self-organizing maps (SOMs). The first stage of the network uses a two-dimensional feature map that captures the dominant colors of an image; the second stage employs a one-dimensional feature map to control the number of color clusters used for segmentation.
These factors are overcome using a progressive technique based on self-organizing maps to find the optimal number of clusters automatically; the cluster centers are set to the weights of the neurons corresponding to the histogram peaks. Other works use unsupervised NNs, but the networks employed are trained every time a new image is given. That is, a NN trained with the colors of a given image cannot always recognize all the colors of a different image; hence, the NN must be trained with the colors of the new image. To overcome this drawback, in references [44] and [43] SOMs and a three-layered SOM, respectively, were trained with chromaticity samples of different colors represented in the HSV space. These NNs are trained just once and can segment a given image without retraining. Aghajari et al. [3] proposed a self-organizing map based extended FCM, a method that uses the discrete wavelet transform (DWT), SOM and FCM. The DWT is used to decompose an image into various frequencies. Feature prototypes are formed with the gradient, the pixel value and statistical parameters (standard deviation, mean and energy). A random selection of pixels throughout the images is used as input to the SOM to obtain codebooks, which are clustered with FCM. The clustered codebook vector centers are used for image segmentation based on a minimum distance criterion. In [56] Hassanat et al. proposed a method for the segmentation of human skin color in images for the detection of lips, faces, fingers and hands. The method uses color information combining the HSV, YCbCr and YIQ models. The neighborhoods of 10 pixels chosen from the images are used to train a neural network. A problem with this approach is that the pixels need to be chosen manually during the training phase.
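The SOM training loop shared by many of the works above can be sketched as follows; this is a minimal one-dimensional map in plain NumPy, where the node count, decay schedules and the toy two-color data are our assumptions, not taken from any of the cited methods.

```python
import numpy as np

def train_som(data, n_nodes=8, iters=3000, lr0=0.5, seed=0):
    """Minimal 1-D SOM trained on color vectors: the best-matching
    unit (BMU) and its map neighbors move toward each presented
    sample; learning rate and neighborhood radius decay over time."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_nodes, data.shape[1])) * 255.0
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        lr = lr0 * (1.0 - t / iters)
        radius = max(0.5, (n_nodes / 2.0) * (1.0 - t / iters))
        dist = np.abs(np.arange(n_nodes) - bmu)
        h = np.exp(-dist ** 2 / (2.0 * radius ** 2))
        W += lr * h[:, None] * (x - W)
    return W

# toy pixels drawn from two dominant colors (red-ish and blue-ish)
rng = np.random.default_rng(4)
pixels = np.vstack([rng.normal([255, 0, 0], 10, (500, 3)),
                    rng.normal([0, 0, 255], 10, (500, 3))])
W = train_som(pixels)
# label each pixel with its best-matching node
labels = np.argmin(np.linalg.norm(pixels[:, None] - W[None], axis=2), axis=1)
```

After training, the node weights approximate the dominant colors of the input; counting how often each node wins, as in [44], is one way to estimate the number of clusters automatically.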
A technique for the segmentation of color images based on the geometrical properties of lattice auto-associative memories is described in [166]; lattice associative memories are a class of neural networks that store a finite set of n-dimensional vectors and are able to recall them when a noisy or incomplete input vector is presented. The canonical lattice auto-associative memories include the min and max memories, defined as square matrices. The column vectors of these matrices are used to determine a set of extreme points whose convex hull encloses the finite set of n-dimensional vectors. Because color images form subsets of a finite geometrical space, the scaled column vectors of each memory correspond to saturated color pixels. Zhang et al. [192] presented an approach to accurately separate the foreign fiber objects in a captured color image from the background. The captured RGB color images are separated into the R, G and B color channels, and the color information of each channel is computed. The R saliency for each pixel in the R channel, the G saliency for each pixel in the G channel, and the B saliency for each pixel in the B channel are computed respectively. The comprehensive saliency map is obtained by weighting the R, G and B saliencies, with the weights determined by the corresponding color information of each channel. The foreign fiber targets are then separated from the comprehensive saliency map using a threshold method.
A color map segmentation method, similar to color image segmentation based on the self-organizing neural network, was presented in [183]. A SOM with two layers is employed: the input layer simulates the retina perceiving external information, and the output layer simulates the cerebral cortex of the brain. Each node in the input layer transmits external data to each nerve cell in the output layer through weight vectors. The format of the input layer is the same as in a back propagation neural network, and the number of nodes equals the dimension of the sample. The output layer is the competition layer; the input and output layers are fully interconnected, and the nodes of the output layer have lateral inhibitory connections. Stephanakis et al. [155] proposed a window-based self-organizing map, which uses multidimensional input color vectors defined upon spatial windows in order to capture the correlation between the color vectors of adjacent pixels. The window is used for capturing color components in the L*u*v* color space. The neuron featuring the smallest distance is activated during training; neighboring nodes of the neural network are clustered according to their statistical similarity. The authors of [72] proposed an image segmentation method based on an ensemble of self-organizing maps, which clusters the pixels of an image according to color and spatial features with many self-organizing maps. The feature vectors are five-dimensional, with elements the x and y coordinates and the R, G and B values of the corresponding pixel. These feature vectors are fed to a self-organizing map; after training, input vectors that are topologically close are mapped to the same class, which means the input space is divided into k classes. In [118] a proposal using a fuzzy inference system in an optimized color space was presented.
The system, which is designed by a neuro-adaptive learning technique, takes a sample image as input and can reveal the likelihood of each pixel being a specific color; the intensity of each pixel in the gray level output image shows this likelihood. After choosing a threshold value, a binary image is obtained, which can be used as a mask to segment the desired color in the input image. Khan et al. [82] presented a modified version of the fuzzy c-means algorithm that incorporates spatial information into the membership function for the clustering of color images. A progressive technique based on self-organizing maps is used to automatically find the optimal number of clusters. In [156] a two-stage color image segmentation method was proposed. In the first stage, hierarchical or hybrid clustering schemes are performed in order to achieve color reduction and enhance robustness against noise; 2D self-organizing maps defined upon the 3D color space are usually employed to render the distribution of the colors of an image without taking into consideration the spatial correlation of color vectors throughout the various regions of the image. Clustering the color vectors pertaining to segments of an image is carried out in a subsequent stage via unsupervised or supervised learning: a second stage of density-based clustering of the nodes of the self-organizing map is employed in order to facilitate the segmentation of the color image.
Ilea and Whelan [68] presented an unsupervised image segmentation framework based on the adaptive inclusion of color and texture in the data partition process. A new formulation for the extraction of color features that evaluates the input image in a multispace color representation is also given. This is achieved using the opponent characteristics of the RGB and YIQ color spaces, where the key component is the inclusion of the self-organizing map network in the computation of the dominant colors and the estimation of the optimal number of clusters in the image. Reference [165] presented a method that divides the color space into clusters. Competitive learning is used as a tool for clustering the color space based on the least sum of squares criterion. The method is applied to various color scenes, and the proposal is efficient as a color image segmentation method. Cengiz and Köse [22] determined, using an artificial neural network, which eye colors are more perceived and adopted by which eye color groups, for ages 6 and 17. With these determinations, it was studied with graphical and statistical illustrations how different eye color groups prefer colors, how well they are able to recognize primary and secondary colors, and to what extent the various eye color groups are able to perceive RGB and CMY colors correctly. In [8] a self-organizing map with variable topology was introduced. The network is fast convergent and capable of performing color image segmentation; it reaches a higher color palette variance and a better 3D RGB color space distribution of the data learned from the training images than the other models. Halder et al. [54] proposed to segment a color image by a semi-supervised clustering method based on a modal analysis and mutational agglomeration algorithm in combination with the self-organizing map. The modal analysis and mutational agglomeration algorithm is used for the initial segmentation of the images.
The sampled image pixels of the segmented image are used to train the neural network.

Multi-feature fusion
Xia et al. [177] developed a method for automatic image annotation with multi-feature fusion, including the rotation-invariant uniform local binary patterns (LBP) histogram distribution, weighted histogram integral areas and statistics of connected regions, based on the multi-label learning k-nearest neighbor algorithm. A free version of part of the Corel5K image data set was used for the experiments in that study. Comparisons among combinations of features of different dimensions showed that the proposed method outperformed the traditional one based only on basic color moments and texture distribution. Liu et al. [102] presented an additional color feature, namely the Color Information Feature (CIF), which is incorporated with the LBP-based feature for image classification and retrieval. The CIF compensates for the difficulty of the LBP-based operator in describing color distributions.
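The LBP operator underlying these descriptors admits a compact sketch. The version below is the basic 8-neighbor code, not the rotation-invariant uniform variant used in [177]; the helper name and toy patches are ours.

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbor LBP: each interior pixel is encoded by
    comparing it with its 8 neighbors (one bit per neighbor), and the
    resulting codes are accumulated into a normalized histogram."""
    g = np.asarray(gray, dtype=float)
    c = g[1:-1, 1:-1]  # interior pixels (centers)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << bit
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()

# a flat patch and a checkerboard patch yield different histograms
flat = np.full((16, 16), 128.0)
checker = np.indices((16, 16)).sum(axis=0) % 2 * 255.0
h1, h2 = lbp_histogram(flat), lbp_histogram(checker)
```

Because the code depends only on the sign of local differences, the histogram is invariant to monotonic illumination changes, which is why LBP is often fused with a separate color feature such as the CIF.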
Wang et al. [168] proposed to combine polarization images, derived from the polarization state of each pixel, with the color images to improve the accuracy of image semantic segmentation. More specifically, the combination is done through the HOG [29] and LBP [124] features, which are extracted from both the polarization image and the color images independently. These features are concatenated and fed into a joint boosting classifier, a feature-selecting classifier known for its facility to integrate new sources of features. In the training process, the classifier randomly selects features, producing the polarization-based semantic segmentation results. For comparison, the authors repeated the same algorithm, extracting the HOG and LBP features from the color images only; after training another joint boosting classifier, the color-based semantic segmentation results were obtained. The comparisons showed that the accuracy of the semantic segmentation is improved thanks to the included polarization features. Suryanto et al. [159] introduced an algorithm for object tracking in video sequences. In order to represent the object to be tracked, a new spatial color histogram model was proposed, which encodes both the color distribution and the spatial distribution. Experimental results showed successful tracking of the object even when the tracked object changes in size and shares a similar color with the background. In [7] an interactive, semi-automatic segmentation method was introduced that processes the color information of each pixel as a unit, avoiding color information scattering. The process has two steps: (1) the manual selection of a few sample pixels of the color to be segmented; (2) the automatic generation of the color similarity image, which is a gray level image with all the tonalities of the selected color. The color data of every pixel are integrated by a similarity function for direct color comparisons.

Fuzzy approaches
Fuzzy set theory provides a method to transform the image histogram into corresponding membership functions and carry out the image segmentation. However, traditional fuzzy sets do not consider the uncertainty of the membership function and membership degree, and type-2 fuzzy sets only consider the fuzziness of the membership. Thus, Qin et al. [130] proposed an image segmentation approach based on the cloud model, which considers the randomness and the fuzziness of the membership simultaneously. Küçüktunç et al. [87] proposed an automatic shot-boundary detection algorithm for videos to which various transformations are applied. In contrast to most existing methods, they use a fuzzy logic approach for extracting the color histogram to detect shot boundaries. The proposed method aims to detect both cuts and gradual transitions (fade, dissolve) effectively in videos where heavy transformations occur, such as camcording, insertion of patterns and strong re-encoding. Along with the color histogram generated with the fuzzy linking method on the L*a*b* color space, the system extracts a mask for still regions and the picture-in-picture transformation for each detected shot, which is useful in a content-based copy detection system. In [113] unsupervised co-segmentation was addressed, which usually involves optimizing an energy function that evaluates the similarity between the foreground objects in the input images. The objective is to evaluate the correspondence of the foreground objects, penalizing the dissimilarity between them, and to integrate spatial information in order to avoid false detections. In addition to the integration of spatial information and the use of local entropy during histogram computation, the proposed technique employs fuzzy local-entropy classification, which allows modeling the ambiguity of a pixel's membership to a histogram bin.
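The transformation of a histogram axis into membership functions can be illustrated with two overlapping triangular fuzzy sets over the gray levels; the breakpoints below are arbitrary assumptions chosen for illustration, not values from the cited works.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0, 1)    # rising edge
    right = np.clip((c - x) / (c - b), 0, 1)   # falling edge
    return np.minimum(left, right)

# two overlapping fuzzy sets over the gray axis: "dark" and "bright"
levels = np.arange(256)
mu_dark = triangular(levels, -1, 0, 160)
mu_bright = triangular(levels, 96, 255, 256)

# defuzzify with the maximum-membership rule
label = (mu_bright > mu_dark).astype(int)  # 0 = dark, 1 = bright
```

In the overlap zone (gray levels 96 to 160) a pixel belongs to both sets with nonzero degree; it is exactly this graded membership that crisp thresholding discards and that type-2 and cloud-model extensions refine further.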
A methodology for semantic indexing and retrieval of images, based on image segmentation and classification techniques combined with fuzzy reasoning, is proposed in [151]. In the proposed knowledge-assisted analysis architecture, a segmentation algorithm first generates a set of over-segmented regions. After that, a region classification process is employed to assign semantic labels using a confidence degree and simultaneously merge regions based on their semantic similarity.
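The fuzzy-histogram idea underlying these methods can be sketched with triangular membership functions, where each pixel contributes fractionally to its two nearest bins instead of falling entirely into one. The following is an illustrative NumPy sketch, not the exact fuzzy-linking scheme of [87]:

```python
import numpy as np

def fuzzy_histogram(values, n_bins=16, vmax=255.0):
    """Fuzzy histogram: each value contributes to its two nearest bin
    centers with triangular membership weights, so a small shift in a
    value moves mass smoothly between bins instead of jumping."""
    values = np.asarray(values, dtype=float).ravel()
    centers = np.linspace(0.0, vmax, n_bins)
    step = centers[1] - centers[0]
    hist = np.zeros(n_bins)
    # index of the nearest bin center at or below each value
    idx = np.clip(np.floor(values / step).astype(int), 0, n_bins - 2)
    frac = (values - centers[idx]) / step          # distance to lower center
    np.add.at(hist, idx, 1.0 - frac)               # membership in lower bin
    np.add.at(hist, idx + 1, frac)                 # membership in upper bin
    return hist

pixels = np.array([0, 8, 16, 240, 255])
h = fuzzy_histogram(pixels, n_bins=16, vmax=255.0)
print(round(h.sum(), 6))  # → 5.0, one unit of mass per pixel
```

Because the memberships of the two bins sum to one, the total mass of the histogram still equals the number of pixels, while intermediate values such as 8 are shared between adjacent bins.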

Texture analysis-based methods
Liu et al. [103] proposed a strategy to segment the exudates in retinal fundus images that involves three stages: (1) anatomic structure removal, in which adverse effects from the main vessels and the optic disk, whose structure information is similar to that of exudate regions, are eliminated; (2) exudate location, in which the patches containing exudate regions are identified; in this stage, histograms of completed local binary patterns [52] are extracted to describe the texture structures of the patches; and (3) exudate segmentation. The main goal of the iris segmentation process is to extract the iris texture from surrounding structures and to remove the pupil and reflections from the iris texture. Accordingly, Radman et al. [132] presented an iris segmentation method that accurately localizes the iris with a model designed on the basis of the HOG descriptor [29] and SVM, namely HOG-SVM. Based on this localization, the iris texture is automatically extracted by means of cellular automata that evolve via the GrowCut technique [167]. Shahangian and Pourghassem [144] proposed an automatic brain hemorrhage detection and classification algorithm for computed tomography images. To achieve this purpose, after preprocessing, a modified version of distance regularized level set evolution [97] is used to detect and separate the hemorrhage regions. Then a set of shape and texture features is extracted from each detected hemorrhage region. Moreover, the authors define a synthetic feature called the weighted grayscale histogram feature, in which valuable information about the shape, position and area of the hemorrhage region is integrated with its grayscale histogram. After that, a synthetic feature selection algorithm is applied to select the most relevant features.
Eventually, the segmented regions are classified into four types of hemorrhage, namely epidural, intracerebral, subdural and intraventricular, by a hierarchical classification structure. Lee et al. [91] presented an efficient moving object segmentation method using a motion orientation histogram (MOH) in adaptively partitioned blocks. Specifically, given the motion vectors of a regularly partitioned image, each vector is classified into one of eight possible orientations in order to reduce memory space and computational load. In parallel, initial shapes of moving objects are estimated using frame differencing and LBP, which can efficiently describe image texture features [124]. The authors also partition the input image into 32 × 32 macroblocks, and each macroblock is further divided into smaller blocks to fit the object boundary. The motion vectors of the detected moving objects are then analyzed using the MOH. The final result is a real-time video with labels and directions for all moving objects.
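The LBP descriptor that several of these works rely on can be sketched as follows; this is a minimal 8-neighbour variant, without the rotation-invariant or "completed" extensions cited above:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour local binary patterns: each interior pixel gets
    an 8-bit code, one bit per neighbour that is >= the centre value;
    the texture descriptor is the normalized histogram of those codes."""
    img = np.asarray(img, dtype=float)
    c = img[1:-1, 1:-1]                      # centre pixels
    # neighbour offsets, clockwise from the top-left corner
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(int) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

flat = np.full((8, 8), 7.0)                  # perfectly uniform patch
h = lbp_histogram(flat)
print(h[255])  # → 1.0: every neighbour equals the centre, so code 255 everywhere
```

On textured patches the histogram spreads across many codes, which is what makes it usable as a texture signature for patch classification.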

Contrast enhancement
For the segmentation and identification of objects and features in a scene, the information content of the image has to be enhanced for better performance. Conventional processes for contrast enhancement include gray-level transformation-based practices, i.e., logarithm transformation, power-law transformation and piecewise-linear transformation, among others, and histogram-based processing techniques, i.e., histogram equalization and histogram specification, to mention a few. The most popular method is histogram equalization, which is based on the assumption that a uniformly distributed grayscale histogram will have the best visual contrast [135]. The adaptive histogram equalization (AHE) method [73] computes several histograms, each corresponding to a distinct section of the image, and employs them to redistribute the brightness values of the image. It is thus suitable for increasing the local contrast of an image and bringing out more detail; however, AHE tends to over-amplify noise in fairly homogeneous regions of an image [79]. Although contrast enhancement techniques perform quite well on images having a uniform spatial distribution of gray values, difficulties arise when the background has a non-uniform distribution of brightness [49]. Raju and Nair [135] proposed a fast and efficient fuzzy-logic- and histogram-based algorithm for enhancing low-contrast color images, which improves the visual quality of the image and aids in the extraction of its spatial features. This algorithm is based on two important parameters, M and K, where M is the average intensity value of the image, calculated from the histogram, and K is the contrast intensification parameter. Kaur and Sidhu [79] evaluated the effectiveness of histogram equalization, AHE and fuzzy enhancement techniques [135] in terms of mean squared error and peak signal-to-noise ratio; the results showed the effectiveness of the fuzzy-based enhancement technique, with improved visual quality of the image.
Integrating histogram segmentation with histogram bin modification to yield excellent image enhancement results is the main contribution of [162], which successfully adopted the strengths of both approaches. The proposed bi-histogram equalization using modified histogram bins first divides the input histogram into two sub-histograms according to the median value of the image, to preserve its mean brightness. Thereafter, the histogram bins are altered to minimize the domination effects of high-frequency histogram bins.
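As a reference point, plain global histogram equalization can be sketched as follows; the adaptive (AHE) and fuzzy variants discussed above build on this same cumulative-histogram mapping, applied locally or through membership functions:

```python
import numpy as np

def equalize_histogram(img):
    """Global histogram equalization: map each gray level through the
    normalized cumulative histogram so the occupied levels are spread
    over the full [0, 255] range."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf_min = cdf[hist > 0][0]               # cdf at the first occupied level
    lut = (cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0
    lut = np.clip(np.round(lut), 0, 255).astype(np.uint8)
    return lut[img]                          # look-up-table remapping

# low-contrast ramp squeezed into the range [100, 131]
low = (np.arange(256, dtype=np.uint8) // 8 + 100).reshape(16, 16)
eq = equalize_histogram(low)
print(int(low.max()) - int(low.min()), int(eq.max()) - int(eq.min()))  # → 31 255
```

The 32-level ramp occupying only [100, 131] is stretched to the full dynamic range, which is exactly the behavior the uniform-histogram assumption of [135] formalizes.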

Model-based methods
Model-based methods involve active contour models and level set methods [53,99,117,171-173]. The central idea of the active contour model is to start with a curve around the object to be segmented and gradually move the curve toward its interior until it stops on the true boundary of the object; the movement is controlled using only low-level features such as discontinuity and homogeneity. This method can partition a given image into two regions, one representing the objects to be detected and the other representing the background. In [172,173], the authors proposed region-based active contour models for image segmentation; the proposed models segment images with intensity inhomogeneity by introducing local image information into the model. The methods are based on an improved numerical solution of the bimodal Chan-Vese model. The models can automatically detect each object region once an initial curve is given, and stop computing when no more object regions remain in the current image layer. The authors argue that, unlike the multi-phase Chan-Vese model, their proposed methods discard unnecessary computations and have a lower computational complexity. Another successful refinement of the active contour model is the level set method, in which the evolving contour is represented implicitly. In [117], the authors introduce an energy term based on a multi-layer structure and optimize each layer of the energy term. In their algorithm, the authors incorporate an optimal evolution layer that is used to construct the final energy functional; the final segmentation results are obtained by minimizing this functional based on the optimal evolution layer. In [53], the authors introduce moment competition and weakly supervised information into the construction of the energy functional that drives the contour evolution.
The moment can be constructed and incorporated into the energy functional to drive the evolving contour toward the object boundary. Additionally, the method is more robust due to the integration of global statistical information and weakly supervised information. Level set methods are designed to handle the segmentation of deformable structures, which display interesting elastic behaviors, and can handle topological changes. Early level set methods depend on the gradient of the given image to stop the evolution of the curve; therefore, they can only detect objects with edges defined by the gradient. The models proposed in [172,173] are based on the Chan-Vese model, so they can detect objects whose boundaries are not necessarily defined by the gradient. Feature/detail-preserving models for color image smoothing and segmentation using the Hamiltonian quaternion framework were presented in [157]. A novel quaternionic Gabor filter is first introduced; it combines the color channels and the orientations in the image plane. The filters are optimally localized in the spatial and frequency domains and provide a good approximation to quaternionic quadrature filters. Then, continuous mixtures of appropriate exponential basis functions are proposed, and analytic expressions are derived in order to model the obtained orientation information. Ref. [16] proposed an algorithm that models human perception according to the Gestalt laws of similarity and proximity; mean shift clustering is employed to translate these laws into an analysis of the color layout of an image. In [106], a segmentation method for color images based on mixture models of multivariate Chebyshev orthogonal polynomials was presented, to solve the problem of over-reliance on a priori assumptions in parametric finite mixture models and the limitation that monic Chebyshev orthogonal polynomials can only process gray images.
The multivariate Chebyshev orthogonal polynomials are derived via Fourier analysis and tensor product theory, and nonparametric mixture models of multivariate orthogonal polynomials are proposed. To resolve the problem of estimating the number of density mixture components, the stochastic nonparametric expectation maximization algorithm is used to estimate the orthogonal polynomial coefficients and the weight of each model. Wangenheim et al. [175] evaluated a combined approach intended for reliable color image segmentation, in particular for images presenting color structures with strong but continuous color or luminosity changes. The proposal combines an enhanced version of the gradient network method with common region-growing approaches used as pre-segmentation steps. The method is a post-segmentation procedure based on graph analysis of global color and luminosity gradients, working in conjunction with a segmentation algorithm to produce a reliable segmentation result. In [163], a denoising concept is proposed in which the method used for pre-processing the color image includes wavelet-based segmentation. The wavelet transform offers multi-resolution in both the time and frequency domains, so it can describe local features in both; using this multi-resolution property, the non-stationary features of signals can be analyzed efficiently.

Discussion
Several of the segmentation methods are extended versions of techniques developed for gray scale images. For instance, the edge detection, thresholding, histogram-thresholding and region-based techniques work using the intensity of the pixels. In an RGB image, the methods are applied in each of the color channels; in the spaces where the intensity is decoupled from the chromaticity, the same techniques are applied to the intensity channel. These extended techniques do not always work for all the color spaces. For example, Fig. 9 shows the resulting images after detecting the edges by computing the gradients using the Sobel mask, where the technique is applied in the color spaces mentioned. Note that in the resulting images processed in the HSV, HSI, L*a*b*, YUV and YCbCr spaces the edges are detected; however, the edges of the images processed in the RGB and L*u*v* spaces are not well defined. Hence, this technique is not adequate for these last spaces. Thus, the development of techniques must be performed considering the features of the color spaces.
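The channel-wise extension of gray-scale edge detection described here can be sketched as follows; the per-channel Sobel magnitudes are combined by taking the pixel-wise maximum, which is one common (but not the only) fusion choice:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(img, kernel):
    """'valid' cross-correlation via explicit windowing (no SciPy needed;
    the kernel is not flipped, which does not affect gradient magnitudes)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def color_sobel_magnitude(img):
    """Apply the Sobel masks to each color channel independently and
    keep the maximum gradient magnitude over channels per pixel."""
    mags = []
    for ch in range(img.shape[2]):
        gx = filter2d(img[:, :, ch].astype(float), SOBEL_X)
        gy = filter2d(img[:, :, ch].astype(float), SOBEL_Y)
        mags.append(np.hypot(gx, gy))
    return np.max(mags, axis=0)

# vertical step edge present in the red channel only
img = np.zeros((6, 6, 3))
img[:, 3:, 0] = 1.0
mag = color_sobel_magnitude(img)
print(mag[2, 0], mag[2, 2])  # → 0.0 4.0 (flat region vs. edge column)
```

An edge that exists in only one channel still appears in the combined magnitude map, which is the point of processing the channels independently before fusing them.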
As mentioned previously, the clustering techniques are popular and often employed, mainly FCM, because it is effective and easy to implement; however, it requires defining a priori the number of clusters in the data. The number of clusters affects the number of parts obtained when the image is segmented, although there are related works that address how to compute the number of groups for FCM [44,68,82,83]. According to the reviewed works, color image segmentation is precise when FCM is used along with the L*a*b* and L*u*v* spaces to represent colors. The explanation is that FCM uses the Euclidean metric to compute the distance between the vectors (colors) and the centers of the groups; and, as stated in Section 2.4.1, the Euclidean metric can be employed to compute color changes because in the L*a*b* and L*u*v* spaces the chromaticity changes are linear.
For instance, Fig. 10 shows the resulting images segmented using FCM. The segmentations obtained employing the RGB, YUV and YCbCr spaces are affected by the intensity of the colors, producing different parts despite their having the same color; see the segmented part corresponding to the sky of the farm image. The segmented parts of the resulting images employing the HSV, HSI, L*a*b* and L*u*v* spaces are more homogeneous, and the intensity effects are minimal. Although the segmented parts of the cup and beach images obtained with the HSV and HSI spaces are not well defined, it is easy to observe that the intensity of the colors does not affect the segmentation processing; with the L*a*b* and L*u*v* spaces, the segmented parts are well defined and the effects of color intensity are minimal. Alongside FCM, neural networks have also been widely employed for color image segmentation. Most of the works employ unsupervised neural networks, principally self-organizing maps. Essentially, all the proposals using neural networks work as follows: the neural networks are trained with the colors of the given image, and then the image is processed with the trained network, where the color of each pixel is set to the color of the winning neuron. The contributions of the works using neural networks are new or modified versions of the training algorithms or of the network architectures. The drawback of this approach is that the neural networks must be trained every time a novel image is given. On the other hand, in references [43,44], the self-organizing maps employed are trained to recognize different colors by their chromaticity features, where the neural networks are trained with chromaticity samples of different colors; the chromaticity is modeled using the HSV space.
Under this approach, the neural networks can be applied to any given image without retraining; moreover, it is not necessary, to some extent, to know the number of colors within the image to segment. How accurately the neural networks process the colors depends on the function employed to compute the winning neuron. That is, if the Euclidean metric is employed, the usage of the L*a*b* and L*u*v* spaces is advisable; if the inner product is used, the RGB, HSV, HSI, YUV and YCbCr spaces are adequate for color processing by the neural networks. The other techniques mentioned in Section 3.7 have been shown to be efficient for color image segmentation. According to the state-of-the-art review, there is no tendency to use a specific technique or to establish one technique as the best. These methods are less common than FCM and the neural networks, because the latter are easy to implement and their performance is acceptable, while the other techniques may demand a strong background in mathematics and image processing. Besides, in some cases, the complexity of the algorithms may be high.
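A minimal fuzzy c-means sketch illustrates the alternating updates behind the clustering approach discussed above; colors are clustered in RGB here purely for simplicity, although, as noted, the L*a*b* and L*u*v* spaces are advisable when the Euclidean metric is used:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means: alternate between (1) centers as
    membership-weighted means and (2) the standard membership update
    u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(iters):
        centers = (U**m).T @ X / (U**m).sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)               # guard against zero distances
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return U, centers

# two well-separated color clusters (reddish and bluish pixels)
X = np.vstack([np.tile([250.0, 10.0, 10.0], (5, 1)),
               np.tile([10.0, 10.0, 250.0], (5, 1))])
U, centers = fcm(X, c=2)
labels = U.argmax(axis=1)
print(labels)
```

Hardening the soft memberships with `argmax` yields the final pixel labels; in an image-segmentation setting `X` would hold one color vector per pixel and the labels would be reshaped back to the image grid.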

Quantitative evaluation of color image segmentation
Although different methods have been proposed to evaluate the segmentation of color images, standard metrics for this purpose have not been defined. The evaluation methods are divided into two classes [194]: subjective (supervised) and objective (unsupervised). Among the objective methods, some examine the impact of a segmentation method on the larger system/application, while others analyze the segmentation method independently. In the subjective methods, the segmentation results are judged by a human evaluator. Subjective evaluation scores may vary significantly from one human evaluator to another, because each evaluator has their own distinct standards for assessing the quality of a segmented image. It is important to mention that these kinds of methods are the most often used [81,83,115,116,119,136]. In these methods, the resulting images are compared against a manually segmented reference image, often referred to as the ground truth. The degree of similarity between the human- and computer-segmented images determines the quality of the segmented image. Comparisons based on such reference images are somewhat subjective, because there is no guarantee that one manually generated segmentation is better than another. In the following sections we present the usual methods we found during the state-of-the-art review, for both subjective and objective evaluation.

Subjective evaluation methods
As stated before, the supervised methods are the most employed for segmentation evaluation. They involve evaluating the resulting images against a defined ground truth. In several papers we found that the Berkeley segmentation database (BSD) is often employed as a benchmark; thus, the BSD is becoming the standard benchmark providing ground truth images to evaluate segmented images [35]. The BSD contains 500 color images of size 481 × 321 (or 321 × 481) pixels; for each of these images, the database provides between four and nine human segmentations in the form of label maps. Reviewing several papers [81,83,115,116,119,136,154], we found that the most employed methods to evaluate the segmentation of color images are the probabilistic Rand index (PRI), variation of information (VOI), global consistency error (GCE) and boundary displacement error (BDE). Next we present the equations employed for each method.

Probabilistic Rand index
The PRI, also known as the Rand index, compares the image obtained from the tested algorithm to a set of manually segmented images [115,136,154]. Let $G = \{I_1, \ldots, I_m\}$ be the ground truth set and $S$ the segmentation provided by the tested algorithm. $L_i^{I_k}$ is the label of pixel $x_i$ in the $k$th manually segmented image and $L_i^S$ is the label of pixel $x_i$ in the tested segmentation. The PRI index is computed with:

$$\mathrm{PRI}(S, G) = \frac{2}{n(n-1)} \sum_{i<j} \left[ c_{ij}\, p_{ij} + (1 - c_{ij})(1 - p_{ij}) \right]$$

where $n$ is the number of pixels and $c_{ij}$ is a Boolean function:

$$c_{ij} = \begin{cases} 1, & \text{if } L_i^S = L_j^S \\ 0, & \text{otherwise} \end{cases}$$

The expected value of the Bernoulli distribution of the pixel pair, $p_{ij}$, is computed with:

$$p_{ij} = \frac{1}{m} \sum_{k=1}^{m} \mathbb{1}\!\left( L_i^{I_k} = L_j^{I_k} \right)$$

where $I_k \in G$ and $\mathbb{1}(\cdot)$ equals 1 when its argument holds and 0 otherwise. The PRI index is in the range [0, 1], where high values indicate a large similarity between the segmented image and the ground truth.
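A direct (unoptimized) implementation of the PRI over flattened label maps can be sketched as:

```python
import numpy as np

def probabilistic_rand_index(seg, ground_truths):
    """PRI over flattened label maps: for every pixel pair, score the
    agreement between the test segmentation's same-label indicator c_ij
    and the ground-truth mean indicator p_ij, then average over pairs."""
    seg = np.asarray(seg).ravel()
    i, j = np.triu_indices(seg.size, k=1)          # all pixel pairs i < j
    c = (seg[i] == seg[j]).astype(float)
    # p_ij: fraction of ground truths giving the pair the same label
    p = np.mean([(np.asarray(g).ravel()[i] == np.asarray(g).ravel()[j])
                 for g in ground_truths], axis=0)
    return float(np.mean(c * p + (1 - c) * (1 - p)))

gt = [np.array([0, 0, 1, 1])]
pri_perfect = probabilistic_rand_index(np.array([5, 5, 7, 7]), gt)  # → 1.0
pri_mixed = probabilistic_rand_index(np.array([0, 1, 0, 1]), gt)
print(pri_perfect, round(pri_mixed, 4))
```

Note that the perfect score is reached even though the label values differ from the ground truth's: only the pairwise same/different structure matters. This sketch materializes all pixel pairs, so it is only practical for small label maps.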

Variation of information
The VOI index measures the sum of the information loss and gain between two clusterings belonging to the lattice of possible partitions, in the following way [115,119,136]:

$$\mathrm{VOI}(S, I_k) = E(S) + E(I_k) - 2\,G(S, I_k)$$

where $E$ and $G$ are the entropy and the mutual information between two clusterings, respectively. The entropy is computed with:

$$E = -\sum_{i=1}^{c} \frac{n_i}{n} \log \frac{n_i}{n}$$

where $n_i$ is the number of points belonging to the $i$th cluster and $c$ is the number of clusters. The mutual information between two clusterings is computed with:

$$G(S, I_k) = \sum_{i=1}^{c_S} \sum_{j=1}^{c_{I_k}} P(S_i, I_j^k) \log \frac{P(S_i, I_j^k)}{P(S_i)\, P(I_j^k)}$$

where $c_S$ and $c_{I_k}$ are the numbers of clusters of $S$ and $I_k$, respectively; $P(S_i, I_j^k)$ is the joint probability distribution function of clusters $i$ and $j$ of images $S$ and $I_k$, respectively; and $P(S_i)$ and $P(I_j^k)$ are the probability density functions of clusters $i$ and $j$ of images $S$ and $I_k$, respectively. The range of VOI is $[0, \infty)$; the smaller the VOI value, the closer the obtained segmentation and the ground truth are.
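The VOI computation can be sketched directly from the entropy and mutual-information definitions above:

```python
import numpy as np

def entropy(labels):
    """Cluster entropy: -sum_i (n_i/n) log(n_i/n)."""
    _, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(a, b):
    """Mutual information between two labelings of the same pixels."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    mi = 0.0
    for la in np.unique(a):
        for lb in np.unique(b):
            p_ab = np.mean((a == la) & (b == lb))   # joint probability
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(a == la) * np.mean(b == lb)))
    return mi

def variation_of_information(seg, gt):
    """VOI(S, I_k) = E(S) + E(I_k) - 2 G(S, I_k); zero exactly when the
    two labelings agree up to a renaming of the labels."""
    return entropy(seg) + entropy(gt) - 2.0 * mutual_information(seg, gt)

g = np.array([2, 2, 3, 3])
v_same = variation_of_information(np.array([0, 0, 1, 1]), g)
v_merged = variation_of_information(np.array([0, 0, 0, 0]), g)
print(round(v_same, 10), round(v_merged, 4))
```

Merging the two ground-truth regions into one costs exactly the entropy of the lost split (here log 2), which illustrates VOI's interpretation as information lost plus information gained.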

Global consistency error
The GCE computes the extent to which one segmented image can be viewed as a refinement of the other. A measure of error at each pixel $x_i$ can be written as [16,119]:

$$E(S, I_k, x_i) = \frac{\left| R(S, x_i) \setminus R(I_k, x_i) \right|}{\left| R(S, x_i) \right|}$$

where $|\cdot|$ is the cardinality, $\setminus$ is the set difference, and $R(S, x_i)$ is the set of pixels corresponding to the region in segmentation $S$ that contains the pixel $x_i$. The measure forces all local refinements to be in the same direction; it is defined as:

$$\mathrm{GCE}(S, I_k) = \frac{1}{n} \min \left\{ \sum_{i=1}^{n} E(S, I_k, x_i),\ \sum_{i=1}^{n} E(I_k, S, x_i) \right\}$$

The range of GCE is [0, 1]; the segmentation $S$ is better with respect to the ground truth $I_k$ the closer GCE is to zero.
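A brute-force sketch of the GCE follows; it is quadratic in the number of pixels, so it is only practical for small label maps:

```python
import numpy as np

def local_refinement_error(a, b):
    """E(a, b, x_i) = |R(a, x_i) \\ R(b, x_i)| / |R(a, x_i)| per pixel,
    where R(s, x) is the region of labeling s that contains pixel x."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    errs = np.empty(a.size)
    for i in range(a.size):
        ra = np.flatnonzero(a == a[i])          # region of a containing pixel i
        rb = set(np.flatnonzero(b == b[i]))     # region of b containing pixel i
        errs[i] = sum(p not in rb for p in ra) / ra.size
    return errs

def gce(seg, gt):
    """Global consistency error: force all local refinements into one
    direction and keep the smaller of the two per-direction totals."""
    n = np.asarray(seg).size
    return float(min(local_refinement_error(seg, gt).sum(),
                     local_refinement_error(gt, seg).sum()) / n)

gt = np.array([0, 0, 1, 1])
g_same = gce(np.array([0, 0, 1, 1]), gt)        # identical partitions
g_refined = gce(np.array([0, 1, 2, 3]), gt)     # pure refinement of gt
print(g_same, g_refined)  # → 0.0 0.0
```

The second call shows GCE's defining property: a segmentation that is a pure refinement of the ground truth (every region split into smaller pieces) still scores zero, because one direction of the local error vanishes.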

Boundary displacement error
The BDE evaluates the average displacement error of boundary pixels between two segmented images by computing the distance between each boundary pixel and the closest boundary pixel in the other segmentation. Given an arbitrary boundary pixel $x_i$ of $S$, the BDE uses the minimal Euclidean distance from $x_i$ to all boundary points of $I_k$:

$$d(x_i, I_k) = \min_{y_j \in B_{I_k}} \left\| x_i - y_j \right\|$$

where $B_{I_k}$ is the set of boundary pixels of $I_k$. A distance distribution signature $D_S^{I_k}$ is then obtained by accumulating these distances over all boundary points of $S$. The BDE is computed with [119]:

$$\mathrm{BDE}(S, I_k) = \frac{1}{2} \left( D_S^{I_k} + D_{I_k}^S \right), \qquad D_S^{I_k} = \frac{1}{|B_S|} \sum_{x_i \in B_S} d(x_i, I_k)$$

The range of BDE is $[0, \infty)$; the lower the value, the better the segmentation.
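The BDE can be sketched over explicit lists of boundary pixel coordinates; extracting those boundaries from the label maps is assumed to have been done beforehand:

```python
import numpy as np

def bde(boundary_a, boundary_b):
    """Boundary displacement error: for each boundary pixel of one
    segmentation take the Euclidean distance to the closest boundary
    pixel of the other, average per direction, and symmetrize."""
    a = np.asarray(boundary_a, dtype=float)
    b = np.asarray(boundary_b, dtype=float)
    # pairwise distances between the two boundary point sets
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    d_ab = d.min(axis=1).mean()      # each point of a -> nearest point of b
    d_ba = d.min(axis=0).mean()      # each point of b -> nearest point of a
    return 0.5 * (d_ab + d_ba)

col3 = np.array([[y, 3] for y in range(5)])   # vertical boundary at x = 3
col5 = np.array([[y, 5] for y in range(5)])   # same boundary shifted by 2
b_same = bde(col3, col3)
b_shift = bde(col3, col5)
print(b_same, b_shift)  # → 0.0 2.0
```

A boundary displaced uniformly by two pixels scores exactly 2, matching the metric's interpretation as an average displacement in pixels.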

F evaluation
This method measures the average squared color error of the segments, penalizing over-segmentation by a weight proportional to the square root of the number of segments. It requires no user-defined parameters and is independent of the contents and type of image. The F evaluation is obtained with the following equation [17,136]:

$$F(I) = \frac{1}{1000 \cdot N \cdot M} \sqrt{R} \sum_{i=1}^{R} \frac{e_i^2}{\sqrt{A_i}}$$

where $I$ is the segmented image, $N \times M$ the image size, $R$ the number of regions of the segmented image, $A_i$ the area of the $i$th region, and $e_i$ the color error of region $i$, obtained from the Euclidean distances between the RGB color vectors of the pixels of region $i$ and the color vector attributed to region $i$ in the segmented image. In other words:

$$e_i^2 = \sum_{p \in R_i} \left\| I_o(p) - I(p) \right\|^2$$

where $I_o$ is the original image and $R_i$ is the set of pixels in region $i$. The smaller the value of $F(I)$, the better the segmentation result should be.
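The F evaluation can be sketched as follows; here the color attributed to each region is taken as the region's mean color, a common choice when the segmentation algorithm does not supply one:

```python
import numpy as np

def f_evaluation(original, labels):
    """Liu-Yang F: (1/(1000 N M)) * sqrt(R) * sum_i e_i^2 / sqrt(A_i),
    where e_i^2 is the summed squared color error of region i against
    the color attributed to it (its mean color in this sketch)."""
    original = np.asarray(original, dtype=float)
    labels = np.asarray(labels)
    n_pixels = labels.size
    regions = np.unique(labels)
    total = 0.0
    for r in regions:
        mask = labels == r
        pix = original[mask]                 # (A_i, 3) color vectors
        e2 = np.sum((pix - pix.mean(axis=0)) ** 2)   # e_i^2
        total += e2 / np.sqrt(mask.sum())
    return np.sqrt(len(regions)) * total / (1000.0 * n_pixels)

# two perfectly homogeneous color regions -> zero color error
img = np.zeros((4, 4, 3))
img[:, 2:] = 255.0
seg = np.zeros((4, 4), dtype=int)
seg[:, 2:] = 1
f_two = f_evaluation(img, seg)                       # matches the color layout
f_one = f_evaluation(img, np.zeros((4, 4), dtype=int))  # regions merged
print(f_two, f_one > f_two)
```

Merging the two homogeneous regions into one raises the score, since the single region can no longer be represented by one color without error.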

F′ evaluation
The F′ evaluation was proposed to improve the F evaluation, because F was found to have a bias toward over-segmentation, i.e., toward producing many more regions than desired within a single real-world object. Since F favors segmentations with a large number of small regions, F′ extends F by penalizing segmentations that have many small regions of the same size. The F′ evaluation is computed with [17,136]:

$$F'(I) = \frac{1}{1000 \cdot N \cdot M} \sqrt{\sum_{A=1}^{\bar{n}} \left[ R(A) \right]^{1 + 1/A}} \; \sum_{i=1}^{R} \frac{e_i^2}{\sqrt{A_i}}$$

where $R(A)$ is the number of regions having exactly area $A$, and $\bar{n}$ is the area of the largest region in the segmented image.

Q evaluation
The Q evaluation improves upon F by decreasing the bias toward both over-segmentation and under-segmentation, the latter meaning having too few regions to represent all the real-world objects in the image. The Q evaluation is obtained with [17,136]:

$$Q(I) = \frac{1}{1000 \cdot N \cdot M} \sqrt{R} \sum_{i=1}^{R} \left[ \frac{e_i^2}{1 + \log A_i} + \left( \frac{R(A_i)}{A_i} \right)^2 \right]$$

where $R(A_i)$ is defined as in $F'$; that is, it denotes the number of regions having exactly area $A_i$.
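The Q evaluation can be sketched as follows, attributing to each region its mean color (an assumed choice, since the formula leaves the attributed color to the segmentation algorithm):

```python
import numpy as np

def q_evaluation(original, labels):
    """Borsotti Q: the color error of each region is damped by log(A_i),
    and a term (R(A_i)/A_i)^2 penalizes many small regions of the same
    size, where R(A) = number of regions with exactly area A."""
    original = np.asarray(original, dtype=float)
    labels = np.asarray(labels)
    n_pixels = labels.size
    regions, areas = np.unique(labels, return_counts=True)
    area_counts = {a: int(np.sum(areas == a)) for a in areas}   # R(A)
    total = 0.0
    for r, a_i in zip(regions, areas):
        pix = original[labels == r]
        e2 = np.sum((pix - pix.mean(axis=0)) ** 2)              # e_i^2
        total += e2 / (1.0 + np.log(a_i)) + (area_counts[a_i] / a_i) ** 2
    return np.sqrt(len(regions)) * total / (1000.0 * n_pixels)

img = np.zeros((4, 4, 3))
img[:, 2:] = 255.0
seg2 = np.zeros((4, 4), dtype=int)
seg2[:, 2:] = 1                          # two regions matching the colors
seg16 = np.arange(16).reshape(4, 4)      # heavily over-segmented
q_two = q_evaluation(img, seg2)
q_over = q_evaluation(img, seg16)
print(q_two < q_over)  # → True
```

The over-segmented labeling has zero color error in every singleton region, yet scores far worse than the two-region labeling because of the small-region penalty, which is exactly the bias Q was designed to correct.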

E evaluation
This evaluation function is based on information theory and measures the uniformity of the pixels; entropy is a measure of the disorder within a region. Given a segmented image, let $V_k$ be the set of all possible values for the luminance in region $k$, and let $L_k(i)$ denote the number of pixels in region $k$ that have luminance $i$ in the original image. The entropy of region $k$, $E(R_k)$, is defined as [136]:

$$E(R_k) = -\sum_{i \in V_k} \frac{L_k(i)}{A_k} \log \frac{L_k(i)}{A_k}$$

The expected region entropy of image $I$ is the expected entropy across all regions, where each region has weight proportional to its area:

$$E_r(I) = \sum_{k=1}^{R} \frac{A_k}{N \cdot M}\, E(R_k)$$

The layout entropy is given by:

$$E_l(I) = -\sum_{k=1}^{R} \frac{A_k}{N \cdot M} \log \frac{A_k}{N \cdot M}$$

Therefore, the final evaluation measure E aggregates both the layout entropy and the expected region entropy:

$$E = E_l(I) + E_r(I)$$
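The E evaluation can be sketched directly from the two entropy terms:

```python
import numpy as np

def e_evaluation(luminance, labels):
    """Entropy-based E: layout entropy (over region areas) plus the
    expected region entropy of the luminance values inside each region."""
    lum = np.asarray(luminance).ravel()
    lab = np.asarray(labels).ravel()
    n = lab.size
    h_layout, h_region = 0.0, 0.0
    for r in np.unique(lab):
        mask = lab == r
        a_k = mask.sum()
        w = a_k / n                              # area weight A_k / (N*M)
        h_layout -= w * np.log(w)
        # entropy of the luminance distribution inside region k
        _, counts = np.unique(lum[mask], return_counts=True)
        p = counts / a_k
        h_region += w * float(-np.sum(p * np.log(p)))
    return h_layout + h_region

lum = np.array([0, 0, 0, 9, 9, 9])
val = e_evaluation(lum, np.array([0, 0, 0, 1, 1, 1]))
print(round(val, 10))
```

With two perfectly uniform regions, the expected region entropy vanishes and E reduces to the layout entropy of the two equal-area regions (log 2); splitting a uniform region further would raise the layout term without lowering the region term, which is how E discourages needless over-segmentation.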

Zeboudj's contrast
This evaluation is based on the internal and external contrast of the regions, measured in the neighborhood of each pixel [136]. Let $W(p)$ be the neighborhood of the pixel $p$, let $b_k$ be the border of $R_k$ with length $Lb_k$, and let $c(p, q) = |I_g(p) - I_g(q)| / (L - 1)$ be the contrast between pixels $p$ and $q$, where $I_g$ is the gray level image and $L$ the number of gray levels. The contrast inside and outside the region $R_k$, $C_i$ and $C_o$, respectively, and the total contrast of the region, denoted as $Ct_k$, are computed with:

$$C_i(R_k) = \frac{1}{A_k} \sum_{p \in R_k} \max \left\{ c(p, q) \mid q \in W(p) \cap R_k \right\}$$

$$C_o(R_k) = \frac{1}{Lb_k} \sum_{p \in b_k} \max \left\{ c(p, q) \mid q \in W(p),\ q \notin R_k \right\}$$

$$Ct_k = \begin{cases} 1 - C_i / C_o, & \text{if } 0 < C_i < C_o \\ C_o, & \text{if } C_i = 0 \\ 0, & \text{otherwise} \end{cases}$$

The Zeboudj contrast $Z$ is obtained with:

$$Z = \frac{1}{N \cdot M} \sum_{k=1}^{R} A_k\, Ct_k$$

Applications
Most image segmentation methods have been developed mainly for gray scale images, where the shape and texture features of the objects within the images are extracted. Color image segmentation methods have been addressed more recently; the interest in this kind of segmentation comes from the scientific and commercial areas where the color features of the objects captured in the images provide important data. Hence, in this section we present a summary of the different areas applying methods for color image segmentation. We used the ScienceDirect and IEEE Xplore search engines to retrieve publications, published only in conferences and journals, containing the term image segmentation; the search produced more than 34,000 results. Fig. 11 shows the number of publications in conferences and journals per year, from 2010 to 2016.
For the purpose of identifying the main applications of image segmentation, we searched publications related to applications of image segmentation using the search engine Scopus; using other search engines, such as Web of Science, IEEE Xplore and the ACM digital library, produced overlapping results. We selected the publications that satisfy the following criteria. The papers analyzed are those published in journals, book chapters or conferences; we omit other types of publications, such as books. Recent papers, from 2010 up to 2017, were considered in the study, which ensures that the analysis is up-to-date. Publications were selected that contain at least one of the search terms in the title, abstract and/or list of key-words; the search terms are shown in the following list. This ensured that the publications are related to image segmentation.
We found 5859 publications that satisfied these criteria. Finally, to ensure the relevance of publications analyzed, we considered the papers cited at least 20 times. Such consideration reduced our set of papers to 235. Fig. 12 shows the distribution of papers by subject area.
These 235 documents were thoroughly scrutinized by the authors, to identify the specific application of image segmentation. Most of the papers analyzed propose methods to improve image segmentation, and then apply them to images of a specific area.
However, some papers only tested their proposed methods with benchmark data sets, such as Berkeley or common objects in context (COCO). Applications in the area of health and medicine are the most popular in the publications selected. The application of image segmentation for artificial vision, object detection or object identification is also important. Another field of application is agriculture, for plant identification, detection of plant diseases or monitoring of crop growth. Table 11 summarizes the main applications of image segmentation; the related papers correspond to the most cited ones.

For color representation, the L*a*b* and L*u*v* spaces are recommended. The other techniques presented in Section 3.7 are less common, but they have shown acceptable performance according to the quantitative evaluations they report. Since some of the proposed techniques are relatively new, there are several aspects open to improvement, leading to new research trends for image segmentation by color features. It is important to mention that some of these techniques are not so popular because they demand a strong mathematical and image processing background. We have presented metrics for the quantitative evaluation of segmented images; the evaluation methods are divided into subjective and objective. In the subjective evaluation methods, the segmented images are compared with respect to a set of images segmented by a human; they are subjective because the criteria for segmenting the same image may vary from one person to another.
However, although there are no absolute standard metrics defined, the metrics presented in this work, along with the Berkeley image segmentation database, are becoming the benchmark for evaluating color image segmentation algorithms, because they have been employed in different current works [51,66,81,83,115,116,119,170,175]. In the objective evaluation metrics, the segmented parts obtained within the resulting image are compared with the respective input image. These metrics have not been employed as much as the subjective ones; however, we have observed an increasing usage of such metrics [72,106,116,136].