Abstract

Feature analysis is an important task that can significantly affect the performance of automatic bacteria colony picking. Unstructured environments also affect automatic colony screening. This paper presents a novel approach for adaptive colony segmentation in unstructured environments by treating the detected peaks of intensity histograms as a morphological feature of images. To avoid disturbing peaks, an entropy based mean shift filter is introduced to smooth images as a preprocessing step. The relevance and importance of these features are determined by an improved support vector machine classifier using unascertained least square estimation. Experimental results show that the proposed unascertained least square support vector machine (ULSSVM) achieves better recognition accuracy than the other state-of-the-art techniques, and its training process takes less time than most of the traditional approaches presented in this paper.

1. Introduction

Bacteria colony isolation [1] has been a labor-intensive task for decades. Manual bacteria colony picking is tedious and experience dependent. Colony screening takes place in unstructured environments because of different agar media and cultivations. Figure 1 shows an example of an Erythrosin bacteria colony sitting on agar. An automatic colony picking system can make this process consistent and reliable while consuming less time. Researchers worldwide are currently seeking fast and reliable methods for high throughput colony picking. To achieve this, high quality colony illumination and image segmentation are the critical stages of a colony picking system. Currently, there are three major illumination techniques used for image acquisition: drop-in bright-field illumination, back-projective bright-field illumination, and suspended dark-field illumination. Figure 2 shows the imaging quality obtained with these three techniques. Suspended dark-field illumination based approaches can be used to reduce the influence of lighting. In this setting, colony agar plates are placed in a suspended dark-field environment. Using reflected and refracted light, we can obtain the volumetric structures of colonies with good image quality. However, suspended dark-field illumination can yield less satisfactory images than the other two approaches when colonies are crowded, because crowded colonies take on a similar color. Image segmentation approaches such as thresholding [2], region growing [3], watershed [4], and mean shift [5] are commonly used in medical image analysis. Each of these classical methods has its own strengths and weaknesses. For example, thresholding is fast but requires its parameters to be retuned for different environments. Region growing methods are more robust than thresholding methods but less efficient.
The colony picking systems available on the market impose a number of specific requirements in order to achieve good segmentation performance, for example, setting the region of interest, controlling the degree of clutter, and maintaining appropriate lighting conditions.

In this paper, we deploy an intensity histogram based morphological feature extraction algorithm, which contributes to colony analysis. The proposed method searches for peaks in a standard intensity histogram. To achieve correct colony feature classification, we propose an entropy based mean shift algorithm that smooths the image as a preprocessing stage. Finally, we introduce an improved approach for feature selection using an unascertained least square support vector machine (ULSSVM) classifier. To our knowledge, this is the first attempt to use the unascertained attributes of the detected features for the purpose of classification. We evaluate the proposed approach on a large dataset of colony images. Based on these experiments, we show that our approach can efficiently deliver correct classification results.

2. Proposed Approach

When smoothing an image, it is very important for a nonlinear filter to preserve the edge detail of the image. Optimization techniques are widely used in image processing, driven by the performance requirements of the target application. However, it is very difficult to obtain an optimal solution (stopping criterion) for individual applications. More details can be found in [6–9]. Mean shift has proven to be an effective tool for image processing because of its nonparametric property. Smoothing by the mean shift algorithm has been reported in the literature. For example, in [10], Zhao and Xi introduced mean shift as a smoothing filter for processing YIQ color images and compared it with the Wiener filter. In [11], Han and Sohn used mean shift combined with a sigma filter in an illumination and color compensation system. In [12], Sahba and Venetsanopoulos applied mean shift to preserve edge detail and detect breast masses. These results are promising, but the computational speed of mean shift is undesirably slow.

Entropy is a measure of complexity and can also be used to inspect system uncertainty. Low entropy images have very little contrast and large runs of pixels with the same value; consequently, they can be compressed to a relatively small size. A perfectly flat image has an entropy of zero. On the other hand, high entropy images have a great deal of contrast from one pixel to the next and consequently have more details than low entropy images. Entropy has been applied in pattern recognition, object tracking, and image segmentation, for example, in [13–15], where entropy has been used as a termination criterion. As mentioned in the first section, the proposed feature based colony classification approach begins with an entropy based mean shift filter, followed by intensity histogram analysis of the filtered images. The characteristic peaks’ coefficients retrieved from the intensity histograms are then applied to colony classification within the framework of the unascertained least square support vector machine.

2.1. Entropy Based Mean Shift Filter

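The idea of the standard mean shift approach [16] is as follows. Let $\{x_i\}_{i=1}^{n}$ be $n$ sample points in a $d$-dimensional space. The basic mean shift vector at a point $x$ is defined as

$$M_h(x) = \frac{1}{k} \sum_{x_i \in S_h} (x_i - x), \tag{1}$$

where $S_h$ is a window with center $x$ and radius $h$, $k$ is the number of samples that fall in $S_h$, and $(x_i - x)$ is the offset of sample $x_i$ relative to the center $x$.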

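Equation (1) is a monotonic form and is less effective in practical applications. With samples $x_i$, center $x$, and radius $h$ as in (1), a kernel based mean shift algorithm is described as follows:

$$M_h(x) = \frac{\sum_{i=1}^{n} G\left(\frac{x_i - x}{h}\right) w(x_i)\,(x_i - x)}{\sum_{i=1}^{n} G\left(\frac{x_i - x}{h}\right) w(x_i)}, \tag{2}$$

where $w(x_i)$ is the self-impact factor (weight) of sample $x_i$ and $G(\cdot)$ is a kernel function.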
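As an illustration of the kernel based form, the following is a minimal sketch in Python rather than the implementation used in this paper. It assumes one-dimensional samples, a Gaussian kernel, and unit self-impact factors by default; the function names `mean_shift_vector` and `seek_mode` and the sample values are hypothetical.

```python
import math


def mean_shift_vector(x, samples, h, weights=None):
    """Kernel-weighted mean shift vector at x (Gaussian kernel, radius h)."""
    if weights is None:
        weights = [1.0] * len(samples)  # assumed: equal self-impact factors
    num = 0.0
    den = 0.0
    for xi, wi in zip(samples, weights):
        g = math.exp(-0.5 * ((xi - x) / h) ** 2)  # kernel value for this sample
        num += g * wi * (xi - x)
        den += g * wi
    return num / den


def seek_mode(x, samples, h, tol=1e-6, max_iter=200):
    """Move x along the mean shift vector until the shift becomes negligible."""
    for _ in range(max_iter):
        shift = mean_shift_vector(x, samples, h)
        x += shift
        if abs(shift) < tol:
            break
    return x


# Two illustrative clusters of 1-D samples, around 1 and around 5.
samples = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9]
mode_low = seek_mode(0.5, samples, h=0.5)
mode_high = seek_mode(5.5, samples, h=0.5)
```

Starting points near different clusters converge to different local modes, which is the property that lets mean shift smooth within a region without averaging across regions.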

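In a color image with $N$ pixels, each pixel corresponds to a 5-dimensional vector $x = (x^s, x^r)$ made up of two spatial coordinates and three color components. Because the spatial and color information are independent, the kernel function is obtained in

$$K_{h_s, h_r}(x) = \frac{C}{h_s^{2} h_r^{3}}\, k\left(\left\| \frac{x^s}{h_s} \right\|^{2}\right) k\left(\left\| \frac{x^r}{h_r} \right\|^{2}\right), \tag{3}$$

where $x^s$ is the spatial position of a pixel; $x^r$ is the color information of a pixel; the first factor defines a spatial window with center $x^s$ and radius $h_s$; the second factor defines a color window with center $x^r$ and radius $h_r$; and $C$ is a normalization constant.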

Let $p_i$ be the probability of the gray value outcome $x_i$, $i = 1, \ldots, n$, in an image $X$; $-\log p_i$ is called the surprisal of the outcome $x_i$. Entropy is defined as

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i. \tag{4}$$

Entropy is determined by the pixel distribution in an image, which is influenced by two factors: the foreground and the background (or noise). The uncertainty of entropy is dominated by the variance of the noise. Entropy can be used to measure the homogeneity of an image area: the more homogeneous the image, the lower the entropy. In practice, because of noise, the entropy of an image cannot decrease to zero, but it can reach a stable value. Thus, entropy can be applied as a stopping criterion for the mean shift iteration (Algorithm 1).
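The entropy definition can be checked on small examples. The sketch below is illustrative only: it assumes a grayscale image stored as a list of rows and a base-2 logarithm (entropy in bits), and the name `image_entropy` is hypothetical.

```python
import math
from collections import Counter


def image_entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of an image."""
    pixels = [v for row in image for v in row]
    total = len(pixels)
    # p = c / total is the probability of one gray value; -log2(p) is its surprisal.
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(pixels).values())


flat = [[7] * 4 for _ in range(4)]           # perfectly flat image
half = [[0, 0, 255, 255] for _ in range(4)]  # two equally likely gray values

entropy_flat = image_entropy(flat)
entropy_half = image_entropy(half)
```

The flat image has zero entropy, and the image with two equally likely gray values has an entropy of one bit, which matches the observation that homogeneous areas have lower entropy.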

Algorithm 1: Entropy based mean shift filter.
Let $I_{in}$ be the input image and $I_{out}$ be the filtered image, where each pixel is a vector in $R^5$.
Let ent.0 be the initial entropy value, ent.1 be the entropy at the iteration following ent.0, and ent.2 be the entropy at the iteration following ent.1.
Let erras be the absolute value of the difference between the entropies of two successive iterations. Let edset be the
threshold used as the stopping criterion of the iterations. Our algorithm comprises the steps listed below:
 (1) Initialize the images, the entropy values, erras, and edset.
 (2) While erras > edset:
    (2.1) Filter the image using mean shift and store the result in $I_{out}$.
    (2.2) Calculate the entropy of $I_{out}$ and store the value in ent.2.
    (2.3) Calculate the absolute difference from the entropy obtained in the previous step: erras = |ent.2 − ent.1|.
    (2.4) Update the parameters: ent.1 = ent.2 and $I_{in}$ = $I_{out}$.
    (2.5) Continue the mean shift filtering until the entropy converges.
 (3) Store $I_{out}$, which is calculated with ($h_s$, $h_r$); here $h_s$ is the spatial information and $h_r$ is the color range information.
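The loop structure of Algorithm 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the filter used in this paper: it works on a grayscale image stored as a list of rows, and the hypothetical helper `range_window_smooth` stands in for one mean shift pass by averaging only those neighbours that lie inside both a spatial window of radius `hs` and a gray-level window of radius `hr`. The parameter values are illustrative.

```python
import math
from collections import Counter


def image_entropy(image):
    """Shannon entropy (bits) of the gray-level distribution."""
    pixels = [v for row in image for v in row]
    total = len(pixels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(pixels).values())


def range_window_smooth(image, hs, hr):
    """Simplified stand-in for one mean shift pass (flat kernel, fixed position)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            center = image[r][c]
            acc, n = 0, 0
            for rr in range(max(0, r - hs), min(rows, r + hs + 1)):
                for cc in range(max(0, c - hs), min(cols, c + hs + 1)):
                    if abs(image[rr][cc] - center) <= hr:  # inside the range window
                        acc += image[rr][cc]
                        n += 1
            out[r][c] = int(round(acc / n))  # n >= 1: the center always counts
    return out


def entropy_terminated_filter(image, hs=1, hr=40, edset=1e-3, max_iter=20):
    """Repeat the smoothing pass until the entropy change drops to edset or below."""
    ent_1 = image_entropy(image)
    current = image
    for iteration in range(1, max_iter + 1):
        current = range_window_smooth(current, hs, hr)  # step (2.1)
        ent_2 = image_entropy(current)                  # step (2.2)
        erras = abs(ent_2 - ent_1)                      # step (2.3)
        ent_1 = ent_2                                   # step (2.4)
        if erras <= edset:
            break
    return current, iteration


# Synthetic test image: a dark and a bright region with a small periodic disturbance.
noisy = [[(50 if c < 3 else 200) + 4 * ((r + c) % 3 - 1) for c in range(6)]
         for r in range(6)]
filtered, iterations = entropy_terminated_filter(noisy)
```

On this synthetic image the disturbance inside each region is smoothed away while the boundary between the dark and bright regions is kept, and the loop stops on its own once the entropy no longer changes.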

2.2. Model of Peak Searching in Intensity Histogram

The intensity histogram is an important feature of an image and can be regarded as an approximation of the density function of the image intensities. It shows the frequency with which each intensity appears in the image. An intensity histogram is described in

$$h(k) = \sum_{i=1}^{M} \sum_{j=1}^{N} \delta_k(i, j), \tag{5}$$

where $M$ and $N$ represent the total numbers of rows and columns, respectively, and $h(k)$ represents the number of times intensity $k$ appears; $\delta_k(i, j)$ is described as follows:

$$\delta_k(i, j) = \begin{cases} 1, & f(i, j) = k, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $f(i, j)$ is the intensity value of point $(i, j)$. As the sizes of different images may differ, to avoid the impact of image size, we normalize each histogram according to the following equation:

$$p(k) = \frac{h(k)}{M \times N}, \tag{7}$$

where $p(k)$ is the normalized value and $M \times N$ represents the total number of pixels in the image. Figure 1 shows an example of bacteria colony and its intensity histogram using (5), (6), and (7).
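A normalized intensity histogram of this kind can be computed directly by counting. The sketch below assumes an 8-bit grayscale image stored as a list of rows; the name `normalized_histogram` and the tiny example image are hypothetical.

```python
def normalized_histogram(image, levels=256):
    """Count how often each intensity occurs and divide by the pixel count."""
    rows, cols = len(image), len(image[0])
    h = [0] * levels
    for row in image:
        for value in row:
            h[value] += 1  # one more occurrence of this intensity
    total = rows * cols
    return [count / total for count in h]


# A 2 x 3 image with three pixels at intensity 10 and three at intensity 200.
image = [[10, 10, 200],
         [10, 200, 200]]
p = normalized_histogram(image)
```

Because the counts are divided by the number of pixels, the values sum to one regardless of image size, so histograms of differently sized images can be compared.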

In Figure 1, the entire image can be divided into two main zones: culture medium zone and colony zone, according to the contrast and density of the image. In Figure 3, there are two peaks, which represent high frequency values of the corresponding intensities. The difference between the gray levels of the two peaks is 133, and the difference between the two intensity frequencies of the two peaks is 0.39. The intensity histogram mathematical model will be introduced in the following sections.

The peaks and valleys shown in Figure 3 can be obtained using the second-order derivative. With $p(k)$ denoting the normalized frequency of intensity $k$, the method to find a peak or a valley is described in

$$s(k) = \begin{cases} 1, & p(k+1) - p(k) > \varepsilon, \\ -1, & p(k+1) - p(k) < -\varepsilon, \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$

where $\varepsilon$ refers to a positive threshold, set according to the specific image, that reduces the inaccuracy caused by infinitesimal disturbances. $s(k)$ represents the tendency of the histogram curve at the point where the intensity equals $k$: $s(k) = 1$ means that the curve ascends, $s(k) = -1$ means that the curve descends, and $s(k) = 0$ means that the curve is flat. Therefore, $s(k) - s(k-1) = -2$ indicates a peak of the curve, and $s(k) - s(k-1) = 2$ refers to a valley of the curve. In Figure 3, the peaks have been extracted according to the method described above.
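The tendency-based search can be sketched as below. This is an illustration under assumptions: the tendency is taken from the forward difference of neighbouring histogram values, flat stretches are skipped by remembering the last non-flat tendency (so a plateau is reported at its right edge), and the names `trend` and `find_peaks_and_valleys` as well as the example curve are hypothetical.

```python
def trend(p, eps):
    """Tendency between neighbouring bins: 1 ascending, -1 descending, 0 flat."""
    s = []
    for k in range(len(p) - 1):
        d = p[k + 1] - p[k]
        s.append(1 if d > eps else -1 if d < -eps else 0)
    return s


def find_peaks_and_valleys(p, eps=0.0):
    """A peak is a change from ascending to descending; a valley is the reverse."""
    peaks, valleys = [], []
    last = 0  # last non-flat tendency seen so far
    for k, sk in enumerate(trend(p, eps)):
        if sk == 0:
            continue  # flat stretch: keep waiting for the next real change
        if last == 1 and sk == -1:
            peaks.append(k)
        elif last == -1 and sk == 1:
            valleys.append(k)
        last = sk
    return peaks, valleys


# Illustrative histogram curve with two peaks separated by one valley.
curve = [0.0, 0.1, 0.3, 0.1, 0.05, 0.2, 0.25, 0.0]
peaks, valleys = find_peaks_and_valleys(curve)
```

Raising `eps` makes the search ignore small fluctuations, which is the role of the positive threshold in the text.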

The experiments show that most of the intensity histogram curves change from double-peak to multipeak because of different bacteria colonies under different illumination conditions. Multiple local peaks or valleys may be obtained when the intensity histogram curve is not smooth. The proposed peak searching algorithm is shown in Algorithm 2.

Algorithm 2: Peak searching in the intensity histogram.
(1) Obtain the intensity histogram from the image filtered by the mean shift algorithm, and locate the candidate peaks using the second-order differentiation described in (8).
(2) Find the highest peak of the entire curve.
(3) Adaptively select the threshold: the threshold used to obtain the global peaks is determined by both the maximum
  and the minimum peaks (threshold = (max (peak) − min (peak))/8).
  If two peaks are found, jump to Step 6; otherwise, continue.
(4) Repeat: increase the threshold from ((max (peak) − min (peak))/8) toward ((max (peak) − min (peak))/4),
(5) until two characteristic peaks are found.
(6) End.
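The adaptive threshold step can be sketched as follows. The source does not spell out how the threshold is applied, so this sketch makes explicit assumptions: a candidate peak is kept when its height above the lowest candidate peak reaches the threshold, the divisor is lowered in integer steps from 8 to 4, and if more than two peaks still remain the two highest are returned. The name `select_two_peaks` and the example values are hypothetical.

```python
def select_two_peaks(peak_positions, p):
    """Raise the threshold from span/8 toward span/4 until at most two peaks remain."""
    if not peak_positions:
        return [], 0.0
    heights = [p[k] for k in peak_positions]
    floor = min(heights)
    span = max(heights) - floor
    kept = list(peak_positions)
    threshold = span / 8
    for divisor in (8, 7, 6, 5, 4):  # assumed step schedule
        threshold = span / divisor
        kept = [k for k in peak_positions if p[k] - floor >= threshold]
        if len(kept) <= 2:
            break
    # Assumed fallback: keep the two highest peaks if the loop did not reach two.
    kept = sorted(sorted(kept, key=lambda k: p[k], reverse=True)[:2])
    return kept, threshold


# Illustrative histogram with four candidate peaks at bins 1, 3, 5, and 7.
hist = [0.0, 0.40, 0.0, 0.07, 0.0, 0.02, 0.0, 0.30, 0.0, 0.0]
candidates = [1, 3, 5, 7]
characteristic, used_threshold = select_two_peaks(candidates, hist)
```

In this example the initial threshold still admits the small peak at bin 3, and one increase of the threshold leaves the two dominant peaks.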

3. Unascertained Least Square Support Vector Machine

Support vector machines (SVM), proposed by Vapnik [17], have been well studied in the machine learning field. The performance of SVM has been verified in many applications, such as handwriting recognition [18], face recognition [19], and medical pattern matching [20]. However, the training speed of SVM is slow, which hinders its application. Unlike the classical support vector machine methods, the least squares support vector machines (LSSVM) proposed by Suykens and Vandewalle [21] change the original convex quadratic optimization problem into a linear problem, which effectively enhances the training speed. However, LSSVM has difficulty classifying uncertain information. Based on LSSVM and unascertained mathematical models, we propose the ULSSVM algorithm. For unascertained information, we use unascertained numbers [22] and unascertained programming [23] to describe our algorithm. These theories are summarized below.

Theorem 1. , , , if function satisfies where and form order unascertained numbers and are set as . is the main reliability. is the value range. is the main reliability distribution density function. is a possible value sequence of the unascertained numbers. is a confidence value sequence of the unascertained numbers.

Theorem 2. Setting the unascertained number , , , One calls inequalities , unascertained events.

Theorem 3. One calls the following programming an unascertained constraint programming: where is a decision vector and , are unascertained parameter vectors. is the target function. is a constraint function. ,   are confidence levels of the constraint and target function. is a credible degree of the unascertained events.

Based on the preliminary knowledge mentioned above, if the SVM training data contain unascertained information, we can transform the unascertained information into unascertained numbers.

The training set is defined in where , is an unascertained number, is an unascertained training point, and is an unascertained training set.

The objective function can be minimized as follows: We then define a Lagrange function as According to the KKT condition, Equation (19) can turn into the following matrix problem: where , , , , and .

We eliminate and and then get the following equations: where , where is a kernel function and satisfies the Mercer theorem.

A set of linear equations will be solved instead of a QP problem. Finally, we can obtain the following optimal classification function: where is the optimal solutions and corresponding bias . is an unascertained set.
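To illustrate why a set of linear equations replaces the QP problem, the following sketch trains the standard LSSVM classifier of Suykens and Vandewalle on ordinary (fully ascertained) data. It does not implement the unascertained extension proposed here. The RBF kernel, the regularization value `gamma`, the helper names, and the toy data are all assumptions made for illustration.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel, which satisfies the Mercer condition."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Build and solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(X)
    A = [[0.0] + [float(yi) for yi in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Sign of the kernel expansion plus bias."""
    score = sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1


# Toy two-class data (hypothetical 2-D feature vectors).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (2.0, 2.0), (2.2, 1.9), (1.9, 2.1)]
y = [-1, -1, -1, 1, 1, 1]
b, alpha = lssvm_train(X, y)
label_a = lssvm_predict((0.1, 0.1), X, y, b, alpha)
label_b = lssvm_predict((2.0, 2.1), X, y, b, alpha)
```

Training reduces to one call to a linear solver, with no iterative QP optimization, which is the source of the speed advantage noted above.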

4. Experimental Results

The proposed algorithm is evaluated on colony image databases captured using a Basler CCD sensor. The images are resized to 640 × 480. The computer used has a 3.2 GHz CPU and 4 GB of memory and runs Windows 7. The first three experiments were carried out on the colony images with MATLAB 7.2 to analyze the feasibility and efficiency of the proposed algorithm. The last experiment, which demonstrates the segmentation effect, was performed with Visual Studio 2010.

In image denoising, classical low-pass filters can suppress high frequency noise [24]. However, it is hard for them to preserve the edges of images because edges and noise are mixed in some frequency bands. Here, we use energy density spectra to illustrate the outcomes of different filters [25, 26]. Figure 4(a) shows the energy spectrum of the original image, where yellow-orange indicates the major energy of the symbol “+” and this area is contaminated by the background noise (i.e., the blue and green areas). Figure 4(b) shows the outcome of a low-pass filter, where only the central area of the symbol is kept but the edges of the symbol “+” are mixed with the background. In Figure 4(c), based on mean shift, the central area of the symbol clearly stands out and the edges are also kept. Traditional low-pass filters perform well on image smoothing but also degrade the edge details.

In Figure 5 we observe that, after 5 iterations, the entropy reaches a stable value, at which point the mean shift iteration automatically stops. The entropy of the Listeria colony (second row in Figure 6) differs from that of the other two, because the colonies are clustered and the agar color is close to the colony color.

Figure 6 shows different kinds of colony. The first column shows three different cultures: Microsporum audouinii, Listeria monocytogenes, and Cephalosporin. During the development of cultures, their biochemical reactions appear to be significantly different. As a result, the histograms of cultures under drop-in bright-field illumination may contain a number of noisy peaks, as illustrated in the second column. Using mean shift based filtering algorithms, we can remove the irregular backgrounds and hence reduce the noisy peaks. The result of using mean shift is shown in the third column. Furthermore, we apply an adaptive thresholding based peak searching approach to detect the two peaks that indicate the features of the cultures. The result is shown in the fourth column.

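The performance of mean shift filtering can be measured with the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ \hat{f}(i, j) - f(i, j) \right]^{2},$$

where $\hat{f}$ denotes the filtered output image, $f$ denotes the original input image, and $M$ and $N$ are the numbers of rows and columns. Table 1 shows the performance of the three kinds of filtering approach.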
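As a worked example of the MSE measure, the sketch below compares a filtered image with its original pixel by pixel. The images are stored as lists of rows; the function name `mean_square_error` and the pixel values are hypothetical.

```python
def mean_square_error(filtered, original):
    """Average squared difference between two images of the same size."""
    rows, cols = len(original), len(original[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += (filtered[r][c] - original[r][c]) ** 2
    return total / (rows * cols)


original = [[10, 20],
            [30, 40]]
filtered = [[12, 20],
            [30, 36]]
# Squared differences are 4, 0, 0, and 16, so the mean is 20 / 4.
mse = mean_square_error(filtered, original)
```

A lower value means the filtered image stays closer to the input.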

After extracting the characteristic peaks, we apply the ULSSVM classifier to the data for classification. We now evaluate the performance of our ULSSVM classifier against the classical SVM [27], LSSVM [28], and fuzzy SVM [29] using 400 colony images, where 150 samples belong to class 1, 50 samples belong to class 2, 100 belong to class 3, and the remainder belong to class 4. We randomly select 300 samples as the training set and the remaining 100 samples are considered as the testing set. There are 130 samples labeled as the unascertained numbers, and half of them are set as training samples. Table 2 shows the values corresponding to the unascertained information. Table 3 shows the classification results. In Table 3, the experimental results demonstrate that the ULSSVM effectively improves the performance of classification.

We then screen colonies using the ULSSVM classifier. Figure 7 shows the interface of colony screening, and Figure 8 shows adaptive colony segmentation. The segmentation outcomes of two different colony picking methods are as follows: the first row shows a colony on a homogeneous medium and its segmentation results using region growing, and the next two rows show colonies on inhomogeneous media and the corresponding segmentation results using thresholding. The first column shows the original images, the second column shows the images processed by our approach, the third column shows the identification results, and the fourth column shows a local zoom of the screening. We also measured the time consumption of colony screening: the thresholding method took 2.57 s and the region growing method took 8.61 s.

5. Conclusions

In this paper, we have deployed an approach to perform adaptive colony segmentation in unstructured environments using feature extraction and selection in an intelligent classifier. We used the intensity histogram peaks as features. To properly determine the importance of the extracted features for colony classification, we used an unascertained theory based LSSVM classification algorithm. Experimental results show that this new approach performs better than other state-of-the-art techniques in terms of accuracy and speed. This approach works well for adaptive colony segmentation, whilst optimizing the time consumption of colony picking.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key Scientific Instrument and Equipment Development Projects (2012YQ15008703 and 2012YQ15008702).