Weighted Nuclear Norm Minimization Based Tongue Specular Reflection Removal
In computational tongue diagnosis, specular reflection is generally inevitable in tongue image acquisition; it has an adverse impact on feature extraction and tends to degrade diagnosis performance. In this paper, we propose a two-stage approach (i.e., a detection and inpainting pipeline) to address this issue: (i) by considering both highlight reflection and subreflection areas, a superpixel-based segmentation method is adopted for the detection of the specular reflection areas; (ii) by extending the weighted nuclear norm minimization (WNNM) model, a nonlocal inpainting method is proposed for specular reflection removal. Experimental results on synthetic and real images show that the proposed method is accurate in detecting the specular reflection areas and is effective in restoring tongue images with more natural texture information of the tongue body.
1. Introduction
In traditional Chinese medicine (TCM), practitioners observe the color, shape, texture, and coating characteristics of the tongue to evaluate the health condition of a person. Because of its convenience and effectiveness, traditional tongue diagnosis has been very popular for thousands of years in the countries of East Asia, especially in China, Korea, and Japan [1–3].
Traditional tongue diagnosis, however, is a skill that requires years of training to master, and the diagnosis result is highly dependent on the practitioners’ personal experience. That is, for a specific patient, the diagnosis results given by several practitioners may be distinctly different. In summary, the limitations of traditional tongue diagnosis greatly restrict its applications in modern medicine.
Recently, with the progress in image processing and pattern recognition, computational tongue diagnosis has received considerable interest as a way to make tongue diagnosis more objective and automated. Many tongue diagnosis systems have been developed, such as a hyperspectral-based tongue diagnosis system, a 3D tongue acquisition system, and so forth [6–10]. Different features of the tongue, for example, shape, color [8, 12, 13], texture, coating [14, 15], and sublingual vein, have been suggested for the representation of tongue images. Many methods, such as Bayesian networks, support vector machines, decision trees, and naive Bayes, have been proposed for tongue image analysis.
However, because of the saliva on the tongue body, specular reflection is generally inevitable for the existing tongue image acquisition systems. Figure 1 shows one typical semiclosed system for tongue image acquisition and several tongue images with specular reflections. Specular reflection in a tongue image adversely affects many tongue analysis results, for example, tongue fur, tongue texture, and tongue color detection. To alleviate the effect of specular reflection, one may improve the robustness of the existing feature extraction and classification methods, but the natural strategy is to detect and repair the specular reflection areas (i.e., the detection and inpainting pipeline).
So far, several methods have been proposed for the detection and repair of the specular reflection areas in tongue images. These methods, however, are limited in both detection and inpainting. In the detection stage, as shown in Figure 2, a typical specular reflection region in a tongue image includes both highlight reflections and subreflections, while the existing detection methods usually consider only the highlight reflections [16, 20, 21] or adopt only simple strategies (e.g., morphological operators) to cope with the subreflections. In the repairing stage, bilinear interpolation and total-variation- (TV-) based methods have been applied for inpainting the tongue specular reflection regions. These methods, however, consider only the local smoothness of the image and are not effective when the reflection areas are large.
In this paper, we adopted the detection and inpainting pipeline and proposed a novel method for the removal of specular reflection areas in tongue image. In the detection stage, adaptive thresholding and superpixel segmentation methods were applied for the highlight reflections and subreflections detection. Referring to the location of the initial highlight reflections obtained via thresholding, subreflections were defined as the surrounding superpixels with lower illumination than that of normal pixels of tongue body. The highlight reflections together with the subreflections were regarded as the final detection result of specular reflection areas.
In the inpainting stage, based on the homogeneous property of tongue images, we proposed a nonlocal inpainting approach, that is, weighted nuclear norm minimization- (WNNM-) based tongue specular reflection removal. For a single small patch of a tongue image, there are many nonlocal similar patches across the whole image. If the patch and its nonlocal similar ones are stacked into a matrix, it is reasonable to assume that the matrix is of low rank. Thus, by solving a low rank matrix completion problem within the WNNM framework, we can fill the specular reflection areas with the texture information of the tongue.
This paper focuses on specular reflection removal in tongue images, and its contributions are twofold: (i) detection of highlight reflection and subreflection areas, and (ii) WNNM-based inpainting. First, we analyzed the adverse impact of subreflections, which to the best of our knowledge have received little attention in tongue specular reflection removal. To address this, we proposed a superpixel-based specular reflection detection method which can effectively detect the highlight reflection together with the subreflection areas, while barely affecting the normal pixels around the subreflection areas. Second, we proposed a WNNM-based nonlocal inpainting method for the removal of specular reflections. The proposed method converts the inpainting problem into an unconstrained optimization problem which can be solved iteratively by using standard WNNM. Compared with other inpainting approaches, our method obtains more natural texture information of the tongue, especially for large reflection areas, which improves both the PSNR value and the visual quality of the restored tongue image.
The remainder of the paper is organized as follows: Section 2 describes the superpixel based specular reflections detection method. Section 3 presents the WNNM-based nonlocal inpainting method for specular reflection removal of tongue image. The experimental results are provided in Section 4, and finally Section 5 concludes this paper.
2. Tongue Image Specular Reflection Detection
In this section, we first describe the characteristics of specular reflections in tongue images and then present our superpixel-based specular reflection detection method.
Figure 2(a) shows a typical example of a tongue image with specular reflection areas, while its enlarged subimage is shown in Figure 2(b). As shown in Figure 2(b), the pixels of highlight reflection areas always have higher illumination and lower saturation values than other pixels, while the subreflections consist of many abnormal dark pixels with lower illumination around the highlight reflection areas. Jia et al. proposed to use morphological dilation to detect the subreflection pixels around the highlight reflection area. However, for a large highlight reflection area, because the surrounding abnormal pixels of the subreflection area are not uniform, morphological dilation often ends with an inaccurate detection result that includes many normal pixels of the tongue body, as shown in Figures 5(c) and 5(d).
To solve this problem, we proposed a two-stage method for the detection of specular reflection areas. First, utilizing hue-saturation-illumination (HSI) color model, an initial detection result was obtained for highlight reflection area. Then, the subreflection areas were further detected via superpixel segmentation.
2.1. Detection of Highlight Reflection Area
According to our observation, the highlight reflection areas of a tongue image often have higher illumination values and lower saturation values. Thus, it is natural to transform the tongue image into the HSI color space, in which the hue and saturation describe color. The tongue image was converted into the HSI color space by the conversion equations (1), which are computed from the R, G, and B channels of the RGB color space.
Figure 3(a) shows a typical tongue image with specular reflection, while Figures 3(b) and 3(c) show two of its channels in the HSI color space. For each of the illumination and saturation channels, a threshold was used to check whether a pixel belongs to the highlight reflection area. The two thresholds were adaptively obtained by (2) from the maximum, minimum, and mean of all illumination values in the tongue image and from the maximum, minimum, and mean of all saturation values in the tongue image. Finally, the highlight reflection area image was obtained by applying the criterion (3) to the illumination and saturation values of each pixel. Figure 3(d) shows an example of the detection result of highlight reflection areas in a typical tongue image.
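The thresholding idea of this subsection can be illustrated with a minimal sketch. It assumes the standard HSI definitions of intensity, I = (R + G + B)/3, and saturation, S = 1 − 3 min(R, G, B)/(R + G + B). The adaptive threshold rule and the names `detect_highlights` and `weight` are hypothetical stand-ins chosen for illustration; they are not the paper's equations (2) and (3).

```python
import numpy as np

def rgb_to_intensity_saturation(rgb):
    """Return the I and S channels of the standard HSI model for a float RGB image in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    total = rgb.sum(axis=-1)
    intensity = total / 3.0
    # S = 1 - 3*min(R,G,B)/(R+G+B); S is defined as 0 for black pixels to avoid 0/0.
    safe_total = np.where(total > 0, total, 1.0)
    saturation = np.where(total > 0, 1.0 - 3.0 * rgb.min(axis=-1) / safe_total, 0.0)
    return intensity, saturation

def detect_highlights(rgb, weight=0.5):
    """Mark pixels that are both bright and desaturated (assumed rule, see lead-in)."""
    intensity, saturation = rgb_to_intensity_saturation(rgb)
    # Hypothetical adaptive thresholds built from the max/min/mean statistics:
    # a point between the mean and the extreme value of each channel.
    t_i = intensity.mean() + weight * (intensity.max() - intensity.mean())
    t_s = saturation.mean() - weight * (saturation.mean() - saturation.min())
    return (intensity > t_i) & (saturation < t_s)
```

For example, calling `detect_highlights` on a reddish image that contains a single near-white pixel flags only that pixel, because it is the only one that is simultaneously bright and unsaturated.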
2.2. Superpixel Based Detection of Subreflection Area
For a tongue image, by applying oversegmentation, a large number of small regions, that is, superpixels, were obtained. Subreflections were defined as the superpixels that surround the highlight reflection and have lower illumination values than the normal pixels of the tongue body. Utilizing the isotropic characteristics of superpixels [22, 23], we could naturally find the subreflections and avoid including many normal pixels of the tongue body.
Superpixels can be obtained by oversegmenting a tongue image with any reasonable existing segmentation algorithm. In this paper, a graph-based segmentation method was adopted. For each tongue image, an undirected graph was defined with the pixels/regions as nodes, connected by edges to their neighbors. A nonnegative weight was assigned to each edge to measure the dissimilarity of its two nodes. In the beginning, each pixel was a node. The graph-based segmentation algorithm gradually merged similar regions/nodes into the same superpixel. The merging process was driven by the internal variation of a region $C$, defined as
$$\mathrm{Int}(C) = \max_{e \in \mathrm{MST}(C)} w(e) + \frac{k}{|C|}, \tag{4}$$
where $|C|$ is the size of region $C$, $\mathrm{MST}(C)$ is the minimum spanning tree of $C$, $w(e)$ is the weight of edge $e$, and $k$ is a nonnegative parameter related to the size or the number of superpixels. Any two different regions $C_1$ and $C_2$ are kept separate when the edge between them has a weight higher than $\mathrm{Int}(C_1)$ and $\mathrm{Int}(C_2)$ (if there is no edge between the two regions, the edge weight is regarded as $+\infty$). Otherwise, the two regions were merged into one new region, and the internal variation of this compound region was updated. Finally, for a tongue image, we could obtain a series of superpixels. For the details of this algorithm, please refer to the cited graph-based segmentation work.
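The merging rule can be sketched as follows. This is a minimal illustration that assumes the adopted method is of the Felzenszwalb-Huttenlocher type, which the description above resembles: edges are visited in order of increasing weight, and two regions are merged when the connecting edge is no heavier than the smaller of the two internal variations. The function name `graph_segment`, the 4-connected grid, and the absolute gray-level difference used as edge weight are assumptions made for the example.

```python
import numpy as np

def graph_segment(gray, k):
    """Oversegment a 2-D float image; returns an integer label map."""
    h, w = gray.shape
    idx = np.arange(h * w).reshape(h, w)
    # Build 4-connected edges as (weight, node_a, node_b) tuples.
    edges = []
    for a, b, ga, gb in (
        (idx[:, :-1], idx[:, 1:], gray[:, :-1], gray[:, 1:]),   # horizontal neighbors
        (idx[:-1, :], idx[1:, :], gray[:-1, :], gray[1:, :]),   # vertical neighbors
    ):
        edges += zip(np.abs(ga - gb).ravel().tolist(), a.ravel().tolist(), b.ravel().tolist())
    edges.sort()

    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # largest MST edge weight inside each region

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge when the edge is not heavier than the smaller internal variation
        # (largest internal edge plus the size-dependent term k/|C|).
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = weight  # edges are sorted, so this is the new maximum
    return np.array([find(i) for i in range(h * w)]).reshape(h, w)
```

With a small `k`, two flat areas of different brightness stay separate; with a very large `k`, the size-dependent term dominates and everything collapses into one region, which matches the role of the parameter as a granularity control.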
After oversegmentation, we obtained many superpixels of the tongue image, which could be roughly grouped into three categories: highlight reflection superpixels, subreflection superpixels, and normal tongue body superpixels. As the subreflections commonly lie around the highlight reflection, we used the following strategy to locate them. First, as shown in Figure 4, with the initial highlight reflection detected in Section 2.1, we obtained an interest area (a rectangle box with green border) and a circle band (with red dashed border) around each highlight reflection using the morphological dilation operation. Then, for each superpixel in the circle band, we determined its category with Formula (5), which involves the pixel values of the single superpixel in the band, the mean value of the normal pixels of the tongue body in the interest area, and two nonnegative parameters which were empirically set to 1 and 1.15 in this paper. The width of the circle band was adaptively adjusted according to the size of the highlight reflection: for a larger highlight reflection, the corresponding band was wider. By classifying all the candidate superpixels in the circle band using Formula (5), we could detect all the subreflections and update the highlight reflections at the same time.
Parameter $k$ in (4) indirectly affects the granularity of the final segmentation. A larger $k$ usually leads to larger regions but also to a higher likelihood of missing segmentation boundaries, while a smaller $k$ often leads to a consistent oversegmentation. Figure 5 shows the detection results obtained by setting $k$ to 0.01, 0.05, 0.10, and 0.15, respectively. The mask map in Figure 5(b) covers the highlight reflections and subreflections, while excluding the normal pixels around the subreflections. In Figure 5(a), the subreflection areas are not detected, while in Figures 5(c) and 5(d) many normal pixels are misclassified as subreflection area. Our experiments show that, for most tongue images, satisfactory subreflection detection results are obtained with $k = 0.05$.
Finally, the highlight reflections and subreflections were combined as the final detection result of the specular reflection areas for inpainting. Figure 6(b) shows the final detection result of the specular reflection area of Figure 6(a). For comparison, Figures 6(c) and 6(d) show the results of the morphological dilation based method using a disk-shaped structuring element of 1 × 1 pixel and 5 × 5 pixels, respectively. Comparing the enlarged subimage in Figure 6(b) with those in Figures 6(c) and 6(d), we can see that the mask map in Figure 6(c) does not cover all the subreflections, especially in the part inside the red circle, while the mask map in Figure 6(d) covers too many normal pixels. Generally, compared with morphological dilation, the proposed superpixel-based method detects the highlight reflection and subreflection areas more accurately.
3. The WNNM-Based Nonlocal Inpainting Method
In this section, we proposed an exemplar-based inpainting method, that is, WNNM-based inpainting, for tongue specular reflection removal. Exemplar-based methods can be traced back to 1999 and have been widely adopted for image inpainting [25–31]. The emergence of nonlocal self-similarity methods further triggered the development of exemplar-based inpainting approaches. Most existing methods, however, fill the holes based on the best matched patch or the mean of the nonlocal similar patches, while the second-order information is usually neglected in inpainting. To remedy this, a low rank based inpainting model has been proposed to synthesize the missing regions, where the tensor trace norm is adopted as the low rank regularizer. Besides, the WNNM model is generally superior to other low rank models, for example, the trace norm or nuclear norm, for image denoising. This motivates us to employ WNNM for tongue specular reflection removal, resulting in the proposed WNNM-based inpainting model.
Given a small patch on the tongue image, there are many nonlocal similar patches across the whole image. If we stretch each patch to a vector and stack all the vectors in a matrix, it is intuitive that the matrix should be of low rank. Based on this assumption, the tongue image inpainting work could be regarded as a low rank matrix completion problem. By utilizing a newly proposed weighted nuclear norm model, we further showed that tongue image inpainting could be well solved by iteratively performing weighted nuclear norm minimization (WNNM).
In the following subsections, a brief review of matrix completion is given first. Then, WNNM is described. Finally, the WNNM-based nonlocal inpainting method is proposed for specular reflection removal in tongue images.
3.1. Matrix Completion
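Matrix completion aims to recover a low rank matrix from incomplete samples of its entries, and it has received considerable attention in many areas of engineering and science [35, 36], for example, the well-known Netflix problem. Matrix completion can be cast as the following minimization problem:
$$\min_{X} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{P}_{\Omega}(X) = \mathcal{P}_{\Omega}(Y), \tag{6}$$
where $X$ is recovered from $Y$, $\Omega$ denotes the set of positions of known entries in matrix $Y$, and $\mathcal{P}_{\Omega}$ is a linear operator with the following definition: for any matrix $A$,
$$[\mathcal{P}_{\Omega}(A)]_{ij} = \begin{cases} A_{ij}, & (i,j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$
The model in (6) is nonconvex and NP-hard, and is therefore nontrivial to solve. A convex relaxation of (6) is usually adopted instead:
$$\min_{X} \|X\|_{*} \quad \text{s.t.} \quad \mathcal{P}_{\Omega}(X) = \mathcal{P}_{\Omega}(Y), \tag{8}$$
where $\|X\|_{*} = \sum_{i} \sigma_i(X)$ is the nuclear norm of $X$ and $\sigma_i(X)$ is the $i$th singular value of $X$. The problem in (8) can be approximated by the following unconstrained convex minimization problem:
$$\min_{X} \|X\|_{*} + \frac{\lambda}{2}\left\|\mathcal{P}_{\Omega}(X) - \mathcal{P}_{\Omega}(Y)\right\|_F^2, \tag{9}$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $\lambda$ is a tradeoff parameter. When $\lambda$ goes to infinity, the solution of (9) tends to that of (8). As an unconstrained optimization problem, (9) can be solved by the iterative shrinkage algorithm and the APG method.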
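As a small illustration of the quantities in this subsection, the sketch below implements the sampling operator as a boolean mask and evaluates the relaxed objective with the nuclear norm computed from the singular values. The function names and the factor 1/2 on the data term are assumptions made for the example.

```python
import numpy as np

def project(a, known):
    """Sampling operator: keep entries at known positions, zero the rest."""
    return np.where(known, a, 0.0)

def nuclear_norm(x):
    """Sum of the singular values of x."""
    return float(np.linalg.svd(x, compute_uv=False).sum())

def relaxed_objective(x, y, known, lam):
    """Nuclear norm plus a quadratic penalty on the known entries only."""
    residual = project(x, known) - project(y, known)
    return nuclear_norm(x) + 0.5 * lam * float(np.sum(residual ** 2))
```

Entries outside the known set never contribute to the penalty, so a candidate matrix is free to take any value there; only the nuclear norm term discourages arbitrary fill-in.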
3.2. Weighted Nuclear Norm Minimization
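The iterative shrinkage algorithm for the model in (9) usually involves solving the nuclear norm minimization (NNM) problem
$$\min_{X} \tau\|X\|_{*} + \frac{1}{2}\|X - Z\|_F^2, \tag{10}$$
where $f(X) = \frac{\lambda}{2}\|\mathcal{P}_{\Omega}(X) - \mathcal{P}_{\Omega}(Y)\|_F^2$, $Z = X_k - \frac{1}{L}\nabla f(X_k)$, $L$ is the Lipschitz constant of $\nabla f$, and $\tau = 1/L$.
Cai et al. proved that the NNM problem (10) can be easily solved by the singular value thresholding (SVT) method; that is,
$$\hat{X} = U S_{\tau}(\Sigma) V^{T}, \tag{11}$$
where $\hat{X}$ is the solution to (10), $Z = U \Sigma V^{T}$ is the singular value decomposition (SVD) of $Z$, $\Sigma$ is the diagonal matrix of singular values, and $S_{\tau}$ is the soft-thresholding function on $\Sigma$:
$$S_{\tau}(\Sigma)_{ii} = \max(\Sigma_{ii} - \tau, 0). \tag{12}$$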
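Singular value thresholding takes only a few lines with NumPy. The sketch below assumes the input matrix and a scalar threshold are given; the function name `svt` is chosen for illustration.

```python
import numpy as np

def svt(z, tau):
    """Soft-threshold every singular value of z by the same amount tau."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunk singular values, then recombine.
    return (u * shrunk) @ vt
```

Singular values below the threshold are set to zero, so the result typically has lower rank than the input.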
Recently, Gu et al. suggested that, because the soft-thresholding operator shrinks each singular value by the same threshold $\tau$ in order to preserve convexity, it ignores the prior knowledge about the singular values. Compared with the small singular values, the larger ones are generally associated with the major projection orientations of the matrix in the low-dimensional subspace, and they should be shrunk less to preserve the major data components. Thus, they extended the standard nuclear norm to the weighted nuclear norm:
$$\|X\|_{w,*} = \sum_{i} w_i \sigma_i(X), \tag{13}$$
where $w = [w_1, \ldots, w_n]$ and $w_i$ is a nonnegative weight assigned to the singular value $\sigma_i(X)$. The weighted nuclear norm minimization (WNNM) problem for a given matrix $Z$ can be formulated as
$$\min_{X} \frac{1}{2}\|X - Z\|_F^2 + \|X\|_{w,*}. \tag{14}$$
Generally, the model in (14) is not convex, but Xie et al. further proved that if the weights assigned to the singular values are arranged in ascending order, the globally optimal solution can be obtained by the following theorem.
Theorem 1. If the weights satisfy $0 \le w_1 \le w_2 \le \cdots \le w_n$, then the WNNM problem (14) has a globally optimal solution:
$$\hat{X} = U S_{w}(\Sigma) V^{T}, \tag{15}$$
where $Z = U \Sigma V^{T}$ denotes the SVD of matrix $Z$, $\Sigma$ is the diagonal matrix with singular values arranged in descending order, that is, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$, and $S_{w}$ is the soft-thresholding operator:
$$S_{w}(\Sigma)_{ii} = \max(\Sigma_{ii} - w_i, 0). \tag{16}$$
WNNM has been applied to image denoising and has achieved better PSNR and SSIM values than state-of-the-art methods such as LSSC and BM3D. Moreover, WNNM is effective in preserving the local structures of images and generates fewer visual artifacts.
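The closed-form solution of Theorem 1 differs from plain SVT only in that each singular value gets its own threshold. A minimal sketch, assuming the weights are supplied in ascending order so that the largest singular value is shrunk the least (the function name `weighted_svt` is illustrative):

```python
import numpy as np

def weighted_svt(z, weights):
    """Shrink the i-th largest singular value of z by weights[i]."""
    weights = np.asarray(weights, dtype=np.float64)
    if np.any(weights < 0) or np.any(np.diff(weights) < 0):
        # The global-optimality guarantee is stated for nonnegative ascending weights.
        raise ValueError("weights must be nonnegative and in ascending order")
    u, s, vt = np.linalg.svd(z, full_matrices=False)  # s is in descending order
    return (u * np.maximum(s - weights, 0.0)) @ vt
```

For instance, with singular values 5 and 2 and weights 1 and 3, the dominant component is kept at 4 while the weaker one is removed entirely.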
3.3. Tongue Image Inpainting by Using WNNM
In this section, we proposed a WNNM-based nonlocal inpainting method for repairing tongue image specular reflection areas. First, the WNNM-based inpainting model was introduced to utilize the nonlocal information of tongue image. Then, we described the proposed optimization algorithm by iteratively solving a series of standard WNNM problems. Finally, we analyzed the convergence of our algorithm.
Given a tongue image $Y$ with specular reflection, the proposed WNNM-based tongue inpainting model is formulated as
$$\min_{X} \|X\|_{w,*} + \frac{\lambda}{2}\left\|\mathcal{P}_{\Omega}(X) - \mathcal{P}_{\Omega}(Y)\right\|_F^2, \tag{17}$$
where $X$ is the inpainted image, $\Omega$ denotes the nonreflection area, and $\mathcal{P}_{\Omega}$ is the linear projection operator defined in (7). The model in (17) is obtained by substituting the weighted nuclear norm for the standard nuclear norm in (9). Because of the projection operator $\mathcal{P}_{\Omega}$, the model in (17) cannot be directly solved by the WNNM algorithm. Thus, we adopt the following iterative shrinkage algorithm. In each iteration, we consider the following surrogate function of $f$ at a given point $X_k$:
$$Q(X, X_k) = f(X_k) + \langle \nabla f(X_k), X - X_k \rangle + \frac{L}{2}\|X - X_k\|_F^2, \tag{18}$$
where $f(X) = \frac{\lambda}{2}\|\mathcal{P}_{\Omega}(X) - \mathcal{P}_{\Omega}(Y)\|_F^2$ and $L$ is the Lipschitz constant of $\nabla f$. $Q$ satisfies (i) $Q(X, X_k) \ge f(X)$ for any $X$ and (ii) $Q(X_k, X_k) = f(X_k)$. By ignoring the constant term, the minimization of (18) together with the weighted nuclear norm term can be formulated as
$$\min_{X} \frac{L}{2}\|X - Z_k\|_F^2 + \|X\|_{w,*}, \tag{19}$$
where $Z_k = X_k - \frac{1}{L}\nabla f(X_k)$. After dividing by $L$, Equation (19) has the same form as (14) with weights $w_i / L$, so it can be solved by using WNNM. Finally, we obtain the minimizer of (17) by iteratively solving (19) with proximal gradient steps, as described in Algorithm 1.
There are many repeated patterns in the tongue image which are useful for tongue image inpainting. To take advantage of this nonlocal information, we split the tongue image into many patches of 5 × 5 pixels. For a single patch $y_j$, many nonlocal similar patches can be found across the whole image. Stretching each similar patch to a vector and stacking them into a matrix $Y_j$, it is intuitive that the underlying matrix $X_j$ should be of low rank. By using (19), with $Y_j$ playing the role of $Z_k$, we obtain the estimate of $X_j$ as
$$\hat{X}_j = \arg\min_{X_j} \frac{L}{2}\|X_j - Y_j\|_F^2 + \|X_j\|_{w,*}, \tag{20}$$
which can be solved by the generalized soft-thresholding method in Theorem 1. Specifically, the weight assigned to the $i$th singular value $\sigma_i(X_j)$ of $X_j$ is $w_i = c\sqrt{n} / (\sigma_i(X_j) + \varepsilon)$, where $c$ is a constant, $n$ is the number of similar patches in $Y_j$, and $\varepsilon$ is used to avoid dividing by zero. In this paper, two scalar parameters are set to $10^{-3}$ and 0.1, and a further pair of parameters is set to 2 and 5, respectively. After that, the patch $y_j$ can be inpainted by finding the most similar one to it in $\hat{X}_j$. Finally, by aggregating all the patches together, the whole image is updated.
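The iteration described above, a gradient step on the data term followed by a weighted shrinkage of the singular values, can be sketched on a single matrix as follows. This is an illustrative sketch under stated assumptions and not the paper's Algorithm 1: the step uses a Lipschitz constant equal to the tradeoff parameter `lam`, the weights are fixed and ascending, and the names `wnnm_inpaint_matrix` and `objective` are hypothetical.

```python
import numpy as np

def weighted_svt(z, thresholds):
    """Shrink the i-th largest singular value of z by thresholds[i]."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    return (u * np.maximum(s - thresholds, 0.0)) @ vt

def objective(x, y, known, weights, lam):
    """Weighted nuclear norm plus quadratic misfit on the known entries."""
    s = np.linalg.svd(x, compute_uv=False)
    residual = np.where(known, x - y, 0.0)
    return float(weights @ s + 0.5 * lam * np.sum(residual ** 2))

def wnnm_inpaint_matrix(y, known, weights, lam=1.0, n_iter=30):
    """Fill the unknown entries of y; returns the estimate and the objective history."""
    weights = np.asarray(weights, dtype=np.float64)
    x = np.where(known, y, 0.0)  # start from the observed entries, zeros elsewhere
    history = [objective(x, y, known, weights, lam)]
    for _ in range(n_iter):
        # Gradient step with step size 1/lam: known entries are reset to y,
        # unknown entries keep the current estimate.
        z = x - np.where(known, x - y, 0.0)
        # Proximal step: weighted singular value shrinkage with thresholds w_i / lam.
        x = weighted_svt(z, weights / lam)
        history.append(objective(x, y, known, weights, lam))
    return x, history
```

With fixed ascending weights, each shrinkage step solves its subproblem exactly, so the recorded objective values should not increase from one iteration to the next; plotting `history` is a quick sanity check when experimenting with the weights.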
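The patch grouping and the weight assignment can be sketched as below. The sketch assumes a grayscale image, an exhaustive search over all patch positions, and squared Euclidean distance as the similarity measure; the function names and the argument values used in the usage note are hypothetical and are not the paper's settings.

```python
import numpy as np

def group_similar_patches(image, top, left, size=5, n_similar=10):
    """Stack the n_similar patches closest to the reference patch as columns."""
    h, w = image.shape
    ref = image[top:top + size, left:left + size].ravel()
    patches, dists = [], []
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            p = image[r:r + size, c:c + size].ravel()
            patches.append(p)
            dists.append(float(np.sum((p - ref) ** 2)))
    # A stable sort keeps the scan order among equally distant patches.
    order = np.argsort(dists, kind="stable")[:n_similar]
    return np.stack([patches[i] for i in order], axis=1)

def wnnm_weights(sigma, n_similar, c, eps):
    """Weights inversely proportional to the singular values: c*sqrt(n)/(sigma+eps)."""
    return c * np.sqrt(n_similar) / (np.asarray(sigma, dtype=np.float64) + eps)
```

On an image with exactly repeating texture, the grouped matrix has identical columns and therefore rank one, and because the singular values come out in descending order, `wnnm_weights` returns weights in ascending order, as Theorem 1 requires.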
3.4. Convergence of the WNNM-Based Nonlocal Inpainting Method
In this section, we proved the convergence of WNNM-based inpainting method which could be summarized as in the following theorem.
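Theorem 2. Let $F(X) = \|X\|_{w,*} + f(X)$ denote the objective of (17), with $f(X) = \frac{\lambda}{2}\|\mathcal{P}_{\Omega}(X) - \mathcal{P}_{\Omega}(Y)\|_F^2$ and $L \ge \lambda$. To solve $\min_X F(X)$, one iteratively performs the two steps
$$Z_k = X_k - \frac{1}{L}\nabla f(X_k), \tag{21}$$
$$X_{k+1} = \arg\min_{X} \frac{L}{2}\|X - Z_k\|_F^2 + \|X\|_{w,*}, \tag{22}$$
and has
$$F(X_{k+1}) \le F(X_k). \tag{23}$$
Proof. For image inpainting, $\mathcal{P}_{\Omega}$ can be viewed as a diagonal 0/1 matrix $P$ acting on the vectorized image, so we have $P^{T}P = P \preceq I$. Define the surrogate function as
$$Q(X, X_k) = f(X_k) + \langle \nabla f(X_k), X - X_k \rangle + \frac{L}{2}\|X - X_k\|_F^2. \tag{24}$$
Since $L \ge \lambda$, $LI - \lambda P^{T}P$ is positive semidefinite, where $I$ is the identity matrix. Thus we can obtain that
$$Q(X, X_k) - f(X) = \frac{L}{2}\|X - X_k\|_F^2 - \frac{\lambda}{2}\|\mathcal{P}_{\Omega}(X - X_k)\|_F^2 \ge 0. \tag{25}$$
Given $X_k$, with the surrogate function in (24), we have
$$X_{k+1} = \arg\min_{X} Q(X, X_k) + \|X\|_{w,*} = \arg\min_{X} \frac{L}{2}\|X - Z_k\|_F^2 + \|X\|_{w,*} + C, \tag{26}$$
where $C$ is the constant term with respect to $X$. Equation (26) is actually the iterative process of updating $X$ using gradient descent as
$$Z_k = X_k - \frac{1}{L}\nabla f(X_k) = X_k - \frac{\lambda}{L}\mathcal{P}_{\Omega}(X_k - Y). \tag{27}$$
With WNNM, we can obtain the optimal solution of (26), $X_{k+1}$. Then one can get
$$Q(X_{k+1}, X_k) + \|X_{k+1}\|_{w,*} \le Q(X_k, X_k) + \|X_k\|_{w,*} = F(X_k). \tag{28}$$
From (25), one can derive that
$$F(X_{k+1}) = f(X_{k+1}) + \|X_{k+1}\|_{w,*} \le Q(X_{k+1}, X_k) + \|X_{k+1}\|_{w,*}. \tag{29}$$
Combining (28) and (29), we have
$$F(X_{k+1}) \le Q(X_{k+1}, X_k) + \|X_{k+1}\|_{w,*} \le F(X_k). \tag{30}$$
From (30), we get $F(X_{k+1}) \le F(X_k)$.
From Theorem 2, the proposed algorithm is guaranteed not to increase the loss function at any iteration, so the loss decreases along the iterations until convergence to a fixed point.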
4. Experimental Results
In this section, we validated the effectiveness of the proposed method for tongue image inpainting. For better evaluation, both synthetic images and real images were used in the experiments. The inpainting results of the TV-based method were used for comparison.
4.1. Experimental Results on Synthetic Images
In this section, we quantitatively compared the performance of the competing methods on synthetic images. The synthetic images were constructed as follows: we first detect the specular reflections of one real tongue image and then randomly put these reflections onto nonreflection tongue images to synthesize new images for inpainting. The benefit was that, on one hand, we could ensure that the topological structure of reflections was as natural as real ones. On the other hand, we could conveniently calculate the peak signal to noise ratio (PSNR) since we had the ground truth.
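Because the ground truth is available for synthetic images, PSNR can be computed directly. The sketch below uses the standard definition and assumes 8-bit images with a peak value of 255; the function name `psnr` is illustrative.

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a restored image."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

A uniform error of one tenth of the peak value corresponds to 20 dB, which is a convenient reference point when reading PSNR values.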
We used two ways to detect the highlight reflections of a tongue image. One was the morphological dilation based method and the other was our superpixel-based method. Synthetic images with both kinds of reflections were used in the experiments of WNNM-based inpainting and TV-based inpainting, respectively.
Figure 7 shows the comparison experimental results of competing methods on synthetic tongue images with reflections detected by morphological dilation based method, while Figure 8 shows the results on the synthetic tongue images with reflections detected by the proposed superpixel based method.
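Figure 7: Results on synthetic tongue images with reflections detected by the morphological dilation based method. (a) Ground truth; (b) synthetic tongue images; (c) results of the WNNM-based method (PSNR: left: 30.37, right: 31.56); (d) results of the TV-based method (PSNR: left: 29.32, right: 30.06).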
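Figure 8: Results on synthetic tongue images with reflections detected by the proposed superpixel-based method. (a) Ground truth; (b) synthetic tongue images; (c) results of our WNNM-based method (PSNR: left: 32.29, right: 31.34); (d) results of the TV-based method (PSNR: left: 31.39, right: 30.47).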
From Figures 7, 8, and 9, we could find that for the same synthetic tongue image, in terms of PSNR, the proposed inpainting method obtained better result than the TV-based inpainting method. Furthermore, the TV-based inpainting method usually finds the pixels via minimizing total variation value, while the proposed method usually finds the similar pixels to the area around the reflections. Thus, compared with the TV-based method, the proposed method is much better in handling the case of tongue image with large reflection areas by involving more texture information, as shown in Figure 9. In Figure 9(b), we manually add a block to original tongue image to simulate the large reflection area. It is easy to find that the inpainting result of the proposed method is more similar to the ground truth, which is much better than the TV-based method by involving more texture information of tongue body.
Figure 9: Inpainting of a large simulated reflection area. (a) Ground truth; (b) tongue image with missing block; (c) inpainting result of our WNNM-based method (PSNR = 25.39); (d) inpainting result of the TV-based method (PSNR = 23.66).
4.2. Experimental Results on Real Tongue Images
For better evaluation of the proposed inpainting method, we used real tongue images with serious specular reflection areas to validate its performance. The inpainting results for three real tongue images are shown in Figure 10. Moreover, on the lower right of each image of Figures 10(b) and 10(c), we show the enlarged subimages of the inpainted images. From Figure 10, one can observe that the proposed method achieves satisfactory visual results with rich texture information of the tongue image, while horizontal or vertical discontinuities can be observed in parts of the results obtained by the TV-based method.
Figure 10: Results on real tongue images. (a) Original tongue images with specular reflections; (b) inpainting results of our WNNM-based method; (c) inpainting results of the TV-based method.
Based on the results on synthetic and real tongue images, we showed that the proposed method was better than TV-based method, either in terms of PSNR or in the visual results. The proposed method filled the reflection area with more texture information of tongue image, which was useful in the tongue texture analysis applications. Compared with TV-based inpainting method, the proposed method was more suitable for the specular reflection removal of tongue images.
5. Conclusion
In this paper, we proposed a WNNM-based nonlocal inpainting method for specular reflection removal in tongue images. In the proposed method, superpixel segmentation was adopted to handle the specular reflection detection task. Then, based on nonlocal self-similarity, we transformed the problem of inpainting specular reflections in tongue images into a matrix completion problem and further proved the convergence of the proposed WNNM-based tongue image inpainting method.
We evaluated the performance of the proposed approach by comparing with the TV-based tongue image inpainting method. The comparison experimental results on both synthetic tongue images and real tongue image showed that the proposed method could achieve not only higher PSNR, but also more satisfactory visual effects, which could involve more texture information of tongue body, and was more suitable for the specular reflections removal of tongue images.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is partially supported by the NSFC funds of China under Contracts nos. 61271093, 61401125, and 61471146.
J. K. Anastasi, L. M. Currie, and G. H. Kim, “Understanding diagnostic reasoning in TCM practice: tongue diagnosis,” Alternative Therapies in Health and Medicine, vol. 15, no. 3, pp. 18–28, 2009.
B. Kirschbaum, Atlas of Chinese Tongue Diagnosis, Eastland, Seattle, Wash, USA, 2000.
Y. Cai, “A novel imaging system for tongue inspection,” in Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference, pp. 159–163, May 2002.
Y. J. Jeon, K. Kim, J. Do, H. Ryu, and J. Kim, “The development of a tongue diagnosis system and the evaluation of reproducibility,” Korean Journal of Oriental Medicine, vol. 14, no. 3, pp. 97–102, 2008.
K. H. Kim, J.-H. Do, H. Ryu, and J.-Y. Kim, “Tongue diagnosis method for extraction of effective region and classification of tongue coating,” in Proceedings of the 1st International Workshops on Image Processing Theory, Tools and Applications (IPTA '08), pp. 1–7, IEEE, Sousse, Tunisia, November 2008.
Z. Gao, L. Po, W. Jiang, X. Zhao, and H. Dong, “A novel computerized method based on support vector machine for tongue diagnosis,” in Proceedings of the 3rd IEEE International Conference on Signal Image Technologies and Internet Based Systems (SITIS '07), pp. 849–854, December 2007.
B. Huang, Tongue feature analysis and symptom diagnosis classification [Ph.D. dissertation], Harbin Institute of Technology, 2009.
D. Jia, N. Li, C. Li, S. Li, and W. Zuo, “Reflection removal of tongue image via total variation-based image inpainting,” in Proceedings of the International Conference on E-Product E-Service and E-Entertainment (ICEEE 2010), pp. 1–4, Henan, China, November 2010.
R. Bornard, E. Lecan, L. Laborelli, and J.-H. Chenot, “Missing data correction in still images and image sequences,” in Proceedings of the ACM International Conference on Multimedia, pp. 355–361, Juan les Pins, France, December 2002.
I. Drori, D. Cohen-Or, and H. Yeshurun, “Fragment-based image completion,” ACM Transactions on Graphics, vol. 22, no. 3, pp. 303–312, 2003.
N. Srebro and R. R. Salakhutdinov, “Collaborative filtering in a non-uniform world: learning with the weighted trace norm,” in Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS '10), pp. 2056–2064, December 2010.
A. Nemirovski, Efficient Methods in Convex Programming, TECHNION, 2005.
K.-C. Toh and S. Yun, “An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems,” Pacific Journal of Optimization, vol. 6, no. 3, pp. 615–640, 2010.
K. Lange, D. R. Hunter, and I. Yang, “Optimization transfer using surrogate objective functions,” Journal of Computational and Graphical Statistics, vol. 9, no. 1, pp. 1–20, 2000.