Abstract

Learning the knowledge hidden in the manifold-geometric distribution of a dataset is essential for many machine learning algorithms. However, the geometric distribution is usually corrupted by noise, especially in high-dimensional datasets. In this paper, we propose a denoising method that captures the “true” geometric structure of a high-dimensional nonrigid point cloud dataset by a variational approach. First, we improve the Tikhonov model by adding a local structure term so that the variational diffusion takes place on the tangent space of the manifold. Then, we define the discrete Laplacian operator by graph theory and obtain an optimal solution from the Euler–Lagrange equation. Experiments show that our method removes noise effectively on both a synthetic scattered point cloud dataset and real image datasets. Furthermore, as a preprocessing step, our method improves the robustness of manifold learning and increases the accuracy rate in classification problems.

1. Introduction

Since objects vary gradually in the real world, the manifold assumption states that the data points depicting the states of an object should be distributed on a smooth low-dimensional manifold embedded in a high-dimensional observation space [1]. The dimensions of the manifold are the key factors that control the variation of the object state. For example, in Figure 1, the images of the rotating duck toy are distributed on a one-dimensional manifold (a curve) embedded in the high-dimensional pixel space. Each image depicts a particular state of the duck. Although the pixel values change dramatically across these images, humans can easily discover that they are controlled by one key factor: the rotation of the duck.

Learning the knowledge hidden in the manifold-geometric distribution of a high-dimensional dataset is essential in many machine learning algorithms. For example, manifold learning algorithms aim to discover the nonlinear geometric structure of a dataset by preserving different local geometric properties [3–8]. The embedding results can be further used in data visualization, motion analysis, and classification [9, 10]. Moreover, much research takes the manifold assumption as a constraint condition in its objective function [11, 12]. It is worth noting that the manifold assumption has recently been applied to explain why deep learning works well [13–15]. This research indicates that deep learning can capture the manifold structure of one kind of knowledge through powerful nonlinear mappings.

However, noise is inevitable in data acquisition. For example, in Figure 1, the noiseless images of the rotating duck toy (red points) should lie on a curve embedded in the pixel space. However, due to a long exposure time and camera shake, the duck becomes “brighter” and “smaller” in one image. The corresponding noisy data point, marked “N” in green in Figure 1, does not lie on the curve because the pixel values change dramatically in the noisy image.

Noise makes machine learning models fragile and hard to train. For example, outlier points are difficult to handle in classification and clustering tasks, and a machine learning model must become more complex to obtain proper results [13]. In manifold learning algorithms, noise points make it difficult for the recovered embeddings to capture the true manifold-geometric distribution of the dataset. The reason is that the “short circuit” phenomenon arises easily in a noisy dataset, which destroys the local linear structure of the manifold [16].

In this paper, we propose a novel denoising method based on the manifold assumption. Our aim is to recover the data points that lie on the noiseless manifold from the noisy data points. Compared with existing denoising methods, our method has two contributions worth highlighting:
(1) Our method makes use of the manifold-geometric distribution information of the dataset. Therefore, it works on a whole dataset rather than a single data point.
(2) Our method improves the Tikhonov model to make the variational diffusion act on the tangent space of the manifold for a high-dimensional nonrigid point cloud dataset.

Our method captures the “true” geometric structure of a noisy dataset. After denoising, the key factors that control the geometric distribution of the dataset are maintained, and the characteristics of individual points are removed as noise. As a preprocessing step, our method improves the robustness of manifold learning and increases the accuracy rate in classification problems.

The rest of the paper is organized as follows: a brief review of research related to denoising under the manifold assumption is given in Section 2. Section 3 describes the motivation and details of the proposed method. In Section 4, experiments are conducted on both synthetic and real data to evaluate our method. Section 5 gives concluding remarks and a discussion of future work.

2. Related Work

Existing denoising methods always work on the noise within a single data point, such as “Gaussian noise” or “salt and pepper noise” in an image [17, 18]. However, these methods cannot deal with noise that distorts the geometric distribution of a dataset, such as the noisy duck toy image (green point) caused by a longer exposure time and camera shake in Figure 1.

Only a few studies deal with this problem. Gong et al. [19] proposed a local linear denoising method. This method removes noise by first projecting noisy data points onto the tangent space of the manifold, which is estimated by principal component analysis. Then, the local denoised patches are aligned to obtain the globally denoised dataset. However, the principal components may be distorted because they are calculated from the neighborhoods of noisy data points, which can lead to a wrong denoising result. Hao et al. [16] also utilized principal component analysis and projection to find the noiseless data points, so their method has the same problem. Moreover, many machine learning methods propose noise-resistant models for outliers but do not discuss denoising as an independent problem [7, 20]. For example, Zhang et al. [7] proposed an adaptive neighborhood selection method with a shrink-and-expand strategy to resist noise in the neighborhoods of the manifold.

In this paper, we propose a denoising method for a whole dataset. This method improves the Tikhonov model by adding a local structure term. The optimal solution is obtained by minimizing the objective function through a variational diffusion approach.

3. Proposed Approach

Let $F=\{f_1, f_2, \ldots, f_n\}$ be the noise dataset, where $f_x \in \mathbb{R}^D$ is the $x$-th data point in $F$ and $D$ is the dimension number of $f_x$. Let $U=\{u_1, u_2, \ldots, u_n\}$ be the noiseless dataset we want to obtain, where $u_x$ is the $x$-th data point in $U$. $f_x = u_x + n_x$, where $n_x$ is the noise of $u_x$. The goal is to recover $U$ from $F$.

We illustrate our method in three steps: first, we introduce the inspiration and motivation; then, we construct the objective function by improving the Tikhonov model; finally, we optimize the objective function and obtain the solution with discrete operators.

3.1. Inspiration and Motivation

The manifold assumption claims that the noiseless data points $u_x$ that depict the object states (the blue points in Figure 2) should lie on a smooth manifold $U$ (the blue surface in Figure 2) embedded in the observation space. However, the noise points $f_x$ (red points) are distributed on the noise manifold $F$. The denoising problem is how to obtain $u_x$ on $U$ from $f_x$ on $F$.

3.2. Objective Function

The objective function is formulated in this part. First, we briefly review the Tikhonov model in image denoising, which is similar to our problem. Then, the challenge of our problem is shown. Finally, we improve the Tikhonov model and construct the objective function for our problem.

3.2.1. Tikhonov Model in the Image Denoising Problem

Our problem is similar to the image denoising problem $f(x,y)=u(x,y)+n(x,y)$, where $x$ and $y$ are the row and column numbers of a pixel in an image. $f(x,y)$ and $u(x,y)$ are the pixel values at row $x$ and column $y$ in the noise and noiseless images, respectively. $n(x,y)$ is the noise. In Figure 2, if we regard the $x$, $y$, and $z$ coordinates of a point as the row number, column number, and pixel value, then the red manifold $F$ depicts the pattern of the noise image. Therefore, the image denoising problem is to find the noiseless image $u$ from $f$.

The Tikhonov model is one of the most classical variational models to deal with this problem [21]:

$$E(u)=\int_\Omega \big(f-u\big)^2\,dx+\alpha\int_\Omega \left|\nabla u\right|^2\,dx, \tag{1}$$

where $\Omega$ is the image domain and $dx$ is the area element (pixel) in $\Omega$. $\nabla u$ is the gradient of $u$. The first term is the “data term”, which measures the Euclidean distance between $u$ and $f$. The second term is the “smooth term”, which measures the noise strength of $u$. Since these two terms have opposite effects, the parameter $\alpha$ balances them. If $\alpha$ is small, $u$ is close to $f$ but the noise strength is large. On the other hand, if $\alpha$ is large, the noise becomes small but the image pattern of $u$ is “unlike” $f$.
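For intuition, a minimal NumPy sketch of minimizing (1) by gradient descent is given below. This is not the authors' implementation; the step size `tau`, the iteration count, and the replicated-border Laplacian are our assumptions.

```python
import numpy as np

def tikhonov_denoise(f, alpha=0.3, tau=0.1, iters=200):
    """Gradient-descent sketch of the Tikhonov model (1).

    f     : 2-D array, the noisy image
    alpha : smooth-term weight (larger gives a smoother but "unlike" result)
    tau   : descent step size, a hypothetical choice kept small for stability
    """
    u = f.astype(float).copy()
    for _ in range(iters):
        # 5-point discrete Laplacian with replicated borders
        up = np.pad(u, 1, mode="edge")
        lap = (up[:-2, 1:-1] + up[2:, 1:-1] +
               up[1:-1, :-2] + up[1:-1, 2:] - 4.0 * u)
        # descent on (1): the data term pulls u toward f,
        # the smooth term diffuses u through the Laplacian
        u += tau * ((f - u) + alpha * lap)
    return u
```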

3.2.2. The Challenge of Our Problem

In the image denoising problem, the gradient operator is defined by forward differences [21]:

$$\nabla u(x,y)=\big(u(x+1,y)-u(x,y),\; u(x,y+1)-u(x,y)\big). \tag{2}$$

When minimizing the “smooth term” in (1), the pixel values in the image become the same, whereas the image domain does not change since the pixel coordinates $x$ and $y$ are fixed.

However, in our problem, the dataset is a nonrigid and high-dimensional point cloud. Let $u_x \in U$ be a data point and $D$ be the dimension number of $u_x$. Suppose $N(u_x)$ is the neighborhood of $u_x$, which is determined by the $k$-nearest-neighbor ($k$-NN) method:

$$N(u_x)=\{u_{x_1}, u_{x_2}, \ldots, u_{x_k}\}, \tag{3}$$

where $u_{x_j}$ is the $j$-th nearest neighbor of $u_x$.
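For clarity, here is a brute-force sketch of determining these neighborhoods; the quadratic-memory distance computation is a simplification suitable only for small datasets.

```python
import numpy as np

def knn_neighborhoods(X, k):
    """Brute-force k-NN sketch for a point set X of shape (n, D).

    Returns an (n, k) integer array: row x holds the indices of the k
    nearest neighbors of X[x], excluding the point itself.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                         # exclude self-matches
    return np.argsort(d2, axis=1)[:, :k]
```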

Naturally, the gradient operator is defined as

$$\nabla u_x=\big(u_{x_1}-u_x,\; u_{x_2}-u_x,\;\ldots,\; u_{x_k}-u_x\big). \tag{4}$$

Therefore, the “smooth term” in (1) is

$$\sum_{x=1}^{n}\sum_{j=1}^{k}\big\|u_{x_j}-u_x\big\|^2. \tag{5}$$

When minimizing the objective function, this “smooth term” makes $u_x$ and its neighbors $u_{x_j}$ become the same point. Therefore, the “cluster” phenomenon arises in the dataset: some points are brought close together and the other points are pushed away. As a result, the geometric structure of the manifold (the blue surface in Figure 2) shrinks to a few point clusters rather than becoming smooth. Therefore, the Tikhonov model cannot be applied directly to solve our problem.

3.2.3. Our Objective Function

To deal with this problem, we maintain the geometric distribution of $U$ by keeping the tangent linear structure fixed while minimizing the objective function. Since the neighborhood on the manifold can be regarded as the tangent space (the blue plane in Figure 3), we make the neighborhood structure of $u_x$ the same as that of $f_x$.

The weights of the local linear representation are utilized to depict the geometric structure of the neighborhood. The weight vector of data point $f_x$ is defined as

$$w_x=\arg\min_{w}\Big\|f_x-\sum_{j=1}^{k}w^{j} f_{x_j}\Big\|^2, \tag{6}$$

where $\sum_{j=1}^{k}w^{j}=1$ and $w_x^{j}$ is the $j$-th component of $w_x$, between $f_x$ and $f_{x_j}$. Similarly, the linear representation weight of $u_x$ is defined as $w_x^{u}$.
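The following sketch solves (6) in the constrained least-squares style of LLE [4], which is our reading of the definition; the small regularizer on the local Gram matrix is an added assumption for the case $k > D$.

```python
import numpy as np

def lle_weights(X, neighbors):
    """LLE-style local linear representation weights, our reading of (6).

    For each point f_x, solve min_w ||f_x - sum_j w^j f_{x_j}||^2
    subject to sum_j w^j = 1.
    """
    n, k = neighbors.shape
    W = np.zeros((n, k))
    for x in range(n):
        Z = X[neighbors[x]] - X[x]           # neighbors shifted to the origin
        C = Z @ Z.T                          # k x k local Gram matrix
        C += 1e-3 * np.trace(C) * np.eye(k)  # assumed regularizer for k > D
        w = np.linalg.solve(C, np.ones(k))
        W[x] = w / w.sum()                   # enforce the sum-to-one constraint
    return W
```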

The local linear structure can be maintained if we set $w_x^{u}$ the same as $w_x$. Then, $u_x$ can only move along the normal space of the manifold when minimizing the “smooth term” in the objective function, because the tangent geometric structure is fixed by $w_x$. Therefore, we add a “local structure term” to the Tikhonov model:

$$\int_\Omega \big(u-\tilde u\big)^2\,dx, \tag{7}$$

where $\tilde u_x=\sum_{j=1}^{k}w_x^{j}u_{x_j}$ is the linear reconstruction of $u_x$. Thus, our objective function is

$$E(u)=\int_\Omega \big(f-u\big)^2\,dx+\alpha\int_\Omega \left|\nabla u\right|^2\,dx+\beta\int_\Omega \big(u-\tilde u\big)^2\,dx, \tag{8}$$

where $\alpha$ and $\beta$ are balance parameters.

3.3. Optimal Solution

In this part, we obtain the optimal $u$ by minimizing objective function (8). The solution in continuous form is calculated first. Then, the discrete operators are defined and plugged in to get the discrete solution.

3.3.1. Solution in Continuous Form

To obtain the optimal $u$, we calculate the derivative of (8) with respect to $u$ by a variational approach and set it to zero:

$$\frac{\partial E}{\partial u}=2\big(u-f\big)-2\alpha\,\Delta u+2\beta\big(u-\tilde u\big)=0. \tag{9}$$

Therefore, the Euler–Lagrange equation of (8) is

$$\big(u-f\big)-\alpha\,\Delta u+\beta\big(u-\tilde u\big)=0. \tag{10}$$

Then,

$$u=\frac{f+\alpha\,\Delta u+\beta\,\tilde u}{1+\beta}. \tag{11}$$

And the boundary condition is

$$\left.\frac{\partial u}{\partial \vec n}\right|_{\partial\Omega}=0. \tag{12}$$

3.3.2. Solution in Discrete Form

To obtain the discrete solution, we define the discrete Laplacian operator in (11) by spectral graph theory [22]. First, the gradient of $u_x$ is defined as

$$\nabla_{wg}\,u_x=\big(g_x^{1}(u_1-u_x),\; g_x^{2}(u_2-u_x),\;\ldots,\; g_x^{n}(u_n-u_x)\big). \tag{13}$$

This gradient is an $n$-dimensional vector because there are $n$ data points in $U$. The subscript “$wg$” is an abbreviation of “weighted graph.” $g_x=(g_x^{1}, g_x^{2}, \ldots, g_x^{n})$ is a weight vector. The component $g_x^{i}$ should be important if $u_x$ and $u_i$ are near. On the contrary, the component should be unimportant if $u_x$ and $u_i$ are far away. Therefore, we define $g_x$ as

$$g_x^{i}=\exp\!\left(-\frac{\big(d_x^{i}\big)^{2}}{\sigma^{2}}\right), \tag{14}$$

where $d_x=(d_x^{1},\ldots,d_x^{n})$ is the vector of Euclidean distances between $u_x$ and the other data points and $\sigma^{2}$ is the variance of $d_x$. For the convenience of calculations, we set $\sigma=1$. Therefore, the discrete gradient of $u_x$ is

$$\nabla_{wg}\,u_x=\Big(e^{-\|u_1-u_x\|^2}(u_1-u_x),\;\ldots,\; e^{-\|u_n-u_x\|^2}(u_n-u_x)\Big). \tag{15}$$
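A sketch of the weight matrix in (14) follows. Using the mean squared pairwise distance as the kernel scale (instead of $\sigma = 1$) is our assumption, added for numerical robustness on real data.

```python
import numpy as np

def graph_weights(X):
    """Heat-kernel weight matrix in the spirit of (14).

    G[x, i] decays with the squared Euclidean distance between points x
    and i. The kernel scale (mean squared pairwise distance) is an
    assumption of this sketch, not the authors' sigma = 1.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    G = np.exp(-d2 / d2[d2 > 0].mean())
    np.fill_diagonal(G, 0.0)   # no self-loops in the weighted graph
    return G
```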

Consequently, the gradient of the smooth term with respect to a vector $u_x$ is (the derivation procedure is listed in the “Notice” at the end of this section):

$$\frac{\partial}{\partial u_x}\sum_{y=1}^{n}\big\|\nabla_{wg}\,u_y\big\|^{2}=-4\sum_{i=1}^{n}\big(g_x^{i}\big)^{2}\big(u_i-u_x\big). \tag{16}$$

Let $\tilde g_x^{i}=\big(g_x^{i}\big)^{2}$ (the constant factor is absorbed into the parameter $\alpha$); therefore, the discrete Laplace operator of $u_x$ can be defined by

$$\Delta u_x=\sum_{i=1}^{n}\tilde g_x^{i}\,\big(u_i-u_x\big). \tag{17}$$

We plug the discrete Laplace operator into (11). The solution of our objective energy function (8) is

$$u_x^{t+1}=\frac{f_x+\alpha\sum_{i=1}^{n}\tilde g_x^{i}\,u_i^{t}+\beta\,\tilde u_x^{t}}{1+\alpha\sum_{i=1}^{n}\tilde g_x^{i}+\beta}, \tag{18}$$

where the superscripts $t$ and $t+1$ are the iteration steps. The initial value of $u$ is set to $u^{0}=f$. The optimal $u$ is obtained by iteration, which terminates when $|E^{t+1}-E^{t}|<\varepsilon$, where $E^{t}$ is the objective function value at step $t$ and $\varepsilon$ is a small error we set. The boundary condition (12) can be ignored because the dataset consists of scattered, nonrigid cloud points.
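Putting the pieces together, the following sketch iterates (18), reusing the k-NN and weight sketches above. The kernel scale and the stopping test on successive iterates (instead of the objective value) are our simplifications, not the authors' exact scheme.

```python
import numpy as np

def manifold_denoise(F, k=12, alpha=1.0, beta=1.0, iters=200, eps=1e-6):
    """Sketch of iteration (18), reusing knn_neighborhoods and lle_weights
    from the earlier sketches."""
    F = F.astype(float)
    nbrs = knn_neighborhoods(F, k)        # neighborhoods as in (3)
    W = lle_weights(F, nbrs)              # weights (6), fixed on the noisy data
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    G = np.exp(-2.0 * d2 / d2[d2 > 0].mean())   # (g_x^i)^2 with an assumed scale
    np.fill_diagonal(G, 0.0)
    deg = G.sum(axis=1)                   # d_x = sum_i (g_x^i)^2
    U = F.copy()
    for _ in range(iters):
        U_tilde = np.einsum("xk,xkd->xd", W, U[nbrs])  # local linear reconstruction
        U_new = (F + alpha * (G @ U) + beta * U_tilde) \
                / (1.0 + alpha * deg + beta)[:, None]  # fixed-point step (18)
        if np.abs(U_new - U).max() < eps:
            break
        U = U_new
    return U
```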

Notice:

The gradient of the smooth term with respect to a vector $u_x$ can be derived as follows. Collecting the terms of the discrete smooth term $\sum_{y}\|\nabla_{wg}\,u_y\|^{2}$ that contain $u_x$ and using the symmetry $g_x^{i}=g_i^{x}$ gives

$$\sum_{y=1}^{n}\big\|\nabla_{wg}\,u_y\big\|^{2}=2\sum_{i=1}^{n}\big(g_x^{i}\big)^{2}\big\|u_i-u_x\big\|^{2}+\text{terms without } u_x.$$

Therefore,

$$\frac{\partial}{\partial u_x}\sum_{y=1}^{n}\big\|\nabla_{wg}\,u_y\big\|^{2}=4\sum_{i=1}^{n}\big(g_x^{i}\big)^{2}\big(u_x-u_i\big)=-4\sum_{i=1}^{n}\big(g_x^{i}\big)^{2}\big(u_i-u_x\big).$$

4. Experiments

In this section, we evaluate our algorithm on both a synthetic scattered point cloud dataset and real image datasets. Then, the method is utilized as a preprocessing step for manifold learning and classification tasks. The major parameters of our algorithm are (1) the neighborhood size $k$; (2) the smooth term weight $\alpha$; and (3) the local structure term weight $\beta$.

4.1. Experiments on Synthetic 3D Scattered Point Cloud Data

In this part, we test our algorithm on the classical “swiss roll” dataset. The data points are sampled randomly from a 2D manifold embedded in 3D space like a swiss roll cake. Figures 4(a) and 4(b) in the first row are the noiseless and noise datasets at the [−8, 10] and [0, 0] viewpoints, respectively. It is obvious that the noise data points are distributed around the “swiss roll” manifold but do not lie on it exactly. Our goal is to recover the noiseless dataset in Figure 4(a) from the noise dataset in Figure 4(b). In this experiment, we set the number of data points $n$, the parameter $k = 12$, and the noise parameter $\sigma$; the MATLAB code of the swiss roll dataset is listed in Table 1.
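For reference, a rough NumPy sketch of a noisy swiss roll generator is shown below; its constants are conventional choices and may differ from the MATLAB code in Table 1.

```python
import numpy as np

def noisy_swiss_roll(n=1300, sigma=0.5, seed=0):
    """Noisy swiss roll sketch; constants are conventional, not the authors'."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))   # roll angle
    h = 21.0 * rng.random(n)                        # position along the roll axis
    clean = np.stack([t * np.cos(t), h, t * np.sin(t)], axis=1)
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    return noisy, clean
```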

The second, third, and fourth rows in Figure 4 are the denoising results of our method with $(\alpha, \beta)$ equal to (1, 1), (3, 1), and (0.3, 1), respectively. For ease of viewing, we show the denoising datasets at the [−8, 10] and [0, 0] viewpoints in the left and right columns. In the right column, it is easy to see that the denoising data points are close to the tangent space of the manifold compared with (b), which shows that our method is effective. Among them, (f) seems to be the best result because the denoising points are the nearest to the manifold compared with (d) and (h). However, the “cluster” phenomenon arises in the denoising dataset, where some points are brought close together and the other points are pushed away, which is easy to see in (e). The reason is that a large smooth parameter ($\alpha = 3$) distorts the geometric distribution when minimizing the objective function. Conversely, the “cluster” phenomenon in (g) is not serious when we set a small parameter ($\alpha = 0.3$), but the residual noise is large.

To conduct a quantitative comparison between the noise and denoising datasets, we assess the quality of the denoising datasets by the mean square error (MSE) and the tangent distance error (TDE). MSE is a widely used index that measures the average squared Euclidean distance between two datasets:

$$\mathrm{MSE}=\frac{1}{n}\sum_{x=1}^{n}\big\|f_x-u_x\big\|^{2}, \tag{19}$$

where $n$ is the point number of the dataset and $f_x$ and $u_x$ are a noise (or denoised) data point and the corresponding noiseless data point.

The tangent distance error (TDE) measures the distance of a denoised point to the tangent space of the manifold. A small TDE indicates that the point lies on the manifold and the noise is weak. On the contrary, the noise strength is large if TDE is big. For the convenience of calculations, we approximate the tangent distance as the Euclidean distance between a denoised point and its nearest data point in the noiseless dataset. The tangent distance error is defined as

$$\mathrm{TDE}=\frac{1}{n}\sum_{x=1}^{n}\min_{u_y\in U}\big\|\hat u_x-u_y\big\|, \tag{20}$$

where $n$ is the number of data points, $\hat u_x$ and $u_y$ represent the denoising data point and a noiseless data point, respectively, and $U$ is the noiseless dataset.
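Both indexes are straightforward to compute; the sketch below reflects our reading of (19) and (20).

```python
import numpy as np

def mse(A, B):
    """Average squared Euclidean distance between paired datasets, as in (19)."""
    return ((A - B) ** 2).sum(axis=1).mean()

def tde(U_hat, U_clean):
    """Tangent distance error (20), approximated by the distance from each
    denoised point to its nearest noiseless point."""
    d2 = ((U_hat[:, None, :] - U_clean[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1)).mean()
```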

To evaluate our algorithm, we test seven values each of $\alpha$ and $\beta$ ranging from 0 to 10. The MSE and TDE values are listed in Tables 2 and 3. When $\alpha$ and $\beta$ both equal 0, the “data term” is the only term remaining in objective function (8). Therefore, the denoising dataset is the same as the noise dataset, and the value at ($\alpha = 0$, $\beta = 0$) gives the errors of the noise dataset. When $\alpha$ is small and $\beta$ is large, the “data term” and “local structure term” maintain the geometric structure of the noise dataset. Therefore, the errors at the upper right of each table are close to the errors of the noise dataset. When $\alpha$ is large and $\beta$ is small, the “smooth term” plays the major role. It can lead to the “cluster” phenomenon, which distorts the geometric structure of the dataset and makes the errors at the bottom left of each table large. It can be seen that the errors near the diagonals of the tables are much smaller than the others.

4.2. Experiments on the Image Dataset

In this part, we test our method on two real image datasets: the MNIST handwritten digit dataset [23] and the “LLE face” dataset. An image is regarded as a point in pixel space. For example, a $28\times 28$ image in the MNIST dataset can be regarded as a point in 784-dimensional space because it has 784 pixels. Therefore, the only difference between this part and the experiment in Section 4.1 is that the dimensionality of an image-point is much higher than that of the synthetic scattered points in 3D space.

We analyze the denoising images from both subjective and objective aspects. First, our method is applied to the raw image datasets. Ideally, the key factors that control the geometric distribution of the dataset are maintained and the characteristics of individual images are removed as noise. Since there is no ground truth for the raw image datasets, we can only evaluate the results subjectively, by eye. Second, we add several types of noise to an image and use MSE to objectively measure the denoising images produced by our method and by classical image denoising methods.

4.2.1. Experiments on the Raw Image Dataset

We select the “number 3” and “number 4” datasets in MNIST, which contain 1010 and 982 images, respectively. The size of each image is $28\times 28$ pixels. The “LLE face” dataset contains 1965 face images with different expressions and shooting angles. The size of each image is $20\times 28$ pixels.

Figure 5 shows 110 images from the “handwritten number 3” dataset. The left side shows the original images and the right side shows the corresponding denoising images produced by our method. In this experiment, we set the parameters $k$, $\alpha$, and $\beta$. Four typical images are marked with boxes and listed in Figure 5. It can be seen that the blurred strokes become clear and the posture of the number in each image is maintained.

Figure 6 shows 110 images from the “handwritten number 4” dataset. The left and right sides are the original images and the corresponding denoising images produced by our method, respectively. In this experiment, we set the parameters $k$, $\alpha$, and $\beta$. It can be seen that the denoising images maintain the main factors, such as the angularity of the number “4,” while the individual characteristics are removed after denoising; for example, the differences in stroke width become small. Four typical images are marked with boxes and listed in Figure 6. It is obvious that the margin of the “head” of the number “4” becomes large in the first two images after denoising. In the third image, the stroke width becomes broader. In the fourth image, the “bend” at the top of the stroke is removed.

Figure 7 shows the denoising results for the LLE face dataset. This dataset contains 1965 face images, and the size of each image is $20\times 28$ pixels. In this experiment, we set the parameters $k$, $\alpha$, and $\beta$. Reference [4] shows that this dataset is distributed on a manifold spanned by two key factors: head pose and expression, where the expression is reflected by the lip shape in the images.

It can be seen that these two factors are maintained after denoising, while the characteristics of individual images are removed as noise. Four typical images are marked with boxes and listed in Figure 7. In the first two images, the head twists slightly to the left and right in the original dataset, whereas the head pose is fixed after denoising. In the third image, the original head appears smaller than in the other images, which may be caused by camera shake; the corresponding denoising image enlarges the face, and the cheek and chin become “fat.” In the fourth image, the eyes are “open” after denoising.

4.2.2. Experiments on the Noise Image

In this part, we add several different types of noise to an LLE face image. Then, our method and three classical image denoising methods are applied to these noise images. Finally, MSE is utilized to evaluate the denoising images.

Figure 8 shows the denoising images produced by four denoising methods for five types of noise. The first column is the raw LLE face image. Brightness noise, Gaussian noise, salt and pepper noise, rotation noise, and scaling noise are added to the raw image, as shown in the second column from the top row to the bottom row. The MATLAB code of the noise models is listed in Table 4.

Three classical denoising methods (mean filtering, median filtering, and the Tikhonov method) are utilized to deal with these noise images. The corresponding denoising images are listed in the third, fourth, and fifth columns of Figure 8. The images in the last column are the denoising results of our method. The MSE is listed below each image. In this experiment, the size of the raw LLE face image is $20\times 28$ pixels, and square filter windows (measured in pixels) are used for mean filtering and median filtering. In the Tikhonov method, the smooth parameter is 0.3. The parameters of our method are the neighborhood size $k$, the smooth term weight $\alpha$, and the local structure term weight $\beta$.

It can be seen that the three classical denoising methods have no effect on brightness noise, rotation noise, and scaling noise; these noises still exist in the denoising images. Their MSE even becomes larger after denoising in contrast to the noise image, whereas our method works well. For example, the rotated face is corrected in the fourth row and sixth column, and the MSE becomes smaller.

The reason is that classical image denoising methods only make use of the pattern information within a single image. They cannot “see” the geometric distribution information of the whole image dataset, whereas our method removes noise by drawing noisy data points back to the noiseless manifold-geometric distribution of the image dataset.

4.3. Denoising Dataset for Manifold Learning

In this part, we utilize our method as a preprocessing step and compare the low-dimensional embeddings recovered from the noise and denoising datasets by several manifold learning algorithms. In this experiment, $\alpha$, $\beta$, and $k$ are 1, 0.8, and 13, respectively.

Figures 9(a) and 9(b) show the noise “swiss roll” dataset and the ground truth of the noise dataset. Figures 9(c) and 9(d) are the embeddings of the noise and denoising datasets by Isomap. Figures 9(e) and 9(f) are the embeddings of the noise and denoising datasets by LTSA. Figures 9(g) and 9(h) are the embeddings of the noise and denoising datasets by HLLE. It is obvious that the embeddings of the noise dataset cannot reflect the geometric distribution of the manifold, since the neighborhoods easily produce the “short circuit” phenomenon. With the denoising dataset, all three manifold learning methods obtain proper embeddings. The Isomap embedding exhibits the “hole” phenomenon because the calculated geodesic distances are always larger than the true ones.

To conduct a quantitative comparison, we assess the quality of the embeddings by three indexes: the embedding error, the trustworthiness error, and the continuity error [8]. The embedding error measures the squared distance from the recovered low-dimensional embeddings to the ground truth coordinates:

$$E_{emb}=\frac{1}{n}\sum_{x=1}^{n}\big\|y_x-\hat y_x\big\|^{2}, \tag{21}$$

where $n$ is the number of data points and $y_x$ and $\hat y_x$ represent the embedding coordinates and the ground truth coordinates, respectively. This index tends to measure the global structure distortion of the manifold.

The trustworthiness error and continuity error measure the local geometric structure distortion. The trustworthiness error measures the proportion of points that are brought too close together in the low-dimensional embedding, and the continuity error measures the proportion of points that are pushed away:

$$E_{T}(k)=\frac{2}{nk(2n-3k-1)}\sum_{x=1}^{n}\sum_{y\in U_x(k)}\big(r(x,y)-k\big), \tag{22}$$

$$E_{C}(k)=\frac{2}{nk(2n-3k-1)}\sum_{x=1}^{n}\sum_{y\in V_x(k)}\big(\hat r(x,y)-k\big), \tag{23}$$

where $k$ is the point number in the neighborhood, $r(x,y)$ is the rank of point $y$ in the ordering according to the pairwise distances from point $x$ in the high-dimensional space, and $\hat r(x,y)$ is the rank of point $y$ in the ordering according to the pairwise distances from point $x$ in the low-dimensional embedding. The variables $U_x(k)$ and $V_x(k)$ denote the neighborhood points of $x$ in the low-dimensional embedding and in the high-dimensional space, respectively.
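A sketch of the trustworthiness error under this reading is given below; the continuity error is obtained symmetrically by swapping the roles of the two spaces. The rank bookkeeping follows the standard rank-based definition [8].

```python
import numpy as np

def rank_matrix(X):
    """ranks[x, y] = rank of point y by distance from point x (1 = nearest)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    order = np.argsort(d2, axis=1)
    n = X.shape[0]
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)[None, :]
    return ranks

def trustworthiness_error(X_high, Y_low, k):
    """Rank-based trustworthiness penalty, our reading of (22): it charges
    points that enter a k-neighborhood only in the embedding."""
    r_high, r_low = rank_matrix(X_high), rank_matrix(Y_low)
    n = X_high.shape[0]
    penalty = 0.0
    for x in range(n):
        intruders = np.where((r_low[x] <= k) & (r_high[x] > k))[0]
        penalty += (r_high[x, intruders] - k).sum()
    return 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))
```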

We test our method with several dimension reduction methods. The noise swiss roll dataset contains 1300 points. Here, we again set $\alpha$, $\beta$, and $k$ to 1, 0.8, and 13. The best embedding results among several trials are selected in this experiment. The embedding errors, trustworthiness errors, and continuity errors are listed in Tables 5–7, respectively. To show the effectiveness of our method, the errors of the noise dataset, the denoising dataset, and the noiseless dataset are listed in three rows. It can be seen that the errors become smaller with the denoising dataset for Isomap, LLE, HLLE, LTSA, and AML. However, LE and LPP perform poorly on the denoising dataset.

4.4. Classification Experiment

In this part, we utilize our method as a preprocessing step and compare the accuracy rates on the original dataset and the denoising dataset in a classification task. The MNIST handwritten digit dataset is selected, which contains 60000 images in ten classes from digit 0 to digit 9. Each class has about 6000 images, and the size of each image is $28\times 28$ pixels. To get the denoising dataset, we apply our denoising method to each of the ten classes separately.

In this experiment, we specify different numbers of images in each class as training data and use the remaining images as test data, both for the original dataset and the denoising dataset. A simple one-hidden-layer neural network is adopted as the classifier. The input layer has 784 units corresponding to the pixels of an image. The output layer has 10 units corresponding to the ten categories from digit zero to nine. We set 25 units in the hidden layer, including a bias unit. The parameters of the network are trained by the backpropagation (BP) method.
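A rough stand-in for this classifier using scikit-learn's BP-trained MLP is sketched below; random arrays stand in for the MNIST images, and the training hyperparameters are our choices, not the authors'.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in for the classifier described above: 784 inputs, one hidden layer
# of 25 units, 10 outputs, trained by backpropagation. Random arrays stand
# in for the (denoised) MNIST images; swap in the real data in practice.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 784)), rng.integers(0, 10, 100)
X_test, y_test = rng.random((50, 784)), rng.integers(0, 10, 50)

clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=300)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```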

For each classification task, we repeat the experiment 10 times and plot the mean accuracy rate in Figure 10. The labels “original dataset” and “denoising dataset” refer to the raw MNIST dataset and the dataset denoised with our method. The $x$-coordinate is the number of training images in each class and the $y$-coordinate is the accuracy rate. The blue and red lines are the accuracy rates of the original dataset and the denoising dataset, respectively. It is obvious that the accuracy rate goes down as the number of training images in each class decreases. The performance on the denoising dataset is much better than on the original dataset, especially when the number of training images per class is less than 50. For the denoising dataset, the accuracy is above 96% even when there are only 10 training images in each class.

The reason is that the individual characteristics are removed in the denoising dataset, as shown in Figures 5–7 in Section 4.2.1. The denoising datasets, which are distributed on a “clean” manifold spanned by the key factors of the dataset, make it easy for a machine learning algorithm to learn the geometric distribution knowledge of the dataset. This also illustrates that our method captures some kind of feature that is essential to the classifier.

5. Conclusion and Future Work

We propose a denoising method for a whole dataset rather than a single data point. The method is inspired by the manifold assumption. A local structure term is added to the Tikhonov model to make the noise points diffuse on the tangent space of the manifold. Our method makes prominent the major factors hidden in the dataset and removes the characteristics of individual data points. Experiments show that our method eliminates noise effectively on both a synthetic scattered point cloud dataset and real image datasets. Moreover, as a preprocessing step, our method improves the robustness of manifold learning and increases the accuracy rate in classification problems. However, the model is sensitive to its parameters because the optimal solution is calculated by iteration: the geometric distribution of the dataset is distorted when the smooth term parameter is large, while on the contrary the noise intensity remains large after denoising when it is small. Our future work will focus on this problem.

Data Availability

Some or all data, models, or codes generated or used during the study are available from the first author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 51705304 and 61671285) and the Natural Science Foundation of Shanghai (Grant no. 19ZR1420800).