The present work concerns the estimation of the probability density function (p.d.f.) of measured data in Lamb wave-based damage detection. Although a number of studies have focused on consensus algorithms that combine the results of individual sensors, the p.d.f. of measured data, which is the fundamental part of the probability-based method, has so far been chosen by experience. An analysis of the noise-induced errors in measured data shows that the type of distribution depends on the noise level. Under weak noise, the p.d.f. of measured data can be treated as a normal distribution, and empirical methods give satisfactory estimates. Under strong noise, however, the p.d.f. is complex and does not belong to any common family of distributions, so nonparametric methods are needed. Kernel density estimation, the most popular nonparametric method, is therefore introduced. To demonstrate its performance, a numerical model was built to generate Lamb wave signals, and three levels of white Gaussian noise were added to the simulated signals. The estimation results show that the nonparametric methods outperform the empirical methods in terms of accuracy.

1. Introduction

Structural health monitoring (SHM) is an emerging technology that merges a variety of techniques related to diagnostics and prognostics. Monitoring structural health can improve the safety and maintainability of critical structures in many fields, such as civil engineering, aerospace, and the military industry. An ideal SHM system includes several subsystems, of which the damage detection methodology is the key part. Numerous damage detection methods have therefore been studied in recent years [1]. Compared with methods based on mode shapes [2] or structural dynamic responses [3], the method based on Lamb waves has the clear advantage of high sensitivity to structural damage. It has been verified that Lamb wave-based methods can detect cracks, delamination, surface corrosion, through holes, weld defects, and many other kinds of damage in plate and shell structures [4–6]. Consequently, the Lamb wave is widely acknowledged as one of the most promising tools for SHM, and the relevant research has been conducted intensively since the 1980s [7].

The portion of the SHM process that has received the least attention in the technical literature is the development of statistical models for discriminating between features from undamaged and damaged structures. Algorithms that analyze the statistical distributions of measured or derived features to enhance damage identification have been developed [8, 9], and probability-based diagnostic methods have been introduced into Lamb wave-based damage detection in recent years [10, 11]. However, the statistical models used in existing Lamb wave-based methods are relatively simple. Although a number of publications have focused on consensus algorithms that combine the results of individual sensors, the p.d.f. of the measured data has been determined empirically. Because the p.d.f. is a key part of the statistical model, its accuracy has a significant effect on the precision of the damage detection result. Estimates obtained by statistical methods can be expected to be more accurate and reliable than those given by an empirical formula. Hence, it is necessary to study statistical methods for estimating the p.d.f. in Lamb wave-based damage detection.

An elementary parametric estimation method has been adopted under the assumption that the p.d.f. of the measured data is a normal distribution [12]. This assumption, however, limits the applicability of the method. If the assumption is correct, the parametric method can produce more accurate results than an empirical formula; if it is incorrect, the parametric method can be very misleading.

Since the type of p.d.f. of measured data from field experiments varies and can hardly be predicted, more robust methods should be considered. Nonparametric statistical methods estimate the distribution without assuming that the data are drawn from a given family of probability distributions. Introducing nonparametric statistical methods is therefore crucial in Lamb wave-based damage detection.

The aim of this paper is to demonstrate the necessity and feasibility of applying kernel density estimation, the most popular nonparametric estimation method, in Lamb wave-based damage detection. Two kinds of kernel density estimation are briefly introduced: one based on the Gaussian approximation and one based on the smoothing properties of linear diffusion processes. Lamb wave signals with different levels of white Gaussian noise were obtained by numerical simulation, and the framework for applying nonparametric estimation in Lamb wave-based damage detection is demonstrated with these simulated signals. The characteristics of the noise-induced error in the arriving time of damage-scattered Lamb waves, which is the index used to locate damage, are analyzed. Based on this analysis, the outcomes of the two kernel density estimation methods and of the parametric estimation method are compared. The results show that the nonparametric methods outperform the parametric method in terms of accuracy and reliability.

2. Lamb Wave-Based Damage Detection

2.1. Background

Lamb waves are elastic waves that propagate in thin plate and shell structures. They are highly susceptible to interference along the propagation path, such as damage or a boundary, yet they can travel long distances even in materials with a high attenuation ratio, so a broad area can be examined quickly [13].

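Lamb waves are made up of a superposition of longitudinal and shear modes, and their propagation characteristics vary with entry angle, excitation, and structural geometry. A Lamb mode can be either symmetric or antisymmetric, formulated by

$$\frac{\tan(qh)}{\tan(ph)} = -\frac{4k^{2}pq}{(q^{2}-k^{2})^{2}} \quad \text{(symmetric modes)}, \tag{2.1}$$

$$\frac{\tan(qh)}{\tan(ph)} = -\frac{(q^{2}-k^{2})^{2}}{4k^{2}pq} \quad \text{(antisymmetric modes)}, \tag{2.2}$$

where $p^{2} = \omega^{2}/c_L^{2} - k^{2}$, $q^{2} = \omega^{2}/c_T^{2} - k^{2}$, and $k = \omega/c_p$, and $h$, $k$, $c_L$, $c_T$, $c_p$, and $\omega$ are the plate half-thickness, wavenumber, velocities of longitudinal and transverse modes, phase velocity, and wave circular frequency, respectively. Equations (2.1) and (2.2), which relate the propagation velocity to the frequency, imply that Lamb waves, regardless of mode, are dispersive (velocity depends on frequency).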

Lamb waves can be actively excited by a variety of means, such as an ultrasonic probe, a laser, an interdigital transducer, or a piezoelectric element. A piezoelectric element can also serve as a sensor to collect Lamb wave signals. Piezoelectric elements are particularly suitable for integration into a host structure as in situ generators/sensors because of their negligible mass and volume, easy integration, excellent mechanical strength, wide frequency response, low power consumption and acoustic impedance, and low cost. Applications of piezoelectric elements in Lamb wave-based damage detection are numerous.

Lamb mode selection is an important part of damage detection. The fundamental symmetric mode, $S_0$, and the fundamental antisymmetric mode, $A_0$, are normally used in practice. Although $S_0$ is preferred in many studies [14], the use of $A_0$ is increasing because $A_0$ is highly effective for detecting delamination and transverse ply cracks [15]. To implement Lamb mode selection, a multielement transducer setup was proposed [16] to predominantly generate either $S_0$ or $A_0$.

The algorithms for Lamb wave-based damage identification can be roughly divided into two categories. The first category identifies and locates damage by observing the damage-reflected Lamb waves; examples are the time-of-flight (ToF) method [17–19], embedded ultrasonic structural radar [20], and the time of difference method [21]. The second category analyzes the changes in the characteristics of Lamb waves caused by damage in the propagation path; examples are the tomography method [22] and the virtual-sensing paths method [23].

For the algorithms that focus on the damage-reflected waves, the arriving time of the Lamb waves is the key index used to locate damage. Since a Lamb wave signal takes the form of a wave packet, several methods have been developed to measure its arriving time, such as the threshold method, the correlation method, the wavelet method [24], and a cross-correlation analysis method based on a wavelet transform [25, 26]. Among these, the threshold method, which is adopted in this paper, has the advantage of simplicity. In the threshold method, a threshold value is first set on the basis of experience. Every peak whose amplitude exceeds the threshold is recorded. Depending on the magnitude of the threshold, one or more peaks may be recorded for a wave packet. If only one peak is recorded, the arriving time is the time of that peak. If more than one peak is recorded, the arriving time is the average of all recorded times. Usually, the threshold is selected so that several peaks belonging to one wave packet are recorded. The benefit of recording several peaks instead of only the strongest one is that the averaging process itself reduces noise to some extent.
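The threshold procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: the function name `arrival_time`, the local-maximum test, and the synthetic wave packet (Gaussian envelope, 300 kHz carrier, 0.1 µs sampling) are all assumptions made for the example.

```python
import numpy as np

def arrival_time(t, signal, threshold):
    """Average time of all local maxima whose amplitude exceeds `threshold`."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(signal, dtype=float)
    # Interior samples larger than the left neighbour and not smaller than the right one
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])
    idx = np.where(is_peak & (s[1:-1] > threshold))[0] + 1
    if idx.size == 0:
        return None  # no peak exceeds the threshold: no wave packet detected
    return float(t[idx].mean())

# Hypothetical wave packet centred at 100 us (values chosen only for illustration)
t = np.arange(0.0, 200e-6, 0.1e-6)
packet = np.exp(-((t - 100e-6) / 8e-6) ** 2) * np.cos(2 * np.pi * 300e3 * (t - 100e-6))

# With threshold 0.6, three peaks are recorded and their times are averaged
print(arrival_time(t, packet, threshold=0.6))
```

Lowering the threshold records more peaks per packet, which strengthens the averaging effect but also raises the chance that a noise-induced peak is recorded.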

2.2. Time of Flight Method

The ToF, defined as the time lag between the moment when a sensor catches the incident signal and the moment when the same sensor catches the damage-reflected signal, is widely used to locate damage [17–19].

Consider a sensor network consisting of $N$ piezoelectric wafers denoted by $P_i$ ($i = 1, 2, \ldots, N$). For convenience of discussion, $P_i$-$P_j$ hereinafter stands for the sensing path in which $P_i$ serves as the actuator and $P_j$ as the sensor. The center of the damage, if any, is presumed to be at $(x_D, y_D)$ in the coordinate system. Then, the ToF $T_{i-j}$ can be defined as

$$T_{i-j} = \left(\frac{L_{A-D}}{V_{S_0}} + \frac{L_{D-S}}{V_{SH_0}}\right) - \frac{L_{A-S}}{V_{S_0}}, \tag{2.3}$$

in which $L_{A-D}$, $L_{D-S}$, and $L_{A-S}$ represent the distance from the actuator $P_i$ to the damage, from the damage to the sensor $P_j$, and from the actuator $P_i$ to the sensor $P_j$, respectively. $V_{SH_0}$ and $V_{S_0}$ are the velocities of the damage-converted $SH_0$ mode and the incident $S_0$ mode, respectively.

Because (2.3) contains two unknown damage parameters, the coordinates of the damage center, its solution is a locus that indicates the possible locations of the damage for a given ToF value. In traditional approaches, the damage location is found by seeking the intersections of two or more loci. As shown in Figure 1(a), three sensor pairs give three loci, each corresponding to the time delay caused by the damage. The point at which all three loci intersect is taken as the damage location, while points at which only two loci intersect are regarded as pseudodamage locations.
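A locus of this kind can be traced numerically by evaluating the ToF at every candidate damage position on a grid and keeping the nodes that match the measured value. The sketch below assumes the ToF is the incident leg plus the scattered leg minus the direct actuator-sensor path; the sensor layout, the scattered-mode velocity `v_sc`, and the 0.2 µs tolerance are hypothetical values chosen for illustration.

```python
import numpy as np

def tof(xd, yd, actuator, sensor, v_inc, v_sc):
    """ToF for damage at (xd, yd): incident leg at v_inc, scattered leg at v_sc,
    minus the direct actuator-to-sensor travel time."""
    l_ad = np.hypot(xd - actuator[0], yd - actuator[1])
    l_ds = np.hypot(xd - sensor[0], yd - sensor[1])
    l_as = np.hypot(sensor[0] - actuator[0], sensor[1] - actuator[1])
    return l_ad / v_inc + l_ds / v_sc - l_as / v_inc

# Hypothetical layout (metres) and velocities (m/s)
actuator, sensor = (0.1, 0.3), (0.5, 0.3)
v_inc, v_sc = 5159.5, 3100.0
damage = (0.2, 0.2)
measured = tof(*damage, actuator, sensor, v_inc, v_sc)

# Evaluate the ToF on a 2 mm grid and keep nodes within 0.2 us of the measured value
xs, ys = np.meshgrid(np.linspace(0, 0.6, 301), np.linspace(0, 0.6, 301))
residual = np.abs(tof(xs, ys, actuator, sensor, v_inc, v_sc) - measured)
on_locus = residual < 0.2e-6

print(on_locus.sum(), "grid nodes lie on the locus")
```

The locus passes through the true damage position and, by symmetry about the actuator-sensor line, through its mirror image, which is why at least two sensing paths are needed to resolve the location.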

The traditional approach has a prerequisite: all measured ToF values must be accurate. However, errors are inevitable in any experimental result because of noise and other factors. Therefore, as shown in Figure 1(b), there is no point at which all three loci intersect if the loci are drawn from noise-contaminated ToF values instead of the theoretical values. It has been suggested that the damage can instead be located in the area where the density of pairwise intersections of loci is relatively high. This leads to research on the probability-based approach, which identifies the damaged area from the density of intersections.

2.3. Probability-Based ToF Method

The concept of the probability-based approach was introduced by Zhao et al. [27] to improve the performance of Lamb wave-based methods, and it was then adopted by Su et al. [28] in the ToF method. In the traditional ToF method, only the points on the loci are considered possible damage locations; all other points, regardless of their distance to the loci, are excluded. In fact, because of errors in the measured ToF, the real damage may not lie on the loci drawn from the measured values. Therefore, in the probability-based approach, points off the loci are also considered possible damage locations, and the probability of damage occurrence at such a point is determined by its distance to the loci. The mesh nodes located exactly on a locus have the highest probability of damage presence; for the others, the greater the distance to the locus, the lower the probability that damage exists there. To quantify the probabilities at all nodes with regard to all loci, a function called the p.d.f. of damage occurrence is introduced. For each locus, a probability distribution map of the plate structure under inspection can be produced from the p.d.f. of damage occurrence. Combining all the probability distribution maps gives the final damage detection result.

The main framework of the data fusion-based method can be divided into two steps.
(1) The inspection area of the structure is evenly meshed. For a given measured ToF, each mesh node is evaluated for the possibility of the presence of damage by using a probability density function.
(2) The evaluated results for all measured ToF values are combined to give the detection result in matrix form. Each element of the matrix represents the probability of the presence of damage at one mesh node.

The detection result in matrix form can be illustrated in an image shown in Figure 2, where the lighter the greyscale, the greater the possibility of damage existing at that pixel (each pixel exclusively corresponds to a spatial point of the structure under inspection).

It is obvious that the p.d.f. of damage occurrence is the key part of the probability-based method. Su et al. [10] suggest that the p.d.f. can be quantified in relation to the loci:

$$f(z_{ij}) = \frac{1}{\sigma_{ij}\sqrt{2\pi}} \exp\left(-\frac{z_{ij}^{2}}{2\sigma_{ij}^{2}}\right),$$

where $f(z_{ij})$ is the Gaussian distribution function, representing the p.d.f. of damage occurrence at node $i$ ($i = 1, \ldots, K$ for a structure comprising $K$ mesh nodes), perceived by sensor $S_j$ ($j = 1, \ldots, N$ for a sensor network consisting of $N$ sensors). $\sigma_{ij}$ is the standard deviation, and $z_{ij} = \lVert \chi_i - \chi_j \rVert$, where $\chi_i$ is the location vector of node $i$ and $\chi_j$ is the location vector of the point on the locus provided by sensor $S_j$ that has the shortest distance to node $i$.
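As a rough illustration of how such a p.d.f. turns loci into a damage map, the sketch below evaluates a Gaussian of the shortest node-to-locus distance at every mesh node and fuses two maps. Several things here are assumptions made only for the example: the loci are stand-in circles rather than true ToF loci, the standard deviation is an arbitrary 10 mm, and the maps are fused by an element-wise product, which is one possible fusion rule among several.

```python
import numpy as np

def damage_pdf_map(nodes, locus, sigma):
    """Gaussian p.d.f. of the shortest node-to-locus distance, for every mesh node."""
    # Pairwise distances between nodes (M, 2) and sampled locus points (P, 2)
    d = np.linalg.norm(nodes[:, None, :] - locus[None, :, :], axis=2)
    z = d.min(axis=1)
    return np.exp(-z**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical 61 x 61 mesh over a 0.6 m x 0.6 m plate (10 mm spacing)
gx, gy = np.meshgrid(np.linspace(0, 0.6, 61), np.linspace(0, 0.6, 61))
nodes = np.column_stack([gx.ravel(), gy.ravel()])

# Two stand-in loci (circles) that intersect at (0.2, 0.2) and (0.2, 0.4)
ang = np.linspace(0, 2 * np.pi, 400, endpoint=False)
r_a = 0.1 * np.sqrt(2)
locus_a = np.column_stack([0.3 + r_a * np.cos(ang), 0.3 + r_a * np.sin(ang)])
locus_b = np.column_stack([0.2 + 0.1 * np.cos(ang), 0.3 + 0.1 * np.sin(ang)])

sigma = 0.01  # assumed standard deviation (m), chosen by hand for this example
fused = damage_pdf_map(nodes, locus_a, sigma) * damage_pdf_map(nodes, locus_b, sigma)
best = nodes[np.argmax(fused)]
print("most probable node:", best)
```

The fused map peaks at an intersection of the two loci. With only two loci the two intersections are equally likely, so a third sensing path is needed to single out one of them.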

Satisfactory results have been obtained with this kind of p.d.f. It should be noted, however, that the standard deviation was selected on the basis of experience.

The concept of the probability-based approach has also been adopted in Lamb wave-based damage detection methods other than the ToF method. Wang et al. [23] combined it with the virtual-sensing paths method. The p.d.f. in their work is an empirical formula whose parameters were chosen by experience.

There are two main disadvantages in the existing work. First, an empirical formula is usually simpler to write down and faster to compute, but it depends heavily on the experimental environment, and any change in the experiment, which is inevitable, may cause a large error in the estimated results. That is, the empirical formula trades robustness for simplicity. Since data measurement in Lamb wave-based damage detection is not time consuming, it is reasonable to estimate the density function with a robust statistical method. Second, the p.d.f. used in existing work is a distribution over damage location in the plane, relating the damage location corresponding to the actual ToF to the one corresponding to the experimental ToF. The damage location, however, cannot be measured directly in an experiment, so estimating this spatial p.d.f. directly is difficult. A better approach is to estimate the distribution of the experimental data in the time domain and then obtain the spatial p.d.f. through the mapping relationship defined in (2.3).

Therefore, probability density estimation methods will be introduced in Section 3. The advantages and feasibility of applying probability density estimation methods in ToF method will be demonstrated.

3. Probability Density Estimation

In statistics, density estimation is the method that estimates the parameters of a distribution based on the observed samples. Depending on whether a priori knowledge about the type of the distribution is required, density estimation methods can be divided into two categories: parametric estimation and nonparametric estimation.

3.1. Parametric Estimation

Parametric estimation mainly includes point estimation and interval estimation. Point estimation uses sample data to calculate a single value as the best estimate of an unknown population parameter, in contrast to interval estimation, which yields a range of plausible values. The most commonly used point estimation methods are the method of moments, maximum likelihood estimation, and Bayesian estimation. For instance, if the sample data $x_1, \ldots, x_n$ are known to come from a normal distribution, then its two parameters, the expectation and the variance, can be calculated by using (3.1) and (3.2), which are derived by maximum likelihood estimation:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \tag{3.1}$$

$$\hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{\mu}\right)^{2}, \tag{3.2}$$

where $n$ is the number of samples.
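The maximum likelihood estimates for a normal distribution are simply the sample mean and the sample variance with divisor $n$. A minimal sketch follows; the function name `normal_mle` and the four sample values are made up for the example.

```python
import numpy as np

def normal_mle(samples):
    """Maximum likelihood estimates of the mean and variance of a normal distribution."""
    x = np.asarray(samples, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()  # divides by n (MLE), not by n - 1
    return mu, var

mu_hat, var_hat = normal_mle([1.0, 2.0, 3.0, 4.0])
print(mu_hat, var_hat)  # 2.5 1.25
```

Note that the divisor $n$ makes the variance estimate slightly biased for small samples; the unbiased estimator divides by $n - 1$ instead.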

3.2. Nonparametric Estimation

Nonparametric estimation estimates an unknown distribution without relying on assumptions about its type. Common nonparametric estimation methods include the histogram, nonparametric regression, and kernel density estimation, the last being the most popular.

3.2.1. Kernel Density Estimation Based on the Gaussian Approximation

Kernel density estimation is a nonparametric method to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite-data sample. In some fields such as signal processing and econometrics, kernel density estimation was also termed as the Parzen-Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating this method in its current form [29, 30].

Let $(x_1, x_2, \ldots, x_n)$ be an independent and identically distributed sample drawn from some distribution with an unknown density $f$. We are interested in estimating the shape of this function. Its kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right),$$

where $K$ is the kernel, a symmetric but not necessarily positive function that integrates to one, and $h > 0$ is a smoothing parameter called the bandwidth. A kernel with subscript $h$ is called the scaled kernel and is defined as $K_h(x) = (1/h)K(x/h)$. A range of kernel functions is commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. As with kernel regression, the choice of kernel function is not crucial, but the choice of bandwidth is important.

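The bandwidth of the kernel is a free parameter that exhibits a strong influence on the resulting estimate [31, 32]. The most common optimality criterion used to select this parameter is the expected risk function, also termed the mean integrated squared error (MISE):

$$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}_h(x) - f(x)\right)^{2} dx\right].$$

Under weak assumptions on $f$ and $K$ [29, 30],

$$\mathrm{MISE}(h) = \mathrm{AMISE}(h) + o\left(\frac{1}{nh} + h^{4}\right),$$

where $o$ is the little-$o$ notation of the family of Bachmann-Landau notations; $o(g)$ denotes the family of functions that grow much more slowly than $g$ [33]. The AMISE is the asymptotic MISE, which consists of the two leading terms

$$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4} m_2(K)^{2} h^{4} R(f''),$$

where $R(g) = \int g(x)^{2}\,dx$ for a function $g$, $m_2(K) = \int x^{2} K(x)\,dx$, and $f''$ is the second derivative of $f$. The minimum of this AMISE is the solution to the differential equation

$$\frac{\partial}{\partial h}\mathrm{AMISE}(h) = -\frac{R(K)}{nh^{2}} + m_2(K)^{2} h^{3} R(f'') = 0,$$

or

$$h_{\mathrm{AMISE}} = \frac{R(K)^{1/5}}{m_2(K)^{2/5}\, R(f'')^{1/5}\, n^{1/5}}.$$

Neither the AMISE nor the $h_{\mathrm{AMISE}}$ formula can be used directly, since they involve the unknown density function $f$ or its second derivative $f''$. Therefore, a variety of automatic, data-based methods have been developed for selecting the bandwidth.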

If the kernel function is normal and the distribution being estimated is assumed to be Gaussian, then it can be derived from (3.7) that the optimal choice for $h$ is

$$h = \left(\frac{4\hat{\sigma}^{5}}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\,n^{-1/5},$$

where $\hat{\sigma}$ is the standard deviation of the samples. This approximation is termed the normal distribution approximation, Gaussian approximation, or Silverman’s rule of thumb [32].
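For concreteness, a Gaussian kernel density estimator with the rule-of-thumb bandwidth $h = (4\hat{\sigma}^{5}/(3n))^{1/5}$ can be written directly with NumPy. This is only a sketch: the function names are hypothetical, the sample standard deviation uses the $n - 1$ divisor (an assumption, since the text does not specify it), and the 30 samples are synthetic standard normal draws.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth h = (4 * sigma^5 / (3 * n))^(1/5)."""
    x = np.asarray(x, dtype=float)
    return (4.0 * x.std(ddof=1) ** 5 / (3.0 * x.size)) ** 0.2

def gaussian_kde(x_eval, samples, h):
    """Kernel density estimate with a normal kernel and bandwidth h."""
    samples = np.asarray(samples, dtype=float)
    u = (np.asarray(x_eval, dtype=float)[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (samples.size * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
samples = rng.normal(size=30)          # synthetic data, for illustration only
h = silverman_bandwidth(samples)

grid = np.linspace(-6.0, 6.0, 1201)
dens = gaussian_kde(grid, samples, h)
integral = dens.sum() * (grid[1] - grid[0])   # should be close to 1
print(h, integral)
```

Because this bandwidth is derived under a normality assumption, it tends to oversmooth multimodal data, which is the motivation for the plug-in selector discussed in the next subsection.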

3.2.2. Kernel Density Estimation via Diffusion

Kernel density estimation is an ongoing research topic in statistics. Botev et al. [34] proposed an adaptive kernel density estimation method based on the smoothing properties of linear diffusion processes. This approach has two parts: first, a simple and intuitive kernel estimator with substantially reduced asymptotic bias and mean square error and better boundary bias performance; second, an improved plug-in bandwidth selection method that completely avoids the Gaussian approximation. The new plug-in method is thus genuinely “nonparametric,” since it does not require a preliminary normal model for the data.

(I) The Diffusion Estimator
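Given $N$ independent realizations $\mathcal{X}_N \equiv \{X_1, \ldots, X_N\}$ from an unknown continuous p.d.f. $f$ on $\mathcal{X}$, the Gaussian kernel density estimator is defined as

$$\hat{f}(x; t) = \frac{1}{N}\sum_{i=1}^{N} \phi(x, X_i; t), \quad x \in \mathbb{R}, \tag{3.9}$$

where

$$\phi(x, X_i; t) = \frac{1}{\sqrt{2\pi t}}\, e^{-(x - X_i)^{2}/(2t)}$$

is a Gaussian p.d.f. (kernel) with location $X_i$ and scale $\sqrt{t}$. The scale is the bandwidth in kernel density estimation.
Chaudhuri and Marron [35] found a link between the Gaussian kernel density estimator and the well-known Fourier heat equation, which is a diffusion partial differential equation (PDE): the Gaussian kernel density estimator defined in (3.9) is the unique solution to the Fourier heat equation

$$\frac{\partial}{\partial t}\hat{f}(x; t) = \frac{1}{2}\frac{\partial^{2}}{\partial x^{2}}\hat{f}(x; t), \quad x \in \mathcal{X},\ t > 0, \tag{3.11}$$

with $\mathcal{X} \equiv \mathbb{R}$ and initial condition $\hat{f}(x; 0) = \Delta(x)$, where $\Delta(x) = \frac{1}{N}\sum_{i=1}^{N}\delta(x - X_i)$ is the empirical density of the data and $\delta(x - X_i)$ is the Dirac measure at $X_i$. In the heat equation interpretation, the Gaussian kernel in (3.9) is the so-called Green’s function [36] for the diffusion PDE in (3.11). Thus, the Gaussian kernel density estimator can be obtained by evolving the solution of (3.11) up to time $t$.
Because any bounded domain can be mapped onto $[0, 1]$ by a linear transformation, there is no loss of generality in assuming that the domain of the data is $\mathcal{X} \equiv [0, 1]$. Then, the analytical solution of PDE (3.11) with initial condition $\Delta(x)$ and the Neumann boundary condition is

$$\hat{f}(x; t) = \frac{1}{N}\sum_{i=1}^{N} \kappa(x, X_i; t), \quad x \in [0, 1], \tag{3.12}$$

where the kernel $\kappa$ is given by

$$\kappa(x, X_i; t) = \sum_{k=-\infty}^{\infty}\left[\phi(x, 2k + X_i; t) + \phi(x, 2k - X_i; t)\right], \quad x \in [0, 1].$$

The Neumann boundary condition is

$$\left.\frac{\partial}{\partial x}\hat{f}(x; t)\right|_{x=0} = \left.\frac{\partial}{\partial x}\hat{f}(x; t)\right|_{x=1} = 0,$$

and its purpose is to ensure that (3.12) satisfies the requirements of a p.d.f., namely, that $\hat{f}(\cdot\,; t)$ is a nonnegative Lebesgue-integrable function that integrates to unity.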
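The effect of the Neumann boundary condition can be checked numerically. The sketch below assumes the bounded-domain kernel is the sum of Gaussian images reflected about the boundaries, $\kappa(x, X_i; t) = \sum_k [\phi(x, 2k + X_i; t) + \phi(x, 2k - X_i; t)]$, as given by Botev et al. [34], and truncates the infinite sum to a few images. The function names, the five sample values, and $t = 0.01$ are hypothetical.

```python
import numpy as np

def gauss(x, loc, t):
    """Gaussian p.d.f. with location `loc` and variance `t`."""
    return np.exp(-(x - loc) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

def reflected_kde(x_eval, samples, t, n_images=5):
    """Heat-equation estimator on [0, 1] with Neumann boundaries (truncated image sum)."""
    x = np.asarray(x_eval, dtype=float)[:, None]
    s = np.asarray(samples, dtype=float)[None, :]
    total = np.zeros((x.shape[0], s.shape[1]))
    for k in range(-n_images, n_images + 1):
        total += gauss(x, 2 * k + s, t) + gauss(x, 2 * k - s, t)
    return total.mean(axis=1)

samples = [0.02, 0.05, 0.10, 0.40, 0.90]   # made-up data clustered near the boundary
grid = np.linspace(0.0, 1.0, 2001)
dens = reflected_kde(grid, samples, t=0.01)

# Trapezoidal rule: all probability mass stays inside [0, 1]
dx = grid[1] - grid[0]
integral = dx * (dens.sum() - 0.5 * (dens[0] + dens[-1]))
print(integral)
```

A plain Gaussian estimator would place part of the mass of the samples near 0 outside the domain; the reflected images fold that mass back, which is where the improved boundary behavior comes from.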
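It has been proved that the estimator given in (3.12), which arises as the solution of the diffusion PDE, has better boundary bias properties than the traditional estimator given in (3.9).
Therefore, motivated by the idea of obtaining the estimator from the solution of a diffusion PDE, Botev proposed that the most general linear time-homogeneous diffusion PDE can be a starting point for the construction of a better kernel density estimator. The simple diffusion model described in (3.11) can be extended on the basis of the smoothing properties of the linear diffusion PDE:

$$\frac{\partial}{\partial t} g(x; t) = L\, g(x; t), \quad x \in \mathcal{X},\ t > 0, \tag{3.15}$$

where the linear differential operator $L$ is of the form $\frac{1}{2}\frac{d}{dx}\left(a(x)\frac{d}{dx}\left(\frac{\cdot}{p(x)}\right)\right)$, $a$ and $p$ can be arbitrary positive functions on $\mathcal{X}$ with bounded second derivatives, and the initial condition is $g(x; 0) = \Delta(x)$.
The solution of (3.15) is the diffusion kernel estimator, written as

$$g(x; t) = \frac{1}{N}\sum_{i=1}^{N} \kappa(x, X_i; t). \tag{3.16}$$

There is no analytical expression for the diffusion kernel $\kappa$ in (3.16), but $\kappa$ can be written in terms of a generalized Fourier series in the case that $\mathcal{X}$ is bounded:

$$\kappa(x, y; t) = p(x)\sum_{k=0}^{\infty} e^{\lambda_k t}\,\varphi_k(x)\,\varphi_k(y), \quad x, y \in [0, 1],$$

where $\{\varphi_k\}$ and $\{\lambda_k\}$ are the eigenfunctions and eigenvalues of the Sturm-Liouville problem on $[0, 1]$:

$$L^{*}\varphi_k = \lambda_k \varphi_k, \quad k = 0, 1, 2, \ldots, \qquad \varphi_k'(0) = \varphi_k'(1) = 0,$$

where $L^{*}$ is of the form $\frac{1}{2p(y)}\frac{\partial}{\partial y}\left(a(y)\frac{\partial}{\partial y}(\cdot)\right)$; that is, $L^{*}$ is the adjoint operator of $L$.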

(II) Improved Plug-In Bandwidth Selection Method
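The plug-in bandwidth selection method for the diffusion estimator defined in (3.16), proposed by Botev, builds on the improved plug-in bandwidth selection method for the Gaussian kernel density estimator defined in (3.9).
Assuming that $f''$ is a continuous square-integrable function, the asymptotically optimal value of $t$ for the Gaussian kernel density estimator is the minimizer of the first-order asymptotic approximation of MISE [37]:

$$t^{*} = \left(2N\sqrt{\pi}\,\lVert f''\rVert^{2}\right)^{-2/5}. \tag{3.19}$$

It is clear from (3.19) that to compute the optimal $t^{*}$, one needs to estimate the functional $\lVert f''\rVert^{2}$. Consider the problem of estimating $\lVert f^{(j)}\rVert^{2}$ for an arbitrary integer $j \ge 1$. The identity $\lVert f^{(j)}\rVert^{2} = (-1)^{j} E_f\left[f^{(2j)}(X)\right]$ suggests two plug-in estimators:

$$(-1)^{j}\,\widehat{E_f f^{(2j)}} := \frac{(-1)^{j}}{N}\sum_{k=1}^{N}\hat{f}^{(2j)}(X_k; t_j) = \frac{(-1)^{j}}{N^{2}}\sum_{k=1}^{N}\sum_{m=1}^{N}\phi^{(2j)}(X_k, X_m; t_j),$$

$$\widehat{\lVert f^{(j)}\rVert^{2}} := \lVert \hat{f}^{(j)}(\cdot\,; t_j)\rVert^{2} = \frac{(-1)^{j}}{N^{2}}\sum_{k=1}^{N}\sum_{m=1}^{N}\phi^{(2j)}(X_k, X_m; 2t_j).$$

For a given bandwidth, both estimators aim to estimate the same quantity $\lVert f^{(j)}\rVert^{2}$. Therefore, $t_j$ can be selected to make both estimators asymptotically equivalent in the mean square error sense:

$$t_j^{*} = \left(\frac{1 + 2^{-(j + 1/2)}}{3}\cdot\frac{1\times 3\times 5\times\cdots\times(2j - 1)}{N\sqrt{\pi/2}\,\lVert f^{(j+1)}\rVert^{2}}\right)^{2/(3 + 2j)}.$$

Computation of $t_j^{*}$ by using (3.21) involves $\lVert f^{(j+1)}\rVert^{2}$, which is unknown. Thus, each $t_j^{*}$ is estimated by

$$\hat{t}_j^{*} = \left(\frac{1 + 2^{-(j + 1/2)}}{3}\cdot\frac{1\times 3\times 5\times\cdots\times(2j - 1)}{N\sqrt{\pi/2}\,\widehat{\lVert f^{(j+1)}\rVert^{2}}}\right)^{2/(3 + 2j)}.$$

Computation of $\widehat{\lVert f^{(j+1)}\rVert^{2}}$ requires the estimation of $t_{j+1}^{*}$, which in turn requires the estimation of $t_{j+2}^{*}$, and so on, as seen from (3.20) and (3.22). One thus faces the problem of estimating the infinite sequence $\{t_{j+k}^{*},\ k \ge 1\}$. However, if $t_{l+1}^{*}$ is given for some $l > 0$, then all $\{t_j^{*},\ 1 \le j \le l\}$ can be estimated recursively. Based on this idea, the $l$-stage direct plug-in bandwidth selector [37] has been proposed.
Denote the functional dependence of $\hat{t}_j^{*}$ on $\hat{t}_{j+1}^{*}$ as $\hat{t}_j^{*} = \gamma_j(\hat{t}_{j+1}^{*})$. It is then obvious that $\hat{t}_j^{*} = \gamma_j(\gamma_{j+1}(\hat{t}_{j+2}^{*})) = \cdots$. For simplicity of notation, the composition can be defined as

$$\gamma^{[k]}(t) = \gamma_1(\cdots\gamma_{k-1}(\gamma_k(t))\cdots), \quad k \ge 1.$$

The estimate of $t^{*}$ satisfies

$$\hat{t}^{*} = \xi\,\hat{t}_1^{*} = \xi\gamma^{[1]}(\hat{t}_2^{*}) = \cdots = \xi\gamma^{[l]}(\hat{t}_{l+1}^{*}), \qquad \xi = \left(\frac{6\sqrt{2} - 3}{7}\right)^{2/5} \approx 0.907.$$

Then, for a given integer $l > 0$, the $l$-stage direct plug-in bandwidth selector consists of computing $\hat{t}^{*} = \xi\gamma^{[l]}(t_{l+1}^{*})$, where $t_{l+1}^{*}$ is estimated by assuming that $f$ in $\lVert f^{(l+2)}\rVert^{2}$ is a normal density with mean and variance estimated from the data.
The normality assumption in the $l$-stage direct plug-in bandwidth selector can lead to arbitrarily bad estimates of $t^{*}$ when, for example, the true $f$ is far from Gaussian. Therefore, Botev proposed to find a solution to the nonlinear equation

$$t = \xi\gamma^{[l]}(t)$$

for some $l$, using either fixed-point iteration or Newton’s method with initial guess $t = 0$. The fixed-point iteration version is formalized in the following Improved Sheather-Jones algorithm:
(1) Given $l > 2$, initialize with $z_0 = \varepsilon$, where $\varepsilon$ is machine precision, and $n = 0$;
(2) Set $z_{n+1} = \xi\gamma^{[l]}(z_n)$;
(3) If $\lvert z_{n+1} - z_n\rvert < \varepsilon$, stop and set $\hat{t}^{*} = z_{n+1}$; otherwise, set $n := n + 1$ and repeat from step (2);
(4) Deliver the Gaussian kernel density estimator in (3.9) evaluated at $\hat{t}^{*}$ as the final estimator of $f$, and $\hat{t}_2^{*} = \gamma^{[l-1]}(z_{n+1})$ as the bandwidth for the optimal estimation of $\lVert f''\rVert^{2}$.
The recommended setting for $l$ is 5.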
The above section explains how to estimate the bandwidth of the Gaussian kernel density estimator. Now, the algorithm that estimates the bandwidth of the diffusion estimator will be introduced.
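Assuming that $f$ is as many times continuously differentiable as needed, it has been proved that the square of the asymptotically optimal bandwidth is

$$t^{*} = \left(\frac{E_f\left[\sigma^{-1}(X)\right]}{2N\sqrt{\pi}\,\lVert Lg\rVert^{2}}\right)^{2/5}, \qquad \sigma^{2}(x) = \frac{a(x)}{p(x)}. \tag{3.28}$$

Computation of $t^{*}$ in (3.28) requires an estimate of $\lVert Lg\rVert^{2}$ and of $E_f[\sigma^{-1}(X)]$. The latter can be estimated via the unbiased estimator $\frac{1}{N}\sum_{i=1}^{N}\sigma^{-1}(X_i)$. The identity $\lVert Lg\rVert^{2} = E_f\left[L^{*}Lg(X)\right]$ suggests two possible estimators. The first one is

$$\widehat{E_f L^{*}Lg} := \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\left. L^{*}L\,\kappa(x, X_i; t_2)\right|_{x = X_j}.$$

The second one is

$$\widehat{\lVert Lg\rVert^{2}} := \lVert Lg(\cdot\,; t_2)\rVert^{2} = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\left. L^{*}L\,\kappa(x, X_i; 2t_2)\right|_{x = X_j}. \tag{3.30}$$

Just as $t_j^{*}$ is derived for the Gaussian kernel density estimator, $t_2$ is selected so that both estimators have the same asymptotic mean square error, which gives the bandwidth $t_2^{*}$ of (3.31). Note that $t_2^{*}$ has the same rate of convergence to 0 as in the Gaussian case. In fact, since the Gaussian kernel density estimator is a special case of the diffusion estimator when $p(x) = a(x) = 1$, the plug-in estimator (3.30) for the estimation of $\lVert Lg\rVert^{2}$ reduces to the plug-in estimator for the estimation of $\frac{1}{4}\lVert f''\rVert^{2}$. In addition, the $t_2^{*}$ in (3.31) and the $t_2^{*}$ of the Gaussian kernel density estimator are identical when $p(x) = a(x) = 1$. Thus, the bandwidth for the diffusion estimator given in (3.16) can be selected by using the following algorithm:
(1) Given the data $X_1, \ldots, X_N$, run the Improved Sheather-Jones algorithm to obtain the Gaussian kernel density estimator defined in (3.9) evaluated at $\hat{t}^{*}$ and the optimal bandwidth $\sqrt{\hat{t}_2^{*}}$ for the estimation of $\lVert f''\rVert^{2}$. This is the pilot estimation step.
(2) Let $p(x)$ be the Gaussian kernel estimator from the above step, and let $a(x) = p^{\alpha}(x)$ for some $\alpha \in [0, 1]$.
(3) Estimate $\lVert Lg\rVert^{2}$ via the plug-in estimator given in (3.30) using $\hat{t}_2^{*}$.
(4) Substitute the estimate of $\lVert Lg\rVert^{2}$ into (3.28) to obtain an estimate for $t^{*}$.
(5) Deliver the diffusion estimator in (3.16) evaluated at $\hat{t}^{*}$ as the final density estimate.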
The flow chart of the entire bandwidth selection algorithm is shown in Figure 3.

4. Numerical Simulation

The feasibility of using kernel density estimation to estimate the p.d.f. of experimental results was demonstrated on a thin plate structure via finite-element (FE) simulation. Eight PZT wafers were surface-mounted on an aluminium plate. The plate measured 600 mm × 600 mm × 1.5 mm and was supported along all four edges. The elastic modulus, Poisson’s ratio, and density of the aluminium are 71 GPa, 0.35, and 2711 kg/m³, respectively. The thin plate was modeled in three dimensions using eight-node brick solid elements. To ensure simulation precision, the largest dimension of the FE elements was less than 1 mm and the plate was divided into multiple layers through the thickness, guaranteeing that at least ten elements were allocated per wavelength of the incident diagnostic wave, which has been shown to be sufficient to portray the characteristics of elastic waves in a thin plate [19]. A through-thickness hole 16 mm in diameter was assumed in the plate, 200 mm from both the left and the lower edges (Figure 4). The $S_0$ mode of Lamb waves was used to detect damage. Five-cycle Hanning window-modulated sinusoidal tone bursts at a central frequency of 300 kHz were used as the incident diagnostic wave signal. The speed of the $S_0$ mode is 5159.5 m/s in this simulation.
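The excitation signal described above, a five-cycle Hanning window-modulated tone burst at 300 kHz, can be generated as follows. This is a sketch only: the function name and the 30 MHz sampling rate are assumptions and do not reflect the time step used in the FE model.

```python
import numpy as np

def hann_tone_burst(fc, n_cycles, fs):
    """Sinusoidal tone burst of `n_cycles` at centre frequency `fc`, Hann-windowed."""
    duration = n_cycles / fc
    t = np.arange(0.0, duration, 1.0 / fs)
    window = 0.5 * (1.0 - np.cos(2 * np.pi * t / duration))
    return t, window * np.sin(2 * np.pi * fc * t)

# 300 kHz, five cycles; the 30 MHz sampling rate is an assumed value
t, burst = hann_tone_burst(fc=300e3, n_cycles=5, fs=30e6)
print(t.size, np.abs(burst).max())
```

The window confines the excitation energy to a narrow band around the central frequency, which limits the spreading of the wave packet caused by dispersion.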

Gaussian noise is statistical noise whose probability density function is that of the normal distribution, also known as the Gaussian distribution. A special case is white Gaussian noise, in which the values at any pair of times are statistically independent (and uncorrelated). It is well known that noise from many natural sources is Gaussian. Therefore, to simulate environmental noise, white Gaussian noise at three signal-to-noise ratio (SNR) levels (20 dB, 30 dB, and 40 dB) was added to the numerically simulated Lamb wave signals.
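Adding white Gaussian noise at a prescribed SNR amounts to scaling the noise variance to the signal power. The sketch below assumes the SNR is defined as the ratio of mean signal power to noise power over the whole record; the function name `add_awgn` and the test sinusoid are hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that mean signal power / noise power = snr_db (in dB)."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal**2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 300e3 * np.arange(0, 1e-3, 1e-7))  # stand-in signal
noisy = add_awgn(clean, snr_db=20, rng=rng)

# Check the SNR actually achieved by this noise realization
measured_snr = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))
print(measured_snr)
```

Repeating the call with fresh random draws gives independent noisy realizations of the same signal, which is how repeated ToF samples at one noise level can be produced.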

In the numerical model, four sensor pairs are used to locate the damage: s2-6, formed by sensors 2 and 6; s4-8, by sensors 4 and 8; s3-7, by sensors 3 and 7; and s3-5, by sensors 3 and 5. The process of adding the three levels of white Gaussian noise to the signals captured by the four sensor pairs was repeated 30 times, so there are 30 ToF results for each sensor pair under each noise level.

5. Results and Discussion

5.1. The Characteristics of Noise-Induced Error in ToF

In theory, nonparametric estimation methods can be expected to perform better than the parametric estimation method when no a priori knowledge about the type of the distribution is available. The advantage of kernel density estimation is demonstrated in this paper by estimating the p.d.f. of the ToF measured by s4-8. In statistics, the performance of density estimation methods is usually verified by comparing the estimation results with the true p.d.f. of some well-known datasets. That is, to show the accuracy of the estimation results, one needs to know the real p.d.f. of the distribution being estimated. It is difficult to give an analytical expression for the p.d.f. of the ToF measured by the threshold method. However, a partial understanding of the characteristics of the noise-induced error in ToF can still be obtained by analyzing the process of the threshold method, which helps to demonstrate the advantage of nonparametric estimation in the ToF method.

ToF is obtained by comparing the arriving times of the incident waves and the damage-scattered waves. Since the incident waves are strong, the errors in their arriving time can be neglected. The errors in ToF are therefore considered to be caused entirely by the errors in the arriving time of the damage-scattered waves.

As mentioned in Section 2.1, the existence of a wave packet is determined by whether the amplitude of the signal exceeds the threshold value. Once a wave packet is detected, the arriving time of the entire wave packet is given by the times of the recorded peaks. This process suggests that there are two kinds of noise-induced error in ToF: the first is the variation in the arriving time of a single peak, and the second is the error caused by misidentification of peaks. While the first kind is easy to understand, the second is relatively complex. The signal received by s4-8, shown in Figure 5, is taken as an example to explain the second kind. Noise can change not only the times of the peaks but also their relative magnitudes, so the ranking of the peaks by magnitude may be altered. If there were no noise and the arriving time were measured by recording the strongest peak, then the second peak of the damage-scattered waves shown in Figure 5 would be recorded. In noise-contaminated signals, however, the strongest peak may become another one, such as the third or the fourth peak. The same problem exists when several peaks are recorded. For example, if there is no noise and the arriving time is measured as the average of four peaks, then the four strongest peaks (the second, the third, the fourth, and the fifth in this case) should be recorded. In noise-contaminated signals, however, the first peak is likely to become stronger than the fifth peak and to be recorded in its place, which leads to an error in ToF.

The error caused by misidentification of peaks is clearly larger than the error in the arrival time of a single peak, but it appears only in a strong-noise environment.
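The two kinds of error can be illustrated with a small simulation. The sketch below is not the numerical model used in this paper: the Hann-windowed tone burst, the centre frequency, the sampling interval, the onset time, and the noise levels are all hypothetical values chosen for illustration, and the arrival time is simply taken as the time of the strongest sample.

```python
import math
import random

# All constants below are hypothetical illustration values, not taken from the paper.
FREQ = 100e3      # assumed centre frequency of the burst, Hz
CYCLES = 5        # assumed number of cycles in the burst
DT = 0.2e-6       # assumed sampling interval, s
T0 = 20e-6        # assumed onset time of the damage-scattered packet, s
N = 500           # samples per record (100 microseconds in total)


def tone_burst(t):
    """Hann-windowed sine burst that starts at T0 (assumed packet shape)."""
    duration = CYCLES / FREQ
    if not (T0 <= t <= T0 + duration):
        return 0.0
    window = 0.5 * (1.0 - math.cos(2.0 * math.pi * (t - T0) / duration))
    return window * math.sin(2.0 * math.pi * FREQ * (t - T0))


def arrival_time(noise_std, rng):
    """Arrival time measured as the time of the strongest sample of a noisy record."""
    samples = [tone_burst(i * DT) + rng.gauss(0.0, noise_std) for i in range(N)]
    strongest = max(range(N), key=samples.__getitem__)
    return strongest * DT


def misidentified_fraction(noise_std, trials=200, seed=0):
    """Fraction of trials whose recorded time is more than half a period
    away from the noise-free strongest peak (a different peak was picked)."""
    rng = random.Random(seed)
    reference = arrival_time(0.0, rng)   # noise-free strongest peak
    half_period = 0.5 / FREQ
    times = [arrival_time(noise_std, rng) for _ in range(trials)]
    jumps = sum(abs(t - reference) > half_period for t in times)
    return jumps / trials


if __name__ == "__main__":
    for std in (0.01, 0.2):
        print(f"noise std {std}: misidentified fraction = {misidentified_fraction(std):.3f}")
```

Under these assumptions, weak noise only shifts the recorded time slightly around the noise-free peak, whereas strong noise can move the recorded time to a different peak, which produces a much larger error.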

5.2. Density Estimation Results

The parametric estimation method, the kernel density estimation based on the Gaussian approximation, and the adaptive kernel density estimation via diffusion were used to estimate the p.d.f. of ToF. The sample data are the ToF values measured by s4-8 under three levels of noise.

The estimation results for the signal with 40 dB SNR noise were shown in Figure 6. The symbol “+” in Figure 6 and in the following Figures 7, 8, and 9 was used to give an intuitive picture of the distribution of samples. Each “+” represented a sample. It could be seen that the samples were distributed around two values. Most of the samples (26 of the 30 samples) were distributed in the range from second to second. The other 4 samples were distributed in the range from second to second. The p.d.f.s given by the kernel density estimation based on the Gaussian approximation and by the adaptive kernel density estimation via diffusion were functions with two peaks. The p.d.f. given by the parametric estimation method was necessarily a normal density function. Based on the conclusion drawn in the above section about the characteristics of noise-induced errors in ToF, the distribution of samples could be easily understood. Because the noise was weak in this case, most of the samples, which were affected only by the error in the arrival time of a single peak, were distributed around the analytic value of ToF ( second). The other 4 samples, which were relatively far from the analytic value, were affected by both the single-peak error and the error caused by misidentification of peaks. Therefore, it could be learnt that both kernel density estimation methods estimated the p.d.f. of ToF correctly. Because its assumption about the type of distribution to be estimated was incorrect, the parametric estimation method was very misleading in this case.
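The contrast between a normal fit and a kernel density estimate on a two-cluster sample can be sketched as follows. This is a minimal illustration, not the estimators used in this paper: the sample is synthetic (26 values near one hypothetical ToF and 4 values near another, in arbitrary time units), the kernel bandwidth uses the normal reference rule as one common reading of a "Gaussian approximation", and the diffusion-based estimator is not implemented here.

```python
import math
import statistics


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def parametric_estimate(samples):
    """Parametric estimate: assume normality and plug in sample mean and std."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return lambda x: normal_pdf(x, mean, std)


def gaussian_kde(samples, bandwidth=None):
    """Fixed-bandwidth Gaussian kernel density estimate.

    If no bandwidth is given, the normal reference rule
    h = 1.06 * std * n^(-1/5) is used (an assumption of this sketch).
    """
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    return lambda x: sum(normal_pdf(x, s, bandwidth) for s in samples) / n


# Synthetic two-cluster sample in arbitrary time units (hypothetical values).
main_cluster = [99.0 + 2.0 * i / 25 for i in range(26)]   # 26 samples near 100
far_cluster = [109.5 + i / 3 for i in range(4)]           # 4 samples near 110
samples = main_cluster + far_cluster

if __name__ == "__main__":
    fitted_normal = parametric_estimate(samples)
    kde = gaussian_kde(samples)
    for x in (100.0, 105.0, 110.0):
        print(f"x = {x}: normal fit = {fitted_normal(x):.4f}, KDE = {kde(x):.4f}")
```

On this synthetic sample the kernel estimate dips between the two clusters and rises again at the second one, while the fitted normal density decreases monotonically away from the sample mean and so cannot represent the second cluster.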

The fact that only 4 samples were affected by both kinds of error in this case could be utilized to learn the characteristics of the error in the arrival time of a single peak. Since these 4 samples could be easily distinguished from the samples which were affected only by the single-peak error, they could be excluded from the dataset. Then, the density function was estimated with the refined dataset. The results were shown in Figure 7. It could be seen that the results of the two kernel density estimation methods were similar to a normal distribution.

The Lilliefors test was adopted to check whether the refined samples came from a normal distribution. In statistics, the Lilliefors test, named after Hubert Lilliefors, is an adaptation of the Kolmogorov-Smirnov test [38]. It is used to test the null hypothesis that data come from a normally distributed population when the null hypothesis does not specify which normal distribution; that is, it does not specify the expected value and variance of the distribution.

The calculated value from the Lilliefors test was 0.1373, which was less than the critical value of 0.1699 corresponding to the 5% significance level. The null hypothesis that the refined data came from a normally distributed population was therefore not rejected. This explained why the empirical formula given in the previous work was of the normal distribution type and why the damage detection results based on the empirical formula were satisfactory. Since the noise in the previous work [12] was weak and the data were affected only by the error in the arrival time of a single peak, the distribution was actually a normal distribution.
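The test statistic compared above with the critical value can be computed as in the following sketch. It assumes the standard definition of the Lilliefors statistic (the Kolmogorov-Smirnov distance to a normal distribution whose mean and standard deviation are estimated from the same sample). The critical value 0.1699 is the one quoted in the text and applies only to that sample size and significance level; the demonstration data are synthetic, not the ToF samples of this paper.

```python
import math
import statistics

CRITICAL_VALUE = 0.1699   # quoted in the text for the refined sample, 5% level


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lilliefors_statistic(samples):
    """Largest distance between the empirical CDF and the CDF of a normal
    distribution fitted to the same samples (mean and std estimated)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    distance = 0.0
    for i, x in enumerate(sorted(samples)):
        fitted = normal_cdf((x - mean) / std)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance, (i + 1) / n - fitted, fitted - i / n)
    return distance


# Synthetic demonstration data with 26 values each (hypothetical).
near_normal = [statistics.NormalDist().inv_cdf((i + 0.5) / 26) for i in range(26)]
two_clusters = [99.0 + 2.0 * i / 21 for i in range(22)] + [109.5 + i / 3 for i in range(4)]

if __name__ == "__main__":
    for name, data in (("near normal", near_normal), ("two clusters", two_clusters)):
        d = lilliefors_statistic(data)
        verdict = "not rejected" if d < CRITICAL_VALUE else "rejected"
        print(f"{name}: D = {d:.4f}, normality {verdict}")
```

A statistic below the critical value means that normality is not rejected at the chosen significance level; it does not prove that the data are normal.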

The estimation results for the signals with 30 dB SNR noise were shown in Figure 8. It could be seen that, as in the case of 40 dB SNR noise, the parametric estimation method failed to give a correct estimate.

The estimation results for the signals with 20 dB SNR noise were shown in Figure 9. It could be seen that, with the increase of noise level, the kernel density estimation based on the Gaussian approximation, which was the traditional kernel density estimation, failed to give a correct estimate. Only the novel and completely data-driven method, the kernel density estimation via diffusion, could give a correct estimate.

5.3. Damage Detection Results

The damage localization under the 20 dB noise environment was selected as the example to show that an accurate estimation was important for the localization result. The p.d.f. estimation results given by the three kinds of density estimation methods introduced in Section 2 were used to calculate the location of damage. The results were shown in Figures 10, 11, and 12. It could be seen that the locating process which employed the kernel density estimation via diffusion had the most accurate localization result. This indicated that an accurate estimation could ensure a better localization result.
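How an estimated p.d.f. of ToF can enter a localization step is sketched below. This is only an illustration under stated assumptions, not the consensus algorithm of this paper: the wave speed, the plate geometry, and the actuator-sensor paths are hypothetical; the ToF of a candidate point is taken as the extra travel time of the scattered path over the direct path; the per-path densities are combined by a simple product; and a Gaussian density stands in for whatever p.d.f. estimate is available.

```python
import math

WAVE_SPEED = 5000.0   # hypothetical group velocity, m/s


def predicted_tof(point, actuator, sensor):
    """Extra travel time of the path actuator -> point -> sensor
    compared with the direct path actuator -> sensor."""
    scattered = math.dist(actuator, point) + math.dist(point, sensor)
    direct = math.dist(actuator, sensor)
    return (scattered - direct) / WAVE_SPEED


def gaussian_density(mean, std):
    """Stand-in for an estimated p.d.f. of ToF on one path (assumption)."""
    def density(t):
        z = (t - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return density


def locate(paths, densities, grid):
    """Return the grid point that maximizes the product of the per-path
    densities, evaluated as a sum of logarithms to avoid underflow.
    Returns None if every grid point has zero density on some path."""
    best_point, best_score = None, -math.inf
    for point in grid:
        score = 0.0
        for (actuator, sensor), density in zip(paths, densities):
            p = density(predicted_tof(point, actuator, sensor))
            score += math.log(p) if p > 0.0 else -math.inf
        if score > best_score:
            best_point, best_score = point, score
    return best_point


if __name__ == "__main__":
    # Hypothetical 0.5 m x 0.5 m plate with transducers at three corners.
    grid = [(i * 0.05, j * 0.05) for i in range(11) for j in range(11)]
    paths = [((0.0, 0.0), (0.5, 0.0)),
             ((0.0, 0.0), (0.5, 0.5)),
             ((0.5, 0.0), (0.0, 0.5))]
    damage = (7 * 0.05, 4 * 0.05)
    densities = [gaussian_density(predicted_tof(damage, a, s), 1e-6) for a, s in paths]
    print("estimated damage location:", locate(paths, densities, grid))
```

In such a scheme a p.d.f. that places its mass at the wrong ToF values shifts the maximum of the combined score, which is one way to see why the quality of the density estimate affects the localization result.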

6. Conclusion

The characteristics of the noise-induced error in ToF data measured by the threshold method were analyzed.

The empirical formula method and the parametric estimation method presented in existing work shared the assumption that the experimental data came from a normal distribution. This assumption had been verified by real experiments and numerical simulation. The results in this paper revealed that the type of distribution of ToF data was related to the noise level. The empirical formula method and the parametric estimation method were developed in a laboratory environment where the noise was weak. It was also shown in this paper that the ToF data measured from high-SNR signals (SNR > 40 dB) were distributed normally. Therefore, the density estimation methods with the normality assumption presented in existing work can work well in a laboratory environment.

However, the signals of field experiments usually contained much stronger noise. The results in this paper showed that even for the signal with 40 dB SNR, the distribution of the measured ToF data was not a normal distribution. In this case, a nonparametric estimation method must be employed to estimate the p.d.f. correctly. Further investigation of the signals with 30 dB and 20 dB noise showed that, with increasing noise, only the kernel density estimation via diffusion, which is purely data driven, can give a satisfactory estimate.

The damage localization under the 20 dB noise environment was carried out. The parametric estimation method with the normality assumption, the kernel density estimation based on the Gaussian approximation, and the kernel density estimation via diffusion were adopted to estimate the p.d.f. of the measured data. Three different p.d.f.s were obtained by employing the above-mentioned three kinds of density estimation methods. By using each p.d.f., a damage location result can be calculated. By comparing the three damage location results, it can be seen that the accuracy of the p.d.f. estimate has a direct effect on the accuracy of the results. Applying kernel density estimation in Lamb wave-based damage detection was therefore necessary.

The noise studied in this paper was white Gaussian noise. The noise in real field experiments is much more complex. Further study is needed to reveal the characteristics of the errors in ToF data caused by noise in field experiments. However, the complex nature of noise in field experiments should not be an obstacle to the application of the kernel density estimation method; instead, it is a reason to apply this method. It was shown that, when dealing with simple noise, the kernel density estimation method introduced in this paper performed better than the empirical methods. Since the kernel density estimation method does not rely on any assumption about the distribution to be estimated, it can be expected to demonstrate a greater advantage in a complex noise environment.


This work was financially supported by the National Natural Science Foundation of China under grant no. 50905141, the Program for New Century Excellent Talents in University of China, and the NPU Foundation for Fundamental Research under grant no. NPU-FFR-JC20110258.