#### Abstract

To avoid the false alarms and missed alarms caused by sensor malfunction or failure, it is critical to diagnose faults and analyze failures of the sensor measuring system in major infrastructures. Based on the real-time monitoring of bridges and a study of the correlation probability distribution between the multiple sensors adopted in the fault diagnosis system, a clustering algorithm based on $k$-medoid is proposed, which divides sensors of the same type into $k$ clusters. Meanwhile, the value of $k$ is optimized by a specially designed evaluation function. Building on a further study of the correlation of sensors within the same cluster, this paper presents the definition of sensor validity and the corresponding calculation algorithm. The algorithm is applied to the analysis of sensor data from an actual health monitoring system. The results show that the algorithm can not only accurately measure the degree of failure and locate the malfunction in the time domain but also quantitatively evaluate the performance of sensors and eliminate diagnostic errors caused by the failure of the reference sensor.

#### 1. Introduction

In recent years, a series of catastrophic accidents involving infrastructure damage and collapse has promoted research on structural health monitoring and early warning technology for public safety [1–3]. The structural complexity of infrastructure and the diversity of influential factors mean that these monitoring systems typically rely on multisensor systems. Large permanent infrastructures such as bridges, dams, and nuclear power plants are designed with a service life of decades, which poses a severe challenge to sensor systems with a service life of only a few years. Because monitoring systems use the sensor data to evaluate structural health, it is important that the acquired data are accurate and reliable. A faulty sensor cannot perform its function properly and may instead provide false information for evaluation, leading the system to produce the wrong diagnosis. Such misdiagnoses not only severely interfere with the normal work of the monitoring system and increase the system maintenance cost but can also cause real fault information to be overlooked and eventually lead to a missed alarm for a sudden disaster [4, 5]. Accurately identifying failure data and sensor faults is therefore of great significance.

Several studies on sensor fault diagnosis have been published. There are two main approaches: analytical redundancy and hardware redundancy. The analytical redundancy approach uses a mathematical model relating the sensor output to the measured quantity, for example, a finite element model, and the redundancy is provided by the model [6–8]. Provided that the analytical model has been created, the analytical redundancy approach can identify sensor faults well. However, since most structural health monitoring systems are complex and the factors that affect the sensor output are varied, it is hard to create analytical models for most multisensor systems. The hardware redundancy approach relies on several sensors measuring the same quantity. By analyzing the differences in the redundant information, the sensor fault is identified [9]. This approach has been widely employed in critical equipment such as aircraft [10]. Owing to budget and construction limitations, the hardware redundancy approach is almost infeasible for infrastructures such as bridges and dams.

Owing to the characteristics of the bridge structure, there are underlying nonlinear correlations between the data acquired from different monitoring points. Thus, the data can provide “correlation redundancy.” How to identify sensor faults by recognizing and using the correlation between the data acquired by different sensors has become a valuable research field [11, 12]. Research on a sensor fault diagnosis and validity judgment algorithm based on the $k$-neighbor algorithm, which effectively improved the robustness of fault diagnosis [13], made a preliminary exploration in this field. This algorithm, however, still requires the target sensor to be diagnosed to be specified first; if the neighboring sensor used as the reference has failed or is faulty, the accuracy of the diagnosis decreases. For structural health monitoring systems that include hundreds of sensors, it is necessary to explore a universal approach to sensor fault diagnosis.

In this paper, based on a study of the correlation between the sensors in a bridge health monitoring system, a clustering algorithm based on $k$-medoids is proposed (Figure 2), which divides the sensors of the same type into $k$ clusters. The sensors in the same cluster are neighbors of each other. Meanwhile, an evaluation function is defined to determine the optimal $k$ value, minimizing the correlation between the clusters and maximizing the correlation between the sensors within a cluster. By studying the correlation between the sensors in a cluster, the concepts of “support rating” and “sensor validity” and the corresponding algorithms are proposed. The algorithm has been applied to the analysis of sensor data in an actual bridge health monitoring system. The results show that an abnormal sensor can be found among multiple sensors and the fault can be located in the time domain. In addition, the performance of a sensor can be evaluated quantitatively, and its change curve and performance degradation process can be described. When the reference sensor has failed or is faulty, its contribution can be eliminated, which ensures the robustness of the algorithm.

#### 2. Correlation Characteristics between Sensors

##### 2.1. Correlation between Sensors

The bridge, as a complex structural system, reacts differently to external forces. However, owing to the structural relationships within the bridge itself, a certain degree of correlation necessarily exists between the structural parameters of different monitoring points. For monitoring points that are more closely correlated, the data collected by the sensors tend to be more consistent.

Suppose that there are a total of $n$ monitoring points in a bridge and that $S = \{s_1, s_2, \ldots, s_n\}$ is the set of all the corresponding sensors. For $s_i, s_j \in S$, $T$ is the sampling time interval, and $X_i$ and $X_j$ are the $m$-dimensional data vectors of monitoring points $i$ and $j$, composed of the data collected in the same time interval $T$. Higher similarity between vectors $X_i$ and $X_j$ indicates stronger correlation between monitoring points $i$ and $j$.

For two $m$-dimensional vectors $X = (x_1, \ldots, x_m)$ and $Y = (y_1, \ldots, y_m)$, their Euclidean distance is [14]

$$d(X, Y) = \sqrt{\sum_{l=1}^{m} (x_l - y_l)^2}. \tag{1}$$

The traditional Euclidean distance is never negative, so it cannot represent which of the two vectors is larger. As shown in Figure 1(a), $X > Y$, and on the contrary, $X < Y$ in Figure 1(b).

*Figure 1: panels (a) and (b).*

The Euclidean distances between vectors $X$ and $Y$ in Figures 1(a) and 1(b) calculated by (1) are the same, so they cannot reveal the ordering of $X$ and $Y$. Thus, a sign is added to (1) so that the distance also records this ordering.

Suppose that $S$ is the set of all sensors of a bridge. In one time window $T$, each sensor collects $m$ data; in $w$ time windows, each sensor collects $w \times m$ data. For any sensors $s_i, s_j \in S$, in the $t$-th time window, the Euclidean distance between the data collected by $s_i$ and the data collected by $s_j$ is

$$d_{ij}^{t} = d(X_i^{t}, X_j^{t}), \quad t = 1, 2, \ldots, w,$$

where $X_i^{t}$ and $X_j^{t}$ are the data vectors collected by $s_i$ and $s_j$ in that window.

Higher correlation between monitoring point $i$ and monitoring point $j$ means higher similarity between the data vectors $X_i$ and $X_j$ measured by the corresponding sensors, so the fluctuation of $d_{ij}^{t}$ across time windows will be smaller. Over all $w$ time windows, the fluctuation of $d_{ij}^{t}$ is measured by its sample variance [15]:

$$S_{ij}^2 = \frac{1}{w - 1} \sum_{t=1}^{w} \left(d_{ij}^{t} - \bar{d}_{ij}\right)^2, \quad \bar{d}_{ij} = \frac{1}{w} \sum_{t=1}^{w} d_{ij}^{t}.$$

Let $S_{ij}^2$ represent the differentiation degree between the data vectors $X_i$ and $X_j$ measured by sensors $s_i$ and $s_j$. The smaller $S_{ij}^2$ is, the smaller the fluctuation of the distance between $X_i$ and $X_j$ is and the more consistent their change tendency is, which indicates a higher correlation between monitoring point $i$ and monitoring point $j$.
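The per-window distance and its sample variance can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the sign rule used here (the sign of the summed element-wise difference) is an assumption, since the text only states that a sign is added to the Euclidean distance.

```python
import math
from statistics import variance


def signed_distance(x, y):
    """Euclidean distance between x and y with an ASSUMED sign rule.

    Assumption: the sign follows the summed difference x - y, so the
    distance is positive when x lies above y on balance.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return dist if sum(a - b for a, b in zip(x, y)) >= 0 else -dist


def window_distances(series_i, series_j, m):
    """Split two series into windows of m samples each and return the
    signed distance of every window pair."""
    n_windows = min(len(series_i), len(series_j)) // m
    return [
        signed_distance(series_i[t * m:(t + 1) * m], series_j[t * m:(t + 1) * m])
        for t in range(n_windows)
    ]


def differentiation(series_i, series_j, m):
    """Sample variance of the per-window distances (needs >= 2 windows)."""
    return variance(window_distances(series_i, series_j, m))
```

Two series that differ only by a constant offset give the same distance in every window, so their differentiation degree is zero, which matches the idea that a stable distance signals closely correlated monitoring points.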

##### 2.2. $k$-Medoid Clustering Algorithm

Suppose that $S$ is the set of all sensors of a bridge. Because of the correlation between the monitoring points, there must be some sensor clusters in which the data acquired by the sensors are closely correlated. The $k$ sensor clusters existing in $S$ can be found by clustering analysis.

First, the number of clusters $k$ is determined, and $k$ sensors are chosen randomly as the central points of the clusters. Each remaining sensor is assigned to the cluster represented by the central point with which it has the highest degree of correlation (i.e., the sample variance of the data vector distance is the smallest). Noncentral point sensors are then used repeatedly to replace the existing central points in order to improve the clustering quality. The clustering quality is measured by the cost function

$$\mathrm{COST} = \sum_{c=1}^{k} E_c,$$

where $E_c$ represents the sum of all differentiations between the sensors in the $c$-th cluster $C_c$. The smaller $E_c$ is, the closer the correlation between the sensors in this cluster is. The smaller the value of COST is, the higher the quality of the current clustering result is.

In order to determine whether a noncentral point $o_{\mathrm{random}}$ is a good alternative to the current central point $o_j$, the following four cases are considered for each noncentral point $p$:

(1) $p$ currently belongs to the cluster represented by central point $o_j$. If $o_j$ is replaced by $o_{\mathrm{random}}$ as the new central point and the degree of correlation between $p$ and $o_i$ is the highest ($i \neq j$), $p$ is reallocated to the cluster represented by $o_i$.

(2) $p$ currently belongs to the cluster represented by central point $o_j$. If $o_j$ is replaced by $o_{\mathrm{random}}$ as the new central point and the degree of correlation between $p$ and $o_{\mathrm{random}}$ is the highest, $p$ is reallocated to the cluster represented by $o_{\mathrm{random}}$.

(3) $p$ currently belongs to the cluster represented by central point $o_i$ ($i \neq j$). If $o_j$ is replaced by $o_{\mathrm{random}}$ as the new central point and the degree of correlation between $p$ and $o_i$ is still the highest, the affiliation of $p$ is unchanged.

(4) $p$ currently belongs to the cluster represented by central point $o_i$ ($i \neq j$). If $o_j$ is replaced by $o_{\mathrm{random}}$ as the new central point and the correlation between $p$ and $o_{\mathrm{random}}$ is the highest, $p$ is reallocated to the cluster represented by $o_{\mathrm{random}}$.

The cost function changes whenever a reallocation takes place. If the current central point $o_j$ is replaced by the noncentral point $o_{\mathrm{random}}$ and the resulting COST (new) is larger than COST (current), the current central point $o_j$ is considered acceptable, and nothing changes in this iteration. If COST (new) is smaller than COST (current), the new central point $o_{\mathrm{random}}$ is better than $o_j$, so $o_j$ is replaced.

Therefore, the clustering algorithm of sensor clusters is obtained.
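The swap procedure described above can be sketched as a small PAM-style routine working on a precomputed dissimilarity matrix (for instance, the sample variances of the pairwise window distances). This is a hedged sketch under stated assumptions rather than the paper's exact algorithm: the names `assign` and `k_medoids` are hypothetical, and the cost used here is the sum of each sensor's dissimilarity to its central point, which is the usual $k$-medoid cost and may differ in detail from the COST function of the text.

```python
import itertools
import random


def assign(diss, medoids):
    """Attach every point to its most similar medoid; return (labels, cost)."""
    labels, cost = [], 0.0
    for p in range(len(diss)):
        best = min(medoids, key=lambda o: diss[p][o])
        labels.append(best)
        cost += diss[p][best]
    return labels, cost


def k_medoids(diss, k, seed=0):
    """PAM-style clustering on a symmetric dissimilarity matrix.

    Assumption: cost = sum of dissimilarities between each point and
    the medoid of its cluster.
    """
    rng = random.Random(seed)
    n = len(diss)
    medoids = rng.sample(range(n), k)  # random initial central points
    labels, cost = assign(diss, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in itertools.product(range(k), range(n)):
            if cand in medoids:
                continue
            # Try replacing one central point with a noncentral point.
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_labels, trial_cost = assign(diss, trial)
            if trial_cost < cost:  # keep the swap only if the cost drops
                medoids, labels, cost = trial, trial_labels, trial_cost
                improved = True
    return sorted(medoids), labels, cost
```

The loop stops when no single swap lowers the cost, so the result is a local optimum that can depend on the random initial central points.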

##### 2.3. Optimization of $k$ Value

For the $k$-medoid clustering algorithm, the key problem is the determination of the $k$ value, that is, the optimal number of clusters into which all the sensors are divided. With the optimal division, the correlation between the clusters is minimal, and the correlation between the sensors within each cluster is maximal.

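Let $\mathrm{dif}(C_i, C_j)$ represent the difference between clusters $C_i$ and $C_j$:

$$\mathrm{dif}(C_i, C_j) = S_{o_i o_j}^2,$$

where $o_i$ and $o_j$ are, respectively, the central points of clusters $C_i$ and $C_j$ and $S_{o_i o_j}^2$ is the differentiation degree (the sample variance of the data vector distance) between them, and

$$\mathrm{DIF} = \frac{1}{\binom{k}{2}} \sum_{1 \le i < j \le k} \mathrm{dif}(C_i, C_j),$$

where $\binom{k}{2}$ is the number of pairwise combinations of the $k$ clusters. DIF is the average of the difference between each pair of clusters. A higher DIF indicates more difference among the clusters of the current division.

For any cluster $C_i$,

$$\mathrm{cor}(C_i) = \frac{1}{\binom{|C_i|}{2}} \sum_{\substack{s_a, s_b \in C_i \\ a < b}} S_{ab}^2,$$

where $\binom{|C_i|}{2}$ is the number of pairwise combinations of the sensors in cluster $C_i$. $\mathrm{cor}(C_i)$ is the average of the difference between each pair of sensors in cluster $C_i$. A lower $\mathrm{cor}(C_i)$ indicates more correlation among the sensors in that cluster.

Let

$$\mathrm{COR} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{cor}(C_i),$$

where COR is the average of $\mathrm{cor}(C_i)$ over the $k$ clusters of the current division. A lower COR indicates more correlation within each cluster of the current division.

Therefore, the evaluation function EVA is defined from DIF and COR. The effect of the current division improves as the value of EVA rises. We can therefore first select different $k$ values for clustering by rule of thumb, calculate the EVA value for each, and finally select the $k$ with the highest EVA value as the final number of clusters.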
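A minimal sketch of choosing $k$ with such an evaluation function is given below. The between-cluster average follows the definition of DIF and the within-cluster average follows the per-cluster averaging described above; the combination `eva = DIF / within-cluster average` is only an assumption made for illustration, because it rises with between-cluster difference and falls with within-cluster difference, and the function names are hypothetical. At least two clusters are required.

```python
from itertools import combinations


def mean_between(diss, medoids):
    """DIF: average dissimilarity over all pairs of cluster medoids."""
    pairs = list(combinations(medoids, 2))
    return sum(diss[a][b] for a, b in pairs) / len(pairs)


def mean_within(diss, clusters):
    """Average over clusters of the mean pairwise dissimilarity inside
    each cluster. A cluster with a single sensor contributes 0."""
    per_cluster = []
    for members in clusters:
        pairs = list(combinations(members, 2))
        inner = sum(diss[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
        per_cluster.append(inner)
    return sum(per_cluster) / len(per_cluster)


def eva(diss, medoids, clusters):
    """ASSUMED evaluation function: between-cluster average divided by
    within-cluster average (not necessarily the paper's definition)."""
    within = mean_within(diss, clusters)
    return float("inf") if within == 0 else mean_between(diss, medoids) / within


def best_k(diss, candidates):
    """candidates maps k -> (medoids, clusters); return the k whose
    division has the highest EVA."""
    return max(candidates, key=lambda k: eva(diss, *candidates[k]))
```

With a ratio of this kind, divisions that split a naturally tight group into several clusters are penalized because their central points remain similar to each other, which lowers the between-cluster average.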

#### 3. Support Rating and Validity

By using the $k$-medoid clustering algorithm, the set $S$ of all sensors can be divided into $k$ sensor clusters. Each sensor belongs to one cluster, and a high degree of correlation can be found between the sensors in the same cluster. That is to say, the sensors in the same cluster are neighboring sensors of one another.

The support rating refers to the degree to which the reliability of the data measured by the target sensor in a given time window is supported by the data measured by a neighboring sensor. The validity of the data measured by the target sensor is supported by all the other neighboring sensors in the same cluster. To assess the validity of the data from a target sensor, the weighted sum of the support ratings must be taken into consideration.

In different time windows, the data vector distance between two neighboring monitoring points varies. However, provided that the damage to the bridge structure is not severe, the data vector distance between two closely correlated monitoring points fluctuates within a reasonable range. It can then be considered that the sample vector distances of the two monitoring points in different time windows fluctuate randomly around the expectation value.

*Hypothesis 1 (this hypothesis is verified by using the K. Pearson method in Section 5).* The data vector distance $d_{ij}^{t}$ of any two closely correlated monitoring points $i$ and $j$ in different time windows obeys a normal distribution; that is, $d_{ij}^{t} \sim N(\mu_{ij}, \sigma_{ij}^2)$, where $\mu_{ij}$ is the expectation of $d_{ij}^{t}$ and $\sigma_{ij}^2$ is its variance. When the sample size is large enough, $\mu_{ij} \approx \bar{d}_{ij}$ and $\sigma_{ij}^2 \approx S_{ij}^2$, where $\bar{d}_{ij}$ is the sample mean of the data vector distances in different time windows and $S_{ij}^2$ is the sample variance of the data vector distances in different time windows.

According to the normal distribution, a sample value is more likely to occur the closer it is to the expectation. Therefore, the closer $d_{ij}^{t}$ ($t$ is the time window number) is to $\mu_{ij}$, the higher the probability that the data of monitoring point $i$ measured in this time window are reliable, as supported by the neighboring monitoring point $j$; that is, the higher the validity of sensor $s_i$ supported by its neighboring monitoring point $j$. For a normal distribution, the probability of $|d_{ij}^{t} - \mu_{ij}| > 3\sigma_{ij}$ is about 0.3% [16]. Generally, a sample point whose deviation from the expected mean value is less than $3\sigma_{ij}$ is acceptable. That is, in the case of $|d_{ij}^{t} - \mu_{ij}| < 3\sigma_{ij}$, it can be considered that the validity of sensor $s_i$ is supported by its neighboring monitoring point $j$ in the time window $t$.

In case of , in th time window, the validity of sensor supported by its neighboring monitoring point , that is, the support rating of for in the time window isFormula (10) meets the following requirements: In this equation, is the confidence threshold of ; in case of , it can be considered that the validity of sensor is supported by its neighboring monitoring point in th time window; otherwise, the validity of is not supported by .

is the monotonically decreasing function of ; when is closer to , is closer to 1. In case of , , and vice versa, is closer to 0.

In order to allow (11) to meet the requirements of (12), letwhere .

Figure 3 shows the characteristic curve of .

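In the $t$-th time window, the validity of sensor $s_i$ is the weighted sum of the support ratings of all neighboring monitoring points in the same cluster:

$$\mathrm{val}_i^{t} = \sum_{j \neq i} w_{ij}\, \mathrm{sup}_{ij}^{t},$$

where $\mathrm{sup}_{ij}^{t}$ is the support rating of neighbor $s_j$ for $s_i$ in the $t$-th time window, the sum runs over the neighbors of $s_i$ in its cluster, and the weight $w_{ij}$ meets the following requirements:

(1) $\sum_{j \neq i} w_{ij} = 1$.

(2) For a monitoring point $j$ having closer correlation with monitoring point $i$, its weight is larger. That is, the smaller the value of the sample variance $S_{ij}^2$ of their data vector distance is, the larger $w_{ij}$ is.

The weight $w_{ij}$ is defined so that both requirements are satisfied.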
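The support rating and validity can be illustrated with the sketch below. Both formulas in it are assumptions chosen only to satisfy the requirements stated in the text, not the paper's own definitions: the support curve equals 1 when the distance equals its mean, equals the confidence threshold `theta` at a deviation of three standard deviations, and decays toward 0 beyond that; the weights are proportional to the inverse sample variance and normalized to sum to 1. All function and parameter names are hypothetical.

```python
import math


def support_rating(d, mu, sigma, theta):
    """ASSUMED support curve: 1 at d == mu, theta at |d - mu| == 3*sigma,
    monotonically decreasing toward 0 as the deviation grows."""
    z = (d - mu) / (3.0 * sigma)
    return math.exp(math.log(theta) * z * z)


def weights(variances):
    """ASSUMED weights: proportional to 1/variance, normalized to sum to 1,
    so a more closely correlated neighbor gets a larger weight."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [x / total for x in inv]


def validity(distances, means, sigmas, theta, exclude=()):
    """Weighted sum of the neighbors' support ratings for one target
    sensor in one time window. Neighbor indices in `exclude` are skipped,
    e.g. to drop a neighbor already diagnosed as faulty."""
    keep = [j for j in range(len(distances)) if j not in exclude]
    w = weights([sigmas[j] ** 2 for j in keep])
    return sum(
        wj * support_rating(distances[j], means[j], sigmas[j], theta)
        for wj, j in zip(w, keep)
    )
```

For example, with three equally weighted neighbors of which one reports a wildly deviating distance, the validity of a healthy target drops to roughly two thirds, and it returns to 1 once that neighbor is passed in `exclude`.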

#### 4. Fault Diagnosis Method

Based on the definition of support rating and validity, the algorithm of sensor fault diagnosis and validity measurement is presented:

(1) Data from a period when the system is in good working condition and the sensor performance has not degraded are selected as the samples, and a reasonable time window length is set. By using the $k$-medoid clustering algorithm, the set of all sensors is divided into $k$ sensor clusters.

(2) For each sensor, taking the other sensors in the same cluster as neighboring sensors, the sample mean and sample variance of the data vector distance between it and each neighboring monitoring point are obtained.

(3) For the diagnostic data, the support rating of each neighboring monitoring point for the target sensor within each time window is calculated with (13). The validity of the target sensor is calculated in accordance with (14). Because the sample size is large, the sample mean is used in place of the expectation and the sample variance in place of the variance in the calculation.

When , it is considered that sensor has serious fault, so it needs examination, repair, and replacement. At this time, the support rating of for any neighboring sensor in the same cluster is . As is only one of components of the validity of , is still bigger than . In order to eliminate the influence of faulty sensor on the validity evaluation of the neighboring sensor, the validity of other neighboring sensors in the same cluster can be calculated after the faulty sensor is removed.

#### 5. Verification

Located in the central zone of the main urban area of Chongqing, Caiyuanba Yangtze River Bridge is a key bridge connecting Yuzhong district in the north and Nanan district in the south (Figure 4). It is a half-through thrust-free steel box girder tied-arch bridge. The main arch is nearly 100 meters in height, and the main bridge span is 420 meters. The main girder is a two-story truss structure: the upper concrete bridge deck is a highway bridge deck with a daily traffic flow of 60 thousand vehicles; the lower part is an urban light rail passage with a single-line weight of 400 tons. The span and height of the main arch are among the largest in the world. The combined structure of a steel box girder in the upper part and a concrete structure in the lower part, the combined use of highway transportation and city light rail, the ultralong rail joints, and the whole assembly process are original designs worldwide.

The Caiyuanba Bridge health monitoring system includes five classes, eight kinds, and seven subsystems of sensors covering strain, cable force, vibration, deflection, expansion joint displacement, three-dimensional deformation, and temperature. They monitor the strain, displacement, vibration, cable force, temperature, and other structural status parameters and environmental parameters of key parts such as the main girder, main arch, structure, pier, sling, and tie rod. The health monitoring system analyzes the monitoring data and implements safety assessment and early warning of bridge damage.

The deflection of the main girder directly reflects the mobile load condition of the bridge as well as the linear change of the longitudinal girder under mobile load, and it is one of the most important parameters of the bridge structure. The Caiyuanba Bridge health monitoring system adopts a connected-pipe photoelectric deflection sensor system to capture the deflection change of the main girder, with 15 deflection monitoring points on each of the downstream and upstream sides of the steel truss girder (Figure 5). There are therefore 30 photoelectric deflection sensors in total. The deflection data are acquired once per hour.

Strain is an important parameter reflecting the condition of the bridge. The layout of measuring points focuses on the stress and fatigue condition of the truss girder and the main arch, which are the main bearing components, as well as the bridge pier and structure. A total of 20 strain monitoring cross sections are set for the whole bridge, each of which has 4 measuring points (Figure 6); in total, there are 80 strain sensors. As with deflection, the strain data are acquired once per hour on a regular basis.

##### 5.1. Verification of Clustering Algorithm

The data collected between September 1, 2013, and April 30, 2014, are taken as the sample. With hourly acquisition over these 242 days, $242 \times 24 = 5808$ data are collected at each monitoring point. The time window length is 24 hours, so the number of samples (time windows) at each monitoring point is 242. The vector distances between the data vectors acquired by every pair of sensors in each time window and their sample variances are calculated. The $k$-medoid clustering algorithm is used to cluster the 30 deflection sensors. Generally, the admissible range of the number of clusters $k$ depends on the number of sensors $n$. Furthermore, according to the analysis of the bridge structure, several candidate values of $k$ are selected, the sensors are clustered with the $k$-medoids clustering algorithm for each of them, and the evaluation function EVA is calculated in each case. The results are listed in Table 1.

Therefore, in case of , the clustering result is optimal. The result is shown in Table 2.

Similarly, the data collected between September 1, 2013, and April 30, 2014, are taken as the sample, and the $k$-medoid clustering algorithm is used to cluster the 80 strain sensors. According to the analysis of the bridge structure, the sensors are clustered for a range of candidate $k$ values, including 22 (Table 3), and the evaluation function EVA is calculated for each. The result shows that the clustering is optimal when $k = 20$, which equals the number of strain monitoring cross sections. Furthermore, the four sensors of each cross section fall in the same cluster. This indicates that the clustering algorithm reflects the structural correlation between sensors well.

##### 5.2. Verification of Hypothesis 1

According to the idea of the $k$-medoid clustering algorithm, there is close correlation between the sensors in the same cluster. The K. Pearson test method is used to test Hypothesis 1 [15].

For example, sensors 06A and 10A are neighboring sensors in the same cluster. The sample mean of the data vector distances of the 242 time windows from September 1, 2013, to April 30, 2014, is . Meanwhile, the sample variance is . Thus, the hypothesis is : , and its verification is shown as follows.

According to the probability density, , , the probability of at different time intervals and the diversity are calculated.

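In Table 4, $f_i$ is the observed frequency of the data vector distance in the $i$-th interval, $n = 242$ is the total number of time windows, $p_i$ is the probability of the $i$-th interval under the hypothesized normal distribution, and $\chi^2 = \sum_{i} (f_i - n p_i)^2 / (n p_i)$. Because the number of intervals is 12 and the number of parameters estimated from the sample data is 2, the number of degrees of freedom of $\chi^2$ is $12 - 2 - 1 = 9$. At a significance level of 0.05 with 9 degrees of freedom, the critical value is $\chi_{0.05}^2(9) = 16.919$, and the computed $\chi^2$ is below it. Consequently, the hypothesis is accepted.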
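The Pearson chi-square goodness-of-fit computation can be sketched with the Python standard library as follows. This is an illustrative sketch, not the authors' code: the function name and the way the intervals are passed in (as interior cut points) are assumptions, and the critical value must be supplied by the caller, for example 16.919 for 9 degrees of freedom at the 0.05 level.

```python
from statistics import NormalDist, mean, stdev


def pearson_chi2_normal(samples, edges):
    """Pearson statistic for H0: samples ~ N(sample mean, sample variance).

    `edges` are the interior cut points in increasing order; the two
    outer intervals extend to -inf and +inf, so there are
    len(edges) + 1 intervals. Returns (statistic, degrees_of_freedom).
    """
    n = len(samples)
    dist = NormalDist(mean(samples), stdev(samples))
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    stat = 0.0
    for lo, hi, c_lo, c_hi in zip(bounds, bounds[1:], cdf, cdf[1:]):
        observed = sum(1 for x in samples if lo < x <= hi)  # f_i
        expected = n * (c_hi - c_lo)                        # n * p_i
        stat += (observed - expected) ** 2 / expected
    # Two parameters (mean and variance) are estimated from the data.
    return stat, len(edges) + 1 - 2 - 1
```

The hypothesis is accepted when the returned statistic is below the critical value for the returned degrees of freedom; with 11 cut points the function reports 9 degrees of freedom, as in the test above.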

In total, 44 pairs of sensor combinations in 8 clusters are tested. Of these, 43 pass the test at a significance level of 0.05. Hypothesis 1 can therefore be considered to hold.

##### 5.3. Verification of Fault Diagnosis Algorithm

After the division of sensor clusters is determined, the support rating between the neighboring sensors in a cluster can be calculated according to (13), and the validity of each sensor is calculated according to (14).

Taking the deflection sensor cluster 1 as an example, the data collected for 1440 times in 60 time windows in the interval from June 1, 2014, to July 30, 2014, are used in the test, which is shown in Figure 7.

According to (12) and Figure 3, if the value of confidence threshold is too high, the curve of will be too flat when and too steep when . On the contrary, if is too low, the curve of will be too steep when and too flat when . Generally speaking, is set in the interval .

In this test, the confidence threshold value is set: . On this basis, the support ratings between various sensors in the cluster are calculated, and then the validity of each sensor is obtained too.

As shown in Figure 8, the validity of each sensor is larger than 0.9; the sensors can be considered to be in good working condition.

The data collected by strain sensors in cluster 3 from June 1, 2014, to July 30, 2014, is shown in Figure 9.

According to (13) and (14), the support ratings between various sensors in the cluster are calculated, and then the validity of each sensor is obtained.

It can be seen from Figure 10 that the validity of strain sensor N11_B began to have a significant decline in late July, even down to in mid-June. It can be caused by the failure of strain sensor N11_B or the damage of the bridge structure. After the on-site inspection, the sensor’s failure was confirmed.

At this time, as the data measured by sensor N11_B have become unreliable, its support ratings for the other sensors in strain cluster 3 are also no longer reliable. As a result, the validities of N11_A, N11_C, and N11_D decline to some degree. Therefore, before the validities of N11_A, N11_C, and N11_D are calculated, the support rating component of N11_B is removed from (14). The obtained results are shown in Figure 11. After the support rating component of the failed sensor is eliminated, the validity of the other sensors in the cluster returns to above 0.9.

#### 6. Conclusion

Based on the correlation of the data collected by sensors, the sensors in the bridge monitoring system can be divided into multiple sensor clusters through the $k$-medoid clustering algorithm. The data collected by sensors in the same cluster are closely correlated. With the evaluation function, the number of clusters can be evaluated and optimized to ensure the optimal division (most correlated inside the clusters; least correlated between the clusters).

According to the results of experiments on various sensors in an actual bridge health monitoring system, the validity of a target sensor can be obtained by combining the support ratings of its neighboring sensors. We can therefore measure the performance of a sensor, describe how it changes and degrades over time, and locate sensor faults in the time domain. Because the target sensor to be diagnosed does not need to be specified in advance, the diagnostic error caused by the failure of the reference sensor is effectively avoided, which improves the robustness of the algorithm. When the validity of a sensor is calculated, eliminating the support component of a failed sensor further improves the accuracy and robustness of the quantitative evaluation of sensor performance.

#### Competing Interests

The authors declare that they have no competing interests.