The Scientific World Journal

Volume 2014 (2014), Article ID 727359, 7 pages

http://dx.doi.org/10.1155/2014/727359

## Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure

^{1}Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China^{2}Aviation Technical Division, Aviation Safety Technology Center, 557 Yongyu-ro, Jung-gu, Incheon 400-420, Republic of Korea^{3}Department of Aerospace, Automobile & Mechanical Engineering, Chung Cheong University, 38 Wolgok-Gil, Gangnae-Myeon, Cheongwon-Gun, Chungcheongbuk-Do 363-792, Republic of Korea

Received 4 April 2014; Accepted 10 June 2014; Published 26 June 2014

Academic Editor: T. O. Ting

Copyright © 2014 Sanghyuk Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A fault detection algorithm was developed using a similarity measure and the random forest algorithm (RFA), and it was applied to an unmanned aerial vehicle (UAV) prepared by the authors. The similarity measure was designed using distance information, and its validity was verified by proof. Fault decisions were made by calculating a weighted similarity measure. Twelve coefficients available in the healthy and faulty status data groups were used for the decision. The similarity measure weights were obtained through RFA, which provides the priority (importance) of each feature. To obtain a fast decision, a limited number of coefficients was also considered. The relation between the detection rate and the amount of feature data was analyzed and illustrated, and the useful amount of data was obtained through repeated similarity calculations.

#### 1. Introduction

Fault tolerance and adaptation of aircraft systems using actual fault and healthy data have been studied. For the pilot or the flight control system to adapt under abnormal conditions, a clear and correct decision is needed on whether to continue a critical mission or to return to a safe region [1]. First of all, data for the fault decision must be available. A data acquisition system is required to acquire data accurately and quickly, so that a precise fault decision can be guaranteed. A lightweight fault detection model of DURUMI-II, an unmanned aerial vehicle (UAV), was considered, and the fault decision process was carried out with the random forests algorithm (RFA) [2]. RFA is a state-of-the-art classification algorithm that has shown high classification accuracy. It also produces an importance value for each feature. The proposed approach identifies stable, important features that yield high detection rates. Parameter optimization and feature selection were therefore performed to achieve high detection rates.

To obtain a fault detection result, a discrimination measure has to be considered. A similarity measure [3–6] represents the degree of similarity between comparable data and has been used in numerous works [1, 2, 7–9]. The similarity between two vague datasets depends roughly on the inverse of the distance between them. Similarity can therefore be regarded as the common information shared by two data distributions, and the similarity measure used here is based on a distance measure. The similarity between two fuzzy sets can then be computed with this similarity measure.

Operating data with twelve features (coefficients) are used to design a measure that discriminates between healthy and faulty conditions. The data therefore form a multivariate, or multidimensional, dataset. Each component has a different importance in deciding whether the state is faulty or healthy, so a weighting factor should be assigned to each of the 12 coefficients to obtain a more accurate decision. Relying on expert opinion is one candidate solution. However, like other heuristic approaches, its results depend on the individual, so consistency cannot be guaranteed. RFA, by contrast, provides an importance value for each feature, which gives the weights a more rational basis.

In previous research [1, 10, 11], fault detection and isolation (FDI) was combined with a fault-tolerant control system to handle process failures. That work assumed that the UAV remains controllable and trimmable under postfailure conditions. Under this assumption, the aircraft can keep flying with the help of the flight control computer even when a control surface is stuck. The controllers are then restructured and reconfigured according to the severity of the system failure. To support fault decisions and analyses, statistical information on elevator deflection, aileron deflection, and other quantities has been analyzed.

By applying RFA to the multivariate similarity measure, different weighting factors were considered. The detection results showed good performance with actual data. In particular, when the data lay in an overlapping region, RFA performed more satisfactorily than the unitary weighting case [7, 8]. Section 2 introduces RFA. Section 3 describes the airplane model and the state equation, together with the normal and longitudinal faulty operations from which the operating data were obtained. Section 4 applies the RFA procedure to obtain the importance values and compares the result with the similarity measure calculation, in which the similarity is weighted with the importance values. Finally, conclusions are drawn.

#### 2. Random Forests Algorithm

Wide interest in ensemble learning has produced many results on classification and regression through boosting and bagging [2, 7, 8, 12–14]. The random forests algorithm was proposed by Breiman [2]. It belongs to the “ensemble learning” methods and adds an additional layer of randomness to bagging. Ensembles of trees can achieve significant gains in classification and regression accuracy. Each tree in the ensemble is grown according to a random parameter, that is, through an injection of randomness. Dietterich proposed a random split selection approach, in which the split at each node is selected at random from among the best splits [8].

The common element in all RFA procedures is the use of multiple trees. For the $k$th tree, a random vector $\Theta_k$ is generated independently of the past random vectors $\Theta_1, \dots, \Theta_{k-1}$ but with the same distribution. Following Breiman, in bagging the random vector $\Theta$ is generated as the counts in $N$ boxes resulting from $N$ darts thrown at random at the boxes, where $N$ is the number of examples in the training set. In random split selection, $\Theta$ consists of a number of independent random integers between 1 and $K$. A tree is grown using the training set and $\Theta_k$, resulting in a classifier $h(x, \Theta_k)$, where $x$ is an input vector [2]. Following Breiman, RFA is defined in Definition 1.

*Definition 1 (see [2]).* A random forest is a classifier consisting of a collection of tree-structured classifiers $\{h(x, \Theta_k),\ k = 1, \dots\}$, where the $\{\Theta_k\}$ are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input $x$.
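The following Python sketch makes the random vector $\Theta_k$ concrete. It is written for this text and is not the authors' code. In the bagging case, it draws $\Theta_k$ as the counts in $N$ boxes hit by $N$ random darts and turns those counts into a bootstrap sample. For random split selection, it draws $\Theta$ as independent integers between 1 and $K$. The function names, the seed, and the choice $K = 20$ are hypothetical.

```python
import random
from collections import Counter


def bagging_theta(n_examples: int, rng: random.Random) -> list:
    """Bagging: Theta = counts in N boxes after N darts are thrown uniformly at random."""
    darts = [rng.randrange(n_examples) for _ in range(n_examples)]  # one dart per example
    hits = Counter(darts)
    return [hits[box] for box in range(n_examples)]  # Counter gives 0 for empty boxes


def bootstrap_sample(training_set: list, theta: list) -> list:
    """Repeat each training example as many times as its box was hit."""
    return [x for x, count in zip(training_set, theta) for _ in range(count)]


def split_selection_theta(n_nodes: int, k_best: int, rng: random.Random) -> list:
    """Random split selection: independent random integers between 1 and K, one per node."""
    return [rng.randint(1, k_best) for _ in range(n_nodes)]


rng = random.Random(0)  # fixed seed so the draws are reproducible
training_set = list(range(10))
thetas = [bagging_theta(len(training_set), rng) for _ in range(3)]  # Theta_1..Theta_3
samples = [bootstrap_sample(training_set, theta) for theta in thetas]
split_theta = split_selection_theta(n_nodes=5, k_best=20, rng=rng)  # hypothetical K = 20
```

Each tree $h(x, \Theta_k)$ would then be grown on `samples[k]`. Every bootstrap sample has exactly $N$ entries because the box counts sum to $N$.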

Following Definition 1, two design choices are needed for RFA: the tree structure and the number of trees. There is no fixed rule for designing the tree structure; it depends on the computational requirements and on the designer. The number of trees has to be decided heuristically, for example by trial and error. A well-known tree algorithm is described in [12].

The overall structure of RFA is shown in Figure 1. Tree construction is repeated for each tree in the forest. As noted in [12], RFA has several advantages and disadvantages. It is an effective learning algorithm because it processes large databases conveniently and can handle many input variables. However, it can overfit some datasets and is not easy to interpret. Despite these drawbacks, RFA provides useful information such as variable importance, and because its decision comes from tree votes, the result can be expressed as a probability.

The final decision can be obtained by majority voting, that is, by averaging the class votes of the individual trees. Combining the decisions of all trees in this way yields the ensemble mean of the decision results, and averaging the outputs reduces uncertainty. Furthermore, if the feature vectors cover a wide range of characteristics, ergodicity can be expected, so that the ensemble average is representative.
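The voting step can be sketched in a few lines of Python. This illustration is not taken from the paper, and the votes in the usage line are hypothetical. Each tree casts a unit vote, the most popular class wins, and the winner's vote share serves as the probability-like output mentioned above.

```python
from collections import Counter


def forest_decision(tree_votes):
    """Majority vote over the trees' unit votes.

    Returns the most popular class and its vote share, which can be read as an
    estimate of the class probability.
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    counts = Counter(tree_votes)
    label, n_votes = counts.most_common(1)[0]
    return label, n_votes / len(tree_votes)


# Hypothetical votes from five trees for one flight-data sample
label, share = forest_decision(["fault", "normal", "fault", "fault", "normal"])
# label is "fault" with a vote share of 3/5 = 0.6
```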

#### 3. Model for Similarity Measure

##### 3.1. Airplane Model

Consider an aircraft system that combines the longitudinal and lateral-directional modes. Its state-space equation is as follows [1, 10, 11, 15–17]:

$$\dot{x} = Ax + Bu, \qquad y = x,$$

where $x$ and $u$ denote the state vector and the elevator control input, respectively. The output vector is identified with the state itself [11, 15–17]. The state vector contains the angular velocities $p$, $q$, and $r$, the angle of attack $\alpha$, the sideslip angle $\beta$, and the roll and pitch angles $\phi$ and $\theta$.
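A forward-Euler simulation can illustrate a state equation of this kind. The Python sketch below is a generic illustration that assumes the linear form $\dot{x} = Ax + Bu$ with full-state output $y = x$ and a single scalar input $u$. The matrices in the usage example describe a hypothetical two-state pitch model. They are placeholders, not identified DURUMI-II derivatives.

```python
def simulate_state_space(A, B, x0, inputs, dt):
    """Forward-Euler integration of x_dot = A x + B u with full-state output y = x.

    A is an n-by-n list of lists, B a length-n list (single input), x0 the initial
    state, and inputs the sequence of scalar control inputs u applied at each step.
    """
    n = len(x0)
    x = list(x0)
    outputs = [list(x)]  # y_0 = x_0
    for u in inputs:
        x_dot = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
        x = [x[i] + dt * x_dot[i] for i in range(n)]
        outputs.append(list(x))  # y_k = x_k
    return outputs


# Hypothetical 2-state example (pitch rate q, pitch angle theta) with a step input
A = [[-2.0, 0.0],
     [1.0, 0.0]]
B = [-5.0, 0.0]
trajectory = simulate_state_space(A, B, x0=[0.0, 0.0], inputs=[0.1] * 50, dt=0.02)
```

Forward Euler is chosen only for brevity. A real analysis would use the identified matrices and a more accurate integrator.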

Figure 2 shows the experimental model, which has one elevator, one rudder, and two ailerons. To obtain two types of data, normal and faulty, data acquisition was carried out twice. Note that one or two of the elevator, aileron, and rudder surfaces were intentionally damaged after the normal operating data had been recorded. To simulate our fault detection procedure, the one-piece elevator of the experimental model was separated into two parts, one normal and the other faulty. Hence, it was difficult to know whether the aircraft was faulty or not; therefore, a control surface was added to the other vertical stabilizer. Faults were applied artificially, and two different flight tests were carried out separately for normal and faulty operations [1, 10, 11].

For the experiment, intentional stuck faults were applied to the right elevator, the left rudder, and the left aileron, and combinations of these stuck surfaces were also considered. The flight tests were scheduled only for conditions in which the aircraft remained controllable and trimmable after the failure. Two cases were considered: the left rudder stuck alone, and the right elevator stuck together with the left rudder. In the first case (left rudder only), stuck angles of −10°, −5°, 0°, +5°, and +10° were considered. The same stuck angles were used in the second case (right elevator with the left rudder stuck).

In the first flight test, the control input for real-time parameter estimation was applied with the knob-and-switching method, and the flight data were obtained from the excited dynamics of the UAV [1, 10, 14]. However, the results showed that the applied time interval was slightly inaccurate. Therefore, for the second flight test, a control input device using an RF modem and an R/C transmitter was developed to realize constant control inputs and accurate time intervals [1, 10, 11].

##### 3.2. Parameters of Longitudinal Mode

To control the UAV during surface stuck and combination stuck failures, the aircraft must remain controllable and trimmable under postfailure conditions, because the flight test procedure includes recovery from the fault state to the normal state [1, 10, 11]. Table 1 lists the experimental trim values of the available primary control surfaces of the UAV (DURUMI-II) at postfailure conditions. Blank entries denote uncontrollable and untrimmable cases.

Longitudinal mode fault detection was studied in previous research [10]. To obtain the failure status of the elevator, analyses of three key longitudinal parameters were essential for characterizing aircraft performance. In that research, a similarity measure was proposed in place of statistical information such as the mean and variance, to overcome the ambiguity caused by large standard deviations. Such large scatter makes the analysis results for the stability and control derivatives unclear.

#### 4. Numerical Results and Their Analysis

This section presents the fault detection algorithm. A total of 89 data samples were considered (38 normal and 51 faulty), each with twelve features.

##### 4.1. Random Forest with Numerical Interpolation

For RFA, two parameters must be chosen: the number of variables in the random subset at each node and the number of trees in the forest. To obtain the best classification rate (correct decision rate), the optimal values of these two parameters must be determined.

The requirements for deciding the operating condition are as follows.

*Requirement 1*. Parameter optimization is conducted to guarantee high detection rates.

*Requirement 2*. A fault detection model is built using RFA with all features. RFA generates variable importance values in numerical form.

*Requirement 3*. Rank all features and eliminate the irrelevant ones.

*Requirement 4*. Rebuild a fault detection model with only the selected features.

*Requirement 5*. Evaluate the rebuilt detection model. If the detection rates and error rates satisfy the requirements, terminate the computation. Otherwise, iterate with fewer features.
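Requirements 2–5 describe a feature reduction loop driven by the importance ranking. The Python sketch below gives one possible reading of that loop and is not the authors' implementation. The callback `evaluate` is hypothetical; it stands for rebuilding the RFA model on a feature subset and returning its detection rate. The loop tries ever-smaller top-ranked subsets, keeps the smallest one with the highest rate, and reports whether that rate meets a target.

```python
from typing import Callable, Dict, List, Tuple


def select_features(
    importance: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    target_rate: float,
) -> Tuple[List[str], float, bool]:
    """Rank features by importance, then rebuild/evaluate on ever-smaller top-k subsets."""
    ranked = sorted(importance, key=importance.get, reverse=True)  # Requirement 3
    best_k, best_rate = len(ranked), float("-inf")
    for k in range(len(ranked), 0, -1):  # iterate with fewer features each time
        rate = evaluate(ranked[:k])  # Requirements 4-5: rebuild on top-k, then evaluate
        if rate >= best_rate:  # ">=" prefers the smaller subset on ties
            best_k, best_rate = k, rate
    return ranked[:best_k], best_rate, best_rate >= target_rate


# Hypothetical importances and detection rates, for illustration only
importance = {"f1": 0.50, "f2": 0.30, "f3": 0.20}
fake_rates = {3: 0.90, 2: 0.95, 1: 0.80}
subset, rate, ok = select_features(importance, lambda s: fake_rates[len(s)], target_rate=0.9)
```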

To evaluate the feasibility of our approach, longitudinal experiments on the flight test data of Table 3 are considered [18].

Figure 3 shows that the highest detection rate was achieved with only two features. For the remaining parameter, there is no specific function that determines the optimal value. It was therefore set to a value that gives high and stable detection rates.
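The criterion “high and stable detection rates” can be made explicit in several ways. One hypothetical rule, sketched below in Python and not taken from the paper, slides a window over consecutive candidate values of a parameter such as the number of trees. It picks the centre of the window with the highest mean detection rate and breaks ties in favour of the smaller spread. The rates in the usage example are invented for illustration.

```python
def pick_stable_value(rates_by_value, window=3):
    """Pick the parameter value at the centre of the window of consecutive
    candidate values with the highest mean detection rate; ties go to the
    window with the smaller spread (max - min)."""
    values = sorted(rates_by_value)
    if len(values) < window:
        raise ValueError("need at least `window` candidate values")
    best_key, best_value = None, None
    for i in range(len(values) - window + 1):
        block = [rates_by_value[v] for v in values[i:i + window]]
        key = (sum(block) / window, -(max(block) - min(block)))  # high mean, then low spread
        if best_key is None or key > best_key:
            best_key, best_value = key, values[i + window // 2]
    return best_value


# Hypothetical detection rates versus the number of trees
rates = {100: 0.90, 200: 0.96, 300: 0.97, 400: 0.97, 500: 0.97, 600: 0.93}
chosen = pick_stable_value(rates)
```

Averaging over a window, rather than taking the single best value, avoids choosing an isolated spike that happens to score well.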

As a result of the experiments, optimized values were selected for the two parameters. With these values, feature selection on the flight test data was carried out using the feature selection capability of RFA.

RFA provides the variable importance of each feature, and the results are listed in Table 2. Ranking the features by importance yields reasonable context information. The meanings of these parameters are given in [10, 11]. According to the results of Table 1, the pitching moment coefficient with respect to elevator deflection shows a larger difference than the other parameters. However, to obtain more reliable results, the detection ratio versus the number of parameters was also examined.

This shows that feature selection is important for decision performance, because the number of features determines the detection rate. The decision rate was obtained for combinations of the most important variables, and the results are illustrated in Figures 4 and 5. Numerically, the highest decision rate, 97.75%, was obtained when the two most important features were used.
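A quick consistency check helps interpret this figure. Assume the rate is a single count over all 89 samples (38 normal and 51 faulty); it could also be an average over repeated runs. Under that assumption, each sample accounts for about 1.12 percentage points, and the reported 97.75% corresponds to 87 correct decisions, that is, two misclassifications:

$$\frac{87}{89} \times 100\% \approx 97.75\%, \qquad \frac{88}{89} \times 100\% \approx 98.88\%.$$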

The detection-rate data were interpolated with a sixth-order polynomial in the number of features. The intercept and the polynomial coefficients are listed in Table 3.

Differentiating this polynomial gives three maxima, at 1.67, 6.00, and 11.22.

These values correspond to numbers of features. Hence, selecting two features gives the highest detection rate, as shown in Figure 4. Similarly, a ninth-order polynomial interpolation was obtained, and its three maxima are at 2.11, 6.11, and 10.11 (Table 4).

This agrees with the result of the sixth-order polynomial interpolation.
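The coefficients of Tables 3 and 4 are not reproduced in this text. The Python sketch below therefore applies the same procedure to a hypothetical quartic whose maxima are known to lie at 2 and 8. The procedure differentiates the fitted polynomial and keeps the points where the derivative changes sign from positive to negative. Coefficients are given in ascending powers of the number of features.

```python
def poly_eval(coeffs, x):
    """Evaluate sum(c_i * x**i) for ascending coefficients using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def poly_derivative(coeffs):
    """Ascending coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def local_maxima(coeffs, lo, hi, step=0.01, iters=60):
    """Local maxima on [lo, hi]: points where the derivative changes sign from
    positive to non-positive, located on a grid and refined by bisection."""
    d = poly_derivative(coeffs)
    maxima = []
    for k in range(int(round((hi - lo) / step))):
        a, b = lo + k * step, lo + (k + 1) * step
        if poly_eval(d, a) > 0 >= poly_eval(d, b):
            for _ in range(iters):  # invariant: d(a) > 0 and d(b) <= 0
                m = 0.5 * (a + b)
                if poly_eval(d, m) > 0:
                    a = m
                else:
                    b = m
            maxima.append(0.5 * (a + b))
    return maxima


# Hypothetical quartic (NOT the fitted polynomial of Table 3):
# p(x) = 80x - 33x^2 + 5x^3 - 0.25x^4, so p'(x) = -(x - 2)(x - 5)(x - 8)
p = [0.0, 80.0, -33.0, 5.0, -0.25]
maxima = local_maxima(p, lo=1.0, hi=12.0)  # close to 2 and 8
```

The stationary point at 5 is a minimum, where the derivative goes from negative to positive, so it is correctly excluded.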

Multidimensional scaling (MDS) provides a method for discovering “hidden” structures in multidimensional data. MDS uses a similarity measure and maps the data onto a lower-dimensional spatial representation [12, 13]. Using three of the coefficients, the normal and fault patterns are visualized with MDS in Figure 6 [7]. The figure was obtained with the open-source R project [12].

The above results show that the two highest-ranked parameters are sufficient and most efficient for deciding whether the aircraft is faulty. In Figure 6, the MDS plot shows normal patterns as blue triangles and fault patterns as red circles. This suggests that fault monitoring and flight control system organization can be made feasible by visualization, without expert knowledge.

##### 4.2. Comparison with Similarity Measure Results

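The similarity measure is designed using the definition of Liu [19]. Under this definition, a similarity measure $s$ on fuzzy sets satisfies four properties. (S1) $s(A, B) = s(B, A)$. (S2) $s(D, D^C) = 0$ for any crisp set $D$. (S3) $s(C, C) = \max_{A, B} s(A, B)$. (S4) $s(A, B) \ge s(A, C)$ and $s(B, C) \ge s(A, C)$ whenever $A \subset B \subset C$. The following similarity measure is used to calculate the degree of similarity between normal and fault operating conditions. Compared with the measures in the literature [3–6, 20–22], the proposed similarity measure has a computational advantage. Because of how they are formulated, those measures require more comparisons and additions.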

*Theorem 2.* For any sets $A$ and $B$, the measure $s(A, B)$ given in (6) is a similarity measure between set $A$ and set $B$, where $d$ is the Hamming distance measure.

*Proof.* The commutative property (S1) is clear from (6) itself. For (S2), consider a crisp set $D$ and its complement $D^C$. Because $D$ and $D^C$ are complements, the difference between their memberships is always one, which gives $s(D, D^C) = 0$. (S3) is also easy to prove: from (6), $s(C, C)$ attains the maximal value. Finally, the inequality property (S4) is satisfied by the definition of Liu [19].
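Equation (6) itself is not reproduced in this text. The Python sketch below therefore uses the simple stand-in $s(A, B) = 1 - d(A, B)$, where $d$ is the normalized Hamming distance between membership vectors. This is a substitute measure, not necessarily the authors' (6). It shows how Liu's properties can be checked numerically: symmetry (S1), zero similarity between a crisp set and its complement (S2), and maximal self-similarity (S3).

```python
def hamming_distance(mu_a, mu_b):
    """Normalized Hamming distance between two discrete fuzzy sets (membership vectors)."""
    if len(mu_a) != len(mu_b) or not mu_a:
        raise ValueError("membership vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / len(mu_a)


def similarity(mu_a, mu_b):
    """Stand-in similarity s(A, B) = 1 - d(A, B); not necessarily the paper's Eq. (6)."""
    return 1.0 - hamming_distance(mu_a, mu_b)


def complement(mu):
    """Membership vector of the standard fuzzy complement."""
    return [1.0 - m for m in mu]


A = [0.2, 0.7, 1.0, 0.4]
B = [0.3, 0.5, 0.9, 0.4]
D = [1.0, 0.0, 1.0, 0.0]  # a crisp set

assert similarity(A, B) == similarity(B, A)  # (S1) symmetry
assert similarity(D, complement(D)) == 0.0   # (S2) crisp set vs. its complement
assert similarity(A, A) == 1.0               # (S3) self-similarity is maximal
```

For nested fuzzy sets ($\mu_A \le \mu_B \le \mu_C$ elementwise), each term $|\mu_A - \mu_B|$ is at most $|\mu_A - \mu_C|$, so this stand-in also satisfies (S4).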

With this similarity measure, the calculation requires fewer comparisons and additions. The membership functions of the two parameters are illustrated in Figures 7 and 8, together with the normal and fault distributions. With similarity measure (6), the decision results are clearly discriminative. Table 5 lists the results of (6) weighted by the variable importance values of Table 2.
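The exact weighting formula behind Table 5 is not given in this text. A natural assumption, sketched below in Python, is a normalized weighted average of per-feature similarity values, so that features with higher RFA importance pull the overall score more strongly. All names and numbers in the sketch are hypothetical.

```python
def weighted_similarity(feature_similarities, importances):
    """Importance-weighted mean of per-feature similarity values (assumed form)."""
    if len(feature_similarities) != len(importances):
        raise ValueError("one importance value per feature is required")
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    return sum(s * w for s, w in zip(feature_similarities, importances)) / total


# Hypothetical per-feature similarities between a test sample and the normal class,
# weighted by hypothetical RFA importances
sims = [0.9, 0.4, 0.8]
weights = [0.5, 0.3, 0.2]
score = weighted_similarity(sims, weights)                # about 0.73
unweighted = weighted_similarity(sims, [1.0, 1.0, 1.0])  # plain mean, about 0.70
```

With equal weights the function reduces to the unweighted (unitary weighting) case, which makes the two settings easy to compare.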

#### 5. Conclusions

The fault detection problem of an aircraft system was addressed with RFA, which was applied to build a fault detection methodology for the unmanned aircraft system DURUMI-II. RFA also provides the importance of each parameter or feature. The feasibility of the fault detection algorithm with RFA was validated.

With experimental data from the flight tests of DURUMI-II, the fault decision algorithm detected faults with high detection rates (Figure 5). Additionally, visualizing the normal and fault patterns with MDS made them easy to distinguish using context information. A similarity measure weighted with the importance values was applied to the detection problem. It discriminated the decision results more clearly than the similarity measure alone.

The RFA results indicate that the detection algorithm is effective even with a limited amount of operating data. Consequently, the flight support control system can be reconfigured using the fault detection algorithm. This increases reliability without additional sensors, such as a potentiometer on the control surface.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This research was supported by Grant no. (13-ASTRP-A02) from Aviation Safety Technology Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.

#### References

- [1] W. J. Park, S. H. Lee, and J. I. Song, “Fault detection and isolation of DURUMI-II using similarity measure,” *Journal of Mechanical Science and Technology*, vol. 23, no. 2, pp. 302–310, 2009.
- [2] L. Breiman, “Random forests,” *Machine Learning*, vol. 45, no. 1, pp. 5–32, 2001.
- [3] S. Lee and T. O. Ting, “The evaluation of data uncertainty and entropy analysis for multiple events,” in *Advances in Swarm Intelligence*, vol. 7332 of *Lecture Notes in Computer Science*, pp. 175–182, Springer, Berlin, Germany, 2012.
- [4] S. Park, S. Lee, S. Lee, and T. O. Ting, “Design similarity measure and application to fault detection of lateral directional mode flight system,” in *Advances in Swarm Intelligence*, vol. 7332 of *Lecture Notes in Computer Science*, pp. 183–191, Springer, New York, NY, USA, 2012.
- [5] S. Lee and T. O. Ting, “Uncertainty evaluation via fuzzy entropy for multiple facts,” *International Journal of Electronic Commerce*, vol. 4, no. 2, pp. 345–354, 2013.
- [6] S. Lee, W. He, and T. O. Ting, “Study on similarity measure for overlapped and non-overlapped data,” in *Proceedings of the International Conference on Information Science and Technology (ICIST '13)*, pp. 48–53, March 2013.
- [7] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, “Boosting the margin: a new explanation for the effectiveness of voting methods,” *The Annals of Statistics*, vol. 26, no. 5, pp. 1651–1686, 1998.
- [8] T. G. Dietterich, “Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization,” *Machine Learning*, vol. 40, no. 2, pp. 139–157, 2000.
- [9] F. W. Young and R. M. Hamer, *Theory and Applications of Multidimensional Scaling*, Erlbaum Associates, Hillsdale, NJ, USA, 1994.
- [10] W. J. Park, E. T. Kim, K. J. Seong, and Y. C. Kim, “A study on the parameter estimation of DURUMI-II for the fixed right elevator using flight test data,” *Journal of Mechanical Science and Technology*, vol. 20, no. 8, pp. 1224–1231, 2006.
- [11] W. Park, E. Kim, Y. Song, and B. Ko, “A study on the real-time parameter estimation of DURUMI-II for control surface fault using flight test data,” *International Journal of Control, Automation and Systems*, vol. 5, no. 4, pp. 410–418, 2007.
- [12] Random forest, http://en.wikipedia.org/wiki/Random_forest.
- [13] The R Project for Statistical Computing, http://www.r-project.org/.
- [14] G. Biau, “Analysis of a random forests model,” *Journal of Machine Learning Research*, vol. 13, pp. 1063–1095, 2012.
- [15] Y. D. Kim, “A Study on Fault Detection and Redundancy Management System,” SUDP-P1-G4, 2005.
- [16] M. R. Napolitano, Y. Song, and B. Seanor, “On-line parameter estimation for restructurable flight control systems,” *Aircraft Design*, vol. 4, no. 1, pp. 19–50, 2001.
- [17] R. C. Nelson, *Flight Stability and Automatic Control*, McGraw-Hill, New York, NY, USA, 1998.
- [18] W. J. Park, S. M. Lee, S. K. Lee, and J. S. Park, “Lightweight fault detection of DURUMI-II using random forests,” in *Proceedings of the International Conference on Materials and Reliability*, Busan, Republic of Korea, 2011.
- [19] X. C. Liu, “Entropy, distance measure and similarity measure of fuzzy sets and their relations,” *Fuzzy Sets and Systems*, vol. 52, no. 3, pp. 305–318, 1992.
- [20] S. H. Lee, W. J. Park, and D. Y. Jung, “Similarity measure design and similarity computation for discrete fuzzy data,” *Journal of Central South University of Technology (English Edition)*, vol. 18, no. 5, pp. 1602–1608, 2011.
- [21] J. Fan and W. Xie, “Distance measure and induced fuzzy entropy,” *Fuzzy Sets and Systems*, vol. 104, no. 2, pp. 305–314, 1999.
- [22] S. Lee, Y. Kim, S. Cheon, and S. Kim, “Reliable data selection with fuzzy entropy,” in *Proceedings of the 2nd International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '05)*, vol. 3613 of *Lecture Notes in Computer Science*, pp. 203–212, August 2005.