Mathematical Problems in Engineering

Volume 2014 (2014), Article ID 706178, 8 pages

http://dx.doi.org/10.1155/2014/706178

## Optimization ELM Based on Rough Set for Predicting the Label of Military Simulation Data

Science and Technology on Information Systems Engineering Laboratory, Nanjing 210007, China

Received 19 April 2014; Revised 26 July 2014; Accepted 18 August 2014; Published 25 September 2014

Academic Editor: Yi Jin

Copyright © 2014 Xiao-jian Ding and Ming Lei. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

By combining rough set (RS) theory with the optimization extreme learning machine (OELM), this study introduces a new hybrid machine learning technique for classifying military simulation data. First, a multivariate discretization method is used to convert continuous military simulation data into discrete data. Then, RS theory is employed to generate simple rules and to remove irrelevant and redundant variables. Finally, OELM is compared with the classical extreme learning machine (ELM) and the support vector machine (SVM) on both the original and the reduced military simulation datasets. Experimental results demonstrate that, with the help of the RS strategy, OELM can significantly improve the testing rate on military simulation data. Additionally, OELM is less sensitive to model parameters and is easy to model.

#### 1. Introduction

Information warfare, the competitive use of information by modern forces, has become increasingly important in the military domain. Information superiority and knowledge dominance are key to winning future wars. The amount of information stored in military systems has grown substantially, mainly because of the rapid development of military information infrastructure. However, it has become impractical for specialists to analyze such large volumes of stored data with conventional methods.

Military data research has attracted great interest in recent years, especially in operational areas. Data learning technology provides important analysis and decision services that aid commanders, so it is worth studying how to learn effectively from the massive data that emerge during operations. Liguo et al. [1] introduced the terms verification, validation, and certification (VV&C), classified the data used in an equipment war gaming system on this basis, and put forward a sorting criterion for authoritative data sources (ADS). Li et al. [2] analyzed the characteristics of military data, proposed methods to classify data according to their characteristics and functions, and discussed data sources for combat simulation. Gao et al. [3] introduced a military simulation data aggregation framework for extraction, transformation, transportation, and loading based on message-oriented middleware. A preliminary study on simulation data mining with rough set (RS) theory was carried out in [4]. RS makes it possible to find dependencies in military simulation data and to reduce the number of attributes it contains. With the help of RS, valuable results can be obtained by mining battle simulation data, and the attributes that strongly affect learning performance on the dataset can be identified. However, the method of [4] cannot predict the classification of newly generated battle data.

In this study, a recently proposed regularization algorithm, the optimization extreme learning machine (OELM) [5], is combined with RS theory as the main components of an integrated predictor. OELM is an optimization-based variant of the conventional extreme learning machine (ELM) [6–11] with the following properties. First, compared with traditional ELM, minimizing the norm of the output weights gives OELM better generalization performance. Second, compared with the support vector machine (SVM) [12], OELM searches for the optimal solution over a larger feasible region of the dual variables, whereas SVM, because of the additional equality constraint in its dual problem, can only reach a solution that is suboptimal relative to OELM's; therefore OELM usually generalizes better than a traditional SVM. Third, OELM is less sensitive to user-specified parameters and is easy to implement.

In this paper, an integrated method combining multivariate discretization, RS theory, and OELM is investigated for military simulation data analysis and compared with state-of-the-art classifiers, namely ELM and SVM. Because the attributes of military simulation data interact with each other, multivariate discretization attempts to find the correct cuts by considering the attributes jointly rather than each attribute independently of the others. RS theory is then adopted to remove redundant attributes, and the OELM algorithm is trained on the resulting dataset with the optimal attribute subset.

We begin with a description of the integrated method in Section 2. The preprocessing of the military simulation data is discussed in Section 3, and the data analysis is presented in Section 4. Section 5 describes the experimental setup and reports the results. Finally, Section 6 concludes the paper.

#### 2. Integrated Method

In this section, RS theory is adopted to remove redundant attributes from the military simulation dataset, and the OELM algorithm is then trained on the resulting dataset with the optimal attribute subset. The main contributions of this work are as follows: (1) a multivariate discretization method is used to preprocess the original military dataset; (2) rules generated by the RS method are used to create a reduced military dataset; and (3) OELM, together with two other popular algorithms, is used to measure the generalization performance on both the original and the reduced datasets.

##### 2.1. OELM Review

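For $N$ arbitrary distinct samples $(\mathbf{x}_i, t_i)$, $i = 1, \dots, N$, where $\mathbf{x}_i \in \mathbb{R}^d$ and $t_i \in \{-1, 1\}$, the decision function given by an ELM with $L$ hidden nodes is

$$f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{j=1}^{L} \beta_j h_j(\mathbf{x})\Big) = \operatorname{sign}\big(\mathbf{h}(\mathbf{x})\boldsymbol{\beta}\big), \tag{1}$$

where $\beta_j$ is the output weight from the $j$th hidden node to the output node and $h_j(\mathbf{x}) = G(\mathbf{a}_j, b_j, \mathbf{x})$ is the output of the $j$th hidden node. $\mathbf{h}(\mathbf{x}) = [h_1(\mathbf{x}), \dots, h_L(\mathbf{x})]$ is the output vector of the hidden layer; it maps the training data from the input space to the $L$-dimensional ELM feature space.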
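To make the hidden-layer mapping concrete, the following NumPy sketch computes the hidden-layer output matrix and the ELM decision function described above for sigmoid additive nodes. The uniform range [-1, 1] for the random input weights and biases, the toy sizes, and the placeholder output weights `beta` are assumptions chosen only for illustration; `beta` is not trained here.

```python
import numpy as np

def hidden_layer_output(X, A, b):
    """Return H whose i-th row is h(x_i) = [G(a_1, b_1, x_i), ..., G(a_L, b_L, x_i)].

    Sigmoid additive nodes: G(a, b, x) = 1 / (1 + exp(-(a . x + b))).
    """
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

def elm_decision(X, A, b, beta):
    """ELM decision function f(x) = sign(h(x) beta), applied to every row of X."""
    return np.sign(hidden_layer_output(X, A, b) @ beta)

rng = np.random.default_rng(0)
d, L = 3, 5                                  # input dimension, number of hidden nodes
A = rng.uniform(-1.0, 1.0, size=(L, d))      # random input weights a_j (assumed range)
b = rng.uniform(-1.0, 1.0, size=L)           # random hidden biases b_j (assumed range)
beta = rng.standard_normal(L)                # placeholder output weights, not trained
X = rng.uniform(0.0, 1.0, size=(4, d))       # four toy input vectors
H = hidden_layer_output(X, A, b)             # 4 x 5 matrix in the ELM feature space
labels = elm_decision(X, A, b, beta)
```

Only `A`, `b`, and `beta` define the network; in OELM the first two stay random and only `beta` is learned.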

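According to Bartlett's theory [13], for feedforward neural networks that reach a small training error, the smaller the norm of the weights, the better the generalization performance tends to be. Therefore, ELM aims to minimize the training error as well as the norm of the output weights:

$$\text{Minimize: } \|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|^2 \quad \text{and} \quad \|\boldsymbol{\beta}\|, \tag{2}$$

where $\mathbf{H} = [\mathbf{h}(\mathbf{x}_1)^T, \dots, \mathbf{h}(\mathbf{x}_N)^T]^T$ is the $N \times L$ hidden-layer output matrix and $\mathbf{T} = [t_1, \dots, t_N]^T$.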

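With enough hidden nodes, the training data mapped from the input space to the ELM feature space by $\mathbf{h}(\mathbf{x})$ are linearly separable. To avoid overfitting, slack variables $\xi_i$, $i = 1, \dots, N$, are introduced, and one minimizes the training errors together with the norm of the output weights, subject to

$$\begin{cases} \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} \ge 1 - \xi_i, & t_i = 1, \\ \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} \le -1 + \xi_i, & t_i = -1. \end{cases} \tag{3}$$

The two inequalities in (3) can be combined into one set of inequalities:

$$t_i\, \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} \ge 1 - \xi_i, \quad i = 1, \dots, N. \tag{4}$$

Thus, OELM is obtained by solving the following quadratic program, where $C$ is a user-specified cost parameter:

$$\begin{aligned} \min_{\boldsymbol{\beta}, \boldsymbol{\xi}} \quad & L_P = \frac{1}{2}\|\boldsymbol{\beta}\|^2 + C \sum_{i=1}^{N} \xi_i \\ \text{s.t.} \quad & t_i\, \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N. \end{aligned} \tag{5}$$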
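As a hedged illustration of the primal problem, the sketch below evaluates the OELM primal objective for a fixed `beta`, using the fact that for fixed weights the smallest feasible slack of each sample is the hinge value max(0, 1 - t_i h(x_i) beta). The matrix `H`, the labels `t`, and both candidate weight vectors are made-up toy values.

```python
import numpy as np

def oelm_primal_objective(H, t, beta, C):
    """Evaluate L_P = 0.5 ||beta||^2 + C sum_i xi_i for a fixed beta.

    For fixed beta, the smallest xi_i with t_i h(x_i) beta >= 1 - xi_i and
    xi_i >= 0 is xi_i = max(0, 1 - t_i h(x_i) beta).
    """
    margins = t * (H @ beta)                 # t_i h(x_i) beta for every sample
    xi = np.maximum(0.0, 1.0 - margins)      # slack variables xi_i
    return 0.5 * beta @ beta + C * xi.sum(), xi

# Toy hidden-layer outputs (rows are h(x_i)) and labels; values are illustrative only.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([1.0, -1.0, 1.0])
obj_sep, xi_sep = oelm_primal_objective(H, t, np.array([2.0, -1.0]), C=1.0)
obj_viol, xi_viol = oelm_primal_objective(H, t, np.array([0.5, -1.0]), C=1.0)
```

The first weight vector satisfies every margin (all slacks zero, objective 2.5); the second violates two margins, so its slacks add C times their sum to the objective (2.625).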

Compared with SVM's optimization problem, OELM's problem (5) differs in three ways:

1. The mapping $\phi(\mathbf{x})$ used in SVM is usually unknown and cannot be computed directly, whereas the mapping $\mathbf{h}(\mathbf{x})$ used in OELM can be built from any bounded nonconstant piecewise continuous activation function for additive nodes and can be computed exactly.
2. The kernel parameters of SVM need to be tuned manually, whereas all parameters $(\mathbf{a}_j, b_j)$ of $\mathbf{h}(\mathbf{x})$ in OELM are generated randomly.
3. No bias $b$ is needed in the constraints of OELM's optimization problem, because the separating hyperplane of OELM tends to pass through the origin of the ELM feature space.

##### 2.2. Karush-Kuhn-Tucker Conditions

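According to Lagrange optimization theory, solving (5) is equivalent to solving its dual optimization problem:

$$\begin{aligned} \min_{\boldsymbol{\alpha}} \quad & L_D = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} t_i t_j \alpha_i \alpha_j\, \mathbf{h}(\mathbf{x}_i)\mathbf{h}(\mathbf{x}_j)^T - \sum_{i=1}^{N}\alpha_i \\ \text{s.t.} \quad & 0 \le \alpha_i \le C, \quad i = 1, \dots, N. \end{aligned} \tag{6}$$

The ELM kernel matrix can be defined as

$$\boldsymbol{\Omega}_{\text{ELM}} = \mathbf{H}\mathbf{H}^T, \qquad \Omega_{\text{ELM}\,i,j} = \mathbf{h}(\mathbf{x}_i)\mathbf{h}(\mathbf{x}_j)^T. \tag{7}$$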

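For convenience, (6) can be written compactly as

$$\min_{\boldsymbol{\alpha}} f(\boldsymbol{\alpha}) = \frac{1}{2}\boldsymbol{\alpha}^T\mathbf{Q}\boldsymbol{\alpha} - \mathbf{e}^T\boldsymbol{\alpha} \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \dots, N, \tag{8}$$

where $f$ is a convex function, $\mathbf{Q}$ with $Q_{ij} = t_i t_j \Omega_{\text{ELM}\,i,j}$ is a positive semidefinite matrix, $\boldsymbol{\alpha} = [\alpha_1, \dots, \alpha_N]^T$, and $\mathbf{e} = [1, \dots, 1]^T$.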
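The compact dual form can be assembled directly from the hidden-layer outputs. This minimal sketch, on hypothetical toy data, builds the ELM kernel matrix `omega = H H^T` and `Q` with Q_ij = t_i t_j omega_ij, then evaluates f(alpha) = 0.5 alpha^T Q alpha - e^T alpha together with its gradient Q alpha - e. The function names are illustrative and are not taken from the published OELM code.

```python
import numpy as np

def oelm_dual_terms(H, t):
    """Return the ELM kernel matrix Omega = H H^T and Q with Q_ij = t_i t_j Omega_ij."""
    omega = H @ H.T
    Q = np.outer(t, t) * omega
    return omega, Q

def dual_objective_and_gradient(Q, alpha):
    """Return f(alpha) = 0.5 alpha^T Q alpha - e^T alpha and g(alpha) = Q alpha - e."""
    g = Q @ alpha - 1.0
    f = 0.5 * alpha @ Q @ alpha - alpha.sum()
    return f, g

# Toy hidden-layer outputs and labels (illustrative only).
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([1.0, -1.0, 1.0])
omega, Q = oelm_dual_terms(H, t)
f0, g0 = dual_objective_and_gradient(Q, np.array([1.0, 0.0, 0.0]))
```

Because `Q` is `diag(t) @ omega @ diag(t)` and `omega` is a Gram matrix, its eigenvalues are nonnegative, which is what makes `f` convex.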

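Let $\mathbf{g}(\boldsymbol{\alpha}) = \nabla f(\boldsymbol{\alpha}) = \mathbf{Q}\boldsymbol{\alpha} - \mathbf{e}$, and let $g_i$ denote the $i$th element of $\mathbf{g}$; that is, $g_i = \sum_{j=1}^{N} Q_{ij}\alpha_j - 1$. The Lagrange function of (8) is

$$\mathcal{L}(\boldsymbol{\alpha}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f(\boldsymbol{\alpha}) - \sum_{i=1}^{N}\lambda_i\alpha_i + \sum_{i=1}^{N}\mu_i(\alpha_i - C), \tag{9}$$

where $\lambda_i$ and $\mu_i$ are the Lagrange multipliers, which are nonnegative.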

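To find the optimal solution of (8), the gradient of (9) with respect to $\boldsymbol{\alpha}$ must vanish:

$$\frac{\partial \mathcal{L}}{\partial \alpha_i} = g_i - \lambda_i + \mu_i = 0, \quad i = 1, \dots, N. \tag{10}$$

By the Karush-Kuhn-Tucker (KKT) theorem, the KKT conditions accompanying (10) are

- primal feasibility: $0 \le \alpha_i \le C$ (11);
- dual feasibility: $\lambda_i \ge 0$ (12) and $\mu_i \ge 0$ (13);
- complementary slackness: $\lambda_i\alpha_i = 0$ (14) and $\mu_i(C - \alpha_i) = 0$ (15).

Furthermore, consider the following cases.

1. If $\alpha_i = 0$, then $\mu_i = 0$ by (15). Thus, from (10) and (12), we have $g_i = \lambda_i \ge 0$ (16).
2. If $0 < \alpha_i < C$, then $\lambda_i = 0$ and $\mu_i = 0$ by (14) and (15). Thus, from (10), we have $g_i = 0$ (17).
3. If $\alpha_i = C$, then $\lambda_i = 0$ by (14). Thus, from (10) and (13), we have $g_i = -\mu_i \le 0$ (18).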

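Because (8) is a convex quadratic program with linear constraints, the KKT conditions are both necessary and sufficient for optimality. Thus, (8) is solved when, for all $i$,

$$g_i \begin{cases} \ge 0, & \alpha_i = 0, \\ = 0, & 0 < \alpha_i < C, \\ \le 0, & \alpha_i = C. \end{cases} \tag{19}$$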
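The optimality test (19) translates into a simple check on `alpha` and the gradient `g`. The sketch below is one possible version; the tolerance `tol` used to decide whether alpha_i sits at a bound is an assumption needed for floating-point arithmetic and does not appear in the text.

```python
import numpy as np

def kkt_satisfied(alpha, g, C, tol=1e-6):
    """Check the optimality conditions (19) for every i, up to a tolerance."""
    at_lower = alpha <= tol                  # alpha_i = 0      -> need g_i >= 0
    at_upper = alpha >= C - tol              # alpha_i = C      -> need g_i <= 0
    free = ~(at_lower | at_upper)            # 0 < alpha_i < C  -> need g_i = 0
    ok_lower = np.all(g[at_lower] >= -tol)
    ok_free = np.all(np.abs(g[free]) <= tol)
    ok_upper = np.all(g[at_upper] <= tol)
    return bool(ok_lower and ok_free and ok_upper)

alpha = np.array([0.0, 0.5, 1.0])            # one variable at each kind of position, C = 1
print(kkt_satisfied(alpha, np.array([0.3, 0.0, -0.2]), C=1.0))   # True
```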

An active set approach is well suited to solving (8): it is an iterative algorithm that maintains the feasibility of $\boldsymbol{\alpha}$ (i.e., $0 \le \alpha_i \le C$) throughout the iterations, which continue until the KKT conditions (19) are satisfied.
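The text recommends an active set method; the sketch below instead uses projected coordinate descent, a different technique that also keeps every iterate feasible. It fits problem (8) well because (8) has only box constraints (there is no equality constraint as in the SVM dual), so each coordinate can be minimized exactly and clipped to [0, C]. The output weights are then recovered as beta = sum_i alpha_i t_i h(x_i)^T, which follows from setting the derivative of the primal Lagrangian of (5) with respect to beta to zero. The function names, the sweep limit, and the stopping tolerance are assumptions.

```python
import numpy as np

def solve_oelm_dual(Q, C, max_sweeps=1000, tol=1e-10):
    """Approximately solve min 0.5 a^T Q a - e^T a  s.t.  0 <= a_i <= C.

    Projected (cyclic) coordinate descent, used here instead of the active set
    method named in the text. Each update minimizes f exactly along one
    coordinate and clips the result to [0, C], so every iterate stays feasible.
    """
    n = Q.shape[0]
    alpha = np.zeros(n)
    g = Q @ alpha - 1.0                        # gradient g(alpha) = Q alpha - e
    for _ in range(max_sweeps):
        largest_step = 0.0
        for i in range(n):
            if Q[i, i] > 0.0:
                new = min(max(alpha[i] - g[i] / Q[i, i], 0.0), C)
            else:
                new = C                         # PSD with Q_ii = 0: g_i = -1, so f decreases toward C
            step = new - alpha[i]
            if step != 0.0:
                g += step * Q[:, i]             # keep the gradient consistent with alpha
                alpha[i] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:                 # no coordinate moved noticeably: stop
            break
    return alpha

def oelm_fit_predict(H_train, t, H_test, C):
    """Train OELM on hidden-layer outputs H_train and classify the rows of H_test."""
    Q = np.outer(t, t) * (H_train @ H_train.T)  # Q_ij = t_i t_j Omega_ij
    alpha = solve_oelm_dual(Q, C)
    beta = H_train.T @ (alpha * t)              # beta = sum_i alpha_i t_i h(x_i)^T
    return np.sign(H_test @ beta)

print(solve_oelm_dual(np.array([[2.0, 1.0], [1.0, 2.0]]), C=10.0))  # close to [1/3, 1/3]
```

Coordinate descent converges only in the limit, so in practice the KKT test (19) is a natural additional stopping criterion.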

##### 2.3. Rough Set

Rough set (RS) theory, first introduced by Pawlak et al. [14], has become a popular pattern recognition tool for generating logical rules for prediction. Attribute reduction, one of the core parts of RS theory, aims to find the most informative attributes and to remove irrelevant or unimportant attributes with minimal information loss.

A rough set is the approximation of an uncertain set by a pair of precise sets called the lower and upper approximations. The lower approximation comprises the elements that certainly belong to the set, whereas the upper approximation includes the elements that possibly belong to it.

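An RS model is defined in terms of an information system, which can be formally defined as a 4-tuple [15, 16]:

$$S = (U, A, V, f), \tag{20}$$

where $U$ is a nonempty finite set of objects called the universe, $A$ is a finite set of attributes, $V_a$ is the domain of attribute $a$, and $V = \bigcup_{a \in A} V_a$. $f: U \times A \to V$ is an information function such that $f(x, a) \in V_a$ for each $a \in A$ and $x \in U$. Each object $x$ of the universe is described by the vector

$$[f(x, a_1), f(x, a_2), \dots, f(x, a_m)], \tag{21}$$

where $A = \{a_1, \dots, a_m\}$. Every nonempty subset of attributes $B \subseteq A$ is associated with an indiscernibility relation on $U$, denoted by $\text{IND}(B)$:

$$\text{IND}(B) = \{(x, y) \in U \times U : f(x, a) = f(y, a) \ \text{for all } a \in B\}. \tag{22}$$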

The relation $\text{IND}(B)$ in (22) is an equivalence relation. The family of all equivalence classes of $\text{IND}(B)$ is denoted by $U/\text{IND}(B)$, and the class containing an element $x$ by $[x]_B$.
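The partition U/IND(B) can be computed by grouping objects on their values for the attributes in B. In this minimal sketch the information system is a hypothetical toy table (object names, attributes, and values are illustrative), not the military dataset.

```python
from collections import defaultdict

def partition(table, attrs):
    """Return U/IND(B): classes of objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return [frozenset(c) for c in classes.values()]

# Hypothetical toy information system; names and values are illustrative only.
table = {
    "x1": {"weather": "fine", "terrain": "hill", "rate": "high"},
    "x2": {"weather": "fine", "terrain": "hill", "rate": "low"},
    "x3": {"weather": "fog", "terrain": "hill", "rate": "low"},
    "x4": {"weather": "fog", "terrain": "plain", "rate": "low"},
}
by_weather = partition(table, ["weather"])                     # {x1, x2}, {x3, x4}
by_weather_terrain = partition(table, ["weather", "terrain"])  # {x1, x2}, {x3}, {x4}
```

Adding attributes to B can only split classes further, which is why a larger B never yields a coarser partition.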

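Formally, let $X$ be a nonempty subset of $U$. The set $X$ is approximated by means of its $B$-lower and $B$-upper approximations:

$$\underline{B}X = \{x \in U : [x]_B \subseteq X\}, \tag{23}$$

$$\overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}. \tag{24}$$

The $B$-boundary of $X$ is

$$BN_B(X) = \overline{B}X - \underline{B}X. \tag{25}$$

The following relation holds: $\underline{B}X \subseteq X \subseteq \overline{B}X$.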

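There are three regions of interest: the inner region $\underline{B}X$, containing the objects that certainly belong to $X$; the outer region $U - \overline{B}X$, containing the objects that certainly do not belong to $X$; and the boundary region $BN_B(X)$, containing the objects whose membership is uncertain.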
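Given the equivalence classes, the approximations and the three regions follow from two set tests per class: a class belongs to the lower approximation if it is contained in X, and to the upper approximation if it intersects X. The toy table and the target set X (objects with a low rate) in this sketch are hypothetical.

```python
from collections import defaultdict

def rough_regions(table, attrs, X):
    """Return the B-lower and B-upper approximations, boundary, and outer region of X."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= X:                    # [x]_B is contained in X
            lower |= eq_class
        if eq_class & X:                     # [x]_B intersects X
            upper |= eq_class
    boundary = upper - lower                 # uncertain objects
    outer = set(table) - upper               # objects certainly not in X
    return lower, upper, boundary, outer

# Hypothetical toy information system; names and values are illustrative only.
table = {
    "x1": {"weather": "fine", "terrain": "hill", "rate": "high"},
    "x2": {"weather": "fine", "terrain": "hill", "rate": "low"},
    "x3": {"weather": "fog", "terrain": "hill", "rate": "low"},
    "x4": {"weather": "fog", "terrain": "plain", "rate": "low"},
    "x5": {"weather": "fine", "terrain": "plain", "rate": "high"},
}
X = {obj for obj, row in table.items() if row["rate"] == "low"}
lower, upper, boundary, outer = rough_regions(table, ["weather", "terrain"], X)
```

Here x1 and x2 are indiscernible on (weather, terrain) but have different rates, so they fall into the boundary region.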

#### 3. Military Simulation Data Preprocessing

The performance of the integrated method is evaluated on countermeasure simulation data of a tank formation [4], produced by a digital armored battalion simulation system in exercises on medium undulating ground, hills, and Taiwan highland terrain. The specification of the dataset is presented in Table 1.

In Table 1, the items “W1,” “T1,” …, and “R” represent “Weather,” “Terrain,” “Speed,” “Vegetation,” “Wind,” “Time,” “Formation,” “Psychology,” and “Shooting rate,” respectively. The RS method can be applied only to datasets composed of categorical attributes, but the attributes in Table 1 are continuous variables. Discretization is therefore important: discretized variables are less prone to variance in estimation from small, fragmented data, and they lead to better rule extraction. The main discretization approaches are supervised [17], hierarchical [18], top-down [19], and multivariate [20]; the last is employed in this work.

Multivariate discretization discretizes many features simultaneously. In the preprocessing phase, the attribute “Weather” is converted into two new attributes, “fine” and “fog,” as shown in Table 2.

Following the example in Table 2, the other attributes are converted in the same way, as listed in Table 3.
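The multivariate cut search of [20] is beyond a short example, but the encoding step that Table 2 describes, turning one attribute into two new attributes, can be sketched once cut points are known. The sketch assumes a single cut per attribute and a binary 0/1 encoding; the cut values, attribute values, and new attribute names are hypothetical, and it does not reproduce Tables 2 and 3. Replacing each of m attributes by two indicators matches the 8-to-16 expansion reported in Section 5.

```python
def to_indicator_attributes(values, cut, below_name, above_name):
    """Encode a continuous attribute as two 0/1 attributes using one cut point."""
    return [{below_name: int(v < cut), above_name: int(v >= cut)} for v in values]

def expand_table(rows, cuts):
    """Replace every attribute in `cuts` by two indicator attributes (m -> 2m)."""
    expanded = [{} for _ in rows]
    for attr, (cut, below, above) in cuts.items():
        encoded = to_indicator_attributes([r[attr] for r in rows], cut, below, above)
        for out, enc in zip(expanded, encoded):
            # Prefix with the source attribute so names from different attributes cannot collide.
            out.update({f"{attr}:{k}": v for k, v in enc.items()})
    return expanded

# Hypothetical normalized readings and cut points; not the values of Tables 1-3.
rows = [{"weather": 0.2, "wind": 0.9}, {"weather": 0.7, "wind": 0.1}]
cuts = {"weather": (0.5, "fog", "fine"), "wind": (0.5, "calm", "windy")}
expanded = expand_table(rows, cuts)
```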

#### 4. Military Simulation Data Analysis

Predicting the shooting rate is essentially a data mining task: the goal of military data analysis here is to predict the unknown level of the shooting rate, such as high or low. More specifically, the military simulation data analysis process can be modeled as a classification problem.

A typical military dataset is sparse compared with a traditional classification dataset: it usually contains few samples but many features. Such sparseness often degrades the performance of classifiers. To improve the classification rate, one can remove irrelevant or redundant features. Intuitively, a classifier trained in a lower-dimensional feature space is expected to capture the inherent data distribution better and to perform better in that space.

Variable and feature selection has many advantages, such as facilitating data visualization and understanding and reducing training and testing time. Figure 1 gives a 3-dimensional visualization of the dataset described in Section 3.

Only 4 samples are visible in Figure 1. Since the entire dataset contains 12 samples, we conjecture that some samples overlap. This suggests that the dataset contains a great deal of redundant information.

The RS method is a well-established method for feature selection. It is widely used in many applications and is more flexible than alternatives such as principal component analysis (PCA) [21]. RS can be used to discover data dependencies and to reduce the number of attributes. For comparison, we used five other widely used feature selection methods: (i) PCA, (ii) multidimensional scaling (MDS), (iii) generalized discriminant analysis (GDA), (iv) factor analysis (FA), and (v) Isomap.
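One common way to use RS for attribute reduction is a greedy search on the dependency degree gamma_B(D) = |POS_B(D)| / |U|, the fraction of objects whose B-equivalence class carries a single decision value. The QuickReduct-style sketch below adds, at each step, the attribute that most increases gamma until gamma matches its value for the full set of condition attributes. This is a standard heuristic, not necessarily the strategy of [4], and it does not guarantee a minimal reduct; the toy table, in which `d = a XOR b` and `c` duplicates `a`, is hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on attrs (the classes of IND(attrs))."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def dependency(rows, attrs, decision):
    """gamma_B(D): share of objects whose B-class has a single decision value."""
    pos = sum(len(cls) for cls in partition(rows, attrs)
              if len({rows[i][decision] for i in cls}) == 1)
    return pos / len(rows)

def quick_reduct(rows, conditions, decision):
    """Greedily add the attribute that most increases gamma until it equals gamma(all)."""
    target = dependency(rows, conditions, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical decision table: d = a XOR b, and c is a redundant copy of a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
print(quick_reduct(rows, ["a", "b", "c"], "d"))   # ['a', 'b']
```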

After feature selection, some overlap appears in every subfigure of Figure 2, including the “Rough set” subfigure. The main goal of a feature selection method is to choose a subset of useful variables, excluding redundant variables while retaining relevant ones. As seen in Figure 2, the RS method obtains the most relevant variables, because the “Rough set” subfigure has the largest overlapping region. The usefulness of these variables is judged directly by the estimated accuracy of the learning method: a learning method based on the features chosen by RS achieves lower error rates than those based on the other methods.

#### 5. Performance Evaluation

In this section, we report the performance of the proposed integrated method on both the original and the reduced military simulation datasets and compare it with other state-of-the-art classifiers. The reduced dataset is preprocessed using the procedure described in Section 4. After conversion, the 8 attributes of the original data are expanded into 16 attributes. In our classification application, the inputs (attributes) are normalized into the range , while the outputs (targets) are normalized into the range . All simulations are run in MATLAB R2008a (Windows version) on a machine with a 3.0 GHz Intel processor and 2 GB of RAM.

For comparison, we used two other widely used classification methods, ELM and SVM. SVM is implemented with the Spider library for MATLAB, which is publicly available from [22]. MATLAB code for both ELM and OELM can be downloaded from the ELM website: http://www.ntu.edu.sg/home/egbhuang/. SVM is trained with the radial basis function (RBF) kernel, and ELM is tested with the sigmoidal activation function. For OELM, both the Gaussian RBF activation function and the sigmoidal additive activation function are used. For ELM and OELM with either activation function, the input weights and biases are randomly generated from a uniform probability distribution.

##### 5.1. Reduced Dataset

Using the RS strategy of [4], the number of attributes in the military simulation dataset is reduced from 16 to 8, as shown in Table 4.

##### 5.2. Selection of Parameters

The two parameters of the SVM with RBF kernel, the cost constant $C$ and the kernel parameter, both strongly affect generalization performance. A straightforward approach to model selection is the grid search strategy [23]. Following the method of [24], the two parameters are tuned on the grid {10^{−3}, 10^{−2}, 0.05, 10^{−1}, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 10^{2}, 10^{3}, 10^{4}} × {10^{−3}, 10^{−2}, 10^{−1}, 0.2, 0.4, 0.8, 1, 2, 5, 10, 20, 50, 10^{2}, 10^{3}, 10^{4}}. Therefore, for each problem we try 15 × 15 = 225 parameter combinations for SVM. Figure 3 shows the generalization performance of SVM for each combination.
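The SVM grid search amounts to an exhaustive loop over the 225 pairs listed above. In this sketch the first set is taken to be the values of C and the second the RBF kernel parameter (an assumption about the ordering); `evaluate` is a hypothetical caller-supplied function returning a testing rate, and `n_best` counts how many pairs attain the best score, the quantity reported for Figure 3(a).

```python
from itertools import product

# Grid values listed in the text; the first set is assumed to hold the values of C.
C_GRID = [1e-3, 1e-2, 0.05, 1e-1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 1e2, 1e3, 1e4]
KERNEL_GRID = [1e-3, 1e-2, 1e-1, 0.2, 0.4, 0.8, 1, 2, 5, 10, 20, 50, 1e2, 1e3, 1e4]

def grid_search(evaluate):
    """Score every (C, kernel parameter) pair with the hypothetical `evaluate`.

    Returns the best pair, its score, all scores, and how many pairs tie for best.
    """
    scores = {(c, k): evaluate(c, k) for c, k in product(C_GRID, KERNEL_GRID)}
    best = max(scores, key=scores.get)
    n_best = sum(1 for s in scores.values() if s == scores[best])
    return best, scores[best], scores, n_best

# Dummy scorer for illustration only: peaks at C = 1, kernel parameter = 0.2.
best, best_score, scores, n_best = grid_search(lambda c, k: -abs(c - 1) - abs(k - 0.2))
```

Because every pair is evaluated, the cost grows with the product of the two grid sizes, which is the trade-off noted below.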

As Figure 3 shows, the values of $C$ and the kernel parameter form a grid in the parameter space. The wider the search space, the more likely the grid search is to find the best parameter combination; however, because grid search exhaustively evaluates every point of the grid, its cost grows with the grid size. Figure 3(a) indicates that the best generalization performance on the original dataset is achieved by 18 combinations and that the performance of SVM depends strongly on the parameter combination. Similar results are obtained in Figure 3(b).

For ELM, only one parameter, the number of hidden nodes, needs to be determined. In our simulations, ELM with 5 hidden nodes generally performed well on both datasets. The generalization performance of OELM is generally not sensitive to the number of hidden nodes [5]; we set the number of hidden nodes to 1, 2, 3, 4, 5, and 6. OELM has one more parameter to be determined, the cost parameter $C$. As in [5], we set this parameter to 10^{−3}, 10^{−2}, 0.05, 10^{−1}, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 10^{2}, 10^{3}, and 10^{4}. The optimal parameter(s) are selected to obtain the best testing performance. Table 5 shows the optimal parameters of the three classifiers on the two datasets.

##### 5.3. Performance Comparison on Two Datasets

We evaluate the three classifiers by repeated random splitting; that is, each of the two datasets is partitioned into a training set and a test set of equal size. To avoid selection bias, each training set contains at least one positive sample (labeled “1”) and one negative sample (labeled “−1”). The splitting is repeated 50 times. The training time, the testing rate (TR) averaged over the 50 trials, and the corresponding standard deviation (Dev) are reported.
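The evaluation protocol can be sketched as follows: draw a random permutation, split it in half, and redraw whenever the training half lacks either class. The seed, the use of the population standard deviation for Dev, and the balanced 12-sample label vector in the usage example are assumptions; the text does not state the class balance.

```python
import numpy as np

def random_splits(labels, n_repeats=50, seed=0):
    """Yield (train_idx, test_idx) halves; redraw until the training half has both classes."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    n_train = len(labels) // 2
    for _ in range(n_repeats):
        while True:
            perm = rng.permutation(len(labels))
            train, test = perm[:n_train], perm[n_train:]
            if {1, -1} <= set(labels[train].tolist()):   # at least one "1" and one "-1"
                break
        yield train, test

def summarize(rates):
    """Mean testing rate (TR) and population standard deviation (Dev) over the trials."""
    rates = np.asarray(rates, dtype=float)
    return rates.mean(), rates.std()

labels = [1] * 6 + [-1] * 6                  # hypothetical labels for 12 samples
splits = list(random_splits(labels))
```

Each classifier would be trained on `train` and scored on `test` for every split, and `summarize` applied to the 50 testing rates.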

Table 5 compares the three classifiers, including results with different kernels. All results are averages over 50 trials. OELM-S and OELM-R denote OELM with the sigmoidal and RBF kernels, respectively. As Table 5 shows, OELM-S and OELM-R perform similarly, except that OELM-S requires twice the training time of OELM-R. ELM requires much less training time than the other classifiers but also gives the worst performance. The performance of OELM-S and OELM-R is clearly higher than that of the other classifiers. To study this phenomenon further, an additional experiment is reported in Table 6.

In the experiment of Table 6, the sizes of the training and test sets in each of the 10 trials are fixed according to the setting above, but the order of the training set is randomly shuffled in each trial. Although SVM achieves a 100% testing rate on 4 trials, its testing rate on the other trials is no more than 33.33%. These results appear very unstable because of the randomness. By contrast, OELM-S runs more stably than the other two classifiers, which explains why its performance is much higher than that of SVM and ELM.

Strictly speaking, data reduction does not appear to improve the performance of SVM on every trial. OELM-S, on the other hand, is a good choice for our integrated strategy to improve the generalization performance on military simulation data.

#### 6. Conclusions

This paper has introduced an integrated learning method of RS and OELM for military simulation datasets. The generalization performance of OELM is less sensitive to user-specified parameters, especially the number of hidden nodes. Compared with SVM, OELM can be used easily and effectively with much less parameter tuning. Significantly better results on the reduced dataset are obtained by employing the OELM classifier over crisp discretization. The method has potential for further military applications because it may provide real-time decision support for commanders on the battlefield.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### References

1. W. Liguo, X. Qing, and M. Xianquan, “The research of the data VV&C for the equipment war gaming based on the HLA,” *Computer Engineering and Applications*, vol. 21, pp. 200–202, 2006.
2. M.-Z. Li, J.-K. Zhang, and W.-F. Che, “Research on data for combat simulation,” *Command Control & Simulation*, vol. 32, no. 4, pp. 71–74, 2010.
3. H. Gao, H. Zhang, G. Chen et al., “Research and implementation of military simulation,” *Fire Control & Command Control*, vol. 34, no. 2, pp. 150–153, 2009.
4. W.-M. Zhang and Q. Xue, “Application of rough set in data mining of warfare simulation,” *Journal of System Simulation*, vol. 18, no. 2, pp. 179–181, 2006.
5. G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” *Neurocomputing*, vol. 74, no. 1–3, pp. 155–163, 2010.
6. G.-B. Huang and C.-K. Siew, “Extreme learning machine: RBF network case,” in *Proceedings of the International Conference on Control, Automation, Robotics and Vision*, pp. 1651–1663, 2004.
7. G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” *Neurocomputing*, vol. 70, no. 1–3, pp. 489–501, 2006.
8. G.-B. Huang, L. Chen, and C.-K. Siew, “Universal approximation using incremental constructive feedforward networks with random hidden nodes,” *IEEE Transactions on Neural Networks*, vol. 17, no. 4, pp. 879–892, 2006.
9. J. W. Cao, T. Chen, and J. Fan, “Fast online learning algorithm for landmark recognition based on BoW framework,” in *Proceedings of the 9th IEEE Conference on Industrial Electronics and Applications*, Hangzhou, China, June 2014.
10. J. W. Cao and L. Xiong, “Protein sequence classification with improved extreme learning machine algorithms,” *BioMed Research International*, vol. 2014, Article ID 103054, 12 pages, 2014.
11. J. W. Cao, Z. Lin, G.-B. Huang, and N. Liu, “Voting based extreme learning machine,” *Information Sciences*, vol. 185, pp. 66–77, 2012.
12. C. Cortes and V. Vapnik, “Support-vector networks,” *Machine Learning*, vol. 20, no. 3, pp. 273–297, 1995.
13. P. L. Bartlett, “The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network,” *IEEE Transactions on Information Theory*, vol. 44, no. 2, pp. 525–536, 1998.
14. Z. Pawlak, J. Grzymala-Busse, R. Slowinski, and W. Ziarko, “Rough sets,” *Communications of the ACM*, vol. 38, no. 11, pp. 88–95, 1995.
15. Z. Pawlak and A. Skowron, “Rudiments of rough sets,” *Information Sciences*, vol. 177, no. 1, pp. 3–27, 2007.
16. S. Greco, M. Benedetto, and R. Slowinski, “New developments in the rough set approach to multi-attribute decision analysis,” *Bulletin of International Rough Set Society*, vol. 2, no. 2-3, pp. 57–87, 1998.
17. J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and unsupervised discretization of continuous features,” in *Proceedings of the 12th International Conference on Machine Learning*, pp. 194–202, 1995.
18. R. Kerber, “Discretization of numeric attributes,” in *Proceedings of the 10th National Conference on Artificial Intelligence*, pp. 123–128, MIT Press, Cambridge, Mass, USA, 1992.
19. F. Hussain, H. Liu, C. L. Tan, and M. Dash, “Discretization: an enabling technique,” Tech. Rep., School of Computing, Singapore, 1999.
20. S. D. Bay, “Multivariate discretization of continuous variables for set mining,” in *Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 315–319, August 2000.
21. T. C. Lei, S. Wan, and T. Y. Chou, “The comparison of PCA and discrete rough set for feature extraction of remote sensing image classification—a case study on rice classification, Taiwan,” *Computational Geosciences*, vol. 12, no. 1, pp. 1–14, 2008.
22. The Spider Library for MATLAB, http://www.kyb.tuebingen.mpg.de/bs/people/spider/.
23. N. Ancona, C. Cicirelli, E. Stella, and A. Distante, “Object detection in images: run-time complexity and parameter selection of support vector machines,” in *Proceedings of the 16th International Conference on Pattern Recognition*, vol. 2, pp. 426–429, August 2002.
24. P. Ghanty, S. Paul, and N. R. Pal, “NEUROSVM: an architecture to reduce the effect of the choice of kernel on the performance of SVM,” *Journal of Machine Learning Research*, vol. 10, no. 3, pp. 591–622, 2009.