Mathematical Problems in Engineering

Special Issue

Optimization Algorithms Combining (Meta)heuristics and Mathematical Programming and Its Application in Engineering


Research Article | Open Access

Volume 2018 | Article ID 3145947

Ahmad M. Karim, Mehmet S. Güzel, Mehmet R. Tolun, Hilal Kaya, Fatih V. Çelebi, "A New Generalized Deep Learning Framework Combining Sparse Autoencoder and Taguchi Method for Novel Data Classification and Processing", Mathematical Problems in Engineering, vol. 2018, Article ID 3145947, 13 pages, 2018.

A New Generalized Deep Learning Framework Combining Sparse Autoencoder and Taguchi Method for Novel Data Classification and Processing

Academic Editor: Nibaldo Rodríguez
Received 12 Mar 2018
Revised 03 May 2018
Accepted 13 May 2018
Published 07 Jun 2018


Deep autoencoder neural networks have been widely used in several image classification and recognition problems, including handwriting recognition, medical imaging, and face recognition. The overall performance of deep autoencoder neural networks mainly depends on the number of parameters used, the structure of the network, and the compatibility of the transfer functions, so an inappropriate structure design can reduce performance. This paper presents a novel framework that integrates the Taguchi Method into a deep autoencoder based system without modifying the overall structure of the network. Several experiments are performed using data sets from different fields, including network security and medicine. The results show that the proposed method is more robust than some of the well-known methods in the literature, as it performed better in most cases. The results are therefore encouraging and support the overall performance of the proposed framework.

1. Introduction

Machine learning (ML) is a popular branch of artificial intelligence (AI) in which machines are not explicitly programmed but instead acquire new skills and predict results with high accuracy. Deep learning (DL) is a more recent form of ML that has been applied in many fields, from computer vision to high-dimensional data processing, and has achieved state-of-the-art results [1, 2]. Essentially, DL has brought great improvement in solving problems that resisted the efforts of the AI community for more than three decades. It should be noted that DL can produce comprehensive predictions with little manual engineering, which conventional AI based approaches cannot match. DL is expected to be applied to further fields in the near future owing to its flexible and generic structure, and the development of innovative learning algorithms and new structures for deep neural networks will only speed up this progress [3]. Recently, deep autoencoders, which rely on unsupervised learning algorithms, have shown state-of-the-art results on different machine learning tasks [4]. Deep autoencoders have been widely used in fields ranging from image recognition to computer networks. Lore et al. (2017) proposed a deep autoencoder based method that identifies signal features in low-light images and brightens the images without oversaturating the lighter parts of images with a high dynamic range [5]. K. Sun et al. used a variant of the stacked sparse denoising autoencoder trained on synthetic data and proposed a new extreme learning machine autoencoder (ELM-AE), called the generalized extreme learning machine autoencoder (GELM-AE), which adds manifold regularization to the objective of ELM-AE [6]. In [7], Yihui Xiong et al. trained an autoencoder network to encode and reconstruct a geochemical sample population with an unknown, complex multivariate probability distribution. In [8], Lyle D. Burgoon et al. trained an autoencoder to predict estrogenic chemical substances (APECS). APECS consists of two deep autoencoder models that are less complex than the US EPA's method and perform at least as well: the models achieve accuracies of 91% versus 86% and 93% versus 93% on the in vitro and in vivo datasets used to validate the US EPA method. Chaoqun Hong et al. proposed a new pose retrieval technique based on multimodal feature integration and a backpropagation deep neural network, using a multilayered deep neural network with nonlinear mapping [9]. In [10], Tzu-Hsi Song et al. focused on bone marrow trepan biopsy images and proposed a hybrid deep autoencoder (HDA) network with the Curvature Gaussian method for efficient and precise detection of bone marrow hematopoietic stem cells via related high-level feature correspondence. In [11], Yosuke Suzuki et al. proposed a collaborative filtering based recommendation algorithm that employs the variation of similarities among users derived from different layers in stacked denoising autoencoders. Yu-Dong Zhang et al. presented a novel computer-aided detection system based on susceptibility-weighted imaging, whose use has increased in recent years. Unsupervised feature learning was performed using SAE. Then, a deep autoencoder neural network was formed from the learned features and the stacked autoencoders, and all of them were trained together in a supervised manner. The proposed approach produced a sensitivity of 93.20±1.37%, a specificity of 93.25±1.38%, and an accuracy of 93.22±1.37%, obtained over 10x10-fold cross validation [12]. As presented above, deep autoencoders have recently attracted a great deal of attention from researchers.

The Taguchi Method is a statistical technique proposed by Taguchi and Konishi, originally for optimizing quality in the development of manufacturing processes [13]. In recent years especially, the method has been used in a number of critical studies to design experiments with the best performance in disciplines such as Engineering, Biotechnology, and Computer Science. For instance, Mei-Ling Huang et al. (2014) combined a feature selection technique with the SVM recursive feature elimination approach to validate the classification accuracy for the Dermatology and Zoo databases [14]. In that study, the Taguchi Method was adapted and combined with an SVM classifier to increase the overall classification accuracy by optimizing two of the classifier's parameters. The authors claim that the proposed method can produce more than 95% accuracy for the Dermatology and Zoo databases. Another study addresses a multistage metal forming process with workability taken into account and also employs the Taguchi Method for optimization [15]. In that study, the Taguchi Method is combined with an artificial neural network to minimize the objective functions of the forming process; the parameter combinations used in the finite element simulation are determined by an orthogonal array from the statistical design of experiments, and the training data for the artificial neural network are obtained from the orthogonal array and the simulation results. Huimin Wang et al. (2014) adopted the Taguchi Method to analyze the effect of inertia weight, acceleration coefficients, population size, fitness evaluations, and population topology on the particle swarm optimization (PSO) algorithm and to determine the best mix of them for various optimization problems. Their experimental results show that all the benchmark functions reach their optimum solutions after the tuning process. The article also reports acceptable results for the optimization design of a Halbach permanent magnet motor and concludes that the PSO based Taguchi Method is well suited to such popular engineering problems [16]. A study published in 2016 proposed a new predictive model of material removal rate (MRR) that employs Taguchi-entropy weight based GRA to optimize an artificial neural network [17]. Further recent studies using the Taguchi Method can be found in [18–22].

This paper introduces a novel deep autoencoder based architecture optimized by the Taguchi Method (see Section 2.1). The proposed architecture was employed in four different fields to show its performance; it gives satisfactory results, which encourages the authors to employ the framework in other fields. The remainder of the paper presents the proposed framework, the experimental results, and the conclusion.

2. The Proposed Framework

This study proposes a new method for optimizing the structure of deep autoencoders for data processing. The proposed deep learning architecture employs stacked autoencoders supported by the Taguchi Method for parameter optimization in a reasonable amount of time. First, the stacked sparse autoencoder and the Taguchi Method are briefly explained in turn. Afterwards, the proposed architecture, shown in Figure 2, is discussed.

2.1. Stacked Sparse Autoencoder

Supervised learning is one of the most powerful tools of AI. The stacked sparse autoencoder (SSAE) is essentially a neural network consisting of multiple layers of sparse autoencoders, and it is mainly used as an unsupervised feature extraction method that learns automatically from unlabelled data. The output of each layer is wired to the inputs of the succeeding layer. Training an autoencoder essentially means estimating the optimal parameters by reducing the divergence between its input and its output. An example autoencoder is illustrated in Figure 1. The mapping between the input and the output uses an activation M(·) given by the sigmoid logistic function.

The discrepancy between the input and the output is defined by a cost function whose first term is the mean squared error (MSE) and whose second term is the regularization term. Different algorithms can be used to solve for the optimal parameters of the network; the details can be found in [45].
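As a rough illustration of the mapping described above, the following Python sketch passes an input vector through one encoder layer and one decoder layer, both with the sigmoid activation, and measures the reconstruction error with the MSE. The function names, the weight layout (one row of weights per output unit), and the toy numbers are assumptions made for illustration; the paper does not specify an implementation, and the regularization term of the cost is omitted here.

```python
import math


def sigmoid(z):
    """Sigmoid logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, biases, inputs):
    """Apply one fully connected layer followed by the sigmoid activation.

    weights holds one row per output unit; biases holds one value per unit.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def reconstruct(x, w_enc, b_enc, w_dec, b_dec):
    """Encode x into hidden features, then decode them back to input space."""
    hidden = layer(w_enc, b_enc, x)
    output = layer(w_dec, b_dec, hidden)
    return hidden, output


def mse(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    # Toy example (hypothetical numbers): 3 inputs compressed to 2 hidden units.
    x = [0.9, 0.1, 0.4]
    w_enc = [[0.2, -0.1, 0.3], [0.0, 0.5, -0.2]]
    b_enc = [0.0, 0.1]
    w_dec = [[0.4, -0.3], [0.1, 0.2], [-0.5, 0.6]]
    b_dec = [0.0, 0.0, 0.0]
    hidden, x_hat = reconstruct(x, w_enc, b_enc, w_dec, b_dec)
    print("hidden features:", hidden)
    print("reconstruction error (MSE):", mse(x, x_hat))
```

Training would adjust the weights and biases to reduce this error; stacking means feeding `hidden` from one autoencoder into the next as its input.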

2.2. Taguchi Method

The Taguchi Method is a statistical robust design method that was first proposed to improve the quality of manufactured products and has more recently been applied to a variety of fields from engineering to marketing [13, 22, 46]. It rests on three concepts, namely, the Taguchi loss function, offline quality control, and orthogonal arrays for experimental design. The Taguchi Method offers a methodology for the design of experiments (DOE). For instance, if an experiment aims to heat a wire by passing electricity through it, then different control parameters, from the material type to the diameter of the wire, are considered, and each parameter may take various values. DOE makes it possible to determine the parameters and their values in an efficient manner. An example orthogonal array selection table is illustrated in Table 1.

Table 1: Orthogonal array selection table, indexed by the number of levels and the number of parameters (NoP = 2 to 9).

Essentially, these arrays provide a methodical way to permute and combine the interactions among different parameters. Unlike a full factorial experiment, there is no need to carry out every possible experiment. To obtain the objective value or best accuracy, the Taguchi Method decreases the number of necessary experiments by using orthogonal arrays (OA), which also reduces the overall cost. These arrays are essentially predefined matrices of control parameters and experiment numbers. The purpose of the Taguchi Method is to design an experiment that reduces the effect of the factors that cannot be controlled, using the smallest number of experiments [46, 47]. The selection of an appropriate orthogonal array is mainly based on the number of control parameters and their levels. Orthogonal arrays range from L4 to L50 (see Table 1); the more control parameters there are, the higher the number after “L”. The design of experiments is performed with the chosen orthogonal array [47]. The experiments can be run once the OA has been carefully chosen, and the number of iterations is then fixed according to the complexity of the experiments. The Taguchi Method is a powerful technique for finding the best combination among the different levels of various parameters. The measure used in the Taguchi Method to evaluate quality characteristics is the signal-to-noise (S/N) ratio, that is, the ratio of the signal (S) to the noise factor (N). Various S/N ratios have been presented, but three of them are considered standard [48]. The first is “smaller-is-better”, used when the target value of the quality variable is zero. In this case, the S/N ratio can be defined as follows:

$$ S/N = -10 \log_{10}\left( \frac{1}{k} \sum_{i=1}^{k} x_i^{2} \right) $$

Here, $x_i$ is the value of the $i$-th experimental observation and $k$ is the number of experiments. The second is “larger-is-better”, used when the target value of the quality variable is unbounded; in this case, the S/N ratio can be written as follows:

$$ S/N = -10 \log_{10}\left( \frac{1}{k} \sum_{i=1}^{k} \frac{1}{x_i^{2}} \right) $$

Here again, $x_i$ is the value of the $i$-th experimental observation and $k$ is the number of experiments. The last is “nominal-is-best”, used in problems where the target value of the quality variable is a specific value. In this case, the S/N ratio can be written as follows:

$$ S/N = 10 \log_{10}\left( \frac{\bar{x}^{2}}{\sigma^{2}} \right) $$

Here, $\bar{x}$ is the average of the experimental observations and $\sigma$ is their standard deviation [49, 50].
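The three standard S/N ratios named above can be sketched in a few lines of Python. This is a minimal illustration using the textbook forms of the ratios; the function names are hypothetical, and the use of the sample standard deviation in the nominal-is-best case is an assumption, since the text does not say which estimator is meant.

```python
import math
from statistics import mean, stdev


def sn_smaller_is_better(values):
    """S/N ratio when the target of the quality variable is zero."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


def sn_larger_is_better(values):
    """S/N ratio when larger observations are better (unbounded target)."""
    return -10.0 * math.log10(sum(1.0 / (v * v) for v in values) / len(values))


def sn_nominal_is_best(values):
    """S/N ratio when the target is a specific value.

    Assumption: sigma is the sample standard deviation (needs >= 2 values).
    """
    return 10.0 * math.log10(mean(values) ** 2 / stdev(values) ** 2)


if __name__ == "__main__":
    # Hypothetical error values from repeated runs of one experiment.
    errors = [0.12, 0.10, 0.11]
    print("smaller-is-better S/N:", sn_smaller_is_better(errors))
```

For an error measure such as RMSE, the smaller-is-better form applies: a lower error gives a higher S/N ratio, so the best level of a factor is the one with the largest mean S/N.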

Overall, the average value of the signal-to-noise (S/N) ratio is calculated for each level of each parameter. The differences between the maximum and minimum of these averages are then reported, and the appropriate S/N ratio is chosen according to the experimental strategy. This choice has a great influence on the assessment of the experiments.
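The level-wise averaging described above can be sketched as follows. The code assumes a design given as a list of runs, each run being a tuple of level indices (one per factor), together with one S/N ratio per run; it returns the mean S/N per level, the max-minus-min difference (often called Delta) per factor, the rank of each factor by Delta, and the level with the highest mean S/N. The function names and the tiny two-factor design are hypothetical and are not the paper's data.

```python
from collections import defaultdict


def main_effects(design, sn_ratios):
    """Mean S/N ratio for each level of each factor.

    design: list of runs, each a tuple of level indices (one per factor).
    sn_ratios: one S/N value per run, in the same order.
    """
    effects = []
    for factor in range(len(design[0])):
        by_level = defaultdict(list)
        for run, sn in zip(design, sn_ratios):
            by_level[run[factor]].append(sn)
        effects.append({lvl: sum(v) / len(v) for lvl, v in by_level.items()})
    return effects


def delta_and_rank(effects):
    """Delta (max minus min of the level means) and rank (1 = largest Delta)."""
    deltas = [max(m.values()) - min(m.values()) for m in effects]
    order = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)
    ranks = [0] * len(deltas)
    for position, factor in enumerate(order, start=1):
        ranks[factor] = position
    return deltas, ranks


def best_levels(effects):
    """Level with the highest mean S/N ratio for each factor."""
    return [max(m, key=m.get) for m in effects]


if __name__ == "__main__":
    # Hypothetical 2-factor, 2-level design with made-up S/N ratios.
    design = [(0, 0), (0, 1), (1, 0), (1, 1)]
    sn = [10.0, 12.0, 20.0, 22.0]
    effects = main_effects(design, sn)
    print(delta_and_rank(effects), best_levels(effects))
```

A factor with a large Delta has a strong influence on the response, which is why it receives rank 1.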

2.3. Deep Learning Framework Combining Sparse Autoencoder and Taguchi Method

As illustrated in Figure 1, the deep neural network is built from two autoencoders and a SoftMax layer. Each autoencoder is trained separately in an unsupervised manner, without labelled data; the purpose of these first two layers is essentially to extract appropriate features, and automatic feature extraction is one of the most powerful characteristics of deep learning based architectures. The Taguchi Method was introduced in Section 2.2; this subsection describes the proposed method and the corresponding deep learning based architecture. The third layer is the SoftMax layer, which is one of the leading feature classifiers and is responsible for classifying the features extracted by the previous layers. The final step is to stack all layers and train them together in a supervised fashion using labelled data. This essentially converts an unsupervised learning architecture into a supervised one. To obtain the best performance from the first autoencoder, the Taguchi Method is integrated into the model to estimate the optimized combination of five parameters of the first autoencoder, namely, L2 Weight Regularization, Sparsity Regularization, Sparsity Proportion, Hidden Size, and Max Epochs. The L2 Weight Regularization parameter controls the effect of an L2 regularizer on the weights of the network, but not on the biases.

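The L2 Weight Regularization parameter should be very small. The L2 regularization term is given by

$$ \Omega_{weights} = \frac{1}{2} \sum_{l}^{L} \sum_{j}^{n} \sum_{i}^{k} \left( w_{ji}^{(l)} \right)^{2} $$

where the number of hidden layers is represented by $L$, the number of observations is represented by $n$, and the number of variables in the training data is represented by $k$.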

The effect of the sparsity regularizer is controlled by the Sparsity Regularization parameter, which imposes a constraint on the sparsity of the output of the hidden layers. This is different from applying a sparsity regularizer to the weights. The Sparsity Regularization term can be the Kullback-Leibler (KL) divergence, as follows:

$$ \Omega_{sparsity} = \sum_{i} KL\left( \rho \,\|\, \hat{\rho}_i \right) = \sum_{i} \left[ \rho \log\frac{\rho}{\hat{\rho}_i} + (1-\rho) \log\frac{1-\rho}{1-\hat{\rho}_i} \right] $$

where $\rho$ represents the desired value, $\hat{\rho}_i$ represents the average output activation of neuron $i$, and KL is the function that measures the difference between two probability distributions over the same data. The value of the term approaches zero when $\rho$ and $\hat{\rho}_i$ are close to each other; when they are not, the sparsity term takes a larger value [20].
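The behaviour of the KL-based sparsity term can be checked numerically with a short sketch. It uses the standard KL divergence between two Bernoulli distributions with means equal to the desired activation and the observed average activation of each hidden neuron; the function names and the activation values are hypothetical.

```python
import math


def average_activations(hidden_outputs):
    """Average output of each hidden neuron over all training examples.

    hidden_outputs: one list of neuron outputs per training example.
    """
    n_examples = len(hidden_outputs)
    n_neurons = len(hidden_outputs[0])
    return [
        sum(example[i] for example in hidden_outputs) / n_examples
        for i in range(n_neurons)
    ]


def kl_sparsity(rho, rho_hat):
    """Sum over neurons of KL(rho || rho_hat_i); all values must lie in (0, 1)."""
    total = 0.0
    for r in rho_hat:
        total += rho * math.log(rho / r)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - r))
    return total


if __name__ == "__main__":
    # Hypothetical hidden-layer outputs for three training examples.
    outputs = [[0.1, 0.8], [0.3, 0.6], [0.2, 0.7]]
    rho_hat = average_activations(outputs)
    print("average activations:", rho_hat)
    print("sparsity penalty for target 0.2:", kl_sparsity(0.2, rho_hat))
```

The penalty is zero only when every neuron's average activation equals the target and grows as the averages move away from it.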

The Sparsity Proportion (SP) parameter, in turn, controls the sparsity regularizer: it sets the desired sparsity of the output of each hidden layer. A low value of SP usually leads each neuron in the hidden layer to specialize by producing a high output only for a small number of training examples. For instance, if SP is set to 0.2, the average output of each neuron in the hidden layer over the training examples becomes 0.2. The optimum value of SP lies between 0 and 1 and depends on the nature of the problem. Therefore, the technique used to select this value is very significant for improving the overall performance of the sparse autoencoder [21]. In addition, Hidden Size (HS) is a parameter that controls the size of the feature representation in each layer, so it affects the performance of the autoencoder. The last parameter is Maximum Epochs; one epoch represents one entire training cycle over the training data, in which every sample is seen once. For example, if the maximum epoch value equals 10, the weights will be updated for at most 10 epochs.

All of the parameters defined above are used in the training phase and directly influence the success of training. The cost function for training a sparse autoencoder is given in (10). The training algorithm tries to reduce the cost function by finding the optimal parameters, that is, it aims to reduce the value of

$$ E = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( x_{kn} - \hat{x}_{kn} \right)^{2} + \lambda \, \Omega_{weights} + \beta \, \Omega_{sparsity} \quad (10) $$

Here, $E$ represents the loss (error rate) over $N$ training examples with $K$ variables each, $x$ represents the input features, $\hat{x}$ represents the reconstructed features, $\lambda$ is the coefficient of the L2 Weight Regularization term $\Omega_{weights}$, and $\beta$ is the coefficient of the Sparsity Regularization term $\Omega_{sparsity}$.

The given problem includes two autoencoders. Each autoencoder has 5 parameters, and each parameter can take 5 different levels. Consequently, finding the best combination of parameters for the two autoencoders with a traditional full factorial design requires 5^5 + 5^5 = 3125 + 3125 = 6250 trials to test all parameter combinations; that is, each autoencoder entails 5^5 trials. Hence, a more efficient approach is proposed in this study: the Taguchi Method is used to find the optimal parameters of the system by performing only 25 experiments per autoencoder, using the L25 orthogonal array (5 parameters with 5 levels each); see Section 2.2. The first autoencoder is tuned with 25 experiments, from which the Taguchi Method determines the optimum parameters, and the second autoencoder is tuned in the same way. The total number of experiments for the first two layers of the system is therefore 25 + 25 = 50. As mentioned above, in the last step all three components are stacked and trained in a supervised fashion using backpropagation on the multilayer network to improve performance. To validate the performance of the proposed system, a series of experiments was conducted.
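To make the saving concrete, the sketch below builds a 25-run orthogonal array for five factors at five levels and checks its defining property: in any two columns, every pair of levels appears exactly once. It uses a standard modular-arithmetic construction for a prime number of levels; the row order and level labels (0 to 4) are an assumption and may differ from the tabulated L25 array produced by statistical software such as Minitab.

```python
from itertools import combinations

P = 5  # levels per factor; this construction requires a prime number


def orthogonal_array(n_factors):
    """Return a P*P-run orthogonal array for up to P + 1 factors.

    Rows are indexed by (a, b) with a, b in 0..P-1. The columns are
    a, b, and (a + m*b) mod P for m = 1..P-1; the first n_factors are kept.
    """
    if not 2 <= n_factors <= P + 1:
        raise ValueError("n_factors must be between 2 and P + 1")
    rows = []
    for a in range(P):
        for b in range(P):
            columns = [a, b] + [(a + m * b) % P for m in range(1, P)]
            rows.append(tuple(columns[:n_factors]))
    return rows


def is_orthogonal(rows):
    """True if every pair of columns contains all P*P level pairs."""
    for i, j in combinations(range(len(rows[0])), 2):
        if len({(r[i], r[j]) for r in rows}) != P * P:
            return False
    return True


if __name__ == "__main__":
    design = orthogonal_array(5)
    print("runs per autoencoder:", len(design), "instead of", P ** 5)
    print("balanced in every pair of factors:", is_orthogonal(design))
    print("total for two autoencoders:", 2 * len(design), "instead of", 2 * P ** 5)
```

Each row selects one level per parameter, so one row corresponds to one training run of the autoencoder with that parameter combination.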

3. Experimental Results

A computer with an Intel Core i7–6700 CPU @ 2.60 GHz and 8 GB of RAM is used to run the proposed framework, which is applied to several problems: detecting computer network attacks (DDoS and IDS attacks), Epileptic Seizure Recognition, and Handwritten Digit classification. The results obtained with the proposed method are compared with a number of studies in each field. In addition, several techniques were implemented in this work for comparison with the proposed method: SVM, neural network, SoftMax, and the stacked sparse autoencoder based support vector machine (SSAE-SVM). Each dataset and the corresponding results are detailed in the following subsections.

3.1. DDoS Detection Using the Proposed Framework

A Distributed Denial of Service (DDoS) attack is an offensive and intrusive threat to online servers, websites, networks, and clouds. The purpose of a DDoS attack is to exhaust the resources and consume the bandwidth of a network system. Owing to the coordinated nature of a DDoS attack, an attacker can generate a massive amount of attack traffic using a huge number of compromised machines to bring down a system or website [51, 52]. Many organizations, such as Amazon, eBay, CNN, and Yahoo, have been victims of DDoS attacks in the recent past. In this paper, the new framework is used to detect DDoS attacks on the dataset proposed in [23], which covers four attack types (Smurf, UDP Flood, SIDDOS, and HTTP Flood) plus normal traffic. This dataset consists of 27 features (SRC ADD, DES ADD, PKT ID, FROM NODE, TO NODE, PKT TYPE, PKT SIZE, FLAGS, FID, SEQ NUMBER, NUMBER OF PKT, NUMBER OF BYTE, NODE NAME FROM, NODE NAME TO, PKT IN, PKTOUT, PKTR, PKT DELAY NODE, PKTRATE, BYTE RATE, PKT AVG SIZE, UTILIZATION, PKT DELAY, PKT SEND TIME, PKT RESEVED TIME, FIRST PKT SENT, LAST PKT RESEVED). Table 2 lists the five parameters together with their lower and upper limits. These limits are determined by a trial-and-error approach that takes into account the results of preliminary experiments and previous studies.

| Factors | Lower Limit | Upper Limit |
|---|---|---|
| Hidden Size (HS) | 19 | 22 |
| Max Epochs (ME) | 300 | 500 |
| L2 Weight Regularization (L2) | 0.0035 | 0.0045 |
| Sparsity Regularization (SR) | 4 | 6 |
| Sparsity Proportion (SP) | 0.13 | 0.16 |

As mentioned above, the dataset consists of five classes, each with 800 samples. 50% of them are used for training and the other 50% for testing. Consequently, the proposed framework was trained with 2000 samples and then tested with another 2000 samples. The level values of the factors are presented in Table 3. The results of the experiments designed with the Minitab program are presented in Table 4. The error values obtained by applying the parameters to autoencoder 1 are presented in Table 5. Root mean square error (RMSE) is used to measure the performance of autoencoder 1; the closer the value is to zero, the better the performance:

$$ RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( X_{obs,i} - X_{model,i} \right)^{2} } $$

where the observed value is represented by $X_{obs,i}$ and the modelled value by $X_{model,i}$ at time/place $i$. The results acquired with the Taguchi experimental design were evaluated by transforming them into signal-to-noise (S/N) ratios (see Table 5).
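For reference, the RMSE used to score each experiment can be computed as in the short sketch below. The function name and the sample numbers are hypothetical; the formula is the usual root of the mean squared difference between observed and modelled values.

```python
import math


def rmse(observed, modelled):
    """Root mean square error between observed and modelled values."""
    if len(observed) != len(modelled):
        raise ValueError("both sequences must have the same length")
    squared = [(o - m) ** 2 for o, m in zip(observed, modelled)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical input features and their reconstruction by an autoencoder.
    x = [0.2, 0.7, 0.5, 0.9]
    x_hat = [0.25, 0.6, 0.5, 0.8]
    print("RMSE:", rmse(x, x_hat))
```

Because a smaller RMSE is better, each run's error would be converted with the smaller-is-better S/N ratio before the levels are compared.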

S/N response table, Delta and Rank rows (one value per factor):

| Delta | 0.80 | 0.46 | 0.48 | 0.51 | 0.65 |
| Rank | 1 | 5 | 4 | 3 | 2 |

Table 6 presents the best parameters for autoencoder 1, which forms the first layer of the deep autoencoder neural network. The parameters of autoencoder 2, which forms the second layer, are obtained by following the same steps with different ranges for each parameter (Table 7).





Following Tables 8, 9, and 10, the best parameters of autoencoder 2 are obtained as shown in Figure 4 and Table 11; the same procedure, applied through Tables 3, 4, and 5, was used to find the best parameters of autoencoder 1 shown in Figure 3 and Table 5. In this way, the best parameters of each autoencoder are determined with a minimum number of tests.









After the best parameters are found, each autoencoder is trained with them to obtain its best performance. The results obtained from the system are presented as a confusion matrix in Figure 5 for a detailed analysis of each type of DDoS attack. The experimental results show that the proposed method gives satisfactory results compared with other methods.

With a detection accuracy of 99.6%, the proposed method is slightly better than the other methods, as shown in Table 12. Another feature of the proposed method is that it can learn effectively from only 2000 samples, which is very few compared with previous methods. Data collection is a difficult and expensive procedure, so a system that learns from fewer data samples is more practical than others. The confusion matrix notation is used to present the results in a more detailed and understandable fashion. The results of the proposed framework are compared with a number of methods proposed in [23] and with several methods implemented by the authors to detect DDoS attacks, namely SSAE-SVM [24], SVM, and SoftMax classifiers. Table 12 shows that the proposed framework produces the best results among the compared state-of-the-art methods for this problem.
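The link between a confusion matrix and the reported accuracy can be shown with a small sketch. It assumes the matrix is stored as a list of rows, where row i holds the counts of class-i samples predicted as each class; the counts below are invented for illustration and are not the paper's results.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each true class that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
    confusion = [
        [48, 1, 1],
        [0, 50, 0],
        [2, 0, 48],
    ]
    print("accuracy: {:.2%}".format(accuracy(confusion)))
    print("recall per class:", per_class_recall(confusion))
```

Per-class recall is what makes the confusion matrix more informative than a single accuracy figure, since it shows which attack types are confused with each other.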

| Methods | Accuracy % |
|---|---|
| MLP [23] | 98.63 |
| Random Forest [23] | 98.02 |
| Naïve Bayes [23] | 96.91 |
| SSAE-SVM [24] | 97.65 |
| Proposed Framework | 99.60 |

3.2. IDS Attack

In computer security, Intrusion Detection Systems (IDS) have become a necessity because of the growing number of unauthorized accesses and attacks. An IDS is a prime component of a computer security system and can be classified as a Host-based Intrusion Detection System (HIDS), which monitors a specific host or system, or a Network-based Intrusion Detection System (NIDS), which monitors a network of hosts and systems. In this paper, the framework is used to detect IDS attacks using a new dataset [53], which consists of 47 features and 10 attack types. We examine the UNSW-NB15 intrusion dataset as well as a real-time captured dataset. This dataset is a hybrid of intrusion data collected from real modern normal and abnormal network traffic activities. It is newer and more effective than KDD98, KDDCUP99, and NSLKDD, which are common but older benchmark datasets generated two decades ago. Following the same procedure as in Figure 1 and the same tables as in the DDoS detection experiments, the best parameters for autoencoders 1 and 2 were determined, as shown in Tables 13 and 14, to obtain the best performance in detecting IDS attacks. 10000 data points were used to train and test the system (5000 for training and 5000 for testing). Reserving half of the data for testing is a challenging setting, since previous studies use more than 50% of the data for training; however, the training percentage is lowered here to reduce the overall training time. The experimental results for this dataset and configuration are illustrated in Figure 5. According to these results, the detection rate of the framework reaches 99.70%, which is satisfactory compared with previous studies, as illustrated in Table 15. This shows that satisfactory results can be obtained for this problem even with such a small training set. Figure 6 also presents the results as the corresponding confusion matrix.





| Methods | Accuracy % |
|---|---|
| DT [25] | 85.56 |
| LR [25] | 83.15 |
| NB [25] | 82.07 |
| ANN [25] | 81.34 |
| Ramp-KSVCR [25] | 93.52 |
| GA-LR [26] | 81.42 |
| SSAE-SVM [24] | 84.71 |
| Proposed Framework | 99.70 |

3.3. Epileptic Seizure Recognition

According to recent figures, 1–2% of the world's population suffers from epilepsy, which is a neurological disorder [54]. It is characterized by sudden, recurrent, and transient disturbances of perception or behavior that result from excessive synchronization of cortical neural networks. An epileptic seizure is a neurological condition caused by bursts of electrical discharges in the brain, and the main feature of epilepsy is recurrent seizures. Monitoring brain activity through the EEG has become an important tool in the detection of epilepsy [55]. There are two kinds of abnormal activity: interictal activity, which is abnormal EEG recorded between epileptic seizures, and ictal activity, which appears in the patient's EEG records during a seizure. The EEG signature of interictal activity is occasional transient waveforms, such as isolated spikes, sharp waves, or spike-wave complexes [56]. Commonly, experienced physicians detect epileptic seizures by visually inspecting EEG records for interictal and ictal activity. However, visual inspection of the huge volume of EEG data has practical disadvantages and weaknesses. It is very time-consuming and inefficient, especially for long recordings [57]. In addition, disagreement among physicians on the same EEG findings sometimes leads to subjective decisions, owing to the variety of interictal spike morphology. Therefore, computer-aided systems have been developed for blood disease detection [58], heart disease recognition [59], and epilepsy detection; the latter are listed in Table 18. The epileptic dataset [60] is used to train and test the proposed method. Two matrices of size 100 × 4096 are generated: set A, representing healthy subjects, and set E, representing the epileptic activity condition. A and E are each divided into two halves, so that 50 × 4096 matrices are generated from each set, one for training and one for testing. The Epileptic Seizure dataset consists of 4096 features; of the two autoencoders used, the first reduces the number of features to 2004 and the second to 103, which reduces the time consumption. The best parameters for autoencoder 1 and autoencoder 2 obtained from the system are listed in Tables 16 and 17. They lead to the best results for Epileptic Seizure Recognition, which are presented in Figure 7. The results of the proposed method are compared with previous results in Epileptic Seizure Recognition in Table 18. SVM, Nlp, and SoftMax were implemented by the authors to obtain results for comparison with the proposed method.
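The 50/50 division of sets A and E can be sketched as a per-class split. The sketch assumes a random shuffle with a fixed seed, which the paper does not state, and uses short placeholder vectors instead of the real 4096-value EEG segments; the function name and labels are hypothetical.

```python
import random


def split_half(samples, seed=0):
    """Shuffle one class and return (training half, testing half)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    half = len(samples) // 2
    train = [samples[i] for i in indices[:half]]
    test = [samples[i] for i in indices[half:]]
    return train, test


if __name__ == "__main__":
    # Placeholder data: 100 segments per set, 8 values each instead of 4096.
    set_a = [[float(i)] * 8 for i in range(100)]        # healthy
    set_e = [[float(i) + 0.5] * 8 for i in range(100)]  # epileptic activity
    a_train, a_test = split_half(set_a)
    e_train, e_test = split_half(set_e)
    train = [(x, "A") for x in a_train] + [(x, "E") for x in e_train]
    test = [(x, "A") for x in a_test] + [(x, "E") for x in e_test]
    print(len(train), "training segments,", len(test), "testing segments")
```

Splitting each class separately keeps the two classes balanced in both the training and the testing matrices.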





The comparison in Table 18 shows that a number of methods reach the same accuracy as the proposed method, such as Tzallas et al. [28] and Srinivasan [30]. However, the proposed method has the advantage of using deep learning techniques, which are beneficial when there is a huge number of epilepsy data instances to classify, and it uses only 50% of the data for training, whereas other methods use 60%.

| Methods | Accuracy % |
|---|---|
| Srinivasan et al. [27] | 99.60 |
| Subasi and Ercelebi [28] | 92.00 |
| Subasi [29] | 94.5 |
| Kannathal et al. [30] | 92.22 |
| Tzallas et al. [31] | 100 |
| Polat et al. [32] | 98.72 |
| Acharya et al. [33] | 99.00 |
| Acharya et al. [34] | 99.70 |
| Musa Peker et al. [35] | 100 |
| SSAE-SVM [24] | 98.80 |
| Proposed Framework | 100 |

3.4. Handwritten Digit Classification

The proposed framework is finally tested on the MNIST dataset, which was proposed for the handwritten digit classification problem [40]. The framework is trained using 5000 images, 500 for each digit. Each image consists of 28 × 28 pixels, so there are 784 values per image when the images are converted to vectors to build the input matrix. In the second stage, this matrix becomes the input to the first autoencoder, whose parameters are optimized using the Taguchi Method, as illustrated in Table 19. Table 20 shows the optimized parameters for the second autoencoder. The features extracted by the second autoencoder are passed to the SoftMax layer, which classifies them into ten separate classes. Finally, the two autoencoders and the SoftMax layer are stacked and trained in a supervised manner. The confusion matrix obtained from the experimental results is illustrated in Figure 8. These results are compared with state-of-the-art studies on this problem, and satisfactory results are obtained, as illustrated in Table 21.
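Two steps of this pipeline are easy to show in isolation: turning a 28 × 28 image into a 784-value vector, and turning ten class scores into probabilities with the softmax function. The sketch below uses a blank placeholder image and made-up scores; the function names are hypothetical, and the scores would in practice come from the features produced by the second autoencoder.

```python
import math


def flatten(image):
    """Convert a 2-D image (list of pixel rows) into one flat vector."""
    return [pixel for row in image for pixel in row]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Index of the most probable class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    image = [[0.0] * 28 for _ in range(28)]  # placeholder blank image
    vector = flatten(image)
    print("vector length:", len(vector))
    # Made-up scores for the ten digit classes.
    scores = [0.1, 0.3, 0.2, 2.5, 0.0, 0.4, 0.1, 0.2, 0.3, 0.1]
    print("predicted digit:", predict(scores))
```

Stacking 5000 such vectors row by row gives the 5000 × 784 input matrix described in the text.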





| Reference | Methods | Accuracy % |
|---|---|---|
| Anupama Kaushik et al. [36] | J48 | 70.0 |
| Anupama Kaushik et al. [36] | NaiveBayes | 72.65 |
| Anupama Kaushik et al. [36] | SMO | 89.95 |
| Olarik Surinta et al. [37] | Hotspot + SVM | 92.70 |
| U Ravi Babu et al. [38] | Hotspot + k-NN | 96.94 |
| Hinton GE et al. [39] | Deep Belief Network | 98.75 |
| LeCun Y et al. [40] | Deep Conv. Net LeNet-5 | 99.05 |
| Wan L [41] | Deep Conv. Net (dropconnect) | 99.43 |
| Zeiler MD [42] | Deep Conv. Net (stochastic pooling) | 99.53 |
| Goodfellow IJ [43] | Deep Conv. Net (maxout units and dropout) | 99.55 |
| Lee CY [44] | Deep Conv. Net (deeply-supervised) | 99.61 |
| Proposed Framework | Deep Autoencoder based on Taguchi Method | 99.80 |

4. Conclusion

This paper proposes a new deep learning framework that combines a sparse autoencoder with the Taguchi Method, an organized approach for parameter optimization in a reasonable amount of time. Experimental results reveal that applying this method allows the proposed framework to optimize numerous factors simultaneously and to extract more quantitative data from fewer experimental trials. The framework is tested with different experimental data sets and compared to state-of-the-art methods and studies in terms of overall accuracy. The proposed framework achieves satisfactory results: 99.6% in DDoS Detection, 99.7% for IDS Attack, 100% in Epileptic Seizure Recognition, and 99.8% for the handwritten digit classification problem. The results verify the validity of the proposed framework. The authors are also encouraged to improve the overall performance of this architecture for more complex problems such as 3D image processing and real-time robotic systems. Accordingly, different heuristic optimization algorithms, including genetic algorithms, particle swarm optimization, or colony optimization algorithms, will be used to estimate autoencoder parameters and compared with the Taguchi Method in future work. The proposed architecture can also be employed for comprehensive recognition and estimation problems, including gesture recognition, URL reputation, and SMS spam collection.
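To illustrate how a Taguchi design can draw conclusions from fewer trials, the sketch below uses the standard L9 orthogonal array, which covers four three-level factors in 9 runs instead of the 81 runs of a full grid. Everything else is an assumption made for illustration: the four factors are unnamed, the per-level gains in `GAINS` are toy numbers rather than results from this paper, and `toy_accuracy` stands in for training and evaluating a model.

```python
import math

# Standard L9 orthogonal array: 4 factors, 3 levels each, 9 trials instead of 3**4 = 81.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# Hypothetical per-level accuracy gains for four unnamed factors (toy data).
GAINS = [
    [0.00, 0.05, 0.02],
    [0.03, 0.00, 0.01],
    [0.01, 0.02, 0.06],
    [0.04, 0.00, 0.02],
]

def toy_accuracy(levels):
    """Stand-in for training and evaluating a model with the given factor levels."""
    return 0.80 + sum(GAINS[f][lv] for f, lv in enumerate(levels))

def sn_larger_is_better(values):
    """Larger-the-better signal-to-noise ratio in decibels."""
    return -10.0 * math.log10(sum(1.0 / v**2 for v in values) / len(values))

def best_levels(array, responses, n_levels=3):
    """For each factor, pick the level with the highest mean S/N ratio."""
    sn = [sn_larger_is_better([r]) for r in responses]
    best = []
    for f in range(len(array[0])):
        means = []
        for lv in range(n_levels):
            rows = [sn[i] for i, run in enumerate(array) if run[f] == lv]
            means.append(sum(rows) / len(rows))
        best.append(means.index(max(means)))
    return best

responses = [toy_accuracy(run) for run in L9]
print(best_levels(L9, responses))  # [1, 0, 2, 0]
```

The selected combination (1, 0, 2, 0) is not one of the nine runs that were evaluated. The main-effects analysis infers it from the balanced design, which is the sense in which fewer trials yield more information.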

Data Availability

The IDS attack data that support the findings of this study are available with the “UNSW-NB15” reference name at “”. The epilepsy recognition dataset that also supports the findings of this study, with “SETS A and B” references, is available at “”. The digit classification dataset with the “MNIST” reference is available at “”. The DDoS detection dataset that supports this study is available at “”.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

The authors gratefully acknowledge the support to this work by Ankara Yıldırım Beyazıt University.


References

  1. B. Chandra and R. K. Sharma, “Deep learning with adaptive learning rate using laplacian score,” Expert Systems with Applications, vol. 63, pp. 1–7, 2016.
  2. S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, “Deep extreme learning machine and its application in EEG classification,” Mathematical Problems in Engineering, vol. 2015, Article ID 129021, 11 pages, 2015.
  3. G. Urban et al., Do Deep Convolutional Nets Really Need to be Deep and Convolutional? 2016.
  4. T. orde, “Grozdića.pdf”.
  5. K. G. Lore, A. Akintayo, and S. Sarkar, “LLNet: A deep autoencoder approach to natural low-light image enhancement,” Pattern Recognition, vol. 61, pp. 650–662, 2017.
  6. K. Sun, J. Zhang, C. Zhang, and J. Hu, “Generalized extreme learning machine autoencoder and a new deep neural network,” Neurocomputing, vol. 230, pp. 374–381, 2017.
  7. Y. Xiong and R. Zuo, “Recognition of geochemical anomalies using a deep autoencoder network,” Computers & Geosciences, vol. 86, pp. 75–82, 2016.
  8. L. D. Burgoon, “Autoencoder Predicting Estrogenic Chemical Substances (APECS): An improved approach for screening potentially estrogenic chemicals using in vitro assays and deep learning,” Computational Toxicology, vol. 2, pp. 45–49, 2017.
  9. C. Hong, J. Yu, J. Wan, D. Tao, and M. Wang, “Multimodal Deep Autoencoder for Human Pose Recovery,” IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5659–5670, 2015.
  10. T.-H. Song, V. Sanchez, H. Eidaly, and N. M. Rajpoot, “Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images,” in Proceedings of the 14th IEEE International Symposium on Biomedical Imaging, ISBI 2017, pp. 1040–1043, Australia, April 2017.
  11. Y. Suzuki and T. Ozaki, “Stacked Denoising Autoencoder-Based Deep Collaborative Filtering Using the Change of Similarity,” in Proceedings of the 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 498–502, Taipei, Taiwan, March 2017.
  12. Y. Zhang, X. Hou, Y. Lv, H. Chen, Y. Zhang, and S. Wang, “Sparse Autoencoder Based Deep Neural Network for Voxelwise Detection of Cerebral Microbleed,” in Proceedings of the 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pp. 1229–1232, Wuhan, December 2016.
  13. S. Athreya and Y. D. Venkatesh, Application Of Taguchi Method For Optimization Of Process Parameters In Improving The Surface Roughness Of Lathe Facing Operation, vol. 1, no. 3, pp. 13–19, 2012.
  14. M.-L. Huang, Y.-H. Hung, W. M. Lee, R. K. Li, and B.-R. Jiang, “SVM-RFE based feature selection and taguchi parameters optimization for multiclass SVM Classifier,” The Scientific World Journal, vol. 2014, Article ID 795624, 2014.
  15. D.-C. Ko, D.-H. Kim, and B.-M. Kim, “Application of artificial neural network and Taguchi method to preform design in metal forming considering workability,” The International Journal of Machine Tools and Manufacture, vol. 39, no. 5, pp. 771–785, 1999.
  16. H. Wang, Q. Geng, and Z. Qiao, “Parameter tuning of particle swarm optimization by using Taguchi method and its application to motor design,” in Proceedings of the 2014 4th IEEE International Conference on Information Science and Technology (ICIST), pp. 722–726, Shenzhen, China, April 2014.
  17. G. K. Dhuria, R. Singh, and A. Batish, “Application of a hybrid Taguchi-entropy weight-based GRA method to optimize and neural network approach to predict the machining responses in ultrasonic machining of Ti–6Al–4V,” Journal of the Brazilian Society of Mechanical Sciences and Engineering, vol. 39, no. 7, pp. 2619–2634, 2017.
  18. H.-F. Yang, T. S. Dillon, and Y.-P. P. Chen, “Optimized Structure of the Traffic Flow Forecasting Model with a Deep Learning Approach,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2371–2381, 2017.
  19. L. Wang, C. Wang, W. Du et al., “Parameter optimization of a four-legged robot to improve motion trajectory accuracy using signal-to-noise ratio theory,” Robotics and Computer-Integrated Manufacturing, vol. 51, pp. 85–96, 2018.
  20. R. Huang, C. Liu, G. Li, and J. Zhou, “Adaptive Deep Supervised Autoencoder Based Image Reconstruction for Face Recognition,” Mathematical Problems in Engineering, vol. 2016, 2016.
  21. D. Jha and G. Kwon, “Alzheimer's Disease Detection Using Sparse Autoencoder, Scale Conjugate Gradient and Softmax Output Layer with Fine Tuning,” International Journal of Machine Learning and Computing, vol. 7, no. 1, pp. 13–17, 2017.
  22. F. Cui, Y. Su, S. Xu, F. Liu, and G. Yao, “Optimization of the physical and mechanical properties of a spline surface fabricated by high-speed cold roll beating based on taguchi theory,” Mathematical Problems in Engineering, vol. 2018, Article ID 8068362, pp. 1–12, 2018.
  23. M. Alkasassbeh, G. Al-Naymat, A. B.A, and M. Almseidin, “Detecting Distributed Denial of Service Attacks Using Data Mining Techniques,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 1, 2016.
  24. Y. Ju, J. Guo, and S. Liu, “A Deep Learning Method Combined Sparse Autoencoder with SVM,” in Proceedings of the 2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), pp. 257–260, Xi'an, China, September 2015.
  25. S. M. Hosseini Bamakan, H. Wang, and Y. Shi, “Ramp loss K-Support Vector Classification-Regression; a robust and sparse multi-class approach to the intrusion detection problem,” Knowledge-Based Systems, vol. 126, pp. 113–126, 2017.
  26. C. Khammassi and S. Krichen, “A GA-LR wrapper approach for feature selection in network intrusion detection,” Computers & Security, vol. 70, pp. 255–277, 2017.
  27. V. Srinivasan, C. Eswaran, and A. N. Sriraam, “Artificial neural network based epileptic detection using time-domain and frequency-domain features,” Journal of Medical Systems, vol. 29, no. 6, pp. 647–660, 2005.
  28. A. Subasi and E. Erçelebi, “Classification of EEG signals using neural network and logistic regression,” Computer Methods and Programs in Biomedicine, vol. 78, no. 2, pp. 87–99, 2005.
  29. A. Subasi, “EEG signal classification using wavelet feature extraction and a mixture of expert model,” Expert Systems with Applications, vol. 32, no. 4, pp. 1084–1093, 2007.
  30. N. Kannathal, M. L. Choo, U. R. Acharya, and P. K. Sadasivan, “Entropies for detection of epilepsy in EEG,” Computer Methods and Programs in Biomedicine, vol. 80, no. 3, pp. 187–194, 2005.
  31. A. T. Tzallas, M. G. Tsipouras, and D. I. Fotiadis, “Automatic seizure detection based on time-frequency analysis and artificial neural networks,” Computational Intelligence and Neuroscience, vol. 2007, Article ID 80510, 13 pages, 2007.
  32. K. Polat and S. Güneş, “Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform,” Applied Mathematics and Computation, vol. 187, no. 2, pp. 1017–1026, 2007.
  33. U. Rajendra Acharya, S. Vinitha Sree, A. P. C. Alvin, and J. S. Suri, “Use of principal component analysis for automatic classification of epileptic EEG activities in wavelet framework,” Expert Systems with Applications, vol. 39, no. 10, pp. 9072–9078, 2012.
  34. U. R. Acharya, S. V. Sree, P. C. A. Ang, R. Yanti, and J. S. Suri, “Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals,” International Journal of Neural Systems, vol. 22, no. 2, Article ID 1250002, pp. 565–579, 2012.
  35. M. Peker, B. Sen, and D. Delen, “A novel method for automated diagnosis of epilepsy using complex-valued classifiers,” IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 1, pp. 108–118, 2016.
  36. A. Kaushik, H. Gupta, and D. S. Latwal, “Impact of Feature Selection and Engineering in the Classification of Handwritten Text,” pp. 2598–2601, 2016.
  37. O. Surinta, L. Schomaker, and M. A. Wiering, “Handwritten Character Classification Using The Hotspot Feature Extraction Technique,” in Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, pp. 261–264, Vilamoura, Algarve, Portugal, February 2012.
  38. U. R. Babu, Y. Venkateswarlu, and A. K. Chintha, “Handwritten digit recognition using K-nearest neighbour classifier,” in Proceedings of the World Congress on Computing and Communication Technologies (WCCCT '14), pp. 60–65, Trichirappalli, India, March 2014.
  39. G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, pp. 1527–1554, 2006.
  40. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2323, 1998.
  41. L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, “Regularization of neural networks using DropConnect,” in Proceedings of the 30th International Conference on Machine Learning, vol. 28, Atlanta, Georgia, USA, 2013.
  42. M. D. Zeiler and R. Fergus, “Stochastic pooling for regularization of deep CNN,” in Proceedings of the International Conference on Learning Representations, Scottsdale, USA, 2013.
  43. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, Mass, USA, 2016.
  44. C. Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” Deep Learning and Representation Learning Workshop, 2014.
  45. Y. Yan, Z. Tan, N. Su, and C. Zhao, “Building extraction based on an optimized stacked sparse autoencoder of structure and training samples using LIDAR DSM and optical images,” Sensors, vol. 17, no. 9, 2017.
  46. Q.-C. Hsu and A. T. Do, “Minimum porosity formation in pressure die casting by taguchi method,” Mathematical Problems in Engineering, vol. 2013, 2013.
  47. M. Peker, “A new approach for automatic sleep scoring: Combining Taguchi based complex-valued neural network and complex wavelet transform,” Computer Methods and Programs in Biomedicine, vol. 129, pp. 203–216, 2016.
  48. M. Peker, B. Şen, and P. Y. Kumru, “An efficient solving of the traveling salesman problem: The ant colony system having parameters optimized by the Taguchi method,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 21, no. 1, pp. 2015–2036, 2013.
  49. M. Nandhini, B. Suchithra, R. Saravanathamizhan, and D. G. Prakash, “Optimization of parameters for dye removal by electro-oxidation using Taguchi Design,” Journal of Electrochemical Science and Engineering, vol. 4, no. 4, 2014.
  50. L. Ivanović, B. Stojanović, J. Blagojević, G. Bogdanović, and A. Marinković, “Analysis of the flow rate and the volumetric efficiency of the trochoidal pump by application of taguchi method,” Tehnički vjesnik, vol. 24, pp. 265–270, 2017.
  51. N. Hoque, H. Kashyap, and D. K. Bhattacharyya, “Real-time DDoS attack detection using FPGA,” Computer Communications, vol. 110, pp. 48–58, 2017.
  52. B. B. Gupta, “Predicting number of zombies in DDoS attacks using pace regression model,” Journal of Computing and Information Technology, vol. 20, no. 1, pp. 33–39, 2012.
  53. N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in Proceedings of the Military Communications and Information Systems Conference, pp. 1–6, 2015.
  54. S. Yücel, P. Terzioğlu, and D. Özçimen, “World's largest Science, Technology & Medicine Open Access book publisher c,” RFID Technol. Secur. Vulnerabilities, Countermeas, 2016.
  55. F. Mormann, R. G. Andrzejak, C. E. Elger, and K. Lehnertz, “Seizure prediction: the long and winding road,” Brain, vol. 130, no. 2, pp. 314–333, 2007.
  56. P. Evidence, B. Eb, and P. Neprovociranom, “Evidence Base (Eb) Aproach to The First Unprovoked”.
  57. C. J. James and B. E. Eng, Detection of epileptiform activity in the electroencephalogram using artificial neural networks, 1997.
  58. A. M. Karim, F. V. Çelebi, and A. S. Mohammed, “Software Development for Blood Disease Expert System,” Lecture Notes on Empirical Software Engineering, vol. 4, no. 3, pp. 179–183, 2016.
  59. A. Khemphila and V. Boonjing, “Heart disease classification using neural network and feature selection,” in Proceedings of the 21st International Conference on Systems Engineering (ICSEng '11), pp. 406–409, IEEE Computer Society, Las Vegas, Nev, USA, August 2011.
  60. R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, and C. E. Elger, “Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 64, no. 6, Article ID 061907, 8 pages, 2001.

Copyright © 2018 Ahmad M. Karim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
