Abstract

The extreme learning machine (ELM) algorithm, proposed in recent years, has been widely used in many fields due to its fast training speed and good generalization performance. Unlike traditional neural networks, ELM greatly improves training speed by randomly generating the parameters of the input layer and the hidden layer. However, this random generation may introduce some "bad" parameters that harm the final generalization ability. To overcome this drawback, this paper combines the artificial immune system (AIS) with ELM, yielding AIS-ELM. With the help of AIS's global search ability and good convergence, the randomly generated parameters of ELM are optimized effectively and efficiently to achieve better generalization performance. To evaluate the performance of AIS-ELM, this paper compares it with related algorithms on several benchmark datasets. The experimental results reveal that the proposed algorithm consistently achieves superior performance.

1. Introduction

In recent years, many computational intelligence techniques, such as neural networks and support vector machines (SVMs) [1], have been widely used in many real-world applications. However, these algorithms suffer from defects such as slow learning speed, tedious human intervention, and poor computational scalability.

Recently, to address the drawbacks mentioned above, Huang et al. [25] proposed a new method named the extreme learning machine (ELM), which has attracted ever-growing research attention. In contrast to traditional neural networks such as BP [6], ELM is a tuning-free algorithm with fast learning speed that randomly generates its input weights and hidden biases. With the help of the least squares method and the Moore-Penrose generalized inverse, ELM is transformed into a linear learning system. In addition, ELM is theoretically proven to have good generalization performance with minimal human intervention. Therefore, ELM is widely used in many fields [5]. For example, Chaturvedi et al. [7] extended the ELM paradigm to a novel framework that exploits the features of both Bayesian networks and fuzzy recurrent neural networks to perform subjectivity detection. Gastaldo et al. [8] addressed the specific role played by feature mapping in ELM. Cambria et al. [9] explored how the high generalization performance, low computational complexity, and fast learning speed of extreme learning machines can be exploited to perform analogical reasoning in a vector space model of affective common-sense knowledge. Recently, Ragusa et al. [10] tackled the implementation of single hidden layer feedforward neural networks (SLFNs), based on hard-limit activation functions, on reconfigurable devices.

It is known that an appropriate selection of initial weight sets is vital for training a neural network model [11]: there is a strong correlation between the final solution and the initial weights. However, due to the random determination of some learning parameters, nonoptimal parameters may be introduced into the model [5], which can negatively impact the final performance. To overcome this drawback, much related work has been proposed in the past ten years. A straightforward way is to combine evolutionary methods with ELM [12]. For instance, Zhu et al. [13] utilized the differential evolution algorithm (DE) to optimize ELM's generated parameters to achieve better performance. In [14], Xue et al. combined the genetic algorithm (GA), ELM, and ensemble learning to obtain better and more stable results. Rather than using GA or DE, Saraswathi et al. presented a PSO-driven ELM [15], combined with an Integer Coded Genetic Algorithm (ICGA), to solve gene selection and cancer classification. In [16], Cao et al. proposed an improved learning algorithm named the self-adaptive evolutionary extreme learning machine (SaE-ELM). Similarly, Wu et al. presented a novel algorithm named the dolphin swarm algorithm extreme learning machine (DSA-ELM) [17] to solve optimization problems.

However, the above evolutionary algorithms differ in their search efficiency, and there is still much room for improvement. In particular, one of the biggest challenges in ELM is that nonoptimal parameters may be introduced by the random generation of parameters. To overcome that challenge, in this paper we propose a new method named the artificial immune system extreme learning machine (AIS-ELM). Because the artificial immune system (AIS) [18–20] has global search ability [21] and good convergence [22], it can alleviate difficulties such as slow convergence and getting stuck in local minima. Therefore, we use AIS to optimize ELM to obtain better initial weight sets and keep the training process from falling into a local optimum. The original version and preliminary results of this method were presented by us at ELM2017 [23]. In this paper we have revised the original formulas, compared AIS-ELM with more algorithms, and added new expressions, regression validation, and more datasets.

The rest of the paper is arranged as follows. Sections 2 and 3 briefly describe the traditional ELM and AIS methods. Section 4 gives the detailed description of AIS-ELM. Section 5 presents the corresponding experiments: AIS-ELM is compared with traditional ELM, PSO-ELM, SaE-ELM, and DSA-ELM on five regression problems and eight classification benchmark problems obtained from the UCI Machine Learning Repository [24], and the training times of AIS-ELM, BP, SVM, and traditional ELM are compared on three benchmark classification problems. The last section concludes the paper.

2. Extreme Learning Machine

This section introduces the extreme learning machine [25] proposed by Professor Huang. ELM was developed from the single hidden layer feedforward network (SLFN) and has been extended to generalized single hidden layer feedforward networks. Compared with other conventional learning algorithms, ELM's advantage is that the hidden nodes of the SLFN need not be tuned.

Compared with traditional learning algorithms, ELM not only achieves the smallest training error but also reaches the smallest norm of output weights [5]. Because the hidden layer need not be adjusted in ELM, the output weight matrix can be solved by the least squares method.

For $N$ arbitrary training samples $(\mathbf{x}_j, \mathbf{t}_j)$, where $\mathbf{x}_j \in \mathbb{R}^n$ and $\mathbf{t}_j \in \mathbb{R}^m$, and given activation function $g(x)$, the standard mathematical model of SLFNs with $\tilde{N}$ hidden nodes is modeled as follows:

$$\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \quad j = 1, \ldots, N, \tag{1}$$

where $\mathbf{w}_i$ is the weight vector connecting the input neurons and the $i$th hidden neuron, $\boldsymbol{\beta}_i$ is the weight vector connecting the $i$th hidden neuron and the output neurons, and $b_i$ is the threshold of the $i$th hidden neuron.

That standard SLFNs with $\tilde{N}$ hidden neurons and activation function $g(x)$ can approximate these $N$ samples with zero error means that

$$\sum_{j=1}^{N} \|\mathbf{o}_j - \mathbf{t}_j\| = 0. \tag{2}$$

There exist $\boldsymbol{\beta}_i$, $\mathbf{w}_i$, and $b_i$ such that

$$\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \quad j = 1, \ldots, N. \tag{3}$$

The above $N$ equations can be written compactly as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{4}$$

where

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \tag{5}$$

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m}. \tag{6}$$

Here, $\mathbf{H}$ is called the hidden layer output matrix [3]: the $i$th column of $\mathbf{H}$ is the $i$th hidden node's output vector with respect to inputs $\mathbf{x}_1, \ldots, \mathbf{x}_N$, and the $j$th row of $\mathbf{H}$ is the output vector of the hidden layer with respect to $\mathbf{x}_j$. The vector $\boldsymbol{\beta}$ (connecting the hidden layer with the output layer) is then estimated using the Moore-Penrose generalized inverse $\mathbf{H}^{\dagger}$ of the matrix $\mathbf{H}$:

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}. \tag{7}$$

ELM algorithm can be summarized as shown in Algorithm 1.

Step 1 Randomly generate the input weights $\mathbf{w}_i$ and hidden biases $b_i$, $i = 1, \ldots, \tilde{N}$
Step 2 Calculate the hidden-layer output matrix $\mathbf{H}$
Step 3 Compute the output weights matrix as $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}$
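For concreteness, the following Python sketch illustrates Algorithm 1. It is an illustration rather than the authors' code: the sigmoid activation, the uniform $[-1,1]$ initialization, and the function names (elm_train, elm_predict) are our own assumptions.

    import numpy as np

    def elm_train(X, T, n_hidden, seed=0):
        """Train a basic ELM (Algorithm 1): X is (N, n) inputs, T is (N, m) targets."""
        rng = np.random.default_rng(seed)
        n = X.shape[1]
        W = rng.uniform(-1.0, 1.0, size=(n_hidden, n))  # Step 1: random input weights
        b = rng.uniform(-1.0, 1.0, size=n_hidden)       # Step 1: random hidden biases
        H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))        # Step 2: hidden-layer output matrix, Eq. (5)
        beta = np.linalg.pinv(H) @ T                    # Step 3: Moore-Penrose solution, Eq. (7)
        return W, b, beta

    def elm_predict(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
        return H @ beta

Because only the linear output layer is solved, the whole training procedure reduces to a single pseudoinverse computation, which is the source of ELM's speed.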

3. Artificial Immune System

Artificial Immune Systems (AIS) are a relatively new area of bioinspired computing, inspired by biological models of the natural immune system, which has the properties of diversity, distributed computation, dynamic learning, error tolerance, adaptation, and self-monitoring. In principle, AIS can be applied to many domains [19] because it is a general framework for a distributed adaptive system. This section introduces AIS in three aspects. First, Section 3.1 briefly describes the mathematical model of the B Cell Algorithm (BCA). Second, Section 3.2 introduces the clonal selection theory. Finally, Section 3.3 presents the shape-space model describing the Antigen-Antibody interaction.

3.1. Mathematical Model of BCA

Each B cell is modeled as a binary string of fixed length for simplicity of calculation. One of the most important design choices in developing an Artificial Immune System algorithm is the similarity measure, or matching rule [25], which is closely coupled to the encoding scheme. Hamming distance and edit distance are obvious approximate matching rules.

However, there is a more immunologically plausible rule, called r-contiguous bits [26]: two strings match if they have r contiguous bits in common (see Figure 1). The value $r$ is a threshold that indicates the size of the subset of strings a single string can match. For example, if $r = L$, the matching is completely specific; i.e., the string will match only a single string (itself); but if $r = 0$, the matching is completely general; that is, the string will match every string of length $L$.
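As a small illustration of this matching rule (our own sketch; the function name and bit-string encoding are assumptions, not from the source), an r-contiguous match can be checked as follows:

    def r_contiguous_match(s1, s2, r):
        """True if s1 and s2 agree on at least r contiguous bit positions."""
        assert len(s1) == len(s2)
        run = best = 0
        for b1, b2 in zip(s1, s2):
            run = run + 1 if b1 == b2 else 0   # extend or reset the matching run
            best = max(best, run)
        return best >= r

    # Example: '10110' and '00111' share the contiguous run '011' (positions 2-4).
    print(r_contiguous_match("10110", "00111", 3))  # True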

Besides, the algorithm uses a contiguous region hypermutation operator [27] of the form

$$P(0 \to k) = \frac{1}{L^2} \sum_{s=1}^{e} \sum_{t=f}^{L} p_m^{h} \left(1 - p_m\right)^{(t-s+1)-h}, \tag{8}$$

where $P(0 \to k)$ is the probability of transition from zero to some number $k$; $L$ is the length of the binary string; $e$ is the bit position of the first "flip" bit starting from the most significant bit; $f$ is the bit position of the last "flip" bit starting from the most significant bit; $h$ is the number of bits that must be flipped to mutate from 0 to $k$; and $p_m$ is the mutation probability of a bit given a contiguous region. The sum ranges over all contiguous regions $[s, t]$ that cover every flip bit.
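A minimal sketch of one common reading of this operator follows: the region is chosen by a uniform start position and a uniform length with no wrap-around, and each bit inside the region flips with probability p_m. These choices and the function name are ours, not prescribed by the source.

    import numpy as np

    def contiguous_hypermutation(bits, p_m, rng=None):
        """Pick a random contiguous region, then flip each bit in it with probability p_m."""
        rng = rng or np.random.default_rng()
        L = len(bits)
        start = rng.integers(0, L)          # region start, uniform over positions
        length = rng.integers(1, L + 1)     # region length, uniform over 1..L
        out = bits.copy()
        for i in range(start, min(start + length, L)):
            if rng.random() < p_m:
                out[i] ^= 1                 # flip the bit
        return out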

3.2. Clonal Selection Theory

The clonal selection theory (CST) [28] explains how the adaptive immune system responds to an antigenic stimulus. It holds that only cells capable of recognizing an antigen will proliferate, while those incapable of doing so are eliminated.

Both T cells and B cells can undergo clonal selection. In the case of B cells, when their antigen receptors bind with an antigen, the B cells begin to clone themselves and undergo somatic hypermutation, which introduces diversity into the B cell population. The B cells then become activated and differentiate into plasma or memory cells. Plasma cells produce numerous antigen-specific antibodies, leading to the removal of the antigen in a successful immune response. Memory cells remain within the host and promote a rapid secondary response upon encountering the same (or a similar) antigen. This is the operation of acquired immunity [19].

The B Cell Algorithm (BCA), a simple clonal selection method, was introduced in [22]. An outline of BCA is shown in Algorithm 2.

Step 1 Initialization: create an initial random population of individuals P
Step 2 Main loop: for each v ∈ P:
 (a) Affinity Evaluation: evaluate g(v);
 (b) Clonal Selection and Expansion:
 (i) Clone each B cell: clone v and place it in clonal pool C;
 (ii) Select a random member v′ ∈ C and apply the contiguous region hypermutation operator;
 (iii) Evaluate g(v′); if g(v′) < g(v), then replace v by the clone v′
Step 3 Cycle: repeat Step 2 until a certain stopping criterion is met.
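To make the loop concrete, the following sketch applies Algorithm 2 to minimize an affinity function g over binary strings. It is a minimal illustration: the population size, clone count, and iteration budget are assumed values, and it reuses the contiguous_hypermutation sketch above.

    import numpy as np

    def bca_minimize(g, L, pop_size=20, clones=3, p_m=0.1, iters=200, seed=0):
        """Minimal BCA sketch (Algorithm 2): minimize g over binary strings of length L."""
        rng = np.random.default_rng(seed)
        P = rng.integers(0, 2, size=(pop_size, L))
        for _ in range(iters):
            for i in range(pop_size):
                C = np.repeat(P[i][None, :], clones, axis=0)      # clone v into pool C
                j = rng.integers(0, clones)                       # pick a random clone
                C[j] = contiguous_hypermutation(C[j], p_m, rng)   # hypermutate it
                k = min(range(clones), key=lambda t: g(C[t]))     # best member of the pool
                if g(C[k]) < g(P[i]):                             # keep an improving clone
                    P[i] = C[k]
        return min(P, key=g)                                      # best individual found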
3.3. Shape Space

An abstract model describing the Antigen-Antibody interaction was introduced by Perelson and Oster [29]. In this model, it is assumed that the characteristics of the antibody receptor (combining region) relevant to antigen binding can be described by specifying a total of L shape parameters, and that the same L parameters can be used to describe the antigen. These L parameters are collected into a vector, and the antibody receptor and antigenic determinant are described as points Ab and Ag, respectively, in an L-dimensional Euclidean shape space. Mathematically, each molecule can be considered a point in L-dimensional real space, and the affinity of an Ag-Ab pair is related to the reciprocal of the Euclidean distance between them.

It is assumed that the antibody is capable of binding to any antigenic complement within a distance $\varepsilon$ (the stimulus region). Each L-dimensional ball of radius $\varepsilon$ occupies a volume $c\varepsilon^{L}$, where $c$ is a constant that depends on the dimensionality $L$ (for arbitrary $L$, $c = \pi^{L/2}/\Gamma(\frac{L}{2}+1)$, where $\Gamma$ is the Gamma function). If there are a total of $n$ antibodies, their total coverage volume is no greater than $nc\varepsilon^{L}$, since the balls may overlap. Let us assume $V$ is an L-dimensional cube with edge length $R$; the total volume of $V$ is then $R^{L}$.

The goal is to maximize the coverage of the antibodies, which makes the immune approach more reliable. Then the following inequality must hold:

$$nc\varepsilon^{L} \ge R^{L}. \tag{9}$$

Therefore, the range of $\varepsilon$ is as follows:

$$\varepsilon \ge \frac{R}{(nc)^{1/L}}, \tag{10}$$

where $c = \pi^{L/2}/\Gamma(\frac{L}{2}+1)$.
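As a quick numeric illustration of Eq. (10) (the inputs n, L, and R below are arbitrary example values, not from the paper):

    import math

    def epsilon_lower_bound(n, L, R):
        """Smallest stimulus radius so n balls can cover an L-cube of edge R, Eq. (10)."""
        c = math.pi ** (L / 2) / math.gamma(L / 2 + 1)   # volume constant of the L-ball
        return R / (n * c) ** (1 / L)

    # e.g., 50 antibodies in a 4-dimensional cube of edge 2:
    print(epsilon_lower_bound(50, 4, 2.0))   # about 0.50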

4. Proposed Extreme Learning Machine Based on Artificial Immune System

This section proposes an extreme learning machine based on the artificial immune system, namely AIS-ELM. The traditional ELM algorithm randomly generates input weights and hidden biases, among which there may be some nonoptimal sets. It is necessary to optimize these nonoptimal input weights and hidden biases. Two methods can address this problem. One is to increase the number of hidden neurons, which is time-consuming and may not yield a good result. The other is to optimize the input weights and hidden biases directly.

This paper combines AIS with ELM to optimize the input weights and hidden biases. AIS-ELM has three main phases: the clone phase, the mutation phase, and the substitution phase. After the three phases, an optimal antibody is produced, and the performance of ELM improves when this optimal antibody is used as the input weights and hidden biases.

One set of input weights and hidden biases is modeled by an antibody; the $i$th antibody is represented by

$$a_i = \left[w_{11}, \ldots, w_{1n}, w_{21}, \ldots, w_{2n}, \ldots, w_{\tilde{N}1}, \ldots, w_{\tilde{N}n}, b_1, \ldots, b_{\tilde{N}}\right], \tag{11}$$

where $i = 1, 2, \ldots, N_P$, $N$ is the number of training data, $N_P$ is the number of population members, $\tilde{N}$ is the number of hidden nodes, and $n$ is the dimension of the input samples; $w_{jk}$ are the input weights and $b_j$ are the hidden biases. The initial values of $w_{jk}$ and $b_j$ are randomly generated within the range $[-1, 1]$. Then we calculate each antibody's fitness with the validation data according to the following equation:

$$f(a_i) = \sqrt{\frac{1}{N_v} \sum_{v=1}^{N_v} \left\| \sum_{j=1}^{\tilde{N}} \boldsymbol{\beta}_j g(\mathbf{w}_j \cdot \mathbf{x}_v + b_j) - \mathbf{t}_v \right\|_2^2}, \tag{12}$$

where $(\mathbf{x}_v, \mathbf{t}_v)$, $v = 1, \ldots, N_v$, are the validation data. Validation data are used instead of training data to alleviate possible overfitting. The corresponding output weights $\boldsymbol{\beta}$ are computed using the MP generalized inverse by (7).
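Under this reading of Eqs. (11)-(12), an antibody's fitness can be evaluated as in the following sketch; the unpacking layout, the sigmoid activation, and the function name are our assumptions.

    import numpy as np

    def antibody_fitness(a, X_tr, T_tr, X_val, T_val, n_hidden):
        """Validation RMSE of an ELM whose input weights/biases come from antibody a."""
        n = X_tr.shape[1]
        W = a[: n_hidden * n].reshape(n_hidden, n)       # unpack input weights, Eq. (11)
        b = a[n_hidden * n :]                            # unpack hidden biases
        H_tr = 1.0 / (1.0 + np.exp(-(X_tr @ W.T + b)))
        beta = np.linalg.pinv(H_tr) @ T_tr               # output weights via Eq. (7)
        H_val = 1.0 / (1.0 + np.exp(-(X_val @ W.T + b)))
        err = H_val @ beta - T_val
        return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))  # RMSE on validation data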

The clone phase creates a clone pool containing $N-1$ clonal antibodies for every antibody $a_i$, and each clonal antibody is identical to the original antibody; i.e., $c_{ij} = a_i$.

In the mutation procedure, each clonal antibody $c_{ij}$ in the clone pool is mutated by the following formula:

$$c'_{ij} = c_{ij} + r \times P \times f(a_i) \times \varepsilon, \tag{13}$$

where $r \in (-1, 1)$ is a uniform random number and $P$ is the mutation probability of the clonal antibody, and where the following holds.

(1) The random factor $r$ avoids the situation in which all mutation directions are the same and the result falls into a local optimum.

(2) $P$ is as follows:

$$P = \frac{1}{n_a^2} \sum_{s=1}^{e} \sum_{t=f}^{n_a} p_m^{h} \left(1 - p_m\right)^{(t-s+1)-h}. \tag{14}$$

The above equation is an application of (8), where $n_a$ is the total number of elements of the antibody and $L = n_a$; $P$ is the probability of transition from zero to some number $k$; $e$ is the bit position of the first "on" bit starting from the most significant bit; $f$ is the bit position of the last "on" bit starting from the most significant bit; $h$ is the number of bits that must be flipped to mutate from 0 to $k$; and $p_m$ is the mutation probability of a bit given a contiguous region.

(3) The fitness $f(a_i)$ is used to adjust the range of the mutation. The smaller the fitness value is, the smaller the error is, so the required mutation change is smaller. Conversely, the greater the fitness value is, the bigger the needed mutation change will be.

(4) $\varepsilon$ is the stimulus region radius within which the antibody is capable of binding to any antigenic complement [29]; it is computed by Eq. (10).

The substitution phase calculates each clonal antibody's fitness $f(c'_{ij})$ in the clone pool and compares it with the cloned antibody's fitness $f(a_i)$. If $f(c'_{ij})$ is smaller than $f(a_i)$, the corresponding fitness and antibody are replaced; that is, if $f(c'_{ij}) < f(a_i)$, then $a_i$ and $f(a_i)$ are replaced with $c'_{ij}$ and $f(c'_{ij})$. Through this iterative process, the antibody population evolves toward the global optimum, and the antibody with minimal fitness, which indicates the smallest error, is the optimal antibody.
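Putting the three phases together, one iteration can be sketched as follows. This is a hedged illustration of our reconstruction of Eq. (13): fitness_fn stands for, e.g., the antibody_fitness sketch above, and p_mut and eps stand for P and ε.

    import numpy as np

    def ais_step(A, fit, fitness_fn, n_clones, p_mut, eps, rng):
        """One clone/mutate/substitute pass over the antibody population A."""
        for i in range(len(A)):
            clones = np.repeat(A[i][None, :], n_clones, axis=0)    # clone phase
            r = rng.uniform(-1.0, 1.0, size=clones.shape)          # random directions
            clones = clones + r * p_mut * fit[i] * eps             # mutation phase, cf. Eq. (13)
            for c in clones:                                       # substitution phase
                fc = fitness_fn(c)
                if fc < fit[i]:
                    A[i], fit[i] = c, fc
        return A, fit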

In the above process, our algorithm uses the clonal selection principle to ensure diversity, which has been proved by De Castro et al. [30]. In addition, ELM is optimized by BCA to obtain better convergence, which has been proved by Clark et al. [22] through an exact Markov chain model. Besides, using $f(a_i)$ to adjust the mutation matches the theory of the immune network. Last but not least, the requirement of shape space is also satisfied.

In the specific experimental process, a number of other algorithms have to be compared, so the input data should be normalized to ensure fairness. The stop criterion is then as follows:

$$\left| f(a_{\text{best}}) - \overline{f(a)} \right| < \delta, \tag{15}$$

where $\overline{z}$ denotes the mean of the group $z$; that is, iteration stops once the best fitness is sufficiently close to the population's mean fitness.

All in all, AIS-ELM has three parts. The first part is initialization, including reading the input data, normalization, and parameter setting. The second part applies AIS to ELM: after the clone, mutation, and substitution phases, an optimal antibody meeting the requirements is acquired. The third part uses this antibody as the input weights and hidden biases in ELM. AIS-ELM is presented in Algorithm 3.

Step 1 Initialization
Randomly generate the initial antibody population $A = \{a_1, a_2, \ldots, a_{N_P}\}$,
where $a_i \in [-1, 1]^{n_a}$, $i = 1, 2, \ldots, N_P$.
Then calculate the fitness $f(a_i)$ of each $a_i$ by Eq. (12),
and get $F = \{f(a_1), f(a_2), \ldots, f(a_{N_P})\}$.
Step 2 Clone Selection
while the stop criterion (15) is not met do
    Step 2.1 Clone Phase
    For each antibody $a_i$, clone $N-1$ antibodies; the clone pool is named $C_i = \{c_{i1}, \ldots, c_{i,N-1}\}$, where
    $c_{ij} = a_i$
    Step 2.2 Mutation Phase
    For each clone antibody $c_{ij}$, where $1 \le i \le N_P$ and $1 \le j \le N-1$:
    $c'_{ij} = c_{ij} + r \times P \times f(a_i) \times \varepsilon$
    $r \sim U(-1, 1)$
    $P$ is computed by Eq. (14)
    where $\varepsilon$ is computed by Eq. (10).
    Then compute the fitness $f(c'_{ij})$ of each $c'_{ij}$ by Eq. (12).
    And get $F' = \{f(c'_{ij})\}$.
    Step 2.3 Substitution Phase
    For each antibody $a_i$, compare $f(a_i)$ and $f(c'_{ij})$
    For $f(c'_{ij}) < f(a_i)$:
    $a_i \leftarrow c'_{ij}$
    $f(a_i) \leftarrow f(c'_{ij})$
end while
The final antibody population is $A^{*}$, and the antibody with the smallest fitness is the best antibody $a_{\text{best}}$. Then
$a_{\text{best}}$ is used to optimize the weights.
Step 3 ELM
Calculate the hidden-layer output matrix $\mathbf{H}$ with the set of input weights and hidden biases represented by $a_{\text{best}}$.
Compute the output weights matrix $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}$.
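For reference, the following end-to-end sketch strings Algorithm 3 together in NumPy. It is a plausible reading rather than the authors' implementation: the hyperparameters (population size, clone count, p_mut, eps, tolerance) and the exact convergence test are illustrative assumptions.

    import numpy as np

    def ais_elm(X_tr, T_tr, X_val, T_val, n_hidden, pop=20, clones=9,
                p_mut=0.05, eps=0.5, iters=50, tol=1e-3, seed=0):
        """AIS-ELM sketch: evolve ELM input weights/biases by clone, mutation,
        and substitution, then train ELM with the best antibody."""
        rng = np.random.default_rng(seed)
        n = X_tr.shape[1]
        n_a = n_hidden * (n + 1)                       # weights + biases per antibody

        def fitness(a):                                # validation RMSE, cf. Eq. (12)
            W, b = a[: n_hidden * n].reshape(n_hidden, n), a[n_hidden * n:]
            H = 1 / (1 + np.exp(-(X_tr @ W.T + b)))
            beta = np.linalg.pinv(H) @ T_tr            # Eq. (7)
            Hv = 1 / (1 + np.exp(-(X_val @ W.T + b)))
            return np.sqrt(np.mean(np.sum((Hv @ beta - T_val) ** 2, axis=1)))

        A = rng.uniform(-1, 1, size=(pop, n_a))        # Step 1: initial population
        F = np.array([fitness(a) for a in A])
        for _ in range(iters):                         # Step 2: clone selection
            for i in range(pop):
                C = np.repeat(A[i][None, :], clones, axis=0)            # clone phase
                C += rng.uniform(-1, 1, C.shape) * p_mut * F[i] * eps   # mutation phase
                fc = np.array([fitness(c) for c in C])
                k = int(np.argmin(fc))
                if fc[k] < F[i]:                                        # substitution phase
                    A[i], F[i] = C[k], fc[k]
            if abs(F.min() - F.mean()) < tol:          # stop criterion, cf. Eq. (15)
                break
        best = A[int(np.argmin(F))]                    # Step 3: ELM with best antibody
        W, b = best[: n_hidden * n].reshape(n_hidden, n), best[n_hidden * n:]
        H = 1 / (1 + np.exp(-(X_tr @ W.T + b)))
        return W, b, np.linalg.pinv(H) @ T_tr

In this sketch the loop halts once the best fitness is within tol of the population mean, mirroring our reading of Eq. (15).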

5. Performance Verification

In this section, AIS-ELM is compared with DSA-ELM, PSO-ELM, SaE-ELM, traditional ELM, SVM, and BP. The experiments are divided into two parts. In the first part, the first five algorithms are tested on eight benchmark classification problems; then AIS-ELM is compared with SVM, BP, and traditional ELM in terms of training time on three benchmark classification problems. In the second part, five benchmark regression problems are carried out. The experimental environment is MATLAB R2014b running on a Windows PC with an Intel 2.7 GHz CPU and 8 GB RAM.

All the inputs have been normalized into the range $[-1, 1]$ for fairness. The number of hidden neurons depends on the specific problem and is listed with each experiment. Besides, the parameters of AIS-ELM (the population size $N_P$, the clone number $N-1$, and the mutation probability $p_m$) are fixed in advance.

5.1. Classification

In this subsection, the performance of the five algorithms on eight benchmark classification problems is evaluated. The eight datasets are Ecoli, Pima Indians Diabetes (Diabetes), Epileptic Seizure, Iris, Heart Disease, Glass Identification (Glass), Image Segmentation (Image), and Statlog (Satellite). The detailed description of the eight datasets is listed in Table 1.

Attributes of all the datasets have been normalized to $[-1, 1]$, and the reported outputs are the training time and the mean and variance of the testing accuracy. A 20-fold cross-validation method is adopted, and the average of the 20 repeated experiments is reported to minimize error. The whole dataset is divided into training, validation, and testing sets without overlap, and the three sets are kept consistent for each trial of the five algorithms. The results are shown in Table 2, with the best results emphasized in bold font.

Considering the training time, ELM is obviously the fastest, because the other four algorithms invoke ELM repeatedly. Besides, AIS-ELM's training time is slightly shorter than that of the other three methods because it requires fewer ELM iterations.

Then, focusing on the testing accuracy, AIS-ELM achieves the highest mean testing accuracy on all the classification datasets. As for the variance, AIS-ELM has the smallest value in most instances and is only slightly worse than the best in a few cases. In addition, the good convergence property of the clonal selection algorithm helps AIS-ELM outperform DSA-ELM, PSO-ELM, SaE-ELM, and ELM.

In addition, we performed the Wilcoxon signed-rank test [31]; the W-value is 0, which is less than the critical value at p ≤ 0.05. The results therefore show that AIS-ELM is significantly different from DSA-ELM, PSO-ELM, SaE-ELM, and ELM, indicating that AIS-ELM outperforms the other four approaches on the eight classification datasets.

Besides, we compare the training time of AIS-ELM with that of BP, SVM, and traditional ELM on three benchmark classification problems. The results are shown in Table 3.

From Table 3, although AIS-ELM is slower than traditional ELM because of its iterative optimization, its training speed is significantly faster than that of BP and SVM.

5.2. Regression

In this subsection, the five algorithms are compared on five regression benchmark problems. The five datasets are Breast Cancer, Parkinson, SinC, Servo, and Yacht Hydrodynamics (Yacht). A detailed description of the five datasets is shown in Table 4.

Attributes of all the datasets have been normalized to $[-1, 1]$, and we focus on the training time and the mean and variance of the testing accuracy. A 20-fold cross-validation method is adopted, and the average of the 20 repeated experiments is reported to minimize error. The whole dataset is divided into training, validation, and testing sets without overlap. The results are shown in Table 5.

Considering the training time, traditional ELM is again the fastest, for the same reason as in Section 5.1. As for the testing accuracy, AIS-ELM, DSA-ELM, PSO-ELM, and SaE-ELM obtain better results with fewer hidden nodes than ELM, which means that they can achieve better generalization performance with more compact networks. Moreover, the RMSE of AIS-ELM is smaller than that of the other four algorithms. Therefore, it can be concluded that AIS-ELM achieves better performance than the other four algorithms on regression problems.

6. Conclusion

In this paper, we first introduced the standard ELM and the artificial immune system, and then proposed a new approach named the artificial immune system extreme learning machine (AIS-ELM). In AIS-ELM, AIS is used to optimize the input weights and hidden biases through the clone, mutation, and substitution processes.

In the experimental part of this paper, AIS-ELM is compared with DSA-ELM, PSO-ELM, SaE-ELM, and traditional ELM on thirteen well-known benchmark datasets (eight classification datasets and five regression datasets) obtained from the UCI Machine Learning Repository. Besides, the training times of AIS-ELM, BP, SVM, and traditional ELM are compared on three benchmark classification problems. Experimental results show that AIS-ELM achieves better testing results (smaller RMSE on regression and higher accuracy on classification) than DSA-ELM, PSO-ELM, SaE-ELM, and traditional ELM in most cases, and its training speed is significantly faster than that of BP and SVM. Owing to the global search ability [21] and good convergence [22] of AIS, our artificial immune system extreme learning machine is superior to the other methods on both the eight classification datasets and the five regression datasets in the experiments. In addition, six of the thirteen datasets are medical datasets, which suggests that AIS-ELM can also play a useful role in healthcare. Future research will concentrate on applying the current immune system approach to new directions, such as NLP and computer vision.

Data Availability

The classification datasets and regression datasets supporting the findings of this study are from previously reported studies and datasets, which have been cited. The processed data are available at UCI Machine Learning Repository [Online] (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work is supported by National Key Research and Development Plan under Grant no. 2016YFB1001203.