Computational Intelligence and Neuroscience
Volume 2019, Article ID 4182639, 13 pages
Research Article

Evolutionary Spiking Neural Networks for Solving Supervised Classification Problems

1Postgraduate Studies and Research Division, National Technology of Mexico, León Institute of Technology, León, Guanajuato, Mexico
2Department of Organizational Studies, DCEA-University of Guanajuato, Guanajuato, Guanajuato, Mexico
3Department of Electronics, DICIS-University of Guanajuato, Salamanca, Guanajuato, Mexico

Correspondence should be addressed to A. Espinal; aespinal@ugto.mx

Received 9 November 2018; Revised 15 January 2019; Accepted 31 January 2019; Published 28 March 2019

Academic Editor: Oscar Castillo

Copyright © 2019 G. López-Vázquez et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This paper presents a grammatical evolution (GE)-based methodology to automatically design third-generation artificial neural networks (ANNs), also known as spiking neural networks (SNNs), for solving supervised classification problems. The proposal performs the SNN design by exploring the search space of three-layered feedforward topologies with configured synaptic connections (weights and delays), so that no explicit training is carried out. In addition, the designed SNNs are partially connected between the input and hidden layers, which may help to avoid redundancies and to reduce the dimensionality of the input feature vectors. The proposal was tested on several well-known benchmark datasets from the UCI repository and statistically compared against a similar design methodology for second-generation ANNs and against an adapted version of that methodology for SNNs; furthermore, the results of the two compared methodologies and of the proposed one were improved by changing the fitness function used in the design process. The proposed methodology shows competitive and consistent results, and the statistical tests support the conclusion that the designs it produces perform better than those produced by the other methodologies.

1. Introduction

Artificial neural networks (ANNs) have been successfully used in theoretical and practical fields to solve several kinds of problems (e.g., classification [1, 2], robotic locomotion [3, 4], and function approximation [5, 6]). Basically, ANNs are characterized by computing units that are interconnected through communication links serving to send and/or receive messages of some data type [7]; these elements define what is known as their architecture or topology. Three generations of ANNs can be distinguished according to their computing units [8]; they are capable of solving problems of digital (1st to 3rd generations), analog (2nd and 3rd generations), and spatiotemporal (3rd generation) nature. The first generation is based on threshold units such as McCulloch–Pitts neurons [9] or perceptrons [10]. The second generation is based on computing units that apply continuous activation functions (e.g., sigmoid or hyperbolic tangent functions); ANNs of this generation can be trained with gradient descent-based algorithms such as the backpropagation learning rule [11]. The third generation is based on spiking neurons (see [12] for a detailed reference) such as the integrate-and-fire model [13] or the Hodgkin–Huxley neuron [14]; ANNs of this generation are known as spiking neural networks (SNNs), and these are the kind of ANNs studied in this paper.

Usually, the implementation of an ANN to solve a specific problem, regardless of the generation it belongs to, requires human experts who define the ANN's topological elements, the learning rule, and its parameters, among other design criteria. The experts perform such a design process either empirically, following some rule of thumb, or by trial and error; this is because there is no well-established methodology to set up the design of an ANN for a given problem. It is well known that the good performance of ANNs is strongly related to their design and the associated criteria; thus, the design of an ANN can be a challenge. Several studies have explored the learnability issues of ANNs related to their design; for example, the combinatorial problems that arise in the design of feedforward ANNs [15] or the problems that ANNs with fixed architectures may face when learning a specific task [7, 16–19]. Insights have been given to ease or enhance the learnability of ANNs, for example, by applying constraints to the task to be learned or to the ANN's architecture [7, 15]. As an example of constraints applied to the architecture, partially connected ANNs have shown equal or better performance than their fully connected versions; among other interesting benefits, they reduce the network complexity and its training and recall times [20]. Another insight is to develop algorithms capable of changing the architecture of an ANN during the learning process [7, 15].

Nowadays, evolutionary artificial neural networks (EANNs) are a special class of ANNs that result from using evolutionary algorithms (EAs), or other kinds of metaheuristic methods, to adapt the design of ANNs to a specific task or problem; this is achieved by optimizing one or several of their design criteria (the term neuroevolution has also been used to refer to this kind of design method). Thus, EANNs allow us, in some manner, to avoid or overcome the learnability issues related to ANN architectures and to dispense, partially or completely, with human experts (see [21–24] for comprehensive reviews). There are four main approaches for deploying EANNs [25]: weight optimization [26–28], topology structure optimization [25, 29–31], combined weight and topology structure optimization [32–38], and learning rule optimization [39, 40]. Most of the work on EANNs has focused on ANNs from the first and second generations.

Recently, efforts to use SNNs for solving real problems from engineering and industry have been increasing because of interesting characteristics of spiking neurons, such as their greater computational power compared with less biologically plausible neuron models; moreover, SNNs can solve problems with fewer computing units than ANNs from previous generations [19, 41]. Although there are learning rules to adapt the parameters of SNNs, such as SpikeProp [42], the use of metaheuristic algorithms is a common practice to adapt their parameters or define design criteria, because such algorithms overcome drawbacks of those learning rules [43] and allow us to handle the greater variety of design criteria (parameters of neuron models and synapses, types of synapses, topology wiring patterns, encoding scheme, etc.) that these kinds of ANNs present; in this work, the combination of SNNs and metaheuristic algorithms is referred to as evolutionary spiking neural networks (ESNNs). In [44–48], the synaptic weights of a single spiking neuron, e.g., the integrate-and-fire model [13] or the Izhikevich model [49], are calibrated by means of algorithms such as differential evolution (DE) [50], particle swarm optimization (PSO) [51], the cuckoo search algorithm (CSA) [52], or the genetic algorithm (GA) [53] to perform classification tasks; the spiking neuron performs the classification by using the firing rate encoding scheme as the similarity criterion to assign the class to which an input pattern belongs. In other works [43, 54–57], three-layered feedforward SNNs with synaptic connections formed by a weight and a delay were implemented to solve supervised classification problems through the use of time-to-first-spike as the classification criterion; in these works, the training was carried out by means of evolution strategy (ES) [58, 59] and PSO algorithms.
An extension of these works is made in [60, 61], where the number of hidden layers and their computing units are defined by grammatical evolution (GE) [62] in addition to the metaheuristic learning. More complex SNN frameworks have been developed and trained with metaheuristics (such as ES) to perform tasks such as visual pattern recognition, audio-visual pattern recognition, taste recognition, ecological modelling, sign language recognition, object movement recognition, and EEG spatio-/spectrotemporal pattern recognition (see [63] for a review of these frameworks). Robotic locomotion is solved through SNNs designed by metaheuristics in [60, 64, 65]; in these works, both the connectivity pattern and the synaptic weights of each Belson–Mazet–Soula (BMS) [66] neuron model within SNNs called spiking central pattern generators (SCPGs) are defined through GE or Christiansen grammar evolution (CGE) [67] algorithms; all individual designs are integrated to define the SCPGs that allow the locomotion of legged robots.

The present paper proposes a design methodology for three-layered feedforward ANNs of the third generation for solving supervised classification problems. The design methodology incorporates partial connectivity between the input and hidden layers, which contributes to reducing the topological complexity of the ESNNs; in addition, partial connectivity may also reduce the number of features of the input vector, thus indirectly performing dimensionality reduction. The proposal explores the search space of three-layered feedforward topologies with configured synaptic connections; thus, an explicit learning process is not required. This kind of design methodology has been previously proposed for ANNs of the first and second generations, and it can be considered as the design of composed functions. To the best of the authors' knowledge, this is the first attempt to design SNNs by defining both the number of computing units and their configured connectivity patterns (weights and delays). The rest of the paper is organized as follows: Section 2 explains the proposed methodology and its constituent methods. The experimental configuration of the proposal and of the other methodologies used for comparison, together with their results, is in Section 3. In Section 4, the results of the proposed methodology are statistically compared to those of the other methodologies. Finally, Section 5 contains the conclusion of the paper and future work based on it.

2. Design Methodology and Concepts

This paper proposes a framework to design partially connected spiking neural networks (SNNs) for solving supervised classification problems. The proposed framework requires the following elements: a temporal encoding scheme to transform the original input data into a form suitable for the network; a context-free grammar in Backus–Naur form (BNF grammar) to guide the generation of neural network words, together with a mapping process to transform the genotype of individuals into functional network designs; a fitness function and a target definition to determine the performance of candidate networks; and a search engine to optimize the solutions. A general diagram of the methodology can be seen in Figure 1.

Figure 1: General diagram for the proposed framework.
2.1. Spiking Neural Networks

Spiking neural networks (SNNs) constitute the third generation of ANNs because of the inclusion of a firing time component in their computation process [8].

2.1.1. Spike Response Model

The spike response model (SRM) is employed in this framework as the basis for the SNNs. An SRM neuron fires (i.e., produces a spike) whenever the state of its membrane potential surpasses the firing threshold (θ). In the SRM, the membrane potential is calculated through time as a linear summation of postsynaptic potentials (PSPs), excitatory and/or inhibitory, which are caused by impinging spikes arriving at a neuron through its presynaptic connections (Figure 2); each PSP is weighted and delayed by its synaptic connection.

Figure 2: Membrane potential of neuron j: linear summation of PSPs [55].

The membrane potential $x_j$ of neuron $j$ at time $t$ is calculated as the weighted ($w_i$) summation of the contributions ($y_i(t)$) from its set of presynaptic connections ($\Gamma_j$), as in the following equation:

$$x_j(t) = \sum_{i \in \Gamma_j} w_i \, y_i(t) \tag{1}$$

The unweighted contribution $y_i(t)$ is described by equation (2), in which the function $\varepsilon$ describes the form of the PSPs generated by impinging spikes coming from the presynaptic neuron $i$ at simulation time $t$. The parameters of the presynaptic connection $i$ are its firing time $t_i$ and its synaptic delay $d_i$:

$$y_i(t) = \varepsilon\left(t - t_i - d_i\right) \tag{2}$$

The spike response function $\varepsilon$ describes the form of the PSPs; it is defined in the following equation, where $\tau$ represents the membrane potential time constant that defines the decay time of the postsynaptic potential:

$$\varepsilon(s) = \begin{cases} \dfrac{s}{\tau}\, e^{\,1 - s/\tau} & \text{if } s > 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
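The SRM dynamics described above can be sketched in a few lines of code. This is an illustrative implementation only, not the authors' code; the threshold, time constant, and simulation step are example values consistent with the parameters listed in Section 3.

```python
import math

def psp(s, tau=9.0):
    """Spike response function: shape of a PSP, s ms after the spike's arrival.
    epsilon(s) = (s/tau) * exp(1 - s/tau) for s > 0, and 0 otherwise."""
    return (s / tau) * math.exp(1.0 - s / tau) if s > 0 else 0.0

def membrane_potential(t, presynapses, tau=9.0):
    """x_j(t) = sum_i w_i * epsilon(t - t_i - d_i), where each presynapse
    is given as a (firing_time t_i, weight w_i, delay d_i) triple."""
    return sum(w * psp(t - t_i - d, tau) for (t_i, w, d) in presynapses)

def first_spike_time(presynapses, theta=1.0, t_max=30.0, dt=0.01):
    """Step through simulated time until the membrane potential crosses the
    firing threshold theta; return the time-to-first-spike, or None."""
    t = 0.0
    while t <= t_max:
        if membrane_potential(t, presynapses) >= theta:
            return t
        t += dt
    return None
```

Note that the PSP peaks exactly at `s = tau` with unit height, so the synaptic weight directly scales each PSP's maximum contribution to the membrane potential.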

2.2. Temporal Encoding

Due to the nature of the employed neural model, the original features from the dataset must be transformed into spikes before being introduced into the network. For this purpose, the one-dimensional encoding in the following equation is employed [56]:

$$Y = T_{\min} + \frac{(f - m)\left(T_{\max} - T_{\min}\right)}{r} \tag{4}$$

where $Y$ is the spike temporal value, $f$ is the original feature value, $T_{\min}$ and $T_{\max}$ are the lower and upper temporal interval limits of the encoding, $M$ and $m$ hold the maximum and minimum values that the variable $f$ takes, respectively, and $r = M - m$ is the range between $M$ and $m$. This encoding method preserves the dimension of the samples in the dataset while providing a temporal representation of their scalar values suitable for insertion into the network.
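A minimal sketch of this encoding follows, assuming the straightforward linear mapping of the feature range [m, M] onto the temporal interval; the interval limits used as defaults below (0.01 ms to 9 ms) are the ones listed in Section 3, and the handling of a constant feature (r = 0) is our own assumption.

```python
def encode_feature(f, m, M, t_min=0.01, t_max=9.0):
    """Linearly map a scalar feature f in [m, M] to a spike time in
    [t_min, t_max]. r = M - m is the feature range; a constant feature
    (zero range) is mapped to t_min by convention here."""
    r = M - m
    if r == 0:
        return t_min
    return t_min + (f - m) * (t_max - t_min) / r

def encode_sample(sample, mins, maxs, t_min=0.01, t_max=9.0):
    """Encode every feature of a sample; the dimensionality is preserved."""
    return [encode_feature(f, m, M, t_min, t_max)
            for f, m, M in zip(sample, mins, maxs)]
```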

2.3. Grammatical Evolution (GE)

Grammatical evolution is an evolutionary algorithm based on the combination of genetic algorithms and context-free grammars [68]. It employs a BNF grammar relating to the problem, a mapping process to obtain the functional form of solutions, and a search engine to drive the search process.

2.3.1. BNF Grammar

The Backus–Naur form grammar (Figure 3) is employed to define the topology of the network and its parameters. Any word produced by this grammar includes an arbitrary number of hidden neurons and some specific pre- and postsynapses with their respective parameters. The opening curly bracket symbol ({) marks the division between hidden neurons, the opening parenthesis symbol (() marks the different synapses, and the at symbol (@) precedes the synapse-specific weight and delay values.

Figure 3: Proposed BNF grammar for designing partially connected SNNs.

Figure 4 illustrates an example of a word generated by the proposed grammar and its corresponding network topology. Relating the word with its network topology, the word has two "{" symbols (one at the end of each row), implying that the network has two hidden neurons. In this case, the "(" symbols in each row mark three synaptic configurations (a number that can vary for each hidden neuron): the first and second synaptic configurations represent connections with neurons from the input layer, and the last configuration marks the synapse with the output layer; each synaptic configuration is formed by a neuron identifier, a synaptic weight, and a delay. In Figure 4, each presynaptic neuron and its synaptic connection with a postsynaptic neuron are portrayed in the same color to clarify the reading of the transformation process from a word to a network topology.
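Reading such a word back into a topology can be sketched as a small parser. The exact textual serialization is not spelled out here, so the token layout assumed below ("{" closing each hidden neuron's row, "(" opening each synaptic configuration of the form `id@weight,delay`) is a hypothetical format consistent with the description above.

```python
def parse_word(word):
    """Parse a grammar word into a list of hidden neurons, each a list of
    (presynaptic id, weight, delay) synaptic configurations.
    Hypothetical serialization: '{' ends a hidden neuron's row and
    '(' starts each 'id@weight,delay' synaptic configuration."""
    neurons = []
    for row in (r for r in word.split("{") if r):
        synapses = []
        for cfg in (c for c in row.split("(") if c):
            ident, params = cfg.split("@")
            weight, delay = (float(v) for v in params.split(","))
            synapses.append((ident, weight, delay))
        neurons.append(synapses)
    return neurons
```

For example, a two-row word yields two hidden neurons, each carrying its own input-layer and output-layer synapses with their weights and delays.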

Figure 4: Example of a word generated by the proposed grammar and its corresponding network topology.
2.3.2. Mapping Process

The mapping process transforms an individual from its genotypic form into its phenotypic form, which represents a functional network. The depth-first mapping process, employed in this framework, is the standard in GE; basically, it begins by deriving (i.e., replacing it by one of its productions) the left-most nonterminal symbol (initially, the <architecture> nonterminal symbol) until all nonterminal symbols in depth are derived, and then it moves to the current left-most nonterminal. The process continues until either the nonterminals are depleted or all elements of the genotype have been used.
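The depth-first mapping can be sketched as follows. This is the standard GE mapping rule (production index = codon value mod number of productions) applied to a toy grammar rather than the SNN grammar of Figure 3, and the genotype-wrapping policy is an assumption for illustration.

```python
def ge_map(genotype, grammar, start, max_wraps=2):
    """Depth-first GE mapping: repeatedly derive the left-most nonterminal,
    choosing the production indexed by (codon mod number-of-productions).
    The genotype may be reread ("wrapped") up to max_wraps times; an
    individual that exhausts its codons maps to None (invalid)."""
    symbols = [start]        # derivation front, left-most symbol first
    out = []                 # terminal symbols of the produced word
    used, n = 0, len(genotype)
    while symbols:
        sym = symbols.pop(0)
        if sym not in grammar:            # terminal symbol: emit it
            out.append(sym)
            continue
        if used >= n * max_wraps:         # codons exhausted
            return None
        productions = grammar[sym]
        choice = productions[genotype[used % n] % len(productions)]
        used += 1
        symbols = list(choice) + symbols  # expand in place (depth-first)
    return "".join(out)
```

With the toy grammar `{"<e>": [["<e>", "+", "<e>"], ["x"], ["y"]]}`, the genotype `[0, 1, 2]` first expands `<e>` into `<e>+<e>` (codon 0 mod 3), then derives the two leaves left to right, producing the word `x+y`.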

2.3.3. Search Engine

Several population-based metaheuristic algorithms can be used as the search engine of grammatical evolution. The well-known genetic algorithm (GA) and differential evolution (DE) are used in this framework [69].

2.4. Fitness Function

Two different fitness functions are considered to measure the ability of the solutions to solve the problem:

(1) The squared error, defined in the following equation, where $P$ is the total number of training patterns, $O$ is the number of neurons in the output layer, $t_o^a$ is the actual firing time, and $t_o^d$ is the desired firing time of output neuron $o$:

$$E = \sum_{p=1}^{P} \sum_{o=1}^{O} \left( t_o^a - t_o^d \right)^2 \tag{5}$$

(2) The accuracy error over the training subset, as in the following equation, where $C$ is the number of correct predictions and $T$ is the total number of predictions:

$$E_{\mathrm{acc}} = 1 - \frac{C}{T} \tag{6}$$

Both fitness functions are designed to be minimized.
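Both fitness functions are simple to state in code; the sketch below assumes firing times and predictions are given as plain lists, with one inner list of output firing times per pattern.

```python
def squared_error(actual, desired):
    """Sum over patterns and output neurons of (t_actual - t_desired)^2.
    Both arguments are lists of per-pattern lists of firing times."""
    return sum((a - d) ** 2
               for pat_a, pat_d in zip(actual, desired)
               for a, d in zip(pat_a, pat_d))

def accuracy_error(predictions, labels):
    """1 - C/T: the fraction of incorrect predictions (to be minimized)."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 1.0 - correct / len(labels)
```

Both functions reach zero for a perfect design, which is why the search engine minimizes them.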

2.5. Target

In order to obtain a prediction, a particular firing time is assigned to each class in the employed dataset, resulting in a desired time-to-first-spike for every sample belonging to a specific class.
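Combined with the class targets listed in Section 3 (12 ms, 15 ms, 18 ms, …), the decision rule could look as follows; the nearest-target rule is our reading of how the time-to-first-spike serves as the classification criterion, not an explicitly stated procedure.

```python
def class_targets(num_classes, first=12.0, step=3.0):
    """Assign a desired time-to-first-spike to each class: 12, 15, 18, ... ms
    (values taken from the parameter list in Section 3)."""
    return [first + step * c for c in range(num_classes)]

def predict(output_spike_time, targets):
    """Predict the class whose target firing time is closest to the
    output neuron's actual time-to-first-spike (assumed decision rule)."""
    return min(range(len(targets)),
               key=lambda c: abs(output_spike_time - targets[c]))
```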

3. Experiments and Results

Twelve supervised classification benchmark datasets from the UCI Machine Learning Repository [70] were considered for experimentation: Balance Scale, Blood Transfusion Service Center (Blood), Breast Cancer Wisconsin (Breast Cancer), Japanese Credit Screening (Card), Pima Indians Diabetes (Diabetes), Fertility, Glass Identification (Glass), Ionosphere, Iris Plant, Liver Disorders (Liver), Parkinson, and Wine. Table 1 shows the details of the datasets employed.

Table 1: Datasets employed for experimentation.

Each dataset was randomly divided into two subsets of approximately the same size, such that the instances of each class are evenly distributed between the subsets. One of these subsets is assigned to be the design set, while the other is the test set.

Then, the design set is employed to carry out the GE process, while the test set is reserved to assess the performance of the best solution provided by the evolutionary process.
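A class-balanced halving of this kind can be sketched as a per-class shuffle-and-split; the paper does not detail the exact splitting procedure, so this is an assumed implementation of the description above.

```python
import random

def stratified_split(samples, labels, seed=None):
    """Randomly split a dataset into design and test halves so that the
    instances of each class are evenly distributed between the two subsets."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    design, test = [], []
    for y, xs in by_class.items():
        rng.shuffle(xs)                      # randomize within each class
        half = len(xs) // 2
        design.extend((x, y) for x in xs[:half])
        test.extend((x, y) for x in xs[half:])
    return design, test
```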

Aiming to compare the performance of neural models from different generations in solving pattern recognition tasks, six different configurations were considered, as shown in Table 2, with the following details:
(i) α configurations employ the parameters defined in [35], focusing on developing second-generation partially connected ANNs
(ii) β configurations aim to be a homologue of the α configurations, but are used to produce third-generation partially connected ANNs
(iii) γ configurations are defined as the β configurations, but employ DE as the search engine instead of GA

Table 2: Configurations included in experimentation.

Parameters between configurations were matched to make the comparison as fair as possible. Furthermore, configurations labeled with subscript 1 use the squared error as the fitness function to guide the evolutionary process, while configurations labeled with subscript 2 use the accuracy error over the design set.

In order to guarantee statistical significance, the central limit theorem [71] is satisfied by performing 33 experiments for each configuration. The specific parameters used in this framework for the β and γ configurations are as follows:
(i) Temporal encoding: the one-dimensional encoding scheme observes a temporal range from 0.01 to 9 milliseconds (ms)
(ii) SRM: membrane potential time constant τ = 9; targets: 12 ms, 15 ms, 18 ms, … (depending on the number of classes in the dataset); simulation time from 10 ms to the target of the last class plus two; threshold θ = 1 millivolt (mV); weight range ∈ [−999.99, 999.99]; delay range ∈ [0.01, 19.99] ms
(iii) GA: binary search space {0, 1}; codon size = 8; individual dimension = 4000 bits (500 codons); population size = 100; function calls = 1,000,000; K-tournament (K = 5) selection operator; elitism percentage = 10%; one-point crossover operator; bit-negation mutation operator (5%)
(iv) DE: real search space [0, 255]; individual dimension = 500; function calls = 1,000,000; population size = 100; crossover rate = 10%; mutation: DE/rand/1

Tables 3 and 4 show the results obtained by carrying out the aforementioned methodology. The accuracy value ∈ [0, 1] grades the average performance of the configurations applied to classify specific datasets, along with its corresponding standard deviation, over all experiments made. The design accuracy relates to the performance of the best network topology obtained by the evolutionary algorithm, whilst the test accuracy indicates the performance of that network applied to the test subset; the highest values are indicated in boldface.

Table 3: Accuracy of design and testing on every configuration for Balance Scale, Blood, Breast Cancer, Card, Diabetes, and Fertility datasets.
Table 4: Accuracy of design and testing on every configuration for Glass, Ionosphere, Iris Plant, Liver, Parkinson, and Wine datasets.

In addition, Tables 5 and 6 show some of the features of the generated topologies, focusing on the average number of input vector features actually employed by the networks and its corresponding rate with respect to the total size of the original input vector, as well as the average number of hidden units and synapses present in the generated networks. In the Supplementary Materials, some examples of the SNN topologies with the best obtained results are shown; each example contains the benchmark dataset, the used configuration, the accuracies of the design and test phases, the generated word, and the network topology.

Table 5: Topology characteristics for every dataset on configurations α1, β1, and γ1.
Table 6: Topology characteristics for every dataset on configurations α2, β2, and γ2.

4. Comparative Statistical Analysis

As detailed in the previous section, data samples from performing thirty-three independent experiments for each configuration on every dataset were obtained. Thereupon, several statistical tests [72] were applied to these data. First, a Shapiro–Wilk test [73] was applied to determine the normality of the samples; it showed that the data can indeed be modelled by normal distributions. Further analysis was divided into three tests: one applied to the configurations using the squared error as the fitness function, one to the configurations using the accuracy error as the fitness function, and one to all configurations.
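As an illustration of the normality check, a Shapiro–Wilk test on 33 synthetic accuracy values (illustrative data only, not the paper's results) can be run with SciPy:

```python
import numpy as np
from scipy import stats

# 33 synthetic accuracy values standing in for one configuration's runs.
rng = np.random.default_rng(0)
accuracies = rng.normal(loc=0.90, scale=0.02, size=33)

# Shapiro-Wilk: H0 = the sample was drawn from a normal distribution.
# A p value above the 0.05 significance level fails to reject H0,
# i.e., the sample can be treated as normally distributed.
w_stat, p_value = stats.shapiro(accuracies)
is_normal = p_value > 0.05
```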

4.1. Test of Designs Driven by Squared Error Fitness Function

In order to verify the statistical significance of the results, analysis of variance (ANOVA [74]) tests were applied to determine, firstly, whether implementing different methodologies to develop weighted network topologies impacts the classification accuracy and, secondly, which of these methodologies offers the best performance. Table 7 shows the results of the two-way ANOVA test, with both the configurations and the datasets as independent variables.

Table 7: Two-way ANOVA F, pairwise t-test, and Tukey HSD test with Bonferroni correction for squared error configurations.

ANOVA's null hypothesis (H0) states that the observed samples come from a single normal distribution. As the p values (Pr(>F) in Table 7) are smaller than the significance level of 0.05, H0 is rejected, i.e., the samples are not statistically similar. In other words, it can be concluded that the configurations come from different distributions. This test provides relevant statistical evidence to support the conclusion that changing the methodology used to generate weighted topologies influences the classification accuracy of the networks.

Pairwise t-tests and Tukey HSD tests [75] were applied next. As in the ANOVA test, the null hypothesis of both tests assumes that the samples come from a single distribution. Table 7 shows the t-test p values with a Bonferroni correction. Based on these results, it can be inferred, at a significance level of 0.05, that the γ1 configuration can be considered statistically different from the α1 and β1 configurations. Subsequently, the Tukey HSD test results, also found in Table 7, uphold that the γ1 configuration is significantly different from the other configurations. Given these results, the higher performance of the γ1 configuration is noticeable in the three left-most configurations shown in, e.g., the Fertility (Figure 5), Glass (Figure 6), and Ionosphere (Figure 7) performance box plots.

Figure 5: Box plots of the performance of all configurations on the Fertility dataset.
Figure 6: Box plots of the performance of all configurations on the Glass dataset.
Figure 7: Box plots of the performance of all configurations on the Ionosphere dataset.
4.2. Test of Designs Driven by Accuracy Error Fitness Function

Statistical analysis for the configurations driven by the accuracy error fitness function was performed with the same approach as in the previous subsection; Table 8 shows the ANOVA, t-test, and Tukey HSD tests applied to these configurations. In this case, the pairwise t-test shows that there is no statistically significant difference between the β2 and γ2 configurations; however, the β2 configuration requires higher computational power to carry out the design task due to its search engine and its respective operators (crossover and mutation). These issues do not arise for the γ2 configuration; besides, its results show similar accuracy with lower dispersion, which can be noticed in the right-most configurations shown in, e.g., the Fertility (Figure 5), Glass (Figure 6), and Ionosphere (Figure 7) performance box plots; this behavior was consistently observed for all benchmark datasets. The Tukey HSD test shows that there is a statistical difference among all configurations; this, along with the behavior observed in the box plots, confirms that the γ2 configuration holds as the outperforming algorithm.

Table 8: Two-way ANOVA F, pairwise t-test, and Tukey HSD test with Bonferroni correction for accuracy error configurations.
4.3. Test of All Configurations

An omnibus test was applied to the entire set of experiments, considering both the configurations and the fitness functions as independent variables. A two-way ANOVA test was applied to determine whether varying both observed variables influences the accuracy performance. Table 9 contains the results, providing statistical certainty to reject the null hypothesis H0; in other words, the accuracy performance is affected by both variables. The p values lower than the significance level of 0.05 indicate that changing the fitness function (squared error or accuracy error) and the configuration does indeed affect the accuracy obtained by the generated topology.

Table 9: Two-way ANOVA F and pairwise t-test with Bonferroni correction tests for all configurations.

Finally, a pairwise t-test was applied to discern whether, given two configurations, their performances are statistically similar. Considering the p values in Table 9 and a significance level of 0.05, it can be inferred with statistical confidence that the γ2 configuration generally outperforms the other configurations.

5. Conclusions and Future Work

This paper presents a GE-based methodology to design partially connected ANNs for solving supervised classification problems; some interesting characteristics of the methodology are that it provides weighted topologies, which allows an explicit training stage to be avoided, and that those topologies exhibit partial connectivity between the input and hidden layers, which may avoid redundancies and reduce the dimensionality of the input feature vectors. The proposed methodology (γ2) evolved from progressive improvements made to a base methodology (α1), which uses GE with a GA as the search engine and the squared error as the fitness function; improvements were made by changing the neuron model, which allowed us to generate SNNs (β1) instead of second-generation ANNs, and by changing the search engine to DE (γ1) instead of the GA. All the aforementioned configurations were also adapted to use another fitness function, based on the accuracy error of the generated ANNs, yielding the so-called α2, β2, and γ2 configurations.

In order to validate the achieved improvements, several statistical tests were applied. Each configuration was tested on twelve well-known benchmark datasets of supervised classification problems by performing 33 experiments per dataset. Three types of statistical analysis were performed, the first being applied to the α1, β1, and γ1 configurations, which use the squared error as the fitness function. In this analysis, the γ1 configuration is shown to outperform the other configurations based on the statistical tests and the box plots. The second analysis focused on the α2, β2, and γ2 configurations, which use the accuracy error as the fitness function; based on the Tukey HSD test, this analysis yielded a conclusion similar to that of the first analysis, but with respect to the γ2 configuration. The last analysis compared all configurations and showed statistical evidence to support that γ2 is the best configuration, with competitive performance and lower dispersion in its designs.

Focusing on the topology designs and performance results, the evolutionary designs led to solution topologies with fewer connections than equivalent fully connected topologies, hence reducing the complexity of the networks while achieving good classification performance. The topology simplification provided good network designs (i.e., the design accuracy was competitive), but better generalization to unseen data in the test phase remains desirable; some particular cases exhibited lower test accuracies, evidencing an opportunity for improvement.

Due to the flexibility of the context-free grammars employed in GE, other aspects of neural network topologies can be considered to cope with the detected issues while preserving the accomplished enhancements. The design process may consider other traits, e.g., selection of the neural model and/or the search engine, specification of the model parameters, or even increasing the number of hidden layers to design SNNs with deep topologies. Moreover, types of topologies with structures other than layered networks can be explored, such as those of reservoir computing or central pattern generators. Furthermore, other kinds of grammar-based genetic programming algorithms can be used to add semantics to the design process, such as Christiansen grammar evolution [67].

Finally, contemplating the fitness function as another relevant aspect for producing enhanced designs, further considerations can be made: minimizing the number of processing units in the hidden layer, or adopting other evaluation measures to comply with other kinds of problems. Multiple objectives in the fitness function may be treated as a weighted mono-objective fitness function or handled by multiobjective algorithms such as the nondominated sorting genetic algorithm (NSGA) [76].

Data Availability

The supervised classification benchmark datasets used to support the findings of this study were taken from the UCI Machine Learning Repository of the University of California, Irvine.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

The authors wish to thank the National Technology of México and the University of Guanajuato. A. Espinal wishes to thank SEP-PRODEP for the support provided to the Project 511-6/17-8074 "Diseño y entrenamiento de Redes Neuronales Artificiales mediante Algoritmos Evolutivos." G. López-Vázquez and A. Rojas-Domínguez thank the National Council of Science and Technology of México (CONACYT) for the support provided by means of the Scholarship for Postgraduate Studies (701071) and research grant CÁTEDRAS-2598, respectively. This work was supported by the CONACYT Project FC2016-1961 "Neurociencia Computacional: de la teoría al desarrollo de sistemas neuromórficos."

Supplementary Materials

Examples of the best results obtained for SNNs are shown; each example contains the benchmark dataset, used configuration, accuracies of design and test phases, the generated word, and the network topology. (Supplementary Materials)


References

  1. M. Markou and S. Singh, “Novelty detection: a review-part 2: neural network based approaches,” Signal Processing, vol. 83, no. 12, pp. 2499–2521, 2003.
  2. G. P. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol. 30, no. 4, pp. 451–462, 2000.
  3. A. J. Ijspeert, “Central pattern generators for locomotion control in animals and robots: a review,” Neural Networks, vol. 21, no. 4, pp. 642–653, 2008.
  4. J. Yu, M. Tan, J. Chen, and J. Zhang, “A survey on CPG-inspired control models and system implementation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 3, pp. 441–456, 2014.
  5. S. Elfwing, E. Uchibe, and K. Doya, “Sigmoid-weighted linear units for neural network function approximation in reinforcement learning,” Neural Networks, vol. 107, pp. 3–11, 2018.
  6. F. Scarselli and A. Chung Tsoi, “Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results,” Neural Networks, vol. 11, no. 1, pp. 15–37, 1998.
  7. J. S. Judd, Neural Network Design and the Complexity of Learning. Neural Network Modeling and Connectionism Series, MIT Press, Cambridge, MA, USA, 1990.
  8. W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural Networks, vol. 10, no. 9, pp. 1659–1671, 1997.
  9. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, 1943.
  10. F. Rosenblatt, The Perceptron, a Perceiving and Recognizing Automaton (Project PARA), Cornell Aeronautical Laboratory, Buffalo, NY, USA, 1957.
  11. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
  12. W. Gerstner and W. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, Cambridge, UK, 2002.
  13. L. Lapicque, “Recherches quantitatives sur l’excitation electrique des nerfs traitee comme une polarization,” Journal de Physiologie et de Pathologie Generale, vol. 9, pp. 620–635, 1907.
  14. A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and its application to conduction and excitation in nerve,” The Journal of Physiology, vol. 117, no. 4, pp. 500–544, 1952.
  15. E. Amaldi, E. Mayoraz, and D. de Werra, “A review of combinatorial problems arising in feedforward neural network design,” Discrete Applied Mathematics, vol. 52, no. 2, pp. 111–138, 1994.
  16. A. L. Blum and R. L. Rivest, “Training a 3-node neural network is NP-complete,” Neural Networks, vol. 5, no. 1, pp. 117–127, 1992.
  17. B. DasGupta, H. T. Siegelmann, and E. Sontag, “On the intractability of loading neural networks,” in Theoretical Advances in Neural Computation and Learning, pp. 357–389, Springer, Boston, MA, USA, 1994.
  18. S. Judd, “On the complexity of loading shallow neural networks,” Journal of Complexity, vol. 4, no. 3, pp. 177–192, 1988.
  19. W. Maass and M. Schmitt, “On the complexity of learning for spiking neurons with temporal coding,” Information and Computation, vol. 153, no. 1, pp. 26–46, 1999.
  20. D. Elizondo and E. Fiesler, “A survey of partially connected neural networks,” International Journal of Neural Systems, vol. 8, no. 5-6, pp. 535–558, 1997.
  21. S. Ding, H. Li, C. Su, J. Yu, and F. Jin, “Evolutionary artificial neural networks: a review,” Artificial Intelligence Review, vol. 39, no. 3, pp. 251–260, 2013.
  22. D. Floreano, P. Dürr, and C. Mattiussi, “Neuroevolution: from architectures to learning,” Evolutionary Intelligence, vol. 1, no. 1, pp. 47–62, 2008.
  23. V. K. Ojha, A. Abraham, and V. Snášel, “Metaheuristic design of feedforward neural networks: a review of two decades of research,” Engineering Applications of Artificial Intelligence, vol. 60, pp. 97–116, 2017.
  24. X. Yao, “Evolving artificial neural networks,” Proceedings of the IEEE, vol. 87, no. 9, pp. 1423–1447, 1999.
  25. M. Tayefeh, F. Taghiyareh, N. Forouzideh, and L. Caro, “Evolving artificial neural network structure using grammar encoding and colonial competitive algorithm,” Neural Computing and Applications, vol. 22, no. 1, pp. 1–16, 2013.
  26. A. Espinal, M. Sotelo-Figueroa, J. A. Soria-Alcaraz et al., “Comparison of PSO and DE for training neural networks,” in Proceedings of 2013 12th Mexican International Conference on Artificial Intelligence, pp. 83–87, Mexico City, Mexico, November 2013.
  27. A. K. Morales, “Non-standard norms in genetically trained neural networks,” in Proceedings of 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks, pp. 43–51, IEEE, San Antonio, TX, USA, May 2000.
  28. A. K. Morales, “Training neural networks using non-standard norms–preliminary results,” in Proceedings of Mexican International Conference on Artificial Intelligence, pp. 350–364, Acapulco, Mexico, April 2000.
  29. E. Alba, J. Aldana, and J. M. Troya, “Full automatic ANN design: a genetic approach,” in Proceedings of International Workshop on Artificial Neural Networks, pp. 399–404, Springer, Sitges, Spain, June 1993.
  30. L. F. De Mingo Lopez, N. Gomez Blas, and A. Arteta, “The optimal combination: grammatical swarm, particle swarm optimization and neural networks,” Journal of Computational Science, vol. 3, no. 1-2, pp. 46–55, 2012.
  31. H. Kitano, “Designing neural networks using genetic algorithms with graph generation system,” Complex Systems, vol. 4, no. 4, pp. 461–476, 1990.
  32. F. Ahmadizar, K. Soltanian, F. AkhlaghianTab, and I. Tsoulos, “Artificial neural network development by means of a novel combination of grammatical evolution and genetic algorithm,” Engineering Applications of Artificial Intelligence, vol. 39, pp. 1–13, 2015.
  33. B. A. Garro, H. Sossa, and R. A. Vazquez, “Design of artificial neural networks using a modified particle swarm optimization algorithm,” in Proceedings of the 2009 International Joint Conference on Neural Networks, IJCNN’09, pp. 2363–2370, IEEE Press, Atlanta, GA, USA, June 2009.
  34. B. A. Garro and R. A. Vázquez, “Designing artificial neural networks using particle swarm optimization algorithms,” Computational Intelligence and Neuroscience, vol. 2015, Article ID 369298, 20 pages, 2015.
  35. O. Quiroz-Ramírez, A. Espinal, M. Ornelas-Rodríguez et al., “Partially-connected artificial neural networks developed by grammatical evolution for pattern recognition problems,” in Fuzzy Logic Augmentation of Neural and Optimization Algorithms: Theoretical Aspects and Real Applications, vol. 749, pp. 99–112, 2018.
  36. D. Rivero, J. Dorado, J. Rabuñal, and A. Pazos, “Generation and simplification of artificial neural networks by means of genetic programming,” Neurocomputing, vol. 73, no. 16–18, pp. 3200–3223, 2010.
  37. W. Sheng, P. Shan, J. Mao, Y. Zheng, S. Chen, and Z. Wang, “An adaptive memetic algorithm with rank-based mutation for artificial neural network architecture optimization,” IEEE Access, vol. 5, pp. 18895–18908, 2017.
  38. I. Tsoulos, D. Gavrilis, and E. Glavas, “Neural network construction and training using grammatical evolution,” Neurocomputing, vol. 72, no. 1–3, pp. 269–277, 2008.
  39. J. Fontanari and R. Meir, “Evolving a learning algorithm for the binary perceptron,” Network: Computation in Neural Systems, vol. 2, no. 4, pp. 353–359, 1991.
  40. H. B. Kim, S. H. Jung, T. G. Kim, and K. H. Park, “Fast learning method for back-propagation neural network by evolutionary adaptation of learning rates,” Neurocomputing, vol. 11, no. 1, pp. 101–106, 1996.
  41. S. Ghosh-Dastidar and H. Adeli, “Spiking neural networks,” International Journal of Neural Systems, vol. 19, no. 04, pp. 295–308, 2009.
  42. S. M. Bohte, J. N. Kok, and H. La Poutré, “Error-backpropagation in temporally encoded networks of spiking neurons,” Neurocomputing, vol. 48, no. 1–4, pp. 17–37, 2002.
  43. A. Belatreche, Biologically Inspired Neural Networks: Models, Learning, and Applications, VDM Verlag, Saarbrücken, Germany, 2010.
  44. A. Cachón and R. A. Vázquez, “Tuning the parameters of an integrate and fire neuron via a genetic algorithm for solving pattern recognition problems,” Neurocomputing, vol. 148, pp. 187–197, 2015.
  45. R. A. Vazquez, “Pattern recognition using spiking neurons and firing rates,” in Advances in Artificial Intelligence–IBERAMIA 2010, Lecture Notes in Computer Science, vol. 6433, Springer, Berlin, Germany, 2010.
  46. R. A. Vazquez, “Training spiking neural models using cuckoo search algorithm,” in Proceedings of IEEE Congress on Evolutionary Computation, pp. 679–686, New Orleans, LA, USA, June 2011.
  47. R. A. Vazquez and A. Cachon, “Integrate and fire neurons and their application in pattern recognition,” in Proceedings of 7th International Conference on Electrical Engineering Computing Science and Automatic Control, pp. 424–428, Tuxtla Gutierrez, Mexico, September 2010.
  48. R. A. Vázquez and B. A. Garro, “Training spiking neurons by means of particle swarm optimization,” in Advances in Swarm Intelligence, Lecture Notes in Computer Science, pp. 242–249, Springer, Berlin, Germany, 2011.
  49. E. M. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on Neural Networks, vol. 14, no. 6, pp. 1569–1572, 2003.
  50. R. Storn and K. Price, “Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces,” Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
  51. J. Kennedy and R. C. Eberhart, “Particle swarm optimization,” in Proceedings of IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, Perth, Australia, November-December 1995.
  52. A. H. Gandomi, X.-S. Yang, and A. H. Alavi, “Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems,” Engineering with Computers, vol. 29, no. 1, pp. 17–35, 2013.
  53. J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, USA, 1975.
  54. J. S. Altamirano, M. Ornelas, A. Espinal et al., “Comparing evolutionary strategy algorithms for training spiking neural networks,” in Advances in Pattern Recognition, p. 9, 2015.
  55. A. Belatreche, L. P. Maguire, M. McGinnity, and Q. X. Wu, “An evolutionary strategy for supervised training of biologically plausible neural networks,” in Proceedings of the Sixth International Conference on Computational Intelligence and Natural Computing (CINC), pp. 1524–1527, Cary, NC, USA, September 2003.
  56. A. Belatreche, L. P. Maguire, and T. M. McGinnity, “Advances in design and application of spiking neural networks,” Soft Computing, vol. 11, no. 3, pp. 239–248, 2006.
  57. H. Shen, N. Liu, X. Li, and Q. Wang, “A cooperative method for supervised learning in spiking neural networks,” in Proceedings of 14th International Conference on Computer Supported Cooperative Work in Design, pp. 22–26, IEEE, Shanghai, China, April 2010.
  58. I. Rechenberg, Evolutionsstrategie: Optimierung Technischer Systeme Nach Prinzipien der biologischen Evolution. Problemata, 15, Frommann-Holzboog Verlag, Stuttgart, Germany, 1973.
  59. H. P. Schwefel, Numerische Optimierung Von Computer-Modellen Mittels der Evolutionsstrategie, vol. 1, Birkhäuser, Basel, Switzerland, 1977.
  60. A. Espinal, M. Carpio, M. Ornelas, H. Puga, P. Melin, and M. Sotelo-Figueroa, “Comparing metaheuristic algorithms on the training process of spiking neural networks,” in Recent Advances on Hybrid Approaches for Designing Intelligent Systems, pp. 391–403, Springer, Cham, Switzerland, 2014.
  61. A. Espinal, M. Carpio, M. Ornelas, H. Puga, P. Melín, and M. Sotelo-Figueroa, “Developing architectures of spiking neural networks by using grammatical evolution based on evolutionary strategy,” in Proceedings of Mexican Conference on Pattern Recognition, pp. 71–80, Springer, Cancun, Mexico, June 2014.
  62. C. Ryan, J. Collins, and M. O’Neill, “Grammatical evolution: evolving programs for an arbitrary language,” in Proceedings of Genetic Programming: First European Workshop, EuroGP’98, pp. 83–96, Springer Berlin Heidelberg, Paris, France, April 1998.
  63. S. Schliebs, Optimisation and Modelling of Spiking Neural Networks: Enhancing Neural Information Processing Systems through the Power of Evolution, LAP Lambert Academic Publishing, Saarbrücken, Germany, 2010.
  64. A. Espinal, H. Rostro-Gonzalez, M. Carpio et al., “Quadrupedal robot locomotion: a biologically inspired approach and its hardware implementation,” Computational Intelligence and Neuroscience, vol. 2016, Article ID 5615618, 13 pages, 2016.
  65. E. I. Guerra-Hernandez, A. Espinal, P. Batres-Mendoza, C. H. Garcia-Capulin, R. De J. Romero-Troncoso, and H. Rostro-Gonzalez, “A FPGA-based neuromorphic locomotion system for multi-legged robots,” IEEE Access, vol. 5, pp. 8301–8312, 2017.
  66. H. Soula, G. Beslon, and O. Mazet, “Spontaneous dynamics of asymmetric random recurrent spiking neural networks,” Neural Computation, vol. 18, no. 1, pp. 60–79, 2006.
  67. A. Ortega, M. De La Cruz, and M. Alfonseca, “Christiansen grammar evolution: grammatical evolution with semantics,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, pp. 77–90, 2007.
  68. M. O’Neill and C. Ryan, “Grammatical evolution,” IEEE Transactions on Evolutionary Computation, vol. 5, no. 4, pp. 349–358, 2001.
  69. E.-G. Talbi, Metaheuristics: From Design to Implementation, John Wiley & Sons, Hoboken, NJ, USA, 2009.
  70. D. Dheeru and E. Karra Taniskidou, “UCI machine learning repository,” 2017.
  71. B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Addison-Wesley, Cambridge, MA, USA, 1954.
  72. J. A. Soria Alcaraz, G. Ochoa, M. Carpio, and H. Puga, “Evolvability metrics in adaptive operator selection,” in Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 1327–1334, ACM, Vancouver, Canada, July 2014.
  73. S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (complete samples),” Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.
  74. F. J. Anscombe, “The validity of comparative experiments,” Journal of the Royal Statistical Society, Series A (General), vol. 111, no. 3, pp. 181–211, 1948.
  75. D. C. Montgomery, Design and Analysis of Experiments, Wiley, Hoboken, NJ, USA, 2013.
  76. N. Srinivas and K. Deb, “Muiltiobjective optimization using nondominated sorting in genetic algorithms,” Evolutionary Computation, vol. 2, no. 3, pp. 221–248, 1994.