Abstract

The latest developments in neuroscience have deepened our understanding of the information-processing mechanisms of the human brain and inspired several sophisticated computational methods, such as deep learning, memory networks, and hierarchical temporal memory. However, exploring simpler models remains a challenge because of the high computational cost of the above-mentioned methods. This paper proposes the recall network (RN), an intuitive and simple model that initializes itself by constructing network paths derived from the correlations of features in the training dataset and then makes classification decisions by recalling the paths that are relevant to the features in the test set. The algorithm has been applied to 263 datasets from the UCI Machine Learning Repository, and the classification results of repeated 10-fold cross-validation experiments in Weka demonstrate performance competitive with well-established classification algorithms such as ANN, J48, and KNN.

1. Introduction

Brain-inspired algorithms, like artificial neural networks, have shown great success in solving problems across many fields for years. Recently, the development of brain science and neuroscience has deepened the understanding of the brain's information-processing mechanisms and has therefore inspired a new generation of computational models. For example, deep learning surpasses traditional artificial neural networks (ANNs) in both complexity and capability [1–3], and in 2015, memory networks (MemNN) introduced a long-term memory component [4, 5]; meanwhile, hierarchical temporal memory (HTM) builds a tree-shaped hierarchy of levels in which each higher level receives input from the results of the level below it [6, 7]. Hu et al. suggested that reconstructing cortical networks with dynamic activities, instead of using only artificial computing networks, is a feasible approach [8].

These algorithms tend to grow increasingly sophisticated. However, designing simpler models is still worth exploring because of the high computational cost of the above-mentioned methods. This work proposes a straightforward model called the recall network (RN), a memory network that stores, marks, and then retrieves previous paths. An RN consists of nodes and edges (paths), where a node represents an attribute (feature) value, and an edge connects two nodes and records the number of samples that have both attribute values. For each sample, connecting its attribute values consecutively forms an end-to-end route. During the prediction phase, the network determines the sample's class by a plurality vote of the edges along the route. For example, if a route consists of three edges voting 2 YES and 1 NO, the result will be YES.

This paper investigates the use of the recall network in the field of classification. The objective of the study is to unveil the capabilities of this newly developed method on benchmark classification problems. The performance of the proposed algorithm is evaluated against other classic approaches in Weka [9], a well-established machine learning platform that enables fair comparisons of algorithm performance. The results verify the promising performance of the proposed algorithm.

The remaining sections of the paper are organized as follows. The second section introduces the proposed algorithm with particular emphasis on its application in the classification field. In the third section, the concept of RN is compared with other similar algorithms. Classification experiments demonstrating the accuracy of the proposed algorithm are covered in the fourth section. The fifth section discusses the differences between RN and some powerful classification methods. A brief conclusion of the study is given in the last section.

2. Structure and Algorithms

2.1. Structure of a Recall Network

A recall network is a group of connected layers, where every layer consists of similar nodes, each representing a value of a certain attribute (see Figure 1). Each connection or edge, expressed as a set, stores the relations between its two nodes recorded from previous samples; the output value of the edge for that pair of nodes is later determined by the plurality of those records.

RN can be defined as an undirected graph G = (N, E), consisting of the set N of nodes, where nodes representing the same attribute are aligned in one layer, and the set E of edges, where each edge connects two nodes from adjacent layers.

2.2. Mechanism of Recall Network

Each training sample, called an instance in Weka, is represented by a route from a node in the first layer to one in the last layer. Routes are piled up during training, and during the prediction phase, each test sample, also represented as a route, is voted on by every edge whose value was accumulated in the training stage.

The above mechanism can be expressed as

Class = Mode(e1, e2, e3, …, en),

where e1, e2, e3, …, en are the edges of the instance route from the 1st layer to the last one (n) and the Mode function returns the mode of the edge set. In the same way, each edge ei takes the mode of the parallel connections ei1, ei2, ei3, …, eij accumulated between its adjacent nodes.
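To make this two-level vote concrete, the following minimal Python sketch (our own illustration with hypothetical helper names, not the released implementation) computes the class of one route from the per-edge class-count vectors accumulated during training:

from collections import Counter

def edge_vote(class_counts):
    # Mode of the marks stored on one edge: the class with the largest
    # count wins; on a tie, the lowest class index is kept.
    return max(range(len(class_counts)), key=lambda c: class_counts[c])

def route_class(route_edges):
    # Class = Mode(e1, e2, ..., en): the mode over the per-edge votes.
    votes = [edge_vote(counts) for counts in route_edges]
    return Counter(votes).most_common(1)[0][0]

# Three edges voting YES(0), YES(0), NO(1): the route is classified YES.
print(route_class([[2, 1], [3, 0], [0, 4]]))  # prints 0

Note that Algorithm 3 below instead sums the class-count vectors along the route and takes the plurality of the total; both are variants of the same voting idea.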

The algorithm is simple to understand and cheap to implement, requiring only three easy steps: building an RN from instances (see Algorithm 1), training it on known data (see Algorithm 2), and using it to classify unknown samples (see Algorithm 3).

Input: number of attributes (N), number of categories for each discrete attribute, range of values for each continuous attribute
Output: a network with N layers, where each layer has one node per interval (category) of the corresponding attribute
Initialization of the Recall Network;
for i ⟵ 1 to the number of attributes − 1 do
  for m ⟵ 1 to the number of intervals of the i-th attribute do
   for n ⟵ 1 to the number of intervals of the (i + 1)-th attribute do
    Create node (i, m) and node (i + 1, n) if they do not already exist;
    Create an edge between them;
    Initialize the classID vector of the edge;
   end
  end
end
Input: training data
Output: the network weighted by all training examples
Training the Recall Network;
for i ⟵ 1 to the number of examples do
  classIndex = the class index of the i-th example;
  for j ⟵ 1 to the number of attributes − 1 do
    RowNoj = the row number of the value in the j-th layer;
    RowNoj+1 = the row number of the value in the (j + 1)-th layer;
    e = the edge between Node (j, RowNoj) and Node (j + 1, RowNoj+1);
    e.ClassVector[classIndex] += 1;
  end
end
Input: test data
Output: the class of the data
Determining the category by votes from all edges;
for i ⟵ 1 to the number of attributes − 1 do
  RowNoi = the row number of the value in the i-th layer;
  RowNoi+1 = the row number of the value in the (i + 1)-th layer;
  e = the edge between Node (i, RowNoi) and Node (i + 1, RowNoi+1);
  ResultVector += e.ClassVector;
end
return the class with the largest count (the mode) in ResultVector
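For readers who prefer executable code, the following Python sketch renders Algorithms 1–3 under our reading of the pseudocode; the class name RecallNetwork and the lazy creation of edges are our own choices, and this is not the authors' released (Weka-oriented) implementation:

from collections import defaultdict

class RecallNetwork:
    # One layer per attribute, one node per attribute value (interval),
    # and a class-count vector on every edge between adjacent layers.

    def __init__(self, n_classes):
        self.n_classes = n_classes
        # Edge store: (layer j, value in layer j, value in layer j + 1)
        # -> class-count vector. Edges are created lazily on first use
        # rather than pre-enumerated as in Algorithm 1; the resulting
        # network is the same.
        self.edges = defaultdict(lambda: [0] * n_classes)

    def train(self, instances, labels):
        # Algorithm 2: each instance is a route; increment the counter
        # of its class on every edge along the route.
        for x, y in zip(instances, labels):
            for j in range(len(x) - 1):
                self.edges[(j, x[j], x[j + 1])][y] += 1

    def predict(self, x):
        # Algorithm 3: sum the class-count vectors of the edges along
        # the route and return the plurality class.
        result = [0] * self.n_classes
        for j in range(len(x) - 1):
            counts = self.edges.get((j, x[j], x[j + 1]))
            if counts is not None:  # edges unseen in training cast no vote
                result = [r + c for r, c in zip(result, counts)]
        # On a tie, max() keeps the lowest class index, a simplification
        # of the "first edge in the edge set" rule of Section 2.3.
        return max(range(self.n_classes), key=lambda c: result[c])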
2.3. A Classification Example

To illustrate the proposed algorithm, an example is shown in Table 1. In this example, our algorithm aims to predict whether a given weather condition is suitable for playing tennis.

Figure 2 shows how an RN is created based on Table 1. Here, the RN consists of 4 layers (attributes): Outlook, Temperature, Humidity, and Wind. Each node represents a possible value of a weather attribute, and nodes that share the same attribute are arranged in a column. Each edge connects two nodes from adjacent columns. If a given day (a row in Table 1) is suitable for playing tennis (PlayTennis = YES), we add one blue edge between each adjacent pair of nodes representing the values of the weather attributes on that day. If a given day is not suitable for playing tennis (PlayTennis = NO), we add red edges by the same rule.

To predict a new sample with the weather condition Rain, Mild, High, Strong, we first identify the set of 4 nodes that represent that condition in Figure 2. Then, we find all the edges (Rain-Mild, Mild-High, High-Strong) that connect each pair of adjacent nodes within that set. Finally, we group those edges by color and count them in Figure 2. In this example, we get 3 red edges and 5 blue edges; therefore, the sample is classified as PlayTennis = YES (5 blue beats 3 red). If the numbers of red and blue edges were equal, the system would choose the color of the first edge in the edge set.
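With the sketch given after Algorithm 3, this worked example could be reproduced as follows (rows and labels stand for the full contents of Table 1, which are not repeated here; class 0 = YES/blue, class 1 = NO/red):

rn = RecallNetwork(n_classes=2)
rn.train(rows, labels)  # rows: tuples (Outlook, Temperature, Humidity, Wind)
print(rn.predict(("Rain", "Mild", "High", "Strong")))  # expected: 0 (YES)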

2.4. Appropriate Problems

By design, the proposed approach naturally handles discrete and categorical values, and it tolerates datasets with missing or erroneous values.

For continuous values, the algorithm divides the data range into a number of intervals, each of which is regarded as a discrete value. When a value is missing, the corresponding node is skipped.
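As an illustration, a continuous attribute could be mapped to interval nodes as in the sketch below; the equal-width scheme, the interval count, and the None marker for missing values are our assumptions, since the paper does not fix these details:

def to_interval(value, lo, hi, n_intervals=5):
    # Equal-width binning: map a continuous value in [lo, hi] to one of
    # n_intervals discrete node identifiers. None marks a missing value,
    # so that the edges touching this node are skipped during training
    # and prediction.
    if value is None:
        return None
    width = (hi - lo) / n_intervals
    return min(int((value - lo) / width), n_intervals - 1)

print(to_interval(7.2, lo=0.0, hi=10.0))  # interval 3 of 5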

3. Conceptual Comparison of RN with Similar Algorithms

Though RN is a novel algorithm, its structure resembles that of the artificial neural network (ANN), and its memory mechanism resembles the pheromone in ant colony optimization (ACO); nevertheless, it remains a simple, intuitive, and highly readable brain-inspired algorithm.

3.1. Compared with the ANN

The RN shares the same biological motivation and a similar network structure with ANN; however, there are three main differences.
(1) The meaning of nodes is entirely different: a node in an ANN represents a weighted summation of inputs, whereas in an RN it serves only as an identifier.
(2) The structure of an ANN is flexible, whereas that of an RN is not: the number of RN layers is fixed by the number of attributes, and the number of nodes in a layer must equal the number of intervals of the corresponding attribute.
(3) Though ANN is an effective algorithm in many fields, it is a "black box," while RN is more explainable.

3.2. Compared with the MemNN

MemNN, strictly speaking, is more a system consisting of a set of memory and inference modules than an independent algorithm. Its idea of retrieving the most relevant memory is similar to RN's; however, RN is much simpler: it does not compress or transform the input or use complex functions to score memories. Instead, RN stores the input as it is and then returns a vote based on the majority of marks of the same kind.

3.3. Compared with the HTM

Like HTM, RN simulates the mechanism of the cerebral cortex and memorizes patterns for solving problems, but RN has only one "level" and therefore spares the transformation process from lower levels to higher levels; moreover, RN uses a very primitive method, voting, to identify patterns.

3.4. Compared with the ACO

RN is similar to ACO in the idea of the "pheromone." Whenever a sample passes through a network route, each segment (edge) of the route receives a permanent mark; unlike in ACO, such a mark never decays over time. We argue that RN better matches the biological characteristics of the brain: memories learned by the brain rarely disappear completely, and they can be recalled when triggered by a similar event, even after a long period.

3.5. Comparison of the Mechanism, Computing Complexity, and Structure

To distinguish RN further, we analyze it in terms of mechanism, time complexity, space complexity, and structure (Table 2).

In summary, RN is a distinctive brain-inspired algorithm. Although its network structure is similar to that of ANN, and its memory mechanism is similar to those of MemNN, HTM, and ACO, RN is the simplest of them all.

4. Experiments and Results

4.1. Testing Tool

To evaluate the performance of the proposed algorithm, we chose Weka, a well-established machine learning platform that allows a fair comparison among different algorithms. Weka is one of the most popular tools for studying machine learning algorithms, and many researchers use it for classification problems. For example, Arora and Suman tested J48 and MLP on 5 UCI datasets [18]; Kiranmai and Laxmi classified power quality problems using Weka and studied the effect of attributes on classification accuracy [19]; Mhetre and Nagar adopted Weka to compare four classification algorithms (Naive Bayes, J48, ZeroR, and random tree) on education datasets [20]; Farhat et al. compared the performance of SVM, KNN, NB, logistic regression, decision tree, and random forest on intrusion detection data [21]; and Villavicencio et al. studied the J48 decision tree, random forest, SVM, KNN, and NB on the COVID-19 Symptoms and Presence dataset from Kaggle [22].

The above research shows that SVM, KNN, J48, NB, and OneR are competitive methods in the classification field. However, those works focus on limited fields or limited datasets. In this paper, we use all available data to study the performance comparison.

4.2. Testing Data

The test problems comprise 263 datasets from the UCI Machine Learning Repository. Although the repository offers 466 datasets in total, 121 are too big for Weka, 20 are missing, 53 are for natural language processing, and 9 are images. Nevertheless, the remaining 263 datasets represent a wide range of domains and data characteristics; a detailed description is listed in Table 3. In short, all usable data in the UCI Repository are used for testing.

4.3. Method

The RN implementation is published at https://github.com/tianzhaoning/RRN/releases/download/RecallNetwork/RN.zip; we loaded it into Weka on a local machine to compare it with the other algorithms. In all experiments, 10 runs of 10-fold cross-validation are used to average the results.
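Outside Weka, the same protocol can be approximated with scikit-learn, as in the sketch below; RNClassifier is a hypothetical sklearn-compatible wrapper around the recall network, and X, y denote an already-loaded dataset:

from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# 10 runs of 10-fold cross-validation, as in the experiments.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)
scores = cross_val_score(RNClassifier(), X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())  # averaged over the 100 folds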

To illustrate the performance of the proposed method, we chose several competitive classification algorithms, namely, Naive Bayes (NB), support vector machine (SVM), decision tree (J48), K-nearest neighbors (KNN), artificial neural network (ANN), and OneR (included because it is also a very simple decision-tree-style algorithm), and compared them on a large collection of benchmark datasets.

All algorithms use the default parameters supplied by Weka to ensure a fair comparison. However, we evaluated two additional parameter settings for SVM and one for ANN beyond the defaults to explore their effect under different configurations.

For SVM, we took the default configuration as SVM1 and varied the following parameters for the other settings: degree, gamma, cachesize, cost, eps, and the epsilon of the loss function in ε-SVR. For ANN, we took the default as ANN1 and changed the learning rate and the momentum. The specific hyperparameter values are shown in Table 4.

In the statistical analysis, this paper adopts the Friedman test, a non-parametric, rank-based method for detecting significant differences among multiple population distributions, together with ANOVA to test for significant mean differences among multiple groups. If both tests are significant, we then use the Tukey HSD test to make pairwise comparisons.
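This pipeline can be sketched with SciPy as follows, assuming a 263 × k matrix acc that holds one metric (e.g., accuracy) per dataset and algorithm; the input file name is hypothetical:

import numpy as np
from scipy import stats

# acc[i, j]: the metric value of algorithm j on dataset i.
acc = np.loadtxt("accuracy_matrix.csv", delimiter=",")

# Friedman test: rank-based test for differences among distributions.
print(stats.friedmanchisquare(*acc.T))
# One-way ANOVA: test for differences among means.
print(stats.f_oneway(*acc.T))
# If both are significant, Tukey's HSD locates the differing pairs.
print(stats.tukey_hsd(*acc.T))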

4.4. Performance Metrics

We evaluate the classification performance with the following indicators [23, 24]:
(1) Accuracy (ACC): the number of correctly predicted items divided by the total number of items to predict. It is the most important metric for measuring classification performance. In this work, we use the mean ACC, the average of the accuracies over all datasets, which represents the average performance over all tasks.
(2) Kappa statistic (k): a measure of how much better the classification results are compared to labels assigned by random chance. In terms of degree of agreement, k ≤ 0 is interpreted as no agreement, 0.01–0.20 as none to slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement.
(3) Root mean squared error (RMSE): the standard deviation of the residuals (prediction errors); another indicator of model accuracy.
(4) Mean absolute error (MAE): the average absolute difference between the predicted and true values. MAE avoids the problem of error cancellation, so it reflects the actual magnitude of the prediction error.
(5) Relative absolute error (RAE): the sum of the absolute forecast errors divided by the sum of the absolute errors of a mean predictor. RAE weights each error equally; the better the model, the closer its RAE is to 0.
(6) Root relative squared error (RRSE): the square root of the relative squared error (RSE), where RSE is the sum of squared errors of a predictive model normalized by the sum of squared errors of a mean predictor. It indicates how well a model performs relative to simply predicting the average of the true values.
(7) Running time: the execution time of an algorithm on the data, an intuitive measurement of time complexity in a real environment. In this paper, we report total time rather than running time per dataset, since Weka does not provide detailed timing for each experiment.

Generally, 10 runs of 10-fold cross-validation are used to average the above results [25], with the exception of running time, which is reported as the total time.
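For reference, these metrics can be computed as in the following NumPy sketch, using the standard textbook definitions (Weka evaluates its error metrics on class probability estimates, so its exact values may differ):

import numpy as np

def error_metrics(y_true, y_pred):
    # RMSE, MAE, RAE, and RRSE; RAE and RRSE are normalized against a
    # baseline predictor that always outputs the mean of the true values.
    err = y_pred - y_true
    base = y_true.mean() - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    rae = np.abs(err).sum() / np.abs(base).sum()
    rrse = np.sqrt((err ** 2).sum() / (base ** 2).sum())
    return rmse, mae, rae, rrse

def accuracy_and_kappa(y_true, y_pred, n_classes):
    # ACC is the fraction of correct predictions; Cohen's kappa measures
    # agreement beyond chance via the confusion-matrix marginals.
    acc = np.mean(y_true == y_pred)
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    po = np.trace(cm) / cm.sum()
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    return acc, (po - pe) / (1 - pe)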

4.5. Results

Table 5 and Figure 3 show the average performance of each algorithm on all datasets. The detailed results are listed in Tables 6–11, which present the accuracy (ACC), kappa statistic (k), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and root relative squared error (RRSE), with their standard deviations, for each model on each dataset.

Then, we used the Friedman and ANOVA tests to compare the above 6 performance metrics of all models across the 263 datasets; the tests showed that the algorithms differ in both distribution and mean value at the 95% confidence level (see Table 5). The Tukey HSD test was then employed to determine which pairs differ significantly; the detailed results are presented in Tables 12–17, where an underlined result means the corresponding pair is significantly different.

Some findings about RN are summarized briefly as follows:
(1) Figure 3 shows that the performance of RN is not significantly different from that of the other well-established classification algorithms.
(2) On average accuracy, the most important performance indicator, Table 5 shows that RN (73.56%) significantly outperforms OneR (67.16%) and is not statistically different from the other algorithms (Figure 4(a)).
(3) On the kappa statistic, RN differs from ANN1, ANN2, J48, and KNN (Figure 4(b)); however, all algorithms (except RN) range from 0.41 to 0.60 and are therefore at the same level: moderate.
(4) On root mean squared error (RMSE) and root relative squared error (RRSE), RN outperforms OneR and lags behind ANN1, ANN2, and J48 (Figures 4(c) and 4(f)).
(5) On mean absolute error (MAE) and relative absolute error (RAE), RN is weaker than the others (Figures 4(d) and 4(e)).
(6) With regard to running time, RN runs faster than ANN1, ANN2, SVM1, SVM2, and SVM3, but slower than the remaining algorithms (Table 5).

In short, though the proposed algorithm is simple and straightforward, it shows performance competitive with the above well-established classification algorithms on the core indicators.

5. Discussion

5.1. RN vs Ensemble Algorithms

As is well known, there are many powerful ensemble algorithms in the field of classification, such as random forest and random tree. However, ensemble methods combine multiple learning algorithms to achieve better predictive performance than any of the constituent algorithms alone. Therefore, it is not fair to compare ensemble methods with the recall network, a simple and basic algorithm with a single classifier.

5.2. RN vs LSTM

In this paper, ANN shows the best performance on the classification task; long short-term memory (LSTM), a next-generation recurrent neural network, performs even better than traditional neural networks, let alone the RN algorithm. Though both LSTM and RN operate on memory-based mechanisms, the former employs far more nodes and layers, has a much more complex structure, and therefore takes much longer to run, so a comparison with LSTM is not considered in this work.

5.3. Statistic Tests

In this paper, we employ three statistical methods. The Friedman test is used to detect differences in the distributions of the compared algorithms' results on the six evaluation indicators, while ANOVA is used to detect differences in their means. We then use the Tukey HSD test to find which pairs have significant mean differences.

6. Conclusions

In this paper, a simple and effective classification algorithm is designed as a memory network that stores, marks, and then retrieves previous paths. Though its structure is similar to that of artificial neural networks, and its memory mechanism is similar to those of memory networks, hierarchical temporal memory, and ant colony optimization, the proposed algorithm is still a naive and distinctive algorithm and is more interpretable than the others.

To investigate the capabilities of this newly developed method in dataset classification, the RN was compared with other classic approaches on benchmark problems in Weka. The experiments show that this simple algorithm has no statistically significant difference from ANN, J48, KNN, and SVM in accuracy, and all compared algorithms are at the same level on the kappa statistic except OneR, though RN performs poorly on RMSE, MAE, RAE, and RRSE.

There are several possible extensions to this work. On the one hand, the classification performance of RN can be improved further: since RN is sensitive to the order of layers, higher performance could be achieved by exhaustively exploring all possible RN structures and then choosing the best one for prediction; moreover, ensemble-learning techniques such as bagging and boosting are also worth applying to recall networks. On the other hand, RN can also be applied in other fields; for example, clustering based on RN could use the Hamming distance to partition the routes into small groups.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (CUG090109).