Abstract

The latest developments in neuroscience have deepened our understanding of the information-processing mechanisms of the human brain and inspired several sophisticated computational methods, such as deep learning, memory networks, and hierarchical temporal memory. However, exploring simpler models remains a challenge because of the high computational cost of the above-mentioned methods. This paper proposes the recall network (RN), an intuitive and simple model that initializes itself by constructing network paths derived from the correlations of features in the training dataset and then makes classification decisions by recalling the paths that are relevant to the features in the test set. The algorithm has been applied to 263 datasets from the UCI Machine Learning Repository, and the classification results of repeated 10-fold cross-validation experiments in Weka demonstrate performance competitive with well-established classification algorithms such as ANN, J48, and KNN.

1. Introduction

Brain-inspired algorithms, like artificial neural networks, have shown great success in solving problems across many fields for years. Recently, the development of brain science and neuroscience has deepened the understanding of the brain's information-processing mechanisms and has therefore inspired a new generation of computational models. For example, deep learning surpasses traditional artificial neural networks (ANNs) in both complexity and capability [1–3], and in 2015, memory networks (MemNN) introduced a long-term memory component [4, 5]; meanwhile, hierarchical temporal memory (HTM) builds a tree-shaped hierarchy of levels in which each higher level receives input from the results of the level below it [6, 7]. Hu et al. suggested that reconstructing cortical networks with dynamic activities, instead of using only artificial computing networks, is a feasible approach [8].

These algorithms tend to grow increasingly sophisticated. However, designing simpler models is still worth exploring because of the high computational cost of the above-mentioned methods. This work proposes a straightforward model called the recall network (RN), a memory network that stores, marks, and then retrieves previous paths. An RN consists of nodes and edges (paths), where a node represents an attribute (feature) value, and an edge connects two nodes and records the number of samples that have both attribute values. For each sample, connecting its attribute values consecutively forms an end-to-end route. During the prediction phase, the network determines the sample's class by a plurality vote of the edges along the route. For example, if a route consists of three edges voting 2 YES and 1 NO, the result will be YES.

This paper investigates the use of the recall network in the field of classification. The objective of the study is to unveil the capabilities of this newly developed method on benchmark classification problems. The performance of the proposed algorithm is evaluated against other classic approaches in Weka [9], a well-established machine learning platform that enables fair comparisons of algorithm performance. The results verify the promising performance of the proposed algorithm.

The remaining sections of the paper are organized as follows. The second section introduces the proposed algorithm with particular emphasis on its application in the classification field. In the third section, the concept of RN is compared with other similar algorithms. Classification experiments demonstrating the accuracy of the proposed algorithm are covered in the fourth section. The fifth section discusses the differences between RN and some powerful classification methods. A brief conclusion of the study is given in the last section.

2. Structure and Algorithms

2.1. Structure of a Recall Network

A recall network is a group of connected layers, where every layer consists of similar nodes, each representing a value of a certain attribute (see Figure 1). Each connection or edge, expressed as a set, stores the relations between its two nodes recorded from previous samples; the output value of the edge for that pair of nodes is later determined by the plurality of those records.

RN can be defined as an undirected graph G = (N, E), consisting of the set N of nodes, where nodes representing the same attribute are aligned in one layer, and the set E of edges, where each edge connects two nodes from adjacent layers.

2.2. Mechanism of Recall Network

Each training sample, called an instance in Weka, is represented by a route from a node in the first layer to one in the last layer. Routes are piled up during training, and during the prediction phase, each test sample, also represented as a route, is voted on by every edge whose value was accumulated in the training stage.

The above mechanism can be expressed as

Class = Mode(e1, e2, e3, …, en),

where e1, e2, e3, …, en are the edges of the instance route from the 1st layer to the last one (n) and the Mode function returns the mode of the edge set. In the same way, each edge ei takes the mode of the parallel connections ei1, ei2, ei3, …, eij accumulated between its adjacent nodes.
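To make this two-level vote concrete, the following minimal Python sketch (our own illustration with hypothetical helper names, not the released implementation) computes the class of one route from the per-edge class-count vectors accumulated during training:

from collections import Counter

def edge_vote(class_counts):
    # Mode of the marks stored on one edge: the class with the largest
    # count wins; on a tie, the lowest class index is kept.
    return max(range(len(class_counts)), key=lambda c: class_counts[c])

def route_class(route_edges):
    # Class = Mode(e1, e2, ..., en): the mode over the per-edge votes.
    votes = [edge_vote(counts) for counts in route_edges]
    return Counter(votes).most_common(1)[0][0]

# Three edges voting YES(0), YES(0), NO(1): the route is classified YES.
print(route_class([[2, 1], [3, 0], [0, 4]]))  # prints 0

Note that Algorithm 3 below instead sums the class-count vectors along the route and takes the plurality of the total; both are variants of the same voting idea.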

The algorithm is simple to understand and cheap to implement, requiring only three easy steps: building an RN from instances (see Algorithm 1), training it on known data (see Algorithm 2), and using it to classify unknown samples (see Algorithm 3).

Input: number of attributes (N), number of categories for each discrete attribute, range of values for each continuous attribute
Output: a network with N layers, where each layer has one node per interval (category) of the corresponding attribute
Initialization of the Recall Network;
for i ⟵ 1 to the number of attributes − 1 do
  for m ⟵ 1 to the number of intervals of the i-th attribute do
   for n ⟵ 1 to the number of intervals of the (i + 1)-th attribute do
    Create node (i, m) and node (i + 1, n) if they do not already exist;
    Create an edge between them;
    Initialize the classID vector of the edge;
   end
  end
end
Input: training data
Output: the network weighted by all training examples
Training the Recall Network;
for i ⟵ 1 to the number of examples do
  classIndex = the class index of the i-th example;
  for j ⟵ 1 to the number of attributes − 1 do
    RowNoj = the row number of the value in the j-th layer;
    RowNoj+1 = the row number of the value in the (j + 1)-th layer;
    e = the edge between Node (j, RowNoj) and Node (j + 1, RowNoj+1);
    e.ClassVector[classIndex] += 1;
  end
end
Input: test data
Output: the class of the data
Determining the category by votes from all edges;
for i ⟵ 1 to the number of attributes − 1 do
  RowNoi = the row number of the value in the i-th layer;
  RowNoi+1 = the row number of the value in the (i + 1)-th layer;
  e = the edge between Node (i, RowNoi) and Node (i + 1, RowNoi+1);
  ResultVector += e.ClassVector;
end
return the class with the largest count (the mode) in ResultVector
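For readers who prefer executable code, the following Python sketch renders Algorithms 1–3 under our reading of the pseudocode; the class name RecallNetwork and the lazy creation of edges are our own choices, and this is not the authors' released (Weka-oriented) implementation:

from collections import defaultdict

class RecallNetwork:
    # One layer per attribute, one node per attribute value (interval),
    # and a class-count vector on every edge between adjacent layers.

    def __init__(self, n_classes):
        self.n_classes = n_classes
        # Edge store: (layer j, value in layer j, value in layer j + 1)
        # -> class-count vector. Edges are created lazily on first use
        # rather than pre-enumerated as in Algorithm 1; the resulting
        # network is the same.
        self.edges = defaultdict(lambda: [0] * n_classes)

    def train(self, instances, labels):
        # Algorithm 2: each instance is a route; increment the counter
        # of its class on every edge along the route.
        for x, y in zip(instances, labels):
            for j in range(len(x) - 1):
                self.edges[(j, x[j], x[j + 1])][y] += 1

    def predict(self, x):
        # Algorithm 3: sum the class-count vectors of the edges along
        # the route and return the plurality class.
        result = [0] * self.n_classes
        for j in range(len(x) - 1):
            counts = self.edges.get((j, x[j], x[j + 1]))
            if counts is not None:  # edges unseen in training cast no vote
                result = [r + c for r, c in zip(result, counts)]
        # On a tie, max() keeps the lowest class index, a simplification
        # of the "first edge in the edge set" rule of Section 2.3.
        return max(range(self.n_classes), key=lambda c: result[c])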
2.3. A Classification Example

To illustrate the proposed algorithm, an example is shown in Table 1. In this example, our algorithm aims to predict whether a given weather condition is suitable for playing tennis.

Figure 2 shows how an RN is created based on Table 1. Here, the RN consists of 4 layers (attributes): Outlook, Temperature, Humidity, and Wind. Each node represents a possible value of a weather attribute, and nodes that share the same attribute are arranged in a column. Each edge connects two nodes from adjacent columns. If a given day (a row in Table 1) is suitable for playing tennis (PlayTennis = YES), we add one blue edge between each adjacent pair of nodes representing the values of the weather attributes on that day. If a given day is not suitable for playing tennis (PlayTennis = NO), we add red edges by the same rule.

To predict a new sample with the weather condition Rain, Mild, High, Strong, we first identify the set of 4 nodes that represent that condition in Figure 2. Then, we find all the edges (Rain-Mild, Mild-High, High-Strong) that connect each pair of adjacent nodes within that set. Finally, we group those edges by color and count them in Figure 2. In this example, we get 3 red edges and 5 blue edges; therefore, the sample is classified as PlayTennis = YES (5 blue beats 3 red). If the numbers of red and blue edges were equal, the system would choose the color of the first edge in the edge set.
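With the sketch given after Algorithm 3, this worked example could be reproduced as follows (rows and labels stand for the full contents of Table 1, which are not repeated here; class 0 = YES/blue, class 1 = NO/red):

rn = RecallNetwork(n_classes=2)
rn.train(rows, labels)  # rows: tuples (Outlook, Temperature, Humidity, Wind)
print(rn.predict(("Rain", "Mild", "High", "Strong")))  # expected: 0 (YES)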

2.4. Appropriate Problems

By design, the proposed approach naturally handles discrete and categorical values, and it tolerates datasets with missing or erroneous values.

For continuous values, the algorithm divides the data range into a number of intervals, each of which is regarded as a discrete value. When a value is missing, the corresponding node is skipped.
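As an illustration, a continuous attribute could be mapped to interval nodes as in the sketch below; the equal-width scheme, the interval count, and the None marker for missing values are our assumptions, since the paper does not fix these details:

def to_interval(value, lo, hi, n_intervals=5):
    # Equal-width binning: map a continuous value in [lo, hi] to one of
    # n_intervals discrete node identifiers. None marks a missing value,
    # so that the edges touching this node are skipped during training
    # and prediction.
    if value is None:
        return None
    width = (hi - lo) / n_intervals
    return min(int((value - lo) / width), n_intervals - 1)

print(to_interval(7.2, lo=0.0, hi=10.0))  # interval 3 of 5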

3. Conceptual Comparison of RN with Similar Algorithms

Though RN is a novel algorithm, its structure resembles that of the artificial neural network (ANN), and its memory mechanism resembles the pheromone in ant colony optimization (ACO); nevertheless, it remains a simple, intuitive, and highly readable brain-inspired algorithm.

3.1. Compared with the ANN

The RN shares the same biological motivation and a similar network structure with ANN; however, there are three main differences.
(1) The meaning of nodes is entirely different: a node in an ANN represents a weighted summation of inputs, whereas in an RN it serves only as an identifier.
(2) The structure of an ANN is flexible, whereas that of an RN is not: the number of RN layers is fixed by the number of attributes, and the number of nodes in a layer must equal the number of intervals of the corresponding attribute.
(3) Though ANN is an effective algorithm in many fields, it is a "black box," while RN is more explainable.

3.2. Compared with the MemNN

MemNN, strictly speaking, is more a system consisting of a set of memory and inference modules than an independent algorithm. Its idea of retrieving the most relevant memory is similar to RN's; however, RN is much simpler: it does not compress or transform the input or use complex functions to score memories. Instead, RN stores the input as it is and then returns a vote based on the majority of marks of the same kind.

3.3. Compared with the HTM

Like HTM, RN simulates the mechanism of the cerebral cortex and memorizes patterns for solving problems, but RN has only one "level" and therefore spares the transformation process from lower levels to higher levels; moreover, RN uses a very primitive method, voting, to identify patterns.

3.4. Compared with the ACO

RN is similar to ACO in the idea of the "pheromone." Whenever a sample passes through a network route, each segment (edge) of the route receives a permanent mark; unlike in ACO, such a mark never decays over time. We argue that RN better matches the biological characteristics of the brain: memories learned by the brain rarely disappear completely, and they can be recalled when triggered by a similar event, even after a long period.

3.5. Comparison of the Mechanism, Computing Complexity, and Structure

To distinguish RN further, we analyze it in terms of mechanism, time complexity, space complexity, and structure (Table 2).

In summary, RN is a distinctive brain-inspired algorithm. Although its network structure is similar to that of ANN, and its memory mechanism is similar to those of MemNN, HTM, and ACO, RN is the simplest of them all.

4. Experiments and Results

4.1. Testing Tool

To evaluate the performance of the proposed algorithm, we chose Weka, a well-established machine learning platform that allows a fair comparison among different algorithms. Weka is one of the most popular tools for studying machine learning algorithms, and many researchers use it for classification problems. For example, Arora and Suman tested J48 and MLP on 5 UCI datasets [18]; Kiranmai and Laxmi classified power quality problems using Weka and studied the effect of attributes on classification accuracy [19]; Mhetre and Nagar adopted Weka to compare four classification algorithms (Naive Bayes, J48, ZeroR, and random tree) on education datasets [20]; Farhat et al. compared the performance of SVM, KNN, NB, logistic regression, decision tree, and random forest on intrusion detection data [21]; and Villavicencio et al. studied the J48 decision tree, random forest, SVM, KNN, and NB on the COVID-19 Symptoms and Presence dataset from Kaggle [22].

The above research shows that SVM, KNN, J48, NB, and OneR are competitive methods in the classification field. However, those works focus on limited fields or limited datasets. In this paper, we use all available data to study the performance comparison.

4.2. Testing Data

The test problems comprise 263 datasets from the UCI Machine Learning Repository. Although the repository offers 466 datasets in total, 121 are too big for Weka, 20 are missing, 53 are for natural language processing, and 9 are images. Nevertheless, the remaining 263 datasets represent a wide range of domains and data characteristics; a detailed description is listed in Table 3. In short, all usable data in the UCI Repository are used for testing.

4.3. Method

The RN implementation is published at https://github.com/tianzhaoning/RRN/releases/download/RecallNetwork/RN.zip; we loaded it into Weka on a local machine to compare it with the other algorithms. In all experiments, 10 runs of 10-fold cross-validation are used to average the results.
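Outside Weka, the same protocol can be approximated with scikit-learn, as in the sketch below; RNClassifier is a hypothetical sklearn-compatible wrapper around the recall network, and X, y denote an already-loaded dataset:

from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# 10 runs of 10-fold cross-validation, as in the experiments.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)
scores = cross_val_score(RNClassifier(), X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())  # averaged over the 100 folds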

To illustrate the performance of the proposed method, we chose several competitive classification algorithms, namely, Naive Bayes (NB), support vector machine (SVM), decision tree (J48), K-nearest neighbors (KNN), artificial neural network (ANN), and OneR (included because it is also a very simple decision-tree-style algorithm), and compared them on a large collection of benchmark datasets.

All algorithms use the default parameters supplied by Weka to ensure a fair comparison. However, we evaluated two additional parameter settings for SVM and one for ANN beyond the defaults to explore their effect under different configurations.

For SVM, we took the default configuration as SVM1 and varied the following parameters for the other settings: degree, gamma, cachesize, cost, eps, and the epsilon of the loss function in ε-SVR. For ANN, we took the default as ANN1 and changed the learning rate and the momentum. The specific hyperparameter values are shown in Table 4.

In the statistical analysis, this paper adopts the Friedman test, a non-parametric, rank-based method for detecting significant differences among multiple population distributions, together with ANOVA to test for significant mean differences among multiple groups. If both tests are significant, we then use the Tukey HSD test to make pairwise comparisons.
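This pipeline can be sketched with SciPy as follows, assuming a 263 × k matrix acc that holds one metric (e.g., accuracy) per dataset and algorithm; the input file name is hypothetical:

import numpy as np
from scipy import stats

# acc[i, j]: the metric value of algorithm j on dataset i.
acc = np.loadtxt("accuracy_matrix.csv", delimiter=",")

# Friedman test: rank-based test for differences among distributions.
print(stats.friedmanchisquare(*acc.T))
# One-way ANOVA: test for differences among means.
print(stats.f_oneway(*acc.T))
# If both are significant, Tukey's HSD locates the differing pairs.
print(stats.tukey_hsd(*acc.T))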

4.4. Performance Metrics

We evaluate the classification performance with the following indicators [23, 24]:
(1) Accuracy (ACC): the number of correctly predicted items divided by the total number of items to predict. It is the most important metric for measuring classification performance. In this work, we use the mean ACC, the average of the accuracies over all datasets, which represents the average performance over all tasks.
(2) Kappa statistic (k): a measure of how much better the classification results are compared to labels assigned by random chance. In terms of degree of agreement, k ≤ 0 is interpreted as no agreement, 0.01–0.20 as none to slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement.
(3) Root mean squared error (RMSE): the standard deviation of the residuals (prediction errors); another indicator of model accuracy.
(4) Mean absolute error (MAE): the average absolute difference between the predicted and true values. MAE avoids the problem of error cancellation, so it reflects the actual magnitude of the prediction error.
(5) Relative absolute error (RAE): the sum of the absolute forecast errors divided by the sum of the absolute errors of a mean predictor. RAE weights each error equally; the better the model, the closer its RAE is to 0.
(6) Root relative squared error (RRSE): the square root of the relative squared error (RSE), where RSE is the sum of squared errors of a predictive model normalized by the sum of squared errors of a mean predictor. It indicates how well a model performs relative to simply predicting the average of the true values.
(7) Running time: the execution time of an algorithm on the data, an intuitive measurement of time complexity in a real environment. In this paper, we report total time rather than running time per dataset, since Weka does not provide detailed timing for each experiment.

Generally, 10 runs of 10-fold cross-validation are used to average the above results [25], with the exception of running time, which is reported as the total time.
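For reference, these metrics can be computed as in the following NumPy sketch, using the standard textbook definitions (Weka evaluates its error metrics on class probability estimates, so its exact values may differ):

import numpy as np

def error_metrics(y_true, y_pred):
    # RMSE, MAE, RAE, and RRSE; RAE and RRSE are normalized against a
    # baseline predictor that always outputs the mean of the true values.
    err = y_pred - y_true
    base = y_true.mean() - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    rae = np.abs(err).sum() / np.abs(base).sum()
    rrse = np.sqrt((err ** 2).sum() / (base ** 2).sum())
    return rmse, mae, rae, rrse

def accuracy_and_kappa(y_true, y_pred, n_classes):
    # ACC is the fraction of correct predictions; Cohen's kappa measures
    # agreement beyond chance via the confusion-matrix marginals.
    acc = np.mean(y_true == y_pred)
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    po = np.trace(cm) / cm.sum()
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    return acc, (po - pe) / (1 - pe)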

4.5. Results

Table 5 and Figure 3 show the average performance of each algorithm on all datasets. The detailed results are listed in Tables 6–11, which present the accuracy (ACC), kappa statistic (k), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and root relative squared error (RRSE), with their standard deviations, for each model on each dataset.

Then, we used the Friedman and ANOVA tests to compare the above 6 performance metrics of all models across the 263 datasets; the tests showed that the algorithms differ in both distribution and mean value at the 95% confidence level (see Table 5). The Tukey HSD test was then employed to determine which pairs differ significantly; the detailed results are presented in Tables 12–17, where an underlined result means the corresponding pair is significantly different.

Some findings about RN are summarized briefly as follows:
(1) Figure 3 shows that the performance of RN is not significantly different from that of the other well-established classification algorithms.
(2) On average accuracy, the most important performance indicator, Table 5 shows that RN (73.56%) significantly outperforms OneR (67.16%) and is not statistically different from the other algorithms (Figure 4(a)).
(3) On the kappa statistic, RN differs from ANN1, ANN2, J48, and KNN (Figure 4(b)); however, all algorithms (except RN) range from 0.41 to 0.60 and are therefore at the same level: moderate.
(4) On root mean squared error (RMSE) and root relative squared error (RRSE), RN outperforms OneR and lags behind ANN1, ANN2, and J48 (Figures 4(c) and 4(f)).
(5) On mean absolute error (MAE) and relative absolute error (RAE), RN is weaker than the others (Figures 4(d) and 4(e)).
(6) With regard to running time, RN runs faster than ANN1, ANN2, SVM1, SVM2, and SVM3, but slower than the remaining algorithms (Table 5).

In short, though the proposed algorithm is simple and straightforward, it shows performance competitive with the above well-established classification algorithms on the core indicators.

5. Discussion

5.1. RN vs Ensemble Algorithms

As is well known, there are many powerful ensemble algorithms in the field of classification, such as random forest and random tree. However, ensemble methods combine multiple learning algorithms to achieve better predictive performance than any of the constituent algorithms alone. Therefore, it is not fair to compare ensemble methods with the recall network, a simple and basic algorithm with a single classifier.

5.2. RN vs LSTM

In this paper, ANN shows the best performance on the classification task; long short-term memory (LSTM), a next-generation recurrent neural network, performs even better than traditional neural networks, let alone the RN algorithm. Though both LSTM and RN operate on memory-based mechanisms, the former employs far more nodes and layers, has a much more complex structure, and therefore takes much longer to run, so a comparison with LSTM is not considered in this work.

5.3. Statistic Tests

In this paper, we employ three statistical methods. The Friedman test is used to detect differences in the distributions of the compared algorithms' results on the six evaluation indicators, while ANOVA is used to detect differences in their means. We then use the Tukey HSD test to find which pairs have significant mean differences.

6. Conclusions

In this paper, a simple and effective classification algorithm is designed as a memory network that stores, marks, and then retrieves previous paths. Though its structure is similar to that of artificial neural networks, and its memory mechanism is similar to those of memory networks, hierarchical temporal memory, and ant colony optimization, the proposed algorithm is still a naive and distinctive algorithm and is more interpretable than the others.

To investigate the capabilities of this newly developed method in dataset classification, the RN was compared with other classic approaches on benchmark problems in Weka. The experiments show that this simple algorithm has no statistically significant difference from ANN, J48, KNN, and SVM in accuracy, and all compared algorithms are at the same level on the kappa statistic except OneR, though RN performs poorly on RMSE, MAE, RAE, and RRSE.

There are several possible extensions to this work. On the one hand, the classification performance of RN can be improved further: since RN is sensitive to the order of layers, higher performance could be achieved by exhaustively exploring all possible RN structures and then choosing the best one for prediction; moreover, ensemble-learning techniques such as bagging and boosting are also worth applying to recall networks. On the other hand, RN can also be applied in other fields; for example, clustering based on RN could use the Hamming distance to partition the routes into small groups.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (CUG090109).