Classification of Cancer Recurrence with Alpha-Beta BAM
Bidirectional Associative Memories (BAMs) based on the first model proposed by Kosko do not have perfect recall of the training set, and their algorithm must iterate until it reaches a stable state. In this work, we use the Alpha-Beta BAM model to automatically classify cancer recurrence in female patients who have previously undergone breast cancer surgery. Alpha-Beta BAM presents perfect recall of all the training patterns and has a one-shot algorithm; these advantages make Alpha-Beta BAM a suitable tool for classification. We use data from the Haberman database, and the leave-one-out method was applied to analyze the performance of our model as a classifier. We obtain a classification percentage of 99.98%.
1. Introduction
Breast cancer is a prevalent disease worldwide and a cause of death among women. Women who have suffered from breast cancer and have overcome it are at risk of a relapse; therefore, they have to be monitored after the tumor has been removed.
The prediction of recurrent cancer in women with previous surgery carries high monetary and social costs; as a result, many researchers in Artificial Intelligence (AI) have been attracted to this problem and have applied a variety of AI tools to breast cancer prediction. Some of these works are described below.
Many AI methods have shown better results than those obtained by experimental methods; for example, in 1997 Burke et al. compared the accuracy of the TNM staging system with the accuracy of a multilayer backpropagation Artificial Neural Network (ANN) for predicting the 5-year survival of patients with breast carcinoma. The ANN increased the prediction capacity by 10%, obtaining a final result of 54%. They used the following parameters: tumor size, number of positive regional lymph nodes, and distant metastasis.
Domingos  used a breast cancer database from UCI repository for classifying survival of patients using the unification of two widely used empirical approaches: rule induction and instance-based learning.
In 2000, Boros et al. used the Logical Analysis of Data method to predict the nature of the tumor: malignant or benign. The Breast Cancer (Wisconsin) database was used, and the classification capacity was 97.2%. This database was also used by Street and Kim, who combined several classifiers to create a large-scale classifier, and by Wang and Witten, who presented a general modeling method for optimal probability prediction over future observations and obtained a classification rate of 96.7%.
K. Huang et al. constructed a classifier with the Minimax Probability Machine (MPM), which provides a worst-case bound on the probability of misclassification of future data points based on reliable estimates of the means and covariance matrices of the classes from the training data points. They used the same database as Domingos. The classification capacity was 82.5%.
In other types of breast cancer diagnosis, C.-L. Huang et al.  employed the Support Vector Machine method to predict a breast tumor from the information of five DNA viruses.
In the last two decades, the impact of breast cancer in Mexico has increased. Every year 3500 women die of breast cancer, which has become the first cause of death and the second most frequent type of tumor. Therefore, we applied associative models to classify cancer recurrence.
The area of Associative Memories, as a relevant part of Computing Sciences, has acquired great importance and dynamism in the activity of international research teams, specifically those working on the theory and applications of pattern recognition and image processing. Classification is a specific task of pattern recognition, because its main goal is to recognize some features of patterns and assign these patterns to the corresponding class.
Associative Memories have been developed in parallel with Neural Networks, from the first model of the artificial neuron to neural network models based on modern concepts such as mathematical morphology, passing through the important works of the pioneers of perceptron-based neural networks [11–13].
In 1982 Hopfield presented his associative memory; this model is inspired by physical concepts and is characterized by an iterative algorithm. This work has great relevance because Hopfield proved that interactions of simple processing elements similar to neurons give rise to collective computational properties, such as memory stability.
However, the Hopfield model has two disadvantages: first, the associative memory shows a low recall capacity, $0.15n$, where $n$ is the dimension of the stored patterns; second, the Hopfield memory is autoassociative, which means that it is not able to associate different patterns.
In 1988, Kosko developed a heteroassociative memory from two Hopfield memories to overcome the second disadvantage of the Hopfield model. The Bidirectional Associative Memory (BAM) is based on an iterative algorithm, as is the Hopfield model. Many later models were based on this algorithm and replaced the original learning rule with an exponential rule [16–18]; other models used a multiple training method and dummy addition so that more pairs of patterns become stable states while spurious states are eliminated. Linear programming techniques, gradient descent methods [21, 22], genetic algorithms, and delayed BAMs [24, 25] have been used for the same purpose. There are other models that are not based on Kosko's, so they are not iterative and have no stability problems: the Morphological and Feedforward BAMs. All these models appeared in order to overcome the low recall capacity shown by the first BAM; however, none of them has been able to recover all training patterns. Besides, these models require the patterns to satisfy certain conditions involving Hamming distance, orthogonality, linear independence, and linear programming solutions, among others.
The bidirectional associative memory model used in this work is based on Alpha-Beta Associative Memories; it is not an iterative process and does not have stability problems. The recall capacity of Alpha-Beta BAM is maximal: $2^{\min(n,m)}$, where $n$ and $m$ are the dimensions of the input and output patterns, respectively. This model always shows perfect recall without any condition, and this perfect recall has a mathematical foundation. It has been demonstrated that this model has a complexity of $O(n^2)$ (see Section 2.4). Its main application is pattern recognition, and it has been applied as a translator and as a fingerprint identifier.
Because Alpha-Beta BAM shows perfect recall, it is used as a classifier in this work. We used the Haberman database, which contains data from cancer recurrence patients, because it has been included in several works to test other classification methods such as Support Vector Machines (SVMs) combined with Cholesky Factorization, Distance Geometry, the Bagging technique, Model-Averaging with Discrete Bayesian Networks, the in-group and out-group concept, and fuzzy ARTMAP neural networks. Alpha-Beta BAM aims to surpass the previous results; note that none of the aforementioned works has used associative models for classification.
In Section 2 we present basic concepts of associative models along with the description of Alpha-Beta associative memories, Alpha-Beta BAM, and its complexity. Experiments and results are shown in Section 3, along with the analysis of our proposal with the leave-one-out method.
2. Alpha-Beta Bidirectional Associative Memories
In this section Alpha-Beta Bidirectional Associative Memory is presented. However, since it is based on the Alpha-Beta autoassociative memories, a summary of this model will be given before presenting our model of BAM.
2.1. Basic Concepts
Basic concepts about associative memories were established three decades ago in [38–40]; nonetheless, here we use the concepts, results, and notation introduced in . An associative memory $M$ is a system that relates input patterns and output patterns as follows: $x \to M \to y$, with $x$ and $y$ being the input and output pattern vectors, respectively. Each input vector forms an association with a corresponding output vector. For a positive integer $k$, the corresponding association will be denoted as $(x^k, y^k)$. The associative memory $M$ is represented by a matrix whose $ij$th component is $m_{ij}$. Memory $M$ is generated from an a priori finite set of known associations, known as the fundamental set of associations.
If $\mu$ is an index, the fundamental set is represented as $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$, with $p$ being the cardinality of the set. The patterns that form the fundamental set are called fundamental patterns. If it holds that $x^\mu = y^\mu$ for all $\mu \in \{1, 2, \ldots, p\}$, $M$ is autoassociative; otherwise it is heteroassociative, and in this case it is possible to establish that $\exists \mu \in \{1, 2, \ldots, p\}$ for which $x^\mu \neq y^\mu$. A distorted version of a pattern $x^k$ to be recovered will be denoted as $\tilde{x}^k$. If, when feeding a distorted version of $x^\varpi$ with $\varpi \in \{1, 2, \ldots, p\}$ to an associative memory $M$, the output corresponds exactly to the associated pattern $y^\varpi$, we say that recall is perfect.
2.2. Alpha-Beta Associative Memories
Among the variety of associative memory models described in the scientific literature, two deserve emphasis because of their relevance: morphological associative memories, which were introduced by Ritter et al., and Alpha-Beta associative memories. Because of their excellent characteristics, which make them superior in many aspects to other associative memory models, morphological associative memories served as the starting point for the creation and development of the Alpha-Beta associative memory.
The Alpha-Beta associative memories are of two kinds and are able to operate in two different modes. The heart of the mathematical tools used in the Alpha-Beta model is a pair of binary operators designed specifically for these memories: the operator $\alpha$ is used in the learning phase, and the operator $\beta$ is the basis of the pattern recall phase. These operators are defined as follows: first, we define the sets $A = \{0, 1\}$ and $B = \{0, 1, 2\}$, and then the operators $\alpha: A \times A \to B$ and $\beta: B \times A \to A$ are defined in Tables 1 and 2, respectively.
The sets $A$ and $B$, the $\alpha$ and $\beta$ operators, along with the usual $\wedge$ (minimum) and $\vee$ (maximum) operators, form the algebraic system $(A, B, \alpha, \beta, \wedge, \vee)$, which is the mathematical basis for the Alpha-Beta associative memories.
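Since Tables 1 and 2 are not reproduced in this text, the following minimal sketch tabulates the two operators as they are usually given in the Alpha-Beta literature; the table values and the names `ALPHA`, `BETA`, `alpha`, and `beta` are assumptions made for illustration, not a transcription of the original tables.

```python
# Assumed operator tables: alpha maps A x A -> B, beta maps B x A -> A,
# with A = {0, 1} and B = {0, 1, 2}.
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}
BETA = {(0, 0): 0, (0, 1): 0,
        (1, 0): 0, (1, 1): 1,
        (2, 0): 1, (2, 1): 1}


def alpha(x, y):
    """Learning-phase operator: look up alpha(x, y) for x, y in A."""
    return ALPHA[(x, y)]


def beta(x, y):
    """Recall-phase operator: look up beta(x, y) for x in B, y in A."""
    return BETA[(x, y)]
```

With these values, $\beta$ undoes $\alpha$ in its first argument, that is, `beta(alpha(x, y), y) == x` for every `x` and `y` in $A$, which is the property the recall phase relies on.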
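Some characteristics of Alpha-Beta autoassociative memories are listed below.
(1) The fundamental set takes the form $\{(x^\mu, x^\mu) \mid \mu = 1, 2, \ldots, p\}$.
(2) Both input and output fundamental patterns are of the same dimension, denoted by $n$.
(3) The memory is a square matrix, for both modes, $V$ and $\Lambda$. If $x^\mu \in A^n$, then
$$v_{ij} = \bigvee_{\mu=1}^{p} \alpha(x_i^\mu, x_j^\mu), \qquad \lambda_{ij} = \bigwedge_{\mu=1}^{p} \alpha(x_i^\mu, x_j^\mu),$$
and, according to $\alpha: A \times A \to B$, we have that $v_{ij} \in B$ and $\lambda_{ij} \in B$, for all $i \in \{1, 2, \ldots, n\}$ and for all $j \in \{1, 2, \ldots, n\}$.
In the recall phase, when a pattern $x^\mu$ is presented to memories $V$ and $\Lambda$, the $i$th components of the recalled patterns are
$$(V \Delta_\beta x^\mu)_i = \bigwedge_{j=1}^{n} \beta(v_{ij}, x_j^\mu), \qquad (\Lambda \nabla_\beta x^\mu)_i = \bigvee_{j=1}^{n} \beta(\lambda_{ij}, x_j^\mu).$$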
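The learning and recall rules of the autoassociative memories can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes the usual operator values, written here in the closed forms $\alpha(x, y) = x - y + 1$ and $\beta(x, y) = 1$ when $x + y \ge 2$ (0 otherwise), and the function names are hypothetical.

```python
def alpha(x, y):
    # Assumed closed form of the alpha operator on A = {0, 1}.
    return x - y + 1


def beta(x, y):
    # Assumed closed form of the beta operator on B x A.
    return 1 if x + y >= 2 else 0


def learn_autoassociative(patterns):
    """Build the max memory V and the min memory Lambda from binary patterns."""
    n = len(patterns[0])
    V = [[max(alpha(x[i], x[j]) for x in patterns) for j in range(n)]
         for i in range(n)]
    Lam = [[min(alpha(x[i], x[j]) for x in patterns) for j in range(n)]
           for i in range(n)]
    return V, Lam


def recall_max(V, x):
    """Recall with the max memory: minimum over j of beta(v_ij, x_j)."""
    n = len(x)
    return [min(beta(V[i][j], x[j]) for j in range(n)) for i in range(n)]


def recall_min(Lam, x):
    """Recall with the min memory: maximum over j of beta(lambda_ij, x_j)."""
    n = len(x)
    return [max(beta(Lam[i][j], x[j]) for j in range(n)) for i in range(n)]


patterns = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]]
V, Lam = learn_autoassociative(patterns)
```

Presenting each stored pattern to either memory returns that same pattern, which is the perfect recall of the fundamental set that the rest of the model builds on.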
2.3. Alpha-Beta BAM
In general, any bidirectional associative memory model appearing in the current scientific literature can be drawn as Figure 1 shows.
A general BAM is a "black box" operating in the following way: given a pattern $x$, the associated pattern $y$ is obtained, and given the pattern $y$, the associated pattern $x$ is recalled. Besides, if we assume that $\tilde{x}$ and $\tilde{y}$ are noisy versions of $x$ and $y$, respectively, it is expected that the BAM can recover the corresponding noise-free patterns $x$ and $y$.
The model used in this paper has been named Alpha-Beta BAM since Alpha-Beta associative memories, both max and min, play a central role in the model design. However, before going into detail over the processing of an Alpha-Beta BAM, we will define the following.
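In this work we will assume that Alpha-Beta associative memories have a fundamental set denoted by $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$, $x^\mu \in A^n$ and $y^\mu \in A^m$, with $A = \{0, 1\}$, $n \in \mathbb{Z}^+$, $p \in \mathbb{Z}^+$, $m \in \mathbb{Z}^+$, and $1 < p \le \min(2^n, 2^m)$. Also, it holds that all input patterns are different; that is, $x^\mu = x^\xi$ if and only if $\mu = \xi$. If for all $\mu \in \{1, 2, \ldots, p\}$ it holds that $x^\mu = y^\mu$, the Alpha-Beta memory will be autoassociative; if, on the contrary, the former statement does not hold, that is, $\exists \mu \in \{1, 2, \ldots, p\}$ for which $x^\mu \neq y^\mu$, then the Alpha-Beta memory will be heteroassociative.
Definition 2.1 (One-Hot). Let the set $A$ be $A = \{0, 1\}$ and $p \in \mathbb{Z}^+$, $p > 1$, $k \in \mathbb{Z}^+$, such that $1 \le k \le p$. The $k$th one-hot vector of $p$ bits is defined as the vector $h^k \in A^p$ for which it holds that the $k$th component is $h_k^k = 1$ and the rest of the components are $h_j^k = 0$, for all $j \neq k$, $1 \le j \le p$.
Remark 2.2. In this definition, the value $p = 1$ is excluded, since a one-hot vector of dimension 1, given its essence, has no reason to be.
Definition 2.3 (Zero-Hot). Let the set $A$ be $A = \{0, 1\}$ and $p \in \mathbb{Z}^+$, $p > 1$, $k \in \mathbb{Z}^+$, such that $1 \le k \le p$. The $k$th zero-hot vector of $p$ bits is defined as the vector $\bar{h}^k \in A^p$ for which it holds that the $k$th component is $\bar{h}_k^k = 0$ and the rest of the components are $\bar{h}_j^k = 1$, for all $j \neq k$, $1 \le j \le p$.
Remark 2.4. In this definition, the value $p = 1$ is excluded, since a zero-hot vector of dimension 1, given its essence, has no reason to be.
Definition 2.5 (Expansion vectorial transform). Let the set $A$ be $A = \{0, 1\}$ and $n \in \mathbb{Z}^+$, $m \in \mathbb{Z}^+$. Given two arbitrary vectors $x \in A^n$ and $e \in A^m$, the expansion vectorial transform of order $m$, $\tau^e: A^n \to A^{n+m}$, is defined as $\tau^e(x, e) = X \in A^{n+m}$, a vector whose components are $X_i = x_i$ for $1 \le i \le n$ and $X_i = e_{i-n}$ for $n + 1 \le i \le n + m$.
Definition 2.6 (Contraction vectorial transform). Let the set $A$ be $A = \{0, 1\}$ and $n \in \mathbb{Z}^+$, $m \in \mathbb{Z}^+$, such that $1 \le m < n$. Given one arbitrary vector $X \in A^{n+m}$, the contraction vectorial transform of order $m$, $\tau^c: A^{n+m} \to A^m$, is defined as $\tau^c(X, n) = c \in A^m$, a vector whose components are $c_i = X_{i+n}$ for $1 \le i \le m$.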
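Definitions 2.1 to 2.6 translate almost directly into code. The sketch below is only illustrative; the function names are hypothetical, vectors are plain Python lists, and the index $k$ is 1-based as in the definitions.

```python
def one_hot(k, p):
    """k-th one-hot vector of p bits (k is 1-based, p > 1)."""
    if p < 2 or not 1 <= k <= p:
        raise ValueError("need p > 1 and 1 <= k <= p")
    return [1 if j == k else 0 for j in range(1, p + 1)]


def zero_hot(k, p):
    """k-th zero-hot vector of p bits: the negation of the one-hot vector."""
    return [1 - bit for bit in one_hot(k, p)]


def expand(x, e):
    """Expansion transform: append the components of e after those of x."""
    return list(x) + list(e)


def contract(X, n):
    """Contraction transform: drop the first n components of X."""
    return list(X[n:])


x = [1, 0, 1]
X = expand(x, one_hot(2, 4))   # x followed by the 2nd one-hot vector of 4 bits
h = contract(X, len(x))        # recovers the appended vector
```

Expansion followed by contraction with the same $n$ returns the appended vector, which is how the recall phase extracts the one-hot or zero-hot part of a recalled pattern.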
In both directions, the model is made up by two stages, as shown in Figure 2.
For simplicity, we first describe the process in one direction, and later present the complementary direction, which gives bidirectionality to the model (see Figure 3).
The function of Stage 2 is to offer a $y^k$ as output given an $x^k$ as input.
Now we assume that as input to Stage 2 we have one element of a set of $p$ orthonormal vectors. Recall that the Linear Associator has perfect recall when it works with orthonormal vectors. In this work we use a variation of the Linear Associator in order to obtain $y^k$, starting from a one-hot vector $h^k$ whose nonzero entry is in its $k$th coordinate.
For the construction of the modified Linear Associator, its learning phase is skipped and a matrix $M$ representing the memory is built directly. Each column of this matrix corresponds to one output pattern $y^\mu$. In this way, when matrix $M$ is operated with a one-hot vector $h^k$, the corresponding $y^k$ will always be recalled.
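A minimal sketch of this modified Linear Associator is given below. The function names are hypothetical and the output patterns are made up for illustration; the matrix simply stores the output patterns as columns, so multiplying it by a one-hot vector selects one column.

```python
def build_linear_associator(outputs):
    """Return an m x p matrix whose k-th column is the k-th output pattern."""
    m = len(outputs[0])
    return [[y[i] for y in outputs] for i in range(m)]


def recall_output(M, h):
    """Ordinary matrix-vector product M * h; with a one-hot h it picks a column."""
    return [sum(row[k] * h[k] for k in range(len(h))) for row in M]


outputs = [[1, 0], [0, 1], [1, 1]]       # three illustrative output patterns
M = build_linear_associator(outputs)
second = recall_output(M, [0, 1, 0])     # one-hot vector for the 2nd pattern
```

Because the one-hot vectors are orthonormal, no cross-talk term appears in the product, which is why no learning phase is needed for this stage.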
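The task of Stage 1 is the following: given an $x^k$ or a noisy version of it ($\tilde{x}^k$), the one-hot vector $h^k$ must be obtained without ambiguity and with no condition. In its learning phase, Stage 1 has the following algorithm.
(1) For $1 \le k \le p$ do expansion: $X^k = \tau^e(x^k, h^k)$.
(2) For $1 \le i \le n + p$ and $1 \le j \le n + p$: $v_{ij} = \bigvee_{\mu=1}^{p} \alpha(X_i^\mu, X_j^\mu)$.
(3) For $1 \le k \le p$ do expansion: $\bar{X}^k = \tau^e(x^k, \bar{h}^k)$.
(4) For $1 \le i \le n + p$ and $1 \le j \le n + p$: $\lambda_{ij} = \bigwedge_{\mu=1}^{p} \alpha(\bar{X}_i^\mu, \bar{X}_j^\mu)$.
(5) Create the modified Linear Associator: $LAy = [\, y^1 \; y^2 \; \cdots \; y^p \,]$.
The recall phase is described through the following algorithm.
(1) Present, at the input to Stage 1, a vector from the fundamental set $x^\mu \in A^n$ for some index $\mu \in \{1, 2, \ldots, p\}$.
(2) Build vector: $u = \sum_{i=1}^{p} h^i$.
(3) Do expansion: $F = \tau^e(x^\mu, u) \in A^{n+p}$.
(4) Obtain vector: $R = V \Delta_\beta F \in A^{n+p}$.
(5) Do contraction: $r = \tau^c(R, n) \in A^p$. If $r$ is a one-hot vector, it is assured that $k = \mu$, and then $y^\mu = LAy \cdot r$. STOP. Else:
(6) For $1 \le i \le p$: $w_i = u_i - 1$.
(7) Do expansion: $G = \tau^e(x^\mu, w) \in A^{n+p}$.
(8) Obtain a vector: $S = \Lambda \nabla_\beta G \in A^{n+p}$.
(9) Do contraction: $s = \tau^c(S, n) \in A^p$.
(10) If $s$ is a zero-hot vector, then it is assured that $k = \mu$ and $y^\mu = LAy \cdot \bar{s}$, where $\bar{s}$ is the negated vector of $s$. STOP. Else:
(11) Do operation $r \wedge \bar{s}$, where $\wedge$ is the symbol of the logical AND operator, so $y^\mu = LAy \cdot (r \wedge \bar{s})$. STOP.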
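The two stages described above can be sketched end to end in one direction (from $x$ to $y$). This is a hedged illustration rather than the authors' Visual C++ code: it assumes the usual closed forms $\alpha(x, y) = x - y + 1$ and $\beta(x, y) = 1$ when $x + y \ge 2$, the names `learn` and `recall` are hypothetical, and the modified Linear Associator is replaced by a direct lookup of the stored output pattern.

```python
def alpha(x, y):
    return x - y + 1                      # assumed closed form of alpha


def beta(x, y):
    return 1 if x + y >= 2 else 0         # assumed closed form of beta


def learn(xs, ys):
    """Stage 1 learning: build V and Lambda over the expanded input patterns."""
    p, n = len(xs), len(xs[0])
    d = n + p
    # Expand each input with its one-hot (for V) and zero-hot (for Lambda) vector.
    X = [list(x) + [1 if j == k else 0 for j in range(p)]
         for k, x in enumerate(xs)]
    Xb = [list(x) + [0 if j == k else 1 for j in range(p)]
          for k, x in enumerate(xs)]
    V = [[max(alpha(v[i], v[j]) for v in X) for j in range(d)] for i in range(d)]
    Lam = [[min(alpha(v[i], v[j]) for v in Xb) for j in range(d)] for i in range(d)]
    return V, Lam, [list(y) for y in ys]


def recall(model, x):
    """Recall phase: recover the index vector, then look up the output pattern."""
    V, Lam, ys = model
    p, n = len(ys), len(x)
    d = n + p
    F = list(x) + [1] * p                 # expansion with the all-ones vector
    # Only rows n..d-1 are computed, which equals contracting the full result.
    r = [min(beta(V[i][j], F[j]) for j in range(d)) for i in range(n, d)]
    if sum(r) == 1:                       # r is one-hot
        return ys[r.index(1)]
    G = list(x) + [0] * p                 # expansion with the all-zeros vector
    s = [max(beta(Lam[i][j], G[j]) for j in range(d)) for i in range(n, d)]
    s_neg = [1 - bit for bit in s]
    if sum(s_neg) == 1:                   # s is zero-hot
        return ys[s_neg.index(1)]
    t = [a & b for a, b in zip(r, s_neg)]  # logical AND of r and negated s
    return ys[t.index(1)] if sum(t) == 1 else None


xs = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]    # nested patterns exercise every branch
ys = [[0, 1], [1, 0], [1, 1]]
model = learn(xs, ys)
```

The nested inputs are chosen on purpose: the first pattern is resolved by the max memory alone, the third needs the min memory, and the second needs the final AND step. For an input outside the fundamental set the sketch may return `None`, since the index vector is then not guaranteed to be one-hot.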
The process in the contrary direction, which consists of presenting pattern $y^k$ as input to the Alpha-Beta BAM and obtaining its corresponding $x^k$, is very similar to the one described above. The task of Stage 3 is to obtain a one-hot vector $h^k$ given a $y^k$. Stage 4 is a modified Linear Associator built in similar fashion to the one in Stage 2.
2.4. The Alpha-Beta BAM Algorithm Complexity
An algorithm is a finite set of precise instructions for the realization of a calculation or to solve a problem . In general, it is accepted that an algorithm provides a satisfactory solution when it produces a correct answer and is efficient. One measure of efficiency is the time required by the computer in order to solve a problem using a given algorithm. A second measure of efficiency is the amount of memory required to implement the algorithm when the input data are of a given size.
The analysis of the time required to solve a problem of a particular size implies finding the time complexity of the algorithm. The analysis of the memory needed by the computer implies finding the space complexity of the algorithm.
In order to store the $p$ input patterns, a matrix is needed. This matrix will have dimensions $p \times (n + p)$. Input patterns and the added vectors, both one-hot and zero-hot, are stored in the same matrix. Since all components take values in $A = \{0, 1\}$, these values can be represented by character variables, taking 1 byte each. The total amount of bytes will be $p(n + p)$.
A matrix is also needed to store the $p$ output patterns. This matrix will have dimensions $p \times (m + p)$. Output patterns and the added vectors, both one-hot and zero-hot, are stored in the same matrix. Since all components take values in $A = \{0, 1\}$, these values can be represented by character variables, taking 1 byte each. The total amount of bytes will be $p(m + p)$.
During the learning phase, 4 matrices are needed: two for the Alpha-Beta autoassociative memories of type max, $V_x$ and $V_y$, and two more for the Alpha-Beta autoassociative memories of type min, $\Lambda_x$ and $\Lambda_y$. $V_x$ and $\Lambda_x$ have dimensions $(n + p) \times (n + p)$, while $V_y$ and $\Lambda_y$ have dimensions $(m + p) \times (m + p)$. Given that these matrices hold only small nonnegative integers (values in $B = \{0, 1, 2\}$), their components can be represented with character variables of 1 byte each. The total amount of bytes will be $2(n + p)^2$ and $2(m + p)^2$, respectively.
A vector is used to hold the recalled one-hot vector, whose dimension is $p$. Since the components of any one-hot vector take the values 0 and 1, these values can be represented by character variables, occupying 1 byte each. The total amount of bytes will be $p$.
The total amount of bytes required to implement an Alpha-Beta BAM is
$$\text{Total} = p(n + p) + p(m + p) + 2(n + p)^2 + 2(m + p)^2 + p.$$
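As a worked example of this space estimate, the sketch below adds up the storage terms, assuming $p$ stored pairs, input dimension $n$, output dimension $m$, and 1 byte per component (two pattern matrices, four memory matrices, and one index vector). The function name and the sample sizes are hypothetical.

```python
def bam_bytes(n, m, p):
    """Bytes needed by an Alpha-Beta BAM under the 1-byte-per-entry assumption."""
    input_store = p * (n + p)         # expanded input patterns
    output_store = p * (m + p)        # expanded output patterns
    memories_x = 2 * (n + p) ** 2     # max and min memories on the input side
    memories_y = 2 * (m + p) ** 2     # max and min memories on the output side
    index_vector = p                  # recalled one-hot vector
    return input_store + output_store + memories_x + memories_y + index_vector


small = bam_bytes(n=3, m=1, p=2)      # 10 + 6 + 50 + 18 + 2 bytes
```

The quadratic memory terms dominate as soon as $n + p$ or $m + p$ grows, so the number of stored pairs affects the footprint as much as the pattern dimension does.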
The time complexity of an algorithm can be expressed in terms of the number of operations used by the algorithm when the input has a particular size. The operations used to measure time complexity can be integer comparison, integer addition, integer division, variable assignment, logical comparison, or any other elemental operation.
The following is defined: EO: elemental operation; n_pares: number of associated pairs of patterns; $n$: dimension of the patterns plus the addition of the one-hot or zero-hot vectors.
The recalling phase algorithm will be analyzed, since this is the portion of the whole algorithm that requires a greater number of elemental operations.
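for (u = 0; u < n_pares; u++)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            /* t holds alpha(y[u][i], y[u][j]) */
            if (y[u][i] == 0 && y[u][j] == 0) t = 1;
            else if (y[u][i] == 0 && y[u][j] == 1) t = 0;   /* (a) */
            else if (y[u][i] == 1 && y[u][j] == 0) t = 2;   /* (b) */
            else t = 1;
            if (u == 0) Vy[i][j] = t;                        /* first pair */
            else if (Vy[i][j] < t) Vy[i][j] = t;             /* keep the maximum */
        }
The elemental operations are counted as follows.
u = 0: 1 EO, assignment.
u < n_pares: n_pares EO, comparison.
i = 0: n_pares EO, assignment.
i < n: n_pares*n EO, comparison.
j = 0: n_pares*n EO, assignment.
j < n: n_pares*n*n EO, comparison.
First if condition: (a) n_pares*n*n EO, comparison y[u][i]==0; (b) n_pares*n*n EO, logical AND (&&); (c) n_pares*n*n EO, comparison y[u][j]==0.
Assignment to t: there is always an assignment to variable t, n_pares*n*n EO.
else-if conditions (a) and (b): both have the same probability of being executed, n_pares*n*(n/2) EO.
Comparison before the first memory assignment: n_pares*n*n EO.
First memory assignment: counted as done only once, 1 EO.
Comparison before the second memory assignment: (n_pares*n*n)-1 EO.
Second memory assignment: it is executed with probability one half, n_pares*n*(n/2) EO.
j++: n_pares*n*n EO, increment.
i++: n_pares*n EO, increment.
u++: n_pares EO, increment.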
The total number of EOs is $\text{Total} = 1 + n\_pares\,(3 + 3n + 9n^2)$.
From the total of EOs obtained, n_pares is fixed with value 50, resulting in a function that depends only on the size of the patterns: $f(n) = 1 + 50(3 + 3n + 9n^2)$.
In order to analyze the feasibility of the algorithm, we need to understand how fast this function grows as the value of $n$ rises. Therefore, the Big-O notation, shown below, will be used.
Let $f$ and $g$ be functions from a set of integer or real numbers to a set of real numbers. It is said that $f(x)$ is $O(g(x))$ if there exist two constants $C$ and $k$ such that $|f(x)| \le C\,|g(x)|$ whenever $x > k$. The number of elemental operations obtained from our algorithm was $f(n) = 1 + 50(3 + 3n + 9n^2) = 151 + 150n + 450n^2$. A function $g(x)$ and constants $C$ and $k$ must be found such that the inequality holds. We propose
$$151 + 150n + 450n^2 \le 151n^2 + 150n^2 + 450n^2 = 751n^2.$$
Then if $g(n) = n^2$, $C = 751$, and $k = 1$, we have that $|f(n)| \le 751\,|g(n)|$ whenever $n > 1$; therefore, $f(n)$ is $O(n^2)$.
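The bound can be checked numerically. The sketch below assumes the operation count $1 + n\_pares\,(3 + 3n + 9n^2)$ obtained by summing the per-operation counts listed above, with n_pares fixed at 50; the function names are hypothetical.

```python
def total_eo(n, n_pares=50):
    """Elemental operations for patterns of dimension n (assumed count)."""
    return 1 + n_pares * (3 + 3 * n + 9 * n * n)


def bound(n, C=751):
    """Candidate Big-O bound C * g(n) with g(n) = n^2."""
    return C * n * n


# The inequality f(n) <= 751 n^2 is checked for a range of pattern sizes.
holds = all(total_eo(n) <= bound(n) for n in range(1, 10001))
```

For $n = 1$ the two sides coincide (both equal 751), and for larger $n$ the gap widens because the lower-order terms $151 + 150n$ grow more slowly than $301n^2$.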
3. Experiments and Results
The database used in this work to analyze the performance of Alpha-Beta BAM as a classifier was proposed by Haberman and is available in the UCI Machine Learning Repository. This database has 306 instances with 3 attributes: (1) age of the patient at the time of operation, (2) patient's year of operation, and (3) number of positive axillary nodes detected. The database has a survival status (class attribute): (1) the patient survived 5 years or longer and (2) the patient died within five years.
The number of instances was reduced to 287 because some records were duplicated or, in some cases, records were associated with the same class. Of the 287 records, 209 belonged to class 1 and the remaining 78 belonged to class 2.
Alpha-Beta BAM was implemented on a Sony VAIO laptop with a Centrino Duo processor, and the programming language was Visual C++ 6.0.
The leave-one-out method was used to carry out the performance analysis of Alpha-Beta BAM classification. This method operates as follows: a sample is removed from the total set of samples, and the remaining 286 samples are used as the fundamental set; that is, these samples are used to create the BAM. Once the Alpha-Beta BAM had learnt, we proceeded to classify the 286 samples along with the removed sample; this means that we presented to the BAM every sample belonging to the fundamental set as well as the removed sample.
The process was repeated 287 times, which corresponds to the number of records. Alpha-Beta BAM behaved as follows: in 278 trials it classified the excluded sample correctly, and in the remaining 9 trials it did not. It must be emphasized that incorrect classification appears only with the excluded sample, because for all samples belonging to the fundamental set Alpha-Beta BAM shows perfect recall. Therefore, in 278 trials the classification percentage was 100%, and in the remaining 9 it was 99.65%. Averaging over the 287 trials, the classification percentage of Alpha-Beta BAM was 99.98%.
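The arithmetic behind these percentages can be reproduced with a few lines. The sketch below uses only the counts stated in the text (287 trials, 278 correct excluded samples, 9 incorrect); the variable names are hypothetical, and the rate over the excluded samples alone is computed for reference because it is the quantity leave-one-out is most often summarized by.

```python
trials = 287          # one trial per record
correct = 278         # trials where the excluded sample was classified correctly
missed = trials - correct

# In each trial all 287 samples are presented; a missed trial has 286 of 287 right.
rate_ok = 100.0
rate_missed = 100.0 * (trials - 1) / trials

# Average over trials of the per-trial classification percentage.
average = (correct * rate_ok + missed * rate_missed) / trials

# Rate computed over the excluded samples only.
held_out_rate = 100.0 * correct / trials
```

The per-trial average lies between 99.98% and 99.99%, so the reported 99.98% corresponds to truncating it to two decimals. It is high because each trial also counts the 286 samples of the fundamental set, which are always recalled perfectly; the excluded samples alone give a rate of about 96.86%.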
Table 3 compares the results of several classification methods: SVM-Bagging, Model-Averaging, the in-group/out-group method, the fuzzy ARTMAP neural network, and Alpha-Beta BAM. The methods presented in [24, 25] do not report classification results; they only indicate that their algorithms are used to accelerate the performance of the method.
Alpha-Beta BAM exceeds the other methods by 9.98%, and none of these algorithms uses an associative model.
We must mention that the Haberman database has records that are very similar to each other, and this feature could hinder the performance of some BAMs because of their restrictions on the characteristics of the data, for example, Hamming distance or orthogonality. However, Alpha-Beta BAM does not present these kinds of data limitations, as the obtained results show.
4. Conclusions
The use of bidirectional associative memories as classifiers on the Haberman database has not been reported before. In this work we use the Alpha-Beta BAM model to classify cancer recurrence.
Our model presents perfect recall of the fundamental set, in contrast with Kosko-based models or the morphological BAM; this feature makes Alpha-Beta BAM a suitable tool for pattern recognition and, particularly, for classification.
We compared our results with the following methods: SVM-Bagging, Model-Averaging, the in-group/out-group method, and the fuzzy ARTMAP neural network. We found that Alpha-Beta BAM is the best classifier on the Haberman database, because its classification percentage was 99.98% and it exceeds the other methods by 9.98%.
These results show that Alpha-Beta BAM not only has perfect recall but can also correctly classify most of the records that do not belong to the training patterns.
Even though the patterns are very similar to each other, Alpha-Beta BAM was able to recall most of the data, so it performs well as a classifier. Most Kosko-based BAMs have low recall when the patterns do not satisfy conditions on Hamming distance, orthogonality, and linear independence; Alpha-Beta BAM, however, does not impose any restriction on the nature of the data.
The next step in our research is to test Alpha-Beta BAM as a classifier on other databases, such as Breast Cancer (Wisconsin) and Breast Cancer (Yugoslavia), and on standard databases such as Iris Plant or MNIST, so that we can assess the general performance of our model. However, we have to take into account the "no free lunch" theorem, which asserts that an algorithm may be the best for one type of problem and the worst for other types. In our case, the results showed that Alpha-Beta BAM is the best classifier on the Haberman database.
Acknowledgments
The authors would like to thank the Instituto Politécnico Nacional (COFAA and SIP) and SNI for their financial support of this work.
References
P. Domingos, “Unifying instance-based and rule-based induction,” Machine Learning, vol. 24, no. 2, pp. 141–168, 1996.
W. N. Street and Y. Kim, “A streaming ensemble algorithm (SEA) for large-scale classification,” in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 377–382, ACM, San Francisco, Calif, USA, August 2001.
Y. Wang and I. H. Witten, “Modeling for optimal probability prediction,” in Proceedings of the 9th International Conference on Machine Learning (ICML '02), pp. 650–657, July 2002.
K. Huang, H. Yang, and I. King, “Biased minimax probability machine for medical diagnosis,” in Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics (AIM '04), Fort Lauderdale, Fla, USA, January 2004.
O. López-Ríos, E. C. Lazcano-Ponce, V. Tovar-Guzmán, and M. Hernández-Avila, “La epidemia de cáncer de mama en México. Consecuencia de la transición demográfica?” Salud Publica de Mexico, vol. 39, no. 4, pp. 259–265, 1997.
G. X. Ritter and P. Sussner, “An introduction to morphological neural networks,” in Proceedings of the 13th International Conference on Pattern Recognition, vol. 4, pp. 709–717, 1996.
C. Yáñez-Márquez, Associative memories based on order relations and binary operators, Ph.D. thesis, Center for Computing Research, Mexico City, Mexico, 2002.
M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, “Alpha-beta bidirectional associative memories based translator,” International Journal of Computer Science and Network Security, vol. 6, no. 5A, pp. 190–194, 2006.
D. DeCoste, “Anytime query-tuned kernel machines via cholesky factorization,” in Proceedings of the SIAM International Conference on Data Mining (SDM '03), 2003.
D. DeCoste, “Anytime interval-value outputs for kernel machines: fast support vector machine classification via distance geometry,” in Proceedings of the International Conference on Machine Learning (ICML '02), 2002.
D. Dash and G. F. Cooper, Model-Averaging with Discrete Bayesian Network Classifiers, Cambridge, UK.
G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, “Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps,” IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 698–713, 1992.
C. Yáñez-Márquez and J. L. Díaz de León, “Memorias asociativas basadas en relaciones de orden y operaciones binarias,” Computación y Sistemas, vol. 6, no. 4, pp. 300–311, 2003.
K. Rosen, Discrete Mathematics and Its Applications, McGraw-Hill, Estados Unidos, Brazil, 1999.
A. R. Webb, Statistical Pattern Recognition, John Wiley & Sons, West Sussex, UK, 2002.