Abstract

A brain tumor is an abnormal mass or growth of cells that can be fatal, and its diagnosis remains a challenging task in clinical practice. Early and correct diagnosis of this type of cancer is very important for the treatment process. For this reason, this study aimed to develop computer-aided systems for the diagnosis of brain tumors. We propose three end-to-end deep learning approaches to analyze the effects of local and deep features on anomaly detection in brain MRI images. The first is the Directional Bit-Planes Deep Autoencoder (DBP-DAE), which extracts and learns local and directional features. By decomposing a local binary pattern (LBP) into eight bit-planes, the DBP-DAE extracts directional and inherent local-structure features from the input image and learns robust features for classification. The second is a Dilated Separable Residual Convolutional Network (DSRCN), which extracts high-level (deep) and low-level features. The main advantage of this approach is that it is robust, gives stable results regardless of the size of the image database, and mitigates overfitting. To explore the effect of mixing local and deep features on the accuracy of brain anomaly classification, a multibranch convolutional neural network is proposed that combines DBP-DAE and DSRCN in an end-to-end manner. Extensive experiments on public-access brain MRI databases show results that are competitive with state-of-the-art algorithms. In addition, we discuss the effectiveness and applicability of CNNs with a variety of features and architectures for brain abnormalities such as Alzheimer's disease.

1. Introduction

Computer-aided diagnosis (CAD) systems [1] have grown significantly in recent years as tools for diagnosing diseases accurately [2]. Modern artificial intelligence techniques, such as deep learning algorithms in computer vision-based diagnostic tools, are key components of these advances [3]. One of the fundamental challenges for CAD systems is accurate brain disease detection. Accurate and early detection of brain diseases can improve the healing process and help control the condition of patients [4]. Currently, the most successful diagnostic technique is medical imaging [5]. Magnetic resonance imaging (MRI) is more effective in diagnosing brain diseases than other imaging techniques such as CT scans and X-rays because of its better resolution in soft tissue [6]. Different physicians may reach conflicting diagnoses because of environmental factors and manual interpretation, which leads to the loss of large amounts of information in MRI data. Therefore, given the fast development of computer vision and deep learning, CAD systems can be an effective tool to help physicians increase diagnostic accuracy [7].

Researchers have presented many brain MRI classification methods, since the problem of diagnosing brain abnormalities is essentially one of MRI image classification. In general, this research falls into two main groups: multiclass and binary classification algorithms. Multiclass studies focus on detecting the type of brain disease. An example is the diagnosis of Alzheimer's disease (AD) from MRI images, where the research dataset consists of MRI images in the Mild Demented, Moderate Demented, Non-Demented, and Very Mild Demented classes. For instance, the authors of [8] present Alzheimer's disease detection built on deep learning algorithms, using a weakly supervised learning (WSL) technique in a model named ADGNET. The reported results show an accuracy of 99.61% in detecting the types of Alzheimer's disease. In the binary classification scenario, morphometric and deformation-based approaches are studied to draw a pattern of structural changes in the brain, and the goal is to distinguish an abnormal brain pattern from a healthy one based on MRI or CT images. Because many brain diseases have no image datasets and have not been studied with deep learning algorithms, detecting an abnormal brain pattern is more widely applicable than diagnosing a specific disease. For instance, the U-Transformer-based anomaly detection framework (UTRAD) is proposed in [9] for abnormality detection in medical images such as head CT, brain MRI, and retinal OCT. UTRAD combines reconstruction-based methods and pre-trained feature-based methods. In a similar study, an attention-based deep ensemble model is proposed in [10] for brain age estimation and anomaly detection; an ensemble of attention-based residual networks with uncertainty estimation is employed for fetal brain anomaly detection from MRI images. In the same manner, our proposed approach addresses brain anomaly detection from MRI images.

According to our review, the main challenges of deep learning methods for brain abnormality detection fall into a small number of groups. Because of the lack of training data and the resulting overfitting problems, studies in the literature rely on transfer learning, robust feature extraction, and image data generation tools to improve the accuracy of classification algorithms.

In our categorization, the first group of studies focuses on transfer learning as the primary solution to these problems. For instance, a CNN-based approach built on the ResNet-50 model, employing transfer learning with crop normalization preprocessing steps, is proposed in [11]. The same paper compares the pre-trained Inception V3 and VGG-16 models with the ResNet-50-based approach in terms of accuracy in detecting brain abnormalities. A similar study on brain abnormality detection [12] uses a pre-trained MobileNet model as a deep feature extractor and, for classification, feedforward networks with the chaotic bat optimization algorithm. Also in the area of deep feature extraction with well-known pre-trained models, another study [13] proposes a deep learning framework, the BrainMRNet model, which combines attention modules and the hypercolumn technique with residual blocks. The attention modules are employed as an image augmentation method to select the important areas of each image, and the convolutional layers are used as a feature extractor through the hypercolumn technique for brain tumor [14] detection. Furthermore, in another study [15], well-known pre-trained models such as Inception-v3 and DenseNet201 are used as feature extractors for the classification task, and transfer learning with AlexNet and GoogLeNet is used to improve accuracy. Disease classification methods, like research on diagnosing brain abnormalities, suffer from a lack of training data. For example, because of the lack of training data for Alzheimer's disease, several studies focus on transfer learning and deep feature extraction. For instance, in [16], the AlexNet framework is proposed to extract significant features effectively from MRI images. In a similar study [17], a temporal convolutional network is designed with multiple deep sequence-based architectures. To decrease the processing cost, depthwise separable convolution (DSC) is proposed in [18]. For transfer learning, two well-known models are employed, and significant classification accuracies are obtained, demonstrating the efficacy of the proposed depthwise separable convolutional neural network.

A second solution for achieving high classification accuracy with less training data is to use efficient and robust feature sets. For instance, in [19], a radial basis function neural network is proposed that uses the 2D discrete wavelet transform (DWT) and entropy-based feature sets. In another study [20], the naive Bayes method is employed for feature extraction and classification. To extract robust and efficient features, some studies employ segmentation and classification algorithms in a pipeline. For instance, the study in [21] proposes a deep learning framework that contains a deep learning-based segmentation model followed by classification of the segmented features. A similar study [22] is based on Gabor-like multiscale texture for segmentation and a modified AdaBoost for classification.

In our categorization of articles on the diagnosis of brain abnormalities, a third solution is unsupervised detection of brain outliers. For instance, the authors of [23] propose the MADGAN model, a two-step method for distinguishing AD in brain MRI scans. This unsupervised medical anomaly detection method uses a generative adversarial network (GAN) with reconstruction of multiple adjacent brain MRI slices. In a similar study [24], unsupervised anomaly detection with AnoGAN was applied to 1H-MRS spectra of the human brain.

Based on the aforementioned studies, the main problems of deep learning methods for brain abnormality detection are the lack of training data and the need to extract efficient and robust feature sets that improve classification accuracy. The literature focuses on transfer learning as the first solution to these problems. However, transfer learning is efficient when the fine-tuning dataset is similar to the original training dataset (ImageNet). In brain abnormality detection, the MRI images are grayscale, whereas ImageNet images are in RGB color space. The other solution proposed by related studies is to use dataset extension (augmentation) functions, which increase the processing cost. Thus, our proposed approach is based on deep, robust feature extraction and improved local feature sets. This research proposes three end-to-end deep learning frameworks for brain MRI anomaly detection: directional bit-planes (DBP) [25] with a deep autoencoder (DAE) [26, 27], a dilated separable residual convolution network (DSRCN), and a multibranch approach. In the proposed DBP-DAE, we analyze the effect of directional and robust feature sets on classification accuracy. By decomposing the local binary pattern (LBP) into eight bit-planes, local and directional feature sets are extracted. To obtain more robust and compact feature sets, we use a DAE, which not only reduces the dimension of the feature sets but also helps to extract more robust features for classification. In the second approach, we propose the DSRCN. The separable residual convolution network is inspired by the idea in [28] for face recognition. Furthermore, to extract an enhanced feature set in this relatively shallow deep learning approach, we use convolutional kernels with different dilation rates instead of standard ones. This approach achieves significant accuracy because it extracts both low- and high-level deep features from an MRI image during the training phase. To explore the effect of concatenating low- and high-level deep features with local directional feature sets, we fuse these features in an end-to-end model named the multibranch model.

The main contributions of this study are as follows:
(1) We designed three CNN models to detect anomalies in brain MRI images, to discuss their advantages and disadvantages, and to offer possible solutions to the problems of excessive computational complexity and lack of training samples.
(2) Numerous experiments are performed on diverse datasets of different types of brain abnormalities, such as tumors and Alzheimer's disease.
(3) We compared our three proposed architectures with existing methods and showed that they are competitive with state-of-the-art methods.

The structure of this paper is as follows: Section 2 introduces the proposed system; Section 3 describes the public datasets, the experimental results, and their analysis; and Section 4 gives the discussion and conclusion.

2. Proposed Method

In this part, the three proposed approaches are described in detail: directional bit-planes (DBP) with a deep autoencoder (DAE), the dilated separable residual convolution network (DSRCN), and the multibranch approach.

2.1. DBP-DAE

The local feature descriptor part of the proposed approach combines directional bit-planes (DBP) [25] with a deep autoencoder, which we name DBP-DAE. The DAE is used to prevent duplication and redundancy in the DBP features and so achieve robust classification accuracy. The DBP-DAE approach is summarized below.

2.1.1. Directional Feature Extraction

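For each input MRI image $f(x, y)$, the LBP feature descriptor [29] is obtained by equation (1):

$$\mathrm{LBP}(x_c, y_c) = \sum_{n=0}^{7} s(g_n - g_c)\, 2^{n}, \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{1}$$

where $g_c$ represents the intensity value of the center pixel and $g_n$ represent the 8 neighboring pixels ($n = 0, \ldots, 7$) of the window centered at $(x_c, y_c)$. Therefore, the LBP code can be computed as follows:

$$\mathrm{LBP}(x_c, y_c) = \sum_{n=0}^{7} 2^{n}\, b_n(x_c, y_c), \qquad b_n(x_c, y_c) = s(g_n - g_c) \tag{2}$$

where $b_n$ denotes the $n$th bit-plane ($n = 0, \ldots, 7$), called DBP. The DBP feature descriptor contains directional information of each MRI input image, as presented in Figure 1 [25, 29]. The location of each surrounding pixel relative to the center pixel is given by equation (3).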
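
As an illustration of the bit-plane decomposition, the following minimal Python sketch thresholds the 8 neighbors of each interior pixel against the center and stores the result of each direction in its own binary plane. The neighbor ordering in `OFFSETS`, the `>=` thresholding convention, and the function names are assumptions made for this example; the ordering used in [25] may differ.

```python
# Neighbour offsets (row, col) for n = 0..7; this clockwise ordering is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def directional_bit_planes(image):
    """Return eight binary planes (one per direction) for the interior pixels."""
    rows, cols = len(image), len(image[0])
    planes = [[[0] * (cols - 2) for _ in range(rows - 2)] for _ in range(8)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            for n, (dr, dc) in enumerate(OFFSETS):
                # Bit n is 1 when the n-th neighbour is at least as bright as the centre.
                planes[n][r - 1][c - 1] = 1 if image[r + dr][c + dc] >= center else 0
    return planes

def lbp_from_planes(planes):
    """Recombine the planes into the LBP code: sum over n of 2^n * bit_n."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[n][r][c] << n for n in range(8)) for c in range(cols)]
            for r in range(rows)]

# A single 3 x 3 patch, so there is exactly one interior pixel (value 6).
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
planes = directional_bit_planes(patch)
bits = [plane[0][0] for plane in planes]
code = lbp_from_planes(planes)[0][0]
print(bits, code)
```

Keeping the eight planes separate, instead of collapsing them into one LBP code, is what preserves the per-direction information that the DAE later compresses.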

2.1.2. Deep Autoencoder

To reduce feature redundancy and decrease the processing cost, we use four of the eight bit-planes in the DBP-DAE approach. To reduce the dimension of the DBP, we use a deep autoencoder (DAE). As presented in Figure 2, the autoencoder is a type of neural network with a symmetric structure and an equal number of units in the input and output layers.

The main advantage of this structure is that it extracts and learns abstract features from the input images; these features are then fed to a logistic regression layer on top of the deep model. To use the DAE for dimensionality reduction as part of the deep learning method for detecting brain abnormalities, the DAE is first trained. In this phase, the DAE is trained on the input DBP and learns deep features hierarchically. To obtain a nonlinear mapping of the input, the activation function $f(\cdot)$ is applied as in equation (4):

$$h = f(Wx + b) \tag{4}$$

where $x$ is the input feature vector, $W$ and $b$ are the weights and bias of the hidden layer, and $h$ is the resulting hidden representation.

To estimate the error used to update the weights, a cross-entropy cost function is applied to the reconstruction and evaluated over each mini-batch of input images, as presented in equation (5):

$$L(x, \hat{x}) = -\frac{1}{M}\sum_{m=1}^{M}\sum_{d=1}^{D}\left[x_d^{(m)}\log \hat{x}_d^{(m)} + \left(1 - x_d^{(m)}\right)\log\left(1 - \hat{x}_d^{(m)}\right)\right] \tag{5}$$

where $D$ is the dimension of the input feature vector and $M$ is the mini-batch size, with $x$ and $\hat{x}$ denoting the input features and the reconstructed images [30].

Consequently, the partial derivatives of the cost with respect to the weight and bias parameters can be determined by the chain rule. They are expressed in terms of the input to the $i$th node of the hidden layer and the hidden-layer features used for reconstruction, with the sigmoid as the activation function. To implement the classification task with the DAE, the output (reconstruction) layer of the deep model is replaced with a logistic regression classifier with SoftMax activation, and the network is fine-tuned by backpropagation. The probability that an input image belongs to class $i$ is calculated from equation (6):

$$P(Y = i \mid R, W, b) = \frac{e^{W_i R + b_i}}{\sum_{j} e^{W_j R + b_j}} \tag{6}$$

where $R$ is the output feature vector of the last layer for each input vector, and $W$ and $b$ are the weights and biases of the SoftMax classifier. The pseudocode of DBP-DAE is presented in Algorithm 1.
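
The class-probability computation of the SoftMax classifier can be sketched in a few lines of Python. The feature vector, weights, and biases below are hypothetical toy values, and subtracting the largest logit before exponentiation is a standard numerical-stability step that does not change the probabilities.

```python
import math

def softmax_probabilities(features, weights, biases):
    """Return P(Y = i | R, W, b) for every class i."""
    logits = [sum(w * r for w, r in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    shift = max(logits)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

R = [0.2, 0.7, 0.1]       # hypothetical 3-dimensional output of the last DAE layer
W = [[0.5, -0.3, 0.8],    # one weight row per class (two classes: normal, abnormal)
     [-0.4, 0.9, 0.1]]
b = [0.0, 0.1]
probs = softmax_probabilities(R, W, b)
predicted = probs.index(max(probs))
print(probs, predicted)
```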

Algorithm 1: Pseudocode of DBP-DAE.
Initialize mini-batch size, number of epochs (EP), pretraining learning rate (PLR), number of layers (NL), dimension (D), total number of classes (C), and number of neurons in each hidden layer n[L]
Input: DBP as an input feature vector with D dimensions
For each layer L = 1, …, NL:
    If L = 1:
        Dimension of visible layer = D
        Dimension of hidden layer = n[1]
    Else:
        Dimension of visible layer = n[L − 1]
        Dimension of hidden layer = n[L]
    End
    Initialize the weights and the hidden and reconstruction biases of layer L
    For each pretraining epoch (1, …, EP):
        For each mini-batch:
            Compute the reconstruction (equation (4))
            Compute the cost (equation (5))
            Update the weights and biases with learning rate PLR
        End
    End
    Freeze the reconstruction layer
End
Initialize the parameters of the logistic regression layer
Input of classifier layer = n[NL]
Output of classifier layer = C
For each fine-tuning epoch:
    For each mini-batch:
        Compute the probability of each class according to equation (6)
        Update the weights with backpropagation
    End
End

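
The pretraining loop of Algorithm 1 can be illustrated for a single autoencoder layer with the following self-contained Python sketch. It is a toy version under stated assumptions, not the configuration used in the experiments: the layer sizes, learning rate, number of steps, untied encoder and decoder weights, plain mini-batch gradient descent, and all function names are chosen only for illustration.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W1, b1, W2, b2):
    """Encode x into hidden features h, then decode h into a reconstruction z."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, z

def cross_entropy(x, z, eps=1e-12):
    """Reconstruction cross-entropy of one sample (outputs clamped for safety)."""
    total = 0.0
    for xd, zd in zip(x, z):
        zd = min(max(zd, eps), 1.0 - eps)
        total -= xd * math.log(zd) + (1 - xd) * math.log(1 - zd)
    return total

def train_step(batch, W1, b1, W2, b2, lr):
    """One mini-batch update; returns the mean cost measured before the update."""
    n_hid, n_vis, m = len(W1), len(W2), len(batch)
    gW1 = [[0.0] * n_vis for _ in range(n_hid)]
    gW2 = [[0.0] * n_hid for _ in range(n_vis)]
    gb1, gb2, cost = [0.0] * n_hid, [0.0] * n_vis, 0.0
    for x in batch:
        h, z = forward(x, W1, b1, W2, b2)
        cost += cross_entropy(x, z) / m
        # For sigmoid outputs with cross-entropy, the output error is z - x.
        d_out = [zd - xd for zd, xd in zip(z, x)]
        d_hid = [sum(d_out[d] * W2[d][j] for d in range(n_vis)) * h[j] * (1 - h[j])
                 for j in range(n_hid)]
        for d in range(n_vis):
            gb2[d] += d_out[d] / m
            for j in range(n_hid):
                gW2[d][j] += d_out[d] * h[j] / m
        for j in range(n_hid):
            gb1[j] += d_hid[j] / m
            for d in range(n_vis):
                gW1[j][d] += d_hid[j] * x[d] / m
    for j in range(n_hid):
        b1[j] -= lr * gb1[j]
        for d in range(n_vis):
            W1[j][d] -= lr * gW1[j][d]
    for d in range(n_vis):
        b2[d] -= lr * gb2[d]
        for j in range(n_hid):
            W2[d][j] -= lr * gW2[d][j]
    return cost

random.seed(0)
D, H = 4, 2  # toy visible and hidden sizes
W1 = [[random.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(H)]
W2 = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(D)]
b1, b2 = [0.0] * H, [0.0] * D
batch = [[1, 0, 1, 0], [0, 1, 0, 1]]  # toy binary vectors standing in for bit-planes
costs = [train_step(batch, W1, b1, W2, b2, lr=0.5) for _ in range(300)]
print(costs[0], costs[-1])
```

In a stacked setting, the hidden features `h` of a trained layer would serve as the input to the next layer, as in the outer loop of Algorithm 1.
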
2.2. Dilated Separable Residual Convolutional Network (DSRCN)

To extract deep and low-level features of brain MRI images, we propose a dilated depthwise separable residual convolution network (DSRCN). The depthwise separable residual convolution module combines the depthwise separable module with the residual network model. Assume a CNN-based classifier with $K$ labels and $m$ training samples, where the input features and labels are $x^{(i)}$ and $y^{(i)}$, respectively; the cost function over the training set is calculated as follows:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{K} 1\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^{T} x^{(i)}}}$$

In this equation, $\theta$ is the model parameter and $\sum_{l=1}^{K} e^{\theta_l^{T} x^{(i)}}$ normalizes the outputs into a probability distribution. To improve accuracy, dilated convolution [31] is used. The dilation factor sets the spacing between the taps of the convolution kernel. Let $F:\mathbb{Z}^2 \to \mathbb{R}$ be a discrete function, let $\Omega_r = [-r, r]^2 \cap \mathbb{Z}^2$, and let $k:\Omega_r \to \mathbb{R}$ be a discrete filter of size $(2r + 1)^2$. The discrete convolution operator $*$ can be described as

$$(F * k)(\mathbf{p}) = \sum_{\mathbf{s} + \mathbf{t} = \mathbf{p}} F(\mathbf{s})\,k(\mathbf{t})$$

Let $l$ represent the dilation factor. The dilated convolution operator $*_l$ then has the following definition:

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s} + l\mathbf{t} = \mathbf{p}} F(\mathbf{s})\,k(\mathbf{t})$$

In this case, $*_l$ is referred to as a dilated convolution ($l$-dilated convolution).

In this approach, each block of the model contains three SeparableConv2D layers, one Conv2D layer, and a max pooling layer. Batch normalization and the ReLU activation function are applied after each layer. In each DSRCN block, as presented in Figure 3, a SeparableConv2D layer and a Conv2D layer, both with dilation rate (1, 1), are connected to the input layer. The SeparableConv2D layer contains 64 (1 × 1) filters with batch normalization and ReLU. It is followed by two further SeparableConv2D layers, one with 64 (3 × 3) filters and dilation rate (2, 2) and one with 64 (1 × 1) filters, each with batch normalization and ReLU. The last SeparableConv2D layer is connected to a (3 × 3) MaxPooling2D layer. At the end of each block, the MaxPooling2D output is concatenated with the normalized output of the Conv2D layer. The second and third blocks have the same architecture as the first, with 128 and 256 filters, respectively. The main advantage of this architecture is the extraction of low-level features along with deep features, as presented in Figure 4. The number of blocks and filters can be changed depending on the scenario. Based on the experimental results in this study, we use three DSRCN blocks for abnormality detection in brain images. The architecture of the proposed approach is shown in Table 1.
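
To make the effect of the dilation factor concrete, the sketch below applies a small kernel to a 2D array with a configurable spacing between kernel taps. It is an illustrative pure-Python version with hypothetical names; like most deep learning frameworks, it computes a cross-correlation (no kernel flip) over the valid region only, which differs from the formal convolution definition only in the orientation of the kernel.

```python
def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D dilated cross-correlation on nested lists."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective kernel size after dilation
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - span + 1):
        out_row = []
        for c in range(cols - span + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are `dilation` pixels apart in the input.
                    acc += kernel[i][j] * image[r + i * dilation][c + j * dilation]
            out_row.append(acc)
        out.append(out_row)
    return out

image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # values 0..24
ones = [[1.0] * 3 for _ in range(3)]

standard = dilated_conv2d(image, ones, dilation=1)  # 3 x 3 output
dilated = dilated_conv2d(image, ones, dilation=2)   # 1 x 1 output, taps cover 5 x 5
print(standard[0][0], dilated)
```

With a dilation rate of 2, the same nine weights cover a 5 × 5 area of the input, which is how the (3 × 3) filters with dilation rate (2, 2) enlarge the receptive field without adding parameters.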

2.3. Multibranch Approach

In the last phase of our proposed approach, a multi-input module with two inputs is designed to estimate global features and local texture features simultaneously. These models have numerous layers that are used to extract features: convolutional, pooling, batch normalization, rectified linear unit (ReLU), SoftMax, and fully connected layers. To extract local and deep features from the input $x$, the convolutional part of each layer $l$ relies on several kernels with weights $W_k^{l}$, as represented in the following equation:

$$y_k^{l} = W_k^{l} \cdot x + b_k^{l}$$

where $y_k^{l}$ denotes the output feature map obtained by processing the dot products of the kernels and the input, with bias $b_k^{l}$. The two key types of pooling layer are maximum and average pooling. The output of the pooling layer is a downsampled version of the complete feature map $y_k^{l}$, obtained by taking the maximum or the average over each window of size $(m, n)$.

The last and most important layer is the fully connected layer. Assume that layer $l$ is fully connected; it expects the feature sets produced by the preceding layer as input. The output feature set of a fully connected layer is

$$y^{l} = W^{l}\, y^{l-1}$$

where $W^{l}$ denotes the weights connecting layers $l-1$ and $l$, and $y^{l-1}$ denotes the feature sets of layer $l-1$. As shown in Figure 5, after fine-tuning the DBP-DAE and DSRCN networks on the brain MRI images, the classification layer (SoftMax) is removed. For the DSRCN, a global average pooling 2D layer is added on top of the model for dimensionality reduction, and its output is concatenated with the compressed fully connected layer of the DBP-DAE model, which contains 200 nodes. After the concatenation layer, a fully connected layer with 4096 nodes is applied. In this way, the features extracted by the two deep models are fused. Finally, the last fully connected layer is attached to a SoftMax classification layer for brain anomaly detection, as described in [32].
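
The fusion step can be sketched as global average pooling of the DSRCN feature maps followed by concatenation with the DBP-DAE feature vector. The example below is a toy illustration: it uses 2 channels of size 2 × 2 instead of the 256 channels of the last DSRCN block, a constant placeholder for the 200-node DBP-DAE layer, and hypothetical function names.

```python
def global_average_pool(feature_maps):
    """Collapse each (H x W) channel map to its mean, giving one value per channel."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

def fuse(deep_maps, local_features):
    """Concatenate pooled deep features with the local (DBP-DAE) feature vector."""
    return global_average_pool(deep_maps) + list(local_features)

deep_maps = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0
             [[0.0, 0.0], [2.0, 2.0]]]   # channel 1
local_features = [0.1] * 200             # placeholder for the 200-node DBP-DAE layer

fused = fuse(deep_maps, local_features)
print(len(fused), fused[:2])
```

With 256 channels in the last DSRCN block, the same concatenation would give a 456-dimensional vector as input to the 4096-node fully connected layer.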

3. Experimental Results

Several experiments with different databases and types of anomalies were conducted to evaluate the performance of the proposed method. The experiments were carried out using an Intel Core i5-8400 CPU, 16 GB of RAM, and an NVIDIA GTX 1050 Ti with 4 GB of memory. All images were resized to (256, 256) to allow a standard comparison among the different datasets.

3.1. Dataset

In this study, two types of datasets are used to detect Alzheimer's and tumor-based abnormalities. To recognize tumor-based abnormalities [33], we used a public-access database prepared by specialists, such as physicians and radiologists, from volunteer patients. The database contains 253 images, separated into 98 normal images and 155 tumor images. The quality and resolution of the images are low, and they have been converted to JPEG format. For anomaly detection in the case of Alzheimer's disease, we used a public-access dataset, namely, the Alzheimer's classification dataset (KACD) [34]. The KACD dataset includes 6400 2D MRI images separated into four groups: nondemented, very mild demented, mildly demented, and moderately demented, which contain 3200, 2240, 896, and 64 images, respectively. This dataset is separated into train, validation, and test folders. Some samples from these databases are presented in Figure 6.
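
The class counts quoted above make the imbalance of both datasets easy to quantify. The short Python sketch below only restates the reported counts as proportions; the variable names are chosen for illustration.

```python
# Image counts per class as reported for the KACD dataset.
kacd_counts = {
    "nondemented": 3200,
    "very mild demented": 2240,
    "mildly demented": 896,
    "moderately demented": 64,
}
kacd_total = sum(kacd_counts.values())
kacd_shares = {name: count / kacd_total for name, count in kacd_counts.items()}
# Ratio between the largest and the smallest class.
imbalance_ratio = max(kacd_counts.values()) / min(kacd_counts.values())

# Image counts as reported for the brain tumor database.
tumor_counts = {"normal": 98, "tumor": 155}
tumor_total = sum(tumor_counts.values())
tumor_share = tumor_counts["tumor"] / tumor_total

print(kacd_total, kacd_shares, imbalance_ratio)
print(tumor_total, round(tumor_share, 3))
```

The moderately demented class accounts for only 1% of KACD, so accuracy alone can hide poor performance on that class, which is one reason the ROC analysis is also reported.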

3.2. Configuration of Proposed Approach

In this test, we use grid search to analyze different architectures, varying the number of layers and the number of DAE nodes, according to classification accuracy. The final parameters set for the DAE are given in Table 2.

For the DSRCN, we tried different numbers of blocks with different kernel sizes to find the optimal architecture for brain anomaly detection. The parameter configuration of the model is presented in Table 3, and the structure of the model is described in Table 1.

3.3. Experiments on Two Public Datasets

For the first experiment, we plot in Figure 7 the accuracy obtained on the KACD and brain tumor datasets using the DBP-DAE, the DSRCN, and the multibranch approach. As seen in this figure, the DSRCN model achieves the best validation accuracy on both KACD and the brain tumor dataset. In addition, the DBP-DAE model achieves the lowest accuracy on KACD, which may be caused by overfitting. The results in Figure 7 and Table 4 suggest that DBP-DAE may not be able to diagnose Alzheimer's disease with significant accuracy. By contrast, this algorithm obtains acceptable results in the diagnosis of brain tumor anomalies. The DSRCN model achieved the best accuracy in detecting brain anomalies based on tumor and Alzheimer's disease, with 0.96 and 0.95, respectively. The lowest accuracy rates are obtained by the multibranch approach, with 0.88 on the brain tumor dataset, and by DBP-DAE in Alzheimer's anomaly detection.

To give a clearer picture of the results on both public-access datasets, Figure 8 shows the ROC curves of the proposed approaches. The best AUC in tumor anomaly detection is obtained by DSRCN, with 0.998, and the lowest by DBP-DAE, with 0.92. In Alzheimer's anomaly detection, however, the lowest result is obtained by the multibranch approach, which may be caused by overfitting. From these results, it can be concluded that the proposed DSRCN method achieves stable and significant results because it extracts both deep and local features during the training phase. In addition, the inability of the proposed DBP-DAE to diagnose Alzheimer's disease indicates that this approach is not able to extract low-level features; this also affects the anomaly detection accuracy of the multibranch approach.
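
For reference, the AUC reported with an ROC curve equals the probability that a randomly chosen abnormal image receives a higher score than a randomly chosen normal image. The following Python sketch computes it directly from that definition; the labels and scores are hypothetical toy values, not outputs of the proposed models.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]            # 1 = abnormal, 0 = normal (toy example)
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical classifier scores
auc = roc_auc(labels, scores)
print(auc)
```

This pairwise form is quadratic in the number of samples, which is acceptable for datasets of the size used here; rank-based implementations are preferable for larger ones.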

3.4. Comparison with the State of the Art

To compare the performance of the proposed approaches with existing systems, the accuracy of our proposed methods and of the state-of-the-art methods on the KACD and brain tumor databases is listed in Table 5. The state-of-the-art methods listed in the table were implemented in MATLAB and Keras with the configurations reported in the corresponding papers. The experimental results suggest that diagnosing Alzheimer's anomalies is more difficult than diagnosing a brain tumor. Among the existing methods, the highest accuracy for this type of anomaly is obtained by BrainMRNet, ResNet-50 (augmentation), and Mobile Net-ELM-CBA, with accuracy rates of 0.94, 0.93, and 0.88, respectively. The DSRCN algorithm achieves the best result, with an accuracy rate of 0.95. In brain tumor detection, the highest accuracy rate among existing methods is obtained by BrainMRNet, which equals that of our proposed approach (DSRCN) at 0.96. On this dataset, the lowest accuracy among the proposed methods is obtained by the multibranch approach, which may be due to overfitting. Considering the accuracy results on the KACD and brain tumor MRI images, it can be concluded that the proposed DSRCN approach, which extracts image features in an end-to-end manner, obtains stable results under different conditions. In addition, the proposed DBP-DAE approach, despite better classification accuracy than other methods such as naive Bayes with ELM, Mobile Net-SNN-CBA, and CNN, failed to achieve consistent results across scenarios using locally extracted features. The multibranch deep learning approach achieved the lowest accuracy on the small database (brain tumor) after the naive Bayes method with ELM. One of the reasons for the inefficiency of this approach is overfitting.

4. Discussion and Conclusion

Most importantly, our CNN approaches achieved strong performance without the use of additional data or training functions, comprehensive data enhancement, or segmentation algorithms. We expect that these approaches will also perform well on large databases (big data) in the future. The experimental results showed that the DSRCN method significantly improves the detection of brain abnormalities across different database sizes. In addition, by extracting deep and large-scale features from the input image, this model addresses two main problems: lack of training data and overfitting. In the DBP-DAE approach, the use of robust local features gives considerable performance on the brain tumor image dataset. Its performance on the KACD dataset is weaker, which is due to its inability to extract deep features that capture more detail from the images. When local and deep features are fused, the proposed approach achieves remarkable results on the KACD dataset. It appears that this model, by extracting both deep and local features, achieves high accuracy on large datasets. On the small dataset, however, the results were insufficient because of overfitting.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.