Abstract

In the field of ophthalmology, diabetic retinopathy (DR) is a major cause of blindness. DR is characterized by retinal lesions, including exudates. Exudates are among the signs of serious DR anomalies, so these lesions should be detected and treated promptly to prevent loss of vision. In this paper, a pretrained convolutional neural network- (CNN-) based framework is proposed for the detection of exudates. Until recently, deep CNNs were applied individually to solve specific problems, but pretrained CNN models with transfer learning can utilize previously acquired knowledge to solve related problems. In the proposed approach, data preprocessing is first performed to standardize the exudate patches. Region of interest (ROI) localization is then used to localize candidate exudates, and transfer learning is performed for feature extraction using pretrained CNN models (Inception-v3, Residual Network-50, and Visual Geometry Group Network-19). The fused features from the fully connected (FC) layers are fed into a softmax classifier for exudate classification. The performance of the proposed framework has been analyzed using two well-known publicly available databases, e-Ophtha and DIARETDB1. The experimental results demonstrate that the proposed pretrained CNN-based framework outperforms existing techniques for the detection of exudates.

1. Introduction

In the area of ophthalmology, deep learning plays a vital role in diagnosing serious diseases, including diabetic retinopathy (DR). DR is a severe and widespread disease diagnosed in diabetic patients all over the world. The World Health Organization (WHO) has projected that by 2030 diabetes will be the seventh leading cause of death in the world [1]. In this perspective, it is most important to protect human lives from the effects of diabetes. In diabetic retinopathy, abnormalities including lesions develop in the retina and later lead to irreversible blindness and vision impairment, but early detection and treatment of these lesions can reduce blindness significantly. The retinal abnormalities in DR include hemorrhages, cotton wool spots, microaneurysms (MAs), retinal neovascularization, and exudates, as shown in Figure 1. Soft exudates (cotton wool spots) appear as light yellow or white areas with indistinct edges, while hard exudates appear as yellow waxy patches in the retina. The presence of exudates in retinal fundus photographs is one of the most serious indications of diabetic retinopathy [3]. Manual identification of hard exudates depends on the analyst and is a time-consuming task; in contrast, an automatic exudate identification technique can detect hard exudates accurately and in time. Handling factors such as the shape, texture, color, size, and poor contrast of the exudates remains a difficult task.

For the diagnosis of diabetic retinopathy, image processing techniques, including optic disk localization, adaptive thresholding, image boundary tracing, and morphological preprocessing, are widely used for feature extraction from retinal fundus images. According to [4], early detection of exudates in the retina may assist ophthalmologists in the timely and proper treatment of the affected person. A U-Net-based technique was applied for the segmentation and detection of exudates on 107 retinal images; the reported network was composed of expansive and contracting paths, where the contracting path has a structure similar to a conventional CNN. An unsupervised segmentation technique can detect hard exudates on the basis of ant colony optimization; its results were compared with a traditional segmentation technique, the Kirsch filter, and the unsupervised approach was found to perform better [5].

Deep convolutional neural networks have also played an important role in the segmentation and detection of exudates in digital fundus images. Tan et al. [6] developed a convolutional neural network to automatically discriminate and segment microaneurysms, hemorrhages, and exudates; the reported method shows that a single CNN can segment these retinal features with appropriate accuracy given a sufficiently large retinal dataset. Furthermore, García et al. [7] investigated three classifiers, multilayer perceptron (MLP), radial basis function (RBF), and support vector machine (SVM), to detect hard exudates; in this report, 117 retinal fundus images of varying quality, brightness, and color were used. Xiao et al. presented a review of exudate detection in diabetic retinopathy based on a large-scale assessment of the related published articles, focusing on recent and emerging techniques, including deep learning, for detecting and classifying diabetic retinopathy in retinal fundus images [8].

In the segmentation and detection of exudates, it is necessary to localize the relevant features. A location-to-segmentation approach for exudate segmentation in digital fundus images was reported in [9], composed of three steps: noise removal, hard exudate localization in the retinal fundus images, and hard exudate segmentation. Noise removal was performed with matched filters for vessel segmentation, and optic disc segmentation was performed on the basis of a saliency technique. The locations of exudates were then identified using a random forest classifier to categorize patches into exudate and nonexudate classes. Finally, local contrast and exudate regions were used to segment the exudates, which were further classified as exudate and nonexudate patches. Asiri [10] presented a review highlighting recent developments in the field of diabetic retinopathy; the automatic detection of diabetic retinopathy and macular degeneration has become one of the most active topics in recent deep learning research.

In addition, considerable work has been done to automatically identify exudates on the basis of features including texture, shape, and size. The well-known exudate detection techniques can be separated into four basic types: (1) machine learning-based techniques; (2) threshold-based techniques; (3) mathematical morphological techniques; (4) region growing approaches.

Machine learning-based algorithms comprise supervised and unsupervised learning approaches. Chowdhury et al. [11] applied a random forest classifier for the detection of retinal abnormalities; the technique was based on k-means segmentation of fundus photographs, with preprocessing performed by machine learning approaches based on statistical and low-level features. Moreover, a novel approach was introduced by Perdomo et al. [12] for the detection of diabetic macular edema on the basis of exudate locations using machine learning techniques. Furthermore, Lam et al. [13] applied pretrained models, namely, AlexNet and GoogLeNet, for the detection of diabetic retinopathy; the reported article recognized different stages of diabetic retinopathy using convolutional neural networks, highlighting multinomial classification models and discussing issues of disease misclassification and the limitations of CNNs.

Threshold techniques utilize variations in color intensity among different image regions. In this context, an iterative thresholding technique based on particle swarm and firefly optimization was presented to diagnose exudates and hemorrhages [14]; it consisted of image enhancement using preprocessing techniques and vessel segmentation using top-hat and Gabor transformations, with hemorrhage detection performed by linear regression and a support vector machine classifier. Additionally, Kaur and Mittal [15] reported an exudate segmentation technique developed to help eye specialists with effective planning and timely treatment in the detection of DR. The authors applied a dynamic decision thresholding approach to find the faint and bright edges, which helps to segment hard exudates efficiently and to pick the threshold values dynamically in the retinal fundus images. Furthermore, Das and Puhan [16] presented a Tsallis entropy thresholding technique to enhance the visibility of exudates in diabetic retinopathy; the obtained exudate features are further filtered to remove false positives by sparse dictionary learning and categorization. The Tsallis technique was evaluated on the public DIARETDB1 and e-Ophtha datasets and achieved 95% accuracy.

A large body of work has used mathematical morphological approaches to detect abnormalities in fundus images. Morphological techniques utilize several mathematical operators with different structuring elements. Jaafar et al. [17] reported an automated technique for the identification of exudates in fundus photographs. In this work, a new method for pure splitting of colored fundus images was applied: in the first stage, a segmentation process was performed on the basis of the calculated pixel variation in fundus images, and a morphological technique was then applied to filter the adaptive thresholding outcomes on the basis of the segmentation results. Additionally, a random forest technique was applied for the detection of hard exudates in the given fundus images. In [18], an ensemble classifier is applied for multiclass segmentation and localization of hard exudates in diabetic retinopathy; the exudate features were extracted at coarse-grain and fine-grain levels using a Gabor filter and morphological reconstruction, respectively, and the candidate regions were used to train the ensemble classifier to classify exudate and nonexudate boundaries. Four publicly available datasets, Messidor, HEI-MED, e-Ophtha EX, and DIARETDB1, were used for the experiments. Harangi and Hajdu [19] also reported a novel approach to detect exudates in three steps: candidate extraction by a greyscale morphological technique, precise boundary segmentation by a contour-based technique, and exudate classification by a regionwise classifier. Harangi et al. [20] presented an exudate detection approach using greyscale morphology and active contour techniques to recognize potential exudate sites and to extract the exact boundaries of the candidates, respectively.

Region growing approaches examine the neighbourhoods of starting positions and decide whether neighbouring pixels should become members of a particular region. Lim et al. [21] introduced a modified version of their previous work, in which diabetic and normal macular edema were classified with the help of extracted exudates; exudate detection was performed on the basis of identified macular regions to distinguish diabetic retinopathy in the retinal fundus images. Harangi and Hajdu [22] introduced an exudate detection technique based on contour identification with additional regionwise categorization; in this technique, morphological approaches including greyscale morphology were applied to extract the exudate features, with the proper shape obtained by a Markovian segmentation system. A novel approach for the detection of diabetic macular edema was developed by Giancardo et al. [23] on the basis of features including exudate segmentation, wavelet decomposition, and color; the experiments were performed on publicly available datasets and obtained 88% to 94% accuracy, depending on the dataset.

The goal of the proposed technique in this research work is to detect exudates in diabetic retinopathy using transfer learning. The main contribution of the proposed work is to apply the transfer learning concept for feature extraction using well-known pretrained deep convolutional neural networks, namely, Inception-v3, ResNet-50, and VGG-19. Additionally, fusion is performed on the extracted features, which are then classified by softmax for the final decision.

The rest of the article is organized as follows: the proposed method is explained in Section 2, and the experimental results and discussion are covered in Section 3. Finally, conclusions are drawn in Section 4.

2. The Proposed Technique

In this section, the proposed framework, based on pretrained convolutional neural network architectures, is described for retinal exudate detection and classification in fundus images. In the proposed framework, three well-reputed pretrained network architectures are combined to perform feature fusion, as different architectures capture different features; if only a single architecture had been adopted instead of combining multiple architectures, the probability of missing some useful features would have been high, which might have affected the performance of the proposed framework.

Initially, data preprocessing is performed on both datasets to standardize the exudate patches, and a Gaussian mixture technique is then applied to localize candidate exudates before feature extraction. Low-level features are extracted individually by three reputed pretrained convolutional neural network architectures, Inception-v3, VGG-19, and ResNet-50. The collective features are then passed through the fully connected (FC) layers, and classification is performed by softmax to separate retinal exudate and nonexudate patches, as shown in Figure 2.

2.1. Dataset

Data gathering is an essential part of the experiments for the analysis of the proposed technique. In this approach, two publicly available retinal datasets are used for the experiments: (i) e-Ophtha and (ii) DIARETDB1. The e-Ophtha dataset contains 47 retinal fundus images examined by four expert ophthalmologists for manual annotation of exudates [24]; the resolution of the retinal images varies from 1440 × 960 to 2544 × 1696 pixels. The DIARETDB1 dataset contains 89 retinal fundus photographs with a resolution of 1500 × 1152 pixels [25]. All retinal images were captured by a dedicated digital fundus camera with a 50-degree field of view. The exudates were examined manually and evaluated by five authorized ophthalmologists, and soft and hard exudates were labelled together as a single "exudates" class. All images were resized to the standard size of the DIARETDB1 images (1500 × 1152 pixels), and the image scale was estimated based on the standard size of the retinal optic disc. Samples of affected and healthy retinal images from e-Ophtha and DIARETDB1 are shown in Figure 3.
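As a rough illustration of this standardization step, the following sketch resizes every image in a directory to the DIARETDB1 resolution; the directory paths, file extension, and helper name are placeholders rather than details from the original experiments.

```python
# Hypothetical standardization sketch: rescale each fundus image to the
# DIARETDB1 resolution (1500 x 1152) before patch extraction.
from pathlib import Path

from PIL import Image

TARGET_SIZE = (1500, 1152)  # (width, height) of the DIARETDB1 standard

def standardize_images(src_dir: str, dst_dir: str) -> None:
    """Resize all JPEG images in src_dir and write them to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.jpg"):
        img = Image.open(img_path).convert("RGB")
        img.resize(TARGET_SIZE, Image.BILINEAR).save(dst / img_path.name)

standardize_images("e_ophtha/images", "e_ophtha/standardized")
```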

2.2. Data Preprocessing

In this phase, the input data are prepared for standardization because of variations in the size of the retinal exudates. Figure 4 demonstrates the distribution of patch sizes among all the extracted retinal exudate patches, with the length and width of the extracted patches corresponding to the X and Y axes, respectively. Ignoring outliers, the extracted retinal exudate patches range in size from 25 × 25 to 286 × 487 pixels. The analysis of the retinal images requires a standard patch size for consistent data labelling; therefore, the smallest patch size at which experts can still identify the pathological sign was selected [26].

In the proposed model, colored patch images of size 25 × 25 are used with two groups, nonexudate and exudate. Manual patch extraction yielded 36,500 and 75,600 exudate patches from the e-Ophtha and DIARETDB1 datasets, respectively. Similarly, to balance the dataset, 35,000 and 60,000 nonexudate patches were extracted from regions of the e-Ophtha and DIARETDB1 databases, respectively. The retinal nonexudate patch group contains various retinal structures, including optic nerve heads, background tissue, and retinal blood vessels. All patches were extracted without any overlap; examples of the nonexudate and exudate patch classes are shown in Figures 5(a) and 5(b), respectively.
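A minimal sketch of non-overlapping 25 × 25 patch extraction is given below; since the paper extracted its patches manually, the any-annotated-pixel labelling rule, the function name, and the use of a binary ground-truth mask are simplifying assumptions.

```python
# Simplified patch extraction: tile the image into non-overlapping 25 x 25
# patches and label each one by whether its region intersects the
# ground-truth exudate annotation mask.
import numpy as np

PATCH = 25

def extract_patches(image: np.ndarray, mask: np.ndarray):
    """image: (H, W, 3) fundus image; mask: (H, W) binary annotation."""
    exudate, non_exudate = [], []
    h, w = mask.shape
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patch = image[y:y + PATCH, x:x + PATCH]
            if mask[y:y + PATCH, x:x + PATCH].any():
                exudate.append(patch)
            else:
                non_exudate.append(patch)
    return exudate, non_exudate
```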

2.3. Region of Interest Localization

Exudates can be described as bright lesions, appearing as bright patches and spots in diabetic retinopathy with full contrast in the yellow plane of the color fundus image. Exudate segmentation is applied before feature extraction using region of interest (ROI) localization; in this step, exudate segmentation detects the ROI in the retinal fundus images. Numerous approaches have been used for this purpose, including neural networks, fuzzy models, edge-based segmentation, and ROI-based segmentation. In the proposed technique, a Gaussian mixture approach is used for exudate localization. Stauffer and Grimson [27] used Gaussian mixtures to perform background subtraction. In this paper, a hybrid technique is applied that integrates a Gaussian mixture model (GMM) with an adaptive learning rate (ALR) to attain a significant outcome in the form of candidate exudate detection. The region of interest (ROI) acquired by the hybrid approach is shown in Figure 6. The ROI is fed into the pretrained convolutional neural network models for feature extraction to obtain a compact feature vector.

The following equation calculates the region of interest (ROI) by the Gaussian mixture model:

P(x_t) = Σ_{i=1}^{K} ω_{i,t} · η(x_t; μ_{i,t}, Σ_{i,t}),  (1)

where ω_{i,t} denotes the weight factor of the i-th Gaussian component and η represents the normalized Gaussian density with mean μ_{i,t} and covariance Σ_{i,t}. The adaptive learning rate updates these parameters continuously, applying a probability constraint to recognize whether a pixel is part of a given Gaussian distribution.
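As a simplified sketch of GMM-based candidate localization (a fixed expectation-maximization fit rather than the hybrid adaptive-learning-rate model described above), one could cluster green-channel intensities and keep the brightest mixture component:

```python
# Simplified GMM localization sketch: exudates appear bright in the green
# channel, so pixels assigned to the highest-mean Gaussian component are
# kept as the candidate-exudate ROI mask.
import numpy as np
from sklearn.mixture import GaussianMixture

def candidate_exudate_mask(image: np.ndarray, n_components: int = 3) -> np.ndarray:
    green = image[:, :, 1].astype(np.float64)
    pixels = green.reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(pixels)
    labels = gmm.predict(pixels).reshape(green.shape)
    brightest = int(np.argmax(gmm.means_.ravel()))
    return labels == brightest  # boolean ROI mask
```

Note that such a mask would also capture the bright optic disc, which is why optic disc localization is typically applied beforehand, as discussed in Section 1.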

2.4. Pretrained Deep Convolutional Neural Network Models for Feature Extraction

First, individual deep convolutional neural network models are applied to extract features, and the adopted models are then combined at the FC layer for the categorization of fundus patches. In this feature combination scenario, multiple types of features, such as compactness, roundness, and circularity, may be captured beyond what a single shape descriptor provides. In the proposed technique, three state-of-the-art deep convolutional neural network architectures, Inception-v3 [28], Residual Network (ResNet)-50 [29], and Visual Geometry Group Network (VGGNet)-19 [30], are applied for feature extraction and subsequent classification of exudate and nonexudate patches. These CNN models have already been trained on a large corpus of standard images, so significant features can be extracted from the small patches on the basis of transfer learning [31]. The adopted deep convolutional neural network architectures are briefly described in the following subsections.
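A hedged Keras sketch of loading the three ImageNet-pretrained backbones as fixed feature extractors is shown below; the input resolutions, global-average pooling, and the implied upsampling of the 25 × 25 patches are assumptions rather than reported settings, and each backbone has its own preprocess_input routine.

```python
# Sketch: instantiate the three pretrained backbones without their
# classification heads, so each returns a pooled feature vector per patch.
from tensorflow.keras.applications import InceptionV3, ResNet50, VGG19

inception = InceptionV3(weights="imagenet", include_top=False,
                        pooling="avg", input_shape=(299, 299, 3))
resnet = ResNet50(weights="imagenet", include_top=False,
                  pooling="avg", input_shape=(224, 224, 3))
vgg = VGG19(weights="imagenet", include_top=False,
            pooling="avg", input_shape=(224, 224, 3))

def extract_features(model, batch):
    """batch: float32 tensor already resized and preprocessed for `model`."""
    return model(batch, training=False)
```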

2.4.1. Inception-v3 Architecture

Inception-v3 is a convolutional network composed of convolutional layers together with pooling layers, rectified linear unit (ReLU) layers, and fully connected layers, designed for image recognition and classification. The proposed model builds on the Inception-v3 architecture, which combines several convolutional filters of various sizes into a single novel module; this design not only decreases computational complexity but also reduces the number of parameters. Inception-v3 also attains better accuracy through the combination of heterogeneous-sized filters and low-dimensional embeddings. The basic architecture of Inception-v3 is shown in Figure 7.

2.4.2. ResNet-50 Architecture

Residual Network-50 is a deep convolutional neural network that achieved significant results in classification on the ImageNet database [32]. ResNet-50 is composed of convolutional filters of numerous sizes to reduce the training time and manage the degradation issue that arises in deep structures. In this work, ResNet-50 is applied as pretrained on the standard ImageNet database [33], excluding the fully connected softmax layer associated with the model. The basic architecture of ResNet-50 is shown in Figure 8.

2.4.3. VGG-19 Architecture

The Visual Geometry Group Network model is a deep neural network based on multilayered operations. It is comparable to the AlexNet model but with additional convolutional layers; the VGGNet architecture is deepened by replacing large kernel-sized filters with stacked 3 × 3 filters followed consecutively by 2 × 2 pooling layers. The general VGG-19 architecture contains 3 × 3 convolutional layers, rectification layers, pooling layers, and three fully connected layers with 4096 neurons [30]. The VGGNet-19 network performs better than the AlexNet architecture while retaining a simple, uniform design. The basic architecture of VGG-19 is shown in Figure 9.

2.5. Transfer Learning and Features Fusion

In the field of machine learning, transfer learning is recognized as a most useful method, in which the contextual knowledge learned while solving one problem is applied to new, related problems. Primarily, the network is trained for a particular job on a related dataset and is afterwards transferred to the objective job and trained on the objective dataset [34]. In this work, the objective of the proposed technique is to evaluate the well-known CNN models in both the transfer learning context and feature-level fusion for retinal exudate classification and to validate the achieved results on the e-Ophtha and DIARETDB1 retinal datasets. The fusion approach combines the features extracted from the fully connected layers of the three different DCNNs, merging them into a single feature vector. Suppose the three CNN architectures and their respective FC layers are represented as

C = {C_1, C_2, C_3},  (2)

F = {FC_1, FC_2, FC_3},  (3)

where equation (2) represents the three CNN models and equation (3) their FC layers. The extracted features are combined in a feature vector space of dimension 1 × (d_1 + d_2 + d_3), where d_i is the feature dimension of the i-th FC layer, and can be described as

V = FC_1 ⊕ FC_2 ⊕ FC_3,  (4)

where ⊕ denotes the concatenation of feature vectors.
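Under this notation, the fusion step amounts to concatenating the three per-patch feature vectors; the following illustrative snippet (argument names are hypothetical) makes the dimensionality explicit.

```python
# Feature-level fusion: stack the three FC-layer feature vectors side by
# side, giving one fused descriptor of length d_1 + d_2 + d_3 per patch.
import numpy as np

def fuse_features(f_inception: np.ndarray,
                  f_resnet: np.ndarray,
                  f_vgg: np.ndarray) -> np.ndarray:
    """Each input has shape (n_patches, d_i); the output has shape
    (n_patches, d_1 + d_2 + d_3)."""
    return np.concatenate([f_inception, f_resnet, f_vgg], axis=1)
```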

Transfer learning-based techniques are implemented with the Inception-v3, ResNet-50, and VGGNet-19 architectures pretrained on ImageNet. The transfer learning setup treats the retained neural network modules as fixed feature extractors for the new datasets: the original pretrained weights are kept, and image-based features are extracted through the final network layers. In general, a huge amount of data is required to train a convolutional neural network from scratch; however, it is often hard to assemble such a large database for a related problem, and in most real-world applications it rarely happens that similar training and testing data are available. In this scenario, the transfer learning approach has proved to be a fruitful technique. There are two main considerations in transfer learning: first, the selection of the pretrained architecture, and second, the similarity and size of the problem. In the selection phase, the choice of pretrained architecture is based on how relevant the source problem is to the objective problem. As for similarity and size, if the target database is small (for example, fewer than one thousand images) yet relevant to the source database (for example, vehicle, hand-written character, or medical datasets), there is a high chance of overfitting; if the target database is sufficiently large and relevant to the source training database, there is little chance of overfitting, and only fine-tuning of the pretrained architecture is needed. The deep convolutional neural network (DCNN) models Inception-v3, ResNet-50, and VGG-19 are applied in the proposed framework to exploit their features through fine-tuning and transfer learning. The selected convolutional neural network models were first trained on sample images from the standard publicly available ImageNet database, and the idea of transfer learning for fused feature extraction was then implemented. In this way, the proposed technique allows the architecture to learn common features from the new dataset without requiring additional training from scratch. The independently extracted features of all the selected convolutional neural network models are joined at the FC layer, and the classification of nonexudate and exudate patch classes is performed by softmax.
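As a sketch of the final classification stage, the snippet below defines a single-layer softmax head over the fused feature vectors; the optimizer, loss, and the absence of hidden layers are assumptions rather than reported settings.

```python
# Minimal softmax classifier over the fused features: one dense layer with
# a softmax activation, trained with integer class labels (0/1).
import tensorflow as tf

def build_softmax_head(fused_dim: int, n_classes: int = 2) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(n_classes, activation="softmax",
                              input_shape=(fused_dim,)),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```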

3. Results and Discussion

The experiments are performed on Google Colab using graphics processing units (GPUs). For the performance evaluation, two publicly available standard datasets are selected. The training phase is divided into two sessions, each of which took 6 hours to complete. The proposed framework is trained on the three convolutional neural network architectures, Inception-v3, ResNet-50, and VGG-19, individually, after which transfer learning is performed to transfer the learned knowledge into the fused extracted features. The experimental results of the individual convolutional neural networks are compared and analyzed against the fused feature set and various existing approaches. A 10-fold cross-validation approach is applied for performance evaluation. Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample; the procedure has a single parameter, k, that refers to the number of groups into which a given data sample is split, and it is therefore often called k-fold cross-validation. When a specific value of k is chosen, it may be used in place of k in the name, such that k = 10 becomes 10-fold cross-validation [35]. The input data are divided into different ratios of training and testing data: the data splitting is performed in three different ways, with 70% for training and 30% for testing, 80% for training and 20% for testing, and 90% for training and 10% for testing the CNN architectures. Table 1 shows the comparative analysis of the three individual CNN architectures and the proposed technique for each data split using the e-Ophtha dataset.
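The 10-fold protocol could be realized as in the following self-contained sketch; multinomial logistic regression stands in for the single-layer softmax head (the two are mathematically equivalent), and `features` and `labels` are assumed to come from the fusion step.

```python
# 10-fold cross-validation over fused feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def cross_validate(features: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(features, labels):
        clf = LogisticRegression(max_iter=1000)   # softmax-equivalent classifier
        clf.fit(features[train_idx], labels[train_idx])
        scores.append(clf.score(features[test_idx], labels[test_idx]))
    return float(np.mean(scores))                 # mean accuracy over the k folds
```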

Similarly, Table 2 illustrates the results of the individual architectures and the proposed technique in terms of classification accuracy using the DIARETDB1 dataset.

In the context of classification performance, a true positive (TP) is an outcome where the model correctly predicts the positive class, and a true negative (TN) is an outcome where the model correctly predicts the negative class. A false negative (FN) and a false positive (FP) represent samples that are misclassified by the model. The following equations are applied for the performance assessment.

Accuracy: it is a measure used to evaluate the model's effectiveness at identifying correct class labels and can be calculated by the following equation:

Accuracy = (TP + TN) / (TP + TN + FP + FN).  (5)

F-measure: it is the harmonic mean of the precision and recall of a classifier, with a range between 0 and 1; the worst and best scores are represented by "0" and "1", respectively. It is computed as

F-measure = 2 × (Precision × Recall) / (Precision + Recall),  (6)

where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
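For reference, all four measures reported in Tables 1 and 2 can be computed directly with scikit-learn, as in this toy example with made-up labels:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1]  # toy ground truth: 1 = exudate, 0 = nonexudate
y_pred = [1, 0, 1, 0, 0, 1]  # toy predictions

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))         # harmonic mean
```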

Table 1 also reports, for the exudate patches, the corresponding F1 score, recall, precision, and accuracy. For each architecture, the highest classification accuracy across the three data splits is considered. Overall, the proposed approach achieves significantly higher classification accuracy for retinal exudate detection than the individual CNN architectures. On the e-Ophtha dataset, Table 1 shows that the highest classification accuracies of the individual CNN architectures Inception-v3, ResNet-50, and VGG-19 are 93.67%, 97.80%, and 95.80%, respectively, while the proposed approach attains a classification accuracy of 98.43%. On the DIARETDB1 dataset, Table 2 shows that the highest classification accuracies of Inception-v3, ResNet-50, and VGG-19 are 93.57%, 97.90%, and 95.50%, respectively, while the proposed approach attains a classification accuracy of 98.91%.

For a clearer view of the classification accuracy results, Figures 10 and 11 show the comparative classification accuracies of the proposed model against the individual models using the e-Ophtha and DIARETDB1 datasets, respectively.

Additionally, Table 3 compares the results obtained by the proposed framework with well-known existing approaches for the detection of retinal exudates: the classification accuracies are 87% for [18], 92% for [36], and 97.60% and 98.20% for [37]. The proposed framework achieved higher accuracy than the abovementioned techniques on both the e-Ophtha and DIARETDB1 datasets.

Although the improvement of the proposed framework over [37] is small, the features extracted by the proposed framework support the final results and can be particularly meaningful in clinical practice. The comparative analysis shows that the proposed pretrained CNN-based transfer learning technique outperforms the existing methods in terms of accuracy on both datasets for the detection of retinal exudates.

4. Conclusions

In this article, a pretrained convolutional neural network- (CNN-) based framework is proposed for the detection of retinal exudates in fundus images using transfer learning. In the proposed framework, the pretrained models Inception-v3, Residual Network-50 (ResNet-50), and Visual Geometry Group Network-19 (VGG-19) are used to extract features from fundus images on the basis of transfer learning to improve classification accuracy. The classification accuracy of the proposed model is compared with that of the individual DCNN models as well as with existing techniques. The proposed transfer learning-based framework achieves outstanding accuracy without training from scratch, outperforming the other existing techniques for the detection of retinal exudates. In future work, the proposed framework can be modified to discriminate hard from soft exudates; it can also be extended to diagnose hemorrhages and microaneurysms in diabetic retinopathy.

Data Availability

In the experiment, the data used to support the findings of the proposed framework are available at http://www.adcis.net/en/Download-Third-Party/E-Ophtha.html and http://www.it.lut.fi/project/imageret/diaretdb1/index.html.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors are thankful to Dr. Somal Sana (MBBS), who provided knowledge about diabetic retinopathy. This work was supported in part by the National Key R&D Program of China under Grant 2018YFF0214700.