Mobile Information Systems
Volume 2019, Article ID 6142839, 14 pages
Research Article

Computer-Assisted Diagnosis for Diabetic Retinopathy Based on Fundus Images Using Deep Convolutional Neural Network

1Department of Computer Science & Information Engineering, National Central University, Taoyuan 32001, Taiwan
2Department of Ophthalmology, Taipei Veterans General Hospital, School of Medicine, National Yang-Ming University, Beitou, Taipei, Taiwan
3Department of Ophthalmology, Fu Jen Catholic University Hospital, New Taipei City, Taiwan

Correspondence should be addressed to Yung-Hui Li;

Received 27 April 2018; Revised 28 June 2018; Accepted 27 November 2018; Published 2 January 2019

Guest Editor: Antonio Celesti

Copyright © 2019 Yung-Hui Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Diabetic retinopathy (DR) is a complication of long-standing diabetes that is hard to detect in its early stage because it shows few symptoms. Nowadays, the diagnosis of DR usually requires digital fundus images as well as optical coherence tomography (OCT) images. Since OCT equipment is very expensive, both patients and ophthalmologists would benefit if an accurate diagnosis could be made based solely on digital fundus images. In this paper, we present a novel algorithm based on a deep convolutional neural network (DCNN). Unlike the traditional DCNN approach, we replace the commonly used max-pooling layers with fractional max-pooling. Two such DCNNs with different numbers of layers are trained to derive more discriminative features for classification. After combining features from the image metadata and the DCNNs, we train a support vector machine (SVM) classifier to learn the decision boundaries between the class distributions. For the experiments, we used the publicly available DR detection database provided by Kaggle. We used 34,124 training images and 1,000 validation images to build our model and tested it on 53,572 testing images. The proposed classifier assigns DR to one of five stages, labeled with integers from zero to four. The experimental results show that the proposed method achieves a recognition rate of up to 86.17%, which is higher than those previously reported in the literature. In addition to designing a machine learning algorithm, we also developed an app called “Deep Retina.” With a handheld ophthalmoscope, users can take fundus images by themselves and obtain an immediate result calculated by our algorithm. This is beneficial for home care, remote medical care, and self-examination.

1. Introduction

The global cost of treating adult diabetes and its chronic complications was USD 850 billion in 2017. Diabetic retinopathy (DR) is one of the most common and serious complications of diabetes mellitus and a leading cause of low vision and blindness in working-age adults [1, 2]. The International Diabetes Federation (IDF) estimated that the global population with diabetes in 2017 was 451 million and that over one-third of this population had DR [3], representing a tremendous population at risk of visual impairment or blindness. By 2045, the number of people with diabetes worldwide is expected to increase to 693 million [3]. In addition, almost half (49.7%) of all people living with diabetes remain undiagnosed for years because the symptoms are silent [3]. Meanwhile, long-term high blood sugar levels ultimately damage blood vessels and nerves, leading to complications such as cardiovascular disease and blindness. Early detection and treatment of DR can prevent its development or progression.

The diagnosis and severity grading of DR are based on retinal examination. Clinically, DR can be divided into two categories: (1) nonproliferative diabetic retinopathy (NPDR), with exudation and ischemia of varying severity but without retinal neovascularization, and (2) proliferative diabetic retinopathy (PDR), which is characterized by neovascularization, with or without its complications of tractional retinal detachment and vitreous hemorrhage. Microvascular lesions of NPDR include microaneurysms, retinal dot and blot hemorrhages, lipid exudates, venous beading, and intraretinal microvascular abnormalities (IRMA). Based on the degree and extent of these lesions, NPDR can be divided into three levels: mild NPDR presents with microaneurysms or a few retinal hemorrhages; moderate NPDR shows more microaneurysms, hemorrhages, or soft exudates but does not reach the level of severe NPDR; and severe NPDR is associated with any of the following: marked retinal hemorrhage in all 4 quadrants, venous beading in at least 2 quadrants, or IRMA in at least 1 quadrant. Table 1 summarizes the DR categories and their manifestations.

Table 1: Classification of diabetic retinopathy [4].
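The grading rules above can be condensed into a small decision function. This is a simplified sketch, assuming the lesion findings have already been extracted (for example, by a human grader); the `FundusFindings` class and its field names are hypothetical, and the severe-NPDR test follows the standard "4-2-1" rule, under which any one criterion suffices.

```python
from dataclasses import dataclass


@dataclass
class FundusFindings:
    """Hypothetical lesion summary for one eye (illustrative field names)."""
    microaneurysms: bool = False
    few_hemorrhages: bool = False
    moderate_lesions: bool = False       # e.g., more hemorrhages, soft exudates
    hemorrhage_quadrants: int = 0        # quadrants with marked hemorrhage
    venous_beading_quadrants: int = 0
    irma_quadrants: int = 0
    neovascularization: bool = False


def dr_level(f: FundusFindings) -> int:
    """Simplified mapping of findings to levels 0 (normal) through 4 (PDR)."""
    if f.neovascularization:
        return 4                                      # PDR
    if (f.hemorrhage_quadrants >= 4 or f.venous_beading_quadrants >= 2
            or f.irma_quadrants >= 1):
        return 3                                      # severe NPDR ("4-2-1")
    if f.moderate_lesions:
        return 2                                      # moderate NPDR
    if f.microaneurysms or f.few_hemorrhages:
        return 1                                      # mild NPDR
    return 0                                          # no apparent DR


# Usage: venous beading in two quadrants alone already means severe NPDR.
print(dr_level(FundusFindings(microaneurysms=True, venous_beading_quadrants=2)))
```

The checks run from the most to the least severe level, so each level only needs to test its own criteria.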

Manual grading by ophthalmologists has been the mainstay of DR screening in past decades. However, given the expanding population with diabetes and recent advances in technology, automated detection of DR offers the potential for an efficient and cost-effective approach to screening. Current commercial automated retinal image analysis systems (ARIAs), such as iGradingM, Retmarker, and EyeArt, focus on distinguishing disease from no disease or on detecting referable DR [5, 6]. ARIAs are not yet sophisticated enough to classify the different levels of DR; identifying the subtle changes between levels remains a challenging task for medical image analysis. Figure 1 shows example fundus images for each DR level.

Figure 1: Examples of fundus images showing different lesions. (a) Level 0 (normal), (b) level 1, (c) level 2, (d) level 3, and (e) level 4.

In addition to the accuracy of medical image processing, the mobility and portability of medical examination equipment are equally important. Currently, acquiring digital fundus images requires a cooperative patient to sit in front of the fundus camera in a room with ambient lighting minimized or turned off. The patient must look forward at a fixed target light while the camera uses infrared fundus imaging to focus on the area of interest. Many nonmydriatic cameras have software that automatically detects the posterior pole of the eye and takes a picture once it is in focus. The RGB image sensor still requires a flash to capture images in the visible light spectrum. However, the digital fundus imagers most commonly used in clinics are bulky and expensive, as shown in Figure 2, which limits their use for large-scale screening.

Figure 2: Commercialized fundus camera used in the clinics. The machine is bulky and expensive, making it difficult for large-scale screening.

One of the major goals of this study, besides increasing the classification accuracy using artificial intelligence, is to develop a new system framework for DR screening. The new framework combines the advantages of mobile computing, cloud computing, big data, and artificial intelligence. The components of the proposed framework are as follows:
(i) Mobile block: Fundus images are acquired with a handheld fundus imager coupled with a self-developed iPhone app. The imager is small and lightweight enough to be carried in a backpack, which makes deployment extremely convenient. Its portability can benefit medical services in remote rural areas.
(ii) Cloud block: The proposed system does not sacrifice computational performance for portability. Thanks to the cloud computing architecture, the core computational resources are moved to the cloud and can be scaled up flexibly as requests increase. We developed an efficient deep learning algorithm that runs on a cloud server and responds to a diagnosis request within 10 seconds.
(iii) Big data block: The cloud-based architecture also helps to collect big data. As more end devices (handheld fundus cameras) are used, the number of fundus images uploaded to the cloud will increase accordingly. By storing all of these fundus images, we can make good use of the resulting large dataset, for example, for machine learning model retraining, new feature exploration, or cross-domain data mining for different types of ophthalmological diseases.

In summary, this paper proposes a new system framework for DR screening based on artificial intelligence, mobile computing, cloud computing, and big data analytics. Figure 3 illustrates the proposed system. Such a system is a new paradigm for telemedical service and will benefit rural areas where medical resources are insufficient.

Figure 3: The proposed framework. It is composed of a mobile block (image acquisition), a cloud block (deep learning algorithm on a cloud computing platform), and a big data block (big data collection and training).

The rest of this paper is organized as follows. In Section 2, we review important algorithms for retinal vessel segmentation, which is the foundation of DR classification. In Section 3, we review the literature on DR detection. In Section 4, we describe the proposed deep learning and machine learning algorithms in detail. We present the experimental results in Section 5 and discuss important findings in Section 6. Finally, Section 7 concludes the paper.

2. Literature Review of Retinal Vessel Segmentation

In the process of identifying DR, it is pivotal to locate the retinal vessels. If the vessel positions are known correctly, we can determine whether the patient is suffering from DR based on the precise location and thickness of the vessels. However, vessel tracking is complex because fundus images contain many structures other than vessels. Numerous vessel segmentation methods have been proposed, which can be broadly divided into five categories: vascular tracking, matched filtering, morphological processing, deformation models, and machine learning.

2.1. Methods of Vascular Tracking

Vascular tracking methods exploit the continuous structure of vessels: they start at an initial point and follow the vessels until no further vessel pixels are found. The critical factor in this procedure is the choice of the initial point, as it affects the accuracy of vessel segmentation. Currently, the initial point can be set either manually or automatically.

The earliest adaptive vascular tracking method was proposed by Liu and Sun [7] in 1993 to extract the vasculature from X-ray angiograms. Given an initial point and direction within a vessel, the authors apply an “extrapolation-update” scheme that estimates local vessel trajectories. Once a vessel fragment has been tracked, it is removed from the image, and the procedure is repeated until the vascular tree has been extracted. The drawbacks of this strategy are that the user must set the vessel starting points and that the approach does not appear to extend to three-dimensional extraction. In 1999, an automatic vascular tracking method was developed by Can et al. [8]. This strategy mainly collects pixel-wide local intensity minima of vessels (usually located at the vessel center) to perform tracking. Vlachos and Dermatas [9] suggested a multiscale line tracking method with morphological postprocessing. Yin et al. [10] proposed a retinal vascular tree extraction method based on iterative tracking and a Bayesian method.

The advantage of vascular tracking is that it can provide local information about characteristics, such as the diameter/width and direction of vessels. However, the vascular tracking performance can be easily affected by crossing or branching of vessels, which reduces the identification efficiency.
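To make the tracking idea concrete, here is a deliberately simplified tracker on a grayscale grid: from a seed point and an initial heading, it repeatedly steps to the darkest forward neighbour until no sufficiently dark pixel remains. This is our own toy sketch, not the extrapolation-update scheme of Liu and Sun [7]; the function name and the `max_intensity` threshold separating vessel from background are hypothetical.

```python
def track_vessel(img, start, direction, max_intensity, max_steps=1000):
    """Greedy tracking of a dark vessel on a 2-D grayscale grid (list of rows).

    start: (row, col) seed inside a vessel; direction: (drow, dcol) heading,
    each component in {-1, 0, 1}. Vessels are assumed darker than background.
    Stops when no forward neighbour is at or below max_intensity.
    """
    rows, cols = len(img), len(img[0])
    path, visited = [start], {start}
    r, c = start
    dr, dc = direction
    for _ in range(max_steps):
        best = None
        for nr in (r - 1, r, r + 1):
            for nc in (c - 1, c, c + 1):
                step = (nr - r, nc - c)
                # Skip the centre, out-of-bounds and visited pixels, and any
                # step that turns back against the current heading.
                if (step == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols)
                        or (nr, nc) in visited
                        or step[0] * dr + step[1] * dc < 0):
                    continue
                if img[nr][nc] <= max_intensity and (
                        best is None or img[nr][nc] < img[best[0]][best[1]]):
                    best = (nr, nc)
        if best is None:
            break                      # no further vessel pixels: stop
        dr, dc = best[0] - r, best[1] - c
        r, c = best
        visited.add(best)
        path.append(best)
    return path


# A horizontal dark vessel on row 2 of a bright 5 x 6 image.
img = [[200] * 6 for _ in range(5)]
for col in range(6):
    img[2][col] = 50
print(track_vessel(img, (2, 0), (0, 1), max_intensity=100))
```

Because the tracker follows a single greedy path, at a crossing or bifurcation it simply picks one branch, which illustrates the weakness noted above.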

2.2. Methods of Matched Filtering

Matched filtering methods employ multiple matched filters for extraction, so designing proper filters is essential for detecting vessels. Since the gray-level cross-sectional profile of fundus vessels approximates a Gaussian curve, an intuitive method is to filter the image and use the maximum filter response to find vessel points. Because vessel diameters vary, multiscale Gaussian filters are often used for vessel detection.

In 1989, Chaudhuri et al. [11] pioneered the application of Gaussian filters to vessel detection by exploiting several vascular characteristics: vessels are darker than the background, vessel widths range from 2 to 10 pixels, and vessels grow radially from the optic disc. Accordingly, Chaudhuri et al. [11] designed two-dimensional Gaussian filters that detect vessels in 12 different directions. However, this method is computationally expensive, and some dark lesions resemble vessels, causing tracking errors. Hoover et al. [12] described an improved method that considers local and regional characteristics of vessels to separate blood vessels in retinal images and iteratively determines whether the current point is a vessel point.

Following this improvement, many studies of refined filters have been published. Jiang and Mojon [13] proposed a generalized thresholding method based on multithreshold detection. Zhang et al. [14] improved the matched filtering method by applying a local vessel cross-section analysis with local bilateral thresholding. Li et al. [15] suggested a multiscale product of matched filter responses to enhance the extraction of tiny vessels.
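The matched filter of Chaudhuri et al. [11] can be sketched as a bank of zero-mean kernels: across the vessel, the weight follows an inverted Gaussian (vessels are dark), and along the vessel it is constant over a short segment; the filter output at a pixel is the maximum response over the 12 orientations. In this plain-Python sketch, the values sigma = 2 and segment length 9, the function names, and the border handling are our assumptions rather than details given in this paper.

```python
import math


def matched_filter_kernels(sigma=2.0, length=9, n_angles=12):
    """Zero-mean Gaussian matched-filter kernels, one per orientation.

    Weight -exp(-u^2 / (2 sigma^2)) for |u| <= 3 sigma across the vessel and
    |v| <= length / 2 along it. Returns (angle_deg, {(dy, dx): weight}) pairs.
    """
    half = int(math.ceil(max(3 * sigma, length / 2)))
    kernels = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles              # 15-degree steps for 12
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        weights = {}
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                u = dx * cos_t + dy * sin_t         # across the vessel
                v = -dx * sin_t + dy * cos_t        # along the vessel
                if abs(u) <= 3 * sigma and abs(v) <= length / 2:
                    weights[(dy, dx)] = -math.exp(-u * u / (2 * sigma * sigma))
        mean = sum(weights.values()) / len(weights)
        # Subtracting the mean makes a flat background give zero response.
        kernels.append((180.0 * k / n_angles,
                        {p: wt - mean for p, wt in weights.items()}))
    return kernels


def max_response(img, y, x, kernels):
    """Largest response over all orientations at pixel (y, x)."""
    h, w = len(img), len(img[0])
    best = -math.inf
    for _, kern in kernels:
        total = 0.0
        for (dy, dx), wt in kern.items():
            yy = min(max(y + dy, 0), h - 1)         # replicate border pixels
            xx = min(max(x + dx, 0), w - 1)
            total += wt * img[yy][xx]
        best = max(best, total)
    return best


# A dark vertical vessel at column 10 of a bright 21 x 21 image.
img = [[200.0] * 21 for _ in range(21)]
for row in img:
    row[10] = 50.0
bank = matched_filter_kernels()
print(max_response(img, 10, 10, bank) > max_response(img, 10, 3, bank))  # True
```

Thresholding the maximum response map then yields candidate vessel pixels; evaluating 12 kernels at every pixel is also what makes the method computationally expensive.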

2.3. Methods of Morphological Processing

Morphological processing facilitates the segmentation and identification of target objects by analyzing the image with structuring elements. Thus, linear and circular structuring elements can be selected to match blood vessels, isolating the desired structures from the background. In addition, morphological processing can smooth and fill image contours and is robust to noise. However, this approach relies heavily on the structuring elements and does not make full use of vessel characteristics.

Exploiting vessel characteristics, Zana and Klein [16] introduced a mathematical morphology-based algorithm that separates the vessels from all possible undesirable patterns. Building on this approach, Ayala et al. [17] proposed using different average fuzzy sets. Miri and Mahloojifar [18] analyzed fundus images using the curvelet transform and morphological reconstruction with multistructure elements to enhance vessel boundaries and determine vessel ridges. Karthika and Marimuthu [19] combined the curvelet transform, morphological reconstruction with multistructure elements, and strongly connected component analysis (SCCA) to segment and identify vessels.
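As an illustration of how a structuring element isolates thin dark vessels from the background, the sketch below computes a grayscale black top-hat (closing minus image) with a flat square structuring element in plain Python. It is a generic morphological operator, not the specific algorithm of Zana and Klein [16] or the other cited works; the function names and the `radius` parameter are illustrative.

```python
def _flat_filter(img, radius, pick):
    """Apply pick (max or min) over a (2*radius+1)^2 square neighbourhood."""
    h, w = len(img), len(img[0])
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - radius), min(h, y + radius + 1))
                  for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, radius):
    return _flat_filter(img, radius, max)


def erode(img, radius):
    return _flat_filter(img, radius, min)


def black_tophat(img, radius):
    """Closing (dilation, then erosion) minus the original image.

    Dark structures narrower than the structuring element, such as thin
    vessels, are filled in by the closing and therefore stand out in the
    difference; wider dark regions and flat background cancel out.
    """
    closed = erode(dilate(img, radius), radius)
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(closed, img)]


# A one-pixel-wide dark vessel in column 3 of a bright 7 x 7 image.
img = [[200] * 7 for _ in range(7)]
for row in img:
    row[3] = 50
print(black_tophat(img, 1)[0])  # [0, 0, 0, 150, 0, 0, 0]
```

The choice of structuring-element size decides which vessel widths are enhanced, which is the dependence on structuring elements mentioned above.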

2.4. Methods of Deformation Models

Deformation models were first introduced by Kass et al. [20] in 1988; their key benefit is the ability to produce smooth parametric curves or surfaces. There are two categories of deformation models: parametric and geometric. Parametric deformation models are also called active contour or snake models (a set of points, each with an associated energy). Under the external and internal forces acting on the snake, the snake model changes its shape and smoothness toward the desired structure. In 2007, Espona et al. [21] applied a parametric deformation model to fundus images and proposed an improved method with morphological segmentation. With the assistance of morphological vessel segmentation, the snake expands toward the contour of the obtained vessels until the local energy function is minimized. Another deformation model method, called ribbon of twins (ROT), which combines ribbon snakes and double snakes, was proposed by Al-Diri and Hunter [22]. Each twin consists of two snakes, one inside and one outside the vessel edge. The model then attempts to integrate the pairs of twins on the vessel borders into a single ribbon and calculate the vessel width.

Parametric deformation models have several shortcomings. For instance, the segmentation results depend on the initial contour, and difficulties arise when extending them from low to high dimensions and when segmenting complex objects. Geometric deformable models solve these problems well. They are based on curve evolution theory and impose no strict requirement on the position of the initial contour, which increases the robustness of the method and allows it to be extended to high dimensions. Zhang et al. [23] proposed an automatic vessel segmentation method that uses nonlinear orthogonal projection to capture the characteristics of retinal blood vessels and derives an adaptive local thresholding algorithm for blood vessel segmentation. Zhao et al. [24] suggested a retinal vessel segmentation method that employs a region-based active contour model with a level set implementation and a region growing model.

2.5. Methods of Machine Learning

Machine learning refers to algorithms that enable computers to learn to achieve goals automatically by building generative or discriminative models from accumulated datasets. Machine learning can be divided into supervised learning and unsupervised learning. Supervised learning methods learn from ground truth: during the training stage, each training sample comes with a “label” that the machine learning algorithm uses to differentiate the data. For vessel segmentation, this means that supervised learning requires all vessel pixels to be marked in advance, whereas unsupervised learning does not.

For supervised learning, Cesar and Jelinek [30] and Leandro et al. [31] proposed a supervised classification method with two-dimensional Gabor wavelets. Each pixel has a feature vector consisting of its gray-scale value and the responses of two-dimensional Gabor wavelets at different scales. Ricci and Perfetti [25] proposed a retinal vessel segmentation method based on line operators and support vector classification. Since the features are extracted with two orthogonal lines, the method reduces the number of features and training samples needed for supervised learning. A supervised method using a neural network with one input layer, three hidden layers, and one output layer was proposed by Marin et al. [26]. Each pixel in the image is represented by a seven-dimensional feature vector to train the network. Shanmugam and Banu [27] used an extreme learning machine (ELM) to detect retinal vessels, creating a seven-dimensional feature vector based on gray-scale features and invariant moments and using the ELM to segment vessels. In 2015, Wang et al. [28] proposed a new hierarchical retinal vessel segmentation method with three steps: preprocessing, hierarchical feature extraction, and integrated classification. It uses simple linear iterative clustering (SLIC) to perform superpixel segmentation and randomly selects one pixel to represent the entire superpixel, which is an easier and more efficient means of extracting features.

For unsupervised learning, Tolias and Panas [32] created an automatic, unsupervised fuzzy segmentation method for fundus images in 1998, which used fuzzy C-means (FCM) to find initial candidate points. Xie and Nie [33] proposed a segmentation method based on a genetic algorithm and FCM. Salazar-Gonzalez et al. [29] used vector flow methods to segment retinal vessels.
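Fuzzy C-means, the clustering step used in [32, 33], assigns every pixel a degree of membership in each cluster instead of a hard label. The sketch below applies the standard FCM updates to one-dimensional intensities; the 1-D setting, the evenly spread initialisation, the fuzzifier m = 2, and the function name are our illustrative assumptions, not the settings of the cited works.

```python
import math


def fuzzy_c_means(xs, c=2, m=2.0, n_iter=100, tol=1e-9):
    """Fuzzy C-means on 1-D values (e.g., pixel intensities).

    Alternates the standard updates
      u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))   (memberships)
      c_i  = sum_j u_ij**m x_j / sum_j u_ij**m           (cluster centres)
    and returns (centres, memberships) with memberships[i][j] for point j.
    """
    lo, hi = min(xs), max(xs)
    # Deterministic start: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    u = []
    for _ in range(n_iter):
        u = []
        for i in range(c):
            row = []
            for x in xs:
                d_i = abs(x - centres[i])
                if d_i == 0:
                    row.append(1.0)        # point lies exactly on centre i
                    continue
                denom = 0.0
                for k in range(c):
                    d_k = abs(x - centres[k])
                    if d_k == 0:
                        denom = math.inf   # point belongs fully to centre k
                        break
                    denom += (d_i / d_k) ** (2 / (m - 1))
                row.append(0.0 if denom == math.inf else 1.0 / denom)
            u.append(row)
        new = [sum(uij ** m * x for uij, x in zip(u[i], xs))
               / sum(uij ** m for uij in u[i]) for i in range(c)]
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return centres, u


# Dark (vessel-like) and bright (background-like) intensities.
centres, memberships = fuzzy_c_means([10, 12, 11, 200, 198, 202])
print([round(v, 1) for v in centres])  # [11.0, 200.0]
```

Pixels with a high membership in the dark cluster would then serve as initial vessel candidates.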

Table 2 summarizes the performance of the existing methods discussed above.

Table 2: Performance comparison.

3. Literature Review of Diabetic Retinopathy Detection

Although extracting vessels before detecting DR with machine learning can achieve high accuracy, creating the ground-truth vessel labels is time-consuming. Another paradigm is to train the computer to learn to distinguish levels of DR directly from retinal images, without performing vessel segmentation. In 2000, Ege et al. [34] proposed an automatic analysis of DR using different statistical classifiers, including Bayesian, Mahalanobis, and k-nearest neighbor classifiers. Silberman et al. [35] introduced an automatic detection system for DR and reported an equal error rate of 87%. Karegowda et al. [36] detected exudates in retinal images using back-propagation neural networks (BPN). Their features were selected by two methods: decision trees and genetic algorithms with correlation-based feature selection (GA-CFS). In their experiment, the best BPN achieved 98.45% accuracy. Kavitha and Duraiswamy [37] studied automatic detection of hard and soft exudates in fundus images, using color histogram thresholding to classify exudates. Their experiments showed 99.07% accuracy, 89% sensitivity, and 99% specificity. In 2014, de la Calleja et al. [38] used local binary patterns (LBP) to extract local features and artificial neural networks, random forest (RF), and support vector machines (SVM) for detection. On a dataset of 71 images, their best result was 97.46% accuracy with RF.

4. Material and Methods

We propose an automatic DR detection algorithm based on DCNNs, fractional max-pooling [39], SVM [40], and teaching-learning-based optimization (TLBO) [41]. Specifically, we train two DCNNs with fractional max-pooling, combine their prediction results using an SVM, and optimize the SVM parameters with TLBO. We train two distinct networks because different network architectures may have unique advantages in feature space representation; by training two DCNNs and combining their features, the prediction accuracy can be further enhanced. Another important factor affecting the recognition rate is the choice of classifier parameters, which we optimize using TLBO. We describe the image preprocessing methods in Section 4.1 and present fractional max-pooling, SVM, and TLBO in Sections 4.2, 4.3, and 4.4, respectively.

4.1. Preprocessing

Because the vessels in the original fundus images are often unclear and image sizes differ, the images must be preprocessed so that they have the same size and the vessels are more visible. Preprocessing consists of three steps. First, the images are rescaled to the same size: since the fundus region is circular, we rescale each input image so that the diameter of the fundus becomes 540 pixels. Second, the local average color is subtracted from the rescaled image, and the result is shifted so that the local average maps to 50% gray, which removes the color divergence caused by different ophthalmoscopes. Finally, because boundary effects may occur in some images, we remove the periphery by clipping 10% from the border of the image. Figure 4 shows original fundus images and the corresponding images after preprocessing.

Figure 4: Image before and after the preprocessing stage. (a, b) images before preprocessing stage; (c, d) images after preprocessing stage.
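The three preprocessing steps can be sketched on a single color channel in plain Python (each RGB channel would be processed the same way). Several details are our assumptions, not specifications from this paper: a box filter stands in for the local-average filter, the contrast `gain` of 4 is a hypothetical value, "clipping 10%" is interpreted as keeping the inner 90% of the fundus radius and filling the rest with 50% gray, and the actual resampling (which needs an image library) is reduced to computing the scale factor. All function names are hypothetical.

```python
import math


def rescale_factor(fundus_diameter_px, target_diameter_px=540):
    """Factor that resizes the circular fundus region to 540 px across."""
    return target_diameter_px / fundus_diameter_px


def local_mean(img, radius):
    """Box-filter local average of one channel (list of rows)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def subtract_local_average(img, radius, gain=4.0):
    """Subtract the local average and map that average to 50% gray (128)."""
    mean = local_mean(img, radius)
    return [[min(255.0, max(0.0, 128.0 + gain * (p - m)))
             for p, m in zip(prow, mrow)] for prow, mrow in zip(img, mean)]


def clip_periphery(img, keep_fraction=0.9, fill=128.0):
    """Fill pixels outside keep_fraction of the fundus radius with gray."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_keep = keep_fraction * min(h, w) / 2
    return [[p if math.hypot(y - cy, x - cx) <= r_keep else fill
             for x, p in enumerate(row)] for y, row in enumerate(img)]


# A slightly brighter spot on a uniform background becomes brighter than 128.
spot = [[100.0] * 9 for _ in range(9)]
spot[4][4] = 110.0
print(subtract_local_average(spot, 1)[4][4] > 128)  # True
```

Mapping every local average to the same gray level is what removes the global color cast, while the gain amplifies local contrast such as vessel edges.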
4.2. Fractional Max-Pooling

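Pooling is an operation that turns an input matrix into a smaller output matrix. It divides the $N_{\text{in}} \times N_{\text{in}}$ input matrix into $N_{\text{out}} \times N_{\text{out}}$ pooling regions $P_{i,j}$:

$$P_{i,j} \subset \{1, 2, \ldots, N_{\text{in}}\}^2, \quad i, j \in \{1, 2, \ldots, N_{\text{out}}\}. \tag{1}$$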

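The pooling results are computed according to the pooling type, where $P_{i,j}$ is the $(i, j)$th pooling region:

$$\text{output}_{i,j} = \operatorname{Oper}_{(k,l) \in P_{i,j}} \text{input}_{k,l}. \tag{2}$$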

In equation (2), “Oper” refers to a particular mathematical operation. For example, max-pooling takes the maximum of the input region, and average pooling takes its average. Deep networks that must learn complex features benefit from many layers, but standard 2 × 2 max-pooling halves the spatial size at every pooling layer, which limits how many such layers a network can have. Therefore, the pooling layers in our networks use fractional max-pooling instead of standard max-pooling.

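Fractional max-pooling is a pooling scheme in which the output size is a fractional multiple of the input size: for an $N_{\text{in}} \times N_{\text{in}}$ input, the output is $N_{\text{out}} \times N_{\text{out}}$ with $N_{\text{out}} = N_{\text{in}}/\alpha$ and $1 < \alpha < 2$. To describe the general pooling regions, let $(a_i)_{i=0}^{N_{\text{out}}}$ and $(b_j)_{j=0}^{N_{\text{out}}}$ be two increasing integer sequences starting at 1 and ending at $N_{\text{in}}$, with every increment equal to 1 or 2. These two sequences define the pooling regions, as described in Figures 5 and 6.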

Figure 5: An overlapping example. The blue solid line indicates pooling region while the red dotted line shows the .
Figure 6: Different pooling types when both and equal to one. The left side shows the equations for pooling condition and computation of pooling region. Pictures on the right side indicate regions before and after pooling.

The constants, and in equation (3), stand for the overlapping length and the width of the pooling window, respectively. Figure 5 is a simple example of overlapping pooling. Figure 6 illustrates different pooling region types.

In fractional max-pooling, each pooling window still has an integer size, but the overall reduction factor becomes fractional. In other words, fractional max-pooling does not scale the pooling window itself to a fractional size; instead, it uses windows of variable size to achieve fractional pooling. The $a_i$ and $b_j$ sequences can be generated randomly or pseudorandomly. Pseudorandom sequences generate more stable pooling regions than random sequences and can also achieve higher accuracy [39].
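The boundary sequences can be generated as described by Graham [39]: pseudorandomly through $a_i = \lceil \alpha (i + u) \rceil$ for a fixed $u \in (0, 1)$, or randomly by shuffling the increments of 1 and 2. This plain-Python sketch uses 0-based, half-open regions, so each sequence runs from 0 to $N_{\text{in}}$ instead of from 1 to $N_{\text{in}}$, and it subtracts $\lceil \alpha u \rceil$ so the sequence starts at 0; the function names are our own. Deep learning frameworks provide optimized layers for this (for example, PyTorch's `torch.nn.FractionalMaxPool2d`).

```python
import math
import random
from fractions import Fraction


def pseudorandom_boundaries(n_in, n_out, u):
    """0 = a_0 < a_1 < ... < a_{n_out} = n_in, every step equal to 1 or 2.

    a_i = ceil(alpha * (i + u)) - ceil(alpha * u) with alpha = n_in / n_out,
    computed with exact fractions so the last boundary is exactly n_in.
    """
    if not n_out < n_in <= 2 * n_out:
        raise ValueError("need 1 < n_in / n_out <= 2")
    alpha = Fraction(n_in, n_out)
    u = Fraction(u)
    offset = math.ceil(alpha * u)
    return [math.ceil(alpha * (i + u)) - offset for i in range(n_out + 1)]


def random_boundaries(n_in, n_out, rng):
    """Same constraints, but the 1-steps and 2-steps are shuffled randomly."""
    steps = [2] * (n_in - n_out) + [1] * (2 * n_out - n_in)
    rng.shuffle(steps)
    bounds = [0]
    for s in steps:
        bounds.append(bounds[-1] + s)
    return bounds


def fractional_max_pool(x, rows, cols, overlapping=False):
    """Max over regions [rows[i-1], rows[i]) x [cols[j-1], cols[j]).

    With overlapping=True the closing row/column is included as well, so
    neighbouring regions share one line of pixels (clipped at the border).
    """
    extra = 1 if overlapping else 0
    n_r, n_c = len(x), len(x[0])
    return [[max(x[r][c]
                 for r in range(rows[i - 1], min(rows[i] + extra, n_r))
                 for c in range(cols[j - 1], min(cols[j] + extra, n_c)))
             for j in range(1, len(cols))]
            for i in range(1, len(rows))]


# Shrink a 25 x 25 input to 18 x 18 (alpha of about 1.39).
x = [[r * 25 + c for c in range(25)] for r in range(25)]
a = pseudorandom_boundaries(25, 18, u=0.37)
b = random_boundaries(25, 18, random.Random(0))
y = fractional_max_pool(x, a, b)
print(len(y), len(y[0]))  # 18 18
```

Exact fractions are used because floating-point rounding in $\alpha(i + u)$ could otherwise push the final boundary away from $N_{\text{in}}$.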

4.3. Support Vector Machine (SVM)

SVM is a supervised learning method used for classification and regression analysis. SVM finds the hyperplane, or decision boundary, defined by the solution vector $\mathbf{w}$, which not only separates the training vectors but also generalizes well to unseen test data. To improve its generalization ability, SVM selects the decision boundary that maximizes the margin between classes.

Figure 7 illustrates the idea. Suppose there are $n$ points in a binary dataset:

$$\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\ y_i \in \{1, -1\}\}_{i=1}^{n}, \tag{4}$$

where $y_i$ is the data label, which can be 1 or −1, indicating the class to which $\mathbf{x}_i$ belongs. We need to find the optimal hyperplane such that the distance between the hyperplane and its nearest point is maximized. A hyperplane can be written in terms of $\mathbf{x}$ as

$$\mathbf{w} \cdot \mathbf{x} - b = 0, \tag{5}$$

where $\mathbf{w}$ is the normal vector of the hyperplane, and $b/\|\mathbf{w}\|$ determines the offset of the hyperplane from the origin along the normal vector $\mathbf{w}$.

Figure 7: Example of decision boundary hyperplane with two classes of samples.

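For $\mathbf{x}_i$ whose label $y_i$ is 1, the data must satisfy $\mathbf{w} \cdot \mathbf{x}_i - b \ge 1$, and for $\mathbf{x}_i$ whose label $y_i$ is −1, $\mathbf{w} \cdot \mathbf{x}_i - b \le -1$ must be satisfied. Combining these two conditions, we get

$$y_i (\mathbf{w} \cdot \mathbf{x}_i - b) \ge 1, \quad i = 1, \ldots, n. \tag{6}$$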

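The two margin boundaries $\mathbf{w} \cdot \mathbf{x} - b = \pm 1$ are $2/\|\mathbf{w}\|$ apart, so the goal is to maximize $2/\|\mathbf{w}\|$ (equivalently, to minimize $\|\mathbf{w}\|$) subject to the constraint of equation (6), which yields the optimal decision hyperplane for classification.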

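Sometimes, the training data cannot be perfectly separated by a linear boundary. Therefore, the SVM formulation introduces slack (error) variables $\xi_i$ and a cost parameter $C$, as shown in equation (7). The goal now becomes to minimize

$$\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \tag{7}$$

subject to

$$y_i (\mathbf{w} \cdot \mathbf{x}_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n. \tag{8}$$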

The performance of SVM is influenced by two main parameters. The first is $C$, the tunable parameter in equation (7). The other is $\gamma$, which is used in the radial basis function (RBF) kernel that maps data into a higher-dimensional space before training and classification. The RBF kernel can be defined as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right), \quad \gamma = \frac{1}{2\sigma^2}, \tag{9}$$

where $\sigma$ denotes the width of the Gaussian envelope in the high-dimensional feature space.
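To show what the soft-margin objective in equation (7) does in practice, the sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent on the equivalent objective $\frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i\,\mathbf{w}\cdot\mathbf{x}_i)$ with $\lambda = 1/(nC)$, and also implements the RBF kernel of equation (9). This is a swapped-in, primal-only training method chosen for brevity; it is not the LIBSVM dual solver or the RBF multiclass SVM used in this work, and the function names are our own.

```python
import math
import random


def rbf_kernel(x, z, gamma):
    """RBF kernel of equation (9): exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_linear_svm(X, y, C=1.0, epochs=200, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic subgradient steps.

    Minimizes (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * w.x_i), which is
    equation (7) divided by n * C when lam = 1 / (n * C). A constant 1 is
    appended to every x, so the last weight acts as the bias (the role of -b
    in equation (5)); unlike equation (7), this bias is regularized too.
    """
    rng = random.Random(seed)
    n = len(X)
    lam = 1.0 / (n * C)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularizer step
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Class label (+1 or -1) from the sign of w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y, C=1.0)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A smaller $C$ (larger $\lambda$) tolerates more margin violations in exchange for a smaller $\|\mathbf{w}\|$, which is the trade-off that $C$ controls in equation (7).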

4.4. Teaching-Learning-Based Optimization (TLBO)

TLBO, an evolution-based optimization algorithm, was proposed by Rao et al. [41] in 2011. It is inspired by the learning process in which a group or class of learners learns a target task. There are two ways of learning in a class: (1) learning from the guidance of the teacher and (2) learning from other learners. Accordingly, the TLBO procedure is divided into two phases, described in Sections 4.4.1 and 4.4.2.

4.4.1. Teacher Phase

In the whole population, the teacher ($X_{\text{teacher}}$) can be considered the best solution, and learners learn from the teacher in the teacher phase. In this phase, the teacher strives to improve the results of the other individuals ($X_i$) by increasing the mean result of the class, $M$. This can be described as moving $M$ toward $X_{\text{teacher}}$. To maintain the stochastic nature of the optimization process, two randomly generated parameters, $r$ and $T_F$, are applied in each iteration to update each solution:

$$X_i^{\text{new}} = X_i^{\text{old}} + r \left( X_{\text{teacher}} - T_F M \right), \tag{10}$$

$$T_F = \operatorname{round}\left[ 1 + \operatorname{rand}(0, 1) \right]. \tag{11}$$

In equation (10), $r$ is a random number in the range [0, 1], and $X_i^{\text{new}}$ and $X_i^{\text{old}}$ are the new and existing solutions of learner $i$ at the current iteration, respectively. $T_F$ in equation (11) is the teaching factor, which can be either 1 or 2.

4.4.2. Learner Phase

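Learners also gain knowledge by interacting with each other: an individual learns new information if another individual has more knowledge. In this phase, learner $X_i$ interacts with a randomly selected learner $X_j$ ($j \ne i$) to improve its knowledge:

$$X_i^{\text{new}} = \begin{cases} X_i^{\text{old}} + r \left( X_j - X_i \right), & f(X_j) < f(X_i), \\ X_i^{\text{old}} + r \left( X_i - X_j \right), & \text{otherwise}. \end{cases} \tag{12}$$

Equation (12) shows that if $X_j$ is better than $X_i$ (i.e., $f(X_j) < f(X_i)$ for minimization problems), $X_i$ is moved toward $X_j$; otherwise, it is moved away from $X_j$.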

If a learner's new solution is better than its old one, the new solution replaces the old one, and the best solution found so far is updated accordingly. After all learners have been updated, a new iteration begins. A stopping criterion, based on the number of iterations or the change in the cost function, ends the iterations. The flowchart of TLBO is shown in Figure 8.

Figure 8: TLBO flowchart.
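The two TLBO phases can be sketched compactly in plain Python, following the update rules (10)-(12) with greedy acceptance and clipping to the search bounds. The function name, the scalar random factor per update, and the test functions are our assumptions. In this paper's setting, `cost_fn` would be the validation error of an SVM trained with a candidate $(C, \gamma)$ searched within [0, 100], using 50 learners and 50 iterations; since $C$ and $\gamma$ must be positive, a lower bound slightly above 0 would be needed in practice.

```python
import random


def tlbo(cost_fn, bounds, n_learners=50, n_iter=50, seed=0):
    """Teaching-learning-based optimization (minimization) within bounds.

    bounds: list of (lower, upper) pairs, one per dimension.
    Returns (best_solution, best_cost).
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its [lower, upper] bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_learners)]
    cost = [cost_fn(x) for x in pop]

    def try_replace(i, cand):
        # Greedy acceptance: keep the new solution only if it is better.
        cand = clip(cand)
        c = cost_fn(cand)
        if c < cost[i]:
            pop[i], cost[i] = cand, c

    for _ in range(n_iter):
        # Teacher phase, eqs. (10)-(11): move learners toward the teacher.
        teacher = pop[min(range(n_learners), key=cost.__getitem__)]
        mean = [sum(x[d] for x in pop) / n_learners for d in range(dim)]
        for i in range(n_learners):
            r = rng.random()
            tf = rng.randint(1, 2)           # teaching factor: 1 or 2
            try_replace(i, [pop[i][d] + r * (teacher[d] - tf * mean[d])
                            for d in range(dim)])
        # Learner phase, eq. (12): learn from a randomly chosen peer.
        for i in range(n_learners):
            j = rng.choice([k for k in range(n_learners) if k != i])
            r = rng.random()
            sign = 1.0 if cost[j] < cost[i] else -1.0   # toward or away
            try_replace(i, [pop[i][d] + r * sign * (pop[j][d] - pop[i][d])
                            for d in range(dim)])

    best = min(range(n_learners), key=cost.__getitem__)
    return pop[best], cost[best]


# Usage on a toy quadratic with its minimum at (30, 60) inside [0, 100]^2.
x_best, c_best = tlbo(lambda p: (p[0] - 30) ** 2 + (p[1] - 60) ** 2,
                      [(0, 100), (0, 100)], n_learners=50, n_iter=100)
print([round(v, 2) for v in x_best])
```

Because every update is accepted only when it lowers the cost, the best cost never increases from one iteration to the next.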

5. Results

Our fundus images come from the database provided for the Kaggle competition entitled “Identify signs of diabetic retinopathy in eye images” [42]. This database contains about 90,000 images. We set aside 1,000 images from the training set as the validation set. Details of each dataset are shown in Table 3, and our two network architectures are shown in Figure 9.

Table 3: Detailed information of each dataset.
Figure 9: Architecture of the two DCNN networks that we used.

Our proposed method uses two DCNNs with fractional max-pooling layers. For every input fundus image, the DCNNs output a vector of size 1 × 5, representing the predicted probability distribution over the DR levels (categories). This probability distribution, together with other values, forms a 24-dimensional feature vector. The features are as follows:
(i) DCNN probabilities of each DR level (5 features)
(ii) Averages of the R, G, and B channel values within the 50% × 50% center-cropped image (3 features)
(iii) Width and height of the 50% × 50% center-cropped image (2 features)
(iv) Overall standard deviations of the original image and the 50% × 50% center-cropped image, after Laplacian filtering (2 features)
In total, there are 12 features for one fundus image. We then append the 12 features from the fundus image of the other eye of the same subject, so the overall feature vector for one fundus image has 24 dimensions. These 24-dimensional feature vectors are used as the input to the SVM.
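The feature assembly described above can be sketched as follows. Several details are not specified in the text and are assumptions here: images are lists of rows of (R, G, B) tuples, grayscale is the channel mean, the Laplacian is the 4-neighbour stencil applied to interior pixels, item (iv) is read as the standard deviations of the Laplacian-filtered original and center-cropped images, and `probs` is taken as the five DCNN probabilities already combined in whatever way the networks' outputs are merged. All function names are hypothetical.

```python
import statistics


def center_crop(img, frac=0.5):
    """Central frac x frac region of an image given as a list of rows."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def to_gray(rgb_img):
    """Channel mean of each (R, G, B) pixel."""
    return [[(r + g + b) / 3 for r, g, b in row] for row in rgb_img]


def laplacian(gray):
    """4-neighbour Laplacian on the interior pixels of a grayscale image."""
    h, w = len(gray), len(gray[0])
    return [[gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
             - 4 * gray[y][x]
             for x in range(1, w - 1)] for y in range(1, h - 1)]


def flat_std(img):
    """Population standard deviation of all values in a 2-D list."""
    return statistics.pstdev([v for row in img for v in row])


def eye_features(probs, rgb_img):
    """12 features: 5 probabilities, 3 channel means, 2 sizes, 2 stds."""
    if len(probs) != 5:
        raise ValueError("expected five class probabilities")
    crop = center_crop(rgb_img)
    pixels = [p for row in crop for p in row]
    means = [sum(p[k] for p in pixels) / len(pixels) for k in range(3)]
    size = [len(crop[0]), len(crop)]                       # width, height
    stds = [flat_std(laplacian(to_gray(rgb_img))),         # original image
            flat_std(laplacian(to_gray(crop)))]            # center crop
    return list(probs) + means + size + stds


def subject_features(probs_eye, img_eye, probs_other, img_other):
    """24-D SVM input: this eye's 12 features, then the other eye's 12."""
    return (eye_features(probs_eye, img_eye)
            + eye_features(probs_other, img_other))


# Usage with a tiny synthetic 8 x 8 image for both eyes.
img = [[(10.0 * y, 20.0, 30.0) for x in range(8)] for y in range(8)]
probs = [0.6, 0.1, 0.1, 0.1, 0.1]
print(len(subject_features(probs, img, probs, img)))  # 24
```

Appending the other eye's features lets the classifier exploit the fact that DR usually affects both eyes of the same subject.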

The 24-dimensional vector is used to train a multiclass (five-class) SVM whose parameters are optimized using TLBO. We implemented the method described in [39] and used it as the baseline. The baseline system uses similar features with an ensemble classifier, a random forest (RF).

We used the validation set to optimize the SVM parameters (C and γ) with TLBO. The search range of each parameter is [0, 100]. We ran 50 iterations with 50 learners (students).

Our final accuracy for the five-class DR classification task is 86.17%, and the accuracy for the binary classification task is 91.05%. The labels for five-class classification are normal, NPDR level 1, NPDR level 2, NPDR level 3, and PDR, while the labels for binary classification are normal and abnormal. For binary classification, the sensitivity is 0.8930 and the specificity is 0.9089. Besides accuracy, we also performed a t-test on the binary classification results. The t-test, also called Student's t-test, is a statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It is commonly used to determine whether there is a significant difference between the means of two groups of data. A paired-samples t-test between the binary classification results and random judgment, at the 5% significance level, returned 1 for the hypothesis test result, a p value of effectively zero, and a confidence interval of [0.3934, 0.4033].

The hypothesis test result indicates whether the null hypothesis, that the mean difference between the two paired sets of results is zero, is rejected. A value of 0 means that the null hypothesis cannot be rejected at the chosen significance level, whereas a value of 1 means that it is rejected, i.e., the two sets of results differ significantly. The p-value is the probability, assuming the null hypothesis is true, of observing a difference at least as extreme as the one measured. The smaller the p-value, the stronger the evidence that the two sets of results differ.
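A paired-samples t-test of the kind described above can be sketched with the Python standard library. Two assumptions should be stated plainly. First, the paired samples are taken to be per-image correctness indicators (1 = correct, 0 = incorrect) of the classifier and of a random judgment, which the paper does not spell out. Second, because the standard library has no Student's t quantile function, the p-value and confidence interval use the normal approximation, which is very close to the exact t-distribution for a test set as large as the one used here; for small samples, an exact routine such as SciPy's `scipy.stats.ttest_rel` is preferable.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_test_large_n(
    a: Sequence[float], b: Sequence[float], alpha: float = 0.05
) -> Tuple[int, float, Tuple[float, float]]:
    """Return (h, p, ci) for H0: mean(a - b) == 0, using a large-sample normal approximation."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    d = [x - y for x, y in zip(a, b)]
    mean_d = sum(d) / len(d)
    se = statistics.stdev(d) / math.sqrt(len(d))  # standard error of the mean difference
    if se == 0:
        raise ValueError("paired differences have zero variance")
    t = mean_d / se
    z = statistics.NormalDist()
    p = 2.0 * (1.0 - z.cdf(abs(t)))  # two-sided p-value
    crit = z.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    ci = (mean_d - crit * se, mean_d + crit * se)
    h = 1 if p < alpha else 0  # 1 = reject H0 at level alpha
    return h, p, ci
```

When one method is correct far more often than the other, the mean difference is large relative to its standard error, so p is tiny and h is 1; when the two methods are right equally often, p is large and h is 0.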

We also designed an app called “Deep Retina” for personal examination, remote medical care, and early screening. Figure 10 shows the app interfaces. After the user chooses a fundus image to check, the app sends the image to our server, where it is classified by the proposed machine learning method. It takes about 10 s (depending on network speed) to obtain the result, which is presented as the probability of each lesion level. With a handheld device, individuals can perform an initial examination at a district office or even at home. More importantly, the app can benefit remote areas that lack medical resources.

Figure 10: App interfaces.

6. Discussion

6.1. Accuracy Improvement

Table 4 compares the accuracy of different classifiers and parameter optimization methods on each dataset. With the default parameters (without optimization), the SVM already achieves higher accuracy than the RF baseline [39] on both the validation and test sets. If we instead optimize the parameters with the default parameter search method provided in the LIBSVM software package [40], the accuracy in the five-fold cross-validation experiment is very high, but the validation and test accuracies are even lower than those obtained with the default parameters. From this result, we believe that optimizing the SVM parameters in this way leads to overfitting.

Table 4: Accuracy comparison when using different classifiers and different parameter optimization methods on each dataset.

Table 5 shows the confusion matrix of the classification results from the two DCNNs (before SVM classification). Network 1 is the architecture shown on the left side of Figure 9, and network 2 is the one on the right side. Table 5 shows that lesion levels 0 and 2 are classified better than the other categories. For lesion level 1, most of the predictions are incorrect, and for lesion levels 3 and 4, most images are misclassified as lesion level 2.
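Reading a confusion matrix in the way described above (per-class accuracy and the predicted class that absorbs most errors) can be sketched as follows. Rows are taken as true lesion levels and columns as predicted levels, which is an assumed convention; the labels in the test are toy values chosen to mimic the described pattern, not data from Table 5.

```python
from typing import List, Optional, Sequence

N_CLASSES = 5  # lesion levels 0-4


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n: int = N_CLASSES) -> List[List[int]]:
    """m[t][p] counts images of true level t predicted as level p."""
    m = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def per_class_accuracy(m: List[List[int]]) -> List[float]:
    """Diagonal count divided by row total (0.0 for an empty row)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]


def dominant_error(m: List[List[int]]) -> List[Optional[int]]:
    """For each true level, the wrong prediction with the highest count (None if no errors)."""
    out: List[Optional[int]] = []
    for i, row in enumerate(m):
        wrong = [(j, c) for j, c in enumerate(row) if j != i and c > 0]
        # max() keeps the first (lowest-index) class among ties
        out.append(max(wrong, key=lambda jc: jc[1])[0] if wrong else None)
    return out
```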

Table 5: Confusion matrix of the classification results from the two DCNN networks, before performing SVM classification.

Table 6 shows the confusion matrix of the classification results obtained with the full procedure of the proposed method (SVM with TLBO). Table 7 lists the differences between Tables 5 and 6, which serve as a performance comparison between the two methods (DCNN only versus DCNN + SVM + TLBO). For network 1, every accuracy increases except those of class 0 and the overall accuracy. For network 2, every accuracy except that of class 3 increases; the decline for class 3 is mainly caused by class 3 images being misclassified as class 2. For comparison, Table 8 shows the confusion matrix of the classification results using the baseline method reported in [39].
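Table 7 is an elementwise difference of two confusion matrices. A minimal sketch of that comparison, assuming both matrices use rows for true classes and columns for predictions; the function names are hypothetical, and the 2 × 2 matrices in the test are toy values rather than the paper's tables (the same code works for the 5 × 5 case).

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matrix_difference(after: Matrix, before: Matrix) -> Matrix:
    """Elementwise after - before, e.g., (DCNN + SVM + TLBO) minus (DCNN only)."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(after, before)]


def accuracy_deltas(after: Matrix, before: Matrix) -> Tuple[List[float], float]:
    """Change in per-class accuracy (diagonal / row total) and in overall accuracy."""

    def per_class(m: Matrix) -> List[float]:
        return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]

    def overall(m: Matrix) -> float:
        return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))

    deltas = [x - y for x, y in zip(per_class(after), per_class(before))]
    return deltas, overall(after) - overall(before)
```

A positive diagonal entry in the difference matrix means more images of that class are classified correctly after SVM + TLBO; a negative off-diagonal entry means fewer images are confused with that predicted class.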

Table 6: Confusion matrix of the classification results using the full procedure of the proposed method (using SVM with TLBO).

Table 7: Differences between Tables 5 and 6.

Table 8: Confusion matrix of the classification using the baseline method [39] and its overall accuracy.

6.2. Deep Learning vs Traditional Classification Methods

Many traditional classification methods tackle DR detection by (1) using image processing to capture symptoms in fundus images and then (2) building a classifier that makes decisions based on the detected symptoms. The shortcoming of this approach is that the manifestations of the symptoms vary unpredictably across images, so labeling the locations of the symptoms is extremely time-consuming and labor-intensive. Following the philosophy that has come with the emergence of deep learning, our proposed method learns to make decisions directly from the image data. Unlike these approaches, our images only need to be labeled with a lesion level instead of symptom locations, which saves considerable time during database preprocessing. On top of the classification results of the two DCNNs, we use an SVM optimized by TLBO to further improve the outcome, achieving 86.17% accuracy. This result is better than that of the first-place winner of the Kaggle competition, which indicates that our method achieves state-of-the-art performance.

6.3. Limitation

A limitation of the proposed method is that our current datasets do not contain enough images of lesion levels 3 and 4 to train a network. Therefore, one direction for future work is to build closer collaborations with hospitals and clinics to acquire more data for lesion levels 3 and 4; with more data, we believe the classification accuracy will increase further. In addition, our results show that it is hard to differentiate between images of lesion levels 0 and 1. Therefore, when collecting new data, it is also desirable to collect more images of lesion levels 0 and 1. We may also try a different network architecture for this problem.

7. Conclusion

It is feasible to train a deep learning model for automatic diagnosis of DR, as long as enough data are available for statistical model training. Furthermore, the database preparation stage only needs a categorical label for each training image; it does not require detailed annotation, such as retinal vessel tracking, in every image. Hence, it is more time-efficient than traditional machine learning-based methods for automatic diagnosis of DR. The final accuracies are 86.17% and 91.05% for five-class and binary classification, respectively.

The sensitivity and specificity of binary classification are 0.8930 and 0.9089, respectively, which is a satisfactory result. Furthermore, we developed an automatic inspection app that can be used for both personal examination and remote medical care. As more image data are collected, we expect the accuracy to improve further.

Data Availability

All experiment data come from a Kaggle contest called “Identify signs of diabetic retinopathy in eye images” (website:

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank Mr. Chia-Ming Hu, for his help in preparing computational instruments and performing parts of the experiments in this paper.

References
  1. N. H. Cho, J. E. Shaw, S. Karuranga et al., “IDF Diabetes Atlas: global estimates of diabetes prevalence for 2017 and projections for 2045,” Diabetes Research and Clinical Practice, vol. 138, pp. 271–281, 2018.
  2. American Diabetes Association, Eye Complications, American Diabetes Association, Arlington, VA, USA, 2013,
  3. International Diabetes Federation (IDF), Diabetes Atlas, International Diabetes Federation, Brussels, Belgium, 7th edition, 2015.
  4. C. P. Wilkinson, F. L. Ferris, R. E. Klein et al., “Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales,” Ophthalmology, vol. 110, no. 9, pp. 1677–1682, 2003.
  5. A. Tufail, C. Rudisill, C. Egan et al., “Automated diabetic retinopathy image assessment software: diagnostic accuracy and cost-effectiveness compared with human graders,” Ophthalmology, vol. 124, no. 3, pp. 343–351, 2017.
  6. A. Tufail, V. V. Kapetanakis, S. Salas-Vega et al., “An observational study to assess if automated diabetic retinopathy image assessment software can replace one or more steps of manual imaging grading and to determine their cost-effectiveness,” Health Technology Assessment, vol. 20, no. 92, pp. 1–72, 2016.
  7. I. Liu and Y. Sun, “Recursive tracking of vascular networks in angiograms based on the detection-deletion scheme,” IEEE Transactions on Medical Imaging, vol. 12, no. 2, pp. 334–341, 1993.
  8. A. Can, H. Shen, J. N. Turner, H. L. Tanenbaum, and B. Roysam, “Rapid automated tracing and feature extraction from retinal fundus images using direct exploratory algorithms,” IEEE Transactions on Information Technology in Biomedicine, vol. 3, no. 2, pp. 125–138, 1999.
  9. M. Vlachos and E. Dermatas, “Multi-scale retinal vessel segmentation using line tracking,” Computerized Medical Imaging and Graphics, vol. 34, no. 3, pp. 213–227, 2010.
  10. Y. Yin, M. Adel, and S. Bourennane, “An automatic tracking method for retinal vascular tree extraction,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, Kyoto, Japan, March 2012.
  11. S. Chaudhuri, S. Chatterjee, N. Katz, M. Nelson, and M. Goldbaum, “Detection of blood vessels in retinal images using two-dimensional matched filters,” IEEE Transactions on Medical Imaging, vol. 8, no. 3, pp. 263–269, 1989.
  12. A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,” IEEE Transactions on Medical Imaging, vol. 19, no. 3, pp. 203–210, 2000.
  13. X. Jiang and D. Mojon, “Adaptive local thresholding by verification-based multithreshold probing with application to vessel detection in retinal images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 131–137, 2003.
  14. L. Zhang, Q. Li, J. You, and D. Zhang, “A modified matched filter with double-sided thresholding for screening proliferative diabetic retinopathy,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 528–534, 2009.
  15. Q. Li, J. You, and D. Zhang, “Vessel segmentation and width estimation in retinal images using multiscale production of matched filter responses,” Expert Systems with Applications, vol. 39, no. 9, pp. 7600–7610, 2012.
  16. F. Zana and J. C. Klein, “Segmentation of vessel-like patterns using mathematical morphology and curvature evaluation,” IEEE Transactions on Image Processing, vol. 10, no. 7, pp. 1010–1019, 2001.
  17. G. Ayala, T. León, and V. Zapater, “Different averages of a fuzzy set with an application to vessel segmentation,” IEEE Transactions on Fuzzy Systems, vol. 13, no. 3, pp. 384–393, 2005.
  18. M. S. Miri and A. Mahloojifar, “Retinal image analysis using curvelet transform and multistructure elements morphology by reconstruction,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 5, pp. 1183–1192, 2011.
  19. D. Karthika and A. Marimuthu, “Retinal image analysis using contourlet transform and multistructure elements morphology by reconstruction,” in Proceedings of the World Congress on Computing and Communication Technologies, IEEE, Tiruchirappalli, India, 2014.
  20. M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: active contour models,” International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988.
  21. L. Espona, M. J. Carreira, M. Ortega et al., “A snake for retinal vessel segmentation,” in Proceedings of the 3rd Iberian Conference on Pattern Recognition and Image Analysis, Springer-Verlag Berlin and Heidelberg GmbH & Co., KG, Girona, Spain, June 2007.
  22. B. Al-Diri and A. Hunter, “A ribbon of twins for extracting vessel boundaries,” in Proceedings of the 3rd European Medical and Biological Engineering Conference, Prague, Czech Republic, November 2005.
  23. Y. Zhang, W. Hsu, and M. L. Lee, “Detection of retinal blood vessels based on nonlinear projections,” Journal of Signal Processing Systems, vol. 55, no. 1–3, pp. 103–112, 2009.
  24. Y. Q. Zhao, X. H. Wang, X. F. Wang, and F. Y. Shih, “Retinal vessels segmentation based on level set and region growing,” Pattern Recognition, vol. 47, no. 7, pp. 2437–2446, 2014.
  25. E. Ricci and R. Perfetti, “Retinal blood vessel segmentation using line operators and support vector classification,” IEEE Transactions on Medical Imaging, vol. 26, no. 10, pp. 1357–1365, 2007.
  26. D. Marin, A. Aquino, M. E. Gegúndez-Arias, and J. M. Bravo, “A new supervised method for blood vessel segmentation in retinal images by using gray-level and moment invariants-based features,” IEEE Transactions on Medical Imaging, vol. 30, no. 1, pp. 146–158, 2011.
  27. V. Shanmugam and R. S. D. W. Banu, “Retinal blood vessel segmentation using an extreme learning machine approach,” in Proceedings of the 2013 IEEE Point-of-Care Healthcare Technologies, IEEE, Bangalore, India, January 2013.
  28. S. Wang, Y. Yin, G. Cao, B. Wei, Y. Zheng, and G. Yang, “Hierarchical retinal blood vessel segmentation based on feature and ensemble learning,” Neurocomputing, vol. 149, pp. 708–717, 2015.
  29. A. Salazar-Gonzalez, D. Kaba, Y. Li, and X. Liu, “Segmentation of the blood vessels and optic disk in retinal images,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 6, pp. 1874–1886, 2014.
  30. R. M. Cesar Jr. and H. F. Jelinek, “Segmentation of retinal fundus vasculature in nonmydriatic camera images using wavelets,” in Angiography and Plaque Imaging: Advanced Segmentation Techniques, J. S. Suri and S. Laxminarayan, Eds., pp. 193–224, CRC Press, Boca Raton, FL, USA, 2003.
  31. J. J. G. Leandro, J. V. B. Soares, R. M. Cesar et al., “Blood vessels segmentation in nonmydriatic images using wavelets and statistical classifiers,” in Proceedings of the XVI Brazilian Symposium on Computer Graphics and Image Processing, IEEE, São Carlos, Brazil, October 2003.
  32. Y. A. Tolias and S. M. Panas, “A fuzzy vessel tracking algorithm for retinal images based on fuzzy clustering,” IEEE Transactions on Medical Imaging, vol. 17, no. 2, pp. 263–273, 1998.
  33. S. Xie and H. Nie, “Retinal vascular image segmentation using genetic algorithm plus FCM clustering,” in Proceedings of the 2013 Third International Conference on Intelligent System Design and Engineering Applications, IEEE, Hong Kong, China, January 2013.
  34. B. M. Ege, O. K. Hejlesen, O. V. Larsen et al., “Screening for diabetic retinopathy using computer based image analysis and statistical classification,” Computer Methods and Programs in Biomedicine, vol. 62, no. 3, pp. 165–175, 2000.
  35. N. Silberman, K. Ahrlich, R. Fergus et al., “Case for automated detection of diabetic retinopathy,” in Proceedings of the AAAI Spring Symposium: Artificial Intelligence for Development, Stanford, CA, USA, March 2010.
  36. A. G. Karegowda, A. Nasiha, M. A. Jayaram, and A. S. Manjunath, “Exudates detection in retinal images using back propagation neural network,” International Journal of Computer Applications, vol. 25, no. 3, pp. 25–31, 2011.
  37. S. Kavitha and K. Duraiswamy, “Automatic detection of hard and soft exudates in fundus images using color histogram thresholding,” European Journal of Scientific Research, vol. 48, pp. 493–504, 2011.
  38. J. de la Calleja, L. Tecuapetla, M. A. Medina et al., “LBP and machine learning for diabetic retinopathy detection,” in Proceedings of the 2014 International Conference on Intelligent Data Engineering and Automated Learning, Springer, Salamanca, Spain, September 2014.
  39. B. Graham, “Fractional max-pooling,” 2015,
  40. C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, p. 27, 2011.
  41. R. V. Rao, V. J. Savsani, and D. P. Vakharia, “Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems,” Computer-Aided Design, vol. 43, no. 3, pp. 303–315, 2011.
  42. Kaggle contests, Identify Signs of Diabetic Retinopathy in Eye Images, Kaggle, San Francisco, CA, USA, 2015,