Abstract

Symptoms of nutrient deficiencies in rice plants often appear on the leaves. The leaf color and shape, therefore, can be used to diagnose nutrient deficiencies in rice. Image classification is an efficient and fast approach for this diagnosis task. Deep convolutional neural networks (DCNNs) have been proven to be effective in image classification, but their use to identify nutrient deficiencies in rice has received little attention. In the present study, we explore the accuracy of different DCNNs for diagnosis of nutrient deficiencies in rice. A total of 1818 photographs of plant leaves were obtained via hydroponic experiments to cover full nutrition and 10 classes of nutrient deficiencies. The photographs were divided into training, validation, and test sets in a 3 : 1 : 1 ratio. Fine-tuning was performed to evaluate four state-of-the-art DCNNs: Inception-v3, ResNet with 50 layers, NasNet-Large, and DenseNet with 121 layers. All the DCNNs obtained validation and test accuracies of over 90%, with DenseNet121 performing best (validation accuracy = 98.62 ± 0.57%; test accuracy = 97.44 ± 0.57%). The performance of the DCNNs was validated by comparison to color feature with support vector machine and histogram of oriented gradient with support vector machine. This study demonstrates that DCNNs provide an effective approach to diagnose nutrient deficiencies in rice.

1. Introduction

Fertilizers are essential to global food production, particularly by ensuring high and stable yields of rice [1]. The best results come when specific fertilizers are applied in the needed amounts at the proper time. However, rice is often cultivated without such targeted nutrient input in China. Unscientific fertilization practices are common and, when coupled with a general delay between research findings and widespread adoption of technology, result in imbalanced nutrient application to rice fields. At present, blind fertilization still occurs frequently. As a result, ever greater amounts of fertilizers are applied to achieve only limited increases in rice yield, and the quality of the resulting rice declines [2]. This leads to smallholder farmers—who produce most of China’s rice—not realizing potentially attainable increases in income.

Diagnosis of nutrient deficiencies in rice is an integral part of scientific fertilization, because soils often fail to completely meet the nutrient demands of growing plants. Determination of the needed nutrients will facilitate the formulation of a fertilization regime to supply the target nutrients without oversupplying others. Symptoms of a nutrient deficiency in rice are manifestations of malnutrition in the crop; they can be visually inspected from the plant morphology and thereby provide enough information to diagnose the nutrient deficiency [3]. Li et al. [4] analyzed the symptoms of silicon deficiency in rice and proposed its control measures. Problems arise as agricultural production sites are extensive, and many nutrient deficiencies are widely distributed both across a farm or region and in time throughout the growing season. It is therefore difficult for agricultural experts to meet the current demand for their services, which prevents the nutrient deficiencies from being properly diagnosed and effectively corrected.

In addition to manual diagnosis, chemical and nondestructive analytical methods have been used to identify nutrient deficiencies in crops. Chemical diagnoses can be classified as either full-scale analysis or rapid tissue measurement [5]. Full-scale analysis measures the contents of nutrient elements in a crop. This diagnostic technique can determine all the nutrient elements essential to a plant’s survival and those involved in its growth [6]. The results are precise and reliable and usually provide sufficient data for diagnosis, but the technique involves vast labor costs and is confined to a laboratory. The rapid measurement of unassimilated nutrients in plant tissues is carried out by visual colorimetry, which is fast, straightforward, and generally suitable for field diagnosis [7]. Due to the extensive examination required for deeper analysis, this technique is usually applied to assess deficiencies of major elements such as N, P, and K in crops. The minimal contents of trace elements and the high precision required for their analysis prevent rapid measurement of their status.

Nondestructive diagnosis techniques include in situ measurement using a chlorophyll meter, spectral remote sensing, and computer vision. The first two of these methods are costly or complicated in data processing and are suitable only for specific elements [8, 9]. In contrast, computer vision requires only a smartphone for low-cost image acquisition and transfer, making image-based diagnosis potentially the best and most versatile solution. Conventional computer vision methods such as support vector machines (SVMs) have demonstrated their ability to detect deficiencies of five elements, including N, P, and K, in the leaves of rape [10] and rice [11–14]; yet SVMs run the risk of overfitting large datasets as they are sensitive to outliers. Convolutional neural networks (CNNs) are attractive machine learning tools [15], and since the success of deep CNNs (DCNNs) in the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC 2012), they have become the preferred choice for image recognition. However, the use of DCNNs to identify nutrient deficiencies in rice has rarely been reported [16].

Inspired by the Plant Village Project [17], we explore the accuracy of diagnosing nutrient deficiencies in rice by processing image data from hydroponic experiments, with four different DCNNs as potentially practical solutions for real-time crop assessment.

Many studies have considered the use of DCNNs in the diagnosis of crop diseases. For example, Mohanty et al. [18] trained DCNNs using 54,306 images of healthy and infected plant leaves obtained from the Plant Village dataset to identify 14 crop species and 26 diseases. The study evaluated the feasibility of DCNNs for the diagnosis of crop diseases by using two architectures, AlexNet [19] and GoogLeNet [20], which achieved up to 99.4% accuracy. When tested on other labeled images taken under different conditions, the accuracy of the trained model dropped substantially to 31.4%, which nonetheless was much higher than that based on random selection (2.6%). The deep learning model was therefore highly accurate but not robust. The Plant Village dataset, which is based on expert knowledge, might include some noise from variations in different experts’ manual labeling, and its images were acquired against similar backgrounds via a regularized process. This could account for the limitation in generating a robust model. As visible symptoms are time-dependent and tissue-specific, the dataset should be as extensive as possible. Follow-up work by Mohanty et al. [18] found that increasing the variety of the images would lead to better diagnosis of crop diseases.

Brahimi et al. [21] trained AlexNet and GoogLeNet on their own dataset and obtained 99% accuracy with transfer learning when identifying nine classes of tomato diseases. Their method was demonstrated to be more accurate than SVM and random forest. Based on AlexNet, Liu et al. [22] recognized four classes of diseases on apple leaves with up to 97.6% accuracy, while Lu et al. [23] sorted rice among ten disease classes with 95.5% accuracy. Moreover, Zhang et al. [24] detected nine classes of maize leaf diseases using improved GoogLeNet with an accuracy of 98.9%. Furthermore, Amara et al. [25] used the LeNet5 architecture [26] to distinguish three classes of banana diseases, achieving accuracies of 92.9%–98.6%.

All these studies have achieved high classification accuracy of crop diseases, demonstrating the feasibility of DCNNs in image-based diagnosis. A DCNN approach involves both data and algorithm factors. Regarding data, building a truly comprehensive database is difficult. The visual characteristics of the symptoms may change with the progression of crop diseases or nutrient deficiencies, and they can also depend on environmental factors such as humidity and temperature [17]. Therefore, a large number of photographs are needed to cover the entire range of possibilities. A further consideration is that all images need to be labeled correctly, which is usually labor-intensive and error-prone [27]. In addition, uneven sampling across classes can bias the optimization of the model. To address this issue, researchers have proposed weighted cross-entropy loss [28–30].

Regarding algorithms, since AlexNet [19] was the best-performing model at ILSVRC 2012, many studies have sought to improve on its performance. A typical trend in the evolution of architectures is that the networks are getting deeper [31]. For example, ResNet [32], which won ILSVRC 2015, is approximately 20 times deeper than AlexNet and 8 times deeper than VGGNet [33]. In addition, GoogLeNet [20], the champion of ILSVRC 2014, proposes an inception structure to improve the utilization of network computing resources and increase the network breadth and depth for a constant amount of computation. Furthermore, DenseNet [34] improves network performance without requiring deepening (like ResNet) or widening (like GoogLeNet) of the network structure.

From the perspective of image features, the feature reuse and bypass settings of DenseNet can greatly reduce the number of network parameters while alleviating the vanishing gradient problem to some extent. Edna et al. [35] used the Plant Village dataset to fine-tune DenseNet121, which achieved a better accuracy than ResNet50, ResNet101, ResNet152, VGG16, and Inception-v4 [36]. Moving beyond manually designed architectures, Zoph et al. [37] constructed a NasNet model with two AutoML-designed modules and achieved a prediction accuracy of 82.7% on ImageNet’s validation set.

Considering rice, Lu et al. [23] applied AlexNet to identify ten classes of plant diseases in this crop. Compared with the new state-of-the-art DCNNs, AlexNet is a relatively shallow CNN model, and diagnosis of plant nutrient deficiencies is a different application scenario. Based on parallelized shallow CNNs, Watchareeruetai et al. [38] achieved an overall precision of 43.02%, recall of 52.13%, and F-measure of 47.14% for identifying six classes of nutrient deficiencies in black gram leaves. Because that dataset contains leaves at various stages of deficiency, it is difficult to classify the different types of nutrient deficiencies. Recently, Sethy et al. [39] applied pretrained DCNNs with an SVM classifier to identify four levels of nitrogen deficiency in rice and achieved an accuracy of 99.8%. This encourages us to explore the ability of DCNNs to classify more elements with different deficiency phases in rice.

3. Materials and Methods

3.1. Hydroponic Experiment

The dataset is the key to fitting a model with good generalization ability. Many studies have used the Plant Village dataset, which is based on expert knowledge as some of the observed phenotypes and diseases can be straightforwardly identified by visual cues. Although the symptoms of nutrient deficiencies in rice are similar universally, a challenge remains in collecting images in the field covering all the deficiency types. Here, we designed a hydroponic experiment to collect images for 10 classes of nutrient deficiencies (N, P, K, Ca, Mg, S, Fe, Mn, Zn, and Si, each denoted by a minus sign followed by its chemical symbol, e.g., –N) and contrast them with rice plants under full nutrition, making a total of 11 classes.

The experiment lasted for two years (2017–2018) and used late rice (Oryza sativa L.) “Taiyou 398,” the major local variety in Jiangxi Province, China. We designed the experiment as a single-factor test with five replications, giving a total of 55 experimental units (11 nutrient solutions × 5 replications). Eleven nutrient solutions (pH 5.0–5.5) were prepared separately. The contents of full-strength Kimura B nutrient solution and 10 deficiency solutions are listed in Table 1. Each treatment was performed in a 2-L plastic bucket with a lid, and three holes were punched in the lid to hold rice seedlings. A single three-day-old seedling was planted in each hole, and the stem was wrapped with a sponge. After three days of adaptive cultivation in the laboratory, the plants were moved to a greenhouse to prevent contamination of the nutrient solution (Figure 1). During the growing period, the nutrient solution was changed every three days, and images of rice plants (especially the leaves) were collected.

Kimura B was used as a standard for full nutrition. Data in italic or left blank indicate deficient compounds. Deficiencies were created by substituting the compounds containing that nutrient element with other compounds to provide all the other nutrients except the selected deficient nutrient.

3.2. Dataset

The symptoms of nutrient deficiencies manifesting during the hydroponic experiments are recorded in Table 2. The morphological symptoms of rice plants differed among the 11 classes due to the nutrients having various physiological functions in the plant system. For example, Ca, Mg, Fe, Mn, and Zn are directly or indirectly related to chlorophyll formation and/or photosynthesis [40–42], and therefore their deficiencies result in chlorosis. As P is associated with carbohydrate metabolism [43, 44], its deficiency results in carbohydrate retention in the leaves and thus purple-red coloration developing in the stems and leaves due to the formation of anthocyanins.

The symptoms of nutrient deficiencies were also influenced by the mobility of the nutrients in the plant system. Mobile elements such as N, P, K, and Mg, when deficient, tend to move quickly toward the younger parts of the plant, making symptoms always appear first in older parts such as the lower leaves. Conversely, deficiencies of less mobile elements such as Ca and Fe often manifest in the younger parts of the plant.

Images of plant leaves were captured with mobile phones or cameras at different time points depending on the location and progress of the symptoms. Before photography, the deficiency symptom had to be at least minimally recognizable, and the minimum recognized unit of the symptom was used as the image framing center. Photographs were taken from six different angles (Figure 2) and then cropped to show only the leaf.

After the development of typical deficiency symptoms (Table 2), we removed rice plants from the plastic bucket. The whole plants were washed with water and then deactivated at 110°C for 15 min. The materials were oven-dried at 75°C to constant weight before being crushed and analyzed for nutrient content in the laboratory. Table 3 lists the standards and test methods followed to analyze each nutrient’s content.

The images of rice leaves included complex backgrounds; each was given one of 11 class labels according to the assigned nutrient deficiency. The images were cropped to ensure that the symptoms appeared prominently and to reduce memory usage for better computational performance. Because different deficiency symptoms appear at different time points, data collection could not guarantee balanced classes throughout the experimental period. Table 4 gives details of the 1818 images created, 60% of which were used for training, 20% for validation, and 20% for testing.
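The 3 : 1 : 1 division can be illustrated with a short sketch. The source does not say whether the split was stratified by class, so the per-class splitting below is an assumption, as are the function name `stratified_split`, the seed, and the class counts (which are not the values in Table 4).

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(3, 1, 1), seed=0):
    """Split sample indices into train/validation/test lists, class by class.

    labels: one class label per image.
    ratios: relative subset sizes (3:1:1 corresponds to 60%/20%/20%).
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    total = sum(ratios)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n = len(indices)
        n_val = n * ratios[1] // total
        n_test = n * ratios[2] // total
        n_train = n - n_val - n_test  # any remainder goes to training
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical class sizes, chosen only to show an imbalanced case.
labels = ["-N"] * 100 + ["-Fe"] * 31 + ["full"] * 50
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class keeps a small class such as –Fe represented in all three subsets, which a purely random split does not guarantee.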

3.3. State-of-the-Art DCNNs

GoogLeNet [20] introduces the “Inception” concept, which changes a full connection to a sparse connection. It increases the breadth and depth of the network and improves the utilization of network computing resources. The Inception module convolves the input simultaneously with convolution kernels of different sizes, plus a pooling operation, and finally aggregates the respective results together as a total output. The convolutions are of varied sizes (1 × 1, 3 × 3, and 5 × 5) for ease of alignment. As the network goes deeper, the features become increasingly abstract, and the receptive field involved in each feature becomes larger. Therefore, as the number of layers increases, the proportion (i.e., the amount) of 3 × 3 and 5 × 5 convolutions also increases. In this case, the number of parameters and the amount of computation remain very large. The Inception module uses a bottleneck layer comprising a 1 × 1 convolution to help reduce the computation requirements. One of the most important improvements in Inception-v3 is the factorization of n × n convolution kernels into 1 × n and n × 1 convolutions, which speeds up computations and deepens the network.
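The saving from factorizing an n × n kernel into 1 × n and n × 1 kernels can be checked by counting weights. The following is a minimal sketch; the helper names and the channel count of 192 are illustrative assumptions, and biases are ignored.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


def factorization_saving(n, channels):
    """Compare an n x n convolution with a 1 x n followed by an n x 1 convolution."""
    full = conv_weights(n, n, channels, channels)
    factorized = (conv_weights(1, n, channels, channels)
                  + conv_weights(n, 1, channels, channels))
    return full, factorized, 1 - factorized / full


# Hypothetical layer: 7 x 7 kernel with 192 input and output channels.
full, factorized, saving = factorization_saving(n=7, channels=192)
print(full, factorized, round(saving, 3))
```

The weight count falls from n² to 2n per input-output channel pair, so the relative saving grows with the kernel size.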

ResNet [50, 51] addresses the vanishing gradient problem in deeper networks using shortcut connections, through which the gradient reaches earlier layers and compositions of features at varying depths can be combined to improve performance. ResNet relies on many stacked residual units composed of convolution and pooling layers; the residual unit therefore acts as the building block of the network. The structure of ResNet consists of residual blocks, each of which is composed of three convolutions with 1 × 1, 3 × 3, and 1 × 1 kernels. The output of the residual blocks is then aggregated by average pooling and classified using the softmax function.
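The role of the shortcut connection can be made explicit with the standard residual formulation. The symbols below (input x, output y, residual mapping F with weights W, loss L) are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{x} + \mathcal{F}(\mathbf{x}; W), \\
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  &= \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
     \left( I + \frac{\partial \mathcal{F}}{\partial \mathbf{x}} \right).
\end{aligned}
```

The identity term I passes the gradient to earlier layers unchanged, so the signal does not rely solely on the product of the convolutional Jacobians, which is the term that can shrink as depth grows.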

DenseNet [34] provides an extreme example of attempting to overcome the training difficulties in deeper networks by introducing shortcut connections. It concatenates all previous layers to form the input of each layer, connecting each layer to all previous ones. DenseNet is designed from the perspective of image features; its feature reuse and bypass settings allow DenseNet to achieve state-of-the-art performance with fewer parameters than ResNet. ResNets and DenseNets achieve similar accuracy to visual geometry group (VGG) on the ImageNet dataset at only 20% and 10%, respectively, of its computational cost [52]. DenseNet enhances the transfer of gradients, facilitates feature reuse, and reduces overfitting of small sample data. Its structure consists of dense blocks linked by transition layers. Each dense block is constructed by the following steps: batch normalization, ReLU activation, convolution operation, batch normalization, ReLU activation, convolution operation, and final concatenation with the last input. Each transition block is constructed by batch normalization, ReLU activation, convolution operation, and average pooling operation.
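The effect of concatenation on feature-map width can be traced with simple arithmetic. This sketch assumes the published DenseNet-121 configuration (dense blocks of 6, 12, 24, and 16 layers, a growth rate of 32, 64 initial channels, and transition layers that halve the channel count); these values are not given in the text above, and the function name is illustrative.

```python
def densenet_channels(block_layers=(6, 12, 24, 16), growth_rate=32,
                      initial_channels=64, compression=0.5):
    """Trace the channel count after each dense block and transition layer."""
    channels = initial_channels
    trace = []
    for i, layers in enumerate(block_layers):
        # Every layer in a dense block appends growth_rate new feature maps
        # to the concatenated input it received.
        channels += layers * growth_rate
        trace.append(("dense_block", i + 1, channels))
        if i < len(block_layers) - 1:
            # Transition layers sit between blocks and shrink the channel count.
            channels = int(channels * compression)
            trace.append(("transition", i + 1, channels))
    return trace


for stage, index, channels in densenet_channels():
    print(stage, index, channels)
```

Under these assumptions each layer adds only 32 feature maps, which helps explain how the network can reuse earlier features while keeping the parameter count low.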

NasNet [37], a new search space designed by Google, uses reinforcement learning to optimize the network structure in experiments. The structure of NasNet is similar to those of ResNet and Inception, and it performs basic block stacking to generate the final network. The network structure contains two main modules, a normal cell and a reduction cell, which are stacked to form the final network. NasNet-Large has been trained on the ImageNet database, and the network has learned rich feature representations for a wide range of images, with an image input size of 331 × 331.

3.4. Fine-Tuning

Fine-tuning is a transfer learning technique in which knowledge acquired during training on one task is reused for other related tasks or domains [53]. In a DCNN, the pretrained weights have learned to identify features, some of which are useful for other target tasks. Therefore, the last layers of the trained network can be removed and new layers can be trained for the target task. Fine-tuning still requires some training, but it remains much quicker than learning from scratch [18]. In addition, fine-tuned models are more accurate than models trained from scratch [35]. Herein, we applied all the DCNNs introduced in Section 3.3 with ImageNet pretrained weights and removed the top layer. An average pooling layer was then added to each model, followed by a fully connected layer with 1024 nodes. Finally, the 11 categories were classified using the softmax function. All the added layers were initialized with random parameters.
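The added classification head (average pooling, a fully connected layer, and an 11-way softmax) can be sketched in plain Python. This is an illustration of the data flow only, not the Keras implementation used in the study: the feature map stands in for the output of a pretrained backbone, the dimensions are scaled down from 1024 hidden nodes for readability, and the ReLU activation and the initialization range are assumptions not stated in the text.

```python
import math
import random


def global_average_pool(feature_map):
    """Average an H x W x C feature map (nested lists) over height and width."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]


def dense(inputs, weights, biases):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Randomly initialized weights and zero biases for a new layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(feature_map, hidden, output):
    pooled = global_average_pool(feature_map)
    activated = [max(0.0, v) for v in dense(pooled, *hidden)]  # ReLU (assumed)
    return softmax(dense(activated, *output))


rng = random.Random(0)
CHANNELS, HIDDEN, CLASSES = 8, 16, 11  # small stand-ins; the study used 1024 hidden nodes
feature_map = [[[rng.random() for _ in range(CHANNELS)] for _ in range(3)]
               for _ in range(3)]
hidden = random_layer(HIDDEN, CHANNELS, rng)
output = random_layer(CLASSES, HIDDEN, rng)
probs = classify(feature_map, hidden, output)
print(len(probs), round(sum(probs), 6))
```

During fine-tuning, only the randomly initialized layers start without useful weights; the backbone that produces the feature map starts from its ImageNet parameters.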

3.5. Weighted Categorical Cross-Entropy

Unevenly distributed images will lead to bias if one class has many more samples than the others. Therefore, weighted categorical cross-entropy is introduced to solve this problem. This method works by weighting the loss function so that the training process focuses more on the samples from an underrepresented class [30]. An implementation is available in Keras (https://keras.io). The weighting value for each class is set to the reciprocal of the number of samples for that class.
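The reciprocal weighting scheme can be written out directly. The sketch below is a plain-Python illustration rather than the Keras code used in the study; the function names and class counts are hypothetical, and averaging the weighted losses over the batch is an assumption about the reduction.

```python
import math


def reciprocal_class_weights(class_counts):
    """Weight each class by the reciprocal of its number of samples."""
    return {label: 1.0 / count for label, count in class_counts.items()}


def weighted_cross_entropy(true_labels, predicted_probs, class_weights):
    """Mean of -w[y] * log p(y) over the samples.

    predicted_probs: one dict per sample mapping class label to probability.
    """
    losses = [
        -class_weights[y] * math.log(max(p[y], 1e-12))  # clip to avoid log(0)
        for y, p in zip(true_labels, predicted_probs)
    ]
    return sum(losses) / len(losses)


# Hypothetical counts: the rare class receives four times the weight.
weights = reciprocal_class_weights({"-N": 100, "-Fe": 25})
true_labels = ["-N", "-Fe"]
predicted = [{"-N": 0.5, "-Fe": 0.5}, {"-N": 0.5, "-Fe": 0.5}]
loss = weighted_cross_entropy(true_labels, predicted, weights)
print(weights, round(loss, 6))
```

With equal predicted probabilities, the –Fe sample contributes four times as much to the loss as the –N sample, which is how the weighting shifts attention toward underrepresented classes. In Keras, one way to apply such weights is the `class_weight` argument of `Model.fit`, although the source does not state which mechanism was used.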

4. Results and Discussion

4.1. DCNN Experiments

The experiments were performed on a Windows 10 desktop equipped with one Intel Core i9 7920X CPU with 64 GB RAM, accelerated by two GeForce GTX 1080Ti GPUs with 11 GB memory. The model implementation was powered by the Keras framework with the TensorFlow backend. Fine-tuning models with weights pretrained on ImageNet were used for model fitting. No data augmentation was used. The models were evaluated using the accuracy metric and kappa score. Each model was evaluated three times, and its performance is expressed as the average training and validation accuracy and depicted graphically. The Adam optimizer was used to accelerate the training process. Images were resized with the Python Imaging Library to the input size required by each model.
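For reference, the kappa score can be computed from true and predicted labels as follows. The sketch assumes the metric is Cohen's kappa, which the text does not state explicitly; the function name and the toy labels are illustrative.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined when expected == 1 (all labels identical); not handled here.
    return (observed - expected) / (1 - expected)


y_true = ["a", "a", "b", "b"]
y_pred = ["a", "a", "b", "a"]
print(cohen_kappa(y_true, y_pred))  # 0.5
```

Unlike plain accuracy, kappa discounts the agreement expected by chance, which makes it more informative when the classes are imbalanced.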

Repeated experiments comparing different learning rates (0.001, 0.0001, 0.00006, 0.00003, and 0.00001) showed that the learning rate of 0.00001 was most suitable for the model to run 50 epochs for training, and no decay was used. Due to GPU constraint with NasNet-Large, batch size cannot exceed 10. Therefore, to evaluate the models in terms of time efficiency, we set the same learning rate (0.00001) and batch size (10) for all the tests. Table 5 lists the experiment details.

In addition, we compared the four DCNN models tested in this study with two traditional machine learning methods, color feature with SVM and HOG (Histogram of Oriented Gradient) with SVM [39, 54]. The color feature was read from the images directly and then trained with an SVM classifier (implementation was based on scikit-learn library); the HOG extraction process can be divided into 5 parts: detection window, normalized image, calculated gradient, statistical histogram and normalized gradient histogram, and obtained HOG feature vector. These steps are integrated with the hog.compute function of OpenCV-Python library.
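The length of the HOG feature vector follows from the window, block, and cell geometry. The sketch below assumes the default geometry of OpenCV's `HOGDescriptor` (64 × 128 window, 16 × 16 blocks, 8 × 8 block stride, 8 × 8 cells, 9 orientation bins); the source does not report which parameters were used, and the function name is illustrative.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16), stride=(8, 8),
                          cell=(8, 8), bins=9):
    """Length of the HOG feature vector for one detection window."""
    blocks_x = (win[0] - block[0]) // stride[0] + 1
    blocks_y = (win[1] - block[1]) // stride[1] + 1
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    # Each cell contributes one normalized histogram with `bins` entries.
    return blocks_x * blocks_y * cells_per_block * bins


print(hog_descriptor_length())  # 3780
```

Under these assumptions every image is reduced to a fixed-length vector of 3780 values, which the SVM then classifies.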

4.2. Approach Analysis

This study used four state-of-the-art DCNNs to evaluate the performance of image recognition techniques in identifying nutrient deficiencies in rice. All four DCNNs (Inception-v3, ResNet with 50 layers, NasNet-Large, and DenseNet with 121 layers) were effectively fitted (Figure 3). After fine-tuning, all the models achieved average training, validation, and test accuracies of over 90% and yielded average kappa scores of over 0.90, which were much higher than those of color feature with SVM and HOG with SVM (Table 5). Except for Inception-v3, the models achieved average accuracies of over 95%. Specifically, ResNet50 and NasNet-Large obtained similar prediction accuracies, while DenseNet121 required fewer parameters and attained a slightly higher accuracy than the other models. The test and validation kappa scores of DenseNet121 were over 0.97, and its time efficiency was acceptable.

Regarding the training process, NasNet-Large showed a slow increase in its test and validation accuracies, which might be due to the huge number of parameters in this model compared with the other models. Thus, during backpropagation, the pretrained parameters of NasNet-Large were updated slowly to obtain good generalization ability. Moreover, the DCNNs were able to adequately fit the data, while the two conventional methods (i.e., color feature with SVM and HOG with SVM) appeared to overfit. A possible reason is that some of the labeled images had complex backgrounds whose outline data could easily lead to overfitting in the SVM approach. In contrast, the DCNN approach can synthesize local receptive fields in higher-level layers.

4.3. Qualitative Analysis

Since DenseNet121 performed the best, one run of this model was used for qualitative analysis. Three indices were calculated to gain a better understanding of the model’s prediction accuracy. The recall refers to the ratio of correctly predicted positive observations to all observations in an actual class. The precision refers to the ratio of correctly predicted positive observations to the total predicted positive observations. The f1-score is the harmonic mean of precision and recall, which therefore considers both false positives and false negatives; f1 is usually more useful, especially for an uneven class distribution, albeit not as intuitively understandable as accuracy. Table 6 shows good prediction results for –N, –P, –K, –Ca, –S, –Mn, –Zn, and –Si, with f1-scores of over 0.95, while performance regarding –Fe was the worst. The relatively small number of samples in the –Fe case (n = 31) might account for the poor predictive performance.
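The three indices can be derived from a confusion matrix, with f1 computed as the harmonic mean of precision and recall. The matrix below is hypothetical and is not the one shown in Figure 4; the function name is illustrative.

```python
def per_class_metrics(confusion, labels):
    """Precision, recall, and f1 per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        actual = sum(confusion[k])                    # row sum: samples in the class
        predicted = sum(row[k] for row in confusion)  # column sum: predictions of the class
        recall = true_positive / actual if actual else 0.0
        precision = true_positive / predicted if predicted else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical three-class example with a small, frequently confused class.
labels = ["-Fe", "-Mn", "-Zn"]
confusion = [
    [4, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
for label, (precision, recall, f1) in per_class_metrics(confusion, labels).items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
```

In this toy case the small class has the lowest f1 even though most of its samples are classified correctly, because each error represents a large share of its few samples.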

Deficiency is a relative concept, and a slight deficiency might be mistaken for full nutrition. In this study, another source of misclassification was confusion among Fe, Mn, and Zn deficiencies, which share some common symptoms. These nutrients are directly or indirectly related to chlorophyll formation or photosynthesis, the disruption of which generally causes chlorosis [38]. Furthermore, Zn deficiency was misclassified as Si deficiency, as both cause brown spots on rice leaves. These misclassifications can be seen in the confusion matrix (Figure 4).

4.4. Application Analysis

DCNNs demonstrate powerful capabilities in image recognition. Open source development frameworks lower the barriers to the application, as their versatile platform portability greatly facilitates the progression from concepts and plans to results and applications. New vision-problem applications can often use the architectures of networks already published in the literature alongside open source implementation to ease development, as prior studies might have solved various detailed technical problems such as the learning rate decay schedule or the hyperparameters. Optimizing hyperparameter settings is a significant challenge owing to the enormous time cost. Automatic model design represents a good solution as increasing numbers of applications are developed and shared [55].

Although some DCNNs can rescale image size, it is necessary to pay attention to the spatial resolution of the rice leaf images. If the region of the identified object is too small in the overall image, the recognition accuracy will be affected. In addition, the image background can affect the recognition accuracy, especially in cases of multiclass identification. Unlike image recognition tasks involving the identification of people or cars, the identification of nutrient deficiencies in rice leaves involves discerning subtle differences of texture in often similar images. Surface texture should therefore be considered as an identification feature when distinguishing among multiple classes of nutrient deficiency in rice. At the beginning, we used the original image (resolution 2976 × 3968), but, owing to the GPU memory constraint, we resized it to 600 × 800 and used a shallow CNN for the experiment. The result had low accuracy. In a subsequent experiment, cropping the image to an individual leaf effectively improved the recognition accuracy; therefore, any application of this technique needs to be combined with a region-based CNN for the automatic extraction of the leaf parts in the images, thereby reducing the influence of the variable image backgrounds.

Compared with other datasets that are based on expert knowledge, the symptoms manifesting in the hydroponic experiments would be more precise, because other factors such as pests and diseases are eliminated under controlled conditions [27], and the data cover the entire rice growing period. However, we did not consider combinations of more than one deficiency factor in this study, which reduced the robustness of the model. Hydroponic experiments represent a time- and resource-intensive means of data collection for building a model with generalization ability. The collected images alone are often insufficient for supervised classification, so any application should employ data augmentation techniques such as cropping, rotating, and flipping before running recognition processes [56]. The application side should include a labeling function that uploads the confirmed prediction result to augment the dataset. As similar symptoms can lead to misclassification, offering the top-three predictions would be better than providing only one. Moreover, the symptoms associated with insect pests and diseases are similar to the visual characteristics of nutrient deficiencies [57]; thus, combined studies are needed to gain better insight into the identification of practical problems.
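Offering the top-three predictions amounts to ranking the softmax outputs. A minimal sketch follows; the function name, the ASCII class labels, and the probabilities are hypothetical.

```python
def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["full", "-N", "-P", "-K", "-Ca", "-Mg", "-S", "-Fe", "-Mn", "-Zn", "-Si"]
# Hypothetical softmax output for one leaf image with chlorosis symptoms.
probabilities = [0.01, 0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.40, 0.30, 0.15, 0.05]
for label, probability in top_k(probabilities, labels):
    print(label, probability)
```

Presenting several candidates lets the user weigh visually similar deficiencies, such as those of Fe, Mn, and Zn, instead of relying on a single label.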

In the short term, this work will optimize the models and datasets for diagnosis of nutrient deficiencies in rice. The resulting model can then be integrated into a mobile diagnosis system to facilitate rice production by smallholder farmers. In the long run, if a model with greater generalization ability is developed, allowing mobile users to upload results with locations would provide a vast dataset to aid digital soil mapping and fertilization assessment from a macro perspective.

5. Conclusion

Different nutrient deficiencies alter the morphological characteristics of plant leaves in rice. In this study, four DCNNs (Inception-v3, ResNet50, NasNet-Large, and DenseNet121) were used to diagnose various nutrient deficiencies in rice plants based on image recognition using a dataset collected from hydroponic experiments. All the DCNNs obtained accuracies of over 90% and outperformed two traditional machine learning methods, color feature with SVM and HOG with SVM. The best result was obtained using DenseNet121, with a validation accuracy of 98.62% and a test accuracy of 97.44%. These findings demonstrate that DCNNs are promising for fully automatic classification of nutrient deficiencies in rice.

Future work should collect more outdoor images and design field experiments to make a more extensive dataset and implement the region-based CNN object detection module to extract rice leaf images for diagnosis in practice. Conducting further hydroponic experiments to build an image dataset covering multiple deficiency factors could improve the models for better diagnosis of multinutrient deficiencies.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Zhe Xu and Xi Guo contributed equally to this work.

Acknowledgments

This research was supported by the National Key R&D Program of China (no. 2017YFD0301603).