Abstract

Leaf blight spot disease, caused by bacteria and fungi, poses a considerable threat to commercial plants, manifesting as yellow to brown color spots on the leaves and potentially leading to plant mortality and reduced agricultural productivity. The susceptibility of jasmine plants to this disease emphasizes the necessity for effective detection methods. In this study, we harness the power of a deep convolutional generative adversarial network (DCGAN) to generate a dataset of jasmine plant leaf disease images. Leveraging the capabilities of DCGAN, we curate a dataset comprising 10,000 images with two distinct classes specifically designed for segmentation applications. To evaluate the effectiveness of DCGAN-based generation, we propose and assess a novel loss function. For accurate segmentation of the leaf disease, we utilize a UNet architecture with a custom backbone based on the MobileNetV4 CNN. The proposed segmentation model yields an average pixel accuracy of 0.91 and an mIoU (mean intersection over union) of 0.95. Furthermore, we explore different UNet-based segmentation approaches and evaluate the performance of various backbones to assess their effectiveness. By leveraging deep learning techniques, including DCGAN for dataset generation and the UNet framework for precise segmentation, we significantly contribute to the development of effective methods for detecting and segmenting leaf diseases in jasmine plants.

1. Introduction

In recent times, there has been a notable rise in the occurrence of diseases caused by microorganisms such as bacteria, fungi, and viruses [1] in plants, animals, and humans. These infections present a significant threat to plants throughout different stages of agricultural production, ultimately resulting in reduced plant yield [2, 3]. The consequences of these diseases have far-reaching implications for human dependence on agriculture, encompassing vital necessities such as food, shelter, and clothing. This is especially notable in low-income countries [4, 5]. Jasmine plants, commonly cultivated in coastal regions of Southeast Asia [6], are known to be vulnerable to a range of leaf diseases, including Alternaria leaf blight spot [7]. This disease exhibits initial signs characterized by yellow patches with dark brown stains surrounded by yellow rings [8]. As the disease progresses, the spots grow larger, spreading across a significant portion of the leaves and eventually leading to blight. Notably, concentric rings can be observed within the lesions, and the disease also affects the stem, petiole, and flowers [9]. The final two stages are critical for disease detection, with the transition from yellow to brown in leaves referred to as the “brown stage.” Subsequently, the “final stage” involves the maximum coverage of brown spots on the leaf, leading to the plant’s death. Identifying these crucial stages is essential, as timely action can then be taken to address the disease effectively [10]. Various datasets, including cassava, tomato, cotton, and tobacco, have been used to report several CNN-based approaches for detecting plant leaf diseases [11–16]. Nevertheless, research on jasmine plant leaf spot disease detection remains limited. The scarcity of suitable datasets poses a significant challenge to the development of CNN-based algorithms capable of detecting the various stages of jasmine plant leaf spot disease. In the past, several segmentation and morphological methods have been reported for grape and other leaves [17–19]. However, there is a need for a semantic segmentation method specifically designed to extract the leaf spot features of jasmine plants.

The contribution of this work is as follows:
(1) This study introduces a novel leaf image augmentation strategy employing DCGAN, resulting in the generation of an expanded dataset with 10,000 synthetic jasmine plant images. Diverging from conventional methods, our approach exhibits superior scalability and image quality. Comparative analyses underscore the effectiveness of our DCGAN-based augmentation, positioning it as an advanced and impactful contribution in dataset expansion techniques.
(2) Our proposed methodology for identifying the “brown stage” and “final stage” of leaf spot disease in jasmine leaves introduces an original approach using UNet-based semantic segmentation, specifically ResUNet with a custom CNN backbone. Outperforming traditional methods, our approach achieves heightened accuracy and efficiency. Comparative evaluations highlight its superiority in disease stage recognition, marking it as a significant advancement over current identification techniques.
(3) This research explores various semantic segmentation techniques and pretrained CNN backbones for leaf spot identification. The proposed model, boasting an mIoU of 0.95, surpasses alternative segmentation methods, providing a more precise and reliable classification of disease stages. Comparative assessments underscore the effectiveness of our model in capturing nuanced details, establishing it as a leading solution in the field of leaf spot identification.

Section 2 discusses recent works proposed for leaf disease detection in the literature. Sections 3 and 4 introduce the proposed model and present the experimentation results. Finally, Section 5 provides the study’s conclusion.

2. Related Works

In recent years, remarkable progress has been achieved in detecting diseases from leaf images. These approaches can be broadly classified into two main groups: traditional detection methods and deep learning-based detection methods. In addition, this section will delve into various augmentation techniques employed to expand the dataset.

2.1. Traditional Detection Methods

A leaf stage recognition system was developed by incorporating K-means clustering approaches [20] to focus on specific areas that play a crucial role in leaf disease detection. Geetha et al. [21] proposed four preprocessing steps to reduce noise in the leaf image dataset. Furthermore, Annabel et al. [22] utilized traditional detection techniques, including the K-nearest neighbor (KNN) algorithm, to classify plant leaves based on morphological features such as color, intensity, and size. For color analysis, Narmadha and Arulvadivu [23] reported converting the primary leaf colors into the LAB color space and employed clustering algorithms. In the work of Gupta et al. [24], the background was strategically removed in an automated manner, and the diseased portion was extracted for mildew disease detection from cherry leaves. In addition, Kurmi and Gangwar [25] employed color transformation for seed region identification in leaf analysis. The literature [26] describes several methods used in precision agriculture. However, achieving high classification accuracy in leaf spot detection has proven to be a challenge for most machine-learning approaches. In this context, various studies have explored deep learning methods for leaf morphology identification, which are discussed in the following subsection.

2.2. Deep Learning-Based Detection Methods

The detection of tomato plant disease through deep learning-based segmentation has been explored previously in the works of Shoaib et al. [27] and Agarwal et al. [14]. Another study by Xie et al. [28] proposes a technique utilizing a fully convolutional neural network (FCN) for the segmentation of maize leaf disease. Prior studies have presented deep neural network-based classification models for plant diseases. Hridoy et al. [29] employed a deep neural network approach to identify betel leaf diseases. Kaur et al. [30] introduced a semiautomatic CNN model for soybean leaf disease classification. Haridasan et al. [31] developed a CNN-based detection model for paddy leaf diseases. Furthermore, Alsabai et al. [32] proposed a hybrid deep learning approach, incorporating improved Salp swarm optimization, for the multiclass detection of grape diseases. Shoaib et al. [33] focused on addressing the challenge of accurately identifying diseased spots amidst complex field conditions. They trained their proposed system on a dataset comprising crop leaf images with both healthy and diseased sections, and the algorithm’s ability to precisely segment lesion regions was evaluated using metrics such as accuracy and intersection over union (IoU). In a different context, Lin et al. [34] propose a semantic segmentation model that employs convolutional neural networks (CNNs) to recognize and segment powdery mildew at the pixel level in cucumber leaf images. Their approach achieved an intersection over union score of 79.54% and a Dice accuracy of 81.5% on 20 test images. Finally, Soliman et al. [35] proposed employing deep learning techniques to detect plant lesions by extracting hidden patterns from plant leaf disease images. Despite the availability of plant disease datasets such as the PlantVillage dataset [36], the AgriVision collection [37], and the Plant Disease Identification dataset, implementing CNN-based detection algorithms requires large datasets. Past research, such as Kumar et al. [38], Sladojevic et al. [39], and He et al. [40], examined the significant consequences of crop diseases on food security and economic losses in India’s agriculture-reliant rural regions. These studies underscored the need for innovative computer vision methods to autonomously identify and categorize such diseases, demonstrating diverse approaches and notable successes, especially with deep learning-based techniques. The following subsection discusses various augmentation techniques used to enlarge plant disease datasets.

2.3. Various Augmentation Techniques

Rotation, flipping, shifting, and scaling techniques were employed to augment the leaf dataset [41, 42]. In addition, a combination of rotation and shifting was explored to enlarge the dataset further [43]. GAN-based augmentation of the dataset resulted in a 20% increase in classification accuracy [44], and another study employing a detection framework reported an improvement of 7.4% in classification accuracy [45]. Data augmentation is therefore of utmost importance in efficiently enriching a dataset for detection and classification approaches. A novel augmentation method is detailed in the next section.

3. Methodology

This research focuses on enhancing the detection of disease spots on Jasmine leaves, particularly brown-stage and final-stage spots that are challenging to identify accurately. To overcome limited data, a GAN-based augmentation model is employed to expand the leaf dataset used for segmentation. The study explores the effectiveness of UNet, WUNet, U2Net, and ResUNet architectures in this context while also investigating different segmentation backbones to optimize the detection performance, as shown in Figure 1.

3.1. Dataset

In this study, Figure 2 presents image samples depicting different stages of diseased leaves. The dataset was developed collaboratively with experts from Krishi Vigyan Kendra, Karnataka, India, who used digital cameras to capture a total of 1000 images. Of the four stages of Alternaria leaf blight spot disease, the dataset covers the two later stages: 450 images of the brown stage, as illustrated in Figure 2(a), where the blight spot covers a quarter of the leaf, and 550 images of the final stage, as depicted in Figure 2(b), where blight spots cover a larger area of the leaf. To enlarge the dataset, generative adversarial network (GAN)-based augmentation techniques were employed. It is worth mentioning that the early stage of leaf spot disease was not considered in this study; instead, the focus was on the later brown and final stages, which are crucial for understanding disease progression. Further details regarding the application of augmentation techniques can be found in the subsequent section.

3.2. Data Augmentation Using DCGAN

The deep convolutional generative adversarial network (DCGAN) [46] builds on the generative adversarial network (GAN) framework pioneered by Ian Goodfellow and colleagues; the deep convolutional variant was introduced in 2015. The DCGAN’s conditional input allows the generator to produce synthetic samples based on specified conditions. Convolutional neural networks (CNNs) are widely adopted in GANs, particularly for image processing, delivering remarkable results in various computer vision tasks. The generator takes a compressed representation of the training image set, consisting of 1000 images, and generates new images with a resolution of 256 × 256 pixels in RGB format. A 100-dimensional vector with random values between 0 and 1 drives the image generation process. To reach the desired resolution of the generated images, the generator incorporates transposed convolution layers, while the discriminator relies on two convolutional layers with 256 filters each and LeakyReLU activation. The training process utilizes the SGD optimizer and focuses on minimizing the adversarial loss Ladv, with the aim of preventing the discriminator from reliably distinguishing fake images. During training, the GAN model targets a Fréchet inception distance (FID) score below 15 as a performance measure. Training involves 200 epochs with a batch size of 32, and the similarity between generated and template images is evaluated using the structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) metrics. The overall methodology is illustrated in Figure 3.
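
For illustration, a minimal Keras sketch of the generator and discriminator described above is given below (100-dimensional latent input, transposed convolutions up to a 256 × 256 × 3 output, and a discriminator with two 256-filter convolutional layers and LeakyReLU). The exact layer counts, filter widths, and learning rate are assumptions rather than the authors' precise configuration.

```python
# Minimal DCGAN sketch (TensorFlow/Keras). Layer counts and filter widths are
# illustrative assumptions, not the exact configuration used in this study.
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 100           # 100-dimensional random input vector
IMG_SHAPE = (256, 256, 3)  # generated RGB resolution

def build_generator():
    model = models.Sequential(name="generator")
    model.add(layers.Dense(8 * 8 * 256, input_shape=(LATENT_DIM,)))
    model.add(layers.Reshape((8, 8, 256)))
    # Five stride-2 transposed convolutions upsample 8x8 -> 256x256.
    for filters in (256, 128, 64, 32, 16):
        model.add(layers.Conv2DTranspose(filters, 4, strides=2, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.LeakyReLU(0.2))
    model.add(layers.Conv2D(3, 3, padding="same", activation="tanh"))
    return model

def build_discriminator():
    model = models.Sequential(name="discriminator")
    # Two 256-filter convolutional layers with LeakyReLU, as described above.
    model.add(layers.Conv2D(256, 3, strides=2, padding="same", input_shape=IMG_SHAPE))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Conv2D(256, 3, strides=2, padding="same"))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))
    return model

discriminator = build_discriminator()
discriminator.compile(optimizer=tf.keras.optimizers.SGD(1e-3),
                      loss="binary_crossentropy")

generator = build_generator()
z = layers.Input(shape=(LATENT_DIM,))
discriminator.trainable = False  # freeze D while training G on the adversarial loss
gan = models.Model(z, discriminator(generator(z)))
gan.compile(optimizer=tf.keras.optimizers.SGD(1e-3), loss="binary_crossentropy")
```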

3.3. Proposed Segmentation Model for Jasmine Plant Leaf Disease Detection

Segmentation of images is a crucial aspect of computer vision, wherein an image is divided into different regions and assigned specific class labels to create a map that provides information about each pixel of the image. A custom backbone based on MobileNetV4 is integrated into the UNet-based architectures to detect the critical types of leaf spot disease in jasmine plants. Integrating MobileNetV4 into the various UNet frameworks, including UNet, WUNet, U2Net, and ResUNet, involves utilizing it as the encoder component. It replaces the conventional convolutional layers, seamlessly integrating its efficient multiscale feature extraction capabilities. This reduces model parameters significantly compared to traditional architectures while preserving the decoder’s precision, thereby reducing the computational load. Our choice of these semantic segmentation models was driven by specific strengths: UNet’s efficiency in preserving structural elements, U2Net’s lightweight design for real-time segmentation without compromising precision, WUNet’s adaptability to resource constraints, and ResUNet’s balance between accuracy and efficiency. We conducted experiments to determine the optimal model for jasmine leaf disease detection. MobileNetV4 is a large architecture; its implementation in the encoder is shown in Figure 4, and detailed information is provided in Table 1. The proposed CNN network uses a computation technique called depthwise separable convolution, which bears similarities to traditional convolution but involves a two-stage calculation process. Unlike the conventional approach, where a single convolutional calculation is performed per layer, depthwise separable convolution divides the process into two phases. The first stage comprises a separate convolution operation with a 3 × 3 kernel for each input channel, followed by batch normalization and activation; this phase is referred to as depthwise convolution. The second stage further processes the output channels of the depthwise convolution using a 1 × 1 pointwise convolution, applied across all depthwise convolution output channels. Overall, depthwise separable convolution significantly enhances computational efficiency by reducing the computational load. Table 1 presents a comprehensive description of convolution layers 1 and 2. For clarity, in this study, we denote the depthwise convolution layer as “conv_dw” and the pointwise convolution layer as “conv_pw.” This structure is repeated for layers 3 to 6, and the final convolutional layer is identified as “layer 7.” Notably, Table 1 shows the parameter reduction at each sequential layer, highlighting the achieved computational efficiency.
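
To make the two-stage computation concrete, the sketch below shows one such block in Keras: a 3 × 3 depthwise convolution (“conv_dw”) followed by a 1 × 1 pointwise convolution (“conv_pw”), each with batch normalization and activation. The filter count and the choice of ReLU are illustrative assumptions.

```python
# Depthwise separable convolution block: a 3x3 depthwise convolution ("conv_dw")
# followed by a 1x1 pointwise convolution ("conv_pw"), each with batch
# normalization and activation. Filter counts and ReLU are illustrative assumptions.
from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters, stride=1, name="block"):
    # Stage 1: one 3x3 kernel per input channel (depthwise convolution).
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               use_bias=False, name=f"{name}_conv_dw")(x)
    x = layers.BatchNormalization(name=f"{name}_dw_bn")(x)
    x = layers.ReLU(name=f"{name}_dw_relu")(x)
    # Stage 2: 1x1 pointwise convolution mixes the depthwise output channels.
    x = layers.Conv2D(pointwise_filters, 1, padding="same",
                      use_bias=False, name=f"{name}_conv_pw")(x)
    x = layers.BatchNormalization(name=f"{name}_pw_bn")(x)
    x = layers.ReLU(name=f"{name}_pw_relu")(x)
    return x
```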

UNet is an encoder-decoder model comprising two distinct networks, namely, the contraction network and the expansion network. The contraction network, referred to as the encoder, is responsible for extracting pertinent features from the leaf image [47]. On the other hand, the expansion network, known as the decoder, reconstructs the segmentation map using the encoded features obtained from the encoder [48]. The earlier proposed UNet model is designed with four blocks to extract spatial features from the image. Each block consists of two convolution layers with ReLU activation and a max-pooling layer, downsampling the input by a factor of 2 [49]. The proposed UNet model extends beyond the four blocks and includes additional convolution layers activated by the leaky ReLU activation function. These enhancements, along with the custom backbone, contribute to capturing low-level features essential for the leaf spot disease model. The overall network architecture is shown in Figure 5.
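
As a sketch of how the decoder reconstructs the segmentation map from the encoded features, one decoder block with a skip connection could look like the following; the leaky ReLU slope and the layer widths are assumptions rather than the exact settings of the proposed model.

```python
# Minimal UNet decoder block: upsample, concatenate the encoder skip feature
# map, then two convolutions with leaky ReLU activation. A sketch only; layer
# widths and the leaky ReLU slope are assumptions.
from tensorflow.keras import layers

def decoder_block(x, skip, filters, name="dec"):
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same",
                               name=f"{name}_up")(x)         # upsample by 2
    x = layers.Concatenate(name=f"{name}_skip")([x, skip])   # skip connection
    for i in (1, 2):
        x = layers.Conv2D(filters, 3, padding="same", name=f"{name}_conv{i}")(x)
        x = layers.LeakyReLU(0.1, name=f"{name}_act{i}")(x)
    return x
```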

3.3.1. Comparison of Different UNet-Based Segmentation Approaches

In this research, we investigate and compare several UNet-based segmentation architectures, each offering distinctive design characteristics and advantages for leaf spot detection tasks. The UNet architecture features a symmetric encoder-decoder design, skillfully utilizing skip connections to concatenate feature maps from the encoder with corresponding decoder layers. This approach effectively preserves high-resolution information during the decoding process. WUNet, an extension of UNet, is commonly referred to as wide UNet [50]. It enhances the architecture by widening the convolutional layers with an increased number of channels. This design choice significantly improves the model’s capture of contextual information, potentially leading to enhanced segmentation performance. U2Net, a recent and specialized architecture, is purposefully tailored for salient object detection. Inspired by UNet, U2Net incorporates several improvements, including additional branches and attention mechanisms. These attention modules are crucial in highlighting salient features, rendering U2Net highly suitable for tasks requiring precise boundary detection. On the other hand, ResUNet, also known as residual UNet, is a variant of UNet that integrates residual connections derived from the ResNet architecture. By leveraging these residual connections, the model efficiently facilitates gradient flow during training, enabling the effective training of deeper architectures. This capability makes ResUNet [51] particularly well-suited for handling more complex segmentation tasks. Through a comprehensive evaluation and comparison of these U-Net-based models, we aim to gain valuable insights into their individual performance, strengths, and suitability for a diverse range of semantic segmentation challenges.
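
For reference, a typical residual block of the kind ResUNet substitutes for plain convolutional blocks is sketched below; the exact widths and normalization placement in the ResUNet variant used here may differ.

```python
# Sketch of a residual convolutional block as used in ResUNet-style encoders:
# two 3x3 convolutions with an identity (or 1x1-projected) shortcut added
# before the final activation. Widths are illustrative assumptions.
from tensorflow.keras import layers

def residual_block(x, filters, name="res"):
    shortcut = x
    if x.shape[-1] != filters:  # project the shortcut when channel counts differ
        shortcut = layers.Conv2D(filters, 1, padding="same",
                                 name=f"{name}_proj")(shortcut)
    y = layers.Conv2D(filters, 3, padding="same", name=f"{name}_conv1")(x)
    y = layers.BatchNormalization(name=f"{name}_bn1")(y)
    y = layers.ReLU(name=f"{name}_relu1")(y)
    y = layers.Conv2D(filters, 3, padding="same", name=f"{name}_conv2")(y)
    y = layers.BatchNormalization(name=f"{name}_bn2")(y)
    y = layers.Add(name=f"{name}_add")([y, shortcut])  # residual connection
    return layers.ReLU(name=f"{name}_out")(y)
```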

3.3.2. Assessing the Different Backbone Architectures for Segmentation Models

To assess the semantic segmentation functionality, all the models are trained with various pretrained networks, such as ResNet, EfficientNet, VGG16, and VGG19, as backbones. In addition, a custom backbone is included in the evaluation. The backbone models are employed in the encoder part of the various semantic segmentation models, namely UNet, WUNet, U2Net, and ResUNet. Initially, each semantic segmentation model is assessed with its baseline backbone; subsequently, the pretrained models and the custom backbone CNN network are substituted one by one and the model is reassessed, as sketched below. For this study, UNet with skip connections is employed as the segmentation model.
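
A minimal sketch of this backbone substitution, assuming a Keras workflow and the layer names of the Keras VGG16 implementation, is shown below; other pretrained backbones would expose their own intermediate feature maps in the same way.

```python
# Sketch: using a pretrained VGG16 (ImageNet weights) as the UNet encoder by
# exposing intermediate feature maps as skip connections. Layer names follow
# the Keras VGG16 implementation; other backbones use their own layer names.
import tensorflow as tf
from tensorflow.keras import models

def vgg16_encoder(input_shape=(256, 256, 3)):
    base = tf.keras.applications.VGG16(include_top=False,
                                       weights="imagenet",
                                       input_shape=input_shape)
    skip_names = ["block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3"]
    skips = [base.get_layer(n).output for n in skip_names]
    bottleneck = base.get_layer("block5_conv3").output
    return models.Model(base.input, [bottleneck, *skips], name="vgg16_encoder")
```

The same decoder can then be stacked on the returned feature maps, so only the encoder changes between experiments and the backbone comparison remains consistent.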

3.3.3. Steps Used for Leaf Spot Disease Detection Using the Custom Backbone UNet Framework
(1) Prepare the leaf dataset using DCGAN augmentation.
(2) Train the chosen segmentation model with input images and corresponding masks, considering performance metrics like mIoU, Dice, and pixel accuracy calculated using equations (1)–(4).
(3) Train and fine-tune the segmentation model parameters.
(4) Iteratively train the model until achieving a satisfactory training and validation accuracy curve; otherwise, repeat Step 3.
(5) Deploy the model for testing on a real image test set.
(6) Output the segmentation results to identify the brown stage and final stage of the leaf.

The overall flowchart of the leaf spot disease detection is illustrated in Figure 4.

3.4. Training Details

The segmentation task involves using the augmentation model to generate a total of 5000 images for each type. During the training process, the loss function of the DCGAN is fine-tuned. The GAN model is trained for 100 epochs, with 5000 iterations in total. To initiate the process, the initial value of the loss function’s lambda is set to 0.1 [52]. As iterative training progresses, the lambda value is updated to 0.01. Following the iterative training process, the augmented images, along with their corresponding masks, are passed to the segmentation block. The study employs various models, including UNet, WUNet, U2Net, and ResUNet, all utilizing a 3 × 3 kernel. Each model undergoes 300 training epochs.
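
The exact composition of the fine-tuned DCGAN loss is not reproduced here; purely as an illustration of how the lambda schedule could enter the objective, the sketch below assumes that lambda weights an auxiliary L1 similarity term against the adversarial term and is lowered from 0.1 to 0.01 as training progresses.

```python
# Illustrative generator loss with a lambda-weighted auxiliary term. The exact
# form of the fine-tuned DCGAN loss is not given in this text; lambda is assumed
# here to weight an L1 similarity term against the adversarial term.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(fake_preds, generated, templates, lam):
    adversarial = bce(tf.ones_like(fake_preds), fake_preds)      # L_adv
    similarity = tf.reduce_mean(tf.abs(templates - generated))   # assumed L1 term
    return adversarial + lam * similarity

lam = 0.1   # initial value; updated to 0.01 later in training
```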

3.5. Hyperparameter Tuning of Segmentation Models

The UNet, WUNet, U2Net, and ResUNet models were trained using various backbones, each with a batch size of 32, over 300 epochs. Here, the batch size determines how many samples are processed before updating the model’s parameters, while the number of epochs represents the number of complete passes over the training data. The learning rate, a critical hyperparameter, was set to 0.0001 to balance learning speed and convergence. The Adam optimization method was used for model compilation, with all convolutional layers using a 3 × 3 kernel and the ReLU activation function. An early-stopping mechanism based on validation performance was employed during training to prevent overfitting. In our proposed segmentation model with the custom backbone, the Adam optimizer was used with a learning rate of 0.001 and a batch size of 32. This model was trained for 100 epochs with the ReLU activation function, following an iterative process to determine the optimal parameter settings.
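
A compile-and-fit sketch matching the configuration described above for the custom-backbone model (Adam, learning rate 0.001, batch size 32, 100 epochs, early stopping on validation performance) is given below; the model builder, dataset arrays, loss choice, and early-stopping patience are placeholders or assumptions.

```python
# Sketch of the training configuration for the custom-backbone model. The
# builder function, data arrays, loss, and patience value are placeholders.
import tensorflow as tf

model = build_unet_with_custom_backbone()   # hypothetical model builder

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy",   # assumed loss for two-class masks
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=10,  # assumed patience
                                              restore_best_weights=True)

# train_images / train_masks and val_images / val_masks are NumPy arrays of
# augmented leaf images and their corresponding segmentation masks.
history = model.fit(train_images, train_masks,
                    validation_data=(val_images, val_masks),
                    epochs=100,
                    batch_size=32,
                    callbacks=[early_stop])
```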

3.6. Evaluation Metrics

To assess the efficacy of the DCGAN augmentation method, we analyze the similarity between the synthesized images and the template images. This evaluation utilizes well-established similarity metrics, including the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [53]. These metrics offer valuable insights into the degree of resemblance between the generated images and the target images, enabling a thorough evaluation of DCGAN augmentation model performance. The segmentation tasks are evaluated based on the calculated metrics, which are determined by the following equations:
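
In their standard formulations, which are assumed here, these metrics can be written as

\[
\text{Pixel accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}
\]
\[
\text{IoU} = \frac{TP}{TP + FP + FN}, \tag{2}
\]
\[
\text{mIoU} = \frac{1}{C}\sum_{c=1}^{C} \text{IoU}_c, \tag{3}
\]
\[
\text{Dice} = \frac{2\,TP}{2\,TP + FP + FN}, \tag{4}
\]

where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative pixel counts, respectively, and C is the number of classes.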

4. Result and Discussion

The FID score serves as a widely adopted metric for assessing the fidelity of generated images in relation to real images from a given dataset. In this context, the objective is to train the DCGAN in a manner that ensures the FID score remains stable and lies within the specified range of 13 to 15, as depicted in Figure 6. Sustaining the FID score within this designated range signifies that the generated images closely mirror the characteristics of the real images in the dataset, showcasing a notable level of visual quality and diversity. This consistency in the FID score reflects the success of the training process in achieving realistic and diverse image generation.

In this study, we shared the outcomes of our assessment of image generation models. The images generated during the brown stage garnered an SSIM score of and a PSNR score of . In contrast, images produced during the final stage achieved an SSIM score of and a PSNR score of . The SSIM score serves as an indicator of the structural similarity between the generated images and real images, while the PSNR score reflects the level of fidelity and noise present in the generated images. Higher SSIM scores suggest a closer resemblance to real images, whereas higher PSNR scores indicate enhanced image quality. Upon comparing both sets of generated images, it was observed that final-stage images attained superior scores in both SSIM and PSNR metrics. This implies that they exhibit better similarity and higher quality when compared to real images. These findings offer valuable insights into the performance of our image generation models and underscore the superior capabilities of the final stage in producing high-quality images. Figure 7 visually presents brown-stage-generated images obtained using DCGAN, showcasing the visual aspects of our research outcomes.

In Figure 8, the final-stage-generated images produced by DCGAN are displayed. The analysis reveals that the brown-stage-generated images outperform the final-stage images both qualitatively and quantitatively.

Figure 9 shows the segmentation results of four models: UNet, WUNet, U2Net, and ResUNet, each equipped with the custom backbone. The comparison indicates that the UNet model with the custom backbone outperforms the WUNet, U2Net, and ResUNet models with the same custom backbone in terms of segmentation performance.

In Figure 10, we present the training accuracy of several segmentation models, each integrating unique pretrained CNN networks in conjunction with our novel custom backbone. Notably, UNet segmentation with the custom backbone emerges as particularly effective for detecting leaf spot diseases. The training accuracies depicted in Figures 10(b) and 10(c) display variations throughout the epochs. Concurrently, Figure 10(a) illustrates the performance of ResUNet, which exhibits similar fluctuations. However, a comparative analysis of Figure 10(d), representing the proposed UNet with the custom backbone, reveals that the latter demonstrates superior performance. This suggests that our custom backbone enhances the UNet segmentation model’s efficacy in comparison with other configurations, underscoring its potential for accurate and robust leaf spot disease detection. Further details and insights into these results are discussed in the subsequent sections.

In Table 2, the performance metrics, namely the mean intersection over union (mIoU) and the Dice coefficient (Dice), are shown for the two-stage leaf disease classification employing various backbone CNN networks. The results demonstrate that the proposed custom backbone combined with UNet semantic segmentation yields superior outcomes. This framework successfully extracts the low-level features required for leaf spot disease detection, enhancing the accuracy of the classification process.

Table 3 presents the outcomes of the evaluation conducted to determine the most suitable backbone for the ResUNet model, considering the overall segmentation process and various performance metrics. In this analysis, ResUNet exhibited robust performance, achieving an mIoU of 0.91, a Dice coefficient of 0.96, and a pixel accuracy of 0.95; these metrics collectively gauge the model’s efficacy in accurately segmenting images. In the evaluation of ResUNet with MobileNetV4 as its backbone, illustrated in Table 4, the segmentation model demonstrated exceptional performance: MobileNetV4 outperformed the other backbones, securing the highest scores across all evaluated metrics with an mIoU of 0.91, a Dice coefficient of 0.96, and a pixel accuracy of 0.95. In contrast, when EfficientNet served as the backbone, there was a decline in performance, reflected in an mIoU of 0.82, a Dice coefficient of 0.72, and a pixel accuracy of 0.73. Similarly, ResNet as a backbone exhibited the lowest performance among the configurations assessed, with an mIoU of 0.72, a Dice coefficient of 0.69, and a pixel accuracy of 0.71. These findings underscore the importance of carefully selecting a compatible backbone for the ResUNet segmentation model. MobileNetV4 emerges as the optimal choice, demonstrating superior segmentation accuracy across multiple performance metrics. This analysis provides comprehensive insight into the comparative performance of the different backbones, facilitating informed decisions about model architecture.

Figure 11 offers a visual depiction of the pixel accuracy metrics derived from the evaluation of four distinct models: UNet, WUNet, U2Net, and ResUNet, each configured with its own backbone architecture. Our proposed semantic segmentation approach stands out for its performance, which is amplified by the integration of the custom backbone. In the specific case of the custom backbone working in tandem with ResUNet, the results are particularly strong, achieving the highest recorded pixel accuracy of 0.98, which underscores the effectiveness of the custom backbone in enhancing the segmentation capabilities of ResUNet. For comparison, UNet demonstrated a pixel accuracy of 0.90, WUNet recorded a pixel accuracy of 0.85, and U2Net yielded a pixel accuracy of 0.87. These outcomes emphasize the superior performance of the proposed ResUNet with the custom backbone relative to the other segmentation models explored in this study with their diverse backbone configurations. The integration of a custom backbone, particularly in conjunction with ResUNet, emerges as a pivotal factor in achieving outstanding pixel accuracy.

Figure 12 provides a detailed view of the confusion matrix associated with four distinct models: UNet, WUNet, U2Net, and ResUNet. Each of these models utilizes diverse backbones to predict both the brown and final stages of leaf disease. A standout observation is the remarkable performance achieved by our proposed backbone in conjunction with ResUNet, resulting in an impressive prediction accuracy of 95%.

This outstanding accuracy underscores the efficacy of our proposed backbone when integrated with ResUNet, showcasing its capability to accurately predict both brown and final stages of leaf disease. The synergy between the custom backbone and ResUNet evidently contributes to superior predictive outcomes.

In conclusion, the results presented in Figure 12 affirm the strength of our proposed approach. The 95% prediction accuracy demonstrates the practical success of our model in handling the complexity of leaf disease prediction. This achievement not only highlights the advancements made in the field but also serves as a testament to the potential impact of innovative backbone configurations in enhancing the overall performance of segmentation models. The combination of a well-designed backbone with ResUNet stands out as a key factor in achieving this accuracy.

5. Conclusion

In conclusion, this paper introduces a segmentation approach for effectively detecting leaf spot disease. The study employs various baseline models (UNet, WUNet, U2Net, and ResUNet), each integrated with distinct pretrained CNN network backbones in the encoder path, leading to significant improvements in segmentation efficiency. One of the key contributions of this research is the proposal of a custom backbone specifically tailored for UNet segmentation, which demonstrated exceptional accuracy in precisely delineating spots associated with both brown-stage and final-stage leaf spot disease. In addition, the study explores the efficacy of DCGAN-based augmentation, an efficient process that successfully generates 10,000 images (5,000 images for each type). This augmentation technique significantly enriches the dataset, resulting in notable performance enhancements for the segmentation models. Specifically, our proposed DCGAN augmentation achieved an impressive SSIM score of and a PSNR score of . The proposed approach exhibits remarkable potential for advancing leaf spot disease detection and has practical implications for agricultural research and applications. The study’s promising results underscore the importance of employing efficient segmentation techniques and augmentations to elevate the accuracy and reliability of disease classification processes. Furthermore, the integration of the custom backbone has proven particularly beneficial, enabling the detection model to capture low-level features of brown spots of varying sizes with an mIoU of 0.95. This customized backbone can be implemented in lightweight networks suitable for mobile-based applications. Despite the heightened computational complexity and extended training time associated with the larger and more diverse dataset, our deployed segmentation model exhibited improved efficiency. Initially, segmentation results were suboptimal, with mIoU and Dice scores falling below 0.5. However, substantial enhancements were observed post-augmentation, with the reported mIoU reaching 0.91 and Dice reaching 0.96, underscoring the effectiveness of DCGAN augmentation in refining segmentation accuracy and consistency. In addition, our proposed segmentation model effectively handled the larger and more complex dataset, achieving a pixel accuracy of 0.96 and efficient segmentation. It is worth noting that the augmentation process was introduced to address data-related efficiency challenges, and our segmentation model proved capable of managing these complexities adeptly. Overall, the findings of this research open up new avenues for more effective leaf spot disease detection, offering valuable insights into the application of efficient segmentation methods and augmentations in the field of agriculture. With continued development and implementation, the proposed approach has the potential to make a significant impact on crop disease management and contribute to the advancement of agricultural practices.

Data Availability

The datasets generated and/or analyzed during the current study are publicly available and accessible.

Ethical Approval

This article contains no studies with human participants or animals performed by any of the authors.

Conflicts of Interest

All the authors declare that they have no conflicts of interest.

Acknowledgments

Open-access publication funding will be provided by the Manipal Academy of Higher Education, Manipal.