Abstract
Mixed defects have become increasingly common in defect detection and are one of the most active research topics in wafer map analysis. Postprocessing methods used to resolve overlapping detections among large numbers of mixed defects are slow, which makes them unsuitable for rapid defect detection. In this paper, the fast-soft nonmaximum suppression (fsNMS) method is proposed to solve this problem. The score of each detection box is updated by optimizing the penalty distribution function. Further, this paper analyzes the performance of the fsNMS method in wafer defect detection. The logistic function is used as the penalty, and experiments are conducted using single-stage and two-stage detectors. The final results show that, compared to softNMS, the efficiency of the single-stage and two-stage detectors increases on average by 9.63% and 21.72%, respectively.
1. Introduction
Defect detection is an important application of object detection that has received a lot of attention. For the semiconductor industry, wafer map defect detection has become a major defect detection problem. The semiconductor manufacturing process involves tens of complex steps, any of which can introduce defects for numerous reasons [1, 2]. Visualizing and identifying defect patterns is essential for preventing defect generation. Defect pattern recognition (DPR) provides engineers with a reference for dealing with manufacturing-related problems by identifying wafer surface defects [3]. Currently, with a gradual reduction in wafer size and an increase in the complexity of production processes, the number of mixed complex defects (which combine multiple basic defects) has been increasing. Mixed defects make detection more complicated, especially when tens of millions of wafer maps are tested in industrial production, which requires a high level of accuracy and speed, both online and offline.
Numerous approaches have been proposed in the literature to tackle the problem of hybrid wafer defect detection in recent years, ranging from manual feature recognition to deep learning networks for automatic feature recognition. Deep convolutional neural networks have demonstrated very good performance in computer vision [4, 5]. When applied to the field of industrial image detection, deep learning-based object detection methods have proven to be very beneficial since engineers do not have to develop specific defect models and the data-driven approach does not require domain-specific prior knowledge.
Among deep learning-based detectors, two-stage detectors (e.g., the RCNN [6] series) improve accuracy but lack efficiency, while single-stage detectors (e.g., YOLO [7], SSD [8], and RetinaNet [9]) redesign the overall network structure at some cost in precision. After candidate box regression, both families rely on postprocessing, whose main purpose is to remove redundant candidate boxes. The extracted candidate boxes produce cluttered detections during localization refinement; for example, multiple candidate boxes may be regressed to the same region of interest (RoI). The detector uses a greedy nonmaximum suppression (greedy NMS) algorithm to reduce the number of false-positive boxes. Greedy NMS was presented by Dalal and Triggs [10]: the bounding box with the maximum detection score is selected, and its neighboring boxes are suppressed using a predefined intersection over union (IoU) threshold. In detecting complex mixed wafer defects with deep convolutional neural networks, greedy NMS drastically reduces the number of false-positive boxes in the postprocessing stage, but mixed defects remain difficult to detect. At a given threshold, the detector loses many positive boxes, which decreases the average precision. During mixed defect detection, softNMS [11] can eliminate more false-positive boxes, greatly increasing accuracy. In industrial detection, however, softNMS is very inefficient when thousands of wafer defects must be detected, and its speed is often insufficient.
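The greedy NMS procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than any detector's actual implementation: the corner box format `(x1, y1, x2, y2)`, the function names, and the example boxes and scores are all assumptions made for the sketch.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed corner format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Hard pruning: drop every remaining box whose overlap with the
        # selected box reaches the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep


# Hypothetical example: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_thr=0.5)
```

In this example the second box is discarded outright even though its score is high, which is the strict pruning behavior that causes overlapping mixed defects to be lost.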
To solve this problem, this paper proposes an improved fast-soft nonmaximum suppression (fsNMS) postprocessing algorithm that improves detection efficiency by optimizing the distribution of penalty terms in softNMS [11], making it better suited to high-volume industrial production. Experiments are performed on several baseline detectors. The results show the effectiveness of object detection for wafer maps and the efficiency and precision of the postprocessing stage once fsNMS is substituted. It is concluded that our approach is effective for both single-stage and two-stage detectors.
2. Related Work
This section first introduces the shortcomings of deep learning-based wafer map detection and general object detection algorithms (Section 2.1) and then discusses the problems of traditional NMS and its improved variants (Section 2.2).
2.1. Wafer Map Detection and General Object Detection
Wafer map detection. Recently, many studies have attempted to classify wafer maps based on convolutional neural networks (CNNs). Nakazawa and Kulkarni [12] proposed a CNN method for wafer map pattern classification and image retrieval and studied the classification of 22 types of mixed defects. Mixed defects have a large degree of mutual occlusion, and the average recognition accuracy only reached 91%, which is far from adequate for defect detection. Kyeong and Kim [1] applied CNNs to classify mixed defect patterns of wafer maps and established a separate model for each single defect pattern (each model decides whether its pattern is present when multiple defect patterns are mixed on a wafer), covering 16 defect types. On the test set of mixed defects, the detection time per wafer map is 0.13 s and the accuracy is 98%. However, for the defect detection of a large number of wafer maps, Kyeong's method improves accuracy at the cost of detection efficiency.
General object detection. In recent years, object detection has been popularized by both two-stage and single-stage detectors. Two-stage detectors divide a detection task into two phases, namely, the RoI extraction phase and the classification and regression phase for RoIs. RCNN [6] used a selective search method [13] to locate RoIs in the input image and then a classifier to classify them. SPP Net [14], Fast RCNN [15], and Faster RCNN [16] were developed subsequently. With the emergence of the region proposal network (RPN) [16], the efficiency of the detector was greatly improved, and the detector could be trained end-to-end. The anchor-based approach is widely used in object detection, and RCNN is a milestone. Since then, FPN [17], combined with ResNet [18] and ResNeXt [19], has greatly improved small object recognition, with a detection speed of up to 5 fps on a single GPU. RFCN [20] replaces the fully connected layer with position-sensitive score maps, doubling the detection efficiency compared with [16]. Cascade RCNN [21] explored the cascade architecture of RCNN and extended it to multistage detectors, which train a series of detectors with increasing IoU thresholds to tackle the problems of overfitting in training and quality mismatch in inference. However, such cascade detectors have more parameters, resulting in a decrease in detection efficiency. Mask RCNN [22] added a mask branch to [16], refined the detection results using multitask learning, and predicted the mask while detecting the bounding box, while still reaching a detection speed of 5 fps on a single GPU.
On the other hand, single-stage detectors (such as YOLO [7, 23] and SSD [8]) remove the RoI extraction stage and directly predict the bounding box and classification probability with a deep convolutional neural network, which is simpler and faster than the two-stage approach. The introduction of focal loss [9] improved their precision by addressing the serious imbalance between positive and negative samples, but the overall detection efficiency of RetinaNet is far inferior to that of the YOLO series and SSD.
2.2. Nonmaximum Suppression
NMS is widely used as a postprocessing algorithm in computer vision. In the general object detection methods (Section 2.1), manual processing and greedy NMS are still used as postprocessing methods. Recently, softNMS [11] proposed an improved NMS, which reduces the score of an adjacent candidate box by applying a penalty rather than discarding the candidate box whose score is lower than the threshold. The algorithm is satisfactory in improving AP, but highly overlapping false-positive candidates remain, and the algorithm efficiency is insufficient. Learning NMS [24] designed a complex deep neural network that requires only boxes and scores as input to implement NMS. Fitness NMS [25] proposed a bounding box regression loss that maximizes the IoU with the matched object box and combined it with [11] to improve precision, and the loss converges well. Adaptive NMS [26] considered the relationship between sparse and dense objects in crowd detection. Building on [11], it raises the NMS threshold to retain neighboring detection boxes with high overlap, which is an effective solution for crowded scenes, and it adds a density prediction module for learning density scores. KL loss [27] presented a bounding box regression loss for learning the difference between the transformation and location of bounding boxes, estimated the localization confidence as well as the location on the baseline, and predicted its probability distribution to guide the NMS to retain more accurately localized bounding boxes.
The above postprocessing methods are effective in general object detection, but as the number of parameters or the complexity of the network structure increases, the inference efficiency decreases. At present, most networks still use greedy NMS as the postprocessing method, so a fast postprocessing algorithm is needed that solves the efficiency problem without losing accuracy.
3. Proposed Fast-Soft Nonmaximum Suppression Algorithm
In this section, the proposed wafer map postprocessing algorithm is presented in detail. The problems of softNMS are first analyzed (Section 3.1), then the improvement ideas and methods are elaborated (Section 3.2), and finally, the training and inference processes are introduced (Section 3.3).
3.1. Problems with SoftNMS
In the wafer map detection task, the postprocessing stage is essential, and the detection effect is unsatisfactory because greedy NMS prunes very strictly. As shown in Figure 1, when objects overlap, the score of Scratch will be insufficient. Although some of the extracted detection boxes cover parts that the highest-scoring box does not and can still extract the object, their scores are very low. As a result, some positive samples of Scratch are filtered out by the greedy NMS threshold.
Figure 1: panels (a) and (b).
To solve this problem, the softNMS [11] algorithm presents a rescoring formula as shown below [26].
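For reference, the softNMS rescoring rule and its two penalty terms are commonly written as below. The notation is the usual softNMS notation and is assumed here rather than taken from this text: $s_i$ is the score of box $b_i$, $M$ is the current highest-scoring box, $N_t$ is the IoU threshold, $f$ is the penalty function, and $\sigma$ is the width parameter of the Gaussian term.

```latex
% General rescoring rule: boxes below the threshold keep their score,
% boxes above it are decayed by the penalty function f.
s_i =
\begin{cases}
  s_i, & \mathrm{IoU}(M, b_i) < N_t \\
  s_i \, f\bigl(\mathrm{IoU}(M, b_i)\bigr), & \mathrm{IoU}(M, b_i) \ge N_t
\end{cases}

% Linear penalty term
f\bigl(\mathrm{IoU}(M, b_i)\bigr) = 1 - \mathrm{IoU}(M, b_i)

% Gaussian penalty term (applied to every remaining box)
s_i = s_i \, e^{-\mathrm{IoU}(M, b_i)^{2} / \sigma}
```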
The penalty is applied to the score in greedy NMS when the IoU is greater than the threshold $N_t$. The scores of other detection boxes that overlap heavily with the highest-scoring box $M$ need to be reduced, which is a promising way to improve greedy NMS, and the scores of detection boxes with higher overlap with $M$ should be decayed more because they have higher false alarm rates. In softNMS, a linear attenuation term (2) and a Gaussian attenuation term (3) are designed. Since the IoU is not a continuous value, the linear attenuation term generates an abrupt penalty, while the Gaussian attenuation term adds redundant parameters that raise the computational cost. Dense stacking in wafer maps is very unfriendly to the detector: the false detection rate becomes very high and the detection rate of positive samples falls short of requirements, which has a great impact on industrial production. SoftNMS effectively solves the problem of dense overlap in wafer maps. However, when it is applied to industrial production, the efficiency of softNMS in the test phase may be far from sufficient. The computational complexity increases with the number of parameters, which makes processing a large number of samples inefficient. For this problem, the following requirements are imposed on the postprocessing process:
(1) The number of wafer map defects in industrial inspection is very large, so the detection speed must be improved while preserving the detection precision.
(2) The sample elimination process must not affect the distribution of positive and negative samples. A noncontinuous penalty leads to sudden changes in the ranking queue, so the penalty imposed on the score needs to be continuous.
(3) When the overlap between the highest-scoring extraction box within a range and other boxes is high, the imposed penalty needs to be increased, and vice versa.
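The abrupt change produced by the linear term, which requirement (2) rules out, can be seen numerically. The sketch below assumes the standard softNMS forms of the linear and Gaussian weights; the threshold `NT = 0.3` and width `SIGMA = 0.5` are illustrative values chosen for the example, not settings reported in this paper.

```python
import math


def linear_weight(overlap: float, nt: float) -> float:
    """Linear softNMS weight: 1 below the threshold, 1 - IoU at or above it."""
    return 1.0 - overlap if overlap >= nt else 1.0


def gaussian_weight(overlap: float, sigma: float) -> float:
    """Gaussian softNMS weight: a smooth decay applied at every overlap."""
    return math.exp(-(overlap ** 2) / sigma)


NT = 0.3      # illustrative IoU threshold (assumption)
SIGMA = 0.5   # illustrative Gaussian width (assumption)
EPS = 1e-6

# Change in the weight when the overlap crosses the threshold by a tiny step.
linear_jump = linear_weight(NT - EPS, NT) - linear_weight(NT, NT)
gaussian_jump = gaussian_weight(NT - EPS, SIGMA) - gaussian_weight(NT, SIGMA)
```

With these values the linear weight drops by about 0.3 across an arbitrarily small change in overlap, while the Gaussian weight changes only by an amount proportional to the step, at the cost of the extra parameter `sigma`.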
3.2. Improved fsNMS
According to the above analysis, the logistic function is used to solve this problem, as in Equation (4). where in Equation (1), is set here, , from this transformation.
When the IoU score of the candidate box falls into a certain threshold range, its distribution belongs to the exponential family, which underlies generalized linear models and includes the Bernoulli and Poisson distributions. The logistic function based on the generalized linear model is used as the penalty term. The logistic function is continuous with a range of 0 to 1, so the value estimated by the logistic model is never greater than 1 or less than 0, which makes it usable as a penalty function. It is worth noting that, as stated in Section 3.1, IoU is not a continuous function, but rather a nonlinear function of the explanatory variables. Because it is nonlinear, imposing an abrupt penalty can change the ranking list. After the logistic transformation, the relationship between the response variable and the explanatory variables becomes linear and the penalty value becomes continuous, so the rankings are not disturbed. If the highest-scoring extraction box in a candidate range overlaps heavily with the other boxes in this range, the penalty needs to increase gradually, and the penalty term approaches 0.5. Conversely, if the highest-scoring box overlaps little with the other boxes in this range, the penalty should be reduced, and it tends to 0. As a result, positive samples that have low scores and are difficult to detect are not easily removed from the ranking queue.
In addition, when the variance reaches a certain degree, the results of the standard normal distribution become similar to the logistic function in [29]. When compared with Gaussian penalty, the logistic function has fewer parameters. At the same time, the dimension of the distribution function is reduced, as is the number of calculations. Especially in inference for object detection, it will have good results as well as an improved speed.
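The similarity between the logistic function and a suitably scaled normal distribution can be checked numerically. The sketch below compares the logistic function with the normal cumulative distribution function; the scale `1.702` is the value commonly quoted for this approximation and is an assumption of the example, not a parameter taken from this paper.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x: float, sigma: float = 1.0) -> float:
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


SCALE = 1.702  # commonly quoted scale for the approximation (assumption)
xs = [i / 100.0 for i in range(-800, 801)]

# Largest gap between the two curves on [-8, 8].
gap_unit = max(abs(logistic(x) - normal_cdf(x, 1.0)) for x in xs)
gap_scaled = max(abs(logistic(x) - normal_cdf(x, SCALE)) for x in xs)
```

With unit variance the two curves differ noticeably, whereas with the wider scale the gap stays below 0.01 over the whole range, which is the sense in which the two penalties behave alike.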

Algorithm 1 shows the fsNMS algorithm. $D$ represents the final detection set: the detection boxes screened by the algorithm are sorted in $D$, and the final output boxes and scores are obtained from it. Starting from a set $B$ of extracted boxes with corresponding scores $S$, the box $M$ with the top score is chosen first and moved from set $B$ to set $D$. Then, the overlap between each remaining box $b_i$ (in set $B$) and $M$ is calculated and compared with the threshold $N_t$. A penalty is applied to the score of every extraction box whose overlap is greater than $N_t$. Other scores remain unchanged, and all extraction boxes can be sorted in $D$. The penalty function of fsNMS optimizes the penalty distribution of softNMS according to (5), which makes it more efficient. The overall algorithm flowchart is shown in Figure 2.
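The loop described above can be sketched as follows. This is a hedged illustration of a softNMS-style rescoring loop with a logistic penalty, not the paper's Algorithm 1: the exact penalty of Equations (4) and (5) is not reproduced here, so `logistic_weight`, its steepness `k`, and the shift by the threshold are hypothetical choices made only so that the penalty is continuous at the threshold, starts at 0, and approaches 0.5 as the overlap grows.

```python
import math
from typing import List, Sequence, Tuple

Box = Sequence[float]  # assumed corner format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def logistic_weight(overlap: float, nt: float, k: float = 6.0) -> float:
    """Hypothetical logistic decay; k is an assumed steepness parameter."""
    if overlap < nt:
        return 1.0  # scores below the threshold remain unchanged
    # Penalty lies in [0, 0.5): 0 at the threshold, rising with the overlap.
    penalty = 1.0 / (1.0 + math.exp(-k * (overlap - nt))) - 0.5
    return 1.0 - 2.0 * penalty


def fs_nms_sketch(boxes: List[Box], scores: List[float],
                  nt: float = 0.5, k: float = 6.0) -> List[Tuple[int, float]]:
    """Return (index, rescored score) pairs in the order they are selected."""
    remaining = list(range(len(boxes)))   # set B
    current = list(scores)                # scores S, updated in place
    selected: List[Tuple[int, float]] = []  # set D
    while remaining:
        m = max(remaining, key=lambda i: current[i])  # top-scoring box M
        remaining.remove(m)
        selected.append((m, current[m]))
        for i in remaining:
            current[i] *= logistic_weight(iou(boxes[m], boxes[i]), nt, k)
    return selected


# Hypothetical example: the overlapping box is rescored instead of discarded.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = fs_nms_sketch(boxes, scores, nt=0.5)
```

In this example the overlapping box stays in the output with a reduced score and moves behind the non-overlapping box in the ranking, instead of being removed as greedy NMS would do.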
3.3. Implementation Details
This subsection elaborates on the datasets, evaluation metrics, and experiment-specific parameters and describes in detail the training and inference processes of the network. A model pretrained on ImageNet [30] is used in the experiments to initialize the detection network. The complete training and testing code was built on PyTorch [31] and mmdetection [32]. The mmdetection defaults are used for any hyperparameters not mentioned in this experiment.
Datasets. The experiments use the 6-category wafer map dataset [28], with 2 k training images, 382 validation images, and 429 test images. There are 3.1 k wafer bin maps (WBMs), including 6.2 k objects. In wafer testing, the final test results on the wafer are stored in the WBMs, which consist of binary values, and the WBMs have six classical defect patterns (a–f) in our experiment. According to [33], these are generally divided into the following categories: “Center,” “EdgeLoc,” “Loc,” “Donut,” “Scratch,” and “EdgeRing.” The size of each wafer map is , and some specific image information is shown in Figure 3.
Figure 3: panels (a) to (l).
Evaluation metrics. Two evaluation metrics are used in our experiments: AP and time. They are used to evaluate the applicability of the network, and the statistical parameter precision ($P$) is applied in our experiment. To evaluate the defect detection effect and the localization performance, the following four parameters are defined:
True positive (TP): predicting positive, the actual is positive.
False positive (FP): predicting positive, the actual is negative.
False negative (FN): predicting negative, the actual is positive.
True negative (TN): predicting negative, the actual is negative.
(i) AP. Precision ($P$) is the ratio of the number of correctly predicted WBMs to the number of WBMs tested, and recall ($R$) is the ratio of the number of correctly predicted WBMs to all ground truths of WBMs. After each object is classified, a confidence level is output, and a confidence level threshold is set to obtain a pair ($P$, $R$). Taking different confidence level thresholds, more pairs ($P$, $R$) can be obtained, and the maximum value of $P$ over all recalls greater than the specified recall is used as the maximum $P$ under the currently specified recall $R$.
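The definitions above can be made concrete with a short sketch. It assumes the usual formulas $P = TP / (TP + FP)$ and $R = TP / (TP + FN)$; the function names and the example counts and pairs are hypothetical, and recalls equal to the specified level are included in the maximum, which is a choice of this sketch.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def interpolated_precision(pairs: List[Tuple[float, float]],
                           recall_level: float) -> float:
    """Maximum precision among all (P, R) pairs whose recall reaches the level."""
    candidates = [p for p, r in pairs if r >= recall_level]
    return max(candidates) if candidates else 0.0


# Hypothetical counts at one confidence threshold.
p, r = precision_recall(tp=8, fp=2, fn=4)

# Hypothetical (P, R) pairs obtained at different confidence thresholds.
pairs = [(1.0, 0.2), (0.8, 0.4), (0.9, 0.6), (0.6, 0.8)]
p_at_half = interpolated_precision(pairs, recall_level=0.5)
```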
Next, the WBM classification task is a multiclassification task, and mAP is the average precision over all categories. Thus, the mAP is used to evaluate the overall effect as follows:
$$\mathrm{mAP} = \frac{1}{C} \sum_{i=1}^{C} \mathrm{AP}_i,$$
where $\mathrm{AP}_i$ is the AP of the $i$th class and $C$ is the total number of classes of WBMs being evaluated. The average precision (AP) metric averages the AP across IoU thresholds from 0.5 to 0.95 with an interval of 0.05. For box AP, $\mathrm{AP}_{50}$ and $\mathrm{AP}_{75}$ (AP at different IoU thresholds) and $\mathrm{AP}_S$ and $\mathrm{AP}_M$ (AP at different scales, for small (S) and medium (M) objects) are also reported.
(ii) Time. The time needed to detect all wafer map test sets. The runtime is measured on a single NVIDIA Tesla P100 GPU.
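The two averaging steps can be sketched as follows. The function names and all AP values in the example are hypothetical and serve only to show the arithmetic: one average over the ten IoU thresholds from 0.5 to 0.95, and one average over the classes.

```python
from typing import Dict, List

# IoU thresholds 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def average_over_iou(ap_at_iou: Dict[float, float]) -> float:
    """Average the AP values measured at each IoU threshold."""
    return sum(ap_at_iou[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


def mean_ap(ap_per_class: List[float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical AP values that fall off as the IoU threshold tightens.
ap_at_iou = {t: 1.0 - t for t in IOU_THRESHOLDS}
ap_averaged = average_over_iou(ap_at_iou)

# Hypothetical per-class AP values for three classes.
overall = mean_ap([0.9, 0.8, 1.0])
```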
Experimental setting. Experiments use a single GPU to train each detector for 24 epochs, and the other model parameters are listed in Table 1. For the two-stage detectors, the baselines use heuristic methods for model initialization and optimization. To avoid the model oscillation caused by a large learning rate (lr), a warmup strategy is used in the initial 500 iterations, which lets the model stabilize gradually. After the model is relatively stable, a preset lr of 0.02 is used for training. The initial lr for Cascade Mask RCNN [21] and Grid RCNN [34] is instead set to 0.002 because of the divergence of the loss function caused by gradient explosion in the training process. After the 16th and 22nd epochs, the lr is multiplied by 0.1 each time. For each type of detector, the image size is adjusted to , and the aspect ratio uses the same design parameters. The detectors use a stochastic gradient descent (SGD) optimizer with a weight decay of 0.0001 and a momentum of 0.9. The batch size is set to 8. For single-stage detectors, the specific parameters of the baseline are listed in Table 1. Due to the nature of the feature extraction part of the network, the image size and aspect ratio follow the settings of mmdetection.
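The learning rate schedule described above can be written as a small function. The base lr of 0.02, the 500 warmup iterations, and the decay by 0.1 after epochs 16 and 22 come from the text; the linear shape of the warmup and the starting ratio `warmup_ratio` are assumptions of this sketch, since the text does not specify them.

```python
from typing import Tuple


def learning_rate(epoch: int, iteration: int,
                  base_lr: float = 0.02,
                  warmup_iters: int = 500,
                  warmup_ratio: float = 0.001,   # assumed starting fraction
                  steps: Tuple[int, ...] = (16, 22),
                  gamma: float = 0.1) -> float:
    """Learning rate after `epoch` completed epochs at global step `iteration`."""
    # Step decay: multiply by gamma once for every milestone already passed.
    lr = base_lr * gamma ** sum(1 for s in steps if epoch >= s)
    # Linear warmup (assumed shape) during the first warmup_iters iterations.
    if iteration < warmup_iters:
        alpha = iteration / warmup_iters
        lr *= warmup_ratio + (1.0 - warmup_ratio) * alpha
    return lr


lr_start = learning_rate(epoch=0, iteration=0)
lr_after_warmup = learning_rate(epoch=0, iteration=500)
lr_after_16 = learning_rate(epoch=16, iteration=10_000)
lr_after_22 = learning_rate(epoch=22, iteration=20_000)
```

For detectors that use the lower initial lr mentioned in the text, the same function applies with `base_lr=0.002`.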
Training. Several baseline detectors are trained, as shown in Figure 4. For the backbone of the two-stage object detection network (RPN in the RoI extraction stage, followed by the classification and localization stage), a ResNeXt101 [19] model pretrained on ImageNet [30] is used. The experiments compare the ResNet [18] and ResNeXt [19] networks, and ResNeXt achieves higher accuracy than ResNet. For the RoI extraction phase, the experiment explores the feature pyramid network (FPN) [17]. The RCNN network with an FPN backbone can extract RoI features from different levels of the feature pyramid. The ResNeXt backbone based on ResNet-FPN is used for feature extraction, which gives better precision and speed [22]. In the training stage, greedy NMS is used for postprocessing after RPN extraction, with a threshold of 0.7 [32]. All detectors are trained for 24 epochs, and the average classification accuracy of the baseline detectors reached 98.89% on the wafer map validation sets. For the backbone of the single-stage detectors, Darknet [23], VGG [8], and ResNet [9] are used for feature extraction.
Inference. In inference, the greedy NMS is replaced with softNMS and fsNMS for postprocessing, respectively. For the two-stage network structure shown in Figure 4, note that NMS with a threshold of 0.7 is used in the RoI extraction stage (RPN) and fsNMS with a threshold of 0.5 is used in the classification and localization stage (after the fully connected layer). On the same number of wafer map test sets, the detection speed improves compared to softNMS [11], and the precision is not affected. For single-stage detectors, an NMS threshold of 0.45 works better in inference. Since the standard Gaussian penalty considered in Section 3.2 is similar to the logistic function to some extent, the imposed penalty has the same effect, and the AP is unaffected. In addition, the logistic function is as efficient as a general linear function, so the time complexity of our algorithm is the same as that of the linear penalty in (2).
4. Experimental Results
This section mainly describes the results of the comparative experiments, including the comparison of detection efficiency in inference of softNMS and fsNMS (Section 4.1) and the comparison of detection precision between greedy NMS and fsNMS (Section 4.2). The experiments are carried out on some baseline detectors.
4.1. Efficiency Comparison Test
Our algorithm is first tested on several baseline detectors to better understand how it affects efficiency. In inference, since the time efficiency of softNMS is far from sufficient, the proposed method improves the detection efficiency by optimizing the distribution model of the penalty. As shown in Figure 5, for the two-stage detectors, comparison experiments are performed on Mask RCNN [22], Mask scoring RCNN [36], Cascade Mask RCNN [21], Grid RCNN [34], and Libra RCNN [35]. The performance of the proposed method is validated by replacing the postprocessing component. In inference, the greedy NMS is used in the RoI extraction stage, while the fsNMS is used in the classification and localization stage. Regardless of the relative position or angle of the object or different image features, the detector can find accurate objects in the key area. The detection precision is not affected, while the detection speed improves. As shown in Figure 5, the efficiency of the two-stage detectors based on extraction boxes increases by 21.72% on average, and by 25.8% for the Mask RCNN [22] detector in particular. For the single-stage detectors, the experiment was carried out on YOLO v3 [23], SSD 300 [8], and RetinaNet [9]. The filtering thresholds were set to 0.45, 0.45, and 0.5, respectively. The detection efficiency of the single-stage detectors based on extraction boxes improved by 9.63% on average.
The overall time of the single-stage detectors using the improved algorithm is higher than that of the two-stage detectors. Because a single-stage detector produces a large number of extraction boxes, the rescoring postprocessing algorithm makes the detector less efficient than greedy NMS with its strict pruning, and both the detection efficiency (Figure 5) and the detection effect (more cluttered extraction boxes are generated) are not as good as those of the two-stage detectors using the improved algorithm.
4.2. AP Comparison Test
In terms of the overall detector precision, the experiment replaced only the postprocessing used in inference, comparing the detector precision using traditional greedy NMS and fsNMS, with the threshold settings kept the same. The ablation study is shown in Figure 6, where the postprocessing algorithm is changed to perform the comparison experiments. The abscissa of Figure 6 represents the time of fsNMS, and the ordinate represents the growth of mAP. From the perspective of inference time, the time of the two-stage detectors is almost always within 60 ms, and their overall efficiency is greater than that of the single-stage detectors. On the test sets, according to Section 3.3, AP evaluates the correct positioning of the object. The regression of $\mathrm{AP}_{75}$ relative to $\mathrm{AP}_{50}$ is more accurate, and $\mathrm{AP}_S$ and $\mathrm{AP}_M$ capture different details of the object for boxes of different scales. With the new postprocessing, the precision of the baseline detector is improved, as is the localization of objects at different scales. According to (8), the fsNMS algorithm improves the average precision of the detector for each category of object location, and the mAP of the two-stage detectors increases by 1.76% on average.
Some recent single-stage detectors have better detection precision and efficiency than the two-stage detectors, showing a 2.7% improvement in mAP. RetinaNet extracts a large number of anchors in the extraction stage, reaching 100 k [9]. For two-stage detectors such as [16], the total number of extracted boxes is only 20 k, and only a small fraction of them eventually coincide with the ground truth box. Therefore, anchor-based single-stage detectors depend more on the postprocessing algorithm, and their improvement in precision is higher than that of the two-stage detectors. Overall, the fsNMS algorithm is effective in both single-stage and two-stage detectors and achieves the same effect as softNMS in improving precision.
According to Table 2, Mask RCNN is evaluated with two postprocessing methods (i.e., greedy NMS and fsNMS), and fsNMS obtains an improvement of 0.014 in mAP compared to greedy NMS. In addition, the experimental results show that some hard-to-detect objects (e.g., with too much overlap between ranges) appear with lower scores, as shown in Figure 7. It can also be clearly seen that the algorithm rescores the results after imposing penalties on the scores.
5. Conclusion
This paper proposed a novel fsNMS algorithm for the postprocessing stage of wafer map detection. We first discussed several key issues relating to the inefficiency of the traditional NMS algorithm and proposed an improved fsNMS algorithm to solve this problem. The algorithm rescores the detection boxes by optimizing the penalty distribution model of softNMS, with the objective of improving the detection efficiency of inference while keeping the precision stable. Meanwhile, the object detection method was explored and applied to the defect detection of wafer maps to improve the efficiency of industrial detection.
The experiments are based on wafer map datasets. The results show that in inference, the fsNMS algorithm outperforms traditional NMS in anchor-based detection precision. The test results also show that highly overlapped defect objects produce many false-positive boxes (Figure 7) that cannot be completely eliminated. This provides a direction for our future research.
Data Availability
The original data are available at https://github.com/Junliangwangdhu/WaferMap.
Conflicts of Interest
The authors declare that they have no conflicts of interest.