Abstract

A wafer bin map (WBM) represents specific defect patterns that provide information for diagnosing root causes of low yield in semiconductor manufacturing. In practice, most semiconductor engineers use subjective and time-consuming eyeball analysis to assess WBM patterns. Given shrinking feature sizes and increasing wafer sizes, various types of WBMs occur; thus, relying on human vision to judge defect patterns is complex, inconsistent, and unreliable. In this study, a clustering ensemble approach is proposed to bridge the gap, facilitating WBM pattern extraction and assisting engineers in recognizing systematic defect patterns efficiently. The clustering ensemble approach not only generates diverse clusters in the data space, but also integrates them in the label space. First, the mountain function is used to transform the data by using pattern density. Subsequently, k-means and particle swarm optimization (PSO) clustering algorithms are used to generate diverse partitions and various label results. Finally, the adaptive resonance theory (ART) neural network is used to attain a consensus partition and integrate the results. An experiment was conducted to evaluate the effectiveness of the proposed WBM clustering ensemble approach. Several criteria, including the sum of squared error, precision, recall, and F-measure, were used to evaluate the clustering results. The numerical results showed that the proposed approach outperforms the individual clustering algorithms.

1. Introduction

To maintain their profitability and growth despite continual technology migration, semiconductor manufacturing companies provide wafer manufacturing services that generate value for their customers through yield enhancement, cost reduction, on-time delivery, and cycle time reduction [1, 2]. The consumer market requires that increasingly complex semiconductor products be rapidly developed and delivered to market. As technology continues to advance and the required functionalities increase, engineers have drastically less time to ensure yield enhancement and diagnose defects [3].

The lengthy process of semiconductor manufacturing involves hundreds of steps, in which big data including the wafer lot history, recipes, inline metrology measurements, equipment sensor values, defect inspection, and electrical test data are automatically generated and recorded. Semiconductor companies experience challenges integrating big data from various sources into a platform or data warehouse and lack intelligent analytics solutions to extract useful manufacturing intelligence and support decision making regarding production planning, process control, equipment monitoring, and yield enhancement. Few intelligent solutions have been developed based on data mining, soft computing, and evolutionary algorithms to enhance the operational effectiveness of semiconductor manufacturing [4–7].

Circuit probe (CP) testing is used to evaluate each die on the wafer after the wafer fabrication processes. Wafer bin maps (WBMs) represent the results of a CP test and provide crucial information regarding process abnormalities, facilitating the diagnosis of low-yield problems in semiconductor manufacturing. In WBM failure patterns, the spatial dependences across wafers express systematic and random effects. Various types of failure patterns need to be identified; these pattern types facilitate rapidly determining the associated root causes of low yield [8]. Based on the defect size, shape, and location on the wafer, the WBM can be expressed as specific patterns such as rings, circles, edges, and curves. Defective dies caused by random particles are difficult to completely remove and typically exhibit nonspecific patterns. Most WBM patterns consist of a systematic pattern and random defects [8–10].

In practice, thousands of WBMs are generated for inspection and engineers must spend substantial time on pattern judgment rather than determining the assignable causes of low yield. Grouping similar WBMs into the same cluster can enable engineers to effectively diagnose defects. The complicated processes and diverse products fabricated in semiconductor manufacturing can yield various WBM types, making it difficult to detect systematic patterns by using only eyeball analysis.

Clustering analysis is used to partition data into several groups in which the observations are homogeneous within a group and heterogeneous between groups. Clustering analysis has been widely applied in applications such as grouping [11] and pattern extraction [12]. However, the results of most conventional clustering algorithms are influenced by the data type, algorithm parameter settings, and prior information. For example, the k-means algorithm is widely used to analyze substantial amounts of data because of its low time complexity [13]. However, the results of the k-means algorithm depend on the initially selected centroids and the predefined number of clusters. To address the disadvantages of the k-means algorithm, evolutionary methods such as the genetic algorithm (GA) and particle swarm optimization (PSO) have been developed for data clustering [14]. PSO is particularly advantageous because it requires less parameter adjustment than the GA [15].

Combining the results obtained by applying distinct algorithms to the same data set, or the same algorithm with various parameter settings, can yield high-quality clusters. Given the criteria of the clustering objectives, no individual clustering algorithm is suitable for every problem and data type. Compared with individual clustering algorithms, clustering ensembles that combine multiple clustering results yield superior clustering effectiveness regarding robustness and stability, incorporating conflicting results across partitions [16]. Instead of searching for an optimal partition, clustering ensembles capture a consensus partition by integrating diverse partitions from various clustering algorithms. Clustering ensembles have been developed to improve the accuracy, robustness, and stability of clustering; such ensembles typically involve two steps. The first step involves generating a basic set of partitions that can be similar to or distinct from one another through various parameters and clustering algorithms [17]. The second step involves combining the basic set of partitions by using a consensus function [18]. However, with shrinking integrated circuit feature sizes and increasingly complicated manufacturing processes, WBM patterns become more complex because of variations in defect density, die size, and wafer rotation. It is difficult to extract defect patterns with a single clustering approach; different clustering aspects must be incorporated to handle the various complicated WBM patterns.

To bridge this gap in real settings, this study proposes a WBM clustering ensemble approach to facilitate WBM defect pattern extraction. First, the target bin values are categorized into binary values and the wafer maps are transformed from two-dimensional to one-dimensional data. Second, k-means and PSO clustering algorithms are used to generate diverse partitions. Subsequently, the clustering results are regarded as label representations to facilitate aggregating the diverse partitions by using an adaptive resonance theory (ART) neural network. To evaluate the validity of the proposed method, an experimental analysis was conducted using six typical patterns found in the fabrication of semiconductor wafers. Under various parameter settings, the proposed clustering ensemble, which combines diverse partitions instead of using the original features, outperforms individual clustering methods such as k-means and PSO.

The remainder of this study is organized as follows. Section 2 introduces the fundamentals of WBMs. Section 3 presents the proposed approach to the WBM clustering problem. Section 4 provides experimental comparisons, applying the proposed approach to analyze the WBM clustering problem. Section 5 concludes with the findings and a discussion of future research directions.

2. Wafer Bin Maps

A WBM is a two-dimensional failure pattern. Based on the various defect types, random, systematic, and mixed failure patterns are the primary types of WBMs generated during semiconductor fabrication [19, 20]. Random failure patterns are typically caused by random particles or noise in the manufacturing environment. In practice, completely eliminating these random defects is difficult. Systematic failure patterns show spatial correlation across wafers, such as rings, crescent moons, edges, and circles. Figure 1 shows typical WBM patterns, which are transformed into binary values for visualization and analysis. The dies that pass the functional test are denoted as 0 and the defective dies are denoted as 1. Based on the systematic patterns, domain engineers can rapidly determine the assignable causes of defects [8]. Mixed failure patterns comprise both random and systematic defects on a wafer. A mixed pattern can still be identified if the degree of random defects is slight.

Defect diagnosis to facilitate yield enhancement is critical given the rapid development of semiconductor manufacturing technology. An effective way to identify assignable causes of process variation is to analyze the spatial defect patterns on wafers. WBMs provide crucial guidance, enabling engineers to rapidly determine the potential root causes of defects by identifying patterns. Most studies have used neural network and model-based approaches to extract common WBM patterns. Hsu and Chien [8] integrated spatial statistical analysis and an ART neural network to conduct WBM clustering and associated the patterns with manufacturing defects to facilitate defect diagnosis. In addition to the ART neural network, Liu and Chien [10] applied moment invariants for shape clustering of WBMs. Model-based clustering algorithms construct a model for each cluster and compare the likelihood values between clusters to identify defect patterns. Wang et al. [21] used model-based clustering, applying a Gaussian expectation maximization algorithm to estimate defect patterns. Hwang and Kuo [22] modeled global defects and local defects in clusters exhibiting ellipsoidal patterns and local defects in clusters exhibiting linear or curvilinear patterns. Yuan and Kuo [23] used Bayesian inference to identify the patterns of spatial defects in WBMs. Driven by the continuous migration of semiconductor manufacturing technology, more complicated types of WBM patterns have occurred because of the increase in wafer size and the shrinkage of critical dimensions; however, little research has evaluated the use of a clustering ensemble approach to analyze WBMs and extract failure patterns.

3. Proposed Approach

The terminologies and notations used in this study are as follows:

$D$: number of gross dies;
$N$: number of wafers;
$S$: number of particles;
$K$: number of clusters;
$B$: number of bad dies;
$i$: wafer index, $i = 1, 2, \ldots, N$;
$d$: dimension index, $d = 1, 2, \ldots, D$;
$k$: cluster index, $k = 1, 2, \ldots, K$;
$s$: particle index, $s = 1, 2, \ldots, S$;
$h$: clustering result index, $h = 1, 2, \ldots, H$;
$b$: bad die index, $b = 1, 2, \ldots, B$;
$J_q$: clustering subobjective in PSO clustering, $q = 1, 2, 3$;
$r_1, r_2$: uniform random numbers in the interval $[0, 1]$;
$w$: inertia weight of the velocity update;
$\omega_q$: weight of clustering subobjective $q$;
$c_1$: personal best position acceleration constant;
$c_2$: global best position acceleration constant;
$\beta$: a normalization factor;
$\sigma$: a constant for the approximate density shape in the mountain function;
$g_b$: the $b$th bad die on a wafer;
$n_k$: the number of WBMs in the $k$th cluster;
$n_{sk}$: the number of WBMs in the $k$th cluster of the $s$th particle;
$C_k$: subset of WBMs in the $k$th cluster;
$z_{\max}$: maximum value in the WBM data;
$\mathbf{m}_k$: vector of the $k$th cluster centroid, $\mathbf{m}_k = (m_{k1}, \ldots, m_{kD})$;
$\mathbf{m}_{sk}$: centroid vector of the $k$th cluster of the $s$th particle;
$\mathbf{M}_s$: centroid vectors of the $s$th particle, $\mathbf{M}_s = (\mathbf{m}_{s1}, \ldots, \mathbf{m}_{sK})$;
$x_{sd}$: position of the $s$th particle in the $d$th dimension;
$v_{sd}$: velocity of the $s$th particle in the $d$th dimension;
$p_{sd}$: personal best position (pbest) of the $s$th particle in the $d$th dimension;
$p_{gd}$: global best position (gbest) in the $d$th dimension;
$\mathbf{z}_i$: vector of the $i$th WBM, $\mathbf{z}_i = (z_{i1}, \ldots, z_{iD})$;
$\mathbf{x}_s$: position vector of the $s$th particle, $\mathbf{x}_s = (x_{s1}, \ldots, x_{sD})$;
$\mathbf{v}_s$: velocity vector of the $s$th particle, $\mathbf{v}_s = (v_{s1}, \ldots, v_{sD})$;
$\mathbf{p}_s$: personal best position vector of the $s$th particle, $\mathbf{p}_s = (p_{s1}, \ldots, p_{sD})$;
$\mathbf{p}_g$: global best position vector, $\mathbf{p}_g = (p_{g1}, \ldots, p_{gD})$.

3.1. Problem Definition of WBM Clustering Ensemble

Clustering ensembles can be regarded as a two-stage partitioning, in which various clustering algorithms are used to assess the data space at the first stage and a consensus function is used to assess the label space at the second stage. Figure 2 shows this two-stage clustering perspective. The consensus function develops a clustering combination based on the diversity of the cluster labels derived at the first stage.

Let $Z = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_N\}$ denote a set of WBMs and $\Lambda = \{\lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(H)}\}$ denote a set of partitions based on clustering results. Each partition in $\Lambda$ represents the labels $\lambda_i^{(h)}$ assigned to the WBMs $\mathbf{z}_i$ by the $h$th algorithm, so the partitions of $Z$ provide a set of labels for each wafer $\mathbf{z}_i$, $i = 1, 2, \ldots, N$. Therefore, the difficulty of constructing a clustering ensemble lies in locating a new partition $\lambda^{*}$ that provides a consensus partition satisfying the label information derived from each individual clustering result of the original WBMs. For each label vector $\lambda^{(h)}$, a binary membership indicator matrix is constructed, containing a column for each cluster; each row contains a 1 in the column of the cluster to which the corresponding wafer is assigned and 0 elsewhere. Furthermore, the space of the consensus partition changes from the original features into the binary label features. For example, Table 1 shows eight WBMs grouped using three clustering algorithms ($H = 3$); the three clustering results are treated as clustering labels that are transformed into binary representations (Table 2). For the consensus partition, the binary membership indicator matrix is used to determine a final clustering result, using a consensus model based on the eight features.
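To make the label-space representation concrete, the following Python sketch builds a binary membership indicator matrix from several base clustering label vectors; the label vectors and cluster counts are hypothetical, and NumPy is assumed to be available.

import numpy as np

def membership_matrix(label_vectors):
    # Concatenate one-hot encodings of several clustering label vectors.
    # label_vectors: list of 1-D integer arrays, one per base clustering,
    # each of length n (number of WBMs). Returns an n x (total clusters)
    # binary matrix used as the input of the consensus step.
    blocks = []
    for labels in label_vectors:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        one_hot = (labels[:, None] == clusters[None, :]).astype(int)
        blocks.append(one_hot)
    return np.hstack(blocks)

# Hypothetical example: eight WBMs grouped by three base clusterings.
h1 = [0, 0, 1, 1, 2, 2, 2, 0]
h2 = [1, 1, 0, 0, 0, 2, 2, 1]
h3 = [0, 0, 0, 1, 1, 1, 1, 0]
H = membership_matrix([h1, h2, h3])
print(H.shape)  # (8, 8): eight WBMs, eight binary label features (3 + 3 + 2 clusters)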

3.2. Data Transformation

The binary representation of good and bad dies is shown in Figure 3(a). Although this binary representation is useful for visualization, it is difficult to convey the spatial relation of each bad die across a wafer.

To quantify the spatial relations and increase the density of a specific feature, the mountain function is used to transform the binary values into continuous values. The mountain method is used to determine the approximate cluster center by estimating the probability density function of a feature [24]. Instead of using a grid node, a modified mountain function can employ data points by using a correlation self-comparison [25]. The modified mountain function for a bad die $g_b$ on a wafer is defined as

$$M(g_b) = \sum_{j=1}^{B} \exp\left(-\frac{d(g_b, g_j)}{\sigma \beta}\right), \quad b = 1, 2, \ldots, B,$$

where $d(g_b, g_j)$ is the distance between dies $g_b$ and $g_j$, $\beta$ is the normalization factor for the distance between the bad dies and the wafer centroid, and the constant $\sigma$ determines the approximate density shape of the wafer. Figure 3(b) shows an example of the WBM transformation. These two types of data (binary and continuous) are used to generate the basic set of partitions. Moreover, each WBM must be sequentially transformed from a two-dimensional map into a one-dimensional data vector [8]. Such vectors are used to conduct further clustering analysis.
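As an illustration of this transformation, the sketch below implements the density transform in Python following the reconstruction above; the exponential kernel and the default values of sigma and beta are assumptions for illustration rather than the exact settings used in the study.

import numpy as np

def mountain_transform(wbm, sigma=2.0, beta=1.0):
    # Replace each bad die (value 1) with a density value that accumulates
    # contributions from all bad dies on the wafer; good dies stay 0.
    # sigma, beta: shape and normalization constants (assumed values).
    bad = np.argwhere(wbm == 1)                 # coordinates of bad dies
    out = np.zeros(wbm.shape, dtype=float)
    for r, c in bad:
        dists = np.linalg.norm(bad - np.array([r, c]), axis=1)
        out[r, c] = np.exp(-dists / (sigma * beta)).sum()
    return out

# Toy 5 x 5 map with a small defect cluster in one corner.
wbm = np.zeros((5, 5), dtype=int)
wbm[0, 0] = wbm[0, 1] = wbm[1, 0] = 1
dense = mountain_transform(wbm)
vector = dense.flatten()   # two-dimensional map -> one-dimensional vector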

3.3. Diverse Partition Generation by k-Means and PSO Clustering

Both the k-means and PSO clustering algorithms are used to generate the basic partitions. To consider the spatial relations across a wafer, both the binary and continuous values are used to obtain distinct clustering results with k-means and PSO clustering. Subsequently, various numbers of clusters are used for comparison.

k-means is an unsupervised clustering method [13] that groups data into a predefined number of clusters by employing a similarity measure such as the Euclidean distance. The objective of the k-means algorithm is to minimize the within-cluster difference, that is, the sum of squared error (SSE)

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{\mathbf{z}_i \in C_k} d(\mathbf{z}_i, \mathbf{m}_k)^2.$$

The k-means algorithm consists of the following steps, as shown in Procedure 1 (a brief code sketch follows the procedure):

(1) Randomly select K data points as the initial cluster centroids.
(2) Repeat
  For each data vector, assign the data point to the group whose centroid is closest
   (minimum Euclidean distance).
  end for
  Recalculate each centroid based on all data points within its group.
(3) Step (2) is iterated until no data point changes its group.
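The following minimal Python sketch illustrates Procedure 1 using scikit-learn's KMeans on hypothetical flattened WBM vectors; the random data and the choice of six clusters are assumptions for illustration.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 60 WBMs, each flattened to a 650-dimensional vector.
rng = np.random.default_rng(0)
X = rng.random((60, 650))

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
labels_km = kmeans.labels_    # one base partition for the ensemble
sse = kmeans.inertia_         # within-cluster sum of squared errors (SSE)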

Data clustering can also be regarded as an optimization problem. PSO is an evolutionary algorithm [14] that searches for optimal solutions based on the interactions among particles; it requires adjusting fewer parameters than other evolutionary algorithms. van der Merwe and Engelbrecht [26] proposed a hybrid algorithm for clustering data, in which the initial swarm is determined using the k-means result and PSO is used to refine the cluster results.

A single particle represents $K$ cluster centroid vectors, $\mathbf{x}_s = (\mathbf{m}_{s1}, \ldots, \mathbf{m}_{sK})$, and a swarm defines a set of candidate clusterings. To consider the maximal homogeneity within a cluster and heterogeneity between clusters, a fitness function is used to maximize the intercluster separation and minimize the intracluster distance and quantization error. The fitness of particle $s$ is a weighted sum of three clustering subobjectives:

$$f(\mathbf{x}_s, \mathbf{Z}_s) = \omega_1 \, \bar{d}_{\max}(\mathbf{Z}_s, \mathbf{x}_s) + \omega_2 \left( z_{\max} - d_{\min}(\mathbf{x}_s) \right) + \omega_3 \, J_e,$$

where $\mathbf{Z}_s$ is a matrix representing the assignment of the WBMs to the clusters of the $s$th particle. The quantization error

$$J_e = \frac{1}{K} \sum_{k=1}^{K} \frac{\sum_{\mathbf{z}_i \in C_{sk}} d(\mathbf{z}_i, \mathbf{m}_{sk})}{n_{sk}}$$

is used to evaluate the level of clustering performance. In addition,

$$\bar{d}_{\max}(\mathbf{Z}_s, \mathbf{x}_s) = \max_{k = 1, \ldots, K} \left\{ \frac{\sum_{\mathbf{z}_i \in C_{sk}} d(\mathbf{z}_i, \mathbf{m}_{sk})}{n_{sk}} \right\}$$

is the maximum average Euclidean distance of particle $s$ to its assigned clusters, and

$$d_{\min}(\mathbf{x}_s) = \min_{k_1 \neq k_2} \left\{ d(\mathbf{m}_{sk_1}, \mathbf{m}_{sk_2}) \right\}$$

is the minimum Euclidean distance between any pair of cluster centroids. Procedure 2 shows the steps involved in the PSO clustering algorithm; a code sketch follows the procedure.

(1) Initialize each particle with K cluster centroids.
(2) For iteration t = 1 to the maximum number of iterations do
 For each particle s do
  For each data pattern
   calculate the Euclidean distance to all cluster centroids and assign the pattern to the cluster
    with the minimum distance
  end for
  calculate the fitness function $f(\mathbf{x}_s, \mathbf{Z}_s)$
 end for
 find the personal best and global best positions of each particle.
 update the cluster centroids by the velocity update equation (i) and the coordinate update equation (ii):
   (i) $v_{sd}(t+1) = w\, v_{sd}(t) + c_1 r_1 \left(p_{sd} - x_{sd}(t)\right) + c_2 r_2 \left(p_{gd} - x_{sd}(t)\right)$
   (ii) $x_{sd}(t+1) = x_{sd}(t) + v_{sd}(t+1)$
end for
(3) Step (2) is iterated until there is no data change.
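The sketch below outlines Procedure 2 in Python. To keep it short, the fitness is reduced to the quantization error term only, and the inertia weight and acceleration constants are typical textbook values rather than the settings of Table 3; it is an illustrative sketch, not the exact algorithm used in the study.

import numpy as np

def quantization_error(X, centroids, assign):
    # Average, over clusters, of the mean distance of members to their centroid.
    errs = []
    for k in range(len(centroids)):
        members = X[assign == k]
        if len(members):
            errs.append(np.linalg.norm(members - centroids[k], axis=1).mean())
    return np.mean(errs)

def pso_cluster(X, k=6, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    pos = X[rng.integers(0, n, size=(n_particles, k))]   # centroid sets per particle
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.full(n_particles, np.inf)
    gbest, gbest_fit = pos[0].copy(), np.inf
    for _ in range(iters):
        for s in range(n_particles):
            dists = np.linalg.norm(X[:, None, :] - pos[s][None, :, :], axis=2)
            assign = dists.argmin(axis=1)                 # nearest-centroid assignment
            fit = quantization_error(X, pos[s], assign)
            if fit < pbest_fit[s]:
                pbest_fit[s], pbest[s] = fit, pos[s].copy()
            if fit < gbest_fit:
                gbest_fit, gbest = fit, pos[s].copy()
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel                                   # coordinate update
    dists = np.linalg.norm(X[:, None, :] - gbest[None, :, :], axis=2)
    return dists.argmin(axis=1), gbest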

3.4. Consensus Partition by Adaptive Resonance Theory

ART has been used in numerous areas such as pattern recognition and spatial analysis [27]. ART can cope with the unstable learning conditions caused by new data because it balances stability and plasticity, match and reset, and search and direct access [8]. Because the input labels are binary, the ART1 neural network [27] is used to attain a consensus partition of the WBMs.

The consensus partition approach is as follows.

Step 1. Apply the k-means and PSO clustering algorithms with various parameters (e.g., various numbers of clusters and types of input data) to generate diverse clusters.

Step 2. Transform the original clustering labels into a binary representation matrix as the input for the ART1 neural network.

Step 3. Apply the ART1 neural network to aggregate the diverse partitions.
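A compact sketch of these three steps is given below. Because a standard ART1 implementation is not assumed to be available, the consensus step uses a minimal ART1-style routine (category choice, vigilance test, fast learning on binary prototypes); it is an illustrative stand-in under that assumption, not the exact network used in the study, and it reuses the hypothetical membership_matrix helper sketched in Section 3.1.

import numpy as np

def art1_consensus(H, vigilance=0.5, beta=1.0, epochs=5):
    # Cluster binary label vectors (rows of H) with a minimal ART1-style
    # procedure and return one consensus label per WBM.
    H = np.asarray(H, dtype=float)
    prototypes = []                          # learned binary prototypes
    labels = np.full(len(H), -1)
    for _ in range(epochs):
        for i, x in enumerate(H):
            if prototypes:
                choice = [np.minimum(x, p).sum() / (beta + p.sum()) for p in prototypes]
                order = np.argsort(choice)[::-1]          # best-matching category first
            else:
                order = []
            placed = False
            for j in order:
                match = np.minimum(x, prototypes[j]).sum() / x.sum()
                if match >= vigilance:                    # vigilance test passed
                    prototypes[j] = np.minimum(x, prototypes[j])   # fast learning
                    labels[i] = j
                    placed = True
                    break
            if not placed:                                # no category resonates
                prototypes.append(x.copy())
                labels[i] = len(prototypes) - 1
    return labels

# Step 1: diverse base partitions (e.g., labels_km from k-means, labels_pso from PSO).
# Step 2: binary membership matrix, e.g., H = membership_matrix([labels_km, labels_pso]).
# Step 3: consensus_labels = art1_consensus(H, vigilance=0.5)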

4. Numerical Experiments

In this section, a numerical study is conducted to demonstrate the effectiveness of the proposed clustering ensemble approach. Six typical WBM patterns from semiconductor fabrication, such as the moon, edge, and sector patterns, were used. In the experiments, the percentage of defective dies in the six patterns was designed based on real cases. Without losing the generality of the WBM patterns, the data were systematically transformed to protect the proprietary information of the case company. In total, 650 chips were exposed on each wafer. Based on various degrees of noise, each pattern type was used to generate 10 WBMs for estimating the validity of the proposed clustering ensemble approach. Noise in a WBM can be caused by random particles across a wafer and by test bias in the CP test, which generate bad dies at random locations and good dies within a group of bad dies. That is, some bad dies appear as good dies, and the density of bad dies can become sparse. For example, a noise degree of 0.02 means that 2% of the dies are inverted between good and bad.
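The following Python sketch illustrates the kind of noise injection described above: each die is flipped with a small probability so that some bad dies appear good and some good dies appear bad. The ring-pattern generator and the 26 x 25 die grid are assumptions for illustration only.

import numpy as np

def add_noise(wbm, degree=0.02, seed=0):
    # Flip each die (good <-> bad) with probability `degree`.
    rng = np.random.default_rng(seed)
    flip = rng.random(wbm.shape) < degree
    return np.where(flip, 1 - wbm, wbm)

# Hypothetical ring pattern on a 26 x 25 grid (roughly 650 dies).
yy, xx = np.mgrid[0:26, 0:25]
radius = np.hypot(yy - 12.5, xx - 12.0)
ring = ((radius > 9) & (radius < 12)).astype(int)
noisy_ring = add_noise(ring, degree=0.02)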

The proposed WBM clustering ensemble approach was compared with the k-means method, the PSO clustering method, and the algorithm proposed by Hsu and Chien [8]. The number of clusters was set to six for the single k-means and single PSO clustering algorithms. Table 3 shows the parameter settings for PSO clustering. The number of clusters extracted by the ART1 neural network is sensitive to the vigilance threshold value. A high vigilance threshold produces more clusters with high similarity within each cluster, whereas a low vigilance threshold results in fewer clusters whose within-cluster similarity may be low. To compare the settings of the ART1 vigilance threshold, various values were used as shown in Figure 4. Each clustering result was evaluated in terms of the SSE and the number of clusters. The SSE is used to compare the cohesion among various clustering results, and a small SSE indicates that the WBMs within a cluster are highly similar. The number of clusters represents the effectiveness of the WBM grouping. Because the objective of clustering is to group the WBMs into few clusters in which the similarities among the WBMs within a cluster are as high as possible, the ART1 vigilance threshold value was set to 0.50 in the numerical experiments.

The goal of WBM clustering is to group WBMs of the same type into the same cluster. Because only six types of WBMs were used in the experiments, the actual number of clusters should be six. Based on the various degrees of noise in the WBM generation, as shown in Table 4, several individual clustering methods, including ART1 [8], k-means clustering, and PSO clustering, were used to evaluate the clustering performance. Table 4 shows that the ART1 neural network yielded a lower SSE than the other methods. However, the ART1 neural network separates the WBMs into 15 clusters, as shown in Figure 5, and thus yields unnecessary partitions for the same type of WBM pattern. To generate diverse clustering partitions for the clustering ensemble method, four combinations of data scales and clustering algorithms are used: k-means with binary values (KB), k-means with continuous values (KC), PSO with binary values (PB), and PSO with continuous values (PC). For the individual clustering results based on six clusters, k-means clustering and PSO clustering yielded larger SSE values than ART1 alone.

Table 4 also shows the clustering ensembles that use various types of input data. For example, the clustering ensemble method KB&PB integrates six results, namely, the k-means algorithm with three cluster-number settings and PSO clustering with three cluster-number settings, to form the WBM clustering via the label space. In general, the clustering ensembles demonstrate smaller SSE values than individual clustering algorithms such as the k-means or PSO clustering algorithms.

In addition to comparing the similarity within clusters, an index called specificity was used to evaluate how efficiently the evolved clusters represent the true clusters [28]. The specificity is defined as

$$\text{specificity} = \frac{N_t}{N_e},$$

where $N_t$ is the number of true WBM patterns covered by the evolved WBM patterns and $N_e$ is the total number of evolved WBM patterns. In the ART1 neural network clustering results, the total number of evolved WBM clusters is 15 and the number of true WBM clusters is 6; the specificity is therefore 0.4. Table 5 shows the specificity of the clustering methods. The ART1 neural network has the lowest specificity because of its large number of clusters. The specificity of the individual clustering methods is 1 because the number of evolved WBM patterns is fixed at 6. Furthermore, compared with the individual clustering algorithms, combining various clustering ensembles yields not only smaller SSE values but also smaller numbers of clusters, whereas the ART1 neural network with its vigilance threshold yields the largest number of clusters. Thus, the homogeneity within a cluster can be improved using the proposed approach, and the proposed clustering ensemble approach, which considers diverse partitions, achieves better results regarding the SSE and the number of clusters than the individual clustering methods.
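As a quick check of the definition above, a one-line helper reproduces the ART1 example (6 true patterns covered, 15 evolved clusters):

def specificity(true_covered, evolved_total):
    # Fraction of evolved clusters that correspond to true WBM patterns.
    return true_covered / evolved_total

print(specificity(6, 15))  # 0.4, matching the ART1 result above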

To evaluate the results among the various clustering ensembles and to assess cluster validity, WBM class labels are employed based on the six pattern types shown in Figure 6. Precision and recall, two classification-oriented measures [29], are defined as

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN},$$

where TP (true positive) is the number of WBMs correctly classified into a WBM pattern, FP (false positive) is the number of WBMs incorrectly classified into that pattern, and FN (false negative) is the number of WBMs that belong to the pattern but are not classified into it. The precision measure is used to assess how many WBMs classified as Pattern (a) are actually Pattern (a). The recall measure is used to assess how many samples of Pattern (a) are correctly classified.

However, a trade-off exists between precision and recall; when one of these measures increases, the other tends to decrease. The F-measure is the harmonic mean of precision and recall, defined as

$$F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}.$$

Specifically, the F-measure reflects the agreement between the actual and classified results (i.e., TP). If the classification result is close to the actual result, the F-measure is high.
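For completeness, a small Python sketch computes the three measures for one hypothetical pattern; the counts are illustrative only.

def precision_recall_f(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts for Pattern (a): 9 of 10 maps recovered, 1 spurious assignment.
print(precision_recall_f(tp=9, fp=1, fn=1))  # (0.9, 0.9, 0.9)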

Tables 6, 7, and 8 summarize the precision, recall, and F-measure, respectively, for the six types of WBM. As shown in Figure 6, Patterns (b) and (c) are similar at the wafer edge and therefore exhibit smaller average precision and recall values than the other patterns. The clustering ensembles that generate partitions using only k-means have difficulty distinguishing Patterns (b) and (c). Using the mountain function transformation enables the defect density and the spatial relations between the good and bad dies across a wafer to be considered. Based on the F-measure, the clustering ensembles obtained using all generated partitions exhibit larger precision and recall values and superior performance for each pattern compared with the other methods. Thus, the partitions generated by k-means and PSO clustering on both data types must be considered.

The practical viability of the proposed approach was also examined. The results show that applying the ART1 neural network directly in the data space leads to worse clustering performance in terms of precision. However, the true types of WBMs can be identified by transforming the original data space into the label space and performing the consensus partition with the ART1 neural network. The proposed clustering ensemble approach achieves better performance with fewer clusters than conventional clustering approaches, including k-means, PSO clustering, and the ART1 neural network.

5. Conclusion

WBMs provide important information that enables engineers to rapidly find potential root causes by correctly identifying patterns. As semiconductor manufacturing technology continues to advance, identifying the correct WBM pattern becomes more difficult because the same type of pattern is influenced by various factors such as die size, pattern density, and noise degree. Relying only on engineers' visual inspection and personal judgment of the map patterns is not only subjective and inconsistent but also time-consuming and inefficient. Therefore, quickly grouping similar WBMs gives engineers more time to diagnose the root causes of low yield.

Considering the requirements of clustering WBMs in practice, a clustering ensemble approach was proposed to facilitate extracting the common defect patterns of WBMs, enhancing failure diagnosis and yield enhancement. The advantage of the proposed method is that it yields high-quality clusters by applying distinct algorithms to the same data set and by using various parameter settings. The robustness of the clustering ensemble is higher than that of an individual clustering method because clusterings from various aspects, including different algorithms and parameter settings, are integrated into a consensus result.

The proposed clustering ensemble has two stages. At the first stage, diverse partitions are generated using two types of input data, various cluster numbers, and distinct clustering algorithms. At the second stage, a consensus partition is attained from these diverse partitions. The numerical analysis demonstrated that the clustering ensemble is superior to the individual k-means or PSO clustering algorithms. The results demonstrate that the proposed approach can effectively group the WBMs into several clusters based on their similarity in the label space. Thus, engineers can spend more time determining the assignable causes of low yield instead of extracting defect patterns.

Clustering is an exploratory approach. In this study, the number of clusters is assumed to be known; evaluating the clustering ensemble approach requires prior information regarding the number of clusters. Further research can be conducted on self-tuning the number of clusters in clustering ensembles.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research is supported by National Science Council, Taiwan (NSC 102-2221-E-155-093; MOST 103-2221-E-155-029-MY2). The author would like to thank Mr. Tsu-An Chao for his kind assistance. The author also wishes to thank the editors and two anonymous referees for their insightful comments and suggestions.