Abstract

In semiconductor back-end production, the die attach process is one of the most critical steps affecting overall productivity. Optimization of this process can be modeled as a pick-and-place (PAP) problem, which is known to be NP-hard. Typical approaches are rule-based and metaheuristic methods: rule-based methods offer high generalization ability and short search times but relatively low performance, whereas metaheuristic methods offer high performance but low generalization ability and long search times. The motivation of this paper is to develop a novel method that retains only the strengths of both, i.e., high generalization ability, high performance, and a short search time. We develop an interactive Q-learning method in which two agents, a pick agent and a place agent, are trained and find a PAP path interactively. Experiments verify that the proposed approach finds a shorter path than the genetic algorithm given in previous research.

1. Introduction

The entire semiconductor manufacturing process can largely be divided into two sequential subprocesses of front-end production and back-end production. Both production processes involve many complicated steps for wafer fabrication, probe testing and sorting, assembly, final testing, etc. [1]. The front-end process refers to wafer fabrication (fab) and a wafer probe test called electrical die sorting (EDS), whereas the back-end process refers to preassembly, packaging (assembly), and burn-in and functional tests of individual semiconductor chips. Once the front-end processes are completed, the wafers are transferred to back-end production to facilitate their integration into electronic devices and to undergo a final performance test. Particularly, in back-end production, based on quality and location information of individual chips (die) derived from the EDS test, only good semiconductor chips are individually picked and attached to the support structure (e.g., the lead frame) on a strip by an automatic robot arm. This process is called the die attach process [2]. In semiconductor back-end production, this process is regarded as one of the most critical steps since it is the first packaging layer in contact with the die, which demands a high level of operational precision. Thus, optimization of the die attach process is important to maximize the overall productivity in semiconductor back-end production.

Optimization of the die attach process can be formulated as a typical pick-and-place (PAP) problem. The PAP problem is to find the shortest path along which every component (good die) is picked and placed on the plate (strip). Because it is a well-known NP-hard problem [3], heuristic approaches such as rule-based and metaheuristic methods are usually adopted. The rule-based approach defines a proper dispatching rule and finds the shortest path using the predefined rule. Park et al. [1] introduced 11 rules for PAP in the die attach process, which are categorized according to direction and starting point (e.g., the counterclockwise path starting from the center). Huang et al. [4] applied a greedy rule, which picks the nearest component and places it on the nearest plate, to solve a PAP problem of multirobot coordination. The metaheuristic approach designs a metaheuristic algorithm, such as a genetic algorithm (GA), particle swarm optimization (PSO), or Tabu search, and applies it to solve PAP problems. Park et al. [1] developed a GA that uses a binary matrix as an encoding scheme together with batch and row-string crossover operations. Torabi et al. [5] solved a nozzle selection (pick) and component allocation (place) problem to minimize the workload of the bottleneck while maximizing all corresponding appropriateness factors using multiobjective particle swarm optimization. Liu and Kozan [6] proposed a blocking job shop scheduling problem with robotic transportation and developed a hybrid algorithm based on Tabu search.

Each approach can be evaluated by its generalization ability, performance, and search time. Generalization ability reflects how easily the approach can be applied to various PAP problems once the model, including its rules and algorithms, is defined or developed. The distance of the resulting path determines the performance of the approach. Finally, the search time is the time required to find a path. The rule-based approach is better than the metaheuristic approach in terms of generalization ability and search time; that is, predefined rules can be applied to solve any PAP problem in a short time, whereas a metaheuristic designed for one problem cannot be applied to different ones and requires a relatively long time to solve a problem. However, the performance of the metaheuristic approach is better than that of the rule-based approach because the path yielded by a metaheuristic is usually shorter than that from a rule-based method.

Reinforcement learning explores the space of feasible solutions efficiently and effectively [7] and has therefore been actively applied to various optimization problems as an alternative to metaheuristic and rule-based approaches [8–14]. For example, Dou et al. [8] proposed a path planning method for mobile robots in intelligent warehouses based on Q-learning, in which the reward is designed to encourage fewer robot steps. As another example, Shiue et al. [9] applied reinforcement learning to solve real-time scheduling problems in a smart factory, where Q-learning determines the dispatching rule given the remaining jobs.

The objective of this paper is to develop an approach that overcomes the weak points of these typical approaches while incorporating their strengths; that is, an approach with high performance, high generalization capability, and a short search time. With this motivation, we develop an interactive Q-learning method with two interacting agents: a pick agent and a place agent transfer information to each other, restrict or encourage certain actions, and thereby influence each other's training process. This approach shows higher performance than the metaheuristic approach, a relatively short training time, and a search time similar to that of the rule-based approach.

The rest of this paper is organized as follows. Section 2 briefly explains reinforcement learning and Q-learning, which is the basis of our approach. Section 3 states the PAP problem in a die attach process and develops a mathematical model. Section 4 develops the interactive Q model, and Section 5 compares the developed model and metaheuristic approach. Finally, Section 6 concludes the paper and suggests a future research direction.

2. Reinforcement Learning

Reinforcement learning trains an agent to discover a sequence of actions that yields the highest reward by searching over many pairs of states and actions [15], as presented in Figure 1.

To be more specific, when the system stays at state s_t and an agent performs action a_t, the environment returns reward r_t, and the system transfers to the new state s_{t+1}. An episode is the sequence of an agent's actions starting from the initial state s_0 to the terminal state s_T. The purpose of reinforcement learning is to find the optimal action sequence (or, equivalently, state sequence) by letting the trained agent experience many episodes.

Among the reinforcement learning methods, Q-learning selects an action in a state in a greedy manner with respect to the Q-function; that is, the action achieving the maximum Q-function value in the state is selected. The Q-function Q(s, a) for a pair of state s and action a is defined as the expected sum of discounted rewards when performing a in state s [16]. The Q-table is a matrix whose (i, j)th component is the Q-function value of the ith state and the jth action, Q(s_i, a_j).

The Q-table is updated through episodes from the (old) Q-table by the Bellman equation [17]:

Q^new(s_t, a_t) = (1 − α) Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a)],

where t is the time period of an episode; Q^new(s_t, a_t) and Q(s_t, a_t) are the updated Q-function and the old Q-function for s_t and a_t, respectively; α is the learning rate; γ is the discount rate; and r_t is the reward when performing a_t on s_t. In this equation, Q^new(s_t, a_t) is a weighted sum of the old value Q(s_t, a_t) and the learned value r_t + γ max_a Q(s_{t+1}, a), where the weights are determined by the learning rate. The discount rate γ reflects the importance of the expected value of the next action, max_a Q(s_{t+1}, a).
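As a concrete illustration, the update above reduces to a single tabular operation. The following is a minimal Python sketch, not the authors' implementation; the array layout (Q indexed by [state, action]) and the default learning and discount rates are illustrative assumptions.

import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # Q is a 2-D array indexed by [state, action]; alpha is the learning rate
    # and gamma is the discount rate, as in the Bellman equation above.
    learned = reward + gamma * np.max(Q[s_next, :])      # r_t + gamma * max_a Q(s_{t+1}, a)
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * learned    # weighted sum of old and learned values
    return Q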

3. Problem Statement and Mathematical Model

3.1. Problem Statement

In the post-fabrication stage of semiconductor manufacturing, the probe test is performed on the individual dies (chips) of a wafer to identify defective dies and classify dies by quality level. Then, only the “good” dies (i.e., conforming chips) are assembled and packaged in the assembly step of back-end production [18, 19]. After the test, in the die attach process, a robot arm repeatedly picks up a good die from the wafer and places it on a lead frame in a strip until every good die has been transferred. The robot arm starts and ends the process at the origin; that is, the start point and end point are the same and fixed. The total moving distance highly depends on the order of picks and places (i.e., the path). Therefore, the considered problem is to find the shortest path for the robot arm to transfer every good die to a lead frame. Note that the Manhattan distance metric is employed to calculate distances, because the robot arm moves only vertically or horizontally due to several technical issues.
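Because the robot arm moves only horizontally or vertically, the distance between any two points is measured with the Manhattan metric. A one-line helper makes this explicit; the (x, y) tuple representation of coordinates is an assumption for illustration.

def manhattan(p, q):
    # L1 (Manhattan) distance between points p = (x1, y1) and q = (x2, y2)
    return abs(p[0] - q[0]) + abs(p[1] - q[1])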

We assume that the strip and wafer are located at the left and right sides on the coordinate, respectively, and are rectangular, as presented in Figure 2. Even though these assumptions may be unrealistic (e.g., the wafer is round), they do not affect the modeling and development of the interactive Q-learning approach.

3.2. Notations

Notations used in this paper are as follows.

Indices:
j: lead frame index
i: good die index
k: agent index (k = 1: pick agent; k = 2: place agent)

Problem Parameters:
Strip shape
Wafer shape
Coordinate of lead frame j
Coordinate of good die i
Horizontal distance between the origin and the first lead frame
Vertical distance between the origin and the first lead frame
Horizontal distance between two consecutive lead frames
Vertical distance between two consecutive lead frames
Horizontal distance between the origin and the first die
Vertical distance between the origin and the first die
Horizontal distance between two consecutive dies
Vertical distance between two consecutive dies

Model Parameters:
M: maximum number of iterations
S_k: state space of agent k
s_k: current state of agent k
A_k: action space of agent k
F_k: feasible action space of agent k
Q_k: Q-table of agent k
P = (p_0, p_1, ..., p_L): path, where p_0 and p_L are the starting and ending points of the path, respectively, p_l is the lth point (a good die if l is odd and a lead frame otherwise), and L is the length of the path
D(P): total distance of path P

3.3. Mathematical Model

The mathematical model is presented as follows. The objective function in (2) is to minimize the total distance D(P), where each move distance is the Manhattan distance computed from the coordinates of the selected good die and the selected lead frame; the first term denotes the distance from the origin to the first selected good die, and the last term denotes the distance from the last selected lead frame back to the origin.

Constraints (3) and (4) indicate that the robot arm picks a good die and places it on an empty lead frame. Constraints (5) and (6) show that good dies are transferred until an empty strip is full; once the strip is full, it is replaced with a new one, and this continues until every good die is transferred. Consequently, any path satisfying these constraints starts and ends at the origin, contains no duplicate good dies, and installs a new strip only when the previous strip is full.
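To make the objective and constraints concrete, the sketch below evaluates the total Manhattan distance of a candidate path represented as a coordinate sequence that starts and ends at the origin and alternates good dies and lead frames; this data layout is an assumption for illustration, not the paper's encoding.

def total_distance(path):
    # path: list of (x, y) coordinates visited by the robot arm, beginning and
    # ending at the origin and alternating good dies (odd positions) and
    # lead frames (even positions), as required by constraints (3)-(6).
    manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
    return sum(manhattan(p, q) for p, q in zip(path, path[1:]))

# Toy example: origin -> die -> lead frame -> die -> lead frame -> origin
example_path = [(0, 0), (5, 2), (1, 3), (6, 4), (2, 3), (0, 0)]
print(total_distance(example_path))    # prints 28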

4. Interactive Q Model

4.1. Configuration and Deployment

The proposed interactive Q model is configured as shown in Figure 3.

As seen in Figure 3, the model consists of three layers: problem, agent, and path. The problem layer transfers the information of the problem (i.e., the number and locations of the remaining good dies in the wafer and of the empty lead frames in the strip) to the agent layer. Considering this information, the pick and place agents in the agent layer select a good die to be picked and a lead frame on which the good die is placed, respectively. The selected good die and lead frame are appended to the path. Finally, the information that the selected good die and lead frame are no longer feasible is transferred back to the problem layer.

Agent k consists of state space S_k, action space A_k, feasible action space F_k, and Q-table Q_k. S_1 and S_2 are the sets of locations of the robot arm before picking and placing, respectively. A_1 and A_2 are the action spaces of the pick agent and place agent, respectively, where an action of the pick agent is to pick a good die i, and an action of the place agent is to place the selected good die on a lead frame j. In addition, a dedicated action indicates that the robot arm returns to the origin o. Feasible actions of the pick agent and place agent comprise the good dies remaining in the wafer and the empty lead frames, respectively; that is, only good dies still in the wafer and empty lead frames are feasible. Q_1(s_1, a_1) indicates the Q-function value of action a_1 when the current state is s_1, for s_1 ∈ S_1 ∪ {o} and a_1 ∈ A_1, where o is the origin. Likewise, Q_2(s_2, a_2) indicates the Q-function value of action a_2 when the current state is s_2, for s_2 ∈ S_2 and a_2 ∈ A_2, where o is again the origin.

Pick and place agents act in a greedy manner with respect to their Q-tables, whose training method is explained in Section 4.2. The pick agent picks the good die with the maximum Q-value in the wafer (i.e., it selects the action with the maximum Q-value among the feasible actions) when the current state is s_1, as follows:

a_1* = argmax_{a ∈ F_1} Q_1(s_1, a),

where a_1* is the selected action of the pick agent. The place agent places the selected good die on the empty lead frame with the maximum Q-value, as follows:

a_2* = argmax_{a ∈ F_2} Q_2(s_2, a),

where a_2* is the selected action of the place agent. After the actions are determined, the feasible action spaces are updated.
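A minimal sketch of this greedy, feasibility-restricted selection is given below; the Agent container and its attribute names are illustrative assumptions rather than the notation of an actual implementation.

import numpy as np

class Agent:
    # Tabular agent holding a Q-table and the currently feasible actions.
    def __init__(self, n_states, n_actions):
        self.Q = np.zeros((n_states, n_actions))    # Q-table Q_k
        self.feasible = set(range(n_actions))       # feasible action space F_k

    def select_action(self, state):
        # Return the feasible action with the maximum Q-value in the given state.
        actions = sorted(self.feasible)
        return actions[int(np.argmax(self.Q[state, actions]))]

After an action is chosen, the selected die (or lead frame) is removed from the feasible set, which is exactly the update of the feasible action space described above.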

Algorithm 1 shows the routing process of the interactive Q model.

Input: Q_1, Q_2, A_1, A_2, S_1, S_2
Procedure: Initialize F_1 ← A_1 and F_2 ← A_2
Initialize P ← (o) and l ← 0
Initialize s_1 ← o    # s_1: current state of the pick agent
Initialize s_2 ← o
Until F_1 = ∅ do
    Increase l by 1
    If F_2 = ∅ do F_2 ← A_2    # If a strip is full, the strip is replaced with a new one
    # Pick agent procedure
    a_1 ← argmax_{a ∈ F_1} Q_1(s_1, a) and append a_1 to P    # Select an action of the pick agent
    s_2 ← a_1    # Update the current state of the place agent
    F_1 ← F_1 \ {a_1}    # Update the feasible action space of the pick agent
    # Place agent procedure
    Increase l by 1
    a_2 ← argmax_{a ∈ F_2} Q_2(s_2, a), append a_2 to P, and set s_1 ← a_2    # Select an action of the place agent
    F_2 ← F_2 \ {a_2}    # Update the feasible action space of the place agent
Output: P
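Read as code, the routing procedure interleaves the two trained agents as in the Python sketch below. This is a simplified rendition under assumed data structures (dictionaries die_xy and frame_xy mapping ids to coordinates, pick states indexed by lead-frame id with 0 reserved for the origin), not a verbatim transcription of Algorithm 1.

import numpy as np

def route(Q_pick, Q_place, die_xy, frame_xy, origin_xy=(0, 0)):
    # Q_pick  : array of shape (J + 1, I); rows are pick states (0 = origin,
    #           1..J = lead frames), columns are good dies 0..I-1.
    # Q_place : array of shape (I, J + 1); rows are good dies, columns 1..J are lead frames.
    # die_xy  : dict mapping die ids 0..I-1 to (x, y) coordinates.
    # frame_xy: dict mapping lead-frame ids 1..J to (x, y) coordinates.
    remaining_dies = set(die_xy)             # feasible actions of the pick agent
    empty_frames = set(frame_xy)             # feasible actions of the place agent
    path, state = [origin_xy], 0             # the robot arm starts at the origin

    while remaining_dies:
        if not empty_frames:                 # strip full: replace it with a new one
            empty_frames = set(frame_xy)

        # Pick agent: choose the remaining good die with the maximum Q-value.
        dies = sorted(remaining_dies)
        die = dies[int(np.argmax(Q_pick[state, dies]))]
        remaining_dies.remove(die)
        path.append(die_xy[die])

        # Place agent: choose the empty lead frame with the maximum Q-value.
        frames = sorted(empty_frames)
        frame = frames[int(np.argmax(Q_place[die, frames]))]
        empty_frames.remove(frame)
        path.append(frame_xy[frame])
        state = frame                        # the next pick starts from this lead frame

    path.append(origin_xy)                   # the robot arm returns to the origin
    return path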
4.2. Training Algorithm

Q_1 and Q_2 are updated using the Bellman equation presented in Section 2. When updating the Q-tables, they impact one another because the estimated optimal future values of the pick and place agents are the Q-function values of the place and pick agents, respectively, as given in (10) and (11):

Q_1^new(s_1, a_1) = (1 − α) Q_1(s_1, a_1) + α [r_1 + γ max_{a ∈ F_2} Q_2(a_1, a)],   (10)

Q_2^new(s_2, a_2) = (1 − α) Q_2(s_2, a_2) + α [r_2 + γ max_{a ∈ F_1} Q_1(a_2, a)],   (11)

where Q_1^new and Q_2^new are the updated (new) Q-functions, while Q_1 and Q_2 are the old Q-functions; note that the picked die a_1 becomes the current state of the place agent and the selected lead frame a_2 becomes the next state of the pick agent. r_1 is the reward of the pick agent for selecting a_1 from s_1, which is computed from the reciprocals of the corresponding move distances; r_2 is defined and computed similarly. Q_1(s_1, a_1) and Q_2(s_2, a_2) are updated only when a_1 and a_2 are selected while the current states are s_1 and s_2, respectively.

Algorithm 2 shows the training algorithm for Q_1 and Q_2, which is based on a general Q-learning exploration strategy and considers both the current and future rewards. To be more concrete, the Q-value of agent k is updated by taking the weighted average of the current reward and the discounted future reward, which means that each agent whose current state is s_k considers the distance of its own move as well as the distance of the subsequent move of the other agent. The algorithm trains the Q-tables using a wafer containing no bad dies, but the trained Q-tables can be used to solve every problem as long as the wafer and strip shapes are the same. Thus, the interactive Q model has higher generalization capability than metaheuristic algorithms.

Input: A_1, A_2, α, γ, M
Procedure: Initialize every element in Q_1 and Q_2 with arbitrary numbers
Initialize m ← 0
Until m = M or both Q_1 and Q_2 have converged do
    Initialize F_1 ← A_1 and F_2 ← A_2
    Initialize s_1 ← o    # s_1: current state of the pick agent
    Until F_1 = ∅ do
        If F_2 = ∅ do F_2 ← A_2    # If a strip is full, it is replaced with a new one
        Select a_1 ∈ F_1 and update Q_1(s_1, a_1) by (10)    # Pick agent update
        s_2 ← a_1    # Update the current state of the place agent
        F_1 ← F_1 \ {a_1}    # Update the feasible action space of the pick agent
        # Place agent update
        Select a_2 ∈ F_2, update Q_2(s_2, a_2) by (11), and set s_1 ← a_2    # Select the action of the place agent
        F_2 ← F_2 \ {a_2}    # Update the feasible action space of the place agent
    Increase m by 1
Output: Q_1, Q_2
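Under the same assumed data layout as the routing sketch in Section 4.1, one interactive update of the two Q-tables (the core of (10) and (11), with rewards taken as reciprocals of the Manhattan move distances, which is our reading of the reward description above) might look as follows; the choice of the actions being updated is left to the caller's exploration strategy.

import numpy as np

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def training_step(Q_pick, Q_place, state, state_xy, die, frame,
                  die_xy, frame_xy, feasible_dies, feasible_frames,
                  alpha=0.1, gamma=0.9):
    # Pick update (10): the reward is the reciprocal of the move to the chosen die,
    # and the future value is the place agent's best Q-value for that die.
    r_pick = 1.0 / max(manhattan(state_xy, die_xy[die]), 1)       # guard against zero distance
    future_place = np.max(Q_place[die, sorted(feasible_frames)])
    Q_pick[state, die] = (1 - alpha) * Q_pick[state, die] + alpha * (r_pick + gamma * future_place)

    # Place update (11): the reward is the reciprocal of the move to the chosen frame,
    # and the future value is the pick agent's best Q-value from that frame.
    r_place = 1.0 / max(manhattan(die_xy[die], frame_xy[frame]), 1)
    remaining = sorted(feasible_dies - {die})
    future_pick = np.max(Q_pick[frame, remaining]) if remaining else 0.0
    Q_place[die, frame] = (1 - alpha) * Q_place[die, frame] + alpha * (r_place + gamma * future_pick)
    return Q_pick, Q_place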

5. Experiment

In this section, we compare the performance of our interactive Q-learning model with those of the GA model and rule-based models presented in the literature [1].

5.1. Parameters

The problem parameters, i.e., the strip and wafer shapes and the horizontal and vertical distance parameters defined in Section 3.2, are obtained from Park et al. [1].

Four wafers are considered according to the yield rates of dies (80% and 90%) and the bad die distribution (bivariate normal and uniform distribution). The wafers are depicted in Figure 4, where colored cells denote bad dies.

Four rules are adopted from Park et al. [1], where they are defined as follows.

Rule 1. The robot arm moves good dies from the upper-left side of a wafer to empty lead frames in the upper-left side of a strip.

Rule 2. The robot arm starts moving the good dies from the upper-left side of a wafer to empty lead frames in the upper-right side of a strip.

Rule 3. The robot arm moves good dies from the upper-right side of a wafer to empty lead frames in the upper-right side of a strip.

Rule 4. The robot arm starts moving the good dies from the upper-right side of a wafer to empty lead frames in the upper-left side of a strip.

The parameters of GA were set as follows: 200 initial solutions, 100 iterations, and the batch and row-string crossover operation. The interactive Q-learning model's parameters, i.e., the learning rate, discount rate, and maximum number of iterations, were fixed in advance, and the model is trained using a wafer and strip of the same size. Note that the wafer used for training is assumed to have no bad dies, and the trained model can be applied to every problem as long as the shapes of the wafer and strip remain fixed.

5.2. Results

Figure 5 shows the performance comparison results.

Figure 5 shows that the proposed model outperforms the GA and rule-based models in every problem, implying that the model has sufficient generalization ability as well as excellent performance. Specifically, the total distance is reduced relative to GA by 13.14%, 13.23%, 7.05%, and 8.09% for wafers A, B, C, and D, respectively. All rule-based models produced longer total distances than GA for all wafers. From these results, we conclude that the interactive Q model outperforms GA, that it is more effective when the yield rate is low (i.e., when the defective rate is high), and that it is robust to the bad die distribution.

As for the running time, the proposed model and the rule-based models take less than a second, whereas GA takes more than an hour to yield a path, and its running time is highly dependent on the number of iterations and initial solutions. In addition, the interactive Q model requires a short training time: in our experiment, the total training time is 254 seconds, that is, only 0.254 seconds per episode. Moreover, yielding a path with the trained model takes less than 1 second. This may imply that the model can be trained and applied to solve the problem in real time.

6. Conclusion

In this paper, we addressed the PAP problem of the die attach process to maximize the overall productivity of semiconductor back-end production. Due to the NP-hardness of this problem, rule-based and metaheuristic approaches have been applied. These approaches, however, leave room for improvement: the rule-based approach yields relatively long paths, while the metaheuristic approach has low generalization ability and a long search time. With this motivation, we developed an interactive Q-learning approach equipped with two interacting agents to find a path. The experiments revealed that the proposed model shows higher performance than GA and has almost the same search time as the rule-based approach.

In future work, our interactive Q-learning can be modified to solve various optimization problems in the semiconductor industry and PAP problems in other industries. In addition, hybrids of Q-learning and other approaches, such as rule-based and metaheuristic methods, can be developed to solve more complicated problems efficiently. For instance, combining the proposed interactive Q-learning with a rule-based approach could increase the generalization ability, allowing PAP problems with different wafers to be solved.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2017R1A2B4006643).