#### Abstract

Temperature stability is critical to consistent product quality in the injection molding process, so improving temperature control accuracy under dynamic conditions is essential. However, because of the large time delay, strong coupling, and dynamic characteristics of the system, precise temperature control in the injection molding process is difficult to achieve. In this paper, a new intelligent temperature compensation control strategy for the injection molding process under dynamic conditions is proposed to solve the two key problems of a compensation control strategy: the compensation time and the compensation quantity. A data-based feedforward iterative learning control (ILC) algorithm is designed to learn the optimal compensation time. Once the optimal compensation time has been learned, a deep Q-learning algorithm, which combines Q-learning with an artificial neural network (ANN), is used to learn the optimal compensation quantity. An experimental platform is built to demonstrate the advantages of the proposed method. Experimental results show that the proposed method effectively improves temperature control accuracy under dynamic conditions and also improves product consistency.

#### 1. Introduction

Polymer injection molding is one of the most widely used processing methods for producing polymer products [1]. Because injection molding is a typical batch production process, product consistency is critical [2]. Temperature is one of the two dominant factors that affect product consistency (the other is pressure), and in the work reported by Gim et al. [3], temperature showed the stronger correlation. Therefore, improving the stability of temperature control is very important for improving product quality consistency. In general, apart from the effects of machine design and materials, the stability of the melt temperature is mainly determined by the following processing parameters: barrel temperature, screw rotation speed, back pressure, dwell time, and injection stroke. Because of constraints imposed by the processing technology and production efficiency, none of these parameters except the barrel temperature can be freely manipulated during production. Therefore, precise barrel temperature control is critical to improving the stability of the melt temperature and product consistency.

However, precise control of the barrel temperature is challenging because of two particular characteristics of the system: large time delays and strong coupling [4]. The barrel is usually heated by electrical resistance heaters wrapped around its outer surface. Heat is therefore concentrated on the outer surface of the barrel and reaches the material inside only by conduction. Because the barrel wall is thick, it usually takes dozens of seconds or more for the heat to reach the inside of the barrel, so there is a large time delay in the system. In addition, (1) easy heat conduction between zones, (2) fresh raw material entering the barrel, (3) changes in the environment, and (4) shear heat generated during operation cause intermittent disturbances, which make the system strongly coupled.

In most existing injection molding processes, the temperature is controlled by a proportional-integral-derivative (PID) algorithm [5]. The PID algorithm has been widely applied in temperature control [6, 7]; it is easy to implement and provides good robustness under static conditions. However, the polymer injection molding process consists of repeated operations, each performed intermittently, so the barrel operates under dynamic conditions throughout the process. Hence, the PID algorithm cannot provide satisfactory barrel temperature control over the whole operation process. In the past decades, many advanced control methods have been proposed to solve the barrel temperature control problem, such as adaptive decoupling control [8], model predictive synchronous control [9], multivariable self-tuning predictive control [10], self-optimizing model predictive control [11], adaptive generalized predictive control [12], and response identification and internal model control [13]. These methods can effectively deal with the large time delay. However, most of them are based on feedback control strategies and cannot deal with intermittent disturbances, and they do not take into account the dynamic characteristics of the injection molding process.

Considering the repeated operation characteristics of the polymer injection molding process, a feedforward compensation strategy can be used to eliminate the effects of intermittent disturbances. For example, Yao et al. proposed a barrel temperature control method combining generalized predictive control with iterative learning feedforward control [14], and their experimental results show that a feedforward compensation strategy can effectively prevent large temperature variations caused by intermittent disturbances. Two key quantities must be determined when a feedforward compensation strategy is used for barrel temperature control: (1) the compensation time and (2) the compensation quantity. However, obtaining an accurate compensation time and compensation quantity is not easy, because both are affected by many complex factors, such as raw materials, environment, injection molds, and process parameters. Therefore, it is difficult to obtain them accurately through model-based methods.

According to the previous analysis, obtaining an accurate compensation time and compensation quantity is crucial when a feedforward compensation control strategy is used for injection molding temperature control. Because the system operates repetitively, iterative learning control (ILC), which is recognized as an effective control method for systems with repeated characteristics [15, 16], is a natural choice. ILC is a data-based learning control method [17]: it generates the feedforward control signal for the next batch from the stored information of previous batches. Because it does not require an accurate system model, ILC can be easily adapted to a variety of complex conditions, so it is feasible to use ILC to obtain an accurate compensation time. However, as a feedforward control strategy, ILC cannot achieve accurate control of the compensation quantity. Since injection molding is a typical batch production process, a reinforcement learning (RL) algorithm can be applied to this part of the control problem [18].

In this paper, a new intelligent barrel temperature control method that fully considers the dynamic characteristics of the injection molding process is proposed. The method adopts a feedforward compensation strategy to improve the stability of temperature control under dynamic conditions. It first learns the optimal compensation time in each batch using a feedforward ILC algorithm. Subsequently, a deep Q-learning algorithm combining Q-learning with an ANN is adopted to learn the optimal compensation quantity for each compensation period. With this control strategy, the stability of temperature control and, consequently, product consistency can be significantly improved.

The rest of this paper is organized as follows. In Section 2, the system description is presented. Section 3 develops the feedforward compensation temperature control strategy. In Section 4, temperature control experiments with different conditions are performed. In Section 5, the product consistency experiments are given to verify the effectiveness of the developed control strategy in improving product consistency. Finally, conclusions are drawn in Section 6.

#### 2. System Description

The typical structure of the injection unit of an injection molding machine is shown in Figure 1. It mainly consists of a screw, a barrel, a nozzle, and a heating system. The heating system generally has multiple heating zones, each with a heater and a temperature sensor. In most cases, the set temperature of each heating zone varies according to the requirements of the process. To save energy, the heating system is covered with a layer of thermal insulation material. The system has no cooling device, so it cools very slowly.

During the injection molding process, the raw material enters the barrel from the hopper and is heated to a molten state in the barrel. The screw then rotates and retreats, mixing the melt evenly and conveying it to the front of the barrel. Finally, the screw moves forward to inject the melt into the mold. During this process, the temperature is affected by multiple factors:

1. The heating coil around the outside of the barrel, which is the dominant factor under static conditions.
2. The shear heat generated between the screw and the melt, which is the other major factor affecting the temperature.
3. Thermal coupling, caused by the different temperatures of the heating zones and the high thermal conductivity of the metal barrel.
4. Other factors, including process parameter settings (screw rotation speed, back pressure, etc.), materials, molds, and the surrounding environment.

Therefore, for such a complicated system, it is a great challenge to achieve precise temperature control. Previous studies have shown that it is impossible to solve this problem using a modeling approach [12].

#### 3. Methodology

##### 3.1. Feedforward ILC for Compensation Time Control

ILC is a data-based control method; compared to traditional model-based control methods, it does not require an accurate system model and only needs a small amount of historical data from previous batches. The objective of ILC is to generate a feedforward control signal by utilizing information from previous batches and reduce tracking error by iteration. In this paper, a feedforward ILC-based controller is designed to obtain the temperature compensation time.

The schematic representation of the ILC-based compensation time control algorithm is shown in Figure 2. Here, $k$ denotes the iteration (batch) number, $t$ the sample time, and $N$ the total number of samples in each batch. In batch $k$, $y_k(t)$ denotes the system output at step $t$, $u_k(t)$ the input at step $t$, $y_d(t)$ the control target at step $t$, $e_k(t)$ the temperature error at step $t$, and $\Delta t_k$ the time error. The control algorithm mainly consists of three parts: (1) a feedforward controller, (2) a learning function, and (3) a temperature error conversion. The feedforward controller relates the input of the previous batch to the input of the next batch, the learning function calculates the input change for the next batch, and the temperature error conversion converts the temperature error into a compensation time error.

The purpose of the compensation time control algorithm is to obtain the optimal compensation time. In the practical control process, the temperature sensor returns the temperature in each sampling period, and the data of every sampling period in each batch are saved in memory. At the end of a batch, the temperature error curve of the whole batch is obtained by comparison with the reference value, and the time error is obtained by analyzing this curve. The system input is given by

$$\mathbf{u}_{k+1} = \mathbf{Q}\left(\mathbf{u}_k + \Delta\mathbf{u}_{k+1}\right),$$

where $\mathbf{Q}$ is a filter, $\mathbf{u}_k = [u_k(1), \ldots, u_k(N)]^{\top}$ is the control input of the previous batch, and $\Delta\mathbf{u}_{k+1}$ is the change in control input for the next batch, obtained from the learning function as

$$\Delta\mathbf{u}_{k+1} = \left(\mathbf{G}^{\top}\mathbf{W}_e\mathbf{G} + \mathbf{W}_u\right)^{-1}\mathbf{G}^{\top}\mathbf{W}_e\,\mathbf{e}_k,$$

where $\mathbf{W}_e$ and $\mathbf{W}_u$ are weighting matrices, $\mathbf{G}$ is a Toeplitz matrix, and $\mathbf{e}_k = [e_k(1), \ldots, e_k(N)]^{\top}$. In the learning process, a time constant $\tau$ is selected as the maximum permissible error, and learning ends once the time error satisfies $|\Delta t_k| < \tau$.
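The batch-to-batch update can be illustrated in lifted (whole-batch vector) form. The sketch below is a minimal illustration, not the authors' implementation: it assumes the learning function takes the common norm-optimal form $\Delta\mathbf{u}_{k+1} = (\mathbf{G}^{\top}\mathbf{W}_e\mathbf{G} + \mathbf{W}_u)^{-1}\mathbf{G}^{\top}\mathbf{W}_e\mathbf{e}_k$, builds $\mathbf{G}$ from a hypothetical first-order impulse response, uses $\mathbf{Q} = \mathbf{I}$, and learns directly on a simulated temperature error rather than on the converted time error. The helper names `toeplitz_from_impulse` and `ilc_update` are illustrative.

```python
import numpy as np


def toeplitz_from_impulse(h: np.ndarray) -> np.ndarray:
    """Lower-triangular Toeplitz matrix G that maps a batch input vector to the batch output vector."""
    n = len(h)
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            G[i, j] = h[i - j]
    return G


def ilc_update(u_k, e_k, G, W_e, W_u, Q):
    """One norm-optimal ILC step: u_{k+1} = Q (u_k + du), du = (G^T W_e G + W_u)^{-1} G^T W_e e_k."""
    du = np.linalg.solve(G.T @ W_e @ G + W_u, G.T @ W_e @ e_k)
    return Q @ (u_k + du)


# Hypothetical plant: first-order lag with impulse response h[i] = 0.1 * 0.9**i
N = 50
G = toeplitz_from_impulse(0.1 * 0.9 ** np.arange(N))
y_d = np.ones(N)                                   # desired output over one batch
W_e, W_u, Q = np.eye(N), 0.01 * np.eye(N), np.eye(N)

u = np.zeros(N)
error_norms = []
for k in range(10):                                # one iteration per batch
    e = y_d - G @ u                                # error recorded during batch k
    error_norms.append(float(np.linalg.norm(e)))
    u = ilc_update(u, e, G, W_e, W_u, Q)
```

With $\mathbf{W}_e = \mathbf{Q} = \mathbf{I}$ the error in this sketch evolves as $\mathbf{e}_{k+1} = [\mathbf{I} - \mathbf{G}(\mathbf{G}^{\top}\mathbf{G} + \mathbf{W}_u)^{-1}\mathbf{G}^{\top}]\mathbf{e}_k$, whose eigenvalues lie strictly between 0 and 1, so `error_norms` shrinks from batch to batch; a larger $\mathbf{W}_u$ makes each update more cautious at the cost of slower learning.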

##### 3.2. Reinforcement Learning for Compensation Quantity Control

In RL, an agent improves its performance from its own experience by interacting with the environment [19]. A typical RL problem is described by three elements: the state $s$, the action $a$, and the reward $r$. The goal of the agent is to learn the optimal action policy that maximizes the total reward it will receive in the future. At time step $t$, the agent selects an action $a_t$ based on the current state $s_t$; the system performs the selected action, returns a reward $r_{t+1}$, and moves to the next state $s_{t+1}$. Meanwhile, the state-action values are recorded. In this paper, Q-learning is adopted to learn the optimal temperature compensation quantity. Q-learning is a widely used model-free, off-policy RL algorithm [20]; it finds the optimal policy by learning the optimal action value function for each state.

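The action value function is updated as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a \in \mathcal{A}} Q(s_{t+1}, a) - Q(s_t, a_t)\right],$$

where $\gamma$ is the discount factor, $\alpha$ is the learning rate, and $\mathcal{A}$ is the action space.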
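For a finite set of states, the update rule can be written directly against a lookup table. This is a minimal sketch assuming hashable, already discretized states; the values of `alpha` and `gamma` are placeholders rather than the settings in Table 3, and `q_update` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def q_update(Q: Dict[Tuple[Hashable, Hashable], float], s, a, r: float, s_next,
             actions: Iterable, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)          # unvisited state-action pairs start at 0
actions = ("+du", "-du", "0")   # increase input, decrease input, keep input
```

On an empty table, `q_update(Q, "s0", "+du", 1.0, "s1", actions)` returns 0.1: the target is $1 + 0.9 \cdot 0$, and the entry moves 10% of the way toward it. The table is exactly what the neural network approximation replaces when the state space becomes too large.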

The schematic representation of the RL-based temperature compensation quantity control method is shown in Figure 3. The sampled temperature $y_t$, the temperature error $e_t$, and the control input $u_t$ constitute the state; the state space $\mathcal{S}$ can be expressed by

$$\mathcal{S} = \left\{ s_t \mid s_t = [y_t, e_t, u_t]^{\top} \right\}.$$

The action space is expressed as

$$\mathcal{A} = \{+\Delta u, -\Delta u, 0\},$$

where $\Delta u$ is the minimum control voltage value. Three actions are available: increase the input ($+\Delta u$), decrease the input ($-\Delta u$), and $0$, which means that no modification is made and the system applies the previous temperature control signal.
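The state and action definitions translate into a small amount of bookkeeping code. In this sketch the error sign convention ($e_t$ = setpoint minus measured temperature) and the step `DELTA_U = 0.05` V are assumptions, not values from the paper; the 0-10 V actuator range follows the experimental setup, where 10 V corresponds to full heater power.

```python
from typing import NamedTuple

V_MIN, V_MAX = 0.0, 10.0             # heater command range; 10 V = maximum coil power
DELTA_U = 0.05                       # hypothetical minimum voltage step (assumption)
ACTIONS = (+DELTA_U, -DELTA_U, 0.0)  # increase, decrease, no modification


class State(NamedTuple):
    temperature: float  # sampled temperature y_t (deg C)
    error: float        # temperature error e_t = setpoint - y_t (sign convention assumed)
    voltage: float      # current control input u_t (V)


def make_state(y: float, setpoint: float, u: float) -> State:
    return State(temperature=y, error=setpoint - y, voltage=u)


def apply_action(u_prev: float, action: float) -> float:
    """Modify the previous control signal by the chosen action and clip it to the actuator range."""
    return min(V_MAX, max(V_MIN, u_prev + action))
```

The "no modification" action simply returns the previous voltage, which matches the description of the system re-applying the previous control signal.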

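To prevent the algorithm from getting trapped in a locally optimal solution, the $\varepsilon$-greedy algorithm is adopted to choose the action. The algorithm chooses action $a$ in state $s$ with probability $\pi(a \mid s)$, which can be described as

$$\pi(a \mid s) = \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{|\mathcal{A}|}, & a = \arg\max_{a' \in \mathcal{A}} Q(s, a'), \\[2mm] \dfrac{\varepsilon}{|\mathcal{A}|}, & \text{otherwise}, \end{cases}$$

where $\varepsilon$ is a small positive constant. During the control process, the algorithm chooses the greedy action with probability $1 - \varepsilon$ or selects an action at random with probability $\varepsilon$.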
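The ε-greedy rule can be sketched as follows. Drawing uniformly over all actions in the exploration branch (including the greedy one) gives the greedy action a total probability of $1-\varepsilon+\varepsilon/|\mathcal{A}|$ and every other action $\varepsilon/|\mathcal{A}|$; the function name and the use of Python's `random.Random` are illustrative choices.

```python
import random
from typing import Dict, Hashable


def epsilon_greedy(q_values: Dict[Hashable, float], epsilon: float, rng: random.Random) -> Hashable:
    """Explore with probability epsilon (uniform over all actions), otherwise exploit the best action."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)
```

Passing an explicit `random.Random` instance keeps the exploration sequence reproducible between experiments.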

The system performs the selected action, returns a reward $r_{t+1}$, and moves to the next state $s_{t+1}$. The agent receives a positive reward (+1) if the temperature error $|e_{t+1}|$ is less than the maximum permissible error $e_{\max}$. If the temperature error is greater than the maximum permissible error but less than the previous temperature error $|e_t|$, the agent receives a small positive reward (+0.1). In all other cases, the agent receives a negative reward (-1). In summary, the reward function can be defined as follows:

$$r_{t+1} = \begin{cases} +1, & |e_{t+1}| < e_{\max}, \\ +0.1, & e_{\max} \le |e_{t+1}| < |e_t|, \\ -1, & \text{otherwise}. \end{cases}$$
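The reward rule is a piecewise function of the current and previous errors. This sketch assumes the comparisons use absolute temperature errors; the band `e_max = 0.5` in the usage line is hypothetical.

```python
def reward(e_next: float, e_prev: float, e_max: float) -> float:
    """Piecewise reward from the text, evaluated on absolute temperature errors (deg C)."""
    if abs(e_next) < e_max:
        return 1.0    # inside the permissible error band
    if abs(e_next) < abs(e_prev):
        return 0.1    # outside the band, but the error is shrinking
    return -1.0       # outside the band and not improving


# Usage with a hypothetical 0.5 deg C band
examples = [reward(0.2, 1.0, 0.5), reward(0.8, 1.0, 0.5), reward(-1.2, 1.0, 0.5)]
# examples == [1.0, 0.1, -1.0]
```

The small +0.1 reward gives the agent a learning signal even while the error is still outside the band, as long as it is moving in the right direction.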

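Conventionally, Q-learning requires a lookup table to store the value of each state-action pair. However, a lookup table is impractical for actual control problems: the state space of the temperature control process is very large, so a table leads to high computational cost or may prevent the algorithm from converging. To deal with the large number of state-action pairs in temperature control, in this paper, a neural network is applied to approximate the optimal action value function. At each step $t$, the action value can be expressed as follows:

$$Q(s_t, a) = \sum_{j=1}^{m} w^{(2)}_{ja}\, o_j(t), \qquad o_j(t) = \varphi\big(h_j(t)\big), \qquad h_j(t) = \sum_{i=1}^{n} w^{(1)}_{ij}\, x_i(t),$$

where $m$ is the number of hidden nodes of the neural network, $n$ is the number of neural network inputs, $h_j(t)$ and $o_j(t)$ represent the input and output of the $j$th hidden node, $\varphi(\cdot)$ is the activation function of the hidden nodes, $w^{(1)}_{ij}$ and $w^{(2)}_{ja}$ are the connection weights, and $x_i(t)$ is the $i$th element of the state $s_t$.

According to the Q-learning update rule, the weight matrix of the ANN is updated as follows:

$$\mathbf{W} \leftarrow \mathbf{W} + \alpha\left[r_{t+1} + \gamma\max_{a \in \mathcal{A}} Q(s_{t+1}, a) - Q(s_t, a_t)\right]\nabla_{\mathbf{W}} Q(s_t, a_t),$$

where $\mathbf{W}$ represents the connection weight matrix $\mathbf{W}^{(1)} = [w^{(1)}_{ij}]$ or $\mathbf{W}^{(2)} = [w^{(2)}_{ja}]$, and $\nabla_{\mathbf{W}} Q(s_t, a_t)$ is the gradient vector, calculated as follows:

$$\frac{\partial Q(s_t, a_t)}{\partial w^{(2)}_{ja_t}} = o_j(t), \qquad \frac{\partial Q(s_t, a_t)}{\partial w^{(1)}_{ij}} = w^{(2)}_{ja_t}\,\varphi'\big(h_j(t)\big)\,x_i(t).$$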
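A minimal NumPy sketch of a one-hidden-layer action value approximator and its semi-gradient Q-learning update follows. The tanh hidden activation (whose derivative is $1 - \tanh^2 h$), the layer sizes, the weight initialization, and the step sizes are assumptions for illustration; `QNetwork` is a hypothetical name, and the state is assumed to be scaled to roughly unit range before it enters the network.

```python
import numpy as np


class QNetwork:
    """One-hidden-layer approximator mapping a state vector (n inputs) to one Q value per action."""

    def __init__(self, n_inputs: int, n_hidden: int, n_actions: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_hidden, n_inputs))   # input -> hidden weights
        self.W2 = rng.normal(0.0, 0.5, size=(n_actions, n_hidden))  # hidden -> output weights

    def forward(self, x: np.ndarray):
        h = self.W1 @ x        # hidden-node inputs h_j
        o = np.tanh(h)         # hidden-node outputs o_j (tanh activation assumed)
        return self.W2 @ o, h, o

    def td_update(self, x: np.ndarray, a: int, r: float, x_next: np.ndarray,
                  alpha: float = 0.01, gamma: float = 0.9) -> float:
        """Semi-gradient Q-learning step on both weight matrices for the action a actually taken."""
        q, _, o = self.forward(x)
        q_next, _, _ = self.forward(x_next)
        td_error = r + gamma * float(np.max(q_next)) - q[a]
        grad_W2_row = o                                     # dQ_a / dW2[a, j] = o_j
        grad_W1 = np.outer(self.W2[a] * (1.0 - o ** 2), x)  # dQ_a / dW1[j, i] = W2[a, j] * tanh'(h_j) * x_i
        self.W2[a] += alpha * td_error * grad_W2_row        # only the row of the taken action changes
        self.W1 += alpha * td_error * grad_W1
        return float(td_error)
```

Both gradients are computed from the weights before either matrix is changed, so the two layers are updated with a consistent gradient. The target term is treated as a constant, which is why this is called a semi-gradient update.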

Using neural networks to approximate the action value function solves some of these problems, but new problems arise when RL is combined with neural networks. First, the state-action pairs are correlated [21]: in an actual control problem, the current state is usually very similar to that of the adjacent sampling period, so consecutive samples are strongly correlated. Second, the target value depends on the network weights, so updating the weights at the current step also changes the target values at other steps. Therefore, if the network weights are updated at every step, learning may take a very long time to converge or may even diverge.

In this paper, a mechanism called “experience replay” is adopted to solve the problems caused by using neural networks [22]. During the learning process, the state-action pair of every sampling period is first stored in memory. Data are then randomly sampled from the memory to update the network, which eliminates the correlations between the state-action pairs. Furthermore, unlike the traditional online neural network update strategy, the connection weights of the neural network are updated offline, after the end of each batch. With this offline update strategy, the target values remain unchanged within a batch, which effectively eliminates the effect of per-step weight updates on the targets that arises in the traditional online update strategy. These measures not only reduce the computation time but also make the proposed method more practical.
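The storage and batch-end replay logic might look like the sketch below. The capacity, the number and size of minibatches, and the `update_fn` callback (which would apply one network update, such as a semi-gradient Q-learning step, per transition) are hypothetical; the point is that transitions are only stored while a batch runs and are replayed in random order after it ends.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

Transition = Tuple[tuple, int, float, tuple]  # (state, action, reward, next_state)


class ReplayMemory:
    """Stores every sampling-period transition; network updates happen only at batch end."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer: Deque[Transition] = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # A random minibatch breaks the correlation between neighbouring sampling periods
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def end_of_batch_update(memory: ReplayMemory, update_fn: Callable[[Transition], None],
                        n_minibatches: int, batch_size: int) -> int:
    """Offline update after a production batch; returns how many transitions were replayed."""
    replayed = 0
    for _ in range(n_minibatches):
        for transition in memory.sample(batch_size):
            update_fn(transition)
            replayed += 1
    return replayed
```

During a batch the controller only calls `store`, so the network (and therefore every target value) stays fixed until `end_of_batch_update` runs.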

#### 4. Temperature Control Experiments

##### 4.1. Experiment Setups

An experimental platform, shown in Figure 4, is designed to validate the proposed method. To meet the experimental needs, a modified injection molding machine is used: a new control system, also shown in Figure 4, is built on the original injection molding machine. The machine parameters relevant to the experiments are listed in Table 1. There are five heating zones, including one nozzle zone; the remaining four zones are numbered 1-4 from the hopper to the nozzle. The materials used in the experiments are polypropylene (PP) and polystyrene (PS), whose thermophysical properties are shown in Table 2. The parameters of the Q-learning algorithm are listed in Table 3.

##### 4.2. Single Zone Temperature Control with Open Loop Control

To demonstrate the large time delays and strong coupling of the heating system, an open loop control strategy is applied to zone 3. The target temperature is set to 180°C, and all zones are heated from room temperature; apart from zone 3, no zone receives any input. The temperatures of zones 2, 3, and 4 are shown in Figure 5(a), and the control voltage is shown in Figure 5(b), where 10 V corresponds to heating at the maximum power of the coil. The temperature of zone 3 reaches the target temperature after 1023 seconds, at which point heating is stopped. After heating stops, the temperature of zone 3 continues to rise until it reaches its maximum (184.2°C) at 1085 seconds and then gradually drops. During this process, the temperatures of zone 2 and zone 4 also rise owing to heat conduction; even after the temperature of zone 3 begins to decrease, they keep rising as long as temperature differences exist. Therefore, the heating system exhibits large time delays and strong coupling.

##### 4.3. Temperature Control with Dynamic Conditions

As mentioned before, the heating system is in a dynamic condition during the injection molding process. To validate the proposed temperature control strategy under dynamic conditions, several sets of comparative experiments are conducted, in which the proposed strategy is compared with the generalized predictive control (GPC) method and the PID method. The PID algorithm is the most widely used control algorithm in the field of temperature control; the discrete (incremental) PID model can be expressed as follows:

$$\Delta u(t) = K_p\left[e(t) - e(t-1) + \frac{T}{T_i}\,e(t) + \frac{T_d}{T}\big(e(t) - 2e(t-1) + e(t-2)\big)\right],$$

where $t$ represents the sample time, $\Delta u(t)$ represents the change in control input, $e(t)$, $e(t-1)$, and $e(t-2)$ are the errors in different sampling periods, $T$ is the control cycle, $K_p$ is the proportional gain, $T_i$ is the integral time, and $T_d$ is the derivative time. The GPC algorithm has been widely used in temperature control in recent years and has achieved excellent performance. It computes the control input by minimizing the cost function

$$J = \sum_{j=N_1}^{N_2}\left[\hat{y}(t+j) - y_r(t+j)\right]^2 + \lambda \sum_{j=1}^{N_u}\left[\Delta u(t+j-1)\right]^2,$$

where $t$ represents the sample time, $u(t)$ represents the control input and $\Delta u(t)$ its increment, $\hat{y}(t+j)$ is the predicted output, $y_r(t+j)$ is the reference trajectory, $N_1$ and $N_2$ are the minimum and maximum prediction horizons, $\lambda$ is the weighting factor, and $N_u$ is the control horizon.
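The incremental PID law can be implemented with two stored errors and the previous output. The gains in the usage examples are arbitrary, and the clamp to 0-10 V (the heater command range of the experimental platform) is an added assumption that, in this velocity form, also acts as a simple anti-windup; `IncrementalPID` is an illustrative name.

```python
class IncrementalPID:
    """Discrete incremental (velocity-form) PID: u(t) = u(t-1) + delta_u(t)."""

    def __init__(self, kp: float, ti: float, td: float, T: float,
                 u_min: float = 0.0, u_max: float = 10.0):
        self.kp, self.ti, self.td, self.T = kp, ti, td, T
        self.u_min, self.u_max = u_min, u_max
        self.e1 = 0.0  # e(t-1)
        self.e2 = 0.0  # e(t-2)
        self.u = 0.0   # u(t-1)

    def step(self, e: float) -> float:
        """Advance one control cycle with the current error e(t) and return the clamped input."""
        du = self.kp * ((e - self.e1)
                        + self.T / self.ti * e
                        + self.td / self.T * (e - 2.0 * self.e1 + self.e2))
        self.u = min(self.u_max, max(self.u_min, self.u + du))
        self.e2, self.e1 = self.e1, e  # shift the error history
        return self.u
```

Because only the increment is computed, saturating `self.u` stops the accumulated output from drifting while the heater is at its limit.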

A set of experiments is carried out with PS, all starting from static conditions. The temperature results of zone 3 are shown in Figure 6(a); the set temperature is 220°C. All three control methods achieve excellent control performance under static conditions, and the temperature variations are initially very small. The system then switches to dynamic conditions. First, the screw moves forward to inject the melt. During this process, raw material, usually at room temperature, flows into the barrel and causes a temperature drop. As shown in Figure 6(a), the temperature drop is large with the GPC and PID methods; with the PID method in particular, the temperature drops by about 2.5°C, whereas with the proposed method it drops by only about 0.4°C. At the same time, as shown in Figure 6(b), the control voltage increases, but because of the time delay the temperature does not rise immediately. After injection is completed, the system enters the plastication process, in which the screw rotates and retreats; the screw speed is shown in Figure 6(c). During this process, the temperature rises owing to the shear heat and the increased input. After plastication is finished, the system returns to static conditions. Because of thermal inertia, the temperature continues to rise for a while, then slowly drops, and finally stabilizes at the set value, so temperature overshoot occurs throughout this process. The maximum temperature overshoot is only 0.5°C with the proposed method, whereas it is about 4.5°C with the PID method. Large temperature overshoots affect product quality and can even lead to material decomposition. The GPC method effectively reduces temperature overshoot, to a maximum of about 1.3°C.

##### 4.4. Temperature Control with Different Materials

The thermophysical properties of polymer materials vary greatly, so a change of material significantly affects temperature control performance. To check the performance of the proposed method after a material change, another experiment is conducted in which PS is replaced by PP. The screw performs its injection and plastication actions periodically to simulate the actual production process, and all other conditions remain the same.

The results are shown in Figure 7. The temperature variations are very large at the beginning, with a maximum temperature variation of more than 2.5°C. This is mainly caused by the difference in heat capacity: as shown in Table 2, the heat capacity of PP is much larger than that of PS, and immediately after the material change the feedforward compensation strategy has not yet been updated. The learning process is therefore executed to learn a new compensation strategy. After about 1500 seconds, the maximum temperature variation is reduced to 0.5°C, indicating that the system has learned the new compensation strategy for PP. The learning process then stops temporarily, and the new compensation strategy is stored. Whenever the maximum temperature variation exceeds the set value, the learning process is reactivated to obtain a new compensation strategy.
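The stop-and-reactivate behaviour described above amounts to a simple supervisory rule evaluated once per batch. In this sketch the two thresholds (0.5°C to stop, 1.0°C to restart) and the hysteresis between them are assumptions; the text only states that learning stops once the variation is small and restarts when it exceeds a set value. `learning_should_run` is a hypothetical name.

```python
def learning_should_run(active: bool, max_variation: float,
                        stop_band: float = 0.5, restart_band: float = 1.0) -> bool:
    """Decide, from one batch's maximum temperature variation (deg C), whether learning runs next batch."""
    if active and max_variation <= stop_band:
        return False  # converged: freeze and store the current compensation strategy
    if not active and max_variation > restart_band:
        return True   # performance degraded (e.g. after a material change): relearn
    return active
```

Separating the stop and restart thresholds keeps small batch-to-batch fluctuations from toggling the learning process on and off.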

#### 5. Product Consistency Experiments

As mentioned before, barrel temperature control is critical to product consistency, and the purpose of this paper is to improve the consistency of injection molded products by improving temperature stability. Thus, in this section, product consistency experiments are performed to further verify the validity of the proposed method by comparison with the GPC and PID methods. In this paper, product consistency is reflected by product weight. There are multiple measures of injection molded product quality, such as weight, warpage, surface properties, mechanical properties, optical properties, and crystallization [23]. Warpage is an important indicator of product quality, and temperature unevenness is an important cause of warpage; however, many other factors, such as cooling time, the ejection mechanism, and injection pressure, can also cause warpage. Since the purpose of this paper is to improve product consistency by stabilizing the process through temperature compensation control, product weight is used instead, because it reflects process stability and product consistency very well. For example, Zhou et al. [24] reported that product weight is closely related to other quality properties and can be a good indicator of process stability. In addition, product weight is very convenient to measure and quantify. Therefore, it is feasible to use product weight to reflect product consistency.

##### 5.1. Experiment Setups

The experiments are conducted on the aforementioned experimental platform using a strip product produced in a double cavity mold. The CAD model of the molded product is shown in Figure 8(a), and a picture of the actual molded product is shown in Figure 8(b). The polymer material used in the experiments is PP, and the corresponding process parameter settings are listed in Table 4. In addition to these parameters, which directly affect the melt temperature in the barrel, injection velocity is also an important factor affecting the temperature of the melt entering the cavity; to eliminate its effect, the same injection velocity is used in all experiments. The cooled products are weighed on a METTLER TOLEDO LE204E analytical balance with a resolution of 0.0001 g. Furthermore, to reduce the impact of machine fluctuations on product weight, 80 molding cycles are run in each set of experiments; the first 30 cycles are discarded, and only the remaining 50 cycles are used for measurement.


##### 5.2. Product Consistency under Different Temperature Control Methods

The product weight results are shown in Figure 9. Under the proposed temperature control method, the products from both cavities show very small weight variations, with a maximum weight variation of less than 0.4 g, indicating that excellent product consistency can be achieved. With the PID method, however, weight consistency deteriorates significantly: the maximum weight variation is about 1.2 g. Furthermore, the weight difference between product one and product two is sometimes very large even within the same batch, which indicates a filling imbalance caused by temperature variation. The weight consistency with the GPC method is better than with the PID method, but the weight variations are still relatively large compared with the proposed method.


Additionally, several statistical measures, listed in Table 5, are used to analyze and compare the data. The mean absolute deviation is used to describe the degree of data dispersion, and the weight repeatability $\delta$ is calculated as follows:

$$\delta = \frac{1}{\bar{w}}\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(w_i - \bar{w}\right)^2} \times 100\%,$$

where $i$ denotes the sample number, $n$ is the total number of samples, $\bar{w}$ is the average weight, and $w_i$ is the weight of the $i$th sample. The smaller the value of $\delta$, the better the weight repeatability. The statistical analysis shows that, under the proposed control method, both the mean absolute deviation and the repeatability of product weight are much better than with the PID and GPC methods. In addition, the difference between cavities is very small, which implies that excellent consistency can be achieved across different batches and different cavities.
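Both dispersion measures are straightforward to compute from a list of part weights. The sketch assumes the repeatability index is the sample standard deviation divided by the mean, in percent, with the $n-1$ denominator; the weights in `sample` are made-up numbers, not measured data.

```python
from statistics import mean, stdev
from typing import Sequence


def mean_absolute_deviation(weights: Sequence[float]) -> float:
    """Average absolute distance of each part weight from the mean weight (g)."""
    w_bar = mean(weights)
    return sum(abs(w - w_bar) for w in weights) / len(weights)


def weight_repeatability(weights: Sequence[float]) -> float:
    """Relative sample standard deviation of the part weights, in percent (assumed definition)."""
    return stdev(weights) / mean(weights) * 100.0


sample = [10.0, 10.2, 9.8, 10.0]  # illustrative weights in grams, not measured data
```

For `sample`, the mean absolute deviation is about 0.1 g and the repeatability about 1.63%; computing both per cavity makes cavity-to-cavity differences visible as well.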

##### 5.3. Product Consistency with Different Process Parameters

In addition to the barrel temperature, the melt temperature in the barrel is also affected by the following process parameters: screw rotation speed, back pressure, dwell time, and injection stroke. The effect of these process parameters on melt temperature consistency is shown in Table 6. Reducing the screw rotation speed reduces the heat generated by shearing and thus the melt temperature differences caused by screw rotation; therefore, reducing the screw rotation speed can improve melt temperature consistency to a certain extent. On the other hand, as the back pressure increases during plastication, the screw retreats more slowly, so the melt stays longer in the barrel and the material in the barrel is denser. Therefore, increasing the back pressure can improve melt temperature consistency and also raise the melt temperature. Increasing the dwell time allows the melt in the barrel to be heated for longer, so heat conduction is more complete and melt temperature consistency improves. However, a longer dwell time increases the cycle time, which is unacceptable in batch production; in actual production, the dwell time is usually set as short as possible to increase production efficiency. Therefore, the dwell time is not varied in this paper. Decreasing the injection stroke can also improve melt temperature consistency, but the injection stroke is determined by the mold and cannot be adjusted at will during production. In summary, because of process conditions and production efficiency limitations, the screw rotation speed and back pressure are selected to verify the effectiveness of the proposed method.

In this set of experiments, back pressure and screw rotation speed are selected as the variables, and the other process parameters remain unchanged. The experimental designs are listed in Table 7.

As shown in Figure 10(a), the back pressure changes from 2 MPa to 3 MPa after 20 cycles. The change in back pressure alters the shear heat generated during operation and hence the temperature, which affects product consistency and causes a sudden change in product weight. As the temperature stabilizes again, the product weight gradually returns to its original range. With the proposed method, the product weight returns to the original range in about 5 cycles, whereas with the PID and GPC methods it takes more than 10 cycles. Similarly, as shown in Figure 10(b), when the screw rotation speed changes from 90 rpm to 60 rpm, the product weight also changes suddenly. Unlike the back pressure change, reducing the screw rotation speed lowers the temperature, which increases the product weight. With the proposed method, the product weight quickly returns to the original range because the temperature recovers quickly. This indicates that the proposed method can effectively eliminate temperature changes caused by process parameter changes, thus improving product consistency.


The statistical analysis of the results with process parameter changes is listed in Table 8. As the process parameters change, the mean absolute deviation and repeatability deteriorate; however, the proposed method still achieves better results than the PID and GPC methods. In addition, product consistency improves after the screw rotation speed is reduced, which implies that reducing the screw speed makes the temperature distribution more uniform and increases product consistency.

#### 6. Conclusions

In this paper, a new intelligent temperature compensation control strategy that fully considers the dynamic characteristics of the injection molding process is proposed. Based on the experimental results, the following conclusions can be drawn:

1. The proposed barrel temperature control method significantly reduces temperature variations under dynamic conditions. The experimental results indicate that the maximum temperature variation is about ±0.5°C with the proposed method, compared with ±2.5°C for GPC and ±4.5°C for PID.
2. The proposed method effectively improves the consistency of injection molded products across different batches and different cavities. Statistical analysis shows that the repeatability is less than 0.3% with the proposed method, which is much better than that of the PID method (about 1%) and the GPC method (about 0.7%).
3. The proposed strategy is a data-based learning control method. It does not require an accurate system model, and the system improves its performance by learning from its historical data. Therefore, it can be easily applied to different machines, materials, and process conditions.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Numbers 51635006, 51675199) and the Fundamental Research Funds for the Central Universities (Grant Numbers 2016YXZD059, 2015ZDTD028).