Research Article  Open Access
Xiao Wang, Deyi Xu, Na Qu, Tianqi Liu, Fang Qu, Guowei Zhang, "Predictive Maintenance and Sensitivity Analysis for Equipment with Multiple Quality States", Mathematical Problems in Engineering, vol. 2021, Article ID 4914372, 10 pages, 2021. https://doi.org/10.1155/2021/4914372
Predictive Maintenance and Sensitivity Analysis for Equipment with Multiple Quality States
Abstract
This paper discusses the predictive maintenance (PM) problem of a single equipment system. It is assumed that the equipment's quality state deteriorates as it operates, resulting in multiple yield levels represented as system observation states. We cast the equipment deterioration as a discrete-state, continuous-time semi-Markov decision process (SMDP) model and solve the SMDP problem in a reinforcement learning (RL) framework using a strategy-based method. In doing so, the goal is to maximize the system average reward rate (SARR) and generate the optimal maintenance strategy for the given observation states. Further, the PM time can be obtained by a simulation method. To demonstrate the advantage of our proposed method, we introduce the standard sequential preventive maintenance algorithm with unequal time intervals. Our proposed method is compared with the sequential preventive maintenance algorithm with respect to SARR, and the results show that our method outperforms it. Finally, a sensitivity analysis of some parameters on the PM time is given.
1. Introduction
In real production systems, equipment deterioration is almost universal, owing to use, age, and other causes. If maintenance is not performed, a failure or severe malfunction will eventually occur. Operating equipment in a deteriorating state often brings higher production cost and lower product quality. Therefore, an effective maintenance policy is essential in industrial practice. Periodic or age-based preventive maintenance strategies often lead to inadequate maintenance or over-maintenance; over-maintenance causes unnecessary interference with production, resulting in decreased production efficiency and increased production cost. Condition-based maintenance decides whether maintenance should be performed according to the current system state [1]. Nevertheless, the more valuable issue is to determine the future maintenance time given the current system state, which is called PM in this paper.
Compared with condition-based maintenance, there is little theoretical and practical research on PM in the strict sense [2]. In some literature, condition-based maintenance has been classified as PM, but the truly "predictive" aspect of such decisions, such as anticipating and predicting the future state of the equipment, is not reflected. Few true PM methods can schedule an optimal future maintenance time by considering the deteriorating equipment condition. Existing methods of classifying equipment states are mainly divided into two types, operational state or failure state, and the goal of PM is then only to predict the residual life [3–8]. For example, Sikorska et al. review a large body of literature on prediction models, which are mainly used to predict the residual equipment life [9]. Jan et al. evaluate the current states and predict the residual life of industrial equipment with a hidden semi-Markov model [10]. Schwendemann et al. present residual life prediction for bearings in grinding equipment while taking a more global view of the optimization problem involved, such as costs and time [11].
Moreover, in real industrial systems, such as semiconductor production and precision instruments, the deteriorating equipment states are closely related to the quality levels of the products [2]. Based on extensive industrial practice, General Motors researchers have pointed out the important potential of exploiting the correlation between operation management, including maintenance decisions, and product quality to improve the performance of manufacturing systems [12]. Before the equipment breaks down, it can still operate in a deteriorating quality state, but the probability of producing unqualified products increases [13]. For a long time, maintenance and quality have been treated as two relatively independent research fields, and scholars and practitioners have done a great deal of work in each. The correlation between equipment maintenance and product quality, however, remains a relatively new field. Existing literature and industrial practice usually assume Bernoulli or persistent quality problems [13, 14], whereas multiple yield quality problems are more realistic and general, and therefore worthier of deeper research. Multiple yield quality problems refer to quality problems that occur independently but at stage-dependent probability levels; these stages arise because the equipment gradually deteriorates through multiple quality states. For multiple yield quality problems, production and maintenance must be balanced, and there is no simple, direct maintenance decision. In addition, related research on equipment maintenance often assumes that the production time and the maintenance time are unit times, and strong assumptions are also made about the equipment deterioration mode [15].
Maintenance decisions based on such assumptions lack a realistic basis.
Therefore, we claim that it is of great significance to make maintenance decisions that take quality inspection data into account, which can keep costs down and meet the needs of industrial production management. There are relatively few studies that jointly consider production, maintenance, and quality, and no effective solution methods have been reported in the existing literature. We attempt to solve the equipment maintenance problem in production practice. Since the deteriorating equipment states cannot be directly observed, the large amount of real-time quality inspection information can serve as implicit information. A discrete-state, continuous-time SMDP with a large number of yield stages is introduced to describe the equipment deterioration process. It is worth noting that the production and maintenance times are random variables following general distributions, based on realistic considerations. A strategy iteration-based RL method is proposed to obtain the optimal strategy for the model. Furthermore, the future maintenance time corresponding to each observed state can be produced by a simulation method under the fixed maintenance strategy, and the influences of the main technical parameters on the optimization goal of the system are analyzed. Finally, the advantages of the proposed RL method for such a dynamic environment problem are demonstrated against the sequential preventive maintenance algorithm with unequal time intervals.
2. Problem Description
This paper investigates deteriorating equipment that has multiple discrete states. Assume that the equipment condition can be directly reflected by condition monitoring measures such as the yield levels. A single type of product is produced, and each processed product is immediately inspected and identified as qualified or unqualified. The inspection time and inspection cost are assumed to be zero. Due to faults of the inspection equipment, the proficiency of the inspection workers, and other causes, product quality inspection is subject to errors, which are mainly of two types [16]:
(i) Type I error: a false detection, with probability e_{1} and cost C_{e1}. The parameter C_{e1} includes the production cost per unit product and other related costs.
(ii) Type II error: a missed detection, with probability e_{2} and cost C_{e2}. The parameter C_{e2} includes the production cost per unit product and other possible costs, such as costs arising from quality and safety issues, which can far exceed production costs.
In addition, under accurate inspection, producing a qualified product yields a fixed per-unit profit, and producing an unqualified product incurs the cost R_{d}.
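The immediate economic outcome of one produced-and-inspected item follows directly from the error model above. The sketch below makes it concrete; since the text does not give a symbol for the profit of a qualified product, the name R_q here is hypothetical, and the sampling of inspection errors is an illustrative assumption.

```python
import random

def inspection_reward(is_qualified, e1, e2, R_q, R_d, C_e1, C_e2,
                      rng=random.random):
    """Immediate reward of producing and inspecting one product.

    is_qualified: the true quality of the item.
    e1 / e2: Type I (false detection) / Type II (missed detection)
    probabilities; C_e1 / C_e2 their costs; R_d the cost of an
    unqualified product. R_q (profit of a qualified product) is a
    hypothetical symbol, not from the paper.
    """
    if is_qualified:
        # Type I error: a qualified item is wrongly rejected.
        return -C_e1 if rng() < e1 else R_q
    # Type II error: an unqualified item is wrongly accepted.
    return -C_e2 if rng() < e2 else -R_d
```

These four outcomes correspond one-to-one with the reward terms listed later in Section 4.1.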
3. System Model
The sequential decision-making problem under uncertainty can be solved by analyzing a Markov process, and a large body of related research can be found in stochastic dynamic programming and other literature [17–22]. However, in many of these studies, Markov chains cannot capture basic probabilistic features such as a general probability distribution of the sojourn times in each quality state. Such problems are therefore often described as SMDPs, because an SMDP represents a more realistic situation and is better suited to modeling the deteriorating process of the equipment.
We employ a discrete-state, continuous-time SMDP model to represent the deteriorating process of the single equipment system, as shown in Figure 1. Since the yield level y_{kl} cannot be obtained directly, the inspection information s = (k, p, b) is used as the observed system state, in which k is the number of subcycles in each production-maintenance cycle, b is the number of unqualified products, and p is the number of products produced since the equipment was last maintained or repaired. The action space is denoted as A(s) = {0, 1, 2}, where a = 0 means keeping the equipment operating and producing new products; a = 1 means stopping the equipment and performing an imperfect (minor) maintenance action (MM in Figure 1); and a = 2 represents the major repair action performed in the event of a failure or random malfunction of the equipment (MR in Figure 1). In the deteriorating process of the equipment, the decision point for a maintenance action is the time point of production and inspection of a new product. By performing the MM action, the yield level of the equipment is restored to a certain intermediate state (e.g., y_{21}), after which the (k + 1)th subcycle is initiated. A subcycle continues until a certain yield level limit appears or a stochastic malfunction occurs; at that point, the major repair is triggered to restore the yield level of the equipment to the best state (e.g., y_{11}), and another renewal subcycle is initiated.
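The observed state and action space above can be sketched directly in code. This is a minimal model of the observation-level bookkeeping only; the hidden yield level, sojourn times, and the unqualified-item branch are decided by a simulator that is not shown, and all names here are illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum

class Action(IntEnum):
    PRODUCE = 0   # a = 0: keep operating, produce a new product
    MM = 1        # a = 1: imperfect (minor) maintenance
    MR = 2        # a = 2: major repair after failure

@dataclass(frozen=True)
class ObsState:
    """Observed state s = (k, p, b) as defined in the text."""
    k: int  # subcycle index within the production-maintenance cycle
    p: int  # products produced since the last maintenance/repair
    b: int  # unqualified products among them

def transition(s, a):
    """Observation-level effect of each action (a sketch)."""
    if a == Action.PRODUCE:
        # p advances; b would also advance if the simulated item
        # turns out unqualified (qualified-item branch shown here).
        return ObsState(s.k, s.p + 1, s.b)
    if a == Action.MM:
        # MM restores an intermediate yield state and starts subcycle k+1.
        return ObsState(s.k + 1, 0, 0)
    # MR renews the whole cycle: back to the best state, s = (1, 0, 0).
    return ObsState(1, 0, 0)
```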
In general, the equipment in a production system deteriorates as its condition worsens, which shortens the sojourn time in each quality state. Therefore, this paper assumes that the sojourn time λ_{kl} under each yield level y_{kl} follows a gamma distribution Γ(α_{kl}, β), and that an increase of l decreases λ_{kl}; that is, α_{k,l+1} = b_{s}α_{kl} (0 < b_{s} < 1). Meanwhile, it is assumed that the stochastic malfunction time interval under the k'th subcycle also follows a gamma distribution. Moreover, the random failure time interval also decreases gradually, as presented by the following equation:
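The geometric shrinkage of the shape parameter can be sampled as follows. This is a minimal sketch of the stated assumption α_{k,l+1} = b_s α_{kl}; the numeric parameter values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sojourn(alpha_k1, beta, b_s, l):
    """Sample the sojourn time lambda_{kl} ~ Gamma(alpha_kl, beta).

    Per the text, alpha_{k,l+1} = b_s * alpha_{kl} with 0 < b_s < 1,
    so the shape parameter shrinks geometrically in l, and the mean
    sojourn time alpha_kl * beta decreases as the yield level worsens.
    """
    alpha_kl = alpha_k1 * b_s ** (l - 1)
    return rng.gamma(shape=alpha_kl, scale=beta)

# Mean sojourn time E[lambda_{kl}] = alpha_kl * beta for illustrative
# values alpha_k1 = 2.0, beta = 1.5, b_s = 0.8:
mean = lambda l: 2.0 * 0.8 ** (l - 1) * 1.5
```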
4. Policy IterationBased PM Method
Model-free RL comprises two classes of algorithms: value iteration-based and strategy iteration-based. However, the value iteration-based RL algorithm is not suitable for solving SMDP problems, mainly because it cannot guarantee an optimal solution for average reward SMDP problems [23]. In contrast, the strategy iteration-based RL algorithm can obtain accurate and satisfactory results. Therefore, this paper adopts an average reward, strategy iteration-based RL method to solve our problem and gives the optimal maintenance strategy under the goal of maximizing the SARR.
4.1. QP Learning Algorithm
The RL technique approaches the optimal strategy of the SMDP model through strategy iteration and learns the mapping from environment states to actions through trial and error, so as to maximize the cumulative SARR obtained from the environment [23]; namely,

ρ* = max_{π} lim_{N→∞} E[Σ_{i=1}^{N} r(s_i, a_i, s_{i+1})] / E[Σ_{i=1}^{N} t(s_i, a_i, s_{i+1})].
The QP learning algorithm can accurately solve SMDP problems based on the average cumulative reward. In each decision cycle, the current state s is transferred to state s′ under decision a, and the updating expression is as follows [23]:

Q(s, a) ← (1 − α)Q(s, a) + α[r(s, a, s′) − ρ t(s, a, s′) + max_{a′ ∈ A(s′)} Q(s′, a′)],

where r(s, a, s′) is the total immediate reward with the action a_{j} (j = 1, 2) when the state s is transferred to state s′; t(s, a, s′) is the interval time with the action a_{j} (j = 1, 2) when the state s is transferred to state s′; and ρ is the reward rate, which can be obtained as the cumulative reward divided by the cumulative sojourn time over the simulated trajectory [24]:
α is defined as the learning rate, and it decreases according to the following rule [23], where n_max is a large positive integer, α_0 is the initial value of α, and α_0 = 0.1. It should be noted that the value of α_0 has a certain influence on the final convergence of the RL algorithm; see [25] for details. The visit factor parameter represents the number of visits. In addition, the immediate rewards r(s, a, s′) caused by state transitions are as follows:
(i) r(s, a, s′) equal to the per-unit profit when a qualified product is produced
(ii) r(s, a, s′) = −C_{e1}, the loss of a Type I error
(iii) r(s, a, s′) = −R_{d}, the cost of producing an unqualified product
(iv) r(s, a, s′) = −C_{e2}, the loss of a Type II error
(v) r(s, a, s′) = −C_{R}, the cost of a major repair
(vi) r(s, a, s′) = −C_{M}, the cost of a minor maintenance
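The average-reward update can be sketched as below, in the spirit of the SMART-style algorithms of Das et al. [24]. The class layout and the running-ratio estimate of ρ are illustrative assumptions, not the paper's exact implementation.

```python
from collections import defaultdict

class QPLearner:
    """Sketch of an average-reward SMDP temporal-difference update."""

    def __init__(self):
        self.Q = defaultdict(float)   # state-action values
        self.total_reward = 0.0
        self.total_time = 0.0
        self.rho = 0.0                # average reward-rate estimate

    def update(self, s, a, s_next, r, t, alpha, actions_next):
        # Target: immediate reward, minus the opportunity cost
        # rho * t of spending time t, plus the best value reachable
        # from the next state.
        best_next = max(self.Q[(s_next, b)] for b in actions_next)
        q = self.Q[(s, a)]
        self.Q[(s, a)] = (1 - alpha) * q + alpha * (r - self.rho * t + best_next)
        # Reward-rate estimate: cumulative reward over cumulative time.
        self.total_reward += r
        self.total_time += t
        self.rho = self.total_reward / self.total_time
```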
The current strategy of the QP learning algorithm is given by the value P, and the value Q is updated on the basis of P. The processes of strategy evaluation and strategy improvement are executed repeatedly until the optimal maintenance strategy is obtained. The algorithm mainly includes three essential steps: exploration, strategy evaluation, and strategy improvement. The detailed process is depicted in Figure 2.

Step 1: Initialization
(i) Initialize the maintenance strategy P(s, a) with random values; initialize the maximum number of strategy-improvement updates E_max and the maximum number of strategy-evaluation updates N_max; initialize the learning-rate and exploration-rate parameters; set the outer-loop policy counter E = 1.
(ii) According to the known maintenance strategy P(s, a), calculate the average reward rate ρ; initialize the state-action value of the strategy evaluation process Q(s, a) = 0; set the current strategy updating number N = 1 and reset the visit counters.

Step 2: Strategy Evaluation
(i) Initialize the current state s = (1, 0, 0), the average failure interval T_f, and the cumulative state transition time T_c.
(ii) Choose the greedy action a with probability 1 − p_n; otherwise, select a random action a with probability p_n.
(iii) Simulate the decision action a in state s; the observation state is transformed to state s′. If a = 0, a new observation state is obtained, the transition time t(s, a, s′) and the reward r(s, a, s′) between state s and state s′ are directly produced, and the action value Q is updated by equation (3); then update the state and counters. If T_c ≥ T_f, jump to Step 2 (iv); otherwise, jump to Step 2 (v). If a = 1, the imperfect maintenance is performed, the new observation state and immediate reward are obtained, the action value Q is updated, k = k + 1, and the program jumps to Step 2 (v).
(iv) When the major repair is performed, the corresponding immediate reward and state transition time are obtained, and the action value Q is updated. If N > N_max, jump to Step 3 (i); otherwise, jump to Step 2 (ii).
(v) Update the visit counters, the learning rate α, and the exploration rate p_n, and then jump to Step 2 (ii).

Step 3: Strategy Improvement
(i) Let P = Q and E = E + 1; if E = E_max, stop the learning process; otherwise, return to Step 1 (ii) and continue learning.
(ii) According to the action value P, calculate the optimal strategy π^{∗} by using the following equation:
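The strategy-evaluation pass (Step 2) can be sketched as a single loop. Here `env` is a hypothetical simulator, `reset() -> state`, `actions(s) -> list`, `step(s, a) -> (s_next, reward, dt, done)`, standing in for the production/maintenance simulation; the decay schedules for the exploration rate and learning rate are illustrative.

```python
import random

def evaluate_policy(env, Q, P, rho, n_max, alpha0=0.1, p0=0.2):
    """One strategy-evaluation pass (Step 2), as a sketch.

    Q and P map (state, action) pairs to values; rho is the average
    reward rate computed from the current strategy P. Exploration and
    learning rates decay with visit counts, as in the text.
    """
    visits = {}
    s = env.reset()                                   # s = (1, 0, 0)
    for _ in range(n_max):
        p_n = p0 / (1 + visits.get(s, 0))             # decaying exploration
        if random.random() < p_n:
            a = random.choice(env.actions(s))         # explore
        else:                                         # greedy w.r.t. P
            a = max(env.actions(s), key=lambda b: P.get((s, b), 0.0))
        s_next, r, dt, done = env.step(s, a)
        alpha = alpha0 / (1 + visits.get((s, a), 0))  # decaying learning rate
        best = max(Q.get((s_next, b), 0.0) for b in env.actions(s_next))
        Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) \
                    + alpha * (r - rho * dt + best)
        visits[(s, a)] = visits.get((s, a), 0) + 1
        visits[s] = visits.get(s, 0) + 1
        s = env.reset() if done else s_next           # renew after MR
    return Q
```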
4.2. Optimal PM Time
In Section 4.1, the optimal maintenance strategy π^{∗} of the deteriorating equipment is obtained by the proposed method. In this section, the optimal maintenance strategy π^{∗} and the equipment deterioration model are used to estimate the future maintenance time corresponding to different observation states s_{i}. First, a one-dimensional vector V_{d} of the unqualified-product state and a one-dimensional vector V_{t} of production time are defined. These two vectors record the accumulated number of unqualified products b and the production time t per unit product, respectively. During the process from production to maintenance, the failure interval is the sum of the sojourn times in the different quality states of the same deterioration mode. The initial action is a = 0, and a new observation state is produced after the equipment goes through production and quality inspection. Based on the maintenance policy π^{∗}, the actions for each new state are obtained until the equipment performs a maintenance action. The vector V_{d} records the states from production to maintenance, and the vector V_{t} directly gives the maintenance time point for the different states s, which is used as an effective PM time. In the simulation, the state transfer process is random, so the same state can be recorded many times, and the average value is taken as the PM time for that observed state.
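The averaging over simulated runs can be sketched as follows. Here `simulate_run` is a hypothetical stand-in for one production-to-maintenance episode under the fixed strategy π^{∗}; it returns the visited (state, elapsed-time) pairs plus the total time at which maintenance is triggered, mirroring the V_d / V_t bookkeeping.

```python
from collections import defaultdict

def estimate_pm_times(simulate_run, policy, n_runs=1000):
    """Monte-Carlo estimate of the PM time for each observed state.

    The PM time of a state is the average remaining time until
    maintenance over all runs in which that state was visited.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_runs):
        trace, t_maint = simulate_run(policy)
        for s, t_visit in trace:
            sums[s] += t_maint - t_visit   # remaining time to maintenance
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}
```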
The detailed process for obtaining the PM time is shown in Figure 3. First, the parameters related to the PM time are initialized, and then the production process of the equipment is simulated according to the known maintenance strategy π^{∗}; both the quality state and the production process are random in the simulation. The maintenance policy is applied to the model of Figure 3, the PM time corresponding to each observation state s_{i} is produced, and the mean value is taken as the final estimate.
5. Simulation Study
The maintenance action is imperfect; that is, after maintenance, the quality state and the yield level of the equipment improve, but the equipment is not restored to an as-new state. To what extent, then, is the equipment restored by maintenance? This section explains this through the change of yield level before and after maintenance. Following the ideas of Zhu et al. [26], for two consecutive deteriorating subcycles, the yield function relationship is as follows:
t represents the time since the equipment was last maintained or repaired; b_{k} is the degradation factor in equation (8), a value between 0 and 1; a_{k} is an age degradation factor, also a value between 0 and 1; and D_{k} represents the time interval of the k'th subcycle. The discrete yield levels can be determined by equation (9), where L is the number of prespecified yield levels in each subcycle k.
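The imperfect-restoration relation and the discretization into L levels can be sketched as below. Since equations (8) and (9) are not reproduced here, the form y_{k+1}(t) = b_k · y_k(t + a_k D_k), scaling by b_k and aging by a_k D_k, is a hedged assumption in the style of the imperfect-maintenance literature [26], and the uniform time split into L levels is likewise illustrative.

```python
import math

def yield_next_cycle(y_k, a_k, b_k, D_k):
    """Yield function of the (k+1)th subcycle (assumed form):
    the restored curve is scaled down by b_k and aged by a_k * D_k."""
    return lambda t: b_k * y_k(t + a_k * D_k)

def discretize(y, D_k, L):
    """Split subcycle k into L prespecified yield levels (equation (9)
    analog): level l takes the yield value at t = (l - 1) * D_k / L."""
    return [y(l * D_k / L) for l in range(L)]

# Example: exponentially decaying yield in the first subcycle
y1 = lambda t: math.exp(-0.1 * t)
y2 = yield_next_cycle(y1, a_k=0.3, b_k=0.9, D_k=10.0)
```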
5.1. Numerical Experiments
According to the problem description and the modeling of the deteriorating equipment in this paper, the relevant parameters are assumed and given in Table 1. Other relevant parameters are explained as follows: the maximum number of strategy-improvement updates is E_max = 15; the maximum number of strategy-evaluation updates is N_max = 10000; the visit factor is the number of visits to a certain state, which is a changing value. The yield level is a discretization of the equipment states and, from a fuzzy point of view, can be divided into four levels: excellent, good, medium, and poor. Each state corresponds to a certain interval time between failures. If the discretization level of the equipment is too fine, the simulation state will jump frequently and cannot reflect the continuous production process under a given condition. We assume a critical yield level, and a single strategy evaluation is completed when T_c ≥ T_f or the critical yield level is reached. Due to the randomness of quality inspection, the worst critical condition of the equipment is designed as 0.6 in order to ensure correct jumps in the simulation process; in the real simulation process, this condition only plays a role occasionally.

The method proposed in this paper is adopted for learning, and the learning results are shown in Figure 4, compared with the sequential preventive maintenance algorithm [27]. As can be seen from the figure, the SARRs of the strategies learned by both methods converge well, and the proposed method is clearly much better than the sequential preventive maintenance algorithm in terms of SARR. This arises in part from the fact that the maintenance policy in the sequential preventive maintenance algorithm is not coupled to the goal of maximizing the total SARR.
5.2. Sensitivity Analysis of the Parameters
5.2.1. Impact of Decrease Factor of Sojourn Time
The sojourn time λ_{kl} of each state is related to the sojourn-time decrease factor b_{s}: the smaller b_{s} is, the greater the change of λ_{kl}, and correspondingly the PM time also changes. As shown in Figure 5, the PM time increases when b_{s} decreases. The reason is that, when b_{s} is smaller, the equipment can be operated for a considerable period before maintenance while still producing qualified products with high probability, so the long-run expected SARR increases. For example, when b_{s} decreases from 1 to 0.6, the expected SARR changes from 20.6 to 30.
5.2.2. Impact of Quality Detection Error
Figure 6 shows that the PM time for each observed state declines slightly as the probability of Type II error e_{2} increases from 0 to 0.1. The reason is that the increase of e_{2} reduces the long-run expected SARR, which decreases from 31.8 to 30.6. At the same time, the PM time is not sensitive to changes in e_{2}, because the cost of a Type II error (C_{e2} = 100) is comparatively small. Similarly, the PM time declines slightly as e_{1} increases, because C_{e1} is comparatively small and an increase in e_{1} also reduces the long-run expected SARR.
5.2.3. Impact of the Cost or Profit
(1) Impact of the Cost C_{f}. The parameter C_{f} refers to the cost of wrongly identifying a qualified product as an unqualified product. From Figure 7, we can see that the PM time decreases as the cost C_{f} increases, because an increase in C_{f} leads to a decrease in the long-term expected SARR. Meanwhile, Figure 7 shows that the PM time seems insensitive to changes in C_{f}, which is caused by the assumption of a very small false-detection probability p_{f} in this paper.
(2) Impact of the Cost C_{n}. The parameter C_{n} is the cost of wrongly identifying an unqualified product as a qualified product. As shown in Figure 8, when the cost C_{n} increases, the PM time decreases, because the long-run expected SARR decreases as C_{n} increases. Figure 8 also shows that the PM time is not sensitive to changes in C_{n}, which is caused by the assumption of a very small missed-detection probability p_{n} in this paper.
5.2.4. Impact of Initial Quality Deterioration Rate k_{y}
The coefficient k_{y} describes the initial deterioration rate of the equipment, as shown in Figure 9. The PM time is not sensitive to changes in this coefficient, because varying it within a certain range makes essentially no difference to the SARR.
6. Conclusion
In this paper, we propose a PM method for a single piece of deteriorating equipment with multiple yield quality problems. It is assumed that the yield stage is coupled with the equipment quality state and that a stochastic breakdown can occur in addition to quality failure. Moreover, the equipment cannot return to normal operating condition without repair. We assume two decision actions, MM and MR, in each observation state. The preventive maintenance action is MM, which can be performed in a deteriorating quality state, while MR is forced to be implemented in a failure state. A discrete-state, continuous-time SMDP model is proposed to represent the deterioration process of the equipment, and the QP method in the RL framework is utilized to solve it. Given product quality inspection data with certain detection errors, the optimal maintenance strategy for each observed state is produced with the goal of maximizing the long-run expected SARR. The PM time is obtained by a simulation method.
Through the simulation examples, it is shown that the proposed method is capable of solving the PM problems of equipment in a dynamic environment. The experimental results also show that the proposed method outperforms the standard sequential preventive maintenance method with unequal time intervals. The change of the maintenance action rule is further shown; it is not monotone with the increase of maintenance times and unqualified rate. It can also be observed that the PM time depends on the observed state: it decreases as the total number of products produced increases, and, for a given total number of products produced, it decreases monotonically as the number of unqualified products increases. Moreover, an increase in the number of maintenance actions also causes a decrease in the PM time. In addition, the influences of the main parameters on the optimization goal are investigated.
Data Availability
The calculation data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Natural Science Foundation of Liaoning Province under Grant 20180550746 and the National Natural Science Foundation of China under Grant 61901283.
References
[1] Y. Zhou, B. Sun, W. Sun, and Z. Lei, "Tool wear condition monitoring based on a two-layer angle kernel extreme learning machine using sound sensor for milling process," Journal of Intelligent Manufacturing, 2020.
[2] Y. Zhou, B. Sun, and W. Sun, "A tool condition monitoring method based on two-layer angle kernel extreme learning machine and binary differential evolution for milling," Measurement, vol. 166, 2020.
[3] S. Lu, Y.-C. Tu, and H. Lu, "Predictive condition-based maintenance for continuously deteriorating systems," Quality and Reliability Engineering International, vol. 23, no. 1, pp. 71–81, 2007.
[4] S.-J. Wu, N. Gebraeel, M. A. Lawley, and Y. Yih, "A neural network integrated decision support system for condition-based optimal predictive maintenance policy," IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, vol. 37, no. 2, pp. 226–236, 2007.
[5] K. A. Kaiser and N. Z. Gebraeel, "Predictive maintenance management using sensor-based degradation models," IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, vol. 39, no. 4, pp. 840–849, 2009.
[6] M. Y. You, F. Liu, W. Wang, and G. Meng, "Statistically planned and individually improved predictive maintenance management for continuously monitored degrading systems," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 744–753, 2012.
[7] X. Han, Z. Wang, M. Xie, Y. He, Y. Li, and W. Wang, "Remaining useful life prediction and predictive maintenance strategies for multi-state manufacturing systems considering functional dependence," Reliability Engineering and System Safety, vol. 210, 2021.
[8] T. P. Carvalho, F. A. A. M. N. Soares, R. Vita, R. D. P. Francisco, J. P. Basto, and S. G. S. Alcalá, "A systematic literature review of machine learning methods applied to predictive maintenance," Computers & Industrial Engineering, vol. 137, 2019.
[9] J. Z. Sikorska, M. Hodkiewicz, and L. Ma, "Prognostic modelling options for remaining useful life estimation by industry," Mechanical Systems and Signal Processing, vol. 25, no. 5, pp. 1803–1836, 2011.
[10] L. Jan, L. Dimiccoli, and H. Sahli, "Hidden semi-Markov models for predictive maintenance," Mathematical Problems in Engineering, vol. 2015, Article ID 278120, 23 pages, 2015.
[11] S. Schwendemann, Z. Amjad, and A. S. Hoc, "A survey of machine-learning techniques for condition monitoring and predictive maintenance of bearings in grinding machines," Computers in Industry, vol. 125, 2021.
[12] R. R. Inman, D. E. Blumenfeld, N. Huang, and J. Li, "Designing production systems for quality: research opportunities from an automotive industry perspective," International Journal of Production Research, vol. 41, no. 9, pp. 1953–1971, 2003.
[13] I. C. Schick, S. B. Gershwin, and J. Kim, "Quality/quantity modeling and analysis of production lines subject to uncertainty," Report, Laboratory for Manufacturing and Productivity, Massachusetts Institute of Technology, Cambridge, MA, USA, 2005.
[14] A. Farahani and T. Hamid, "Integrated optimization of quality and maintenance: a literature review," Computers & Industrial Engineering, vol. 151, Article ID 106924, 2021.
[15] Z. Xu and D. Zhou, "Real-time prediction method research on reliability for a class of dynamic systems," Control Engineering, vol. 15, no. 1, pp. 85–87, 2008.
[16] M. L. Puterman, Markov Decision Processes, Wiley-Interscience, New York, NY, USA, 1994.
[17] S. Bloch-Mercier, "A preventive maintenance policy with sequential checking procedure for a Markov deteriorating system," European Journal of Operational Research, vol. 142, no. 3, pp. 548–576, 2002.
[18] C. Chen, Y. Chen, and J. Yuan, "On a dynamic preventive maintenance policy for a system under inspection," Reliability Engineering & System Safety, vol. 80, no. 1, pp. 41–47, 2003.
[19] J. H. Chiang and J. Yuan, "Optimal maintenance policy for a Markovian system under periodic inspection," Reliability Engineering & System Safety, vol. 71, no. 2, pp. 165–172, 2001.
[20] M. Ohnishi, T. Morioka, and T. Ibaraki, "Optimal minimal-repair and replacement problem of discrete-time Markovian deterioration system under incomplete state information," Computers & Industrial Engineering, vol. 27, no. 1–4, pp. 409–412, 1994.
[21] H. Kawai, J. Koyanagi, and M. Ohnishi, "Optimal maintenance problems for Markovian deteriorating systems," Stochastic Models in Reliability and Maintenance, pp. 193–218, 2002.
[22] L. Gong and K. Tang, "Monitoring machine operations using online sensors," European Journal of Operational Research, vol. 96, no. 3, pp. 479–492, 1997.
[23] A. Gosavi, Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, Kluwer Academic Publishers, Norwell, MA, USA, 2003.
[24] T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, "Solving semi-Markov decision problems using average reward reinforcement learning," Management Science, vol. 45, no. 4, pp. 560–574, 1999.
[25] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, Wiley-Interscience, New York, NY, USA, 2007.
[26] H. Zhu, F. Liu, X. Shao, Q. Liu, and Y. Deng, "A cost-based selective maintenance decision-making method for machining line," Quality and Reliability Engineering International, vol. 27, no. 2, pp. 191–201, 2011.
[27] D. G. Nguyen and D. N. P. Murthy, "Optimal preventive maintenance policies for repairable systems," Operations Research, vol. 29, no. 6, pp. 1181–1194, 1981.
Copyright
Copyright © 2021 Xiao Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.