Abstract

Background. Coal washing is a complicated process that is difficult to control: it involves many control parameters with strong coupling relationships. Realizing the self-perception, self-adjustment, and self-evaluation of coal washing machines, improving washing quality, ensuring production safety, and reducing labor cost remain challenging. Methods. Through the intelligent transformation of the jig, this paper proposes an intelligent washing method in which deep reinforcement learning cooperates with evolutionary computation. First, it designs a fault warning method based on statistical analysis, which helps restore the jig to its normal running state through manual maintenance. Then, it constructs a regulation strategy generation method based on deep reinforcement learning, supported by the fusion of artificial experience and historical data. Last, to address the lack of monitoring data caused by poor communication quality and harsh environments, a regulation strategy prediction method combining evolutionary computation with a surrogate model is proposed. Results. In practice, the method shows accurate fault warning and rapid response in adjusting the cleaned coal ash. Conclusions. The proposed method is of great significance for intelligent washing and can better cope with the special situation in which sensing data from the washing equipment are missing.

1. Introduction

Coal preparation plants use various gravity processes to treat most of the raw coal; these processes are known for their low cost and high efficiency. Jig coal washing is the process of physically separating different substances in raw coal to form coal products of various quality specifications [1]. Intelligent coal washing can reduce the impact of manual intervention on the separation process, shorten the delay of process control, and improve efficiency. With the construction of intelligent mines, and especially the demand for safe and high-quality production, quickly identifying washing faults and autonomously adjusting the washing process have become particularly important [2]. In addition to the basic problems of severe raw coal deterioration and low sorting rates, the washing process is affected by many factors, such as throughput frequency, air pressure, air valve adjustment, hydraulic cylinder, coal and gangue valve openings, and float weight, all of which have an important impact on washing quality [3]. How to monitor the working status of jigs in real time during washing, accurately perceive harsh working conditions, and effectively ensure production quality remains a challenge [4, 5]. The core of intelligent washing is to collect multidimensional jig operation data, quickly mine the real-time status, and return accurate control strategies in a timely manner [6]. In previous studies, the coal separation process was treated as a physical and mathematical modeling problem, and the control strategy was usually generated with reference to the established model [7–10]. With their ability to collect massive sensor data and deeply mine the available information, Internet of Things [11] and big data [12] techniques have the potential to achieve intelligent coal washing. Regarding the jig running state, diagnosing faults in time is the key to ensuring safe operation of the system. Wang et al. [13] proposed a data-driven fault diagnosis method based on the AlexNet convolutional neural network (CNN). When the washing equipment is running normally, monitoring its control parameters and adjusting them in a timely manner helps regulate washing quality effectively. Wang et al. [1] demonstrated intelligent control of the separation density in heavy medium separation: based on online scanning of the raw coal ash content, the coal properties are first calculated to predict and optimize process parameters, including the heavy medium separation density, and the optimized circulating medium density is then transmitted to the control system. Wang et al. [14] developed an intelligent analysis system for raw coal float-and-sink test data, high-precision intelligent monitoring instruments for position, liquid level, and ash content, and highly reliable sensor equipment.

However, current methods for intelligent extraction and analysis of real-time data mostly rely on statistical analysis, which has large errors and is difficult to integrate with existing artificial experience. In addition, the rough environment of the coal washing site seriously affects the stability of network communication, resulting in large amounts of missing data, which makes it difficult to meet the needs of real-time analysis and timely return of control strategies. In view of this, this paper embeds existing artificial experience into the real-time machine control process by adopting deep reinforcement learning. When the collected data cannot meet the requirements of the analysis, we build a surrogate model from early accumulated historical data and use our proposed auto-differential evolution algorithm to quickly solve for a feasible operation scheme. While the washing process of the jig is controlled automatically, jig faults are also monitored to ensure washing efficiency.

2. Proposed Methods

2.1. Hardware Infrastructure

The overall architecture of the hardware foundation on which the proposed method relies is shown in Figure 1; it includes three parts: intelligent sensing of state data, intelligent analysis of state data, and generation and transmission of regulation strategies. First, intelligent perception of data requires installing different types of sensors at the critical control points of the jig, realizing real-time collection of cleaned coal ash content, wind pressure, water pressure, hydraulic value, medium coal and gangue gate openings, buoy counterweight, coal gangue bucket lifting amount, and buoy value. These data are collected at a synchronized or similar acquisition frequency, which is convenient for postprocessing. Then, the collected data are gathered in the data server via the OPC protocol, the current jig operational faults are analyzed, and on-site personnel are informed to deal with them in time. Last, regulation models based on deep reinforcement learning and evolutionary computation are constructed to generate regulation strategies for the current jig operation, and the generated regulation strategy is sent back to the control end via the OPC protocol, realizing the automatic operation of the jig. It is worth noting that the regulation strategy generated by this method does not consider the parameter adjustment ranges of the jig in actual production, so a strategy filter must be set at the PLC end to fine-tune the received regulation strategy, ensuring safe operation of the jig. The technical core of this framework is the intelligent analysis of real-time data by the regulation model; the regulation model based on deep reinforcement learning and evolutionary computation is described in detail below.
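As a concrete illustration of the data path described above, the following minimal sketch polls jig sensor tags and writes a regulation strategy back over OPC UA using the python-opcua library. The endpoint URL and node identifiers are hypothetical placeholders, since the paper does not specify them.

```python
# Minimal sketch of the sensing/regulation data path, assuming an OPC UA
# server on the jig PLC; the endpoint and node IDs below are hypothetical.
from opcua import Client

ENDPOINT = "opc.tcp://jig-plc.local:4840"      # placeholder address
SENSOR_NODES = {
    "clean_coal_ash": "ns=2;s=Jig.Ash",
    "air_pressure":   "ns=2;s=Jig.AirPressure",
    "water_pressure": "ns=2;s=Jig.WaterPressure",
}
STRATEGY_NODE = "ns=2;s=Jig.Strategy"           # written back after filtering

def poll_and_regulate(compute_strategy):
    client = Client(ENDPOINT)
    client.connect()
    try:
        # Read one synchronized snapshot of the monitored parameters.
        state = {name: client.get_node(nid).get_value()
                 for name, nid in SENSOR_NODES.items()}
        # The PLC-side strategy filter still clamps this to safe ranges.
        strategy = compute_strategy(state)
        client.get_node(STRATEGY_NODE).set_value(strategy)
    finally:
        client.disconnect()
```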

2.2. Regulation Model

In industrial processes, the safety and reliability of mechanical systems determine product quality, and timely diagnosis of small faults is the key to ensuring safe system operation and suppressing fault deterioration. Given that an important prerequisite for regulating the jig is keeping it under normal operating conditions, the working status of the jig must be monitored in real time; when operating faults occur, the driver is warned to handle them promptly so that regulation performance is maintained. The regulation model of the jig adopts cooperative deep reinforcement learning and evolutionary computation (hereinafter referred to as DEIS), as shown in Figure 2. In this model, assuming the sampling frequency of all jig sensors is set to $f$, a total of $f$ pieces of data are collected within one second, each containing 32 values, including cleaned coal ash, air pressure, water pressure, hydraulic cylinder pressure, medium coal and gangue gate openings, float weight, coal gangue bucket amount, and float value. When communication is good and the aggregated data volume exceeds half of the sampled values, the deep reinforcement learning strategy is adopted to generate the regulation of the jig. At the same time, warning messages for gate overload, scouring, and compaction are determined from the bucket height, the float amplitude, and the gate opening. When communication is blocked and the aggregated data volume is less than or equal to half of the sampled values, the differential evolution algorithm is used to generate the regulation strategy for the jig, and warning information about network communication problems is fed back. Because network communication is very important for intelligent control, a sound-and-light warning informs the jig driver to deal with network problems in time.
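The branching rule above can be summarized in a few lines. The sketch below assumes hypothetical `dqn_strategy` and `aude_strategy` functions standing in for the two generators described in Sections 2.2.1 and 2.2.2.

```python
# Dispatch between the two strategy generators based on data completeness.
# `samples` is the list of 32-value records aggregated in the last second,
# `f` is the sensor sampling frequency; dqn_strategy / aude_strategy are
# placeholders for the methods of Sections 2.2.1 and 2.2.2.
def select_strategy(samples, f, dqn_strategy, aude_strategy, warn):
    if len(samples) > f / 2:         # communication is good
        return dqn_strategy(samples)
    else:                            # too much data missing
        warn("network communication problem")  # sound-and-light warning
        return aude_strategy()       # surrogate-based evolutionary search
```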

2.2.1. Regulation Strategy with Deep Reinforcement Learning

(1) Basic Definitions of DQN.

(1) Definition of State. The core elements of deep reinforcement learning are the state $s$, action $a$, and reward $r$ [15]. Corresponding to the operation process of the jig, we record the key control parameters of the jig at time $t$ as $s_t$, including the throughput frequency, air pressure, water pressure, air valve, the combination of valve openings, hydraulic cylinder, gangue gate, buoy counterweight, coal gangue bucket amount, and buoy value. Overall,

$$s_t = (s_t^1, s_t^2, \ldots, s_t^{10}),$$

where the components correspond in order to the ten parameters above.

(2) Definition of Action. The action corresponds to the single adjustment step of the abovementioned parameters, i.e., of the throughput frequency, air pressure, water pressure, air valve, the combination of valve openings, hydraulic cylinder, gangue gate, buoy counterweight, coal gangue bucket amount, and buoy value. Overall,

$$a_t = (a_t^1, a_t^2, \ldots, a_t^{10}),$$

where each component is the adjustment step applied to the corresponding parameter. For convenience of operation, the adjustment step sizes are set following operating habit, so the elements of $a_t$ are all discrete, and different elements have different step sizes. For example, the adjustment step sizes of the throughput frequency, air pressure, water pressure, air valve, hydraulic cylinder, buoy counterweight, coal gangue bucket amount, and buoy value are 1 Hz, 0.001 MPa, 0.1 MPa, 1%, 0.1 MPa, 50 g, 0.1 m, and 1 cm, respectively (the step sizes of the valve-opening combination and the gangue gate follow the same operating-habit convention). In engineering applications, the machines are usually not allowed to operate frequently or on a large scale. Hence, a constraint is applied to the performed action as follows:

$$\|a_t\|_0 \le \theta,$$

where $\|a_t\|_0$ denotes the number of nonzero elements of $a_t$ and $\theta$ is a threshold meeting the practical requirements. This constraint also reduces the size of the search space and lowers the pressure of training the mapping models.

(3) Definition of Reward. It is worth noting that, due to the inertia of the operation process, the change of the cleaned coal ash within minutes after executing a strategy has higher stability and credibility, that is, it is more accurate. The cleaned coal ash needs to be controlled within a given range $[A^* - \delta, A^* + \delta]$, where $A^*$ is the expected cleaned coal ash and $\delta$ is the tolerance. In the evaluation period, assume that the initial cleaned coal ash is $A_0$ and the adjusted cleaned coal ash after performing $a_t$ is $A_1$; the reward of performing $a_t$ is then given as follows:

$$r = \max\left(0,\; 1 - \frac{|A_1 - A^*|}{|A_0 - A^*|}\right).$$

It is worth noting that if $A_0$ is within $[A^* - \delta, A^* + \delta]$, there is no need to perform an action; hence, whenever an action is performed, $|A_0 - A^*|$ is guaranteed to be larger than 0. The formula means that if the cleaned coal ash is adjusted to $A^*$, $|A_1 - A^*|$ is 0 and $r$ equals 1; if $A_1$ moves away from the expected $A^*$, the term $1 - |A_1 - A^*|/|A_0 - A^*|$ is negative and $r$ is assigned 0. In this way, $r$ is kept within $[0, 1]$. In addition, to motivate the DQN to quickly adopt efficient actions, $r$ is further transformed so that its lower and higher ranges are more clearly separated. This operation reduces the probability of action selection faults caused by reward value calculation errors.
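A minimal sketch of this reward computation follows, directly encoding the normalized-distance form reconstructed above; the transform used to separate the lower and higher reward ranges is not specified in the paper, so it is omitted here.

```python
def reward(ash_initial, ash_adjusted, ash_target):
    """Reward in [0, 1] for one regulation step.

    ash_initial  -- cleaned coal ash A0 before the action (A0 lies outside
                    the tolerance band, so |A0 - A*| > 0 by construction)
    ash_adjusted -- cleaned coal ash A1 observed minutes after the action
    ash_target   -- expected cleaned coal ash A*
    """
    improvement = 1.0 - abs(ash_adjusted - ash_target) / abs(ash_initial - ash_target)
    return max(0.0, improvement)  # negative values (A1 drifted away) clip to 0
```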

(2) Training of DQN. With the generalization ability of deep neural networks, a mapping model between $(s_t, a_t)$ and $r$ can be established. The mapping model is able to recommend the best $a_t$ that maximizes $r$, that is, to regulate the cleaned coal ash as fast as possible. As shown in Figure 3, this paper uses the typical deep reinforcement learning model (Deep Q-learning Network, DQN) to build the mapping relationship between $(s_t, a_t)$ and $r$. The first half of Figure 3 is the training of the DQN, and the second half is its application. The training data come from data collected automatically during jig operation and from drivers’ experience. To transfer the drivers’ experience into formatted training data, the action is determined by experienced drivers, and during the collection period, the machine state $s_t$ and the action $a_t$ are recorded and the reward $r$ is archived. The jig operation data are continually collected before the DQN is applied. In this way, once an operation is conducted by the jig driver, a piece of operation data $(s_t, a_t, r)$, namely, one training sample, is collected. When a certain amount of training data has been collected, the DQN is trained and a simulator is used to evaluate the performance of the trained DQN. The effectiveness of the initially trained DQN is tested in two respects: (1) whether an effective action can be recommended; (2) whether the actual reward is close to the expected reward. Considering that a recommended action is not safe to perform directly on the jig, the effectiveness of the DQN is judged manually. While the trained DQN is not yet sufficient as a controller, training data continue to be recorded and the DQN is constantly updated. Once the trained DQN can recommend effective actions and predict relatively correct rewards, it is deployed. To reduce the error of verifying the effectiveness of the DQN, a number of overall tests are conducted; if the success rate meets the requirements, the DQN is applied. Although the DQN is applied as the controller, an emergency measure is still provided: after repeated regulation, if the jig does not meet the expected requirements, it exits automatic control with an alarm and is controlled manually.

Considering that fully connected neural networks are capable of learning complex patterns and relationships in data, the neural network implementing the DQN is fully connected. In the training data, many pieces of data are identical because they share the same initial machine states; therefore, the samples with the biggest differences are selected from the recorded operation dataset and normalized for the specific training process. Although equation (2) shows that $s_t$ contains only 10 elements, some elements, such as the valve-opening combination, contain four members from different air chambers, so $s_t$ actually contains 32 elements, as mentioned in Section 2.2. Likewise, $a_t$ also contains 32 elements, and the input of the trained neural network, i.e., $(s_t, a_t)$, has 64 dimensions. This means the neural network should have deep layers and a large number of neurons. In this paper, the numbers of neurons in the hidden layers are [64, 32, 16, 8, 4], respectively. Regarding the learning rate: when it is too high, the model may oscillate or diverge during training and fail to converge to the optimal solution, skipping the optimum or wandering around it; when it is too low, training becomes very slow and may require many more iterations to converge. So, our training process uses two stages with higher and lower learning rates, respectively: 0.01 in the first stage and 0.001 in the second.
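Under these stated dimensions (a 64-dimensional $(s_t, a_t)$ input, hidden layers of [64, 32, 16, 8, 4] neurons, and a scalar reward output), the value network can be sketched as below. PyTorch is an assumption, as the paper does not name a framework.

```python
import torch
import torch.nn as nn

class JigQNetwork(nn.Module):
    """Maps a concatenated (state, action) pair (32 + 32 dims) to a
    predicted reward, with hidden layers [64, 32, 16, 8, 4] as stated in
    the paper. Sigmoid hidden activations and a linear output layer
    follow the description in Section 2.2.1."""
    def __init__(self):
        super().__init__()
        sizes = [64, 64, 32, 16, 8, 4, 1]
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(nn.Linear(sizes[i], sizes[i + 1]))
            if i < len(sizes) - 2:           # hidden layers use sigmoid,
                layers.append(nn.Sigmoid())  # the output layer stays linear
        self.net = nn.Sequential(*layers)

    def forward(self, state_action):         # state_action: (batch, 64)
        return self.net(state_action)
```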

When the input of the sigmoid function approaches positive or negative infinity, the output approaches 0 or 1, which helps improve the stability and convergence of the neural network. Hence, the hidden layers are connected by the sigmoid function, and the activation function of the output layer is linear. Predicted values of the training samples are then obtained by forward propagation $y^{(l)} = \sigma(W^{(l)} x^{(l-1)} + b^{(l)})$, where $\sigma(z) = 1/(1 + e^{-z})$, $x^{(l-1)}$ is the input of layer $l$ from layer $l-1$, and $W^{(l)}$ and $b^{(l)}$ are the weights and bias of layer $l$. The prediction error $E(\theta) = \frac{1}{2N}\sum_{i=1}^{N}(\hat{r}_i - r_i)^2$ is calculated, and the weights and biases of the network are adjusted along the negative gradient direction of the prediction error, where $\theta$ is the union of the weights $W$ and biases $b$, $\hat{r}_i$ is the output of the DQN, and $N$ is the number of training samples. When the error requirements are met, training of the DQN stops.
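The two-stage gradient descent described above can be sketched as follows, continuing the hypothetical PyTorch setup; the epoch counts and the error tolerance are illustrative assumptions.

```python
import torch.nn as nn
import torch.optim as optim

def train_two_stage(model, inputs, targets, epochs=(500, 500), tol=1e-3):
    """Mean-squared-error training with the paper's two learning-rate
    stages (0.01 then 0.001); epoch counts and tolerance are assumptions."""
    loss_fn = nn.MSELoss()
    for lr, n_epochs in zip((0.01, 0.001), epochs):
        optimizer = optim.SGD(model.parameters(), lr=lr)
        for _ in range(n_epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()           # negative-gradient adjustment
            optimizer.step()
            if loss.item() < tol:     # stop when error requirement is met
                return loss.item()
    return loss_fn(model(inputs), targets).item()
```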

(3) Application of DQN. After training, the running state $s_t$ of the jig at time $t$ is combined with each candidate action as the input of the DQN to obtain the predicted reward of the different regulation actions. Over all candidate actions $a$, the control action with the largest predicted value is $a^* = \arg\max_a \hat{r}(s_t, a)$; $a^*$ is then the subsequent regulation strategy for the jig. In order to balance exploration and exploitation, the epsilon-greedy strategy is employed, which strikes a balance between exploring new options (with probability $\epsilon$) and exploiting the best-known option (with probability $1 - \epsilon$):

$$a_t = \begin{cases} \text{random feasible action}, & \text{with probability } \epsilon, \\ \arg\max_a \hat{r}(s_t, a), & \text{with probability } 1 - \epsilon, \end{cases}$$

where the random regulatory strategy must respect the parameter constraints for safe use of the jig. Furthermore, to avoid significant adjustment of the jig parameters, the parameter nearest to its current setting is adjusted first as the selected action: only one parameter is adjusted at a time, and the other parameters are temporarily left unchanged. If the previous adjustment is not effective, the other parameters are adjusted in turn.
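A sketch of this selection rule, including the one-parameter-at-a-time restriction, is shown below, continuing the JigQNetwork sketch; the enumeration of candidate actions and their safety filtering are simplified assumptions.

```python
import random
import torch

def choose_action(model, state, candidate_actions, epsilon=0.1):
    """Epsilon-greedy selection over single-parameter adjustment actions.

    candidate_actions -- list of 32-dim action tensors, each with exactly
                         one nonzero step (the ||a||_0 <= theta constraint
                         with theta = 1); all are assumed safety-filtered.
    """
    if random.random() < epsilon:
        return random.choice(candidate_actions)       # explore
    # Exploit: pick the action with the largest predicted reward.
    scores = [model(torch.cat([state, a]).unsqueeze(0)).item()
              for a in candidate_actions]
    return candidate_actions[scores.index(max(scores))]
```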

2.2.2. Regulatory Strategies with Auto-Differential Evolution

The generation of the jig regulation strategy is essentially the solution of a high-dimensional optimization problem. Unlike deep reinforcement learning, which generates regulatory strategies from known jig states $s_t$, evolutionary computation requires only a surrogate model for evaluating candidate solutions and the parameter boundaries. Evolutionary algorithms, such as differential evolution (DE) [16–19], are effective for solving high-dimensional optimization problems. However, to avoid large adjustments of equipment parameters, DE may only locally fine-tune the control parameters. Moreover, considering the timeliness requirement of online processes, DE needs to converge rapidly. In view of this, this paper proposes auto-differential evolution (Au-DE) to improve the autonomous learning ability of the control parameters and accelerate search convergence. The basic operation of Au-DE is to randomly generate $NP$ individuals and, in each generation, apply the mutation, crossover, and selection operators to the population until the last generation $G_{\max}$ is reached and the evolutionary process ends. Assuming the optimization problem is a constrained minimization problem with $D$ dimensions, the optimization objective is $\min f(\mathbf{x})$, where $f$ is the objective function. A solution is expressed as

$$\mathbf{x} = (x^1, x^2, \ldots, x^D).$$

Then, the population individual (candidate solution) can be further expressed as

$$\mathbf{x}_{i,G} = (x_{i,G}^1, x_{i,G}^2, \ldots, x_{i,G}^D),$$

where $i = 1, 2, \ldots, NP$, $G$ is the current generation number, and $G_{\max}$ is the maximum generation number.

For the individual $\mathbf{x}_{i,G}$ in generation $G$, the corresponding mutation operator is

$$\mathbf{v}_{i,G} = \mathbf{x}_{r_1,G} + F_i \cdot (\mathbf{x}_{r_2,G} - \mathbf{x}_{r_3,G}),$$

where $r_1 \ne r_2 \ne r_3 \ne i$ are random population indices and $F_i \in (0, 1]$ is a scaling factor controlling the difference vector $(\mathbf{x}_{r_2,G} - \mathbf{x}_{r_3,G})$.

If $v_{i,G}^j$ exceeds the boundary, it is repaired by

$$v_{i,G}^j = \begin{cases} x_{\min}^j, & v_{i,G}^j < x_{\min}^j, \\ x_{\max}^j, & v_{i,G}^j > x_{\max}^j, \end{cases}$$

where $x_{\min}^j$ and $x_{\max}^j$ are the lower and upper boundaries on the $j$th dimension, respectively.

The goal of the crossover operator is to produce intermediate solutions:

$$u_{i,G}^j = \begin{cases} v_{i,G}^j, & \text{if } \mathrm{rand}(0,1) \le CR_i \text{ or } j = j_{\mathrm{rand}}, \\ x_{i,G}^j, & \text{otherwise}, \end{cases}$$

where $CR_i$ is the crossover probability and $j_{\mathrm{rand}}$ is a randomly chosen dimension ensuring that at least one component is inherited from the mutant vector.

The selection operator is used to select the individual with better fitness between the parent and child individuals by

$$\mathbf{x}_{i,G+1} = \begin{cases} \mathbf{u}_{i,G}, & \text{if } f(\mathbf{u}_{i,G}) \le f(\mathbf{x}_{i,G}), \\ \mathbf{x}_{i,G}, & \text{otherwise}. \end{cases}$$

In the original DE, $F$ and $CR$ are randomly generated within [0, 1]. Although this gives strong global search ability, it is not suitable for quickly finding optimal solutions and returning regulatory strategies in time. In our proposed Au-DE, the successful $F_i$ and $CR_i$ that helped a population individual in generation $G$ find a better child solution are recorded and reused. In this mode, $F_i$ is generated by

$$F_i = \mathrm{randc}(M_{F,r_i}, 0.1),$$

where $M_F$ is the memory of successful $F$ values, $r_i \in \{1, 2, \ldots, H\}$, $H$ is the total number of memory entries, and $\mathrm{randc}$ is the Cauchy distribution function. If $F_i > 1$, $F_i = 1$; if $F_i \le 0$, $F_i$ is regenerated by repeatedly executing equation (9) until it meets the requirements. $M_F$ is initialized to 0.5. From the second generation, $M_F$ is updated by

$$M_{F,k,G+1} = \begin{cases} \mathrm{mean}_{WL}(S_F), & \text{if } S_F \ne \emptyset, \\ M_{F,k,G}, & \text{otherwise}, \end{cases}$$

where $k$ is the memory site, initialized to 1, $S_F$ is the set of successful $F$ values, and $\mathrm{mean}_{WL}(S_F)$ is the weighted (Lehmer) mean of $S_F$, computed by

$$\mathrm{mean}_{WL}(S_F) = \frac{\sum_{k=1}^{|S_F|} w_k S_{F,k}^2}{\sum_{k=1}^{|S_F|} w_k S_{F,k}},$$

with weights $w_k$ proportional to the fitness improvement achieved by the corresponding successful individual.

In general, once new memories are updated, $k = k + 1$; if $k > H$, $k$ is reset to 1. If no new memories are provided, the memory of the current entry is not updated. Similarly, $CR_i$ is generated by

$$CR_i = \mathrm{randn}(M_{CR,r_i}, 0.1),$$

where $M_{CR}$ is the memory of successful $CR$ values, $r_i \in \{1, 2, \ldots, H\}$, $H$ is the total number of memory entries, and $\mathrm{randn}$ is the normal distribution function. $M_{CR}$ is initialized to 0.5 and, from the second generation, is updated by

$$M_{CR,k,G+1} = \begin{cases} \mathrm{mean}_{WA}(S_{CR}), & \text{if } S_{CR} \ne \emptyset, \\ M_{CR,k,G}, & \text{otherwise}, \end{cases}$$

where $\mathrm{mean}_{WA}(S_{CR})$ is the weighted arithmetic mean of the set $S_{CR}$ of successful $CR$ values.
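The following compact sketch shows the Au-DE loop as reconstructed above (mutation, crossover, selection, and the success-history adaptation of $F$ and $CR$); since the paper's exact formulas were not fully recoverable, this follows the standard success-history scheme the text describes, with illustrative default parameters.

```python
import numpy as np

def au_de(obj, lower, upper, np_=30, gmax=100, H=5, rng=None):
    """Success-history adaptive DE sketch (Au-DE as reconstructed above).
    obj: objective to minimize; lower/upper: per-dimension bound arrays."""
    rng = rng or np.random.default_rng()
    D = len(lower)
    pop = rng.uniform(lower, upper, (np_, D))
    fit = np.array([obj(x) for x in pop])
    MF, MCR, k = np.full(H, 0.5), np.full(H, 0.5), 0
    for _ in range(gmax):
        SF, SCR, improv = [], [], []
        for i in range(np_):
            r = rng.integers(H)
            F = 0.0
            while F <= 0.0:                   # resample until F > 0
                F = min(1.0, MF[r] + 0.1 * rng.standard_cauchy())
            CR = float(np.clip(rng.normal(MCR[r], 0.1), 0.0, 1.0))
            a, b, c = rng.choice([j for j in range(np_) if j != i],
                                 3, replace=False)
            v = np.clip(pop[a] + F * (pop[b] - pop[c]), lower, upper)
            cross = rng.random(D) <= CR
            cross[rng.integers(D)] = True     # guarantee one mutant gene
            u = np.where(cross, v, pop[i])
            fu = obj(u)
            if fu <= fit[i]:                  # selection
                if fu < fit[i]:               # record successful F, CR
                    SF.append(F); SCR.append(CR); improv.append(fit[i] - fu)
                pop[i], fit[i] = u, fu
        if SF:                                # update success memories
            w = np.array(improv) / sum(improv)
            SF, SCR = np.array(SF), np.array(SCR)
            MF[k] = (w * SF**2).sum() / (w * SF).sum()  # weighted Lehmer mean
            MCR[k] = (w * SCR).sum()                    # weighted arithmetic mean
            k = (k + 1) % H
    return pop[fit.argmin()], fit.min()
```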

Since Au-DE can only solve minimization or maximization problems and the cleaned coal ash needs to be controlled within the given range $[A^* - \delta, A^* + \delta]$, the optimization objective is modified to $\min f(\mathbf{x}) = |\hat{A}(\mathbf{x}) - A^*|$, where $\hat{A}(\mathbf{x})$ is the cleaned coal ash predicted for the parameter set $\mathbf{x}$ at time $t$. This paper organizes the jig operation data and each piece of manual experience as a matrix, where each row corresponds to a group of jig operation parameters (input) and a cleaned coal ash value (output), respectively, and the number of rows is the number of entries in the data. The mapping model with a BP neural network is shown in Figure 4, where the hidden layers are connected by the sigmoid function and the activation function of the output layer is linear. During the training process, the weights $W$ and biases $b$, the maximum number of training iterations, the hidden layers and their numbers of neurons, and the network learning rate are first initialized. Then, for the training samples, predicted values are obtained by forward propagation $y^{(l)} = \sigma(W^{(l)} x^{(l-1)} + b^{(l)})$. The prediction error $E = \frac{1}{2N}\sum_{i=1}^{N}(\hat{A}_i - A_i)^2$ is calculated, and the weights and biases are adjusted along its negative gradient direction, where $x^{(l-1)}$ is the input of layer $l$ from layer $l-1$, $W^{(l)}$ and $b^{(l)}$ represent the weights and biases of layer $l$, $\hat{A}_i$ is the output of the BP neural network, $N$ is the total number of samples participating in the training, and $A_i$ is the actual performance value of the strategy. Finally, the training of the BP neural network stops when the error requirements are met or the maximum number of training iterations is reached. Assuming that the parameter adjustment step size is $\Delta$ and the current parameter set is $\mathbf{x}_0$, the parameter search range is set to $[\mathbf{x}_0 - \eta\Delta, \mathbf{x}_0 + \eta\Delta]$, where $\eta$ is the control parameter for search accuracy.
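To make the surrogate-plus-search idea concrete, the sketch below trains a small BP-style regressor on historical (parameters, ash) pairs and hands the resulting objective to the `au_de` routine above; the use of scikit-learn's MLPRegressor and its hyperparameters are illustrative assumptions standing in for the paper's BP network.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_objective(X_hist, ash_hist, ash_target):
    """Fit a BP-style surrogate on historical jig data and return the
    Au-DE objective |predicted ash - A*|; MLPRegressor is an assumption
    (sigmoid hidden layers, linear output, as described in the paper)."""
    surrogate = MLPRegressor(hidden_layer_sizes=(64, 32),
                             activation="logistic",   # sigmoid hidden layers
                             learning_rate_init=0.01,
                             max_iter=2000).fit(X_hist, ash_hist)
    return lambda x: abs(surrogate.predict(x.reshape(1, -1))[0] - ash_target)

# Local search range [x0 - eta*step, x0 + eta*step] around the current
# parameter set, with eta = 3 as in Section 3.1.
def search_bounds(x0, step, eta=3):
    return x0 - eta * step, x0 + eta * step
```

A typical call chain would be `obj = build_objective(X_hist, ash_hist, 17.0)` followed by `au_de(obj, *search_bounds(x0, step))`, which keeps the evolutionary search local to the current operating point, as required to avoid large parameter adjustments.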

3. Results

In this paper, taking a washing workshop as an example, we set a test running time of one month to verify the effectiveness of the proposed method. In Section 3.1, the parameter settings of the DEIS system, including the initial parameters, warning threshold parameters, and the feasible intervals, are given. In Section 3.2, the accuracy of jig fault warning in the DEIS system is given. In Section 3.3, the average response time of the cleaned coal ash, from failing to satisfy to satisfying the production demand under the automatic control of the DEIS system, is given. The qualified rate of cleaned coal ash in the DEIS system is given in Section 3.4.

3.1. System Parameter Setting

In this paper, the empirical threshold parameters for the key module warnings are set as follows: the height of the coal in the bucket is less than 0.39 m; the water tank liquid level is above 4.2 m; the throughput frequency is controlled between 23 and 30 Hz; the optimal bed looseness is controlled between 0.4 and 0.45; the buoy height is set between 7 and 9; the hydraulic cylinder pressure is greater than 1.6 MPa; the wind pressure is set between 0.036 and 0.042 MPa; the air valve intake period ratio is 23%–32%, the exhaust period ratio 17%–25%, and the expansion period ratio 50%–60%; the critical value of the gangue valve opening is 22, that of the medium coal valve opening is 18, and that of the valve opening amplitude is 3. Due to randomness in the working process of the jig and errors in the sensor data, judging whether a module is abnormal also requires considering the duration of the fault. The default duration thresholds of the module parameters are set as follows: the lowest point of the buoy is higher than the set value for 0.8 seconds; the buoy stops beating for 16 seconds; the bucket load exceeds the limit for 18 seconds; the gate stays open for 0.8 seconds; the wind pressure is lower than the preset value for 16 seconds; the hydraulic value is lower than the default setting for 18 seconds; the water tank pressure value is lower than the default setting for 0.8 minutes; the duration of abnormal looseness is 18 minutes; the buoy height exceeds its upper limit for 0.8 seconds; and the cleaned coal ash exceeds its upper limit for 10 minutes. In addition, considering that the coal height in the bucket is obtained by machine vision and the coal shape is usually irregular, the accuracy of the computed coal height is significantly affected; therefore, the bucket amount is considered abnormal only when the proportion of abnormal buckets exceeds 0.8 over multiple bucket liftings.
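A duration-based warning check of this kind can be sketched as follows; the helper class and the example tag are hypothetical, illustrating only the "threshold exceeded continuously for longer than T seconds" rule described above.

```python
import time

class DurationAlarm:
    """Raise a warning only if a condition holds continuously for at least
    `hold_s` seconds, suppressing transient sensor noise (a hypothetical
    helper matching the duration thresholds listed above)."""
    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.since = None          # time the condition first became true

    def update(self, condition):
        now = time.monotonic()
        if not condition:
            self.since = None
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold_s

# Example: wind pressure below the preset value for 16 seconds.
wind_alarm = DurationAlarm(hold_s=16)
# if wind_alarm.update(wind_pressure < 0.036): warn("wind pressure low")
```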

In addition, the intake period ratios of the air chambers are initialized at 24%, 19%, 19%, 18%, 16%, 16%, and 21%, respectively, and the throughput frequency is initialized at 27 Hz. All parameters are passed to the background data analysis model through the software interface. The adjustment step lengths of the throughput frequency, buoy, intake (exhaust) and expansion period ratios, and valve opening are set to the single-step sizes given in Section 2.2.1, and the search accuracy control parameter $\eta$ is set to 3.

3.2. The Warning Accuracy of DEIS

At present, the DEIS system mainly considers the overlimit warning, bed overturning warning, stuck buoy warning, stuck gate warning, and stockpile warning. During the test, the warning information displayed by the DEIS system was verified manually. According to the proportions of warning types over the test duration given in Figure 5, the system warnings in this period mainly included the overlimit, bed overturning, stuck buoy, and stuck gate warnings. Stuck buoy and stuck gate were the two main types, together exceeding 50% of all warning information. An error warning refers to a warning issued by the DEIS system for which no abnormality was found in actual verification. Among all the warning results, error warnings accounted for 0.82%. Error warnings were mainly caused by two factors: (1) the data returned by the sensor had obvious errors, which affected the accuracy of the data analysis; (2) after a warning was issued, the jig returned to normal, making the verification result inconsistent.

3.3. Response Duration of the DEIS

To test the regulation ability of the system, we set the ideal interval of cleaned coal ash to [16, 18]. After a regulation strategy is implemented, the duration from the moment the cleaned coal ash fails to meet the requirement to the moment it becomes stable in the ideal interval [16, 18] is the response time. On each test day, the response durations of three regulatory strategies were randomly recorded and their mean calculated. It is worth noting that the recorded response times only cover jig regulation under normal working conditions; other special cases, such as insufficient coal quantity and jig faults, are outside the statistical range. As can be seen from the response durations given in Figure 6, the average response duration after implementing the regulatory strategies is no more than 40 minutes, and the specific response time is affected by the following main factors:

(1) Under the influence of actual working conditions, the change trend of the cleaned coal ash obtained by jig washing has different inertia. When the inertia is large, the adopted control strategy needs a long time to bring the cleaned coal ash into the ideal range. Inertia here means that, under the joint influence of multiple jig parameters, the cleaned coal ash tends to keep increasing, keep decreasing, or resist change.

(2) The randomness of deep reinforcement learning and evolutionary computation makes the regulation strategy non-unique. Although different control strategies all adjust the cleaned coal ash, their control speeds differ markedly, that is, their response times differ.

(3) The quality of communication affects the speed of receiving and transmitting real-time data, and especially the system's ability to perceive trend changes of the cleaned coal ash in time. When the communication quality is good, the system can quickly sense changes in the cleaned coal ash and respond promptly; otherwise, a longer, multistage regulatory strategy is needed.

3.4. Qualified Rate of Cleaned Coal Ash

During the test cycle, the measured value of cleaned coal ash was recorded every day, the duration for which the cleaned coal ash stayed within [16, 18] and the working duration of the jig were counted, and the qualified rate (percentage) of cleaned coal ash was characterized as the ratio of the former to the latter, as shown in Figure 7. In the test period, the qualified rate of cleaned coal ash was between 32% and 64%, with an average of about 47%. On the whole, the qualified rate shows large uncertainty and a wide range of variation, mainly for the following reasons:

(1) After a control strategy is implemented, the cleaned coal ash may not reach the expected range within the expected time, which is related to the operating inertia of the jig and is highly random. As a result, even with a favorable trend, a long period of regulation may still fail to bring the cleaned coal ash within the requirements.

(2) The jig has been used for a long time, leading to wear or misalignment of some components, so the effect of each control action is lower than expected and continuous regulation is needed to achieve the expected result, lowering the qualified rate of cleaned coal ash.

(3) Failures of the sampling devices installed on the jig affect its normal operation. For example, the coal sampler may become stuck by coal, forcing the washing equipment to shut down, which affects the washing result.

4. Discussion

Aiming at the problems of fault monitoring and intelligent adjustment of the production state of the jig in the coal washing process, an intelligent washing method guided by deep reinforcement learning and evolutionary computation is proposed in this paper. Over the test period, the proposed method showed a short strategy response time and a strong ability to adjust washing quality. At the same time, to reduce the influence of jig faults on washing efficiency, the equipment fault warning included in the proposed method accurately identifies faults of the key modules and notifies workers to deal with them in time, which improves washing efficiency to a certain extent. Although the strategy response time and the qualified rate of cleaned coal ash still need to be improved, the method has reference significance for the construction of intelligent washing, especially of expert systems. Because there are many uncertain factors in the actual production process, advanced data interpolation techniques should be adopted to handle the incomplete data received when the communication environment is poor, and the potential application value of the data should be explored further.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (grant no. 62303465) and Natural Science Foundation of Shandong Province (grant no. ZR2022LZH017).