Data-Driven Approximated Optimal Control of Sulfur Flotation Process
The sulfur flotation process is a typical industrial process with complex dynamics. For a sulfur flotation cell, the structure of the system model can be derived from first principles and reaction kinetics. However, the model parameters cannot be obtained under certain working conditions. In this paper, using adaptive dynamic programming (ADP), we establish a data-driven optimal control approach for the operation of a sulfur flotation cell that does not require knowledge of the model parameters. By learning from online production data, an initial admissible control policy iteratively converges to an approximated optimal control law, and the dependence of the optimal control design on full model knowledge is eliminated. A simulation environment for the sulfur flotation process is constructed based on a phenomenological model and industrial data. Some practical problems in the implementation of ADP, e.g., the selection of basis functions and how to use the model's structural information in the ADP-based control design, are investigated. The feasibility and performance of the proposed data-driven optimal control are tested in the simulation environment. The results indicate the potential of applying bioinspired control methods to flotation processes.
Sulfur is an important element with wide applications in the pharmaceutical, chemical, light, and food industries, to name a few. In most cases, sulfur exists in the form of sulfide or sulfate minerals. A sulfur flotation process is therefore usually integrated into mineral processing and hydrometallurgy plants to recover valuable sulfur from the mineral residue as a secondary product. Moreover, by using sulfur flotation, more advanced and economical technologies, e.g., direct leaching, can be applied in metallurgy plants. The grade and recovery are the two main technical indexes that define the overall performance of a sulfur flotation process. The operation objective of a sulfur flotation process is to recover as much high-grade sulfur as possible from the mineral residue by finding the optimal combination of the manipulated variables in real time.
Sulfur flotation is a typical three-phase (solid-liquid-gas) process. Generally, the operation of this type of process is nontrivial due to its complex, nonlinear dynamics, which are the combined outcome of intricate interactions between the reactants, inlet conditions, manipulated variables, and disturbances. Therefore, the economical operation of flotation processes has attracted interest from both the mineral processing and control communities [1–3].
Flotation modeling approaches can be categorized into first-principles modeling and soft-computing-based modeling. Lynch et al. and Gorain et al. developed empirical models for entire flotation plants and for subprocesses within a flotation system. Yianatos proposed a first-order kinetic flotation model. Koh and Schwarz studied the probabilistic modeling of the microprocesses in flotation, e.g., the adhesion of particles to bubbles in the pulp. The flotation rate constant in such models varies with feeding and working conditions; precisely determining it therefore remains an active research topic. Bascur proposed a phenomenological model that considers both the particle population balance and the hydraulic behavior. The phenomenological model combines the population balance, the transfer of particles between phases, and the liquid balance, and is able to simulate the effects of manipulated variables (e.g., tailing and air flow rates) on a flotation process.
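The first-order kinetic model mentioned above can be made concrete with a short sketch. The Python snippet below is illustrative only: the function name, `r_inf`, `k`, and all parameter values are assumptions, not plant data. It evaluates the cumulative recovery R(t) = r_inf·(1 − e^(−k·t)).

```python
import math

def first_order_recovery(t, r_inf, k):
    """Cumulative recovery under a first-order kinetic flotation model.

    r_inf : ultimate (maximum attainable) recovery, between 0 and 1
    k     : flotation rate constant (1/min); as noted in the text, in
            practice it varies with feed and working conditions.
    """
    return r_inf * (1.0 - math.exp(-k * t))

# Recovery approaches r_inf as flotation time grows (values illustrative)
for t in (1.0, 5.0, 20.0):
    print(t, round(first_order_recovery(t, 0.9, 0.5), 3))
```

The rate constant `k` is the quantity whose dependence on working conditions motivates the data-driven approach taken later in the paper.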
In flotation plants, a flotation process is operated in a "setting and control" manner. Basic control loops maintain the primary variables at their set-points, which is achieved by conventional PID control. On a higher level, machine-vision-based control systems, model-based control methods, e.g., Model Predictive Control (MPC), adaptive control, expert systems, neural networks, fuzzy logic, and their commercial software implementations have been developed and applied. The performance of these model-based controllers relies on model quality. Although the structure of a mathematical model can be obtained by applying physical/chemical principles and reaction kinetics, the model parameters vary with the working conditions and are not available under certain conditions. Robust control approaches are the natural choice for working conditions with modeling uncertainty. However, in some cases, the conservatism of robust control keeps the derived controller far from optimal, which in turn leads to economic losses.
In recent years, bioinspired control approaches have spread widely. Learning is the defining feature of bioinspired control methods: the optimal or nearly optimal controller is obtained by learning from "excitation-response" interactions with the environment, of which "trial and error" is a typical example. Reinforcement learning (RL) is a bioinspired machine learning method, originally inspired by observations of learning behavior in mammals. RL represents how an agent/animal interacts with an unknown environment and modifies its actions to maximize its reward. Adaptive dynamic programming is an integration of adaptive control, dynamic programming, and reinforcement learning [14, 15]. By using an "actor-critic" structure, it can approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation online without full knowledge of the system model, or even when the system model is unknown [16, 17]. Thus, if an initial admissible control policy is available, an approximated optimal control can be obtained by solving the HJB equation iteratively. Industrial applications of ADP include oil production, coal gasification, and energy scheduling, to name a few. In recent years, learning-based control has also been applied to the operation and control of froth flotation processes. In [21, 22], a dual-rate and a two-layer control strategy, based on Q-learning and off-policy reinforcement learning respectively, were proposed and tested via experimental study.
The aim of this study is to design an alternative optimal control approach for a sulfur flotation cell when the model parameters are unknown and the manipulated variables are constrained. The rest of this paper is organized as follows. In Section 2, the classical flotation kinetic model developed by Bascur is introduced; it is used in the analysis of the optimal control problem in Section 3 and adopted to simulate the real dynamics of a flotation cell in Section 5. In Section 4, an optimal control based on ADP is designed to iteratively improve the performance of an initial admissible controller by learning from the online operational data generated by the simulation environment. In Section 5, the performance of the proposed optimal control is compared with that of a traditional PI controller. The simulation results, advantages, shortcomings, and future extensions of the proposed control are discussed in Section 6.
2. Process Description
Typically, a sulfur flotation process is composed of three stages with specific functions, namely, roughing, scavenging, and cleaning (Figure 1). The roughing stage performs the initial separation of sulfur from gangue; recovery rate is its main concern. The tailings of the roughing stage are delivered to the scavenging stage to recover the remaining sulfur. The tailings of the scavenging stage are also the final tailings of the entire flotation process. The froth of the roughing stage overflows to the cleaning stage, while the scavenging froth and the cleaning tailings are returned to the roughing stage. The overflow of the cleaning stage is the final product, i.e., the sulfur concentrate. The three stages work collaboratively to maximize the sulfur recovery and concentrate grade and to minimize the grade of the final tailings.
The economical production of a flotation process relies on the optimal control of the single flotation cell, which is the basic flotation unit. As shown in Figure 2, the operation of sulfur flotation in a flotation cell is based on the natural hydrophobicity of sulfur. First, slurry containing both valuable sulfur and base gangue is fed into the flotation cell. Then, by blowing air into the flotation cell, the sulfur particles in the slurry are attached to the bubbles. The bubbles float upward and carry the particles to the froth layer. Finally, the sulfur particles are collected in the concentrate launder via the natural overflow of the froth. Focusing on a single flotation cell, the rest of this section presents the model structure, which is then used along with the flotation mechanism to analyze the optimal control problem.
2.1. Flotation Behavior Analysis
The physical activities in a sulfur flotation cell involve the movement of solid particles driven by the flow of air and liquid. As illustrated in Figures 2 and 3, considering the different roles of pulp and froth in determining the cell dynamics, the flotation cell is divided into two sections: the froth volume and the pulp volume. Each volume is then subdivided into the liquid and air phases with which the mineral particles are associated. The states of particles in a flotation cell mainly include the following:
(i) free in the pulp: particles in the liquid phase of the pulp volume
(ii) attached in the pulp: particles attached to the bubbles in the pulp volume
(iii) free in the froth: particles in the liquid phase of the froth volume
(iv) attached in the froth: particles attached to the bubbles of the froth volume
and the movements of particles mainly include the following:
(i) free particles in the pulp attach to the bubbles in the pulp volume, with a given attachment rate constant
(ii) particles associated with bubbles detach and enter the liquid phase of the pulp volume, with a given detachment rate constant
(iii) free particles in the froth layer attach to the bubbles in the froth volume, with a given attachment rate constant
(iv) particles associated with bubbles detach and enter the liquid phase of the froth volume, with a given detachment rate constant
(v) the particles in the input slurry (with a given liquid volume flow rate and number of particles of each size and mineralogical species per unit volume) feed into the flotation cell
(vi) particles drain from the liquid phase of the froth volume (with a given flow rate and number of particles of each size and mineralogical species per unit volume of liquid in the froth volume, and a constant indicating the segregation mechanism in the draining)
(vii) particles are entrained by the liquid flow from the pulp volume to the froth volume (with a given flow rate)
(viii) particles attached to the bubbles in the pulp volume are transported to the froth volume (with a given air flow rate)
(ix) the particles in the pulp leave through the tailing port (with a given flow rate and number of particles of each size and mineralogical species per unit volume of liquid in the pulp)
(x) particles attached to the bubbles in the pulp volume exit to the tailings (with a given aeration rate)
(xi) the particles free in the froth overflow to the concentrate launder (with a given flow rate)
(xii) the particles attached to the bubbles in the froth overflow to the concentrate launder (with a given flow rate)
2.2. Phenomenological Model of a Flotation Cell
Based on the above simplifications, considering the hydraulic behavior, and applying the following assumptions:
Assumption 1. Each phase in the pulp and froth volume of the flotation cell is perfectly mixed.
Assumption 2. The particle size and shape are uniform.
then a process model, which combines the particle population balances in the liquid and air phases for the pulp and froth volumes, is built to describe the dynamics of a flotation cell:
where (1a) and (1b) present the mass balance for each mineralogical species, while (1c) and (1d) describe the liquid volume balance in the pulp and froth volumes. The model (1a)–(1e) is a combination of the froth flotation models proposed in [10, 24]: it keeps the phenomenological behavior of the former while approximating the dynamics of the pulp level as in the latter. The physical meanings of the model parameters and of the variables in Figure 3 are listed in Table 1. In Section 5, this model is applied to simulate the real flotation cell.
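Although the full model (1a)–(1e) tracks particle populations per size and species, its two-compartment structure can be illustrated with a much simpler sketch. In the Python snippet below, all symbols and parameter values are illustrative assumptions, not the identified plant model: a lumped pulp/froth particle balance with attachment, drainage, tailing, and concentrate overflow is integrated to steady state, where the overall mass balance must close.

```python
# Illustrative two-compartment (pulp/froth) particle balance -- all parameter
# values and symbols are assumptions for this sketch, not identified plant data.
V_p, V_f = 10.0, 2.0             # pulp and froth volumes (m^3)
Q_in, Q_t, Q_c = 1.0, 0.8, 0.2   # feed, tailing, concentrate flow rates (m^3/min)
k_a, k_d = 0.6, 0.2              # attachment (pulp->froth) and drainage rates (1/min)
c_in = 5.0                       # particle concentration in the feed

def step(c_p, c_f, dt=0.01):
    """One Euler step of the lumped pulp/froth concentration balance."""
    dcp = (Q_in * c_in - Q_t * c_p - k_a * V_p * c_p + k_d * V_f * c_f) / V_p
    dcf = (k_a * V_p * c_p - k_d * V_f * c_f - Q_c * c_f) / V_f
    return c_p + dt * dcp, c_f + dt * dcf

c_p, c_f = 0.0, 0.0
for _ in range(50_000):          # integrate to (near) steady state
    c_p, c_f = step(c_p, c_f)

# At steady state the mass balance closes: feed flux = tailing flux + concentrate flux
print(Q_in * c_in, Q_t * c_p + Q_c * c_f)
```

Adding the two steady-state balances shows why the check must hold: the internal attachment/drainage fluxes cancel, leaving only feed, tailing, and concentrate terms.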
3. Optimal Control Problem Analysis
To begin the optimal control problem analysis, the model is first reformulated to investigate how the manipulated variables affect the flotation performance.
As indicated in , and can be formulated as follows, where is the thickness of the film surrounding the bubble, is the mean bubble diameter in the pulp region, is the average air holdup in the plateau border of the froth volume, and and are two variables that change with the working conditions.
Combining (1a)–(1e), (5), and (6), and assuming there are two mineralogical classes in the pulp, i.e., sulfur-rich mineral (class 1) and sulfur-poor mineral (class 2), the original model can be reformulated as follows, where, for each mineralogical class, and are constrained to reasonable operating ranges and rationally adjusted to avoid a pulp level that is too high or too low.
The reformulated model (7a)–(7g) indicates that the air flow rate, tailing flow rate, and pulp level are important factors determining the flotation efficiency. This conclusion is consistent with site experience. In a flotation plant, the operators adjust these variables to obtain the desired recovery and concentrate grade, which are determined by a higher-level decision maker. The concentrate grade and recovery are defined as follows, where is the density of the final concentrate, and are the sulfur grades of the two mineralogical classes, is the grade of the feed, and is the grade of the tailings. In practice, the optimal control of the entire flotation process can be transformed into the setting and control of and (which determine the pulp level) and (which determines the concentrate grade) of each cell, to ensure that a sufficient part of the froth is collected in each cell while undesired situations, e.g., pulp tuning, are avoided.
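For concreteness, the grade and recovery definitions for two mineralogical classes can be sketched as follows. The Python snippet uses hypothetical stream values, not plant data; grade is the sulfur fraction of a stream, and recovery is the fraction of the feed sulfur that reports to the concentrate.

```python
def grade(masses, grades):
    """Weighted sulfur grade of a stream composed of mineralogical classes.
    masses: mass (or mass flow) of each class; grades: sulfur fraction of each."""
    total = sum(masses)
    return sum(m * g for m, g in zip(masses, grades)) / total

def recovery(conc_masses, feed_masses, grades):
    """Fraction of the sulfur in the feed that reports to the concentrate."""
    s_conc = sum(m * g for m, g in zip(conc_masses, grades))
    s_feed = sum(m * g for m, g in zip(feed_masses, grades))
    return s_conc / s_feed

# Two classes: sulfur-rich (grade 0.8) and sulfur-poor (grade 0.1); values hypothetical
g = [0.8, 0.1]
feed = [40.0, 60.0]      # t/h of each class in the feed
conc = [36.0, 6.0]       # t/h of each class in the concentrate
print(round(grade(conc, g), 3), round(recovery(conc, feed, g), 3))
```

With these numbers the concentrate grade is 0.7 and the recovery about 77%, illustrating the usual trade-off: pulling more mass into the concentrate raises recovery but dilutes grade.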
4. ADP-Based Optimal Control Design
The dynamics of a flotation cell can be expressed in a state-space format, in which the states and the controlled and manipulated variables are given below.
If the model parameters are unknown under certain working conditions, the model is only partially known and the performance of traditional optimal control methods cannot be guaranteed. This situation also arises in other industrial processes. To provide an optimal control alternative when the model parameters are unknown, this section develops an ADP-based optimal control for a flotation cell. ADP does not rely on a precise system model: starting from an initial admissible controller, it iteratively improves the controller by learning from online operational data.
4.1. Preliminaries on Policy Iteration
Consider the following objective function, where is the performance integrand of the system states and is the nonquadratic performance integrand of the input variables, in which is a bounded, monotonic, odd function () that maps onto the imposed control bounds, is the inverse function of , and is a diagonal matrix with all diagonal elements positive.
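One common concrete choice for the nonquadratic input integrand in the input-constrained ADP literature, consistent with the hyperbolic tangent used later in Section 5 (though the symbols `lam` and `r` and their values below are illustrative assumptions), is W(u) = 2∫₀ᵘ r·lam·artanh(v/lam) dv, which has a closed form. The sketch checks the closed form against numerical quadrature; the penalty grows without bound as the input approaches the saturation limit, which is what discourages saturated controls.

```python
import math

lam, r = 1.0, 1.0   # saturation bound and input weight (illustrative values)

def integrand_quad(u, n=20_000):
    """W(u) = 2 * integral_0^u r*lam*artanh(v/lam) dv by midpoint quadrature."""
    h = u / n
    return 2.0 * r * lam * h * sum(math.atanh((i + 0.5) * h / lam) for i in range(n))

def integrand_closed(u):
    """Closed form of the same integral (valid for |u| < lam)."""
    return 2.0 * r * lam * (u * math.atanh(u / lam)
                            + 0.5 * lam * math.log(1.0 - (u / lam) ** 2))

u = 0.5
print(integrand_quad(u), integrand_closed(u))
```

Differentiating the closed form recovers the integrand r·lam·artanh(u/lam) times two, confirming the antiderivative; the logarithmic term drives W(u) to infinity as |u| approaches lam.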
The optimal control policy is given as follows, where is the solution of the Hamilton-Jacobi-Bellman (HJB) equation, with .
The HJB equation is generally difficult to solve analytically. Therefore, a Policy Iteration (PI) approach is adopted:
(1) Apply an initial stabilizing admissible controller to the system.
(2) For , solve the following Lyapunov equation for :
(3) Update the control law:
(4) If the stop criterion is met, stop; otherwise, increment the iteration index and go to Step (2).
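The PI steps above can be sketched on a linear-quadratic special case, where the Lyapunov equation in Step (2) is a matrix equation solvable by vectorization and Step (3) reduces to a gain update. Note that this sketch uses full model knowledge (A, B), which the data-driven variant of Section 4.2 removes; the double-integrator example and the initial gain are illustrative assumptions.

```python
import numpy as np

def policy_iteration_lqr(A, B, Q, R, K0, n_iter=25):
    """Steps (1)-(4) above for x_dot = A x + B u with cost integral of x'Qx + u'Ru.
    K0 must be admissible, i.e. A - B K0 Hurwitz."""
    n = A.shape[0]
    K = K0.copy()
    for _ in range(n_iter):
        Ak = A - B @ K
        # Policy evaluation: solve Ak'P + P Ak = -(Q + K'RK) by vectorization
        M = np.kron(np.eye(n), Ak.T) + np.kron(Ak.T, np.eye(n))
        rhs = -(Q + K.T @ R @ K).reshape(-1, order="F")
        P = np.linalg.solve(M, rhs).reshape(n, n, order="F")
        P = 0.5 * (P + P.T)              # symmetrize against round-off
        K = np.linalg.solve(R, B.T @ P)  # policy improvement
    return K, P

# Double integrator with an admissible initial gain (illustrative example)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K, P = policy_iteration_lqr(A, B, Q, R, K0=np.array([[1.0, 1.0]]))
print(K)
```

Starting from the admissible gain [1, 1], the iteration converges to the LQR gain [1, sqrt(3)] for this example, illustrating the monotone improvement property of PI.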
4.2. ADP-Based Input-Constrained Optimal Control
In the policy iteration, full knowledge of and is required. In this study, a model-free PI technique is extended to the input-constrained case to eliminate this requirement.
To start, reformulate as follows. Then, where is the control actually applied to the system and is the control policy to be improved iteratively.
Consider a cost function satisfying (22):
Integrating over a time interval yields:
Assume and are two infinite sequences of linearly independent smooth basis functions on , where and for all . Approximate the cost function and the control policy as in (28) and (29), with the corresponding weights and auxiliary quantities defined thereafter. Note that the input is approximated by (29), which eliminates the requirement for knowledge of , as indicated by (23). This problem can also be handled by constructing an NN-based approximation system, such as the method proposed in .
From (34), if the following condition holds, where , then and can be obtained as shown below, where . Equation (41), along with (28)–(32), serves as a computational approach to approximate and to generate an improved controller when an admissible control policy is given. By applying this approach iteratively, starting from an admissible bounded control , a control sequence can be obtained that satisfies the following theorem. As the iteration is conducted after the online stage, once the information needed for the calculation of and has been collected, this method belongs to off-policy reinforcement learning [26, 27].
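The off-policy idea — collect data once under a behavior policy with exploration noise, then iterate policy evaluation and improvement offline by least squares — can be sketched on a scalar linear plant. Everything below is an illustrative assumption (the plant x_dot = a·x + b·u, the quadratic cost, and all numeric values); the paper's setting is nonlinear and input-constrained, but the mechanics are the same: the learner never uses a or b, only stored trajectory integrals.

```python
import numpy as np

# Plant x_dot = a*x + b*u. a and b only generate the data; the learner never sees them.
a, b = 1.0, 1.0
q, r = 1.0, 1.0                        # running cost q*x^2 + r*u^2
dt, T, n_win = 1e-3, 0.05, 200         # integration step, window length, window count
n_steps = round(T / dt)

rng = np.random.default_rng(0)
freqs = rng.uniform(1.0, 20.0, 8)

def noise(t):
    """Sum-of-sinusoids exploration noise added to the behavior policy."""
    return 0.4 * np.sum(np.sin(freqs * t))

# --- online stage: one batch of data under a fixed behavior gain ---
k_b = 3.0
x, t = 1.0, 0.0
data = []                              # per window: (dx2, I2xu, I2xx, Ixx)
for _ in range(n_win):
    x0 = x
    I2xu = I2xx = Ixx = 0.0
    for _ in range(n_steps):
        u = -k_b * x + noise(t)
        I2xu += 2.0 * x * u * dt       # integral of 2*x*u dt
        I2xx += 2.0 * x * x * dt       # integral of 2*x^2 dt
        Ixx += x * x * dt              # integral of x^2 dt
        x += dt * (a * x + b * u)      # Euler step of the "real" plant
        t += dt
    data.append((x * x - x0 * x0, I2xu, I2xx, Ixx))
data = np.array(data)

# --- offline stage: repeated least-squares policy evaluation/improvement ---
k = 3.0                                # initial admissible gain
for _ in range(10):
    # Unknowns theta = [p, p*b]:
    #   p*dx2 - (p*b) * int 2x(u + k x) dt = -(q + r k^2) * int x^2 dt
    Phi = np.column_stack([data[:, 0], -(data[:, 1] + k * data[:, 2])])
    y = -(q + r * k * k) * data[:, 3]
    p, pb = np.linalg.lstsq(Phi, y, rcond=None)[0]
    k = pb / r                         # improved gain -- b itself is never needed
print(k)
```

For this plant the optimal gain is 1 + sqrt(2) ≈ 2.414, and the learned k approaches it even though the model is unknown to the learner; the same stored batch of data is reused at every iteration, which is what makes the scheme off-policy.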
Theorem 4. Consider system (13) with input constraints; if there exists an optimal controller and Assumptions 1 and 2 hold, then the control sequence generated by the PI ((41), (28), and (29)) is admissible for (13) on . The cost function satisfies the following Lyapunov equation (LE), so that for . In addition, uniformly on .
Assumption 5. There exist and such that, for all , where is the th row of .
5. Experimental Study
5.1. Simulation Setup
In order to test the feasibility and performance of the proposed optimal control, a testing environment based on the model introduced in Section 2 was constructed to simulate a real flotation cell. Considering the instrumentation available in the real flotation plant, the model (7a)–(7g) was reformulated as (46a)–(46g), which makes the simulation closer to practice. In the reformulated model, new system states , , , and are introduced, so , , , and .
The parameters of the model were identified from industrial data collected from a real plant. The initial values of the system states, the model parameter set, and the concentrations and sulfur grades of the two mineralogical classes are given below. In order to test two different modes of control, i.e., setting the recovery rate and grade versus setting the concentrations of each mineralogical species in both the pulp and the froth layer, the experimental study includes two tasks:
(i) Task 1: the control objective is to obtain the desired recovery rate and grade.
(ii) Task 2: the control objective is to obtain the desired concentrations of each mineralogical species in both the pulp and the froth layer.
In Task 1 and Task 2, the performance integrands were chosen as shown below, respectively, where
To handle the input constraint, was selected as the hyperbolic tangent function, and .
The basis functions for the approximation of the control input were selected as follows, where and are the exploration noises, and are random variables, and , .
The basis functions for the approximation of the value function were selected as follows for Task 1 and Task 2, respectively, where
The learning phase lasted 30 minutes. After the operational data had been collected, offline policy iteration was conducted until the stop criterion was met.
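The offline iteration-with-stop-criterion loop can be sketched generically: repeat the evaluation/improvement update on the stored data until the weight vector stops changing. The contraction `update` below is a stand-in for the actual policy-iteration step and is an assumption for illustration only.

```python
import numpy as np

def iterate_policy(update, w0, tol=1e-8, max_iter=50):
    """Repeat an offline update on the weight vector until the change falls
    below tol (the stop criterion) or max_iter is reached."""
    w = np.asarray(w0, dtype=float)
    for i in range(1, max_iter + 1):
        w_new = update(w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new, i
        w = w_new
    return w, max_iter

# Toy stand-in update: a contraction with fixed point [2, -1]
target = np.array([2.0, -1.0])
update = lambda w: target + 0.5 * (w - target)
w, iters = iterate_policy(update, [0.0, 0.0])
print(w, iters)
```

In the actual experiments, `update` would be the least-squares evaluation/improvement step applied to the 30 minutes of stored data, and `w` the stacked value-function and policy weights.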
5.2. Simulation Results and Discussion
In Task 1, the initial admissible controller was selected as:
The resulting controller was
The simulation results of Task 1 are shown in Figures 4–14. Figure 12 shows that the recovery rose to after 1 hour and remained stable thereafter, while the variation of the concentrate grade was small. Figures 13 and 14 demonstrate that the weighting variables for the approximation of the control input gradually converge. Figures 4–10 present the variations of , , , , , , and the pulp level, respectively. Figure 11 indicates that the tailing flow rate and the air flow rate are within their limits; see and . From these figures, it was found that the increase of recovery is mainly caused by the increase of (the concentration of the rich mineralogical class in the froth) and the decrease of the pulp level , which is consistent with site experience.
In Task 2, the initial admissible controller was selected as
The resulting controller is
The simulation results of Task 2 are shown in Figures 15–25. Figures 15 and 16 show that the concentrations of the rich mineralogical species in the pulp and froth layers converge to their desired set values with a small steady-state bias caused by approximation error, while the recovery increased to (Figure 23). This indicates that the desired recovery rate can be achieved by setting and controlling the concentrations of each mineralogical species in the flotation cell, which is a useful rule for the optimal control of a flotation circuit composed of multiple cells. Figures 17–21 show the variations of the concentrations of the poor mineralogical species in the pulp and froth layers, , , and the pulp level, respectively. Figure 22 indicates that the tailing flow rate and the air input are within their limits; see and . Figures 24 and 25 present the convergence of the weighting variables for the approximation of the control input.
In this study, it was observed that the selection of basis functions affects the approximation accuracy, which in turn can affect the algorithm convergence and controller performance. Theoretically, more basis functions for the value function and control input decrease the approximation error. In the two tasks, different basis functions were used for the approximation of the value function due to the different optimization objectives. The basis functions for the approximation of the control input should carry physical meaning, and the structural information of the model can provide insight into their selection. The magnitude of the exploration noise is also an important factor: a small exploration noise cannot guarantee the rank condition, while an overlarge exploration noise could cause overexcitation.
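The rank-condition remark can be illustrated numerically: without exploration noise the behavior input is a deterministic function of the state, so the data matrix of candidate regressors loses rank; with noise it regains full rank. The scalar plant, gain, and basis terms below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def regressor_rank(noise_amp, n=2000, dt=0.01, k=1.5):
    """Rank of the data matrix [x^2, x*u, u^2] collected from x_dot = -x + u
    under u = -k*x + exploration noise. Without noise, u is proportional to x
    and the three columns are linearly dependent (rank 1)."""
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.5, 5.0, 6)
    x, t, rows = 1.0, 0.0, []
    for _ in range(n):
        u = -k * x + noise_amp * np.sum(np.sin(freqs * t))
        rows.append([x * x, x * u, u * u])
        x += dt * (-x + u)
        t += dt
    return np.linalg.matrix_rank(np.array(rows))

print(regressor_rank(0.0), regressor_rank(0.5))
```

Without noise the rank is 1 (every row is a multiple of [1, -k, k^2]); with sufficient noise it is 3, so the least-squares problem for the weights becomes well posed. This is the numerical counterpart of the persistent-excitation requirement discussed above.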
6. Conclusions
An ADP-based optimal control approach for a sulfur flotation cell has been proposed as an alternative for when the parameters of the system model are unknown. The experimental study verified the feasibility and performance of the approach, which indicates the potential of applying bioinspired control methods to complex industrial processes. The advantage of ADP-based optimal control is that it does not rely on full knowledge of the process model. The difficulties in implementing an ADP-based control lie in the design of an initial admissible controller and the selection of basis functions; these are problem-specific and can benefit from the structural information of the process model and from site experience. In addition, in this study the system states are assumed to be available online, which is not always the case, so the integration of state observers or machine vision with ADP-based control for flotation processes should be studied in the future. Other future research topics include ADP robust to the approximation error caused by limited learning time, the optimal control of a single flotation cell in more complicated cases, e.g., with state constraints, the optimal control of a flotation circuit composed of multiple flotation cells, and the integration of ADP with classic model-based optimal control (e.g., MPC) and its application in flotation processes.
Data Availability
The data supporting the conclusions of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [grant number 61703441].
References
M. P. Jones, N. Johnson, E. Manlapig, and C. Thorne, Mineral and Coal Flotation Circuits: Their Simulation and Control (Developments in Mineral Processing, vol. 3), Elsevier Scientific Publishing Co., 1981.
O. A. Bascur, Modelling and Computer Control of a Flotation Cell, University of Utah, Salt Lake City, Utah, USA, 1982.
Y. Jiang, Robust adaptive dynamic programming for continuous-time linear and nonlinear systems, Polytechnic Institute of New York University, New York, NY, US, 2014.
Y. Lv and X. Ren, "Approximate Nash solutions for multiplayer mixed-zero-sum game with reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–12, 2018.
Z. Wen, L. J. Durlofsky, B. Van Roy, and K. Aziz, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley Online Library, 2013.
H. K. Khalil, Nonlinear Systems, Prentice Hall, New Jersey, NJ, USA, 3rd edition, 2002.