Research Article  Open Access
Mingfang He, "Data-Driven Approximated Optimal Control of Sulfur Flotation Process", Complexity, vol. 2019, Article ID 4754508, 16 pages, 2019. https://doi.org/10.1155/2019/4754508
Data-Driven Approximated Optimal Control of Sulfur Flotation Process
Abstract
The sulfur flotation process is a typical industrial process with complex dynamics. For a sulfur flotation cell, the structure of the system model can be derived using first principles and reaction kinetics. However, the model parameters cannot be obtained under certain working conditions. In this paper, by using adaptive dynamic programming (ADP), we establish a data-driven optimal control approach for the operation of a sulfur flotation cell without knowing the model parameters. By learning from the online production data, an initial admissible control policy iteratively converges to an approximated optimal control law, and the dependence of optimal control design on full model knowledge is eliminated. A simulation environment of the sulfur flotation process is constructed based on a phenomenological model and industrial data. Some practical problems in the implementation of ADP, i.e., the selection of basis functions and the use of model structural information in the ADP-based control design, are investigated. The feasibility and performance of the proposed data-driven optimal control are tested in the simulation environment. The results indicate the potential of applying bio-inspired control methods in the flotation process.
1. Introduction
Sulfur is an important element with wide applications in the pharmaceutical industry, chemical engineering, light industry, and the food industry, to name a few. In most cases, sulfur exists in the form of sulfide or sulfate minerals. A sulfur flotation process is therefore usually integrated into mineral processing and hydrometallurgy plants to recover the valuable sulfur from the mineral residue as a secondary product. In addition, sulfur flotation allows more advanced and economical technologies, e.g., direct leaching, to be applied in metallurgy plants. Grade and recovery are the two main technical indexes that define the overall performance of a sulfur flotation process. The operation objective of a sulfur flotation process is to recover as much high-grade sulfur as possible from the mineral residue by finding the optimal combination of the manipulated variables in real time.
Sulfur flotation is a typical three-phase (solid-liquid-gas) process. Generally, the operation of this type of process is nontrivial because of its complex and nonlinear dynamics, which result from the intricate interactions between the reactants, inlet conditions, manipulated variables, and disturbances. Therefore, the economical operation of flotation processes has attracted interest from both the mineral processing community and the control community [1–3].
Flotation modeling approaches can be categorized into first-principle modeling [4] and soft-computing-based modeling [5]. Lynch et al. [6] and Gorain et al. [7] developed empirical models for the entire flotation plant and for subprocesses in a flotation system. Yianatos proposed a first-order kinetic flotation model [8]. Koh and Schwarz studied the probabilistic modeling of the microprocesses in flotation, e.g., the adhesion of particles to the bubbles in the pulp [9]. The flotation rate constant in the model varies with feeding conditions and working conditions. Therefore, the precise determination of the flotation rate constant remains an active research topic [4]. Bascur proposed a phenomenological model which considers both the particle population balance and hydraulic behavior [10]. The phenomenological model combines the population balance, the transfer of particles between phases, and the liquid balance, and is able to simulate the effects of manipulated variables (e.g., tailing and air flow rate) on a flotation process [2].
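As a point of reference for the first-order kinetic model mentioned above, the sketch below evaluates the textbook batch form R(t) = R_max (1 − exp(−k t)). The function name, the parameter values, and the assumption of a single constant rate constant are illustrative only; they are not taken from [8], and, as noted above, the rate constant in practice varies with feeding and working conditions.

```python
import math


def first_order_recovery(t, k, r_max=1.0):
    """Cumulative recovery at time t under first-order batch kinetics.

    t     : flotation time (any unit, consistent with k)
    k     : flotation rate constant (1/time), assumed constant here
    r_max : ultimate recovery as a fraction in [0, 1]
    """
    if t < 0 or k < 0:
        raise ValueError("t and k must be non-negative")
    return r_max * (1.0 - math.exp(-k * t))


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the curve.
    for minutes in (0, 1, 2, 5, 10):
        print(minutes, round(first_order_recovery(minutes, k=0.5, r_max=0.9), 4))
```

The curve starts at zero, rises monotonically, and saturates at r_max, which is why an error in k directly distorts the predicted recovery at any finite residence time.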
In flotation plants, the operation of a flotation process is conducted in a “setting and control” manner. The basic control loops maintain the primary variables at setpoints [2], which is achieved by conventional PID control. On a higher level, machine-vision-based control systems and model-based control methods, e.g., Model Predictive Control (MPC), adaptive control, expert systems, neural networks, and fuzzy logic, as well as their commercial software, have been developed and applied [11]. The performance of the above model-based controllers relies on the model quality. Although the structure of a mathematical model can be obtained by applying physical/chemical principles and reaction kinetics, the model parameters vary with the working conditions and are not available under certain working conditions. Robust control approaches are the natural choice for working conditions with modeling uncertainties. However, in some cases, the conservative nature of robust control keeps the derived controller away from the optimum, which in turn leads to economic loss.
In recent years, bio-inspired control approaches have become widespread. Learning is the basic feature of bio-inspired control methods. Such a method obtains an optimal or nearly optimal controller by learning from the “excitation-response” interactions with the environment [12], of which “trial and error” is a typical example. Reinforcement learning (RL) is a bio-inspired machine learning method. It was originally derived from observations of the learning behavior of mammals [13]. RL represents how an agent/animal interacts with an unknown environment and modifies its actions to maximize a reward. Adaptive dynamic programming is an integration of adaptive control, dynamic programming, and reinforcement learning [14, 15]. By using an “actor-critic” structure, it can approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation online without full knowledge of the system model, or even when the system model is unknown [16, 17]. Thus, if an initial admissible control policy is available, an approximated optimal control can be obtained by solving the HJB equation iteratively. The industrial applications of ADP include oil production [18], coal gasification [19], and energy scheduling [20], to name a few. In recent years, learning-based control has been applied in the operation and control of the froth flotation process. In [21, 22], a dual-rate and a two-layer control strategy based on Q-learning and off-policy reinforcement learning were proposed and tested via experimental study, respectively.
The aim of this study is to design an alternative optimal control approach for a sulfur flotation cell when the model parameters are unknown and the manipulated variables are constrained. The rest of this paper is organized as follows. In Section 2, the classical flotation kinetic model developed by Bascur [10] is introduced; it is then used in the analysis of the optimal control problem in Section 3. In Section 4, an optimal control based on ADP is designed to iteratively improve the performance of an initial admissible controller by learning from the online operational data generated from the simulation environment. In Section 5, the model is adopted to simulate the real dynamics of a flotation cell, and the performance of the proposed optimal control is compared with that of the traditional PI controller. The simulation results, advantages, and shortcomings as well as future extensions of the proposed control are analyzed in Section 6.
2. Process Description
Typically, a sulfur flotation process is composed of three stages with specific functions, namely, roughing, scavenging, and cleaning (Figure 1). The roughing stage performs the initial separation of sulfur from gangue, and the recovery rate is its main concern. The tailing of the roughing stage is delivered to the scavenging stage to recover the remaining sulfur. The tailing of the scavenging stage is also the final tailing of the entire flotation process. The froth of the roughing stage overflows to the cleaning stage. The scavenging froth and the cleaning tails are returned to the roughing stage. The overflow of the cleaning stage is the final product, i.e., the sulfur concentrate. The three stages work together to maximize the sulfur recovery and concentrate grade and to minimize the grade of the final tails.
The economical production of a flotation process relies on the optimal control of a single flotation cell, which is the basic flotation unit. As shown in Figure 2, the operation of sulfur flotation in a flotation cell is based on the natural hydrophobicity of sulfur. First, slurry containing both valuable sulfur and base gangue is fed into the flotation cell. Then, as air is blown into the flotation cell, the sulfur particles in the slurry attach to the bubbles. The bubbles float upward and carry the particles to the froth layer. Finally, the sulfur particles are collected in the concentrate launder via the natural overflow of the froth. Focusing on a single flotation cell, the rest of this section presents the model structure, which is then used along with the flotation mechanism to analyze the optimal control problem.
2.1. Flotation Behavior Analysis
The physical activities in a sulfur flotation cell involve the movement of solid particles driven by the flow of air and liquid. As illustrated in Figures 2 and 3 [10], considering the different roles of pulp and froth in determining the cell dynamics, the flotation cell is divided into two sections: froth volume and pulp volume. The froth volume and pulp volume are then subdivided into the liquid and air phases with which the mineral particles are associated. The states of particles in a flotation cell mainly include the following:
(i) free in the pulp (particles in the liquid phase of the pulp volume)
(ii) attached in the pulp (particles attached to the bubbles in the pulp volume)
(iii) free in the froth (particles in the liquid of the froth volume)
(iv) attached in the froth (particles attached to the bubbles of the froth volume)
The movements of particles mainly include the following:
(i) free particles in the pulp attach to the bubbles in the pulp volume, governed by an attachment rate constant
(ii) particles associated with the bubbles detach and enter the liquid phase in the pulp volume, governed by a detachment rate constant
(iii) free particles in the froth layer attach to the bubbles in the froth volume, governed by an attachment rate constant
(iv) particles associated with the bubbles detach and enter the liquid phase in the froth volume, governed by a detachment rate constant
(v) the particles in the input slurry feed into the flotation cell (characterized by the liquid volume flow rate and the number of particles of each size and mineralogical species per unit volume)
(vi) particles drain from the liquid phase of the froth volume (characterized by the drainage flow rate, the number of particles of each size and mineralogical species per unit volume of liquid in the froth volume, and a constant indicating the segregation mechanism in the draining)
(vii) particles are entrained by the liquid flow from the pulp volume to the froth volume (characterized by the entrainment flow rate)
(viii) particles attached to the bubbles in the pulp volume are transported to the froth volume (characterized by the air flow rate)
(ix) the particles in the pulp leave through the tailing port (characterized by the tailing flow rate and the number of particles of each size and mineralogical species per unit volume of liquid in the pulp)
(x) particles attached to the bubbles in the pulp volume exit to the tailings (characterized by the aeration rate)
(xi) the particles free in the froth overflow to the concentrate launder (characterized by the corresponding flow rate)
(xii) the particles attached to the bubbles in the froth overflow to the concentrate launder (characterized by the corresponding flow rate)
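The attachment and detachment mechanisms listed above can be illustrated with a deliberately reduced sketch: a closed, two-state exchange between free and bubble-attached particles in one volume. This is not the phenomenological model of the next subsection; feed, entrainment, drainage, tailing, and overflow are all ignored, and the function name, rate constants, and forward-Euler integration are assumptions made for illustration.

```python
def simulate_attachment(n_free, n_attached, k_attach, k_detach, t_end, dt=1e-3):
    """Integrate a closed two-state attachment/detachment exchange.

    d(n_free)/dt     = -k_attach * n_free + k_detach * n_attached
    d(n_attached)/dt =  k_attach * n_free - k_detach * n_attached

    All arguments are hypothetical; no inflow or outflow is modeled.
    """
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Net number of particles moving from the liquid to the bubbles.
        transfer = (k_attach * n_free - k_detach * n_attached) * dt
        n_free -= transfer
        n_attached += transfer
    return n_free, n_attached


if __name__ == "__main__":
    free, attached = simulate_attachment(100.0, 0.0, k_attach=2.0, k_detach=0.5, t_end=10.0)
    print(free, attached, attached / free)
```

In this closed setting the particle count is conserved and the attached-to-free ratio settles at k_attach / k_detach, which shows why the two rate constants jointly determine how much material the bubbles can carry.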
2.2. Phenomenological Model of a Flotation Cell
Based on the above simplifications and considering the hydraulic behavior, the following assumptions are applied:
Assumption 1. Each phase in the pulp and froth volume of the flotation cell is perfectly mixed.
Assumption 2. The particle size and shape are uniform.
Assumption 3. The changes in the air volumes occur much faster than those in the liquid volumes, so steady-state relations are used for the air volumes [10, 23].
Under these assumptions, a process model that combines the particle population balance in the liquid phase and the air phase for the pulp and froth volumes is built to describe the dynamics of a flotation cell:
where (1a) and (1b) present the mass balance for each mineralogical species, while (1c) and (1d) describe the liquid volume balance of the pulp volume and the froth volume. The model (1a)–(1e) is a combination of the froth flotation models proposed in [10, 24]. It keeps the phenomenological behavior in [10], while it also approximates the dynamics of the pulp level in [24]. The physical meanings of the model parameters and the variables in Figure 3 are listed in Table 1. In Section 5, this model is applied to simulate the real flotation cell.

3. Optimal Control Problem Analysis
To start with the optimal control problem analysis, the model is reformulated first to investigate how the manipulated variables affect the flotation performance.
As indicated in [10], and can be formulated as where is the thickness of film surrounding the bubble, is the mean size bubble diameter in the pulp region, is the average air holdup in the plateau border of froth volume, and and are two variables changing with working conditions.
Combining (1a)–(1e) with (5) and (6), and assuming that there are two mineralogical classes in the pulp, i.e., sulfur-rich mineral (class 1) and sulfur-poor mineral (class 2), the original model can be reformulated as (7a)–(7g), in which the manipulated variables are constrained to reasonable operating ranges and adjusted so that the pulp level becomes neither too high nor too low.
The reformulated model (7a)–(7g) indicates that the air flow rate, the tailing flow rate, and the pulp level are important factors that determine the flotation efficiency. This conclusion is consistent with site experience. In a flotation plant, the operators adjust these variables to obtain the desired recovery and concentrate grade, which are determined by a higher-level decision maker. The concentrate grade and recovery are defined in terms of the density of the final concentrate, the sulfur grades of the two mineralogical classes, the grade of the feed, and the grade of the tailings. In practice, the optimal control of the entire flotation process can be transformed into the setting and control, in each cell, of the variables that determine the pulp level and the concentrate grade, so that a sufficient part of the froth is collected while undesired situations, e.g., pulp tuning, are avoided.
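To make the two performance indexes concrete, the sketch below computes a concentrate grade as the mass-weighted grade of two mineralogical classes and a recovery from the standard steady-state two-product formula. These are common textbook definitions used here as stand-ins; the function names and the numerical values are hypothetical, and the formulas are not claimed to match the density-based expressions of this section.

```python
def mixture_grade(mass_rich, mass_poor, grade_rich, grade_poor):
    """Mass-weighted sulfur grade of a stream holding two mineral classes."""
    total = mass_rich + mass_poor
    if total <= 0:
        raise ValueError("stream must contain some solids")
    return (mass_rich * grade_rich + mass_poor * grade_poor) / total


def two_product_recovery(feed_grade, conc_grade, tail_grade):
    """Steady-state recovery from assays only (two-product formula).

    R = c * (f - t) / (f * (c - t)), with f, c, t the feed, concentrate,
    and tailing grades expressed in the same unit.
    """
    if not (conc_grade > feed_grade > tail_grade >= 0):
        raise ValueError("expected conc_grade > feed_grade > tail_grade >= 0")
    return conc_grade * (feed_grade - tail_grade) / (feed_grade * (conc_grade - tail_grade))


if __name__ == "__main__":
    # Hypothetical assays, for illustration only.
    print(mixture_grade(3.0, 1.0, 0.8, 0.2))
    print(two_product_recovery(0.10, 0.40, 0.02))
```

The second function makes the trade-off visible: for a fixed feed grade, recovery rises as the tailing grade falls, while pushing the concentrate grade up at a fixed tailing grade lowers the recovery.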
4. ADPBased Optimal Control Design
The dynamics of a flotation cell can be expressed in a state-space format in terms of the system states, the controlled variables, and the manipulated variables.
If the model parameters are unknown under certain working conditions, the model is only partially known and the performance of traditional optimal control methods cannot be guaranteed. This situation also exists in other industrial processes. To provide an optimal control alternative when the model parameters are unknown, this section studies the ADP-based optimal control of a flotation cell. ADP does not rely on a precise system model. Starting from an initial admissible controller, it can iteratively improve the controller by learning from the online operational data.
4.1. Preliminaries on Policy Iteration
Consider an objective function composed of the performance integrand of the system states and the nonquadratic performance integrand of the input variables. The latter is defined through a bounded, monotonic, odd function that maps the imposed control bounds, the inverse of this function, and a diagonal matrix whose diagonal elements are all positive.
The optimal control policy is expressed through the solution of a Hamilton-Jacobi-Bellman (HJB) equation.
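For orientation, the block below records the form in which this kind of input-constrained optimal control problem is commonly posed in the ADP literature. The input-affine system structure and the symbols f, g, Q, W, \phi, R, and V^{*} are assumptions made for illustration and may differ from the notation used in this paper.

```latex
\begin{align}
\dot{x} &= f(x) + g(x)\,u, \\
V(x_0) &= \int_0^{\infty} \bigl( Q(x) + W(u) \bigr)\,dt, \\
W(u) &= 2 \int_0^{u} \bigl( \phi^{-1}(v) \bigr)^{\top} R \, dv, \\
u^{*}(x) &= -\phi\!\left( \tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x) \right), \\
0 &= \nabla V^{*}(x)^{\top} \bigl( f(x) + g(x)\,u^{*}(x) \bigr) + Q(x) + W\bigl(u^{*}(x)\bigr),
\qquad V^{*}(0) = 0.
\end{align}
```

Under these assumptions, the expression for u^{*} follows from the stationarity condition g(x)^{\top} \nabla V^{*}(x) + 2 R\,\phi^{-1}(u) = 0 together with the oddness of \phi, and the bounded range of \phi is what keeps the control inside its limits.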
The HJB equation is generally difficult to solve analytically. Therefore, the Policy Iteration (PI) approach is used:
(1) Apply an initial stabilizing admissible controller to the system.
(2) For the current iteration, solve the Lyapunov equation for the cost function of the current control policy.
(3) Update the control law.
(4) If the stop criterion is met, then stop; otherwise, increment the iteration index and go to Step (2).
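The four steps above can be sketched on the simplest possible case: a scalar linear system dx/dt = a x + b u with a quadratic cost and no input constraint, for which policy evaluation reduces to a scalar Lyapunov equation. This is a simplified, model-based stand-in chosen because its answer can be checked against the Riccati solution; the system, the cost weights, and the function names are assumptions and are not the flotation model or the nonquadratic cost of this section.

```python
import math


def policy_iteration(a, b, q, r, k0, tol=1e-10, max_iter=50):
    """Policy iteration for dx/dt = a*x + b*u, cost = int(q*x^2 + r*u^2) dt.

    The policy is u = -k*x and the value function is V(x) = p*x^2.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must stabilize the system")  # Step 1
    k, p_prev = k0, None
    for _ in range(max_iter):
        # Step 2: policy evaluation, 2*(a - b*k)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Step 3: policy improvement.
        k = b * p / r
        # Step 4: stop criterion on successive value weights.
        if p_prev is not None and abs(p - p_prev) < tol:
            break
        p_prev = p
    return p, k


def riccati_solution(a, b, q, r):
    """Closed-form positive solution of 2*a*p - (b*p)^2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    p, k = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(p, k, riccati_solution(1.0, 1.0, 1.0, 1.0))
```

Note that Step 2 uses a and b explicitly, which is exactly the model dependence that the data-driven formulation of the next subsection is designed to remove.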
4.2. ADPBased InputConstrained Optimal Control
In the policy iteration, full knowledge of the system dynamics is required. In this study, a model-free PI technique is extended to the input-constrained case to eliminate this requirement.
To start with, the system is reformulated so that two inputs are distinguished: the control actually applied to the system, and the control policy to be improved iteratively.
Consider a cost function satisfying (22):
Integrate on a time interval :
Assume that two infinite sequences of linearly independent smooth basis functions are available on the operating region. The cost function and the control policy are approximated as weighted combinations of these basis functions, as given in (28)–(32). Note that the input is approximated by (29). This eliminates the requirement for knowledge of the corresponding model term, as indicated by (23). This problem can also be handled by constructing an NN-based approximation system, such as the method proposed in [25].
In (28)–(32), the numbers of basis functions are two sufficiently large integers. Substituting (28)–(32) into (27) gives (33):
In (33), the unknown weights of the cost function and control policy approximations are to be determined. Considering a sufficiently long time sequence, (33) yields the relation (34), whose residual is the approximation error.
From (34), if the stated condition on the collected data holds, then the weights of the cost function and control policy approximations can be obtained as in (41). Equation (41), along with (28)–(32), serves as a computational approach to approximate the cost function and generate an improved controller when an admissible control policy is given. By using this approach iteratively, starting from an admissible bounded control, a control sequence is obtained that satisfies the following theorem. As the iteration is conducted after the online stage, during which the information needed for the calculation is collected, this method belongs to off-policy reinforcement learning [26, 27].
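The idea of obtaining approximation weights from recorded data by least squares can be sketched on a scalar example. The code below is a simplified on-policy variant, not the off-policy scheme of this section: it evaluates a fixed policy u = -k x by fitting the single weight of V(x) = w x^2 to interval data, using the relation w (x_start^2 - x_end^2) = accumulated cost. The system, the single basis function, the forward-Euler simulator, and all names are assumptions made for illustration.

```python
def collect_data(a, b, k, q, r, x0, interval, n_intervals, dt=1e-4):
    """Simulate dx/dt = a*x + b*u under u = -k*x (forward Euler).

    Returns one tuple per interval: (state at start, state at end,
    running cost q*x^2 + r*u^2 accumulated over the interval).
    """
    steps = int(round(interval / dt))
    x = x0
    samples = []
    for _ in range(n_intervals):
        x_start, cost = x, 0.0
        for _ in range(steps):
            u = -k * x
            cost += (q * x * x + r * u * u) * dt
            x += (a * x + b * u) * dt
        samples.append((x_start, x, cost))
    return samples


def fit_value_weight(samples):
    """Least-squares fit of w in V(x) = w*x^2 from interval data only.

    The fit uses states and costs; it never touches a or b.
    """
    num = 0.0
    den = 0.0
    for x_start, x_end, cost in samples:
        regressor = x_start ** 2 - x_end ** 2
        num += regressor * cost
        den += regressor * regressor
    if den == 0.0:
        raise ValueError("data are not informative enough for the fit")
    return num / den


if __name__ == "__main__":
    data = collect_data(a=1.0, b=1.0, k=2.0, q=1.0, r=1.0, x0=1.0,
                        interval=0.1, n_intervals=10)
    print(fit_value_weight(data))
```

For these hypothetical parameters the model-based value weight is (q + r k^2) / (2 (b k - a)) = 2.5, and the data-based fit reproduces it up to the integration error, even though the fitting function has no access to the model coefficients.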
Theorem 4. Consider system (13) with input constraints; if there exists an optimal controller and Assumptions 1 and 2 hold, thenwhich is generated by the PI ((41), (28), and (29)) is an admissible control sequence for (13) on . The cost function satisfies the following Lyapunov equation (LE):Then for . In addition, , uniformly on .
Proof. The proof of this theorem follows the same lines of reasoning as in the proof of Theorem 1 and Theorem 3.1 in [28, 29], and it is omitted here for brevity.
Assumption 5. There exist and , such that for all where is the th row of .
Assumption 6. The closed-loop system composed of (16) and the applied control is ISS (Input-to-State Stable) [30] when the exploration noise is considered as the input.
5. Experimental Study
5.1. Simulation Setup
In order to test the feasibility and performance of the proposed optimal control, a testing environment based on the model introduced in Section 2 was constructed to simulate a real flotation cell. Considering the instrumentation available in the real flotation plant, the model (7a)–(7g) was reformulated to (46a)–(46g), which brings the simulation closer to practice. In the reformulated model, new system states , , , and are introduced. So , , , and .
The parameters of the model were identified from industrial data collected from a real plant [10]. The simulation is specified by the initial values of the system states, the model parameter set, and the concentrations and sulfur grades of the two mineralogical classes. In order to test two different ways of control, i.e., setting the recovery rate and grade and setting the concentrations of each mineralogical species in both the pulp and froth layers, the experimental study includes two tasks:
(i) Task 1: the control objective is to obtain the desired recovery rate and grade.
(ii) Task 2: the control objective is to obtain the desired concentrations of each mineralogical species in both the pulp and froth layers.
A separate performance integrand was chosen for each of Task 1 and Task 2.
To handle the input constraint, the bounded, monotonic, odd function in the input penalty was selected as the hyperbolic tangent function.
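The effect of a hyperbolic tangent choice can be sketched for a single input. The code assumes the common scaling phi(v) = bound * tanh(v / bound) and a scalar weight r; both are assumptions, as are the function names and the numbers. Under that scaling the input penalty W(u) = 2 r times the integral of bound * atanh(v / bound) from 0 to u has the closed form used below.

```python
import math


def saturate(v, bound):
    """Map an unconstrained signal v into the open interval (-bound, bound)."""
    return bound * math.tanh(v / bound)


def input_penalty(u, bound, r=1.0):
    """Nonquadratic penalty W(u) for |u| < bound under the assumed scaling.

    Closed form of 2*r * integral_0^u bound*atanh(v/bound) dv:
        2*r*bound*u*atanh(u/bound) + r*bound^2 * ln(1 - (u/bound)^2)
    """
    if abs(u) >= bound:
        raise ValueError("penalty is defined only strictly inside the bounds")
    ratio = u / bound
    return (2.0 * r * bound * u * math.atanh(ratio)
            + r * bound * bound * math.log(1.0 - ratio * ratio))


if __name__ == "__main__":
    # Hypothetical bound of 2.0 on the input.
    print(saturate(0.1, 2.0), saturate(100.0, 2.0))
    print(input_penalty(0.1, 2.0), input_penalty(1.9, 2.0))
```

For inputs that are small relative to the bound, the penalty behaves like the familiar quadratic r u^2, while it grows much faster as the input approaches the bound, which is how the constraint enters the cost without a separate clipping step.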
The basis functions for the approximation of the control input were selected as where and are the exploration noises, and are random variables, , .
The basis functions for the approximation of the value function were selected as:for Task 1, andfor Task 2, respectively, where
The learning phase lasted 30 minutes. After the operational data were collected, an offline policy iteration was conducted until the stop criterion was met.
5.2. Simulation Results and Discussion
In Task 1, the initial admissible controller was selected as:
The resulting controller was
The simulation results of Task 1 are shown in Figures 4–14. Figure 12 shows that the recovery rose to its final level after 1 hour and remained stable thereafter, while the variation of the concentrate grade was small. Figures 13 and 14 demonstrate that the weights for the approximation of the control input gradually converge. Figures 4–10 present the variations of the system states and the pulp level, respectively. Figure 11 indicates that the tailing flow rate and the air flow rate stay within their limits. From these figures, it was found that the increase of recovery is mainly caused by the increase of the concentration of the rich mineralogical class in the froth and the decrease of the pulp level, which is consistent with site experience.
In Task 2, the initial admissible controller was selected as