Abstract

This paper addresses the disturbance change control problem of an active deformation adjustment mechanism on a 5-meter deployable antenna panel. A fuzzy neural network Q-learning control (FNNQL) strategy is proposed for the disturbance change to improve the accuracy of the antenna panel. In the proposed method, the error of the model disturbance is reduced by introducing the fuzzy radial basis function (RBF) neural network into Q-learning, and the parameters of the fuzzy RBF neural network are optimized and adjusted by a Q-learning method. This gives the FNNQL controller strong adaptability to deal with the disturbance change. Finally, the proposed method was applied to the middle plate of a 5-meter deployable antenna panel, and it was found that the method could successfully adapt to the model disturbance change in the antenna panel. The simulation results also show that the whole control system meets the accuracy requirements.

1. Introduction

To achieve long-term continuous meteorological observations, a 5-meter deployable antenna has been placed in geosynchronous orbit [1, 2]. The 5-meter deployable antenna panel can be divided into five units: the upper left plate, the upper right plate, the lower left plate, the lower right plate, and the retainer plate, as shown in Figure 1. An active adjustment system was adopted to meet the observation requirements of the 5-meter deployable antenna, as shown in Figure 2. (1) Each of the reflecting panels is supported by more than one active actuator, so the control model of the active adjustment system is a typical multiple-input multiple-output (MIMO) system. (2) Strong coupling effects arise because the active actuators interact with each other during the adjustment process.

The control model of the active adjustment system changes because of disturbances arising from the working conditions, model simplifications, and external interferences of the 5-meter deployable antenna. Moreover, the disturbance, particularly the disturbance change, can significantly affect the observation accuracy of the deployable antenna. To improve the observation accuracy of the deployable antenna, various control methods are worth considering for the control process of the active adjustment system.

In the past few years, a great deal of progress has been made in the control of MIMO systems, and numerous control strategies have been introduced [3–5]. Intelligent control methods for MIMO systems have been put forward by researchers combining fuzzy control theory with artificial intelligence and adaptive control [6–8]. Hamdy et al. proposed a novel inverted fuzzy decoupling scheme [9]; moreover, a Smith predictor (SP) based on a fuzzy decoupling scheme was proposed for a MIMO chemical process with multiple time delays. The schemes were realized using fuzzy logic, thereby avoiding the reliance on model-based analyses. Furthermore, Hamdy and Ramadan proposed another fuzzy decoupling-based PI controller [10]. Fuzzy decoupling schemes are simple to design and easy to implement, demonstrating good decoupling capabilities. However, such methods are only applicable to cases in which the output is a state quantity without disturbances.

A discrete linear quadratic regulator (LQR) controller for the active adjustment system model ensured the observation accuracy requirement of the deployable antenna [11]. However, the LQR controller depended on an accurate control model and was presented without considering the disturbances.

In [12], an adaptive reduced-dimension fuzzy decoupling control strategy based on fusion functions was proposed to eliminate the effect of uncertainties due to coupled subsystems by designing and incorporating a fuzzy decoupling disturbance observer. However, the method is no longer applicable when the disturbances exceed the set threshold.

According to [13], Q-learning was considered to have a good capability for dealing with uncertainties. Thus, the Q-learning algorithm was applied to solve the mobile robot navigation problem in a dynamic environment by limiting the number of states based on a new definition of the state space [14]. The proposed method could reduce the dimension of the Q-value table and improve the efficiency of the system. A novel Q-learning method was described for unknown discrete-time (DT) linear quadratic regulation problems [15]. In that method, Q-learning was proved to be an effective scheme for unknown dynamical systems without requiring a deep understanding of the system dynamics. These two methods could deal with the uncertainties from the dynamic environment, but they were unable to cope with the disturbances from model simplifications and external interferences.

A backward Q-learning strategy was presented based on the combination of the Sarsa algorithm and Q-learning [16]. The proposed method was found to converge more quickly to the true Q values and to improve final performance. A heuristic accelerated Q-learning method was developed in which a heuristic function affects the action selection [17], and simulation results showed that a simple heuristic function could enhance the performance of the Q-learning method. These two strategies could enhance the performance of the control system, but the Q-value table was specific to an individual control model and could not be used for other control models.

Therefore, the key to solving the disturbance change problem is to make the parameters of Q-learning adapt to the control model change. In the traditional Q-learning algorithm, however, the parameters of Q-learning are difficult to adapt to the control model change because the environment in which the agent is located is spatially continuous, and discretization of continuous state-action spaces has a huge impact on the dimension of the Q-value table. One solution to this problem is to introduce the fuzzy inference system (FIS) into Q-learning, which allows discrete-time states to be mapped to continuous state-actions without affecting the dimension of the Q value. During the last decade, a great deal of progress has been made in the area of combining the Q-learning method with the FIS.

A model-free method was proposed by using a Q-learning scheme to tune a supervisory controller for a low-energy building system online [18]. The training time and computational demands were reduced by basing the supervisor on sets of fuzzy rules. A fuzzy Q-learning system was presented by combining a fuzzy inference system (FIS) with Q-learning [19]. The advantage of the system was that, with limited expert experience, the fuzzy system enabled Sarsa(λ) to learn a certain amount of prior knowledge to improve the learning speed online. In the above strategies, the parameters of the fuzzy system were greatly enhanced by introducing the FIS into the Q-learning system. Nevertheless, the applicability of the Q value to model disturbance change was not mentioned.

An extended fuzzy rule was proposed to incorporate Q-learning by Kim and Lee [20]. In that method, an interpolation technique was adopted to represent the appropriate Q value for the current state-action pair in each extended fuzzy rule; the resulting structure, based on the fuzzy inference system, was then capable of handling continuous states and actions. In 2012, Song et al. focused on using a neural network for the initialization of the Q value [21], so that the robot could obtain a more optimized policy than with the traditional Q-learning algorithm. A method was presented to obtain a navigation strategy for an agricultural robot by autonomous learning [22]. This method proved useful for constructing the discretized environment in Q-learning based on the direction and distance information of continuously distributed obstacles in fuzzy theory. All the above strategies aimed to obtain an appropriate Q value by introducing the FIS into Q-learning. However, the applicability of the Q value to disturbances of the control model was not verified.

The numerous existing strategies are widely applied to robot navigation, but few mature methods exist for the active adjustment system of a deployable antenna panel. In this paper, a high-precision active control method, named the fuzzy neural network Q-learning control (FNNQL) strategy, is proposed to overcome the model disturbance change of the active adjustment system of the deployable antenna panel. The main idea of the FNNQL controller is that the FIS is introduced into Q-learning and the input of Q-learning is fuzzified. The parameters of Q-learning are then learned online by the fuzzy RBF neural network, which makes the Q-value table suitable for the disturbance change of the control model. At the same time, the robustness of the control method is improved, ensuring the precision of the deployable antenna.

The FNNQL controller has the following advantages: (1) The error of the model disturbance is reduced because the input of Q-learning is fuzzified, and the accuracy of state transition is improved. (2) A fuzzy rule base of the coupled MIMO system can be constructed by trial-and-error through Q-learning, so that the accurate relationship between the premise and consequent parameters of the fuzzy system can be obtained. (3) The outputs of the fuzzy system are the Q value and the action selection of Q-learning, respectively, and the value function and the action can be modified by the RBF neural network as the state environment changes.

2. Problem Statement

The mathematical model of the active adjustment mechanism of the deployable antenna can be described as follows [11]:

y = y_0 + B F,  (1)

where y_0 stands for the displacement before the adjustment, F is the external force of the adjustment, and B represents the transfer matrix between the external force and the actual displacement.

To study the effect of disturbance change on the antenna precision, the model disturbance can be described by the uncertainty matrix ΔB, which is randomly generated based on the 2-norm of the control matrix B, defined as

||ΔB||_2 = (η/100) ||B||_2,  (2)

where η is a positive constant belonging to [1, 100]. Based on Equations (1) and (2), the discrete multivariable control model with disturbance can be described as

y(k+1) = y(k) + (B + ΔB) F(k).  (3)
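As an illustration, the following minimal Python sketch generates such an uncertainty matrix. It assumes that ΔB is drawn as a random Gaussian matrix rescaled to the target 2-norm; the text above specifies only the norm constraint of Equation (2), so the sampling scheme, the matrix size, and the identity test matrix are hypothetical.

```python
import numpy as np

def disturbance_matrix(B: np.ndarray, eta: float, rng=None) -> np.ndarray:
    """Generate an uncertainty matrix dB with ||dB||_2 = (eta/100) * ||B||_2,
    following the norm constraint of Equation (2); eta is the disturbance
    level in percent, eta in [1, 100]."""
    rng = np.random.default_rng() if rng is None else rng
    raw = rng.standard_normal(B.shape)   # random direction in matrix space
    scale = (eta / 100.0) * np.linalg.norm(B, 2) / np.linalg.norm(raw, 2)
    return scale * raw

# Example: a 15% disturbance on a hypothetical 25 x 25 control matrix
B = np.eye(25)
dB = disturbance_matrix(B, eta=15.0)
assert np.isclose(np.linalg.norm(dB, 2), 0.15 * np.linalg.norm(B, 2))
```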

Since the active adjustment mechanism of a deployable antenna reflector consists of several active actuators that affect each other during the adjustment of the antenna reflector panel, it is necessary to convert the complex control system into a more simplified one. For any MIMO control system with a full-rank transfer matrix, a compensator can be designed to decouple it [23].

Assuming that G(z) is the transfer function matrix of Equation (1), the relation of Equation (4) holds, in which the two diagonal factors are unit diagonal matrices and G(z) is a full-rank matrix. Thus, a compensator can be designed to decouple the MIMO system of Equation (1).

Based on the unit diagonal matrix decoupling principle, Equation (4) can be written as Equation (5), where D(z) denotes the transfer function matrix of the feed-forward compensator.

According to Equation (5), Equation (6) can be derived.

From Equation (6), supposing the output matrix of the feed-forward compensator is given, the output equation of the feed-forward compensator can be written as Equation (7).

Rearranging Equation (7), we obtain Equation (8).

Substituting Equation (8) into Equations (1) and (3), Equation (9) can be obtained.

Multiplying both sides of Equation (9) by the corresponding inverse matrix, Equation (9) can be further expressed as Equation (10), where x(k) is the state vector, n is the number of the actuators, and w(k) is an unknown vector representing the disturbance. From Equation (10), the discrete multivariable decoupled system can be decomposed into many independent single-variable state equations. In this section, the middle plate of the deployable antenna panel is taken as an example to test the validity of Q-learning, as shown in Figure 3; the 25 actuators distributed on the middle plate of the deployable antenna are also shown in the figure.
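For intuition, the following sketch illustrates the decoupling idea in the static case, assuming the plant reduces to a constant full-rank gain matrix B rather than a dynamic transfer matrix G(z); the compensator is then simply the matrix inverse, the compensated system becomes the unit diagonal matrix, and each actuator channel can be controlled independently. The gain matrix below is a hypothetical stand-in.

```python
import numpy as np

# Static-decoupling sketch (assumption: a constant full-rank gain matrix
# stands in for the transfer function matrix G(z) of the plant).
n = 25                                             # actuators on the middle plate
rng = np.random.default_rng(0)
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # hypothetical coupled gains

D = np.linalg.inv(B)                               # feed-forward compensator
compensated = B @ D                                # ideally the unit diagonal matrix
assert np.allclose(compensated, np.eye(n))         # channels are now decoupled
```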

In this section, a simulation analysis is performed to test the validity of the Q-learning controller. During the simulation, the control matrix B was obtained using the finite element software ANSYS (version 14.5), and the raw data from the tests are shown in Table 1. The disturbance matrices for disturbance levels of 15%, 30%, and 90% can be obtained according to Equation (2); they are listed in Tables 2–4, respectively. The initial value y_0 is given as Equation (11).

The units are mm.

In the simulation, the disturbance matrix in Table 2 was introduced into Equation (10) to assess the ability of the Q-learning controller. The displacement was adjusted as shown in Figure 4. In the figure, the curves converge at the 25th step after large fluctuations, which indicates that the system reached stability. Figure 5 shows the root mean square (RMS) error curve for the 25 actuators; the displacements of the 25 actuators are below 10^-3 mm at the 8th step. The RMS is at the micron scale and shows a narrowing tendency. This proves that the Q-value table and the action selection obtained by continuous trial-and-error can adapt to a disturbance of 15%.
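For reference, the kind of trial-and-error Q-value update used by such a controller can be sketched as follows; the state discretization, the action set, and the reward below are placeholders rather than the paper's exact choices.

```python
import numpy as np

# Minimal tabular Q-learning sketch (hypothetical discretization and reward).
n_states, n_actions = 50, 11        # discretized error levels / correction steps
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount rate, exploration
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    """Placeholder environment: shift the error state by the chosen correction."""
    s_next = max(0, min(n_states - 1, s + a - n_actions // 2))
    r = 1.0 if s_next == 0 else -0.01          # reward when the error vanishes
    return s_next, r

s = n_states - 1
for _ in range(10_000):                        # trial-and-error iterations
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])  # Q update
    s = s_next if s_next != 0 else n_states - 1  # restart after convergence
```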

In the next simulation, the Q-value table obtained under a disturbance of 15% was used to deal with disturbances of 30% and 90%, respectively. The results are shown in Figures 6 and 7. As shown in the figures, since the disturbances of the control system changed, excessive overshooting occurred.

This shows that the Q-value table obtained under a certain disturbance cannot adapt to other disturbance changes. If the system is to remain stable under other disturbances, the Q value corresponding to each disturbance must be obtained through continuous trial-and-error, which increases the workload.

To keep the system stable under disturbance change, the FNNQL control strategy is proposed in the following section. The proposed controller consists of two parts. In Part I: Fuzzy Parameter Learning, the fuzzy neural network is introduced into Q-learning, and the input of Q-learning is fuzzified; meanwhile, the trial-and-error method is used in the Q-learning system to obtain the fuzzy rule base. In Part II: Fuzzy Parameter Adjustment, as the temporal-difference (TD) error decreases, the parameters of the membership function are updated through the fuzzy RBF neural network [24]. The Q value and the action selection of Q-learning can thus be obtained to adapt the control model to disturbance change while the control system reaches stability.

3. Fuzzy Neural Network Q-Learning Controller Design

3.1. Part I: Fuzzy Parameter Learning

In this part, the FIS is introduced into Q-learning so that the optimal parameters of the FNNQL controller can be obtained. The structure of the FNNQL controller is shown in Figure 8 and can be divided into five layers: the 1st layer is the system input; the 2nd layer performs the fuzzification; the 3rd layer carries out the activation calculation; the 4th layer performs the action selection; and the output results are obtained in the 5th layer. A detailed description of each layer is given in the following subsections.

3.1.1. Input Layer

In this layer, to reflect the characteristics of disturbance change in real time, the error of the control state and the rate of the error change are taken as the inputs of the FNNQL controller. According to Equations (1) and (10), the error of the control state can be described as

e(k) = x_d − x(k),  (12)

where x_d is the desired state of the panel.

Based on Equation (12), the rate of the error change can be described as

ec(k) = e(k) − e(k − 1).  (13)

According to Figure 8, the inputs of this layer also serve as the premise parameters of the neural network FIS (NNFIS).

3.1.2. Fuzzy Layer

The inputs of the fuzzy layer are the outputs of the 1st layer, and they are fuzzified in this layer. The quantization factors k_e and k_ec are introduced into the fuzzification equations to map the inputs to the fuzzy domain accurately and effectively. The fuzzified equations for e(k) and ec(k) are defined, respectively, as

E(k) = k_e e(k),  EC(k) = k_ec ec(k).  (14)

In the process of fuzzification, three fuzzy subsets are defined for each of E and EC, which can be written as {N, Z, P} (negative, zero, and positive). The membership function is Gaussian.

Suppose E can be divided into m regions and EC can be divided into n regions, which are defined, respectively, as Equations (15) and (16).

Based on Equations (15) and (16), the number of the fuzzy rules can be defined as

M = m × n.  (17)

In this paper, the fuzzy membership function can be defined as

μ(x) = exp(−(x − c)² / σ²),  (18)

where σ is the width of the membership function and c is the centre of the membership function. Then, the membership functions of the two fuzzy inputs E and EC can be defined as

μ_{A_i}(E) = exp(−(E − c_i^E)² / (σ_i^E)²),  μ_{B_i}(EC) = exp(−(EC − c_i^EC)² / (σ_i^EC)²),  (19)

where i represents the rule number, i = 1, 2, …, M; c_i^E and σ_i^E are the centre and the width of μ_{A_i}(E), respectively; and c_i^EC and σ_i^EC are the centre and the width of μ_{B_i}(EC), respectively. The detailed inner structure of a node in the fuzzy layer can be seen in Figure 9.

In this layer, the inputs of the control system are fuzzified. In the next step, the rate at which each fuzzy rule is activated should be calculated to obtain the accurate output of the FNNQL control system.
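The fuzzification of this layer and the activation rates computed in the next can be sketched as follows; the quantization factors, subset centres, and widths are illustrative placeholders, not the tuned values of the paper.

```python
import numpy as np

def gauss(x, c, sigma):
    """Gaussian membership function of Equation (18)."""
    return np.exp(-((x - c) ** 2) / sigma ** 2)

k_e, k_ec = 1.0, 0.5                  # quantization factors of Equation (14)
centres = np.array([-1.0, 0.0, 1.0])  # hypothetical centres of subsets N, Z, P
widths = np.array([0.5, 0.5, 0.5])    # hypothetical widths

e, ec = 0.2, -0.1                     # current error and rate of error change
E, EC = k_e * e, k_ec * ec            # fuzzified inputs, Equation (14)
mu_E = gauss(E, centres, widths)      # memberships of E in N, Z, P
mu_EC = gauss(EC, centres, widths)    # memberships of EC in N, Z, P

# Activation rates of the M = 3 x 3 = 9 rules (Equation (21))
w = np.outer(mu_E, mu_EC).ravel()     # rule strengths mu_Ai(E) * mu_Bi(EC)
phi = w / w.sum()                     # normalized activation rates, sum to 1
```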

3.1.3. Activation Calculation Layer

First, the ith fuzzy rule can be supposed as

R_i: IF E is A_i and EC is B_i, THEN a = a_i*,  (20)

where a_i* is the optimal action selection of the FNNQL control.

To measure the usage of the fuzzy rules in the NNFIS, according to Equations (19) and (20), the rate at which the ith fuzzy rule is activated is defined as

φ_i = μ_{A_i}(E) μ_{B_i}(EC) / Σ_{j=1}^{M} μ_{A_j}(E) μ_{B_j}(EC),  (21)

where i represents the rule number and φ_i is the activation rate of the ith fuzzy rule. According to φ_i, the optimal action selection will be described, taking the ith fuzzy rule as an example, in the following layer.

3.1.4. Action Selection Layer

In this layer, the accurate output of the FNNQL control system can be obtained based on the premise parameters, and each fuzzy input of the control system should obtain its optimal action. According to the ε-greedy policy, the equations of the action selection can be defined as

a_i* = argmax_a q_i(a),  (22)

a_i = a_i* with probability 1 − ε, or a random action a ∈ A with probability ε,  (23)

where A is the set of the random actions, a_i* is the optimal action selection, ε is the action selection probability, and i = 1, 2, …, M, with M the number of the fuzzy rules. To assess the action selection, an additional penalty function is used in this layer, as shown in

r(k) = 1 if |e(k)| ≤ δ, and r(k) ≤ 0 otherwise,  (24)

where δ is the required accuracy range. If the state error is within the accuracy range, a positive reward is obtained and the action selection is enhanced correspondingly; when it is not, a negative or zero reward is obtained and the action selection is weakened instead. Using multiple trial-and-errors and the ε-greedy policy, the mapping between the fuzzy inputs and the optimal action selection can be obtained, and the fuzzy rule base can therefore be constructed. Since the state error is caused by the model disturbance, the fuzzy rules obtained by trial-and-error are consistent with the disturbance change. The structure of this layer is shown in Figure 10.
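A minimal sketch of the ε-greedy selection of Equations (22) and (23) together with an accuracy-based reward in the spirit of Equation (24); the action set, exploration probability, and tolerance δ are placeholders.

```python
import numpy as np

# epsilon-greedy action selection with a simple accuracy-based reward
# (action set, epsilon, and tolerance delta are illustrative placeholders).
rng = np.random.default_rng(0)
actions = np.linspace(-0.05, 0.05, 11)   # hypothetical correction steps, mm
eps, delta = 0.1, 1e-3                   # exploration probability, accuracy (mm)

def select_action(q_row: np.ndarray) -> int:
    """Equations (22)-(23): pick a random action with probability eps,
    otherwise the greedy (optimal) action."""
    if rng.random() < eps:
        return int(rng.integers(len(actions)))
    return int(np.argmax(q_row))

def reward(error: float) -> float:
    """Equation (24): positive reward inside the accuracy range,
    negative reward outside, so wrong selections are weakened."""
    return 1.0 if abs(error) <= delta else -1.0

q_row = rng.standard_normal(len(actions))   # Q values of one fuzzy rule
a = select_action(q_row)
r = reward(error=0.0005)                    # e.g. a 0.5 micron residual error
```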

3.1.5. Output Layer

In this layer, the Q value corresponding to the optimal continuous action selection can be obtained as the consequent parameter of the NNFIS, based on the fuzzy rule base constructed in Layer IV. According to the approximate Q-learning principle [25] and Equation (23), the Q value of the optimal continuous action selection can be defined as

Q(x(k)) = Σ_{i=1}^{M} φ_i q_i(a_i),  (25)

where q_i(a_i) is the Q value of the action selected by the ith rule.

Similarly, the value function corresponding to the optimal continuous action selection can also be represented as

V(x(k)) = Σ_{i=1}^{M} φ_i max_a q_i(a),  (26)

where V is the total of all the value functions over the N control states. Based on Equations (10), (21), and (25), the equation of the control system can be rewritten as

x(k+1) = x(k) + (B + ΔB) u(k),  u(k) = Σ_{i=1}^{M} φ_i a_i,  (27)

where u(k) is the input vector of the FNNQL controller.
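The rule-weighted outputs of this layer can be sketched as follows, assuming the placeholder rule and action sets used above; the composite action is the activation-weighted sum of the per-rule greedy actions.

```python
import numpy as np

# Fuzzy-weighted Q value and continuous action (a sketch of Equations
# (25)-(27) under placeholder rule/action sets).
M, n_actions = 9, 11
rng = np.random.default_rng(0)
phi = rng.random(M); phi /= phi.sum()        # activation rates, Equation (21)
q = rng.standard_normal((M, n_actions))      # per-rule Q values (consequents)
actions = np.linspace(-0.05, 0.05, n_actions)

a_idx = q.argmax(axis=1)                     # greedy action of each rule
u = float(phi @ actions[a_idx])              # continuous (defuzzified) action
Q_val = float(phi @ q[np.arange(M), a_idx])  # Q of the composite action, Eq. (25)
V_val = float(phi @ q.max(axis=1))           # value function, Equation (26)
```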

In Part I: Fuzzy Parameter Learning, the value function was obtained; it is used to adjust the cluster centres and cluster widths of the membership functions in Part II: Fuzzy Parameter Adjustment.

3.2. Part II: Fuzzy Parameter Adjustment

According to Equation (27), the new displacement can be obtained, and the new error and the rate of the error change can then be derived based on Equations (12) and (13). The reward value vector can be obtained based on Equation (24). Then, according to the Bellman optimality theory [26], the temporal-difference error of TD(0) can be described as

ε_TD(k) = r(k) + γ V(x(k+1)) − Q(x(k)),  (28)

where γ is the discount rate.

The mean square error (RMS) function, which is used to measure the deformation accuracy of the control system, can be defined as

E = (1/2) ε_TD²(k).  (29)
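A one-step numerical sketch of Equations (28) and (29), with illustrative values standing in for the quantities computed by the controller:

```python
# TD(0) error of Equation (28) and the squared-error objective of
# Equation (29) (gamma and the sample values are illustrative).
gamma = 0.9                      # discount rate
r = 1.0                          # reward from Equation (24)
V_next, Q_cur = 0.8, 0.75        # V(x(k+1)) and Q(x(k)) from the sketch above
td_error = r + gamma * V_next - Q_cur    # Equation (28)
E_obj = 0.5 * td_error ** 2              # objective minimized in Part II
```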

In the FNNQL controller, since the membership function is a Gaussian RBF, the centre of an ideal RBF should coincide with the location of the fuzzy input that occurs most frequently, and the widths should match the boundary values of the highest-frequency region of the fuzzy input. Therefore, it is necessary to obtain the most suitable centres and widths by adjusting the parameters, so as to capture the disturbance distribution accurately.

According to Equation (29), the centre and the width can be updated by the gradient descent [27, 28]. The gradient-descent linear function of the centre of the error input can be expressed as Equations (30) and (31), in which the step size of the gradient descent and the centre of the input variable of the fuzzy rule appear as parameters; the gradient itself can be further expressed as Equation (32).

Based on Equation (29), Equation (33) can be obtained.

Based on Equation (28), Equation (34) can be obtained.

Based on Equation (26), Equation (35) can be obtained.

Based on Equations (20), (21), and (22), Equation (36) can be obtained.

Substituting Equations (33), (34), (35), and (36) into Equation (32), we obtain Equation (37).

Substituting Equation (37) into Equations (30) and (31), the gradient-descent linear function of the centre of the error input can be further expressed as Equation (38).

Similarly, the gradient-descent linear function of the centre of the error-change-rate input can be expressed as Equations (39) and (40), in which the corresponding step size of the gradient descent and the cluster centre of the input variable of the fuzzy rule appear as parameters.

The gradient-descent linear function of the width of the error input can be expressed as Equations (41) and (42), and the gradient can be further expressed as Equation (43).

Based on Equations (20), (21), and (22), Equation (44) can be obtained.

Substituting Equations (33), (34), (35), and (44) into Equation (43), Equation (45) can be obtained.

Substituting Equation (45) into Equations (41) and (42), the gradient-descent linear function of the width of the error input can be expressed as Equation (46).

Finally, the width of the error-change-rate input can be updated with the analogous gradient-descent linear relation, Equations (47) and (48).
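A simplified sketch of the resulting parameter updates for a single Gaussian centre/width pair; the upstream gradient is a placeholder standing in for the chain of Equations (33)-(37), and the step sizes are illustrative.

```python
import numpy as np

# Gradient-descent update of one Gaussian centre/width pair (a simplified
# sketch of Equations (30)-(48); step sizes and values are placeholders).
eta_c, eta_s = 0.05, 0.05        # step sizes of the gradient descent
c, sigma = 0.0, 0.5              # current centre and width
x = 0.2                          # fuzzified input E (or EC)

mu = np.exp(-((x - c) ** 2) / sigma ** 2)       # membership, Equation (18)
dE_dmu = -0.1                    # dE/dmu, propagated from the TD error via
                                 # Equations (33)-(37); a placeholder here
dmu_dc = mu * 2.0 * (x - c) / sigma ** 2        # chain rule w.r.t. the centre
dmu_ds = mu * 2.0 * (x - c) ** 2 / sigma ** 3   # chain rule w.r.t. the width

c -= eta_c * dE_dmu * dmu_dc     # centre update, Equations (30)-(38)
sigma -= eta_s * dE_dmu * dmu_ds # width update, Equations (41)-(48)
```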

By using the above equations, the most suitable centres and widths of the membership functions are obtained, which captures the disturbance distribution accurately. Thus, all parameters of the FNNQL controller are optimized and adjusted, which gives the FNNQL controller strong fault tolerance and strong adaptability to deal with the disturbance change.

3.3. The FNNQL Strategy Steps

The flow chart of the FNNQL strategy is given in Figure 11.

4. Simulation

In this section, the middle plate of the 5-meter deployable antenna panel was taken as an example to test the FNNQL strategy. The initial value y_0, the transfer matrix B, and the disturbances of the uncertainties are the same as in Section 2. The parameters of the FNNQL controller include the step size of the gradient descent, the quantization factors k_e and k_ec, and the discount rate γ; the initial values of the centres c^E and c^EC and the widths σ^E and σ^EC can be found in Tables 5–8.

Since each of the two fuzzy inputs has three fuzzy subsets, 9 fuzzy sets can be obtained for fuzzy control through their combinations, namely the pairwise combinations of {N, Z, P} for E and EC.

To assess the effectiveness of the FNNQL controller in dealing with the disturbance change, the Q-value functions were trained according to the initial value given in Equation (11); the training curve is shown in Figure 12.

As seen in Figure 12, after 5000 random trial-and-errors, the curves of the 25 nodes all showed a narrowing tendency and settled within ±0.03. This further proved that the Q-learning system had learned the corresponding relationship between the control correction and the disturbance change. Moreover, the following parameters were obtained for the stable system: the Q-value functions of the 25 actuators, of which the Q-value function of the 21st actuator is taken as an example and can be found in Table 9. The centres and their error change rates, and the widths and their error change rates, of the 25 actuators are given in Tables 10–13, respectively.

Next, the parameters found in Tables 9–13 were applied to the deployable antenna active adjustment system with different disturbances to verify the adaptability of the FNNQL controller. For the active adjustment system of the deployable antenna panel with the disturbances listed in Table 3, the deformation was adjusted by the FNNQL controller, and the adjustment curves are shown in Figures 13 and 14. Contrasting Figure 13 with Figure 6, after adjustment by the FNNQL controller, the displacements of all adjustment actuators were below 10^-3 mm after 40 adjustment steps, and the RMS was below 10^-3 mm when the system was regarded as stable.

Similarly, for the active adjustment system of the deployable antenna panel with the different disturbances listed in Table 4, the deformation was adjusted by the FNNQL controller, and the adjustment curves are shown in Figures 15 and 16, respectively. Contrasting Figure 15 with Figure 7, after adjustment by the FNNQL controller, the displacements of all adjustment actuators were also below 10^-3 mm after 50 adjustment steps.

As seen in Figures 14–16, the actuators' displacements were adjusted by the FNNQL controller under different disturbances, and the RMS of the active adjustment system was below 10^-3 mm when the system was regarded as stable. The simulation results in this paper indicate that the proposed method can cope with the model disturbance change.

5. Conclusions

In this paper, the FNNQL control method was studied for the active deformation adjustment mechanism of a deployable antenna panel. Firstly, the error of the model disturbance was reduced because the input of Q-learning was fuzzified, and the accuracy of state transition was improved; a fuzzy rule base of the coupled MIMO system was constructed by trial-and-error through Q-learning, so that the accurate relationship between the premise and consequent parameters of the fuzzy system was obtained. Secondly, the widths and the centres of the membership functions were modified by the RBF neural network with the change of the state environment. Finally, the effectiveness and the superiority of the proposed approach were demonstrated using the middle plate of the 5 m deployable antenna panel as an example. The simulation results indicated that the proposed method is an effective tool for active deformation adjustment under disturbance change of the 5 m deployable antenna panel.

However, in this paper, the value function was obtained after 5000 random trial-and-errors; the computing time of the control method is long because the number of trial-and-error iterations is large. How to reduce the time complexity of the method is the subject of our future research.

Data Availability

The authors confirm that the data used to support the findings of this study are included within the article. All the data supporting the results are shown in the paper and can be requested from the corresponding author.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China through Grant 51675398 and the National Key Basic Research Program of China through Grant 2015CB857100. This work builds on the thesis of Tian Wang, and thanks are due to him for his contributions to this research.