Abstract

Multiple wireless networks with overlapping signal coverage form heterogeneous wireless networks (HWNs), and access selection is one of the key technologies of HWNs. Most existing access selection algorithms choose networks mainly from the users' perspective; they neither allocate resources nor optimize the overall transmission performance of the HWN. This paper proposes an access selection algorithm for HWNs based on optimal resource allocation. It analyzes the wireless link transmission rate model with the goal of maximizing the HWN transmission rate and uses dynamic programming to derive the optimal bandwidth allocated to each user. Experimental results show that the proposed algorithm effectively improves network throughput and resource utilization and connects users to appropriate networks according to their QoS rate requirements.

1. Introduction

In recent years, wireless communication technologies such as cellular networks, wireless local area networks (WLANs), and wireless metropolitan area networks (WMANs) have developed rapidly. Within the coverage of cellular networks, a variety of other wireless access networks are deployed, forming heterogeneous wireless networks (HWNs) in which multiple networks coexist with overlapping signal coverage [1, 2].

Access selection is one of the key technologies of HWNs. Its main function is to control users' access requests and select the network that provides connection services [3, 4]. In traditional access selection algorithms, the decision parameters mainly describe wireless link quality, such as the received signal strength (RSS) or the signal to interference plus noise ratio (SINR). The basic idea is that each user terminal accesses the network with the highest RSS and all users in the same network share its resources roughly equally. Although such algorithms are simple, too many users may access the same wireless network at once, leading to poor utilization of network resources and low network capacity [5].

In addition, some studies jointly consider multiple decision parameters (e.g., RSS, bandwidth, network load, delay, delay jitter, packet loss ratio, movement speed, service price, and energy consumption) and study HWN access selection with mathematical models such as multiple attribute decision-making [6, 7], utility theory [8, 9], game theory [10, 11], Markov decision processes [12, 13], and fuzzy logic [14, 15]. These algorithms are mainly based on the "always best connected" principle [16]: they combine the decision parameters into a composite score for each candidate network, rank the candidates, and select the network with the highest score.

Although the above algorithms solve the network access problem and can select a suitable network for users, they do not optimize the performance of the HWN as a whole, which hinders the efficient use of overall network resources. Moreover, they do not compute a specific resource allocation. In addition, as the numbers of decision parameters and users increase, these algorithms become increasingly complex [17]. This paper is therefore motivated by two goals: selecting an appropriate network for each user, and optimizing the utilization of wireless resources to improve overall network performance.

In this paper, an optimal resource allocation access selection algorithm (ORAAS) is proposed. First, the transmission rate model of a wireless link is analyzed, and the problem is modeled as maximizing the network transmission rate. Then, dynamic programming is used to calculate the optimal bandwidth allocated to each user at the maximum network transmission rate. Finally, users are connected to appropriate networks according to the bandwidth allocated by each network and the users' QoS rate requirements. The main contributions of this paper are as follows:
(1) The target problem is divided into several interrelated stages according to the number of users, so that maximizing the system transmission rate for all users is broken down into subproblems of maximizing the system transmission rate for different groups of users.
(2) When each subproblem is solved, the bandwidth allocation state of the previous user is used to calculate the optimal bandwidth allocation to the next user, so that the optimal bandwidth allocation for each user at the maximum system transmission rate is obtained step by step.
(3) The maximum network transmission rate and the optimal user bandwidth allocation are obtained simultaneously.
(4) HWN access selection simulations show that the proposed algorithm improves network transmission performance and resource utilization efficiency.

The rest of this paper is organized as follows: Section 2 reviews the research work related to this article, Section 3 describes the system model and problem definition in detail, Section 4 solves the problem and designs the algorithm, Section 5 introduces in detail the experimental environment used for the performance analysis and discusses the experimental results, and Section 6 summarizes the paper and introduces further research work.

2. Related Work

According to the optimization objective, resource allocation in HWNs is currently divided into user-oriented utility optimization and system-oriented utility optimization [18, 19]. User-oriented utility optimization mainly uses parameters related to user QoS, such as RSS, bandwidth, blocking rate, and price, to evaluate network performance, and allocates resources so as to optimize user utility [20]. System-oriented utility optimization aims to maximize the utility of the HWN through resource allocation, traffic flow scheduling, load allocation, and other means during access selection, for example by maximizing the number of admitted users, the network capacity, or the energy savings [21].

Jiang et al. [22] set different delay constraints for different services and designed a new network selection strategy for an HWN environment that incorporates delay into the calculation of the transmission rate. Users can then access an appropriate network according to the delay requirements of their services. The algorithm greatly improves HWN throughput and guarantees QoS.

Niyato and Hossain [23, 24] proposed a game theory framework for bandwidth allocation and access control in HWNs. The optimal bandwidth allocation is obtained by calculating the Nash equilibrium of noncooperative games, which maximizes the utility of all connections in the network. Then, based on the obtained bandwidth allocation, the capacity reservation threshold is calculated by a bargaining game to meet the QoS requirements of different types of connections.

BenMimoune et al. [25] proposed an access selection and power allocation scheme for HWNs based on the Voronoi diagram to increase the number of admitted users and the system throughput, which reduces the call blocking rate and improves energy efficiency.

Choi et al. [26] studied the interface and frequency band selection of mobile users and the power allocation of selected links in an HWN environment. In addition, the authors analyzed the parallel transmission of multiradio access, proposed a distributed joint allocation algorithm to maximize the total system capacity, and solved the bandwidth and power allocation problems of access links.

Wang et al. [27] studied the joint optimization of user association and resource allocation in HWNs. First, user association and subchannel allocation under a fixed power allocation are solved using graph theory; then, power allocation under fixed user association and subchannel allocation is solved using a convex approximation approach. Finally, the authors showed through simulation that the proposed algorithm improves overall network throughput.

This paper proposes an access selection algorithm based on optimal resource allocation. The algorithm calculates the optimal value of bandwidth resources allocated to each user at the maximum network transmission rate and connects users to the appropriate network according to the user’s QoS rate requirements. The algorithm improves the network performance and resource utilization.

3. System Model and Problem Definition

3.1. System Model

Assume that an HWN includes $N$ radio access networks (RANs) with overlapping signal coverage and that the wireless signals do not interfere with one another. Each RAN has its own architecture, and the networks communicate with one another through core network interconnection [28]. $M$ mobile user terminals are randomly distributed within the signal range of the HWN, and the terminals are multimode terminals capable of handling all radio access technologies. In addition, a user terminal can access one or more wireless networks in parallel at the same time, and access selection is controlled by the user terminal. The HWN selection scenario considered in this paper is shown in Figure 1.

For convenience, the main mathematical symbols used in this paper and their descriptions are listed in Table 1.

3.2. Problem Definition

Assume that user $i$ is connected to wireless network $j$ and that the bandwidth allocated to user $i$ in network $j$ is $b_{ij}$. According to the Shannon capacity formula [22, 26], the maximum data transmission rate that user $i$ can reach in network $j$ is

$$r_{ij} = \eta_j b_{ij} \log_2\left(1 + \frac{p_{ij}}{n_0 b_{ij}}\right) \tag{1}$$

In Formula (1), $\eta_j$ is the network effectiveness coefficient of network $j$, which can be set according to the characteristics of each network; for example, a network with better coding has a higher $\eta_j$ than networks that use traditional coding. In this paper, $\eta_j$ represents the bandwidth efficiency of each wireless network, with $0 < \eta_j \le 1$. In addition, $b_{ij}$ is the bandwidth obtained by user $i$ in network $j$, $p_{ij}$ is the transmission power of user $i$ in network $j$, and $n_0$ is the thermal noise power spectral density.
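A minimal Python sketch of Formula (1) makes the shape of the rate model concrete. It assumes the form $r = \eta b \log_2(1 + p/(n_0 b))$ with all quantities in consistent units (for example, $b$ in Hz and $n_0$ in W/Hz, giving bit/s); the function name `link_rate` and the numeric inputs are illustrative. The rate grows with bandwidth but saturates at $\eta p / (n_0 \ln 2)$, so each additional unit of bandwidth is worth less than the previous one.

```python
import math


def link_rate(b, eta, p, n0):
    """Rate of one link under the assumed form of Formula (1).

    b: allocated bandwidth, eta: network effectiveness coefficient,
    p: power term of the link, n0: thermal noise power spectral density.
    """
    if b <= 0:
        return 0.0  # no bandwidth, no rate (also avoids division by zero)
    return eta * b * math.log2(1.0 + p / (n0 * b))


if __name__ == "__main__":
    eta, p, n0 = 0.95, 1.0, 1.0  # hypothetical; 0.95 is the WLAN coefficient of Section 5.1
    for b in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"b={b:4.1f}  r={link_rate(b, eta, p, n0):.4f}")
    # Upper bound reached as b grows without limit.
    print("limit:", eta * p / (n0 * math.log(2)))
```

For a multimode terminal, the user's total rate is simply the sum of `link_rate` over the networks it is connected to.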

Because user terminals support multilink parallel access, Formula (1) gives the maximum data transmission rate that user $i$ can achieve as

$$r_i = \sum_{j=1}^{N} r_{ij} = \sum_{j=1}^{N} \eta_j b_{ij} \log_2\left(1 + \frac{p_{ij}}{n_0 b_{ij}}\right) \tag{2}$$

Since bandwidth allocation affects users' transmission rates, from the perspective of the whole HWN the goal is to allocate bandwidth to the users of each wireless network so that the HWN reaches its maximum total transmission rate. The problem is therefore defined as

$$\max_{\{b_{ij}\}} \sum_{i=1}^{M} \sum_{j=1}^{N} r_{ij} \quad \text{s.t.} \quad \sum_{i=1}^{M} b_{ij} \le B_j,\ j = 1, \dots, N; \qquad b_{ij} \ge 0,\ \forall i, j \tag{3}$$

The first constraint in Formula (3) states that the total bandwidth allocated to all users in network $j$ must not exceed the total bandwidth $B_j$ of network $j$, and the second states that the bandwidth allocated to each user must be nonnegative.

4. Problem Solving and Algorithm Design

In this section, the problem is first analyzed using the principle of dynamic programming. Next, the calculation of the optimal bandwidth allocation that maximizes the network rate is detailed. Finally, pseudocode is given for the access selection algorithm based on optimal bandwidth resource allocation.

4.1. Concepts of Dynamic Programming Theory

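According to the problem definition in the previous section, making the HWN reach its maximum transmission rate amounts to making every radio access network in the HWN reach its maximum transmission rate, because the constraints in Formula (3) couple the bandwidth variables only within each network. The maximum data transmission rate of network $j$ can therefore be expressed as

$$R_j = \max_{b_{1j}, \dots, b_{Mj}} \sum_{i=1}^{M} r_{ij} \quad \text{s.t.} \quad \sum_{i=1}^{M} b_{ij} \le B_j, \quad b_{ij} \ge 0 \tag{4}$$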

To make the network reach its maximum transmission rate, the optimal bandwidth allocation to each user in the network must be found. This paper solves this problem with dynamic programming, using the following key terms:
(i) Stage: to find the optimal bandwidth allocation that maximizes the network transmission rate, the problem is divided into several interrelated stages that are solved in a fixed order. For a given network, bandwidth allocation can be viewed as a sequential process: bandwidth is allocated first to user 1, then to user 2, and so on up to user $M$. The users therefore form $M$ stages, and stage $k$ allocates bandwidth to user $k$. This turns the solution into a multistage decision-making process.
(ii) State: in a dynamic programming model, the state describes the situation at the beginning of each stage. A stage usually has several possible states, and their set is called the reachable state set. Here, $s_k$ denotes the state variable of stage $k$, namely the bandwidth that remains assignable to user $k$ and the users after it, and $S_k$ denotes the reachable state set of stage $k$. Since the remaining bandwidth can never exceed the total bandwidth, $0 \le s_k \le B_j$, so $S_k = [0, B_j]$. By definition, $s_k$ is both the starting bandwidth state of user $k$ (stage $k$) and the end state after user $k-1$ has completed its allocation (stage $k-1$). Moreover, once $s_k$ is determined, the allocation to user $k$ and all following users is not affected by the decisions made for the users before user $k$.
(iii) Decision: a decision is the action taken in a particular state of a given stage of a multistage decision-making process. Here, the decision variable $x_k(s_k)$ is the bandwidth allocated to user $k$ when the bandwidth state is $s_k$; for network $j$, $x_k = b_{kj}$. The decision is a function of the state $s_k$ and is generally not unique but can take any value within a certain range. Let $D_k(s_k)$ denote the set of decisions allowed for user $k$ in state $s_k$, so that $x_k(s_k) \in D_k(s_k)$. For example, in state $s_k$, $D_k(s_k) = \{x_k \mid 0 \le x_k \le s_k\}$.
(iv) Policy: a policy is the ordered set of the above decisions. The bandwidth allocation subpolicy from user $k$ to user $n$ is the sequence of allocation decisions for these users, namely

$$\pi_{k,n}(s_k) = \{x_k(s_k), x_{k+1}(s_{k+1}), \dots, x_n(s_n)\}, \quad 1 \le k \le n \le M \tag{5}$$

When $n = M$ in Formula (5), it represents the subpolicy made up of the decisions from user $k$ to user $M$, denoted $\pi_{k,M}(s_k)$. When, in addition, $k = 1$, the decision sequence is the policy of the whole process, denoted $\pi_{1,M}(s_1)$, that is, the ordered set of bandwidth allocation decisions for all users in the network:

$$\pi_{1,M}(s_1) = \{x_1(s_1), x_2(s_2), \dots, x_M(s_M)\} \tag{6}$$

Since there are many different policies for allocating bandwidth to all users, the optimal policy is the one that maximizes the network rate among all policies in the policy set $\Pi_{1,M}(s_1)$. It is denoted $\pi^*_{1,M}$:

$$\pi^*_{1,M}(s_1) = \arg\max_{\pi_{1,M} \in \Pi_{1,M}(s_1)} \sum_{k=1}^{M} r_{kj} \tag{7}$$

(v) State transition equation: the state transition equation describes the transition between two adjacent states, from stage $k$ to stage $k+1$; that is, the state of stage $k+1$ is determined jointly by the state and the decision of stage $k$:

$$s_{k+1} = T_k(s_k, x_k) \tag{8}$$

Here, if the state of user $k$ is $s_k$ and the bandwidth allocated to user $k$ is $x_k$, the state of user $k+1$ is

$$s_{k+1} = s_k - x_k \tag{9}$$

(vi) Index function and optimal index function value: $V_{k,n}(s_k, \pi_{k,n})$ denotes the sum of the transmission rates from user $k$ to user $n$. According to Formula (4) and the above definitions, $V_{k,M}$ depends on the state $s_k$ and the subpolicy $\pi_{k,M}$, and satisfies the recurrence relation

$$V_{k,M}(s_k, \pi_{k,M}) = v_k(s_k, x_k) + V_{k+1,M}(s_{k+1}, \pi_{k+1,M}) \tag{10}$$

In Formula (10), $v_k(s_k, x_k)$ is the stage index value of user $k$, that is, the rate $r_{kj}$ that user $k$ reaches after receiving bandwidth $x_k$. The optimal index function value, achieved by the optimal bandwidth allocation policy from user $k$ to user $M$, is denoted $f_k(s_k)$:

$$f_k(s_k) = \max_{\pi_{k,M} \in \Pi_{k,M}(s_k)} V_{k,M}(s_k, \pi_{k,M}) \tag{11}$$

With these definitions, determining the maximum transmission rate of a wireless network becomes the multistage decision-making process shown in Figure 2.

Based on the problem defined in Formula (4) and the principle of optimality of dynamic programming, the following theorem gives a necessary and sufficient condition for a user bandwidth allocation policy to be optimal.

Theorem 1. Assume that there are $M$ users in the signal range of a network, indexed by $k = 1, 2, \dots, M$. A bandwidth allocation policy $\pi^*_{1,M}$ enables the network to reach the maximum transmission rate if and only if, for any user $k$ with $1 < k \le M$,

$$V_{1,M}(s_1, \pi^*_{1,M}) = \max_{\pi_{1,k-1} \in \Pi_{1,k-1}(s_1)} \left\{ V_{1,k-1}(s_1, \pi_{1,k-1}) + \max_{\pi_{k,M} \in \Pi_{k,M}(\tilde{s}_k)} V_{k,M}(\tilde{s}_k, \pi_{k,M}) \right\} \tag{12}$$

In Formula (12), $s_1 = B_j$, and $\tilde{s}_k$ is the state of user $k$ determined by the given initial bandwidth state $s_1$ and the subpolicy $\pi_{1,k-1}$ from user 1 to user $k-1$.
See Appendix A for the proof of the theorem.

Inference 2. If the bandwidth allocation policy $\pi^*_{1,M}$ is the optimal policy that maximizes the network transmission rate, then for any user $k$ with $1 < k \le M$, its subpolicy $\pi^*_{k,M}$ must be the optimal policy for the subprocess from user $k$ to user $M$ that starts from state $s^*_k$, where $s^*_k$ is determined by $s_1$ and $\pi^*_{1,k-1}$.

See Appendix B for the proof of the inference.
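The principle behind Theorem 1 and Inference 2 can be checked numerically on a toy instance. The Python sketch below is illustrative only: it uses a hypothetical three-user network with $\eta_j = n_0 = 1$, bandwidth counted in integer grid units, and 0-based user indices. It evaluates the optimal index function $f_k(s)$ top-down by memoized recursion, in the spirit of Formula (11), and compares $f$ at the initial state with an exhaustive search over every allocation that satisfies the bandwidth constraint.

```python
import itertools
import math
from functools import lru_cache

# Toy instance (hypothetical): three users, eta_j = n0 = 1, B_j = 7 grid units.
POWERS = (4.0, 2.0, 1.0)
UNITS = 7


def v(k, x):
    """Stage rate of user k with x bandwidth units (assumed form of Formula (1))."""
    return 0.0 if x == 0 else x * math.log2(1.0 + POWERS[k] / x)


@lru_cache(maxsize=None)
def f(k, s):
    """Optimal index value: best total rate of users k..M-1 with s units left."""
    if k == len(POWERS):
        return 0.0  # boundary condition: no users left, no rate
    # Try every admissible decision 0 <= x <= s and keep the best continuation.
    return max(v(k, x) + f(k + 1, s - x) for x in range(s + 1))


def brute_force():
    """Exhaustive search over all allocations with total bandwidth <= UNITS."""
    best = 0.0
    for xs in itertools.product(range(UNITS + 1), repeat=len(POWERS)):
        if sum(xs) <= UNITS:
            best = max(best, sum(v(k, x) for k, x in enumerate(xs)))
    return best


print(f(0, UNITS), brute_force())
```

Both searches reach the same optimum (here 7, from the allocation 4, 2, 1), but the recursion only ever solves the smaller "remaining users, remaining bandwidth" subproblems, which is what Inference 2 licenses.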

4.2. Calculation of Optimal Bandwidth Allocation Value

In the previous section, the stages, states, decisions, and policies were defined, and the problem of determining the maximum network rate was divided into interrelated stages according to the number of users. A large problem is thus broken down into a series of subproblems of the same type, which are solved one by one. This section determines each user's optimal bandwidth allocation and the maximum achievable network rate from the state transition equation, the recurrence relation between adjacent stages, and the boundary condition.

Problems based on a dynamic programming model can be solved with the reverse order method or the sequential method; this paper uses the reverse order method. Starting from the boundary condition $f_{M+1}(s_{M+1}) = 0$ at $k = M$, the optimal bandwidth allocated in each stage and the corresponding maximum network rate are obtained step by step, from the last stage back to the first. Solving the subproblem of each stage uses the optimal solution of the subproblem solved in the previous step, that is, of stage $k+1$. When the recursion reaches user 1 and $f_1(s_1)$ is determined, the maximum network rate for all users is obtained. According to Formula (4), the index function is the sum of the rates of all users in the network, and the basic equation of the reverse order solution is

$$\begin{cases} f_k(s_k) = \max\limits_{x_k \in D_k(s_k)} \left\{ v_k(s_k, x_k) + f_{k+1}(s_{k+1}) \right\}, & k = M, M-1, \dots, 1 \\ f_{M+1}(s_{M+1}) = 0 \end{cases} \tag{13}$$

To find the bandwidth allocation at the maximum network transmission rate with the reverse order solution, the theorem and inference of the previous section are applied starting from the last user $M$:

$$f_M(s_M) = \max_{x_M \in D_M(s_M)} v_M(s_M, x_M) \tag{14}$$

where $D_M(s_M)$ is the set of bandwidth decisions for user $M$ determined by the bandwidth state $s_M$. Solving this problem gives the optimal allocation $x^*_M(s_M)$ of user $M$ and the maximum network transmission rate $f_M(s_M)$ when only user $M$ is included. Because the rate in Formula (1) increases with bandwidth, the last user receives all of the remaining bandwidth, that is, $x^*_M(s_M) = s_M$.

For the sum of the transmission rates from user $M-1$ to user $M$,

$$f_{M-1}(s_{M-1}) = \max_{x_{M-1} \in D_{M-1}(s_{M-1})} \left\{ v_{M-1}(s_{M-1}, x_{M-1}) + f_M(s_M) \right\} \tag{15}$$

where $s_M = s_{M-1} - x_{M-1}$ by Formula (9), and the function $f_M$ was obtained in the previous step for user $M$. Solving this problem gives the optimal allocation $x^*_{M-1}(s_{M-1})$ of user $M-1$ and the maximum network transmission rate $f_{M-1}(s_{M-1})$ when users $M-1$ and $M$ are included.

Similarly, for the sum of the transmission rates from user $k$ to user $M$,

$$f_k(s_k) = \max_{x_k \in D_k(s_k)} \left\{ v_k(s_k, x_k) + f_{k+1}(s_{k+1}) \right\} \tag{16}$$

where $s_{k+1} = s_k - x_k$; the allocation $x^*_k(s_k)$ of user $k$ and the maximum network transmission rate $f_k(s_k)$ for users $k$ to $M$ are obtained from the result $f_{k+1}$ of the previous step.

Continuing back to user 1, for the sum of the transmission rates from user 1 to user $M$,

$$f_1(s_1) = \max_{x_1 \in D_1(s_1)} \left\{ v_1(s_1, x_1) + f_2(s_2) \right\} \tag{17}$$

where $s_2 = s_1 - x_1$. Solving for the maximum of this expression gives the bandwidth allocation of user 1 and the maximum transmission rate of the entire network.

Since the total bandwidth of the network is known, the initial state $s_1 = B_j$ is known, so $x^*_1(s_1)$ and $f_1(s_1)$ can be determined. The state $s_2 = s_1 - x^*_1(s_1)$ then follows from Formula (9), so $x^*_2(s_2)$ and $f_2(s_2)$ can also be determined. Proceeding in the direction opposite to the recursion above, the bandwidth allocation decision and transmission rate of each user are determined step by step, and $f_1(B_j)$ is the maximum network transmission rate.
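The reverse-order recursion and the forward read-out described above can be sketched in Python for a single network. This is a minimal sketch, not the authors' MATLAB code: it assumes the rate form of Formula (1), treats each user's power term `powers[k]` as given, and discretizes bandwidth with an increment `step`, as the simulation does. `f[k][s]` holds $f_k$ at grid state $s$, `best[k][s]` holds $x^*_k(s)$, and the forward pass applies Formula (9).

```python
import math


def stage_rate(b, eta, p, n0):
    """v_k: rate of one user with bandwidth b (assumed form of Formula (1))."""
    if b <= 0:
        return 0.0
    return eta * b * math.log2(1.0 + p / (n0 * b))


def dp_allocate(total_bw, powers, eta, n0, step):
    """Reverse-order dynamic programming for one network (sketch).

    total_bw: B_j, powers: power term of each user in this network,
    eta: effectiveness coefficient of the network, step: grid increment.
    Returns (maximum network rate f_1(B_j), bandwidth allocated to each user).
    """
    m = len(powers)
    n_grid = int(round(total_bw / step))  # grid states 0, step, ..., n_grid * step
    # f[k][s]: best total rate of users k..m-1 with s grid units left.
    # Row m is the boundary condition f_{M+1}(s) = 0.
    f = [[0.0] * (n_grid + 1) for _ in range(m + 1)]
    best = [[0] * (n_grid + 1) for _ in range(m)]  # x*_k(s), in grid units
    for k in range(m - 1, -1, -1):  # reverse order: last user first
        for s in range(n_grid + 1):
            top, arg = -1.0, 0
            for x in range(s + 1):  # admissible decisions 0 <= x_k <= s_k
                val = stage_rate(x * step, eta, powers[k], n0) + f[k + 1][s - x]
                if val > top:
                    top, arg = val, x
            f[k][s], best[k][s] = top, arg
    # Forward read-out from s_1 = B_j using s_{k+1} = s_k - x_k (Formula (9)).
    s, alloc = n_grid, []
    for k in range(m):
        x = best[k][s]
        alloc.append(x * step)
        s -= x
    return f[0][n_grid], alloc
```

For example, `dp_allocate(11.0, [10.0, 1.0], eta=1.0, n0=1.0, step=1.0)` allocates 10 and 1 units. Under the assumed form of Formula (1), equalizing the marginal rates implies that the continuous optimum gives each user bandwidth proportional to its power term, $b_k = B_j p_k / \sum_l p_l$; if the power term already reflects path loss, this matches the observation in Section 5.2 that users with better channels receive more bandwidth. The cost is $O(MK^2)$ rate evaluations per network, with $K = B_j/\Delta$ grid points.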

Because both the bandwidth state and the bandwidth allocation value are continuous real variables, the reachable state sets and the decision sets are sets of real numbers, and the reachable state set of each stage is the closed interval $[0, B_j]$. In the simulation, the problem is therefore discretized with a suitable increment $\Delta$ before calculation. In stage $k$, the calculation is carried out on the grid points $s_k = 0, \Delta, 2\Delta, \dots, K\Delta$, where $K$ is a positive integer satisfying

$$K\Delta = B_j \tag{18}$$

4.3. Design of Access Selection Algorithm Based on Optimal Bandwidth Resource Allocation

Once the bandwidth allocated to each user by each network has been solved, the transmission rate $r_{ij}$ that user $i$ reaches in network $j$ follows from Formula (1). Let $r_{ij}^{\min}$ be the minimum rate requirement of user $i$ in network $j$. If network $j$ cannot meet this requirement (i.e., $r_{ij} < r_{ij}^{\min}$), user $i$ does not access network $j$; otherwise, user $i$ accesses network $j$. In particular, if several networks satisfy the rate requirement after allocating bandwidth to the user, the user accesses these networks simultaneously in parallel. The pseudocode of the access selection algorithm based on optimal bandwidth resource allocation is as follows:

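Initialization: N, M, B_j, Δ, r_ij^min, f_{M+1}(·) = 0
for j = 1 to N do
    for k = M down to 1 do
        s_k ← 0
        while s_k ≤ B_j do
            f_k(s_k) ← max over 0 ≤ x_k ≤ s_k of {v_k(s_k, x_k) + f_{k+1}(s_k − x_k)}
            store f_k(s_k) and x*_k(s_k)
            s_k ← s_k + Δ
        end while
    end for
    s_1 ← B_j
    output f_1(s_1)    (maximum transmission rate of network j)
    for k = 1 to M do
        read out x*_k(s_k)
        according to Formula (1), calculate and output r_kj
        if r_kj ≥ r_kj^min then
            user k accesses network j
        else
            user k does not access network j
        end if
        s_{k+1} ← s_k − x*_k(s_k)
    end for
end for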
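The read-out loop at the end of the pseudocode reduces to a per-link threshold test once the allocations are known. The Python sketch below assumes the dynamic program has already produced `alloc[i][j]` (bandwidth of user $i$ in network $j$) and, for simplicity, applies a single minimum rate `r_min` to every link; the function and parameter names are illustrative. A user that passes the test in several networks is connected to all of them in parallel.

```python
import math


def link_rate(b, eta, p, n0):
    """Assumed form of Formula (1); zero bandwidth gives zero rate."""
    if b <= 0:
        return 0.0
    return eta * b * math.log2(1.0 + p / (n0 * b))


def select_access(alloc, etas, powers, n0, r_min):
    """Return, for each user, the networks whose allocated rate meets r_min.

    alloc[i][j]: bandwidth given to user i by network j (output of the DP)
    etas[j]: effectiveness coefficient of network j
    powers[i][j]: power term of user i in network j
    """
    decisions = []
    for i, row in enumerate(alloc):
        chosen = []
        for j, b in enumerate(row):
            r = link_rate(b, etas[j], powers[i][j], n0)
            if r >= r_min:  # requirement met: user i accesses network j
                chosen.append(j)
        decisions.append(chosen)  # two or more entries means parallel access
    return decisions
```

An empty list for a user means that no network can meet its requirement with the bandwidth it was allocated.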

5. Experiment and Performance Analysis

This section analyzes the performance of the algorithm proposed in this paper. First, the experimental environment is described, including the simulations and experimental parameter settings. Then, the optimal resource allocation access selection algorithm (ORAAS) is analyzed. Finally, the ORAAS algorithm is compared with the RSS algorithm.

5.1. Setting of Experimental Parameters

The simulated network environment includes one LTE eNB node, one WLAN AP node, and one WiMAX BS node. The total bandwidth of the WLAN access point is , and it is located at (0, 0); the total bandwidth of the LTE access point is , and it is located at (200, 0); the total bandwidth of the WiMAX access point is , and it is located at (0, 200). The network effectiveness coefficients of WLAN, LTE, and WiMAX are set to 0.95, 0.9, and 0.85, respectively. In addition, the user transmit power is set to 20 mW, the thermal noise power spectral density to , and the discretization increment to 0.02.

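To calculate the RSS between an access point and a user, the experiment uses an improved model based on the COST-231 Hata model to calculate the path loss $L$ (dB). According to [29], the COST-231 Hata model is defined as

$$L = 46.3 + 33.9\log_{10} f - 13.82\log_{10} h_b - a(h_m) + (44.9 - 6.55\log_{10} h_b)\log_{10} d + C_M \tag{19}$$

in which

$$a(h_m) = (1.1\log_{10} f - 0.7)h_m - (1.56\log_{10} f - 0.8), \qquad C_M = \begin{cases} 0\ \text{dB}, & \text{medium-sized cities and suburban areas} \\ 3\ \text{dB}, & \text{metropolitan centers} \end{cases} \tag{20}$$

Here, $f$, $h_b$, $h_m$, and $d$ denote the frequency (MHz), the height of the access point above the ground (m), the height of the user above the ground (m), and the distance between the access point and the user (km), respectively. This experiment sets , , and . Additionally, $a(h_m)$ and $C_M$ take their values for small cities, and the simplified path loss model is obtained as follows: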

5.2. Performance Analysis of Algorithms

Five users are first randomly generated within a circular area with a radius of 100 m, taking (100, 100) as the center of the circle. The coordinates for the five users are User 1 (101.97, 183.39), User 2 (104.37, 32.04), User 3 (97.53, 104.91), User 4 (54.97, 117.17), and User 5 (152.78,124.74) (Figure 3).

The distance from each user to each network access point is calculated from their coordinates, and the corresponding path loss is then calculated with Formula (21); the results are listed in Table 2.
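The distance computation can be reproduced from the coordinates given in Sections 5.1 and 5.2. The Python sketch below computes the user-to-access-point distances and implements the textbook COST-231 Hata formula (Formula (19) with the corrections of Formula (20)); it does not implement the simplified Formula (21). The function names are illustrative. Note that COST-231 Hata is nominally specified for roughly 1500-2000 MHz and 1-20 km links, so applying it at the distances of a few hundred meters used here is an extrapolation.

```python
import math

# Coordinates in meters, taken from Sections 5.1 and 5.2.
ACCESS_POINTS = {"WLAN": (0.0, 0.0), "LTE": (200.0, 0.0), "WiMAX": (0.0, 200.0)}
USERS = [(101.97, 183.39), (104.37, 32.04), (97.53, 104.91),
         (54.97, 117.17), (152.78, 124.74)]


def distance_m(a, b):
    """Euclidean distance between two points, in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cost231_hata(f_mhz, hb_m, hm_m, d_km, metropolitan=False):
    """Textbook COST-231 Hata path loss in dB (Formulas (19)-(20))."""
    lf = math.log10(f_mhz)
    # Mobile antenna height correction for small and medium-sized cities.
    a_hm = (1.1 * lf - 0.7) * hm_m - (1.56 * lf - 0.8)
    c_m = 3.0 if metropolitan else 0.0
    return (46.3 + 33.9 * lf - 13.82 * math.log10(hb_m) - a_hm
            + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km) + c_m)


# One row per user, one column per access point (WLAN, LTE, WiMAX), in meters.
DIST = [[distance_m(u, ap) for ap in ACCESS_POINTS.values()] for u in USERS]
for i, row in enumerate(DIST, start=1):
    print(f"User {i}: " + ", ".join(f"{d:7.2f} m" for d in row))
```

With the experiment's values of $f$, $h_b$, and $h_m$, the path loss from user $i$ to access point $j$ would be `cost231_hata(f, hb, hm, DIST[i][j] / 1000)`, since the formula expects kilometers.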

After the path loss $L$ is obtained, the optimal bandwidth allocation and the maximum data transmission rate are calculated for each user. Figures 4-6 show how the capacity of the wireless link between each user and each network changes with the allocated bandwidth. The capacity of each user link increases with bandwidth, but with a diminishing marginal gain that reflects the logarithm in Formula (1). At the maximum HWN capacity, the method in Section 4.2 yields the optimal bandwidth allocation of the WLAN, LTE, and WiMAX networks for each user (marked in each figure). Table 2 and Figures 4-6 show that, for the same access point, the closer a user is to the access point, the better its channel quality and the more bandwidth it is allocated. This is because, within the same network, allocating a unit of bandwidth to a user with good channel quality produces a larger rate gain than allocating it to a user with poor channel quality, which is consistent with a rate that increases logarithmically with diminishing marginal benefit. Therefore, the proposed bandwidth allocation policy makes the best use of the bandwidth resources of each network.

According to Figures 4-6 and Formula (1), the bandwidth allocation values (MHz) and the maximum transmission rates (Mbps) that each user can achieve in the different networks are listed in Table 3.

From the maximum transmission rate each user can achieve in each network and the users' minimum rate requirements, the network access of each user, the utilization of network resources, and the maximum total system capacity can be obtained. In Table 4, "√" indicates that the user is connected to the network and "×" that it is not. When the user's minimum rate requirement is , User 1 connects to WiMAX; User 2 connects to WLAN and LTE; User 3 connects to WLAN, LTE, and WiMAX; User 4 connects to WLAN and WiMAX; and User 5 connects to LTE.

As can be seen (Table 5), the total bandwidth allocated to users (i.e., User 2, User 3, and User 4) accessing the WLAN is , the unused bandwidth of the WLAN network is 1.84 MHz, the resource utilization ratio is 90.8%, and the maximum rate that the WLAN network can achieve is . Users accessing LTE network include User 2, User 3, and User 5. The allocated total bandwidth is , the unused bandwidth of LTE is 2.30 MHz, the resource utilization ratio of the LTE network is 90.8%, and the capacity of the LTE network is . Users accessing the WiMAX network include User 1, User 3, and User 4. The allocated total bandwidth is , the unused bandwidth of WiMAX is 0.84 MHz, the resource utilization ratio of the WiMAX network is 91.6%, and the capacity of the WiMAX network is .
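The network totals behind these utilization figures can be backed out with simple arithmetic, assuming that utilization is defined as allocated bandwidth divided by total bandwidth, so that total = unused / (1 - utilization). With the rounded percentages reported above, this gives roughly 20 MHz (WLAN), 25 MHz (LTE), and 10 MHz (WiMAX) in total, of which about 18.16, 22.70, and 9.16 MHz are allocated. These are inferences from rounded values, not figures stated in the text.

```python
# Unused bandwidth (MHz) and utilization ratio reported for each network.
REPORTED = {"WLAN": (1.84, 0.908), "LTE": (2.30, 0.908), "WiMAX": (0.84, 0.916)}


def implied_bandwidth(unused_mhz, utilization):
    """Back out (total, allocated) assuming utilization = allocated / total."""
    total = unused_mhz / (1.0 - utilization)
    return total, total - unused_mhz


for name, (unused, util) in REPORTED.items():
    total, allocated = implied_bandwidth(unused, util)
    print(f"{name:6s} total ~ {total:.2f} MHz, allocated ~ {allocated:.2f} MHz")
```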

5.3. Performance Comparison of Algorithms

This section examines how the capacities of the individual wireless access networks (WLAN, LTE, and WiMAX) and of the HWN change with the number of users, and compares the proposed ORAAS algorithm with the RSS algorithm. In ORAAS, users access the networks that can meet their rate requirements given the bandwidth allocated to them. In the RSS algorithm, each user measures the RSS of all networks and accesses the network with the highest RSS, and all users in the same network share its bandwidth equally. In the simulation, users are randomly generated in the area covered by the three networks, within a circle of radius 100 m centered at (100, 100) (Figure 7); the number of users grows in steps of five up to 50. To speed up the calculation, the discretization increment is set to 0.05.
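For comparison, the RSS baseline described above can be sketched in a few lines of Python. This is an illustrative reading of the text (strongest-RSS attachment, equal bandwidth sharing) with hypothetical inputs; the paper's actual RSS computation from the path loss model and the 20 mW transmit power is not reproduced here.

```python
def rss_baseline(rss, total_bw):
    """Strongest-RSS attachment with equal bandwidth sharing (illustrative sketch).

    rss[i][j]: RSS of network j measured by user i (e.g., in dBm; higher is stronger)
    total_bw[j]: total bandwidth B_j of network j
    Returns (chosen network index per user, bandwidth share per user).
    """
    n_nets = len(total_bw)
    # Each user attaches to the single network with the highest RSS.
    choice = [max(range(n_nets), key=lambda j: row[j]) for row in rss]
    # Count users per network; every chosen network has at least one user.
    counts = [choice.count(j) for j in range(n_nets)]
    # Users of the same network split its bandwidth evenly.
    shares = [total_bw[j] / counts[j] for j in choice]
    return choice, shares
```

Ties go to the lowest network index, and each user's share depends only on how many users picked the same network, not on its channel quality; this is the behavior the ORAAS allocation is designed to improve on.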

Table 6 lists the capacities of the ORAAS and RSS algorithms for different numbers of users. As the number of users increases, the capacity of each network increases under both algorithms. Because ORAAS dynamically allocates the optimal bandwidth to users, it achieves a higher system capacity than the RSS algorithm for the same number of users. With 50 users, the capacities of WLAN, LTE, WiMAX, and the HWN under ORAAS are 151.62 Mbps, 164.80 Mbps, 79.51 Mbps, and 395.92 Mbps, respectively, whereas under the RSS algorithm they are 137.08 Mbps, 143.56 Mbps, 68.59 Mbps, and 349.23 Mbps. ORAAS therefore performs better. The data in Table 6 are also used to fit the curves in Figure 8.
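The relative gains implied by the 50-user figures can be computed directly from the numbers above. The short Python sketch below also checks that the per-network capacities add up to the HWN capacity to within rounding.

```python
# Capacities at 50 users (Mbps), copied from the text above.
ORAAS = {"WLAN": 151.62, "LTE": 164.80, "WiMAX": 79.51, "HWN": 395.92}
RSS = {"WLAN": 137.08, "LTE": 143.56, "WiMAX": 68.59, "HWN": 349.23}

# Relative capacity gain of ORAAS over RSS, in percent.
gain = {name: 100.0 * (ORAAS[name] / RSS[name] - 1.0) for name in ORAAS}

# Consistency check: the three networks should add up to the HWN total (up to rounding).
parts = ("WLAN", "LTE", "WiMAX")
oraas_gap = abs(sum(ORAAS[n] for n in parts) - ORAAS["HWN"])
rss_gap = abs(sum(RSS[n] for n in parts) - RSS["HWN"])

for name, g in gain.items():
    print(f"{name:6s} +{g:.1f}%")
```

This gives gains of about 10.6% (WLAN), 14.8% (LTE), 15.9% (WiMAX), and 13.4% for the HWN as a whole.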

Figure 8 shows how system capacity changes as the number of users increases. When the number of users is small, the capacity of each network and of the HWN increases rapidly; once the number of users reaches a certain level, capacity growth gradually levels off, following a logarithmic trend. ORAAS outperforms the RSS algorithm in the same environment, both for each individual wireless network and for the HWN as a whole. This is because ORAAS allocates bandwidth according to each user's situation, and each user can make full use of bandwidth through multiaccess, so resources are used more efficiently and the total system capacity increases.

Figure 9 shows, for ORAAS, the ratio of users connected in parallel to the total number of users under different user rate requirements. For rate requirements of 1, 2, 3, 5, and 8 Mbps, the proportion of users connected in parallel gradually decreases as the number of users increases. Because the total bandwidth of each network is limited, more users means less bandwidth per user, and when a network cannot meet a user's rate requirement, the user does not connect to it. When the user rate requirement is low (e.g., when ), the proportion of users connected in parallel decreases slowly (Figure 9): each user's requirement is modest, so the network's total bandwidth can satisfy more users at once and more users can connect to the same network simultaneously. When the user rate requirement is high (e.g., when ), the limited total resources of the network can hardly meet the requirements of multiple users at once, so the proportion of users connected in parallel drops markedly; when the number of users reaches 20, it has fallen to 0.

6. Conclusions and Outlook

To optimize the transmission performance of HWNs in the access process, this paper proposes an access selection algorithm for HWNs based on optimal resource allocation, which models the target as the problem of maximizing the system transmission rate. The dynamic programming theory is adopted to determine the maximum value of the system transmission rate and the optimal solution of the user bandwidth resource allocation. Finally, according to the bandwidth resources allocated to users and QoS rate requirements, users are connected to appropriate networks.

This algorithm improves the transmission performance of the HWN system, but it is limited to the allocation of bandwidth resources. Future work should study the joint allocation of multiple network resources (e.g., bandwidth, power, and time slots) during access selection to further improve system performance and QoS.

Appendix

A. Proof of the Theorem

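Necessity. Assume that the bandwidth allocation policy $\pi^*_{1,M}$ is the optimal policy that maximizes the network transmission rate. Then

$$V_{1,M}(s_1, \pi^*_{1,M}) = \max_{\pi_{1,M} \in \Pi_{1,M}(s_1)} V_{1,M}(s_1, \pi_{1,M}) = \max_{\pi_{1,M} \in \Pi_{1,M}(s_1)} \left\{ V_{1,k-1}(s_1, \pi_{1,k-1}) + V_{k,M}(\tilde{s}_k, \pi_{k,M}) \right\} \tag{A.1}$$

For the subprocess from user $k$ to user $M$, the value of the objective function depends on the initial state $\tilde{s}_k$ of the subprocess and on the subpolicy $\pi_{k,M}$, and $\tilde{s}_k$ is determined by the preceding subprocess under the subpolicy $\pi_{1,k-1}$.
Finding the optimum over the policy set $\Pi_{1,M}(s_1)$ is therefore equivalent to first finding the optimum over the subpolicy set $\Pi_{k,M}(\tilde{s}_k)$ and then finding the best of these suboptimal solutions over the subpolicy set $\Pi_{1,k-1}(s_1)$. Formula (A.1) can thus be converted to

$$V_{1,M}(s_1, \pi^*_{1,M}) = \max_{\pi_{1,k-1} \in \Pi_{1,k-1}(s_1)} \max_{\pi_{k,M} \in \Pi_{k,M}(\tilde{s}_k)} \left\{ V_{1,k-1}(s_1, \pi_{1,k-1}) + V_{k,M}(\tilde{s}_k, \pi_{k,M}) \right\} \tag{A.2}$$

Since the first term in the braces does not depend on the subpolicy $\pi_{k,M}$,

$$V_{1,M}(s_1, \pi^*_{1,M}) = \max_{\pi_{1,k-1} \in \Pi_{1,k-1}(s_1)} \left\{ V_{1,k-1}(s_1, \pi_{1,k-1}) + \max_{\pi_{k,M} \in \Pi_{k,M}(\tilde{s}_k)} V_{k,M}(\tilde{s}_k, \pi_{k,M}) \right\} \tag{A.3}$$

which is Formula (12).

Sufficiency. Let $\pi_{1,M} = \{\pi_{1,k-1}, \pi_{k,M}\}$ be any policy, and let $\tilde{s}_k$ be the initial state of user $k$ determined by $\pi_{1,k-1}$. Then

$$V_{1,M}(s_1, \pi_{1,M}) = V_{1,k-1}(s_1, \pi_{1,k-1}) + V_{k,M}(\tilde{s}_k, \pi_{k,M}) \le V_{1,k-1}(s_1, \pi_{1,k-1}) + \max_{\pi_{k,M} \in \Pi_{k,M}(\tilde{s}_k)} V_{k,M}(\tilde{s}_k, \pi_{k,M}) \le \max_{\pi_{1,k-1} \in \Pi_{1,k-1}(s_1)} \left\{ V_{1,k-1}(s_1, \pi_{1,k-1}) + \max_{\pi_{k,M} \in \Pi_{k,M}(\tilde{s}_k)} V_{k,M}(\tilde{s}_k, \pi_{k,M}) \right\} \tag{A.4}$$

Therefore, as long as $\pi^*_{1,M}$ satisfies Formula (12), every policy $\pi_{1,M}$ satisfies

$$V_{1,M}(s_1, \pi_{1,M}) \le V_{1,M}(s_1, \pi^*_{1,M}) \tag{A.5}$$

Thus, $\pi^*_{1,M}$ is the optimal policy. The proof is complete.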

B. Proof of the Inference

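The inference is proved by contradiction.

Suppose that $\pi^*_{k,M}$ is not the optimal policy for the subprocess that starts from $s^*_k$. Then

$$V_{k,M}(s^*_k, \pi^*_{k,M}) < \max_{\pi_{k,M} \in \Pi_{k,M}(s^*_k)} V_{k,M}(s^*_k, \pi_{k,M}) \tag{B.1}$$

Therefore,

$$V_{1,M}(s_1, \pi^*_{1,M}) = V_{1,k-1}(s_1, \pi^*_{1,k-1}) + V_{k,M}(s^*_k, \pi^*_{k,M}) < V_{1,k-1}(s_1, \pi^*_{1,k-1}) + \max_{\pi_{k,M} \in \Pi_{k,M}(s^*_k)} V_{k,M}(s^*_k, \pi_{k,M}) \le \max_{\pi_{1,k-1} \in \Pi_{1,k-1}(s_1)} \left\{ V_{1,k-1}(s_1, \pi_{1,k-1}) + \max_{\pi_{k,M} \in \Pi_{k,M}(\tilde{s}_k)} V_{k,M}(\tilde{s}_k, \pi_{k,M}) \right\} \tag{B.2}$$

This contradicts the necessity part of the theorem (Formula (12)). The proof is complete.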

Data Availability

MATLAB code and experimental data can be downloaded from the following link. Link: https://pan.baidu.com/s/1I_yNS6X_ce8VvAzl1Pj98g; Password: 01q2.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Guangdong Basic and Applied Basic Research Foundation (2017A030307035, 2018B030311054, and 2020A1515011528) and the Innovative Research Project of the Education Department of Guangdong Province (2017KTSCX127).