Abstract
In this paper, we explore how a rotary-wing unmanned aerial vehicle (UAV) can act as an aerial millimeter-wave (mmWave) base station that provides both recharging service and radio access service in a post-disaster area with an unknown user distribution. The optimization problem is to find the optimal path, starting and ending at the same recharging point, that covers a wider area under a limited battery capacity; this problem can be transformed into an extended multi-armed bandit (MAB) problem. We propose two improved path planning algorithms that solve this problem and improve the ability to explore the unknown user distribution. Simulation results show that one of our algorithms outperforms its comparison algorithm in terms of the total number of served user equipment (UE), the number of visited grids, the amount of data, the average throughput, and the battery capacity utilization, while the other algorithm outperforms its comparison algorithm in terms of the number of visited grids.
1. Introduction
With the development of computers and robotics, various intelligent unmanned vehicles have been developed and applied in many fields. In the ocean, autonomous underwater vehicles (AUVs) have been used both as undersea mobile base stations (BSs) that assist the construction of ocean sensor networks and as underwater mobile collectors that facilitate information collection [1, 2]. On land, unmanned aerial vehicles (UAVs) play an important role in similar applications. In particular, UAVs offer significant advantages for rapidly building wireless emergency communication networks in post-disaster areas.
Disasters caused by earthquakes, floods, typhoons, tsunamis, and tornadoes are often unpredictable, and their sudden occurrence can damage terrestrial radio infrastructure and prevent victims from communicating with each other or with the outside world. Since the first 72 hours after a disaster are the most critical, wireless networks must be deployed for emergency communications to assist rescue workers and teams in the affected areas [3].
In such emergency communication scenarios, UAVs are well suited as platforms carrying small BSs to provide broadband wireless service in post-disaster areas [4, 5], where a rotary-wing UAV hovers over the target area and serves the terrestrial users within a certain horizontal distance. To serve more terrestrial users, one effective approach is to increase the number of UAVs for collaborative coverage, which is not always feasible because of the increased networking cost. An alternative is to fly a single UAV along a planned path, covering the target area and serving more terrestrial users.
A review of existing work shows that the main challenge in UAV path planning for emergency communications is that the UAV does not know the distribution of terrestrial users within the affected area. Therefore, the UAV cannot plan the optimal path in advance to maximize service efficiency. In addition, because of the limited capacity of UAV batteries [6], finding an energy-efficient flight trajectory is crucial in UAV-assisted emergency communications [7].
To address these challenges, the authors in [8] developed an online learning framework for a UAV’s path planning problem in post-disaster emergency communications, in which the UAV acts as a flying BS that serves terrestrial users along its planned path. In [8], the objective is to maximize the sum data rate under the UAV battery capacity constraint; the problem is modeled as an extended multi-armed bandit (MAB) problem [9], and two path planning schemes are proposed to gradually learn an energy-efficient path while serving terrestrial users within the UAV coverage.
All the above schemes operate in the traditional sub-6 GHz frequency bands, whose limited spectrum resources make it difficult to meet the demand for high-throughput communication [10, 11]. Since a terrestrial user may stay within a UAV’s coverage for only a short time each time, UAV-assisted emergency communications increasingly demand ultra-high-speed data transmission, which the sub-6 GHz bands cannot provide. The millimeter wave (mmWave) bands offer abundant spectrum and can support ultra-high-speed data transmission [12]. Therefore, UAV-assisted mmWave communication is becoming one of the most prominent technologies for emergency communication systems.
However, mmWave communications usually rely on beamforming to overcome inherent weaknesses such as severe path loss. A beamforming training mechanism is usually used to determine the beam orientation and beam width at both the transmitting and receiving ends [13]. Because the distribution of terrestrial users is unknown, it is difficult to determine how many response slots the beamforming training process requires. If the number of response slots is chosen arbitrarily, it is likely not to match the user density, resulting either in wasted response slots or in a high probability of collisions when terrestrial users compete for response slots. This is a new challenge that must be addressed in UAV-assisted mmWave networks.
In addition, the user equipment (UE) carried by a terrestrial user trapped in a post-disaster area may not be recharged in a timely manner, so it should conserve as much battery energy as possible. Especially for an energy-constrained sensing device [14] in a post-disaster area, the residual energy level is the most critical factor in maintaining communication with the outside world. With the popularization of terminals that support wireless charging, a UE can not only share battery energy with other UEs but also conveniently harvest radio frequency (RF) energy radiated by BSs [15]. Therefore, the UAV hovering above the target area can also act as a charger that recharges the UEs within its coverage area, and the recharged UEs can in turn use the harvested energy to send data to the BS on the UAV.
For each flight of the UAV, it is desirable both to find the most effective path that serves as many UEs as possible and to maximize each UE’s uplink throughput in UAV-assisted wireless powered mmWave networks for emergency communications. In practice, it is difficult to achieve both goals simultaneously when the distribution of terrestrial users is unknown. Therefore, this paper focuses on improving the uplink performance of the entire system while satisfying the UAV’s battery energy constraint and each UE’s throughput constraint. Our main contributions are as follows:
(1) We propose an effective performance optimization scheme for UAV-assisted wireless powered mmWave networks for emergency communications, which improves the uplink performance of the entire system while satisfying the UAV’s battery energy constraint and each UE’s throughput constraint, even when the distribution of terrestrial users is unknown.
(2) Compared with the scheme in [8], each UE in our scheme makes full use of the energy it harvests to send uplink data, and thus extends its battery life by saving its own battery energy.
(3) In our scheme, the number of response slots is adjusted according to the currently observed performance, so it matches the user density increasingly well as the number of online learning rounds increases.
The rest of this paper is organized as follows. Related work is reviewed in Section 2. The system model and problem statement are detailed in Section 3. The algorithms for solving the problem are described in Section 4. Performance evaluation results are presented in Section 5, and conclusions are drawn in Section 6.
2. Related Work
In recent years, UAVs have been applied increasingly widely, and research on UAV-assisted networks has become a hot topic. In [7], the authors gave an overview of UAV-assisted wireless communications and introduced the basic framework and channel characteristics through three use cases: UAVs for improving coverage quality, UAVs for information dissemination, and UAVs acting as BSs or relays. Based on a known distribution of terrestrial users, the scheme in [16] aims to minimize the total power consumption of a UAV through the joint design of the sensing nodes’ scheduling scheme, the power allocation strategy, and the UAV flight trajectory, while the work in [17] attempted to minimize the number of UAVs acting as flying BSs while guaranteeing coverage of the ground terminals.
Some works assumed that terrestrial users follow a certain probability distribution and analyzed the performance of UAV-assisted networks. The authors in [18] explored how to maximize the coverage probability and system throughput in a single-UAV-assisted downlink network underlaid with device-to-device (D2D) communications, in which the D2D users are distributed according to a homogeneous Poisson point process (PPP). The work in [19] is similar to [18] but considered a multi-UAV-assisted downlink network, in which the UAVs forward data from satellites to terrestrial vehicles in a post-disaster area. The authors in [20] designed a prototype of a UAV-assisted emergency Wi-Fi-based network, aimed at accelerating search and rescue operations and performing on-site surveillance over a post-disaster area. Using a stochastic geometry framework, they derived approximate analytical expressions for the outage probability. In a post-disaster area, however, the distribution of terrestrial users is not always known. As mentioned above, the authors in [8] developed an online learning framework to improve the performance of UAV-assisted wireless communications without knowledge of the distribution of ground terminals.
Some works focused on the connectivity of sensing networks in a post-disaster area. The authors in [21] explored the deployment of UAV-mounted BSs to assist rescue, where machine-to-machine (M2M) communication is used to establish connectivity and exchange rescue messages among human portable/wearable machine-type devices (MTDs), either in relay mode via a UAV or in direct mode. For emergency communications based on the heterogeneous Internet of Things, the authors in [22] explored how the link between the UAV and a terrestrial cellular device and the D2D links outside the UAV’s line-of-sight area can be accommodated in the same frequency band. There are also open research issues and challenges in UAV-assisted emergency networks, such as intelligent distributed optimization of UAV trajectories and scheduling, and interference management [23].
None of the above works involves mmWave communication technology. However, a UAV-enabled cellular architecture in the mmWave band has been recognized as one of the best solutions for on-demand high-data-rate service. Therefore, many existing works have focused on UAV-assisted mmWave communications [24–38].
The authors of [24] designed a hierarchical beamforming codebook with different beam widths to speed up beamforming training and tracking, aiming to enhance the capacity of mmWave UAV-assisted cellular networks. The authors of [25] proposed a training-based beam tracking scheme to maintain the connection between a UAV and terrestrial BSs. The authors of [26] focused on BS-to-UAV backhaul communications and proposed a beam tracking method that reduces training overhead by adopting a wider beam width at the cost of lower beamforming gain. The authors of [27] proposed a connectivity-constraint-based path planning and beam tracking method, by which the UAV can start from a random location and reach its destination within a BS’s coverage by learning a trajectory while maintaining good connectivity.
The works in [28–30] mainly developed theoretical frameworks to derive coverage probability and average throughput expressions. The authors of [28] assumed that the UAVs act as parent nodes deployed in three-dimensional (3D) space at the same height, while the UEs act as daughter nodes distributed according to a Poisson cluster process (PCP). In [29], the UAVs and BSs were assumed to be distributed according to a PPP, while the UEs were distributed according to a PCP. In [30], the UEs are also modeled by a PCP, but a downlink simultaneous wireless information and power transfer (SWIPT) scenario and an uplink information transmission scenario are jointly considered.
The authors of [31] focused on optimizing network coverage and system energy efficiency for UAV-assisted mmWave cellular networks, but they did not discuss spatially movable UAVs acting as flying BSs or dynamic UAV placement. The authors of [32] analyzed the secrecy rate performance of UAV-assisted mmWave networks using a Matérn hard-core point process, which also guarantees a safety distance between UAVs. The authors of [33] designed a spectrum management architecture for mmWave swarm networks with UAVs acting as flying BSs, which considers interference, energy consumption, and UAV mobility. The authors of [34] derived the multicell coverage probability and the volume spectral efficiency for UAV-assisted mmWave cellular networks using stochastic geometry.
The authors of [35] derived analytical channel models to evaluate the benefit of mmWave links for UAVs, which were further assessed in terms of outage probability. The authors of [36] investigated the rate performance, fairness, and their tradeoff for a UAV swarm connected to mmWave cellular networks, where all UAVs are located in a 3D area and distributed according to a homogeneous PPP. The authors of [37] studied beamforming training and tracking for a UAV-assisted mmWave system. They designed a beam training codebook based on the angular domain converted from the known location distribution of users and presented two beam tracking methods, for random and inertial user mobility, respectively, to predict the beam direction.
The authors of [34] proposed a 3D location distribution model that characterizes the positions of the UAVs using stochastic geometry, and they used the multicell coverage probability and the volume spectral efficiency as metrics to study the performance of mmWave networks with UAVs acting as aerial BSs. The beam formed by a mmWave antenna array is highly directional and requires multiple beam scans to cover the whole region, so the authors of [38] proposed a sectoring approach to ensure coverage of the whole region. Since the side-lobe gain of the antenna array is taken into account, substantial interference arises in other sectors; they therefore set a threshold on the power spillage from adjacent sectors to limit interference in the concurrent transmission strategy.
The above solutions for UAV-assisted mmWave cellular networks require knowledge of the probability distribution of terrestrial users or UAVs, which does not always hold in an affected area. To the best of our knowledge, the work in [8] is the most closely related to ours. However, as discussed above, the work in [8] operates in sub-6 GHz bands and cannot be directly transplanted to a UAV-assisted mmWave emergency communication network, which motivates us to study a new solution.
3. System Model and Problem Statement
3.1. System Model
As shown in Figure 1, we consider a single rotary-wing UAV acting as a flying BS to serve a post-disaster area without available ground infrastructure. The UAV takes off from a location where it can easily be recharged, and it returns to this recharging point before its battery runs out. We assume that the UAV flies to a position and then hovers there to serve the ground UEs within horizontal distance . Therefore, the UAV alternately flies and hovers to cover the disaster-affected area. We use to denote the flying speed of the UAV and to denote its hovering interval. We also use to denote the maximum battery capacity of the UAV, and the UAV’s engine powers for hovering and flying are denoted by and , respectively.
We divide the considered area into equal-size grids. It is assumed that the UAV can serve all the ground UEs in a grid while hovering over the center of that grid. We use to denote the set of all grids, and we use to represent a UAV path, where each element ( and ) denotes a grid in this area and is the recharging point. The path starts at , serves grids, and returns to to recharge. The set of all possible paths that start and end at the recharging point is denoted by . The ground UEs are randomly located in the considered area, and all of them request wireless access service with the same probability. We use to denote the data volume of the ground UEs within grid .
The distribution of UEs in each grid is unknown to the UAV. However, when the UAV hovers above the center of a grid, it can learn the distribution of UEs in that grid through the mmWave beam training process. We assume that the UAV starts the mmWave beam training process at transmission power . During this process, UEs without data to deliver can switch to energy harvesting mode, while those with data to deliver respond to the training initiation packet by feeding back response packets that contain their location information. After obtaining the location information of all the UEs in the grid, the UAV adjusts its transmission beam width to cover exactly all the UEs in the grid and then starts the downlink recharging process at power . We assume that these UEs use only the harvested energy to send uplink data. The optimization task can be given by
where represents the horizontal distance between the centers of grid and grid , and is the ratio of the charging time to the hovering interval in grid .
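The battery constraint in task (1) applies to a closed path that starts and ends at the recharging point. The Python sketch below checks that budget for one candidate path. It assumes that flying energy equals the flying power multiplied by the flight time (horizontal distance between grid centers divided by the flying speed), that each served grid costs the hovering power multiplied by the hovering interval, and that grids are indexed by hypothetical (row, column) pairs; none of these names come from the paper.

```python
import math
from typing import List, Tuple

Grid = Tuple[int, int]  # hypothetical (row, col) index of a grid cell


def grid_center(g: Grid, cell_size: float) -> Tuple[float, float]:
    """Horizontal coordinates (m) of the center of grid g."""
    return ((g[0] + 0.5) * cell_size, (g[1] + 0.5) * cell_size)


def horizontal_distance(a: Grid, b: Grid, cell_size: float) -> float:
    (xa, ya), (xb, yb) = grid_center(a, cell_size), grid_center(b, cell_size)
    return math.hypot(xa - xb, ya - yb)


def path_energy(path: List[Grid], cell_size: float, speed: float,
                hover_time: float, p_fly: float, p_hover: float) -> float:
    """Energy (J) of a closed path path[0] -> ... -> path[-1] == path[0].

    Assumption: flying energy = p_fly * distance / speed, and every grid
    between the two visits to the recharging point costs p_hover * hover_time.
    """
    if path[0] != path[-1]:
        raise ValueError("path must start and end at the recharging point")
    fly_dist = sum(horizontal_distance(a, b, cell_size)
                   for a, b in zip(path, path[1:]))
    n_served = len(path) - 2  # the recharging point itself is not served
    return p_fly * fly_dist / speed + p_hover * hover_time * n_served


def is_feasible(path: List[Grid], e_max: float, **kw) -> bool:
    """True if the path fits in the battery capacity e_max."""
    return path_energy(path, **kw) <= e_max
```

For example, with 100 m cells, the path (0,0) → (0,1) → (1,1) → (0,0) flies 200 + 100√2 m and hovers over two grids.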
3.2. Time-Slotted Structure
In general, the hovering interval consists of the beam training interval , the recharging interval , and the transmission interval . As shown in Figure 2, the beamforming training interval consists of one starting slot for the training process, multiple response slots, and one confirming slot for the training results, which satisfies the following relation:
where is the slot size, a predetermined value. The number of response slots depends on the number of UEs participating in the beamforming training: the more UEs there are, the more response slots must be provisioned.
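To make the slot-matching problem concrete, the Python sketch below simulates one round of response-slot contention, assuming each responding UE picks one response slot uniformly at random, so a slot chosen by exactly one UE yields a conflict-free UE and a slot chosen by several UEs is a collision. It then applies a hypothetical doubling/halving rule to adjust the number of response slots. The paper only states that the adjustment follows "a certain strategy", so `adapt_slots` and its thresholds are illustrative assumptions, not the authors' rule.

```python
import random
from collections import Counter
from typing import Tuple


def contend(n_ues: int, n_slots: int, rng: random.Random) -> Tuple[int, int]:
    """One contention round; returns (conflict-free UEs, collided slots)."""
    picks = Counter(rng.randrange(n_slots) for _ in range(n_ues))
    conflict_free = sum(1 for c in picks.values() if c == 1)
    collided_slots = sum(1 for c in picks.values() if c > 1)
    return conflict_free, collided_slots


def adapt_slots(n_slots: int, conflict_free: int, collided_slots: int,
                low_use: float = 0.5, min_slots: int = 1,
                max_slots: int = 64) -> int:
    """Hypothetical rule: double after any collision, halve if mostly idle."""
    if collided_slots > 0:
        return min(2 * n_slots, max_slots)
    if conflict_free < low_use * n_slots:
        return max(n_slots // 2, min_slots)
    return n_slots


# Usage: let the slot count track 12 responding UEs over a few visits.
rng = random.Random(0)
slots = 4
for _ in range(5):
    ok, bad = contend(12, slots, rng)
    slots = adapt_slots(slots, ok, bad)
```

Doubling on collision and halving on low use is a simple multiplicative rule chosen only for illustration; any rule that grows the slot count when collisions appear and shrinks it when slots sit idle fits the behavior described in this section.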
The recharging interval and the transmission interval are closely related and together form a communication cycle, i.e., . In each communication cycle , the UAV recharges UEs during the recharging interval and receives data from UEs during the transmission interval .
To allow multiple ground UEs to communicate with the UAV during each communication cycle , we divide into one downward recharging slot (i.e., ) and multiple upward transmission slots (i.e., , ) based on time division multiple access (TDMA), as shown in Figure 3. The downward recharging slot is allocated to the UAV to transfer radio energy to the UEs it serves, while the upward transmission slots are allocated to the UEs, each of which transmits its own data to the UAV in its assigned upward transmission slot.
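The interval bookkeeping of this subsection can be sketched in Python as follows: the beam training interval is one starting slot plus the response slots plus one confirming slot, the remainder of the hovering interval forms the communication cycle, and the cycle is split into a recharging slot and per-UE upward slots by time ratios that sum to one. The function names and the 1e-9 tolerance are our own assumptions.

```python
from typing import List


def beam_training_interval(n_response_slots: int, slot_size: float) -> float:
    """One starting slot + n response slots + one confirming slot (s)."""
    return (n_response_slots + 2) * slot_size


def tdma_schedule(hover_interval: float, n_response_slots: int,
                  slot_size: float, ratios: List[float]) -> List[float]:
    """Durations (s) of [recharging slot, UE_1 slot, ..., UE_K slot].

    ratios[0] is the downward recharging ratio and ratios[1:] are the
    upward ratios of the K UEs; they must be non-negative and sum to 1.
    """
    t_bt = beam_training_interval(n_response_slots, slot_size)
    cycle = hover_interval - t_bt  # communication cycle = recharging + transmission
    if cycle <= 0:
        raise ValueError("beam training does not fit in the hovering interval")
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("time ratios must be non-negative and sum to 1")
    return [r * cycle for r in ratios]
```

For instance, a 1 s hovering interval with 8 response slots of 10 ms leaves a 0.9 s cycle, which the ratios [0.4, 0.3, 0.3] split into 0.36 s of recharging and 0.27 s per UE. The sketch also makes the trade-off visible: every extra response slot shortens the communication cycle.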
Since , , …, are the time ratios of the communication cycle allocated to the UAV and its serving UEs (e.g., ) for downward energy transfer and upward data transfer, respectively, they satisfy the following relation:
Therefore, can be estimated by the following formula:
3.3. Beam Alignment and Signal Propagation Model
To learn the distribution of UEs in a grid and align the mmWave beam between each UE and the UAV, the UAV starts a beamforming training process with the UEs. First, the UAV broadcasts a directional beacon frame through its ground-facing mmWave beam, while all the UEs remain in quasi-omnidirectional listening mode. Then, each UE that wants to communicate with the UAV chooses a response slot, similar to the association beamforming training slot in IEEE 802.11ad/ay, to carry out UE-side beamforming training (i.e., it transmits directional beacon frames in all its beams). After receiving the directional beacon frames from the UEs, the UAV obtains the beamforming training information and the location information of every UE. Finally, the UAV broadcasts a confirmation frame announcing the conflict-free UEs. By setting a reasonable number of training slots, we aim to avoid collisions among the UEs’ beacon frames while keeping the training time as short as possible. If collisions occur, the number of training slots is increased according to a certain strategy the next time beamforming training is needed. Conversely, if the number of responding UEs is significantly smaller than the total number of response slots, the number of slots is reduced. We employ the following mmWave signal propagation model:
where is the transmission power at the mmWave transmission beam of the UAV , which is set to ; is the received power at the mmWave link from the UAV to the UE ; is the directional transmission gain at the mmWave transmission beam of the UAV , while is the directional reception gain at ; is the channel gain at . When the mmWave beam between the UAV and the UE is aligned, is obtained by the following formula [39]:
where is the reception beam width of the UE , i.e., its main-lobe width in radians, and is the side-lobe gain, with .
where is the transmission beam width of the UAV , i.e., its main-lobe width in radians. The channel gain is obtained by the following formula [40]:
where δ(·) is the Dirac delta function, and and are the propagation delay and the amplitude of , respectively. The propagation delay is obtained by the following formula:
where is the length of , and is the speed of light. When is a line-of-sight (LOS) link, the amplitude is obtained by the following formula [40]:
where is the wavelength, , and is the carrier frequency. If is a non-line-of-sight (NLOS) link, it is eliminated during the beam training process, so we do not discuss how to estimate its amplitude. The distance is estimated by the following formula:
where (,,) are the three-dimensional coordinates of the UAV , while (,,) are those of the UE .
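A minimal Python sketch of the link quantities in this subsection follows. Because the gain and amplitude formulas are not reproduced here, it assumes textbook forms: a 2D flat-top (sectored) antenna whose main-lobe gain G satisfies G·θ + z·(2π − θ) = 2π as a stand-in for the gain formulas of [39], the free-space amplitude λ/(4πd) with λ = c/f for the LOS amplitude of [40], and a Friis-style received power P_t·G_t·G_r·α². The names `sector_gain`, `los_link`, and `received_power` are hypothetical.

```python
import math
from typing import Tuple

C = 299_792_458.0  # speed of light (m/s)
Point = Tuple[float, float, float]


def sector_gain(beam_width: float, side_lobe: float = 0.0) -> float:
    """Main-lobe gain of an ideal 2D flat-top antenna (assumed model).

    Energy conservation over 2*pi: G * theta + z * (2*pi - theta) = 2*pi.
    side_lobe = 0 gives the ideal gain 2*pi / theta.
    """
    if not 0 < beam_width <= 2 * math.pi:
        raise ValueError("beam width must be in (0, 2*pi] radians")
    if not 0 <= side_lobe < 1:
        raise ValueError("side-lobe gain must be in [0, 1)")
    return (2 * math.pi - (2 * math.pi - beam_width) * side_lobe) / beam_width


def los_link(uav: Point, ue: Point, carrier_hz: float) -> Tuple[float, float, float]:
    """(distance m, propagation delay s, free-space amplitude) of a LOS link."""
    d = math.dist(uav, ue)  # 3D Euclidean distance
    wavelength = C / carrier_hz
    return d, d / C, wavelength / (4 * math.pi * d)


def received_power(p_tx: float, g_tx: float, g_rx: float, amplitude: float) -> float:
    """Friis-style budget, assuming the power gain of h(t) is amplitude**2."""
    return p_tx * g_tx * g_rx * amplitude ** 2
```

For a UAV at (0, 0, 80) m and a UE at (60, 0, 0) m, the link is 100 m long, and at 28 GHz the sketch gives a delay of about 0.33 µs.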
3.4. Throughput Maximization for UEs in a Single Grid
After obtaining the number (e.g., ) of conflict-free UEs participating in the beamforming training process, the UAV (e.g., ) wirelessly powers this set of UEs at power , and the amount of energy harvested by each UE (e.g., ) in the downward recharging slot is obtained by the following formula:
where is the amount of energy harvested by UE from the UAV , and denotes the recharging power of the UAV . After UE replenishes its energy in the downward phase, it transmits its own data to the UAV in its allocated time slot () during the subsequent upward phase. Because of the limited charging efficiency, not all of the energy harvested by each UE can be used for its upward data transmission. We use to denote the power conversion efficiency of UE , . Thus, the average transmission power of UE based on its harvested energy is given by
To simplify the problem formulation, we assume . Although changing affects the range of the values, it does not affect the overall trend of the simulation results presented later. Based on (12)–(13) and the Shannon theorem, the achievable upward throughput from UE to the UAV in bits per second (bps) is given by
where , is the mmWave bandwidth, and is expressed as follows:
where is the ambient noise at the UAV , and are the directional transmission gain, the directional reception gain, and the channel gain of the mmWave link from UE to the UAV , respectively. From (14), it can be seen that decreases with for a given , and that increases with for a given . However, because of the total time constraint in (2), and cannot be increased simultaneously. Therefore, there exists an optimal time allocation (i.e., ) that maximizes the throughput: when is less than , the throughput increases with ; otherwise, it decreases with .
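The harvest-then-transmit chain of this subsection can be sketched in Python under stated assumptions: linear harvesting of the received RF power during the downward slot τ0·T, a fraction η of that energy spent uniformly over the UE's upward slot τk·T, and the throughput form B·τk·log2(1 + γ·τ0/τk) used in [41]. Substituting the average power ηP_rx·τ0/τk into the SNR gives exactly γ·τ0/τk, where γ is a hypothetical lumped coefficient (efficiency, gains, recharging power, noise). A grid search for a single UE shows that the throughput-maximizing recharging ratio lies strictly inside (0, 1), which is the trade-off described above.

```python
import math


def harvested_energy(p_rx: float, tau0: float, cycle: float) -> float:
    """Energy (J) collected in the downward slot tau0 * T (linear harvesting assumed)."""
    return p_rx * tau0 * cycle


def avg_tx_power(energy: float, eta: float, tau_k: float, cycle: float) -> float:
    """Average uplink power: fraction eta of the energy spread over tau_k * T."""
    if not 0 < eta <= 1 or tau_k <= 0:
        raise ValueError("need 0 < eta <= 1 and a non-empty upward slot")
    return eta * energy / (tau_k * cycle)


def uplink_throughput(bandwidth: float, tau0: float, tau_k: float, gamma: float) -> float:
    """B * tau_k * log2(1 + gamma * tau0 / tau_k), the assumed form of [41]."""
    if tau_k <= 0:
        return 0.0
    return bandwidth * tau_k * math.log2(1 + gamma * tau0 / tau_k)


def best_tau0_single_ue(bandwidth: float, gamma: float, steps: int = 10_000) -> float:
    """Grid search over tau0 in (0, 1) with tau_k = 1 - tau0 for one UE."""
    return max((i / steps for i in range(1, steps)),
               key=lambda t0: uplink_throughput(bandwidth, t0, 1 - t0, gamma))
```

With γ = 10, the search returns τ0 ≈ 0.418: recharging for less time starves the UE of energy, while recharging for longer leaves too little time to transmit.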
If the minimum throughput of the UEs is not constrained, maximizing their total throughput gives rise to the “doubly near-far” phenomenon explained in [41], which is caused by the distance-dependent signal attenuation in both the downward and upward directions. Furthermore, the result of the common-throughput maximization scheme in [41] deviates far from the optimal value. According to (14), the total throughput of all the UEs is given by
where is the total throughput of all the UEs in the grid served by the UAV , which is also a function of the time allocation vector . Thus, the total throughput maximization problem under the time-slot threshold constraint is modeled as follows:
Lemma 1. is a concave function of for any given .
Proof. Please refer to the proof of Lemma 3.1 in [41]. ☐
Based on Lemma 1, is also a concave function of because it is the sum of the concave functions . Thus, the optimization problem (17) is convex and can be solved by convex optimization methods. Lemma 3.2 in [41], restated as follows, can be used to solve the optimization problem (17). If is greater than 0, there is a unique that is the solution of , in which and . Note that is a convex function of , whose minimum value is attained at with . If is greater than 0 and not greater than 1, there are two different solutions for . If is greater than 1, there is only one solution for , which is greater than 1, i.e., .
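The lemma can be applied numerically as sketched below in Python: bisection finds the unique root z* > 1 of z ln z − z + 1 = A (the function is convex with its minimum 0 at z = 1 and increases on (1, ∞), so the root is bracketed once the upper end exceeds A). From z*, the KKT conditions of the sum-throughput problem without a per-UE throughput floor give τ0 = (z* − 1)/(A + z* − 1) and τk = γk/(A + z* − 1), with A = Σγk. This unconstrained allocation illustrates the lemma and is not Proposition 2; the function names and the γk values are assumptions.

```python
import math
from typing import List, Tuple


def solve_z(a: float, tol: float = 1e-12) -> float:
    """Unique root z* > 1 of z*ln(z) - z + 1 = a for a > 0, by bisection."""
    if a <= 0:
        raise ValueError("a must be positive")

    def f(z: float) -> float:
        return z * math.log(z) - z + 1

    lo, hi = 1.0, 2.0
    while f(hi) < a:  # expand the bracket; f increases on (1, inf)
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sum_throughput_allocation(gammas: List[float]) -> Tuple[float, List[float]]:
    """(tau0, [tau_k]) maximizing sum_k tau_k*log(1 + gamma_k*tau0/tau_k)
    subject to tau0 + sum(tau_k) = 1, with no per-UE throughput floor."""
    a = sum(gammas)
    z = solve_z(a)
    denom = a + z - 1
    return (z - 1) / denom, [g / denom for g in gammas]
```

With γ = [4, 6], A = 10 and z* ≈ 8.17, so about 42% of the cycle goes to recharging. The upward slots are proportional to γk, which is exactly what favors near UEs and produces the doubly near-far effect that the paper’s per-UE constraint is meant to counter.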
Proposition 2. The suboptimal time allocation solution for the optimization problem (17), which is denoted by , is given by where and is the corresponding solution of .
Proof. The Lagrangian of the optimization problem (17) is given by where and are the Lagrange multipliers associated with the constraints of (17). Thus, the dual function of the optimization problem (17) is given by where is the feasible set of , specified by the constraints of (17).
It can be seen from the optimization problem (17) that there exists a with , satisfying . Therefore, Slater’s condition [42] holds, so strong duality holds for the convex optimization problem (17), and the Karush-Kuhn-Tucker (KKT) conditions are both necessary and sufficient for its global optimality. They are given by
where , , and represent the optimal primal and dual solutions of the optimization problem (17), respectively. It is easily verified that must hold for (17); thus, from (22), we assume without loss of generality. Similarly, it is easily verified that must hold for (17); thus, from (24), we set without loss of generality. From (25), it follows that
where is defined as follows:
Given , from (26), we have
It can be verified that is a monotonically increasing function of because for . Thus, the equality in (29) holds if and only if , i.e.,
Note that and follow from (21) and (30), respectively. Thus, is given by
where . Furthermore, it follows from (27)–(31) that
where is defined in (30). Because from (31), we can rewrite (32) as
where . It can be seen that is greater than 1 if is greater than 0 and . By Lemma 3.2 in [41], there is a unique that is the solution of (33). Thus, the optimal time allocated to downward radio energy transfer is given by
Furthermore, based on (31) and (34), the optimal time allocated to upward information transmission is given by
This completes the proof of Proposition 2.
To ensure that the suboptimal time allocation (18) for the optimization problem (17) meets the minimum throughput requirement of the UE with the longest communication distance, the following relationships must be satisfied:
where the meanings of and are shown in Figure 4, and can be selected from 0.7 to 0.9 based on practical experience. Accordingly, the following relationships must be satisfied:
From Figure 4, it can be seen that the ground-oriented mmWave beam width of the UAV should not exceed . After the beam training interval , the UAV obtains the coordinates of all the conflict-free UEs, so it can also estimate its distance to each UE according to (11). If the longest distance is less than , the UAV can further narrow its ground-facing beam width, which concentrates the emitted energy and further suppresses the adverse effects of the “doubly near-far” phenomenon.
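The beam-width bound and the narrowing step can be sketched in Python if the beam is assumed to point straight down (nadir) from the UAV. Figure 4 is not reproduced here, so this geometry is an assumption. Under it, a footprint of horizontal radius r seen from altitude H needs a cone angle of 2·arctan(r/H), and the narrowed width is the smallest nadir cone that still contains every conflict-free UE whose coordinates were learned during beam training.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def max_beam_width(coverage_radius: float, altitude: float) -> float:
    """Full cone angle (rad) of a nadir beam covering a disc of this radius."""
    return 2 * math.atan2(coverage_radius, altitude)


def narrowed_beam_width(uav: Point, ues: Iterable[Point]) -> float:
    """Smallest nadir cone (rad) that still covers every UE (assumed geometry)."""
    half = 0.0
    for ue in ues:
        d = math.dist(uav, ue)
        cos_angle = min(1.0, (uav[2] - ue[2]) / d)  # clamp rounding error
        half = max(half, math.acos(cos_angle))
    return 2 * half
```

For a UAV at 100 m covering a 100 m radius, the bound is π/2. If the farthest UE is only 40 m away horizontally, the beam can shrink to 2·arctan(0.4) ≈ 0.76 rad, which concentrates the radiated energy as described above.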
3.5. UAV Flight Path Planning for Throughput Optimization
For optimization task (1), the throughput optimization problem for the UEs in a single grid (e.g., grid ) was solved in Section 3.4, but that solution does not address how to choose the UAV flight path that maximizes the throughput received from the UEs in all the grids along the path. This subsection addresses this problem.
The UAV is unaware of the distribution of UEs in a grid and of their communication requirements until it performs a beamforming training process in that grid. Because of the UAV’s limited battery capacity, only a subset of all the grids can be included in its flight path during one flight. The problem of selecting an optimal subset can be transformed into an MAB problem, in which a bandit has multiple arms, denoted by , and each arm offers a reward (e.g., , ). In this paper, the UAV is the gambler and each grid is an arm.
According to MAB theory, to maximize the total reward over all trials, the gambler needs to explore different arms to find the most valuable one and to exploit the most valuable arm as many times as possible. A balance between exploration and exploitation is therefore essential.
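For reference, the classic UCB1 rule, on which distance-aware variants build, plays every arm once and then picks the arm that maximizes its mean reward plus the confidence bonus sqrt(2 ln t / n). The Python sketch below is the standard textbook rule, not formula (38) or (39).

```python
import math
from typing import List


def ucb1_select(mean_rewards: List[float], counts: List[int], t: int,
                c: float = 2.0) -> int:
    """Classic UCB1: try every arm once, then maximize mean + sqrt(c*ln(t)/n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # exploration: each arm must be tried at least once
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(mean_rewards, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus shrinks as an arm is played more often (exploitation) and grows slowly with the total number of rounds t (exploration), which is the balance described above.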
Unlike a real gambler, who can select any arm arbitrarily, the UAV needs to fly to different grids to serve UEs. Therefore, it must consider not only the battery energy consumed during the flight but also the number of potential UEs in the target grid. The former depends on the horizontal distance between its current grid and the target grid, while the latter requires a reasonable prediction of the number of UEs in the target grid. Since the classic MAB algorithms do not consider the number of potential UEs in the target grid, we extend them to solve our UAV path planning task. In the distance-aware upper confidence bound (DUCB) algorithm of [8], the UAV first serves each grid once and then selects a grid from according to the following formula:
where represents the index of the serving round (the UAV serves one grid in each round), is the average reward of grid , is the number of times the UAV has visited grid , represents the horizontal distance between the next grid and the current one, represents the horizontal distance between the next grid and the recharging point, represents the remaining battery capacity of the UAV, and , , and are empirical parameters.
In (38), if is large, the UAV tends to exploit this high-profit grid to obtain the maximum possible throughput; if is large, the confidence interval shrinks, so the UAV tends to explore other, less frequently selected grids. In addition, represents the flight cost, which encourages the UAV to select nearby grids to save flight energy, while represents the remaining battery level of the UAV, which encourages the UAV to select grids near the recharging point so that it can return with less energy consumption.
Obviously, formula (38) does not take into account the communication requirements of the UEs that experienced conflicts during the current round of the beamforming training process. Ignoring such potential UEs results in an unfair distribution of access opportunities. In addition, the number of response slots is increased to avoid response-frame conflicts in the beamforming training process, which, owing to the dynamic distribution of UEs, may in turn make the number of response slots greatly exceed the actual number of responding UEs. For any grid in which this occurs, reducing the number of response slots lengthens the corresponding communication cycle; that is, the grid has the potential for greater throughput. Therefore, we propose the following improved distance-aware grid selection model:
where the quantities involved are: the average throughput of the next grid over its latest rounds, the number of rounds in this window being fixed to a positive integer; the cumulative number of times the UAV has visited that grid; the horizontal distance between the next grid and the current grid; the horizontal distance between the next grid and the recharging point; the remaining battery energy of the UAV; the relevant empirical parameters; the number of response slots with conflicting signals in the next grid, which records information about the last conflict that occurred in this grid; the latest number of conflict-free UEs in the next grid; and the latest total number of response slots in the next grid. The windowed average throughput is given by
where each summand is the throughput of the grid during the corresponding round.
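The windowed average in (40) can be maintained incrementally. The sketch below keeps the most recent throughput samples of each grid in a bounded deque; the class name, the `window` argument (the fixed positive integer mentioned above), and the convention of averaging over the available samples before the window fills are assumptions.

```python
from collections import defaultdict, deque


class WindowedThroughput:
    """Average throughput of each grid over its most recent `window` service rounds."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be a positive integer")
        # Each grid gets its own bounded history; old samples are evicted automatically.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, grid, throughput):
        """Store the throughput observed the last time `grid` was served."""
        self.history[grid].append(throughput)

    def average(self, grid):
        """Mean of the stored samples (0.0 for a grid that has never been served)."""
        samples = self.history[grid]
        return sum(samples) / len(samples) if samples else 0.0
```

Using `deque(maxlen=...)` keeps each update O(1) in memory management, which suits a UAV that re-evaluates every grid in every round.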
Based on formulas (39) and (40), we design the flight path planning algorithm based on improved distance-aware selection (FPPIDA), which lets a fully recharged UAV find the most effective path to serve as many UEs as possible. FPPIDA requires the UAV to visit each grid once first, so that it learns the rewards of all grids. After that, it selects the next grid with the greatest potential reward according to the historical average reward of each grid, the number of times each grid has been served, and the other relevant parameters. When FPPIDA determines that the energy remaining after visiting the next grid would not be enough to fly back to the recharging point, it instructs the UAV to skip that grid and return to the recharging point. Further details of the FPPIDA algorithm are described in Section 4.
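The control flow just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the callbacks `serve`, `score`, and `energy_needed` are hypothetical placeholders for the beam training and service procedure, the selection index of (39), and the energy model, respectively.

```python
def plan_path(grids, energy, serve, score, energy_needed, start="recharging point"):
    """Return (visited grids in order, energy left before the final return flight).

    grids                 : list of grid identifiers
    energy                : initial energy of the fully recharged UAV
    serve(cur, g)         : hypothetical; flies cur -> g, serves g, returns (reward, energy_spent)
    score(g, st, cur, t)  : hypothetical selection index of grid g (stand-in for (39))
    energy_needed(cur, g) : hypothetical; energy to fly cur -> g, serve g, and return home
    """
    stats = {g: {"total": 0.0, "n": 0} for g in grids}
    path = []
    current = start

    def visit(g):
        nonlocal energy, current
        reward, spent = serve(current, g)
        if spent <= 0:
            raise ValueError("serving a grid must consume energy")  # guarantees termination
        energy -= spent
        stats[g]["total"] += reward
        stats[g]["n"] += 1
        path.append(g)
        current = g

    # Warm-up phase: serve every grid once so that each has a reward estimate.
    for g in grids:
        if energy_needed(current, g) >= energy:
            return path, energy  # cannot even finish the warm-up; head home
        visit(g)

    t = len(grids)
    while True:
        t += 1
        # Candidate set: grids that can be served with enough energy left to return.
        candidates = [g for g in grids if energy_needed(current, g) < energy]
        if not candidates:
            return path, energy  # fly back to the recharging point
        visit(max(candidates, key=lambda g: score(g, stats[g], current, t)))
```

Keeping the selection index as a callback lets the same loop host either the original DUCB index or the improved index without changing the energy bookkeeping.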
In the exploration algorithm in [8], the UAV either selects a grid with a given exploration probability or, with the complementary probability, selects the grid that generates the maximum throughput. After checking the remaining battery capacity in each round, the UAV, with the exploration probability, selects a grid according to a softmax function that converts the average throughput of each grid into a selection probability:
where the temperature parameter of the softmax function is strictly positive. As the temperature goes to infinity (i.e., high temperature), all grids are selected with nearly the same probability; as it goes to 0 (i.e., low temperature), the probability of choosing the grid with the highest average throughput tends to 1. With the complementary probability, the UAV selects a grid according to the following formula:
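A minimal sketch of this rule, mixing greedy selection with softmax (Boltzmann) exploration, is shown below. It assumes that the exploitation branch simply picks the grid with the highest average throughput (formula (42) may weigh additional factors); the parameter names `epsilon` (exploration probability) and `temperature` are illustrative.

```python
import math
import random


def softmax_probs(avg_throughput, temperature):
    """Boltzmann (softmax) probabilities over grids; temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(avg_throughput.values())  # subtract the maximum for numerical stability
    weights = {g: math.exp((v - m) / temperature) for g, v in avg_throughput.items()}
    z = sum(weights.values())
    return {g: w / z for g, w in weights.items()}


def select_grid(avg_throughput, epsilon, temperature, rng=random):
    """With probability epsilon explore via softmax; otherwise exploit the best grid."""
    if rng.random() < epsilon:
        probs = softmax_probs(avg_throughput, temperature)
        grids = list(probs)
        return rng.choices(grids, weights=[probs[g] for g in grids], k=1)[0]
    return max(avg_throughput, key=avg_throughput.get)
```

Passing a seeded `random.Random` instance as `rng` makes simulation runs reproducible.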
Since formula (42) makes the exploration algorithm tend to choose nearby grids, we propose the following improved exploration-based grid selection model:
where the two coefficients are relevant empirical parameters. In addition, we rewrite formula (41) as follows:
Based on formulas (40), (43), and (44), we design the flight path planning algorithm based on improved exploration (FPPI∂E), which lets a fully recharged UAV find the most effective path to serve as many UEs as possible. Unlike FPPIDA, FPPI∂E does not have the UAV serve each grid once in advance, since it relies on probabilities to balance exploration and exploitation: the larger the exploitation probability, the more inclined the UAV is to exploit. Further details of the FPPI∂E algorithm are described in Section 4.
4. Problem Solving Algorithm Description
The pseudocode for FPPIDA is listed in Algorithm 1. In Algorithm 1, after initializing some variables (see line 1), each grid must be visited once (see lines 2~8), where parameter variables associated with each grid are initialized (see lines 3~4), and then, Algorithm 2 is invoked to get the throughput for each grid and update the corresponding parameters (see line 5). After the execution of Algorithm 2, the remaining energy of the UAV is updated (see line 6), the number of times that each grid is visited is recorded, and the amount of data received by the UAV is accumulated (see line 7).

After each grid has been served once, the algorithm must determine the most appropriate grid to serve next: the grids that meet the conditions are first identified (see lines 9~13), and then the most suitable one is selected (see lines 14~25). In the identification phase, if the energy the UAV needs to serve a grid and then return to the recharging point is less than its current remaining energy, the grid can be served and is thus added to the candidate set (see lines 11~12). In the selection and service phase, the algorithm adjusts the serving parameters of the currently selected grid based on its historical data (see lines 16~20): the number of response slots is increased accordingly if the grid has any conflict records (see lines 16~17); otherwise, it is reduced appropriately if there are too many idle response slots (see lines 18~20).
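The response-slot adjustment in lines 16~20 can be sketched as below. The concrete update rules (adding one slot per collided slot, and shrinking to the number of conflict-free UEs plus a small margin when more than half of the slots were idle) are hypothetical choices for illustration; the text above only specifies the direction of each adjustment.

```python
def adjust_response_slots(slots, conflict_slots, conflict_free_ues,
                          idle_threshold=0.5, margin=1, min_slots=1):
    """Hypothetical response-slot update for the next visit to a grid.

    slots             : number of response slots used at the last visit
    conflict_slots    : slots in which response frames collided at the last visit
    conflict_free_ues : UEs that responded without collision at the last visit
    """
    if conflict_slots > 0:
        # A collided slot held at least two responders, so add one extra slot per collision.
        return slots + conflict_slots
    idle = slots - conflict_free_ues
    if idle > idle_threshold * slots:
        # Too many idle slots: shrink toward the observed demand, keeping a small margin.
        return max(min_slots, conflict_free_ues + margin)
    return slots
```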
Lines 2~8 of Algorithm 2 perform the beam training task and record the number of conflicting response slots and the information of the conflict-free UEs. Based on the known positions of the conflict-free UEs, Algorithm 2 adjusts the beam width of the UAV (see lines 11~17) and invokes Algorithms 1–2 in [43] to get (see line 18). From this, the amount of data the UAV collects from the grid can be calculated (see lines 19~20).

Based on Algorithms 1 and 2, the FPPIDA algorithm needs one round of calculation for each grid on a UAV flight path, and in each round all grids need to be evaluated and compared to select the next service grid; thus, its computational complexity is proportional to the number of rounds multiplied by the number of grids. To obtain the optimal solution, by contrast, every possible solution in the solution space must be enumerated to determine which one is optimal, so the computational complexity of obtaining the optimal solution grows with the size of the entire solution space. Clearly, the computational complexity of FPPIDA is significantly lower than that of obtaining the optimal solution.
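For a sense of scale, the following arithmetic compares the grid evaluations of a greedy planner that scores every grid in every round with the number of ordered grid sequences a brute-force search would have to consider, assuming revisits are allowed and ignoring the energy constraint (so the second count is only a loose upper bound). The grid and round counts in the example are arbitrary.

```python
def greedy_evaluations(num_grids, num_rounds):
    """Grid evaluations of a planner that scores every grid once per round."""
    return num_grids * num_rounds


def brute_force_sequences(num_grids, num_rounds):
    """Ordered grid sequences of length num_rounds when any grid may be revisited."""
    return num_grids ** num_rounds


print(greedy_evaluations(25, 10))     # 250
print(brute_force_sequences(25, 10))  # 95367431640625
```

For 25 grids and 10 rounds, the greedy planner performs 250 evaluations, whereas there are 25^10, roughly 9.5 × 10^13, candidate sequences.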
The pseudocode for FPPI∂E is listed in Algorithm 3. It is largely similar to Algorithm 1, except that it does not require each grid to be served once at the beginning of execution and that its grid selection strategy (see lines 13~14 and 25~26) differs from that of Algorithm 1.

5. Performance Evaluation
5.1. Experimental Parameter Settings
We study the performance of our two algorithms (i.e., FPPIDA and FPPI∂E) through simulation in terms of the total number of served UEs, the number of visited grids, the amount of data, the average throughput, and the battery capacity utilization level. The battery capacity utilization level is defined as the ratio of the energy used for wireless recharging and hovering to the initial energy of the UAV, and thus measures the share of energy spent on serving UEs. A larger value means that the UAV spends more of its energy serving UEs rather than flying.
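The metric can be computed directly from per-flight energy totals, as in the sketch below; the function and argument names are hypothetical.

```python
def battery_utilization(charging_energy, hovering_energy, initial_energy):
    """Share of the initial battery energy spent on wireless recharging and hovering."""
    if initial_energy <= 0:
        raise ValueError("initial_energy must be positive")
    return (charging_energy + hovering_energy) / initial_energy


# Example: 300 units for recharging UEs and 150 for hovering out of a 1000-unit battery.
print(battery_utilization(300.0, 150.0, 1000.0))  # 0.45
```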
To compare the strengths and weaknesses of our two grid selection models, we use the flight path planning algorithm based on distance-aware selection (FPPDA) and the flight path planning algorithm based on exploration (FPP∂E) as comparison algorithms. FPPDA selects the next grid by formula (38) instead of formula (39) and is otherwise identical to FPPIDA. Similarly, FPP∂E selects the next grid by formulas (41) and (42) instead of formulas (43) and (44) and is otherwise identical to FPPI∂E.
The UEs are randomly distributed, and the traffic load in any grid follows a binomial distribution whose number of trials is the number of UEs in the grid and whose success probability, the probability that a UE needs radio access service, is fixed to 0.2. In addition, each UE transfers a given amount of data when it requests access service, where the length of the data is determined randomly. If a UE gains access in a round but does not complete its data transfer in that round, it competes for access at every subsequent beamforming training opportunity it encounters.
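The traffic model can be simulated as below. UEs are scattered uniformly at random over the grids (an assumption; the text only says they are randomly distributed), and the number of UEs requesting access in a grid with n UEs is drawn as the sum of n independent Bernoulli trials with probability 0.2, which is exactly a Binomial(n, 0.2) sample. The function names are illustrative.

```python
import random


def scatter_ues(num_ues, grids, rng=random):
    """Assign each UE to a grid chosen uniformly at random (assumed placement model)."""
    counts = {g: 0 for g in grids}
    for _ in range(num_ues):
        counts[rng.choice(grids)] += 1
    return counts


def sample_active_ues(ues_per_grid, p=0.2, rng=random):
    """Per-grid number of UEs requesting access: a Binomial(n, p) draw for n UEs."""
    return {g: sum(rng.random() < p for _ in range(n)) for g, n in ues_per_grid.items()}


rng = random.Random(42)
counts = scatter_ues(1000, list(range(25)), rng)
active = sample_active_ues(counts, 0.2, rng)
```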
The key parameters of the proposed algorithms and the comparison algorithms are selected via a series of numerical simulations, and these parameters and the other main parameters are listed in Table 1.
5.2. Experimental Results and Analysis
We first consider five simulations, shown in Figures 5–9. Here, the number of UEs is fixed to 1000, and the distance between the recharging point and the center of the covered area is fixed to 900 m.
We simulate the performance trend of the four algorithms as the initial energy (i.e., the battery capacity) varies. Figures 5–7 show that all four algorithms serve more UEs, visit more grids, and collect more data as the battery capacity increases, because longer paths can be planned when the UAV has more energy.
Figure 8 shows that there is no significant relationship between the throughput and the battery capacity, which is consistent with intuition. Still, there is a small trend. As the battery capacity grows from a small value, the UAV has more opportunities to find grids with higher throughput. However, as it continues to grow, this advantage becomes less obvious or even vanishes, since the UAV may not find more grids with higher throughput.
As Figure 9 shows, the battery capacity utilization level of FPPIDA and FPPDA decreases as the battery capacity increases, while that of FPPI∂E and FPP∂E increases. In FPPIDA and FPPDA, once a grid is visited, the probability of visiting it again is reduced. Therefore, the UAV needs to fly to more distinct grids to serve UEs, which increases its flying distance and consumes more flight energy. In FPPI∂E and FPP∂E, once the UAV has spotted a few nearby high-throughput grids, it tends to keep flying over that neighboring area because the exploration probability is very small, which helps reduce the flying distance and flight energy.
Figures 5–9 show that FPPIDA outperforms FPPDA on all five performance metrics. This is mainly because we take into account conflict avoidance in the beamforming training phase and the dynamic adjustment of response slots. Figures 5–9 also show that these factors have little effect on FPPI∂E. As mentioned above, once the UAV has spotted a few nearby high-throughput grids, FPPI∂E tends to keep it flying over that neighboring area, so the effect of these factors is diminished when the UAV keeps serving UEs in the same familiar grids. Compared with FPP∂E, FPPI∂E lets the UAV explore more unknown grids and thus extends the service range, though it performs worse in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level.
In addition, compared with FPPI∂E and FPP∂E, FPPIDA and FPPDA have a wider service range and thus better fairness. This is mainly because, in FPPIDA and FPPDA, each additional service of a grid reduces the probability of serving it again; the resulting extra flight energy consumption, however, limits their improvement in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level.
In Figures 10–14, the battery capacity is fixed at 1000, while the distance between the recharging point and the center of the covered area is fixed at 900 m. We simulate the performance trend of the four algorithms as the number of UEs varies.
Figures 10–14 show that, as the number of UEs increases, four of the five performance metrics improve for all four algorithms. This is because the four algorithms can select better paths in a high-density user environment, which helps improve the total number of served UEs, the amount of data, the average throughput, and the battery capacity utilization level. In terms of the number of visited grids, however, FPP∂E is noticeably less stable as the number of UEs changes; in particular, when the number of UEs is large, the UAV is more likely to be confined to a smaller flight range. The main reason is that the exploration probability in FPP∂E is too small. With the same exploration probability, FPPI∂E overcomes this weakness of FPP∂E, since it increases the probability of visiting grids with conflict records and thus extends the service range.
Figures 10–14 show that FPPIDA is, on the whole, better than FPPDA, while FPPI∂E is slightly worse than FPP∂E in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level. However, FPPI∂E is superior to FPP∂E in terms of the total number of visited grids. Moreover, owing to its larger service range, FPPI∂E serves more UEs in total than FPP∂E when the number of UEs on the ground is large. The main reasons for these phenomena are the same as those explained for Figures 5–9.
In Figures 15–19, the number of UEs is fixed at 1000, while the battery capacity is fixed at 100. As shown in Figures 15, 17, and 19, the performance of all four algorithms in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level decreases as the distance between the recharging point and the center of the covered area increases. The main reason is that a longer distance requires more energy to be reserved for returning to the recharging point, leaving less energy to serve UEs when the battery capacity is fixed.
Figure 16 shows that there is no significant correlation between the number of visited grids and the distance between the recharging point and the center of the covered area, while Figure 18 shows that there is no significant correlation between the average throughput and the distance between the recharging point and the center of the covered area. In fact, the former mainly depends on the strategy function for selecting grids, while the latter mainly relies on the selected grids.
6. Conclusion
In this paper, we have addressed the performance optimization problem in UAV-assisted wireless powered mmWave networks for emergency communications. The optimization task is transformed into an extended MAB problem, which we solve efficiently with two proposed algorithms, FPPIDA and FPPI∂E. Our algorithms improve the ability to explore the unknown user distribution. Simulation results show that FPPIDA outperforms FPPDA in terms of all five metrics, while FPPI∂E outperforms FPP∂E in terms of the number of visited grids.
Data Availability
The simulation data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 61873352).