Abstract

The rapid growth of maritime wireless communication demand and the complex offshore wireless environment make it challenging to ensure timely and reliable data transmission in the marine Internet of Things (MIoT). Unmanned aerial vehicles (UAVs) have great advantages in enhancing coverage and channel quality. Hence, in this paper, we investigate a UAV-assisted data collection and data offloading system based on nonorthogonal multiple access (NOMA). We jointly optimize the buoy-UAV association relationship, the transmit powers, and the UAV trajectory to minimize the total mission completion time while meeting the data transmission requirements. We first propose a UAV trajectory optimization algorithm based on deep reinforcement learning (DRL). Then, we design a heuristic algorithm to efficiently solve the power control and association subproblem. Finally, we propose a joint optimization scheme to solve the minimization problem. Simulation results show the effectiveness of the proposed scheme.

1. Introduction

As human marine activities continue to increase, marine environmental monitoring has become indispensable. The large amount of meteorological and hydrological data increases the demand for maritime wireless communication [1, 2]. Buoys are widely deployed in the ocean owing to their low cost and flexible deployment. With the development of technology, buoys can carry a variety of sensors and communication equipment for marine environment monitoring and can be powered by lithium-ion batteries or solar energy [3, 4]. However, the transmit power of buoys is limited. Traditional maritime wireless communication methods, such as land base stations and satellites, suffer from limited coverage and long transmission distances, which seriously affect the timeliness and reliability of information transmission [5]. In the current fifth-generation (5G) and upcoming sixth-generation (6G) eras, it is of great significance to build an efficient and dynamic maritime communication network [6]. Therefore, UAV-assisted wireless communication systems (UWCSs) have received widespread attention.

Unmanned aerial vehicles (UAVs) offer maneuverability and easy control, so they can be deployed on demand to enlarge coverage [7]. They can also more easily establish line-of-sight (LoS) channels and stronger communication links with target devices, which helps them cope with the variable ocean environment [8, 9]. In the marine Internet of Things (MIoT), where buoys are numerous and widely distributed, a UAV can act as a mobile base station that collects the data sensed by buoys in the target area and offloads it to the OBS [10, 11]. Furthermore, the limited spectrum resources of the MIoT also challenge the reliability and efficiency of data transmission. Nonorthogonal multiple access (NOMA) is considered a promising technology in the 5G era [12]. Compared with orthogonal multiple access (OMA), NOMA greatly improves spectrum efficiency under limited spectrum resources by allowing multiple users to access the same channel simultaneously, relying on power-domain multiplexing and successive interference cancellation (SIC) decoding [13-15].

Recently, many studies have applied NOMA to UAV-assisted wireless communication systems. Zhao et al. in [13] investigated a NOMA-assisted UAV data collection system for large-scale IoT and proposed a data collection optimization algorithm; their results show that the NOMA-based UWCS outperforms the traditional UWCS in data collection. W. Chen et al. in [16] maximized the sum rate of a UAV-assisted uplink NOMA system by jointly optimizing the UAV location, buoy sensor grouping, and power control, and their simulation results show the sum-rate gain of NOMA. Tang et al. in [17] investigated a UAV-assisted marine wireless communication downlink in which the UAV hovers continuously to serve multiple groups of ships. These works indicate that NOMA-based UWCSs have great advantages in enhancing coverage and strengthening communication links.

In the above works, the optimization problem is usually formulated as a mixed-integer nonconvex problem, which is typically decomposed into several subproblems solved by traditional optimization techniques and iterative algorithms [18]. However, these solutions may have high computational complexity. Furthermore, the set of buoys associated with the UAV and the NOMA cochannel interference vary with the UAV position, and such complex dynamics also pose great challenges to traditional convex optimization techniques. With the development of machine learning, reinforcement learning (RL) is considered an effective solution for highly dynamic environments [19-21]. Deep reinforcement learning (DRL) methods, such as deep Q-learning (DQN) and deep deterministic policy gradient (DDPG), introduce deep neural networks (DNNs) to handle the continuous state spaces that conventional RL cannot [22]. At present, many works have studied DRL-based UWCSs. L. Wang et al. in [23] minimized the energy consumption of all user equipment by jointly optimizing UAV trajectories, user associations, and resource allocation, proposing two algorithms based on convex optimization and DRL, respectively; the results show that the DRL-based method outperforms the convex optimization method. Zhang et al. in [24] studied scenarios in which the UAV lineup and the user distribution change and developed a DDPG-based proactive self-regulation method for UAV networks built on their proposed asynchronous parallel computing architecture. Wang et al. in [25] studied a UAV-assisted mobile edge computing system and minimized the maximum processing delay with a DDPG-based algorithm that handles the high-dimensional state space and continuous action space.
However, the flight action space of an actual UAV is continuous and high-dimensional, which may cause the curse of dimensionality for traditional reinforcement learning methods (such as DQN) [23, 26]. DDPG, in turn, suffers from Q-value overestimation. To address these problems, Fujimoto et al. in [27] proposed the twin delayed deep deterministic policy gradient (TD3) algorithm based on DDPG. In [28], Sun et al. considered the age of information (AoI) and energy consumption and proposed an AoI-energy-aware UAV trajectory optimization algorithm based on TD3.

In this paper, we investigate a UAV-assisted data collection and data offloading system in MIoT. Specifically, buoys collect marine environment information in the sensing layer. The UAV collects the sensing information from the buoys based on NOMA and offloads the collected data to the OBS. Our goal is to minimize the total mission completion time of the UAV by jointly optimizing the UAV trajectory, the buoy-UAV association relationship, the UAV transmit power, and the buoys’ transmit power. The main contributions of this paper are as follows.
(i) We jointly consider the UAV trajectory, the buoy-UAV association relationship, and the transmit powers to formulate the UAV’s total mission completion time minimization problem, which is a mixed-integer nonconvex problem. Accordingly, we divide the total mission of the UAV into a data collection stage and a data offloading stage for analysis.
(ii) Since the UAV trajectories of the data collection and data offloading stages are coupled, we propose a TD3-based UAV trajectory optimization algorithm. Furthermore, because the buoy-UAV association relationship and the buoys’ transmit power are coupled, we design a heuristic algorithm to solve the power control and association subproblem efficiently.
(iii) We propose a joint TD3-based trajectory optimization, power control, and buoy-UAV association scheme that effectively solves the mixed-integer nonconvex problem. Simulation results show that the proposed scheme effectively shortens the UAV’s total mission completion time while meeting the data transmission requirements.

The remainder of this paper is organized as follows. Section 2 presents the system model and the problem formulation. Section 3 briefly introduces the TD3 algorithm, proposes the TD3-based UAV trajectory optimization algorithm, designs a heuristic algorithm for the power control and buoy-UAV association subproblem, and finally combines them into a joint optimization scheme. Simulation results and conclusions are given in Sections 4 and 5, respectively.

2. System Model and Problem Formulation

2.1. Network Model

We consider a UAV-assisted MIoT system as shown in Figure 1, which includes a UAV acting as a base station, $K$ buoys, and an OBS. Each buoy senses and stores hydrometeorological data and is powered by a lithium-ion battery with sufficient energy for data transmission. The total mission time of the UAV is denoted as $T$ and is divided into $N$ time slots of length $\delta$. The total mission of the UAV consists of two stages. In the first stage, the UAV uses NOMA to collect data from the buoys, and this stage lasts $N_c$ time slots. The UAV is allowed to collect data from at most $M$ buoys in each time slot, where $M \le K$. In the second stage, after completing the first stage, the UAV offloads all the collected data to the OBS, and this stage lasts $N_o$ time slots. Therefore, $T$ can be expressed as

$$T = N\delta = (N_c + N_o)\,\delta.$$

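Let $\mathcal{K} = \{1, 2, \dots, K\}$ denote the set of all buoys. We denote the horizontal coordinate of the UAV in the $n$-th time slot by $\mathbf{q}[n] = (x[n], y[n])$, where $n \in \mathcal{N} = \{1, 2, \dots, N\}$. Let $\mathcal{N}_c = \{1, \dots, N_c\}$ denote the set of data collection time slots and $\mathcal{N}_o = \{N_c + 1, \dots, N\}$ denote the set of data offloading time slots.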

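The UAV flies at a fixed height $H$, and its flight velocity in the $n$-th time slot is $\mathbf{v}[n]$. The UAV must satisfy the following flight velocity and acceleration constraints:

$$v_{\min} \le \|\mathbf{v}[n]\| \le v_{\max}, \qquad \|\mathbf{v}[n+1] - \mathbf{v}[n]\| \le a_{\max}\,\delta,$$

where $v_{\max}$ and $a_{\max}$ are the maximum flight velocity and acceleration, respectively, and $v_{\min}$ is the minimum flight velocity.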
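The speed and acceleration constraints above can be checked for a candidate trajectory with the short sketch below. It assumes, as a modelling choice not stated in the text, that the velocity in slot $n$ is approximated by the finite difference $(\mathbf{q}[n+1]-\mathbf{q}[n])/\delta$; the function name `satisfies_kinematics` is hypothetical.

```python
import math

def satisfies_kinematics(traj, slot, v_min, v_max, a_max, tol=1e-9):
    """Check speed and acceleration limits of a horizontal UAV trajectory.

    traj: list of (x, y) positions q[1..N] in metres; slot: slot length in s.
    Assumption: v[n] is approximated by (q[n+1] - q[n]) / slot.
    """
    velocities = [((x1 - x0) / slot, (y1 - y0) / slot)
                  for (x0, y0), (x1, y1) in zip(traj, traj[1:])]
    # Speed constraint: v_min <= ||v[n]|| <= v_max
    for vx, vy in velocities:
        speed = math.hypot(vx, vy)
        if speed < v_min - tol or speed > v_max + tol:
            return False
    # Acceleration constraint: ||v[n+1] - v[n]|| <= a_max * slot
    for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:]):
        if math.hypot(vx1 - vx0, vy1 - vy0) > a_max * slot + tol:
            return False
    return True

# A straight flight at 10 m/s with 1 s slots, v in [5, 20] m/s, a_max = 5 m/s^2
print(satisfies_kinematics([(0, 0), (10, 0), (20, 0)], 1.0, 5.0, 20.0, 5.0))  # → True
```

A sharp 90-degree turn at the same speed changes the velocity vector by about 14.1 m/s in one slot and therefore violates the acceleration limit in this example.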

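The horizontal coordinate of the $k$-th buoy is $\mathbf{w}_k = (x_k, y_k)$, and that of the OBS is $\mathbf{w}_o = (x_o, y_o)$. If the time slot is small enough, the UAV can be regarded as approximately static within each time slot. Hence, the distance between the UAV and the $k$-th buoy in the $n$-th time slot is $d_k[n] = \sqrt{(H - h_k)^2 + \|\mathbf{q}[n] - \mathbf{w}_k\|^2}$, $k \in \mathcal{K}$, $n \in \mathcal{N}_c$. In the data offloading stage, the distance between the UAV and the OBS in the $n$-th time slot is $d_o[n] = \sqrt{(H - h_o)^2 + \|\mathbf{q}[n] - \mathbf{w}_o\|^2}$, $n \in \mathcal{N}_o$, where $h_k$ and $h_o$ denote the antenna heights of the $k$-th buoy and the OBS, respectively.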

2.2. Transmission Model
2.2.1. Channel Model

We adopt the air-to-ground channel model and the two-ray path loss model [29] to characterize the LoS and NLoS path losses of the buoy-UAV and UAV-OBS links.

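Specifically, the channel gain of the buoy-UAV and UAV-OBS links is $g_i[n] = 10^{-\bar{L}_i[n]/10}$, $i \in \mathcal{K} \cup \{o\}$, where $\bar{L}_i[n]$ (in dB) denotes the average path loss in the $n$-th time slot, expressed as

$$\bar{L}_i[n] = P^{\mathrm{LoS}}_i[n]\, L^{\mathrm{LoS}}_i[n] + \big(1 - P^{\mathrm{LoS}}_i[n]\big)\, L^{\mathrm{NLoS}}_i[n],$$

where $L^{\mathrm{LoS}}_i[n]$ and $L^{\mathrm{NLoS}}_i[n]$ are the average path losses of the LoS and NLoS paths, each given by the two-ray path loss (a function of the wavelength $\lambda$, the link distance, and the antenna heights) plus the excessive path loss $\eta_{\mathrm{LoS}}$ or $\eta_{\mathrm{NLoS}}$ of the LoS or NLoS path, respectively. The probability of a LoS link is

$$P^{\mathrm{LoS}}_i[n] = \frac{1}{1 + a \exp\big(-b\,(\theta_i[n] - a)\big)},$$

where $a$ and $b$ are two environment-dependent constants and $\theta_i[n] = \frac{180}{\pi}\arcsin\!\big(\frac{H - h_i}{d_i[n]}\big)$ is the elevation angle (in degrees) between the $i$-th buoy (or the OBS) and the UAV.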
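The LoS/NLoS mixture above can be sketched as follows. To keep the example self-contained, the base path loss is plain free-space path loss, which is a stand-in and not the two-ray model of [29]; the function names are hypothetical, and the example values $a = 9.61$, $b = 0.16$, $\eta_{\mathrm{LoS}} = 1$ dB, $\eta_{\mathrm{NLoS}} = 20$ dB are commonly used urban illustration values, not parameters of this paper.

```python
import math

def distance_3d(q, w, uav_height, antenna_height):
    """3D distance between the UAV at horizontal position q and a node at w."""
    return math.sqrt((uav_height - antenna_height) ** 2
                     + (q[0] - w[0]) ** 2 + (q[1] - w[1]) ** 2)

def los_probability(q, w, uav_height, antenna_height, a, b):
    """Sigmoid LoS probability driven by the elevation angle in degrees."""
    d = distance_3d(q, w, uav_height, antenna_height)
    ratio = min(1.0, (uav_height - antenna_height) / d)  # guard float rounding
    theta = math.degrees(math.asin(ratio))
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))

def channel_gain(q, w, uav_height, antenna_height, wavelength,
                 a, b, eta_los_db, eta_nlos_db):
    """Linear gain g = 10^(-L_avg/10) from the LoS/NLoS average path loss.

    Stand-in: free-space path loss replaces the two-ray model of [29].
    """
    d = distance_3d(q, w, uav_height, antenna_height)
    base_db = 20.0 * math.log10(4.0 * math.pi * d / wavelength)
    p_los = los_probability(q, w, uav_height, antenna_height, a, b)
    avg_db = p_los * (base_db + eta_los_db) + (1.0 - p_los) * (base_db + eta_nlos_db)
    return 10.0 ** (-avg_db / 10.0)

# UAV at 100 m, buoy antenna at 2 m, 2.4 GHz carrier (wavelength about 0.125 m)
g_above = channel_gain((0.0, 0.0), (0.0, 0.0), 100.0, 2.0, 0.125, 9.61, 0.16, 1.0, 20.0)
g_far = channel_gain((0.0, 0.0), (500.0, 0.0), 100.0, 2.0, 0.125, 9.61, 0.16, 1.0, 20.0)
```

Moving the buoy 500 m away both lengthens the link and lowers the elevation angle, so the LoS probability drops and the gain falls by more than the distance alone would explain.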

2.2.2. UAV Data Collection from Buoys

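In the data collection stage ($n \in \mathcal{N}_c$), let $\alpha_k[n] \in \{0, 1\}$ denote the association indicator between the $k$-th buoy and the UAV: $\alpha_k[n] = 1$ means that the $k$-th buoy is associated with the UAV in the $n$-th time slot; otherwise, $\alpha_k[n] = 0$.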

In the uplink NOMA system, the UAV acts as a receiver that receives signals from multiple buoys simultaneously, with all of these buoys sharing the same channel. SIC is used to demodulate the superimposed signals, which arrive at different received power levels: each successfully demodulated signal is subtracted from the composite received signal, so signals decoded later experience less cochannel interference. Therefore, the buoy with the higher channel gain is usually demodulated first, and its interference comes only from the buoys with worse channel gains [30, 31]. The cochannel interference of the uplink transmission between the $k$-th buoy and the UAV in the $n$-th time slot is

$$I_k[n] = \sum_{j \in \mathcal{K}_k[n]} \alpha_j[n]\, p_j[n]\, g_j[n],$$

where $\mathcal{K}_k[n]$ is the set of buoys whose channel gains are worse than that of the $k$-th buoy in the $n$-th time slot.

Hence, the signal-to-interference-plus-noise ratio (SINR) between the $k$-th buoy and the UAV in the $n$-th time slot is

$$\gamma_k[n] = \frac{\alpha_k[n]\, p_k[n]\, g_k[n]}{I_k[n] + \sigma^2},$$

where $p_k[n]$ is the transmit power of the $k$-th buoy in the $n$-th time slot and $\sigma^2$ is the noise power.

The transmission rate of the $k$-th buoy in the $n$-th time slot is

$$R_k[n] = B \log_2\big(1 + \gamma_k[n]\big),$$

where $B$ is the channel bandwidth. For the received signal to be demodulated successfully, the SINR must satisfy the SIC demodulation condition [32]

$$\gamma_k[n] \ge \alpha_k[n]\, \gamma_{\mathrm{SIC}},$$

where $\gamma_{\mathrm{SIC}}$ denotes the SIC threshold.
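The SIC interference structure can be made concrete with a small sketch. It assumes the input lists contain only the buoys associated with the UAV in the current slot (so every $\alpha_k[n] = 1$) and takes the SIC threshold in dB; the function names `noma_uplink` and `meets_sic` are hypothetical.

```python
import math

def noma_uplink(powers, gains, noise, bandwidth):
    """SINR and rate (bit/s) of each associated buoy under uplink NOMA with SIC.

    The UAV decodes buoys in descending order of channel gain, so buoy k is
    interfered only by the associated buoys with worse gain (decoded later).
    """
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    received = [p * g for p, g in zip(powers, gains)]
    sinr, rate = [0.0] * len(gains), [0.0] * len(gains)
    for pos, k in enumerate(order):
        interference = sum(received[j] for j in order[pos + 1:])
        sinr[k] = received[k] / (interference + noise)
        rate[k] = bandwidth * math.log2(1.0 + sinr[k])
    return sinr, rate

def meets_sic(sinr, threshold_db):
    """SIC demodulation condition: every associated buoy reaches the threshold."""
    threshold = 10.0 ** (threshold_db / 10.0)
    return all(s >= threshold for s in sinr)

# Two buoys at 0.1 W; buoy 1 has the stronger channel and is decoded first.
sinr, rate = noma_uplink([0.1, 0.1], [1e-9, 4e-9], 1e-12, 1e6)
print([round(s, 3) for s in sinr])  # → [100.0, 3.96]
```

The example shows why the strongest buoy is the binding one for the SIC constraint: the weaker buoy is decoded last without interference, while the stronger buoy sees the weaker one as interference, so a 10 dB threshold fails here even though a 5 dB threshold is met.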

2.2.3. UAV Data Offloading to OBS

In the data offloading stage ($n \in \mathcal{N}_o$), let $\beta[n] \in \{0, 1\}$ denote the association indicator between the UAV and the OBS: $\beta[n] = 1$ means that the UAV can offload data to the OBS in the $n$-th time slot; otherwise, $\beta[n] = 0$.

The signal-to-noise ratio (SNR) between the UAV and the OBS in the $n$-th time slot must satisfy

$$\frac{P_u[n]\, g_o[n]}{\sigma^2} \ge \beta[n]\, \gamma_{\mathrm{th}},$$

where $P_u[n]$ is the transmit power of the UAV and $\gamma_{\mathrm{th}}$ is the SNR threshold.

The transmission rate between the UAV and the OBS in the $n$-th time slot is given by

$$R_o[n] = \beta[n]\, B \log_2\!\Big(1 + \frac{P_u[n]\, g_o[n]}{\sigma^2}\Big).$$
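The offloading link reduces to a threshold test followed by a Shannon-rate computation. The sketch below returns the association indicator and the number of bits offloaded in one slot; the name `offload_slot` is hypothetical and the SNR threshold is taken in dB.

```python
import math

def offload_slot(p_uav, gain, noise, bandwidth, snr_threshold_db, slot):
    """Return (beta, bits offloaded in one slot) for the UAV-OBS link."""
    snr = p_uav * gain / noise
    # beta = 1 only if the SNR reaches the threshold; otherwise nothing is sent
    beta = 1 if snr >= 10.0 ** (snr_threshold_db / 10.0) else 0
    bits = beta * slot * bandwidth * math.log2(1.0 + snr)
    return beta, bits

beta, bits = offload_slot(1.0, 1e-10, 1e-12, 1e6, 10.0, 1.0)  # SNR = 20 dB
```

Because the rate is increasing in $P_u[n]$, transmitting at the maximum UAV power always maximizes the data offloaded in a slot once the threshold is met.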

2.3. Problem Formulation

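Our goal is to minimize the UAV’s total mission completion time by jointly optimizing the buoy-UAV association relationship, the UAV transmit power, the buoys’ transmit power, and the UAV trajectory. Let $\mathbf{A} = \{\alpha_k[n]\}$ denote the buoy-UAV association variables, $\mathbf{P}_u = \{P_u[n]\}$ the UAV transmit power during data offloading, $\mathbf{P}_b = \{p_k[n]\}$ the buoys’ transmit power, and $\mathbf{Q} = \{\mathbf{q}[n]\}$ the UAV trajectory. Let $D_k$ denote the size of the data that needs to be collected from the $k$-th buoy. The total mission completion time minimization problem can be formulated as

$$
\begin{aligned}
(\mathrm{P1}):\ \min_{\mathbf{A}, \mathbf{P}_u, \mathbf{P}_b, \mathbf{Q}}\ & T = (N_c + N_o)\,\delta \\
\text{s.t.}\ \ & \mathrm{C1}:\ 0 \le P_u[n] \le P_u^{\max}, \ \forall n \in \mathcal{N}_o, \\
& \mathrm{C2}:\ 0 \le p_k[n] \le p^{\max}, \ \forall k \in \mathcal{K},\ n \in \mathcal{N}_c, \\
& \mathrm{C3}:\ \textstyle\sum_{k \in \mathcal{K}} \alpha_k[n] \le M,\ \alpha_k[n] \in \{0, 1\}, \ \forall n \in \mathcal{N}_c, \\
& \mathrm{C4}:\ \textstyle\sum_{n \in \mathcal{N}_o} \delta R_o[n] \ge \sum_{k \in \mathcal{K}} D_k, \\
& \mathrm{C5}:\ \textstyle\sum_{n \in \mathcal{N}_c} \delta R_k[n] \ge D_k, \ \forall k \in \mathcal{K}, \\
& \mathrm{C6}:\ v_{\min} \le \|\mathbf{v}[n]\| \le v_{\max}, \ \forall n \in \mathcal{N}, \\
& \mathrm{C7}:\ \|\mathbf{v}[n+1] - \mathbf{v}[n]\| \le a_{\max}\,\delta, \ \forall n \in \mathcal{N}, \\
& \mathrm{C8}:\ \gamma_k[n] \ge \alpha_k[n]\,\gamma_{\mathrm{SIC}}, \ \forall k \in \mathcal{K},\ n \in \mathcal{N}_c, \\
& \mathrm{C9}:\ P_u[n]\, g_o[n] / \sigma^2 \ge \beta[n]\,\gamma_{\mathrm{th}}, \ \forall n \in \mathcal{N}_o.
\end{aligned}
$$

In problem (P1), C1 and C2 restrict the maximum transmit powers of the UAV and the buoys, respectively. C3 limits the maximum number of buoys that can be associated with the UAV in each time slot. C4 ensures that all data collected by the UAV is offloaded to the OBS. C5 ensures that the data collection requirement of each buoy is met. C6 and C7 are the UAV velocity and maximum acceleration constraints, respectively. C8 is the SIC demodulation constraint on the SINR in the data collection stage, and C9 is the SNR constraint in the data offloading stage. Problem (P1) is a mixed-integer nonconvex problem because it contains binary association variables and the SINR couples the transmit powers with the trajectory, which makes it difficult to solve efficiently.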

3. Proposed Scheme

To solve problem (P1), we first propose a TD3-based UAV trajectory optimization algorithm (TTO). Then, we design a heuristic algorithm that solves the power control problem while determining the buoy-UAV association relationship (PCAR). Finally, we combine the two algorithms to solve problem (P1).

3.1. TD3-Based UAV Trajectory Optimization

In our system, the start position of the data offloading stage coincides with the end position of the data collection stage; we call it the transition position (TP) between the two stages. Therefore, the UAV trajectories of the two stages are coupled, and the TP cannot be determined in advance. Furthermore, the UAV trajectory changes dynamically according to the data collection and data offloading requirements, which is difficult to handle with traditional deterministic optimization methods [19]. Therefore, we use an advanced DRL method, TD3, to solve the UAV trajectory optimization subproblem for given transmit powers and association relationships. In the following, we first define the state, action, and reward and then briefly introduce TD3.

3.1.1. State Definition

In the data collection stage, the UAV’s action is closely related to the remaining data size of the buoys and the buoy-UAV association relationship. Similarly, in the data offloading stage, the UAV’s action is related to the remaining data size of the UAV and the UAV-OBS association relationship. Furthermore, the UAV only collects data from buoys within the target area. Hence, we define the state in the $n$-th time slot as

$$s[n] = \big\{\mathbf{q}[n],\ \boldsymbol{\alpha}[n],\ \mathbf{D}[n],\ D_u[n],\ \beta[n],\ b[n]\big\},$$

where $\mathbf{q}[n]$ is the UAV position, $\beta[n]$ is the UAV-OBS association indicator, and
(i) $\boldsymbol{\alpha}[n] = \{\alpha_k[n]\}_{k \in \mathcal{K}}$ denotes the buoy-UAV association relationship in the $n$-th time slot;
(ii) $\mathbf{D}[n] = \{D_k[n]\}_{k \in \mathcal{K}}$ denotes the remaining data size of each buoy in the $n$-th time slot;
(iii) $D_u[n]$ denotes the remaining data size of the UAV;
(iv) $b[n]$ denotes the boundary penalty information, which indicates whether the UAV position exceeds the target area in the $n$-th time slot.

3.1.2. Action Definition

Based on the above state and environment information, the UAV’s action is defined as $a[n] = \theta[n]$, where $\theta[n] \in [0, 2\pi)$ denotes the flight angle of the UAV in the $n$-th time slot.
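Since the action is only a flight angle, a position update needs a speed and a boundary rule. The sketch assumes a constant speed per slot and a rectangular target area $[0, x_{\max}] \times [0, y_{\max}]$, and mirrors the boundary handling of lines 8-10 of Algorithm 1 (the move is cancelled and a boundary flag is raised); `apply_action` and `action_to_angle` are hypothetical names, and the tanh-style actor output range $[-1, 1]$ is an assumption.

```python
import math

def apply_action(q, theta, speed, slot, area):
    """Move the UAV along flight angle theta (rad) for one slot.

    Assumptions: constant speed per slot and a rectangular target area
    [0, x_max] x [0, y_max]. If the move would leave the area, it is
    cancelled and the boundary flag is set.
    """
    x_max, y_max = area
    nx = q[0] + speed * slot * math.cos(theta)
    ny = q[1] + speed * slot * math.sin(theta)
    if 0.0 <= nx <= x_max and 0.0 <= ny <= y_max:
        return (nx, ny), 0  # accepted move, no boundary penalty
    return q, 1  # cancelled move, boundary penalty flag raised

def action_to_angle(a):
    """Map an actor output a in [-1, 1] to a flight angle in [0, 2*pi]."""
    return (a + 1.0) * math.pi

q_next, flag = apply_action((0.0, 0.0), action_to_angle(-1.0), 10.0, 1.0, (100.0, 100.0))
```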

3.1.3. Reward Definition

Let denote the number of maximum total mission completion time slots. if the UAV completes the mission in time slots. Then, we design the reward function as

It can be seen from (16) that the sooner the UAV completes the mission, the greater the final reward it obtains.

Note that for better performance, we reduce the order of magnitude of and by orders of magnitude to be less than or equal to the order of magnitude of when calculating the state and reward.
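A flattened state vector with rescaled entries might look like the sketch below. The layout (UAV position, association flags, remaining buoy data, remaining UAV data, UAV-OBS flag, boundary flag, giving $2K + 5$ entries) is an assumption consistent with the dimension count discussed for (12), and `pos_scale` and `data_scale` are hypothetical normalizers rather than the paper's exact scaling.

```python
def build_state(q, assoc, buoy_remaining, uav_remaining, beta, boundary,
                pos_scale, data_scale):
    """Flatten s[n] into a vector of length 2K + 5 (K = number of buoys).

    Assumed layout: UAV position (2), association flags (K), remaining buoy
    data (K), remaining UAV data (1), UAV-OBS flag (1), boundary flag (1).
    pos_scale and data_scale are hypothetical normalisers that bring metres
    and bits to comparable orders of magnitude.
    """
    state = [q[0] / pos_scale, q[1] / pos_scale]
    state += [float(a) for a in assoc]
    state += [d / data_scale for d in buoy_remaining]
    state += [uav_remaining / data_scale, float(beta), float(boundary)]
    return state

s = build_state((100.0, 200.0), [1, 0, 1], [2e6, 0.0, 1e6], 3e6, 0, 0, 1000.0, 1e6)
```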

3.1.4. TD3

TD3 has the following advantages [27]:
(i) Clipped double Q-learning for actor-critic: TD3 contains two critic networks. Of the two target Q-values generated by the two target critic networks, the smaller one is selected to suppress Q-value overestimation, i.e.,
$$y = r + \gamma \min_{i = 1, 2} Q_{\theta'_i}\big(s', \pi_{\phi'}(s')\big),$$
where $y$ is the target value used to update both critic networks, $r$ is the reward, $\gamma$ is the reward discount factor, $\pi_{\phi'}$ is the target policy, $\theta'_i$ is the parameter of the $i$-th target critic network, and $s'$ is the next state.
(ii) Target networks and delayed policy updates: TD3 updates the actor and the target networks only after a fixed number of critic updates, and the target networks are updated softly as
$$\theta'_i \leftarrow \tau \theta_i + (1 - \tau)\,\theta'_i, \qquad \phi' \leftarrow \tau \phi + (1 - \tau)\,\phi',$$
where $\tau$ is the update parameter.
(iii) Target policy smoothing regularization: TD3 smooths the value estimate and reduces its error by adding a small amount of clipped random noise to the target action and averaging over the minibatch, i.e.,
$$\tilde{a} = \pi_{\phi'}(s') + \epsilon, \qquad \epsilon \sim \mathrm{clip}\big(\mathcal{N}(0, \tilde{\sigma}), -c, c\big).$$
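The three mechanisms can be sketched for a single transition with a scalar action (the normalized flight angle), with the target networks passed in as plain callables. The `done` argument implements the termination flag introduced in the next paragraph. The function names and the default noise settings (0.2 and 0.5, common TD3 defaults) are assumptions for illustration, not this paper's configuration.

```python
import random

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def td3_target(r, s_next, done, actor_t, critic1_t, critic2_t, gamma=0.99,
               noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0,
               rng=random):
    """TD3 target y for one transition with a scalar action.

    actor_t, critic1_t, critic2_t are the target networks as callables.
    """
    # (iii) target policy smoothing: clipped Gaussian noise on the target action
    eps = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    a_next = clip(actor_t(s_next) + eps, a_low, a_high)
    # (i) clipped double Q-learning: keep the smaller of the two target Q-values
    q_min = min(critic1_t(s_next, a_next), critic2_t(s_next, a_next))
    # termination flag: no bootstrapped term once the mission is complete
    return r + gamma * (1.0 - done) * q_min

def soft_update(target_params, online_params, tau):
    """(ii) soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

y = td3_target(1.0, None, 0.0, lambda s: 0.0,
               lambda s, a: 5.0, lambda s, a: 3.0, gamma=0.9)
```

With the two critics returning 5 and 3, the target uses the smaller value 3, which is exactly the pessimistic choice that counteracts overestimation.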

It can be seen from (12) that most of the state dimensions are related to the buoys: only two dimensions are related to the UAV position, two to the OBS, and one to the UAV boundary penalty information. Hence, the state dimensions are imbalanced. Dimension spread can effectively address this problem, so we spread the low-dimensional parts of the state. For example, we connect the UAV position to a spread network composed of a layer of neurons, which expands its dimension [33, 34]. Furthermore, we set a termination flag $d$ to indicate whether the UAV has completed its mission and apply it to the target value as $y = r + \gamma (1 - d) \min_{i = 1, 2} Q_{\theta'_i}(s', \tilde{a})$, where $\tilde{a}$ is the smoothed target action. Hence, the bootstrapped Q-value term of the target is zero after the UAV completes the mission, which makes critic learning more stable [34].
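Dimension spread can be illustrated with a single fully connected layer plus ReLU that lifts the 2-dimensional UAV position to a wider feature vector before it is concatenated with the buoy-related features. This is a sketch under assumptions: the weights are only randomly initialized here (in training they would be learned with the actor and critic), and both the output width and $K = 10$ are hypothetical.

```python
import random

def make_spread_layer(in_dim, out_dim, seed=0):
    """Fully connected layer + ReLU that expands a low-dimensional input."""
    rng = random.Random(seed)
    bound = 1.0 / in_dim ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [0.0] * out_dim

    def spread(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, biases)]
    return spread

K = 10                                # hypothetical number of buoys
spread_uav = make_spread_layer(2, K)  # UAV position: 2 -> K dimensions
buoy_part = [0.0] * (2 * K)           # association flags + remaining data
features = spread_uav([0.1, 0.2]) + buoy_part
print(len(features))  # → 30
```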

In summary, the proposed TTO algorithm is shown in Algorithm 1, and the TD3 update process is shown in Figure 2. In lines 1-3, we first initialize the network parameters and the experience replay buffer $\mathcal{R}$. In lines 5-19, we initialize the environment, obtain the initial state, and use the time-slot index $n$ to record the time spent by the UAV on the mission. Moreover, a termination flag $d$ is set, where $d = 0$ indicates that the UAV has not completed the mission. The UAV then selects actions according to the observed state and environment information; specifically, it continually interacts with the environment while the actor and critic networks are updated. The actor network outputs the action $a[n]$ to be executed by the UAV according to the state $s[n]$, and the transition is stored in $\mathcal{R}$. If the UAV would fly beyond the target area, the action is canceled and the boundary penalty $b[n]$ is given. A stage indicator $\omega$ is kept at $\omega = 0$ during the data collection stage. The data offloading stage begins once the UAV completes the data collection mission, after which $\omega = 1$. The flag $d$ is set to 1 when the UAV completes the data offloading mission, and the episode terminates when $d = 1$ or $n = N_{\max}$.

In lines 20-31, $N_b$ transitions are randomly sampled from the replay buffer $\mathcal{R}$ to form a minibatch, which is fed into the actor and critic networks. The target actor network computes the next action $\tilde{a}$ from $s'$. After taking the smaller target Q-value and smoothing the target policy as described in (i) and (iii) above, each critic network is updated by minimizing the loss function

$$L(\theta_i) = \frac{1}{N_b} \sum \big(y - Q_{\theta_i}(s, a)\big)^2, \qquad i = 1, 2.$$

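Then, the actor network is updated with a delay by the deterministic policy gradient

$$\nabla_{\phi} J(\phi) = \frac{1}{N_b} \sum \nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_{\phi}(s)}\, \nabla_{\phi} \pi_{\phi}(s).$$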

Finally, the optimized UAV trajectory is obtained by iterating until the maximum number of episodes $E_{\max}$ is reached.

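Algorithm 1: TD3-based UAV trajectory optimization (TTO).
1. Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$, and actor network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, and $\phi$.
2. Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, and $\phi' \leftarrow \phi$.
3. Initialize experience replay buffer $\mathcal{R}$.
4. for episode $= 0$ to $E_{\max}$ do
5.  Initialize the environment and state $s[1]$, and the termination flag $d = 0$.
6.  for epoch $n = 1$ to $N_{\max}$ do
7.   Select action $a[n] = \pi_{\phi}(s[n]) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and observe reward $r[n]$ and next state $s[n+1]$.
8.   if the UAV flies beyond the target area then
9.    Set the boundary penalty $b[n]$. Then cancel the UAV’s action and update
     $s[n+1]$ and $r[n]$ based on the current state.
10.   end if
11.   if the data of all buoys have been collected, i.e., $D_k[n] = 0$, $\forall k \in \mathcal{K}$, then
12.    Let $\omega = 1$, and start the data offloading.
13.   else
14.    Let $\omega = 0$, and continue the data collection.
15.   end if
16.   if $D_u[n] = 0$ then
17.    Set $d = 1$, and let $N = n$.
18.   end if
19.   Store transition tuple $(s[n], a[n], r[n], s[n+1], d)$ in $\mathcal{R}$.
20.   if $\mathcal{R}$ contains at least $N_b$ transitions then
21.    Sample a mini-batch of $N_b$ transitions $(s, a, r, s', d)$ from $\mathcal{R}$.
22.    $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$.
23.    $y \leftarrow r + \gamma (1 - d) \min_{i = 1, 2} Q_{\theta'_i}(s', \tilde{a})$.
24.    Update critics:
25.    $\theta_i \leftarrow \arg\min_{\theta_i} N_b^{-1} \sum (y - Q_{\theta_i}(s, a))^2$.
26.    Every $\kappa$ critic updates (delayed update), update the actor policy by the deterministic policy gradient:
27.    $\nabla_{\phi} J(\phi) = N_b^{-1} \sum \nabla_a Q_{\theta_1}(s, a)|_{a = \pi_{\phi}(s)} \nabla_{\phi} \pi_{\phi}(s)$.
28.    Update target networks (in the same delayed step):
29.    $\theta'_i \leftarrow \tau \theta_i + (1 - \tau) \theta'_i$.
30.    $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$.
31.   end if
32.  end for
33. end for
34. return The UAV trajectory $\mathbf{Q}$.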
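Two pieces of Algorithm 1 that are independent of any neural-network library can be sketched directly: the replay buffer $\mathcal{R}$ and the stage/termination bookkeeping of lines 11-18 together with the episode-termination test. The stage indicator is called `omega` here (0 for data collection, 1 for data offloading); the class and function names are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay buffer R storing (s, a, r, s_next, d) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def store(self, s, a, r, s_next, d):
        self.buffer.append((s, a, r, s_next, d))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def stage_and_termination(buoy_remaining, uav_remaining, n, n_max):
    """Stage switch, completion flag, and episode-termination test.

    omega = 0: data collection; omega = 1: data offloading.
    d = 1 once all buoy data are collected and the UAV has offloaded all data.
    """
    omega = 1 if all(x <= 0 for x in buoy_remaining) else 0
    d = 1 if omega == 1 and uav_remaining <= 0 else 0
    terminate = d == 1 or n >= n_max
    return omega, d, terminate
```

A bounded `deque` keeps memory fixed (the simulation uses a buffer of 100000 transitions) and silently discards the oldest experience once full.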
3.2. Power Control and Buoy-UAV Association Relationship

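Given the UAV trajectory $\mathbf{Q}$, problem (P1) can be written as

$$(\mathrm{P2}):\ \min_{\mathbf{A}, \mathbf{P}_u, \mathbf{P}_b}\ (N_c + N_o)\,\delta,$$

subject to the transmit power, association, data size, SIC, and SNR constraints of (P1), i.e., all constraints of (P1) except the velocity and acceleration constraints.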

Problem (P2) is still a mixed-integer nonconvex problem. Given the mission completion time and the buoy-UAV and UAV-OBS association relationships, (P2) can be transformed into the problem of maximizing the total transmitted data size in the $n$-th time slot, which can be divided into two parts.

First, in the data collection stage, problem (P2) shows that the SINR between the UAV and the buoys depends not only on the buoys’ transmit power but also on the buoy-UAV association relationship. Therefore, to determine the buoy-UAV association relationship, we first introduce Lemmas 1 and 2.

Lemma 1. The UAV must be associated with the $M$ buoys with the largest channel gains in the $n$-th time slot.

Proof. As can be seen from (P2), the total transmitted data size depends on the summation term $\sum_{k \in \mathcal{K}} \alpha_k[n] R_k[n]$. Assume that all buoys transmit at the maximum power $p^{\max}$, and sort the channel gains between the buoys and the UAV in descending order as $g_{\pi(1)}[n] \ge g_{\pi(2)}[n] \ge \cdots \ge g_{\pi(K)}[n]$. Selecting the $M$ buoys with the largest channel gains then maximizes the total transmitted data size.

Lemma 2. Among the buoys associated with the UAV in the $n$-th time slot, the buoy with the largest channel gain must transmit at $p^{\max}$.

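Proof. Except for the buoys’ transmit power, the assumptions are the same as in the proof of Lemma 1. If the UAV is associated with $m$ buoys, the SINR between the UAV and the buoy with the largest channel gain is

$$\gamma_{\pi(1)}[n] = \frac{p_{\pi(1)}[n]\, g_{\pi(1)}[n]}{\sum_{j = 2}^{m} p_{\pi(j)}[n]\, g_{\pi(j)}[n] + \sigma^2}.$$

To satisfy the maximum transmit power and SIC constraints while maximizing the total transmitted data size, the denominator (the interference from the weaker buoys) should be allowed to be as large as possible, so that more buoys can be associated with the UAV and transmit with higher power; this requires the numerator to be as large as possible. Therefore, $p_{\pi(1)}[n]$ should be $p^{\max}$.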
The total transmitted data size of the buoys in the $n$-th time slot is

$$D_c[n] = \delta \sum_{j = 1}^{m} R_{\pi(j)}[n].$$

Therefore, problem (P2) can be transformed into

$$(\mathrm{P2a}):\ \max_{\{p_{\pi(j)}[n]\}}\ D_c[n] \quad \text{s.t. the maximum transmit power and SIC constraints.}$$

Due to the cochannel interference between buoys, (P2a) is still nonconvex. Therefore, we convert (P2a) into an equivalent convex form, which can be solved by a standard convex optimization solver (such as CVXPY).
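One standard way to remove the nonconvexity, given here as an assumption rather than as the exact reformulation used in this paper, relies on the uplink NOMA sum-rate identity under SIC: $\sum_{j=1}^{m} \log_2(1 + \gamma_{\pi(j)}[n]) = \log_2\big(1 + \sum_{j=1}^{m} p_{\pi(j)}[n]\, g_{\pi(j)}[n] / \sigma^2\big)$, because each factor $1 + \gamma_{\pi(j)}[n]$ is a ratio of successive partial sums of received power plus noise, and the product telescopes. The objective then becomes a concave function of a linear expression in the powers, while the SIC constraints $p_{\pi(j)} g_{\pi(j)} \ge \gamma_{\mathrm{SIC}} \big(\sum_{l > j} p_{\pi(l)} g_{\pi(l)} + \sigma^2\big)$ are linear. The sketch checks the identity numerically; the function names are hypothetical.

```python
import math

def sum_rate_sic(powers, gains, noise):
    """Sum of per-buoy log2(1 + SINR), decoding in descending gain order."""
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    rx = [powers[k] * gains[k] for k in order]
    total = 0.0
    for i in range(len(rx)):
        # buoy i is interfered by the buoys decoded after it
        total += math.log2(1.0 + rx[i] / (sum(rx[i + 1:]) + noise))
    return total

def sum_rate_closed_form(powers, gains, noise):
    """Telescoped form: log2(1 + total received power / noise)."""
    return math.log2(1.0 + sum(p * g for p, g in zip(powers, gains)) / noise)

p, g, n0 = [0.1, 0.05, 0.08], [4e-9, 1e-9, 2e-9], 1e-12
print(math.isclose(sum_rate_sic(p, g, n0), sum_rate_closed_form(p, g, n0)))  # → True
```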

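Algorithm 2: Power control and buoy-UAV association relationship (PCAR).
1. Input the UAV’s current position $\mathbf{q}[n]$, the buoys’ positions $\{\mathbf{w}_k\}$, and $M$.
2. Input the current channel gains $\{g_k[n]\}$, and sort them in descending order to obtain $g_{\pi(1)}[n] \ge \cdots \ge g_{\pi(K)}[n]$.
3. According to $M$, get the initial NOMA group $\{\pi(1), \dots, \pi(M)\}$ and the number of initially associated buoys $m = M$.
4. repeat
5.  Solve (P2a) to obtain the optimal solution $\mathbf{p}^*[n]$ and the optimal value $D_c^*[n]$.
6.  if the solution state is not optimal then
7.   Remove the buoy with the worst channel gain from the currently associated buoys.
8.   $m \leftarrow m - 1$.
9.  end if
10. until (P2a) is solved to optimality
11. return $\mathbf{p}^*[n]$ and $\boldsymbol{\alpha}[n]$;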

The PCAR algorithm is shown in Algorithm 2. The channel gains are first sorted in descending order. Since the UAV can be associated with at most $M$ buoys, the $M$ buoys with the largest channel gains are selected to form the initial NOMA group. Then, we solve (P2a). If (P2a) has an optimal solution, the optimal solution and the optimal value are obtained; otherwise, the buoy with the worst channel gain is removed from the current NOMA group to form a new group. This process is repeated until (P2a) has an optimal solution. Note that the UAV is associated with at least one buoy in each time slot.
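The group-shrinking loop of Algorithm 2 can be sketched without a convex solver by testing feasibility of the SIC constraints exactly: working from the weakest buoy upward, give each buoy just enough power to meet the threshold over the interference of the weaker ones; if any buoy would need more than $p^{\max}$, no feasible power allocation exists for that group. This is a simplified stand-in under stated assumptions: it returns a feasible allocation (with the strongest buoy at $p^{\max}$, per Lemma 2), not the optimum of (P2a), and it assumes a single remaining buoy needs no SIC check. The function names are hypothetical and `gamma_sic` is a linear threshold.

```python
def min_sic_powers(gains_desc, p_max, noise, gamma_sic):
    """Smallest transmit powers meeting every SIC constraint for a NOMA group.

    gains_desc must be sorted in descending order. Returns None if some
    buoy would need more than p_max (the group is infeasible).
    """
    powers = [0.0] * len(gains_desc)
    interference = 0.0  # received power of the buoys decoded later
    for i in range(len(gains_desc) - 1, -1, -1):  # weakest buoy first
        need = gamma_sic * (interference + noise) / gains_desc[i]
        if need > p_max:
            return None
        powers[i] = need
        interference += need * gains_desc[i]
    return powers

def pcar_sketch(gains, M, p_max, noise, gamma_sic):
    """Group formation of Algorithm 2 with a feasibility test instead of a solver."""
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    group = order[:M]  # Lemma 1: start with the M strongest buoys
    while len(group) > 1:
        powers = min_sic_powers([gains[k] for k in group], p_max, noise, gamma_sic)
        if powers is not None:
            powers[0] = p_max  # Lemma 2: strongest buoy transmits at p_max
            return group, powers
        group = group[:-1]  # drop the buoy with the worst channel gain
    # Assumption: a single associated buoy suffers no co-channel interference.
    return group, [p_max] * len(group)

group, powers = pcar_sketch([1e-9, 4e-9, 2e-9], 2, 0.1, 1e-12, 10.0)
```

Raising the strongest buoy to $p^{\max}$ keeps the allocation feasible, because the strongest buoy is decoded first and therefore never interferes with the weaker ones.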

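Algorithm 3: Joint TTO and PCAR scheme (TTO-PCAR).
Input: The UAV’s initial position $\mathbf{q}[1]$, the buoys’ positions $\{\mathbf{w}_k\}$, and the OBS’s position $\mathbf{w}_o$;
Output: $\mathbf{A}$, $\mathbf{P}_u$, $\mathbf{P}_b$, $\mathbf{Q}$;
1. for episode $= 0$ to $E_{\max}$ do
2.  for epoch $n = 1$ to $N_{\max}$ do
3.   /* Lines 11-15 of Algorithm 1 */
4.   if $\omega = 1$ (data offloading stage) then
5.    Let $n \in \mathcal{N}_o$, and obtain the current channel gain $g_o[n]$.
6.    Set $P_u[n] = P_u^{\max}$.
7.    if $P_u[n]\, g_o[n] / \sigma^2 \ge \gamma_{\mathrm{th}}$ then
8.     Let the UAV-OBS association relationship $\beta[n] = 1$.
9.    end if
10.   else
11.    Let $n \in \mathcal{N}_c$, and obtain the current channel gains $\{g_k[n]\}$.
12.    Update $\boldsymbol{\alpha}[n]$ and $\mathbf{p}[n]$ with the given $\mathbf{q}[n]$ by performing Algorithm 2.
13.    /* Lines 7-10 of Algorithm 1 */
14.    Update $\mathbf{q}[n+1]$ with the given transmit powers and association relationship.
15.   end if
16.  end for
17. end for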

Second, in the data offloading stage, the total transmitted data size of the UAV in the $n$-th time slot is $D_o[n] = \delta R_o[n]$.

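Then, (P2) can be transformed into the following form:

$$(\mathrm{P2b}):\ \max_{P_u[n]}\ D_o[n] \quad \text{s.t. } 0 \le P_u[n] \le P_u^{\max},\quad \frac{P_u[n]\, g_o[n]}{\sigma^2} \ge \beta[n]\, \gamma_{\mathrm{th}}.$$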

Problem (P2b) is a standard convex problem. Since $D_o[n]$ increases monotonically with $P_u[n]$, its optimal solution is $P_u^*[n] = P_u^{\max}$.

In summary, we propose a joint TTO and PCAR scheme (TTO-PCAR) to solve problem (P1). The TD3 agent is deployed at the OBS, which maintains communication with the UAV. During training, the UAV collects data from the buoys over the traffic channel. Meanwhile, the UAV receives the buoys’ state information over the control channel and feeds back its own and the buoys’ state information to the OBS. The OBS runs the proposed scheme with this state information and sends the results to the UAV in each time slot. The UAV then forwards the relevant signaling (such as the buoys’ transmit powers and the buoy-UAV association relationship) to the buoys over the control channel. The specific process is shown in Figure 3 and Algorithm 3. Specifically, the initial UAV position is first given. In lines 3-12, $\boldsymbol{\alpha}[n]$ and $\mathbf{p}[n]$ are obtained by Algorithm 2 from the current UAV position in the data collection stage, and $P_u[n]$ and $\beta[n]$ are obtained from $g_o[n]$ in the data offloading stage. Then, the next UAV position is updated by lines 7-10 of Algorithm 1. Finally, the above process is repeated until the maximum number of episodes $E_{\max}$ is reached.

3.3. Complexity Analysis

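TD3 contains two actor networks and four critic networks (the online and target networks). Hence, the computational complexity of Algorithm 1 per training step is $\mathcal{O}\big(2\sum_{i=1}^{L_a - 1} n^a_i n^a_{i+1} + 4\sum_{j=1}^{L_c - 1} n^c_j n^c_{j+1}\big)$, where $L_a$ is the number of fully connected layers of the actor network, $L_c$ is the number of fully connected layers of the critic network, and $n^a_i$ and $n^c_j$ are the numbers of units in the $i$-th layer of the actor network and the $j$-th layer of the critic network, respectively. Since the UAV is associated with at most $M$ buoys in each time slot, Algorithm 2 removes at most $M - 1$ buoys and therefore solves (P2a) at most $M$ times per time slot. Hence, the computational complexity of Algorithm 3 is that of Algorithm 1 plus at most $M$ convex solves of (P2a) per data collection time slot.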

4. Simulation Results

In the simulation, the considered target area is  where $K$ buoys are randomly distributed. A UAV collects data from the target area at a fixed height of  m. The flight velocities of the UAV are  m/s and  m/s. The maximum transmit powers of the UAV and the buoys are  W and  dBm, respectively. The position of the OBS is  m. The UAV is allowed to associate with up to 3 buoys in each time slot, i.e., $M = 3$. The time slot length is  s. The data size of each buoy ranges over  Mbits. The bandwidth is  MHz. Furthermore, the proposed TTO algorithm is implemented in PyTorch. For the actor and critic networks, we use fully connected DNNs with two hidden layers of 400 neurons each. The learning rate is 0.0001, the experience replay buffer size is 100000, the minibatch size is 256, and the discount factor is 0.99. . . . Other simulation parameters are shown in Table 1.

To evaluate performance, we use the following comparison schemes.
(i) UAV trajectory based on Fermat points (FTP) [35]: this scheme first regards the users as the vertices of triangles to form multiple triangles. Then, the Fermat point of each triangle is taken as a hovering point of the UAV, and the UAV hovers at these points in turn to collect data.
(ii) UAV trajectory based on a circle (CTS): this scheme takes the geometric center of all users as the center of the circle and uses the average distance from all users to the center as the radius of the UAV trajectory.
(iii) UAV data collection based on OMA (TTO-OMA): the UAV uses OMA for data collection, while the proposed TTO algorithm is still used to determine the UAV trajectory.
(iv) DRL scheme based on DQN (DTO-PCAR): this scheme uses DQN instead of TD3 in the proposed algorithm.
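The CTS baseline is simple enough to sketch directly. The sketch assumes the "geometric center" is the centroid of the user positions, and the evenly spaced waypoints along the circle are an illustrative choice; `circle_trajectory` is a hypothetical name.

```python
import math

def circle_trajectory(users, n_waypoints):
    """CTS baseline: circle centred at the users' centroid.

    The radius is the average user-to-centre distance; waypoints are spaced
    evenly along the circle (the spacing is an illustrative choice).
    """
    cx = sum(x for x, _ in users) / len(users)
    cy = sum(y for _, y in users) / len(users)
    radius = sum(math.hypot(x - cx, y - cy) for x, y in users) / len(users)
    waypoints = [(cx + radius * math.cos(2 * math.pi * i / n_waypoints),
                  cy + radius * math.sin(2 * math.pi * i / n_waypoints))
                 for i in range(n_waypoints)]
    return (cx, cy), radius, waypoints

center, r, wps = circle_trajectory([(0, 0), (2, 0), (0, 2), (2, 2)], 8)
```

Because the circle ignores how much data each buoy still holds, the UAV keeps circling even when only one distant buoy remains, which is the weakness of CTS discussed with Figure 9.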

Figure 4 compares the accumulative reward of different schemes; the curves are smoothed for readability. The proposed TTO-PCAR scheme converges after 1000 episodes, whereas TTO-OMA needs 3000 episodes, and DTO-PCAR has not converged after 6000 episodes. Therefore, the proposed scheme performs significantly better than the other two schemes. Figure 5 shows the UAV trajectories of the TTO-PCAR scheme under SIC thresholds of 10 dB, 12 dB, and 15 dB. The average total mission completion times are similar, namely, 33 s, 36 s, and 37 s, respectively. However, as the SIC threshold increases, the UAV trajectory moves closer to the farther buoys. This is because, when the UAV uses NOMA for data collection, the channel gains of the farther buoys are poor; to satisfy the SIC constraint, the UAV gradually flies toward the farther buoys whose data have not yet been collected.

Figure 6 compares the total mission completion time of different schemes under different bandwidths. Figure 7 shows the UAV trajectory obtained by the proposed scheme with $K$ buoys and compares it with those of the other three schemes. The total mission completion time of TTO-PCAR is significantly lower than those of the other schemes. In particular, with a bandwidth of  MHz, the data collection time of the proposed scheme is 20 s, whereas that of TTO-OMA is 33 s; thus, NOMA is more efficient than OMA for data collection. This is because the designed reward (Equation (16)) is related to the total transmission rate in each time slot. Second, with the FTP and CTS schemes, the trajectory of the data collection process is fixed, which lengthens the UAV trajectory in the data offloading process. The TTO-PCAR scheme accounts for the coupling of the two stages in UAV trajectory optimization, so its data offloading time is shorter. The total flight distance of TTO-PCAR is also significantly lower than those of the other two schemes.

Figure 8 shows the UAV trajectories of TTO-PCAR under different bandwidths. The flight distance of the UAV decreases as the bandwidth increases. This is because the transmission rate of the buoys decreases as the bandwidth decreases: if the data of a buoy far from the OBS have not yet been collected, the agent moves the UAV closer to that buoy to increase the transmission rate and obtain a greater reward according to (16).

Figure 9 shows the total mission completion time of different schemes with different numbers of buoys. The FTP scheme finds hovering points for data collection and treats the visiting order as a travelling salesman problem to traverse these points; hence, FTP spends a lot of time on UAV flight. Although the CTS scheme can collect data in every time slot, it does not consider the data collection requirements of different buoys, because the UAV simply flies along a circle. The proposed TTO-PCAR scheme dynamically adjusts the UAV trajectory according to the data collection requirements of different buoys. Therefore, the total mission completion time of TTO-PCAR is significantly lower than those of FTP and CTS.

5. Conclusion

This paper has investigated the joint optimization of the buoy-UAV association relationship, the transmit powers, and the UAV trajectory for NOMA-enabled UAV data collection and offloading in MIoT. First, we proposed a TD3-based UAV trajectory optimization algorithm to solve the UAV trajectory subproblem. Second, we designed a heuristic algorithm to solve the power control and buoy-UAV association subproblem. Finally, we proposed a joint TD3-based trajectory optimization, power control, and buoy-UAV association scheme, which effectively solves the mixed-integer nonconvex problem. Simulation results show that the proposed scheme significantly shortens the total mission completion time of the UAV. In future work, we will further investigate NOMA-based UAV trajectory optimization to shorten the UAV mission time in the MIoT.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported in part by the National Key Research and Development Program of China under Grant 2019YFE0111600, in part by the National Natural Science Foundation of China under Grant Nos. 62101089, 62002042, 51939001, and 61971083, in part by the China Postdoctoral Science Foundation under Grant Nos. 2021M700655 and 2021M690022, in part by the Liaoning Revitalization Talents Program under Grant No. XLYC2002078, and in part by the Fundamental Research Funds for the Central Universities under Grant No. 3132022231.