Abstract

In this paper, a multiple-cluster downlink multiple-input single-output (MISO) nonorthogonal multiple access (NOMA) system is considered. In each cluster, there are one central user and one cell-edge user. The central user has a data buffer with finite storage units; it decodes the cell-edge user’s message and stores it in the data buffer. To enhance the performance of the cell-edge user, the central user operates as a relay and helps forward the message to the cell-edge user. Our objective is to maximize the long-term average sum rate of the cell-edge users by designing the beamforming vectors and online power control, under the constraints of data buffer causality, the required information rates for the central users, and the transmit power at the base station and central users. Based on the current buffer state and the channel state information, we propose a low-complexity online Lyapunov optimization algorithm combined with a constrained concave-convex procedure (CCCP) to solve the causal and nonconvex problem. Furthermore, we verify the asymptotic optimality of the proposed online Lyapunov optimization algorithm. Simulation results demonstrate that our proposed scheme performs better than the greedy algorithm and the orthogonal multiple access (OMA) scheme.

1. Introduction

Recently, the nonorthogonal multiple access (NOMA) scheme has been regarded as a key technology for fifth-generation networks and has attracted great attention [1, 2]. Combined with successive interference cancellation (SIC) and beamforming techniques, the NOMA scheme allows a single base station to serve more users with higher spectral efficiency [3, 4]. NOMA can also be extended to cooperative relaying systems, in which users with strong capabilities act as relays that decode and forward (DF) messages to users with poor capabilities, thereby improving the performance of the latter [5, 6]. By using cooperative relaying and maximum ratio combining, the spatial diversity of NOMA systems can be enhanced [7, 8]. Furthermore, Zhang et al. [9] showed that equipping the cooperative relays with data buffers, which store messages sent by the base station, can significantly improve the performance of NOMA cooperative relaying systems.

In a practical, causal NOMA cooperative relaying system, information becomes available only when it arrives, which gives rise to an online resource allocation problem. In [10–12], the authors solve the resource allocation problem by using only the current information and applying dynamic programming algorithms [13]. Because of the high computational complexity of dynamic programming algorithms, some studies adopt the Lyapunov optimization approach [14–16]. The authors of [14] propose a low-complexity optimization algorithm using Lyapunov functions to achieve close-to-optimal utility performance in energy-harvesting networks. Mao et al. [15] aimed to minimize the long-term average network service cost in hybrid energy supply networks by optimizing the base station selection and power allocation. Choi and Kim [16] considered a wireless powered system with the objective of minimizing the expected energy transmission power while stabilizing the data queues of all nodes. To the best of our knowledge, stochastic optimal control for multiple-input single-output (MISO) NOMA cooperative relaying systems has not been studied.

We consider a cellular downlink MISO NOMA system consisting of multiple clusters. In each cluster, there are one central user and one cell-edge user. The central user has a data buffer with finite storage units and operates as a relay to help forward the message to the cell-edge user. For cell-edge users with high quality of service (QoS) requirements, our objective is to maximize the long-term average number of achievable information bits for the cell-edge users by designing the beamforming vectors and power allocation, under the constraints of the required number of achievable information bits for the central users and the transmit power. The optimization problem is causal and nonconvex and thus hard to solve. Based on the current buffer state and the channel state information, we propose a low-complexity online Lyapunov optimization algorithm combined with a constrained concave-convex procedure (CCCP). The asymptotic optimality of the proposed online Lyapunov optimization algorithm is also verified.

The rest of this paper is organized as follows: Section 2 presents the system model in detail. Section 3 presents the proposed low-complexity online Lyapunov optimization algorithm. Section 4 verifies the asymptotic optimality of the proposed online Lyapunov optimization algorithm. Section 5 provides the simulation results. Finally, Section 6 concludes the paper.

Notations. For a vector, , , and denotes its conjugate, transpose, and Frobenius norm, respectively. and denote the expectation and the real part of x, respectively. denotes the Gaussian variable with mean a and variance b.

2. System Model and Problem Formulation

We consider a buffer-aided cooperative NOMA downlink system consisting of a base station and users as shown in Figure 1, where the base station is equipped with N antennas and each user is equipped with a single antenna. The clustering algorithm is a significant factor in the multiuser NOMA system since it influences the system performance [17–19]. In [17], a clustering algorithm was proposed based on selecting two highly correlated users with a large channel-gain difference. In [18], the authors selected users for satellite by a channel quality-based scheme and proposed a user pairing method by maximizing the minimum channel correlation between users in the same group. The authors in [19] investigated three user clustering schemes from a fairness perspective by maximizing the throughput of the worst user. However, the aforementioned studies focused on the one-time-slot problem without considering the long-term performance of the system. By employing the clustering scheme proposed in [19], we group the users into K clusters in advance and formulate a multi-time-slot online stochastic optimization problem in the following. In the kth cluster, , two users are included, i.e., a central user and a cell-edge user.

The base station transmits signals to the central users directly and transmits signals to the cell-edge users with the cooperation of the central users. We assume that the transmission time duration is partitioned into M slots of equal length T. Each time slot is further partitioned into K + 1 phases. In each time slot, the base station transmits signals to central and cell-edge users using the NOMA protocol during the first phase, and all the central users decode and forward the signals to cell-edge users using the time-division multiple access scheme during the remaining K phases. Thus, during the first phase of the mth time slot, , the transmitted signal from the base station to the central and cell-edge users is expressed as

where and denote the signals intended to the central and cell-edge users of the kth cluster with and , respectively, and and are the corresponding beamforming vectors. Accordingly, the received signals at the central and cell-edge users of the kth cluster are

where and denote the channel response from the base station to the central and cell-edge users of the kth cluster, respectively, and and denote the additive Gaussian noises at the central and cell-edge users of the kth cluster.

In the kth cluster, the central user first decodes by treating as interference and then removes using successive interference cancellation to decode . The signal-to-interference-and-noise ratios (SINRs) for decoding and at the kth central user, , are

where is the intercluster interference and

Accordingly, at the kth central user, the numbers of achievable information bits intended for the kth central and cell-edge users in the first phase are

where and B denotes the bandwidth. At the kth cell-edge user, , the SINR for decoding is expressed as

where
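As a numerical illustration of the decoding order at the central user, the following sketch evaluates the two SINRs and converts an SINR into a number of achievable bits. It is a minimal sketch under stated assumptions: the function and argument names, the phase duration `tau`, and the exact form of the interference term are hypothetical and follow the standard NOMA SIC expressions rather than the paper's own numbered equations.

```python
import numpy as np


def central_user_sinrs(h, w_c, w_e, k, noise_power):
    """SINRs at the k-th central user under SIC (illustrative only).

    h           : (N,) channel from the base station to the k-th central user
    w_c, w_e    : (K, N) beamforming vectors for the central / cell-edge signals
    noise_power : noise variance at the receiver
    """
    # Received power of every beam at this user: |h^H w|^2
    p_c = np.abs(w_c @ h.conj()) ** 2
    p_e = np.abs(w_e @ h.conj()) ** 2
    # Intercluster interference: all beams intended for the other clusters
    inter = p_c.sum() + p_e.sum() - p_c[k] - p_e[k]
    # Step 1: decode the cell-edge signal, treating the own signal as interference
    sinr_edge = p_e[k] / (p_c[k] + inter + noise_power)
    # Step 2: after cancelling the cell-edge signal, decode the own signal
    sinr_own = p_c[k] / (inter + noise_power)
    return sinr_edge, sinr_own


def achievable_bits(sinr, bandwidth, tau):
    """Bits carried in a phase of (assumed) duration tau at the given SINR."""
    return tau * bandwidth * np.log2(1.0 + sinr)
```

The phase duration is left as a parameter because the sketch does not assume how the time slot is split between the first phase and the relaying phases.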

During the remaining K phases of the mth time slot, the kth central user, , decodes and forwards the signals to the kth cell-edge user in one of the remaining K phases. The received signal at the kth cell-edge user is

where denotes the channel response from the kth central user to the kth cell-edge user, denotes the transmit power, and denotes the additive Gaussian noise at the kth cell-edge user. Accordingly, the number of achievable information bits transmitted from the kth central user to the kth cell-edge user is

where the denominator K is included because the orthogonal multiple access (OMA) transmission scheme among K clusters is employed and denotes the SINR for decoding as

The central users can store the information bits at the data buffers for later transmission. We assume the data buffers at the central users have finite storage units, denoted as . Denote as the number of storage units and stored information bits in the kth central user, , at the end of the first phase of the mth time slot, . Thus, considering the data buffer’s states and channel states, the central users will transmit the information bits to the cell-edge users with the following data causal constraint:

The dynamics of the data buffer are
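The interplay between the finite buffer and the data causal constraint can be sketched with a small class. This is an illustration under assumptions, not the paper's update rule: the class and method names are hypothetical, and bits that would overflow the buffer are assumed to be dropped.

```python
class RelayBuffer:
    """Finite data buffer at a central user (illustrative sketch)."""

    def __init__(self, capacity_bits):
        self.capacity = float(capacity_bits)  # finite storage
        self.level = 0.0                      # bits currently stored

    def store(self, decoded_bits):
        """Store newly decoded bits; anything beyond the capacity is dropped (assumption)."""
        accepted = min(decoded_bits, self.capacity - self.level)
        self.level += accepted
        return accepted

    def forward(self, requested_bits):
        """Forward bits to the cell-edge user; never more than what is stored (causality)."""
        sent = min(requested_bits, self.level)
        self.level -= sent
        return sent
```

For example, a buffer of capacity 10 that stores 7 bits and is then asked to forward 9 bits sends only the 7 bits it holds.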

At the kth cell-edge user, maximum ratio combining is employed to combine the signals transmitted from the base station and the kth central user. Thus, in the mth time slot, the number of achievable information bits intended for the kth cell-edge user is

For cell-edge users with high QoS requirements, our objective is to maximize the long-term average number of achievable information bits for the cell-edge users under the constraints of the required number of achievable information bits for the central users and the transmit power, which is formulated as

where denotes the required number of achievable information bits for the kth central user in any time slot; and denote the transmit power constraints at the base station and central users, respectively; and

Problem is a stochastic optimization problem whose solution requires the channel states and buffer states of all time slots in advance, which is impractical. Therefore, we solve this problem with an online algorithm.

3. Online Lyapunov Algorithm

In this section, an online Lyapunov algorithm to solve problem is proposed. The main idea is to convert the causal constraints into virtual buffers and maintain their stability.

According to Lyapunov optimization [20], we first introduce a virtual buffer queue for the actual buffer state at the kth user as , where denotes the perturbation parameter. Specifically, Neely [20] proved that maintaining the stability of all the virtual buffers is equivalent to meeting their causality requirements (13). Then, we define the quadratic Lyapunov function with respect to the data buffer at the kth central user as

Accordingly, the per-time-slot Lyapunov drift with respect to the data buffer at the kth central user is

which describes the expected change in the Lyapunov function over one time slot. Based on the Lyapunov optimization framework [20], we minimize the drift-plus-penalty function rather than optimizing the objective of (15a)–(15d) directly, a technique that maintains the stability of the queue while optimizing the long-term average objective function. Therefore, the Lyapunov drift-plus-penalty function is expressed as

where ρ denotes a control parameter and
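For orientation, the generic quantities used in the drift-plus-penalty method can be written as follows. This is a sketch in assumed notation and is not a reproduction of the paper's numbered equations: the symbols Q_k(m), \beta_k, and R_k(m), the subtraction of the perturbation, and the sign convention (the reward is subtracted because the objective is maximized) are assumptions.

```latex
% Assumed notation (not taken from the source):
%   Q_k(m)   : bits stored at the k-th central user in slot m
%   \beta_k  : perturbation parameter
%   R_k(m)   : bits delivered to the k-th cell-edge user in slot m
\begin{align}
  \tilde{Q}_k(m) &= Q_k(m) - \beta_k, \\
  L_k(m) &= \tfrac{1}{2}\,\tilde{Q}_k(m)^{2}, \\
  \Delta_k(m) &= \mathbb{E}\big[\, L_k(m+1) - L_k(m) \,\big|\, \tilde{Q}_k(m) \big], \\
  \Delta_{\rho}(m) &= \sum_{k=1}^{K} \Delta_k(m)
      - \rho\, \mathbb{E}\Big[ \sum_{k=1}^{K} R_k(m) \,\Big|\, \tilde{\mathbf{Q}}(m) \Big].
\end{align}
```

A larger ρ puts more weight on the reward term relative to queue stability, which is the trade-off the control parameter governs.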

Because of the dynamics involved in , it is still difficult to minimize the Lyapunov drift-plus-penalty function (19) directly. Instead, we minimize an upper bound of (19). We present the following lemma to provide an upper bound on the Lyapunov drift-plus-penalty function.

Lemma 1. For any feasible , the drift-plus-penalty function has an upper bound as

where

Proof. From the dynamics of the data buffer (13), we have
where the identity is used.
From (22) and (23), we have
Substituting (25) into (19), we obtain (21).
The Lyapunov algorithm only needs the current system state, and thus, we turn it into a per-slot problem and minimize the drift-plus-penalty function’s upper bound. Since and are constants, given , then , , and the expectation can be removed. Thus, the optimization problem is reduced to
In problem (26), the constraint is equivalent to
Using (3), (7), and (11), problem (26) is recast as
where is an introduced slack variable and
Problem is still nonconvex on account of the objective function and constraints (27), (28c), and (28d). To solve problem (28a)–(28d), we employ an iterative algorithm based on CCCP [21].
Note that when , in the objective function is convex and when , is concave. To proceed, we approximate it by its first-order Taylor expansion around the point as [22]
The right-hand side of (27) can be approximated by its first-order Taylor expansion around the point as
Similarly, the right-hand sides of (28c) and (28d) can also be approximated by their first-order Taylor expansions around the points as
where .
Thus, in our iterative algorithm, given which is optimal in the lth iteration, we solve the following problem in the th iteration:
where . Problem (33) is convex and can be solved efficiently using the interior point method [23].
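The linearize-then-solve loop of CCCP can be illustrated on a toy scalar problem that is unrelated to problem (33). The sketch below minimizes f(x) = x^4 - 2x^2, written as a convex part x^4 plus a concave part -2x^2. In each iteration the concave part is replaced by its first-order Taylor expansion around the current point, and the resulting convex surrogate is minimized in closed form. The function names and the toy objective are assumptions chosen for illustration.

```python
import math


def objective(x):
    """Toy difference-of-convex objective: convex x**4 plus concave -2*x**2."""
    return x ** 4 - 2.0 * x ** 2


def cccp_minimize(x0, tol=1e-10, max_iter=200):
    """CCCP on the toy objective, starting from x0."""
    x = x0
    for _ in range(max_iter):
        # Linearize the concave part around x:
        #   -2*t**2  <=  -2*x**2 - 4*x*(t - x)
        # The convex surrogate t**4 - 4*x*t + const is minimized where 4*t**3 = 4*x,
        # i.e. at the real cube root of x.
        x_next = math.copysign(abs(x) ** (1.0 / 3.0), x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x
```

Starting from 0.5 the iterates increase toward the stationary point x = 1, and the objective value does not increase from one iteration to the next, which is the monotonicity property CCCP relies on.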
Now, we summarize the proposed online Lyapunov algorithm in Algorithm 1.

(1)Initialize: , ;
(2)While
Initialize: , , , ;
   Repeat:
   Solve (33) to obtain ;
  ;
   Until: Convergence;
   ; Update .
(3)End While
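The control flow of Algorithm 1 can be sketched as a skeleton with a pluggable per-slot solver. This is a structural illustration only: `solve_slot` is a hypothetical callable standing in for the CCCP step on problem (33), the toy solver below is a placeholder rule rather than an optimization, and the order of the buffer update within a slot is an assumption.

```python
def online_lyapunov(channels, solve_slot, q_max, beta, tol=1e-6, max_iter=50):
    """Per-slot loop: observe the state, iterate the inner solver, update the buffer.

    solve_slot(channel, q, q_virtual, prev) -> (arrived_bits, forwarded_bits, objective)
    """
    q = 0.0
    history = []
    for channel in channels:                      # one pass per time slot
        q_virtual = q - beta                      # perturbed (virtual) buffer, assumed form
        prev, prev_obj = None, float("inf")
        for _ in range(max_iter):                 # repeat until convergence
            prev = solve_slot(channel, q, q_virtual, prev)
            if abs(prev[2] - prev_obj) < tol:
                break
            prev_obj = prev[2]
        arrived, forwarded, _ = prev
        forwarded = min(forwarded, q)             # data causality
        q = min(q - forwarded + arrived, q_max)   # finite buffer update
        history.append(q)
    return history


def toy_solver(channel, q, q_virtual, prev):
    """Placeholder rule: forward everything only when the virtual buffer is positive."""
    arrived = channel                             # pretend the decoded bits equal the gain
    forwarded = q if q_virtual > 0 else 0.0
    return arrived, forwarded, float(forwarded)
```

Only the current channel and buffer state enter each iteration of the outer loop, which is what makes the procedure online.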

Remark 1. (Complexity Analysis for Algorithm 1): the computational complexity of the online Lyapunov algorithm combined with CCCP comes mainly from solving problem (33), which is a semidefinite program (SDP). From [24], the complexity of solving an SDP is , where denotes the number of semidefinite cone constraints and denotes the dimension of the semidefinite cone. For problem (33), and . Thus, the complexity of solving problem (33) is . Denoting the number of CCCP iterations as , the complexity of the proposed Algorithm 1 is , which indicates that the complexity of Algorithm 1 grows as the number of clusters increases.

Remark 2. (Application Issue of Algorithm 1): the number of users that the system can support depends on the implementation hardware. For example, in our experimental environment (an Intel Core i7-4790K central processing unit at 4.0 GHz with 8 GB of random access memory), the average time required to solve one CCCP procedure is about 0.7 ms for the scenario that , i.e., 8 users. Thus, for each time slot, is a pessimistic setting in our experiment, which aims to adapt our proposed Lyapunov algorithm to suit other conditions. Furthermore, as the unit for data transmission and scheduling, the subframe lasts for 1 ms, which is called the transmission time interval (TTI) in the LTE standard [25, 26]. Therefore, in most cases, the TTI requires that our proposed algorithm be executed within 1 ms. In practice, it is feasible to implement our proposed algorithm on real base stations, since they have much more powerful processing capability and can execute the algorithm within such a short time slot. Specifically, our proposed algorithm targets the downlink NOMA system with user clustering [19]. In most existing cellular systems [1], with powerful computing hardware at the base station (such as a macro base station), our proposed algorithm remains applicable even when more users are involved in the system.
From (11), if the channel response between the kth central user and the kth cell-edge user, , is a circularly symmetric complex Gaussian random variable, is unbounded given a positive because is an exponentially distributed random variable. To address this problem, we propose a modified version of Algorithm 1 that provides a feasible solution for the case when , where is defined as the maximum gain [27].
Firstly, we give the feasible set and its complement on the possible range of as and , respectively. The case is defined as an outage event. Let denote the outage probability, i.e., . When , Algorithm 1 still provides the feasible solution to problem . When , the causal constraint (12) may be violated and the solution obtained by Algorithm 1 may be infeasible. Thus, the transmit power of the kth central user is given by

Remark 3. The main idea of this modified algorithm above is that we use the buffer state to determine the transmit power of the central user in the current time slot. To maintain the causal constraint, i.e., , and the transmit power constraint (15d), we can choose according to (34) adaptively. Specifically, when the central user transmits all the messages of the data buffer, the critical value of the transmit power can be obtained by setting . To avoid the SINR reaching the infeasible region, the transmit power should be less than , which is the power limit we have predefined.
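The power rule described in Remark 3 can be sketched as follows. This is an illustration under assumptions: the relay link is assumed to carry tau * B * log2(1 + p * gain / noise_power) bits, all names are hypothetical, and the paper's expression (34) is not reproduced.

```python
def modified_relay_power(buffer_bits, gain, noise_power, bandwidth, tau, p_max):
    """Transmit power of a central user, capped by the buffer content and by p_max."""
    # Critical power: the link carries exactly the bits currently in the buffer,
    # obtained by solving tau * B * log2(1 + p * gain / noise_power) = buffer_bits for p.
    p_empty = (2.0 ** (buffer_bits / (tau * bandwidth)) - 1.0) * noise_power / gain
    # Never exceed the predefined power limit.
    return min(p_empty, p_max)
```

With 2 bits in the buffer and unit gain, noise power, bandwidth, and phase duration, the critical power is 3, so a limit of 5 leaves the power at 3 while a limit of 2 caps it at 2.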

4. Performance Analysis

In this section, the online Lyapunov algorithm’s performance is analyzed. From [27], the perturbation parameters, , are defined as

The asymptotic optimality of our proposed online Lyapunov algorithm is verified as follows.

We first define the following optimization problem related to as

where the data causal constraint (12) is replaced by constraint (36b), i.e., in the long run, the average number of data units stored at the kth central user equals that transmitted to the kth cell-edge user.

Lemma 2. Problem is the relaxed problem of .

Proof. For any feasible solution of problem , because of the dynamics of the data buffer (13), we have
Since , , we have , i.e., (36b) is satisfied. Therefore, is the relaxed problem of , and any feasible solution for (15a)–(15b) is also feasible for .
To continue, we have the following result for problem whose proof can be found in ([20], (Appendix 4.A)).

Lemma 3. Let . There exists a solution to problem without the constraint (36b), denoted as , which satisfies the following equalities:

where denotes the optimal objective value of problem , is an arbitrarily small positive number, and .

Then the asymptotic optimality of our proposed online Lyapunov algorithm is verified in the following lemma.

Lemma 4. Denote and as the optimal objective values of problems and , respectively. We have

where

Proof. Since the objective of problem is minimization of the drift-plus-penalty function, we have

From Lemma 1, we have

where the last inequality comes from Lemma 3 and . Combining (41) and (42), we have

Since problem is a relaxation of problem , we have . Letting , we obtain

where . Summing up the equations for , then dividing by M and letting , and when , the asymptotic optimality is proved.

5. Simulation Results

In this section, we provide the simulation results of our proposed algorithm. We consider the NOMA system with complex Gaussian random channels, where the channel responses are modeled as Gaussian random variables with zero mean and variance , respectively. We assume that there are 3 clusters in the system and the base station has 4 antennas, and we set , , , and . The central users transmit signals with , and the central users’ QoS constraint is . Moreover, the greedy algorithm and the conventional OMA scheme are used for comparison. The greedy algorithm maximizes the sum rate in each time slot, yielding a locally optimal solution. For the conventional OMA scheme, time-division multiple access is utilized, which means the base station serves the users separately in different time slots. All simulations are conducted in MATLAB. To solve the convex problem, we use the CVX optimization software [28].
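A channel realization of the kind described above can be generated as in the following sketch. It is written in Python rather than MATLAB, the function name is hypothetical, and the variance value of 1.0 is a placeholder assumption; only the 3 clusters and 4 antennas come from the setup above.

```python
import numpy as np


def complex_gaussian_channels(num_clusters, num_antennas, variance, rng):
    """Draw zero-mean circularly symmetric complex Gaussian channel vectors."""
    # Each of the real and imaginary parts carries half of the total variance.
    scale = np.sqrt(variance / 2.0)
    real = rng.normal(0.0, scale, size=(num_clusters, num_antennas))
    imag = rng.normal(0.0, scale, size=(num_clusters, num_antennas))
    return real + 1j * imag


rng = np.random.default_rng(0)
# One realization: 3 clusters, 4 antennas, placeholder variance of 1.0.
h_central = complex_gaussian_channels(3, 4, 1.0, rng)
```

Drawing a fresh realization in every time slot gives the per-slot channel state that the online algorithm observes.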

Figures 2 and 3 show the long-term average sum rate of cell-edge users and of all users versus time slot, respectively, with the control parameter . As we can see, the overall performance of the proposed online Lyapunov algorithm is superior to that of the greedy algorithm and the conventional OMA scheme, because our proposed online Lyapunov algorithm decodes and forwards messages according to the channel states and buffer states: when the channel states are good and the buffers have enough storage, more information bits are transmitted; when the channel states are poor and the buffers are insufficient, the information is stored for later transmission. However, the greedy algorithm can only optimize the current sum rate without considering the channel states and buffer states.

From the perspective of time slots, Figures 2 and 3 also reveal that our proposed Lyapunov algorithm performs worse than the greedy method and the OMA scheme in the first several tens of time slots but outperforms them afterwards. This is because the information on the channel states and buffer states is inadequate in the beginning, which is adverse to the online optimization in the Lyapunov algorithm. Besides, the average sum rate obtained by our proposed Lyapunov algorithm tends to be stable after approximately 400 time slots, which is the time needed to reach saturation, i.e., rate stability [20]. To gain more insight into how the number of users affects that time, we further evaluate the system performance of the proposed Lyapunov algorithm for different numbers of users. As a result, for the scenarios , i.e., users, the required time slots to achieve stability are , respectively. Concretely, it takes more time slots to reach the sum rate saturation if we increase the number of users in the system, because more users increase the number of causal constraints (12), which requires additional Lyapunov optimization to achieve the stability of the system.

Figure 4 shows the data buffer dynamics versus time slot with the control parameter . As Figure 4 shows, the data buffer levels of the three clusters are confined within the range of under , which verifies that the proposed online Lyapunov algorithm can stabilize the buffer length. The weight parameter ρ is used to maintain the stability of the buffer queue in the Lyapunov drift-plus-penalty function. The choice of the value of ρ depends on the specific conditions. In the problem we formulated, we find that setting strikes a better balance between good system performance and fast convergence rate. Thus, we suggest choosing the value of ρ based on the specific problem.

Figure 5 shows the long-term average sum rate of cell-edge users over 400 time slots versus the transmit power constraint at the base station. Here, we include two additional schemes: one in which all the central users decode and forward the signals to cell-edge users in one phase after the base station transmission (denoted as “one-phase scheme” in the legend), and one in which the cell-edge users only receive signals from central users while the base station can transmit signals in each phase (denoted as “center-to-edge scheme”). It is seen from Figure 5 that the long-term average sum rate increases with the growth of . Moreover, Figure 5 also shows that our proposed online Lyapunov algorithm has a significant performance improvement over the greedy algorithm, the one-phase scheme, and the center-to-edge scheme, especially when the transmit power is large. This can be explained as follows. For the one-phase scheme, when the central users transmit signals simultaneously in one phase, extra interference is introduced in the SINRs of the cell-edge users, which degrades their channels. For the center-to-edge scheme, in addition to the extra interference, the cell-edge users only receive signals from the central users, which act as relays. Thus, the amount of transmitted information bits is reduced since the base station does not support direct transmission to the cell-edge users. In contrast, our proposed method avoids the interference channels and achieves better performance by completing the transmission in K + 1 phases.

6. Conclusions

Considering the NOMA system with buffer-aided cooperative relaying, we have proposed an online Lyapunov algorithm combined with the constrained concave-convex procedure to solve the causal long-term average transmission sum rate maximization problem and verified the asymptotic optimality of the proposed online Lyapunov algorithm. Simulation results have shown that the proposed online Lyapunov algorithm outperforms the greedy algorithm and the conventional OMA scheme.

Data Availability

No data were used to support this study.

Disclosure

This research was a part of the project titled Optimization Design for Physical Layer Security of Nonorthogonal Multiple Access Wireless Networks, funded by the National Natural Science Foundation of China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61802447, in part by the Guangdong Natural Science Foundation under Grants 2018B0303110016 and 2014A030310374 and Guangzhou Science and Technology Program under Grant 201804010445.