Wireless Internet of Things: Enabling Future Generation Connectivity and Communications
Research Article, Open Access
Djordje B. Lukic, Goran B. Markovic, Dejan D. Drajic, "Two-Stage Precoding Based on Overlapping User Grouping Approach in IoT-Oriented 5G MU-MIMO Systems", Wireless Communications and Mobile Computing, vol. 2021, Article ID 8887445, 13 pages, 2021. https://doi.org/10.1155/2021/8887445
Two-Stage Precoding Based on Overlapping User Grouping Approach in IoT-Oriented 5G MU-MIMO Systems
Abstract
Downlink transmission techniques for multiuser (MU) multiple-input multiple-output (MIMO) systems have been comprehensively studied during the last two decades. The well-known low-complexity linear precoding schemes are currently deployed in long-term evolution (LTE) networks. However, these schemes exhibit serious shortcomings in scenarios where users' channels are strongly correlated. Nonlinear precoding schemes show better performance, but their complexity is prohibitively high for a real-time implementation. Two-stage precoding schemes, proposed in the standardization process for 5G New Radio (5G NR), combine these two approaches and present a reasonable tradeoff between computational complexity and performance degradation. Before applying the precoding procedure, users should be properly allocated into beamforming subgroups. Yet, the optimal solution of the user selection problem requires an exhaustive search, which is infeasible in practical scenarios. Suboptimal user grouping approaches have mostly focused on capacity maximization through greedy user selection. Recently, the overlapping user grouping concept was introduced. It ensures that each user is scheduled in at least one beamforming subgroup. To the best of our knowledge, the existing two-stage precoding schemes proposed in the literature have not considered an overlapping user grouping strategy that solves the user selection, ordering, and coverage problems simultaneously. In this paper, we present a two-stage precoding technique for MU-MIMO based on the overlapping user grouping approach and assess its computational complexity and performance in an IoT-oriented 5G environment. The proposed solution deploys two-stage precoding in which linear zero forcing (ZF) precoding suppresses interference between the beamforming subgroups and nonlinear Tomlinson-Harashima precoding (THP) mitigates interuser interference within subgroups. The overlapping user grouping approach enables additional capacity improvement, while ZF-THP precoding balances the capacity gains against the incurred computational complexity. The proposed algorithm achieves up to 45% higher MU-MIMO system capacity with a lower complexity order in comparison with two-stage precoding schemes based on legacy user grouping strategies.
1. Introduction
Cellular Internet of Things (IoT) has been recognized as a key enabler for the digital transformation and automation of almost all industries. Before 5G New Radio (5G NR), cellular networks were mainly designed and implemented for human-type communications. Hence, the connectivity needs of Industry 4.0 can be addressed only with the implementation of massive machine-type communication (mMTC) 5G NR use cases. Based on current predictions, around 5 billion cellular IoT connections are expected by 2025 [1]. Multiuser (MU) multiple-input multiple-output (MIMO) and its evolution, massive MIMO (mMIMO), have been identified as among the most promising technologies to address the massive capacity demands in 5G networks and beyond. A combination of spatial multiplexing and transmit beamforming enables simultaneous transmission of independent data streams using the same radio resources and thus achieves higher throughput and spectral efficiency in MU-MIMO systems [2].
The performance of an MU-MIMO system largely depends on the deployed user grouping method [3]. An improper user grouping strategy can allocate users with highly correlated channels to the same beamforming subgroup and thus significantly reduce system capacity. In this paper, we consider a practical cellular IoT scenario in which the number of users is larger than the number of transmit antennas, which requires the selection of user subsets. In general, finding the optimal subset of users requires searching the complete space of candidate subsets, which is prohibitively complex when the number of users becomes large [4]. Several suboptimal user grouping methods have been proposed with the aim of reducing complexity. K-means clustering is a widely used strategy for grouping users into a specified number of clusters, such that each user belongs to the cluster with the nearest mean [5]. However, constraints on the cluster size cannot be imposed with K-means clustering. This is an important disadvantage in the MU-MIMO scenario, since the number of users within a cluster should be less than or equal to the number of base station antennas. Also, the number of clusters needs to be specified in advance, the final results are known to be sensitive to the initial parameters, and the method often terminates at a local optimum [6]. In [7], Dimic and Sidiropoulos presented a suboptimal greedy user selection algorithm which iteratively selects the user with the biggest contribution to the cumulative system capacity until no further increase can be achieved. When this approach is applied, only users with favorable channel conditions are selected, while users with less favorable channel conditions are dropped. Such behaviour can be a problem for fixed IoT endpoint devices with relatively low throughput requirements, since these may be dropped in many consecutive iterations and thus not be served for a long period of time.
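To give a sense of why the exhaustive search mentioned above becomes impractical, the following minimal Python sketch counts candidate user subsets. It assumes that a candidate is any non-empty subset of at most as many users as there are transmit antennas; this counting rule, the function name, and the parameter names are illustrative assumptions rather than the expression used in [4].

```python
from math import comb


def exhaustive_search_size(num_users: int, num_antennas: int) -> int:
    """Count non-empty user subsets containing at most num_antennas users.

    Assumption: one beamforming subgroup can hold at most as many
    single-antenna users as the base station has antennas.
    """
    return sum(comb(num_users, size) for size in range(1, num_antennas + 1))


if __name__ == "__main__":
    # The count grows very quickly with the number of users.
    for users in (8, 16, 32, 64):
        print(users, exhaustive_search_size(users, 4))
```

Even with only 4 antennas in this toy setting, each doubling of the user population multiplies the number of candidates many times over, which is the motivation for the suboptimal grouping methods discussed here.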
In [8], Tian et al. introduced the concept of overlapping user grouping (OUG) based on the greedy approach (OUG-Greedy). They also demonstrated that OUG-Greedy can achieve higher capacity than existing greedy user selection algorithms and ensure that each user is selected in at least one beamforming subgroup. This user grouping strategy takes full advantage of favorable propagation, which is a key property of massive MIMO systems [9]. An overlapping user grouping approach based on spectral clustering (OUG-SC) has also been proposed in [8]. Spectral clustering has many fundamental advantages compared to traditional K-means clustering. However, it also requires the number of clusters as an input. The OUG-SC algorithm has reduced computational complexity, but it achieves lower throughput than the OUG-Greedy algorithm [8]. This is because the OUG-Greedy algorithm directly optimizes the sum capacity with the greedy user selection approach, whereas the OUG-SC algorithm uses an indirect metric of channel similarity as part of the spectral clustering procedure [8, 10].
Joint decoding at the receiver side is not feasible in an MU-MIMO system, since users cannot cooperate due to their random geographic locations. Hence, successful data transmission depends heavily on the precoding technique deployed at the base station, i.e., on the ability to simultaneously send independent signals and suppress interference between users as much as possible. When channel state information (CSI) is considered known at the transmitter side (i.e., reliably estimated), the nonlinear dirty paper coding (DPC) technique [11] can completely eliminate interuser interference and achieve the maximum MU-MIMO system capacity. Tomlinson-Harashima precoding (THP) [12] is a simplified version of DPC which incorporates a symmetric modulo operation and achieves near-maximum capacity performance. Another prominent nonlinear precoding technique is vector perturbation (VP) [13], which perturbs the signal data vectors intended for different users in order to achieve better orthogonalization. Thus, more reliable decoding can be achieved on the receiver side. Low-complexity user grouping strategies based on the VP technique were proposed to support an adaptive modulation mechanism [14, 15]. In the traditional VP algorithm, where the same modulation scheme is applied for all users, the perturbation signal is found via a closest-point lattice search, which is a nondeterministic polynomial-time hard (NP-hard) problem. The lattice-reduction-aided (LR-aided) algorithm could be used to overcome this challenge. However, THP has lower complexity and outperforms LR-aided VP in large-scale MIMO application scenarios [16]. Nevertheless, the computational complexity of nonlinear precoding schemes increases significantly with the number of users, which complicates their practical implementation.
Conversely, linear precoding schemes with reduced complexity have also been proposed for MU-MIMO systems, such as zero forcing (ZF) and block diagonalization (BD) [17]. These schemes are successfully deployed in long-term evolution (LTE) networks and can mitigate interuser interference by projecting the signal of the intended user into the null space of all the other users. However, in the case of users with highly correlated channels, it is almost impossible to discriminate signals with the projection operation, which results in high capacity loss. In order to enhance MU-MIMO system capacity and reduce its complexity at the same time, a combination of linear and nonlinear precoding schemes, i.e., a two-stage precoding scheme, was proposed in the Third Generation Partnership Project (3GPP) standardization phase for 5G NR [18, 19].
In [20], Zarei et al. proposed a low-complexity two-stage HL-THP precoding scheme which achieves performance close to that of conventional THP. It was assumed that all users within the same group have identical CSI statistics. However, a concrete user grouping strategy was not considered in [20], even though it significantly contributes to the overall MU-MIMO system complexity. In [21], Trifan et al. proposed a two-stage BD-THP precoding scheme based on optimized K-means clustering with an imposed cluster size constraint and a distance metric based on the angles between users. Yet, this approach does not provide information on the channel separation between users associated with different clusters. Moreover, in this approach, user selection within the cluster is performed randomly. This can result in the scheduling of users with unsuitable mutual channel conditions and a degradation of MU-MIMO system performance.
In this paper, we propose an approach in which the existing hybrid two-stage precoding scheme is extended with the overlapping user grouping strategy. A comprehensive analysis of its computational complexity, throughput, and BER performance has also been conducted for the mMTC 5G NR use case. Instead of further modifying K-means clustering, as in [21], we adopt the overlapping user grouping method from the OUG-Greedy algorithm. This algorithm considers both user selection and user ordering in order to maximize MU-MIMO system capacity and to ensure that users with favorable channel conditions are assigned to multiple beamforming subgroups simultaneously. The two-stage precoding technique is used afterwards to separate the newly formed beamforming subgroups in the spatial domain. In the first stage, the ZF scheme is used to block-diagonalize the channel matrix, i.e., to minimize the intergroup interference. In the second stage, a THP scheme is used for each subgroup to eliminate the interuser interference. The main difference between our two-stage precoding technique and the ones proposed in [20, 21] is that the precoding matrices for the beamforming subgroups are calculated in the initial step by OUG-Greedy, so that the linear ZF precoder can use them directly for block diagonalization, which simplifies the overall beamforming procedure. It should also be noted that applying the OUG-Greedy algorithm leads to a significant capacity gain compared to legacy user grouping when combined with the two-stage precoding technique. We adopted two-stage ZF-THP precoding in order to balance the achieved capacity gains against the incurred computational complexity (i.e., in comparison to the case in which only THP is used). While the existing works on two-stage precoders based on legacy user grouping strategies compare their performance only with that of linear precoders or two-stage schemes with linear precoding, we benchmark the proposed algorithm against nonlinear precoders and two-stage schemes with nonlinear precoding as well. Hence, this paper also provides a comparative analysis of all precoding types combined with legacy and overlapping user grouping methods.
The rest of the paper is organized as follows. The system model is introduced and the user grouping problem is formulated in Section 2. In Section 3, the proposed two-stage precoding scheme based on the overlapping user grouping strategy is presented. The complexity of the proposed algorithm is evaluated in Section 4. Numerical simulation results and a comparative analysis with algorithms that employ existing user grouping methods and precoding schemes are presented in Section 5. Section 6 concludes the paper and presents research directions for future work.
2. System Model and Problem Formulation
2.1. System Model
The downlink of a single-cell MU-MIMO system is considered, in which a base station with a uniform rectangular antenna array of $M$ antennas simultaneously transmits data to $K$ single-antenna IoT devices (IoT users). We do not consider IoT devices equipped with multiple antennas, since these are generally small and simple devices. Equipping them with MIMO antennas would not be practical because it would not provide sufficient spatial diversity between the antennas to enable effective operation. Multiple antennas would also demand an independent RF chain per antenna and advanced digital processing to separate the data streams. This would increase the cost and complexity of IoT devices, as well as their energy consumption, which is not appropriate for battery-powered devices. The channel matrix $\mathbf{H}$ is assumed fixed during the channel coherence time and can be expressed as $\mathbf{H} = [\mathbf{h}_1^T, \mathbf{h}_2^T, \ldots, \mathbf{h}_K^T]^T$, where $(\cdot)^T$ denotes matrix or vector transpose, and $\mathbf{h}_k$ is the channel vector between the base station and user $k$. As in the previous work in this area, we assume that CSI is known at the base station. Let $y_k$ denote the received signal at user $k$. The signals received by users can be written as follows:
$$\mathbf{y} = \mathbf{H}\mathbf{W}\mathbf{s} + \mathbf{n} \tag{1}$$
where $\mathbf{y} = [y_1, \ldots, y_K]^T$ denotes the received data for all users in a single time slot, $\mathbf{W}$ is the precoding matrix, $\mathbf{s} = [s_1, \ldots, s_K]^T$ is the data vector intended for transmission to users, where $s_k$ is the QAM-modulated data symbol of the $k$th user, and $\mathbf{n}$ is the additive white Gaussian noise (AWGN) vector with zero mean and unit variance. QAM is chosen because the traditional THP precoder applies only to QAM signaling. Modified THP, which has similar complexity to traditional THP, was recently designed to support the PSK modulations included in 5G standardization for millimeter wave communications [22]. The described system model operates in the sub-6 GHz band; hence, the traditional THP precoder is sufficient for this scenario, and it also simplifies receiver design. The total power of the transmitted signal is constrained to $\mathbb{E}\{\mathbf{s}^H\mathbf{W}^H\mathbf{W}\mathbf{s}\} \le P$, where $\mathbb{E}\{\cdot\}$ stands for the expectation operator and $(\cdot)^H$ denotes matrix or vector Hermitian transpose. Throughout this manuscript, bold uppercase and lowercase symbols are used to denote matrices and vectors, respectively, and normal symbols are used to represent scalars.
In many urban mMTC 5G NR use cases, IoT devices are located indoors, whereas the macro base station is located outdoors. Hence, we consider that the base station communicates with users over spatially correlated Rayleigh channels characterized by non-line-of-sight (NLOS) propagation [8].
In the considered scenario, the base station is elevated and free of local scattering, which results in high correlation among the transmit antennas. We model the spatial correlation matrix at the transmitter, $\mathbf{R}_t$, using the one-ring MIMO channel scattering model shown in Figure 1, which was first employed by Jakes [23] and adopted in [24]. Let $\theta$ be the azimuth angle of the user located at distance $s$ from the base station and surrounded by a ring of scatterers with radius $r$. From Figure 1, it follows that the angular spread of the transmitted signal can be approximated as $\Delta \approx \arctan(r/s)$. The spatial correlation coefficient between transmit antennas $p$ and $q$ is modelled as follows [24]:
$$[\mathbf{R}_t]_{p,q} = \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} e^{j\mathbf{k}^T(\alpha+\theta)(\mathbf{u}_p-\mathbf{u}_q)}\,d\alpha \tag{2}$$
where $\mathbf{k}(\alpha) = -\frac{2\pi}{\lambda}(\cos\alpha, \sin\alpha)^T$ is the wave vector for a planar wave impinging on the transmit antenna array with the angle of arrival (AoA) $\alpha$, $\lambda$ is the wavelength that corresponds to the carrier frequency $f_c$, and $\mathbf{u}_p$, $\mathbf{u}_q$ are vectors indicating the positions of base station antennas $p$ and $q$ in the two-dimensional (2D) coordinate system.
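From Equation (2), it can be verified that the transmit spatial correlation matrix $\mathbf{R}_t$ is a normal matrix which can be eigendecomposed as follows:
$$\mathbf{R}_t = \mathbf{U}_t\boldsymbol{\Lambda}_t\mathbf{U}_t^H \tag{3}$$
where $\mathbf{U}_t$ represents a unitary matrix composed of the eigenvectors of $\mathbf{R}_t$ and $\boldsymbol{\Lambda}_t$ is a diagonal matrix whose elements are the eigenvalues of $\mathbf{R}_t$.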
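IoT devices located indoors usually experience fluctuation of the received signal power due to obstacles on the transmission path, i.e., shadow fading. The channels of geographically proximate devices are significantly correlated when affected by the same shadowing. The spatial correlation of the channels between users is modelled using Gudmundson's model defined in [25] and adapted for IoT networks in [26] as follows:
$$[\mathbf{R}_r]_{i,j} = \sigma^2 e^{-\frac{d_{ij}}{d_{cor}}\ln 2} \tag{4}$$
where $d_{ij}$ denotes the distance between users $i$ and $j$, $\sigma$ is the standard deviation of shadow fading, and $d_{cor}$ is the correlation distance, i.e., the distance at which correlation drops to 0.5. $\mathbf{R}_r$ is also a normal matrix with an eigendecomposition similar to Equation (3):
$$\mathbf{R}_r = \mathbf{U}_r\boldsymbol{\Lambda}_r\mathbf{U}_r^H \tag{5}$$
where the unitary matrix $\mathbf{U}_r$ and the diagonal matrix $\boldsymbol{\Lambda}_r$ include the corresponding eigenvectors and eigenvalues of $\mathbf{R}_r$, respectively. We adopt the Kronecker correlation model [27], which assumes complete correlation separability between transmitter and receiver. Hence, the channel matrix can be expressed as follows:
$$\mathbf{H} = \mathbf{R}_r^{1/2}\mathbf{H}_w\mathbf{R}_t^{1/2} \tag{6}$$
where $\mathbf{H}_w$ is an uncorrelated Rayleigh channel matrix whose elements are independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and unit variance, and $\mathbf{R}_t$ is the transmit spatial correlation matrix with eigenvector matrix $\mathbf{U}_t$ and eigenvalue matrix $\boldsymbol{\Lambda}_t$ from Equation (3). Substituting the decomposed spatial correlation matrices at the transmitter (Equation (3)) and receiver (Equation (5)) into Equation (6) gives the following channel matrix expression:
$$\mathbf{H} = \mathbf{U}_r\boldsymbol{\Lambda}_r^{1/2}\mathbf{U}_r^H\,\mathbf{H}_w\,\mathbf{U}_t\boldsymbol{\Lambda}_t^{1/2}\mathbf{U}_t^H \tag{7}$$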
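The shadowing correlation between users described above can be illustrated with a short Python sketch. It assumes the commonly used exponential form of Gudmundson's model, in which the correlation halves at the correlation distance and is scaled by the shadowing variance; the function name, argument names, and the choice of 2D positions are illustrative assumptions.

```python
import math


def shadowing_correlation(positions, sigma, d_cor):
    """Build a user-side shadowing correlation matrix (list of lists).

    positions: list of (x, y) user coordinates in meters
    sigma:     standard deviation of shadow fading
    d_cor:     distance at which the correlation drops to one half

    Assumption: entry (i, j) equals sigma^2 * 2^(-d_ij / d_cor).
    """
    n = len(positions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            distance = math.dist(positions[i], positions[j])
            matrix[i][j] = sigma ** 2 * 2.0 ** (-distance / d_cor)
    return matrix


if __name__ == "__main__":
    users = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
    for row in shadowing_correlation(users, sigma=1.0, d_cor=10.0):
        print([round(value, 4) for value in row])
```

With these toy positions, the two users 10 m apart have a correlation of one half, while the user 100 m away is almost uncorrelated with the others.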
2.2. User Grouping Problem Formulation
The performance of an MU-MIMO system largely depends on the channel correlation among the users included in the same beamforming subgroup. Hence, proper user grouping is necessary in order to suppress interuser interference and maximize system capacity.
Let the whole set of users be clustered into beamforming subgroups. The deterministic MIMO channel capacity for each beamforming subgroup is defined as in [28] (Equation (8)), where $\|\cdot\|$ denotes the vector 2-norm operator. The power allocation factors in Equation (8) are derived from the water-filling algorithm [29], in which $(x)^+$ denotes the operation $\max(x, 0)$ and the water level is chosen so that the power constraint is satisfied. The effective channel gain after the beamforming procedure (Equation (11)) represents the corresponding eigenvalue of the effective channel matrix [3].
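As an illustration of the water-filling step, the following minimal Python sketch allocates a power budget across effective channel gains. It assumes the textbook form in which each power equals max(mu - 1/gain, 0) and the water level mu is chosen so that the powers sum to the budget; the function and variable names are illustrative and are not taken from [29].

```python
import math


def water_filling(gains, total_power):
    """Return (powers, water_level) for positive gains and a positive budget.

    Assumption: textbook water-filling, powers[i] = max(mu - 1/gains[i], 0),
    with mu chosen so that sum(powers) == total_power.
    """
    # Visit channels from strongest to weakest.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    inverse = [1.0 / gains[i] for i in order]

    active = len(inverse)
    water_level = 0.0
    while active > 0:
        water_level = (total_power + sum(inverse[:active])) / active
        if water_level > inverse[active - 1]:
            break  # the weakest active channel still gets positive power
        active -= 1  # otherwise drop it and recompute the level

    powers = [0.0] * len(gains)
    for rank, index in enumerate(order[:active]):
        powers[index] = water_level - inverse[rank]
    return powers, water_level


def sum_rate(gains, powers):
    """Sum of log2(1 + power * gain) over all channels."""
    return sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains))


if __name__ == "__main__":
    g = [2.0, 1.0, 0.1]
    p, mu = water_filling(g, total_power=2.0)
    print(p, mu, sum_rate(g, p))
```

In the example, the weakest channel receives no power, which mirrors how users with poor effective gains contribute little to the subgroup capacity.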
Different user selections for beamforming subgroups give different values of Equation (8). Furthermore, different user ordering within the same beamforming subgroup also yields a different MU-MIMO sum capacity. In general, the user grouping strategy depends on the channel matrix $\mathbf{H}$ and the transmitted signal power $P$. Thus, we define the optimal user grouping method as the one that maximizes MU-MIMO system capacity. The corresponding optimal power allocation gives the maximum sum capacity under a given user grouping strategy. Putting it all together, the optimal user grouping problem can be formulated as in [8] (Equation (12)), subject to the associated constraints. As can be seen from this formulation, the sum capacity can be optimized with respect to the overlapping among beamforming subgroups and the power allocation when solving the optimization problem (Equation (12)).
3. Two-Stage Precoding Based on Overlapping User Grouping Approach
The system model for the proposed two-stage precoding scheme based on the overlapping user grouping strategy is depicted in Figure 2.
User grouping is achieved by employing the overlapping method from the OUG-Greedy algorithm introduced in [8]. The algorithm keeps track of the set of users that have been assigned up to the current iteration and the set of remaining users that have not been selected yet. In each iteration, the algorithm selects users from the remaining set in order to form the subgroup which gives the maximum capacity defined in Equation (8) with the corresponding water-filling power allocation. This procedure is known as zero forcing with user selection (ZFS) [7] and is repeated until all users are assigned to their respective beamforming subgroups. In the next step, the search space of each subgroup is widened to the users that have already been assigned to one of the previous subgroups. More specifically, the search space for each subgroup obtained from the ZFS algorithm is reset as in [8] to perform the overlapping user grouping. Using the extended search space, users with favorable channel conditions are reselected and assigned to several beamforming subgroups at the same time. Accordingly, we obtain the set of overlapping user groups and the corresponding set of row-reduced channel matrices, each of which includes the channel vectors of the users selected in the respective beamforming subgroup.
Once users are grouped according to the OUG-Greedy algorithm, the linear ZF precoding scheme is applied to suppress interference between the already formed beamforming subgroups. For this purpose, the precoder is designed to null the off-diagonal elements of the effective ZF channel matrix (Equation (14)).
In order to cancel intergroup interference, the effective ZF channel matrix from Equation (14) must be diagonalized, i.e., its off-diagonal elements, which couple different beamforming subgroups, must vanish. This is possible when the precoding matrix for each beamforming subgroup is the Moore-Penrose pseudoinverse of the row-reduced channel matrix [30] (Equation (15)).
Hence, the user data in each beamforming subgroup is ideally transmitted in the null space of the channel matrix made of the channel vectors of users from all other subgroups. However, it is not necessary to evaluate Equation (15) again, since the corresponding precoding matrices are already obtained in the OUG-Greedy algorithm when calculating Equation (8). This leads to simplified ZF precoding, which only includes multiplication of the precoding matrices obtained from the OUG-Greedy algorithm with the corresponding row-reduced channel matrices.
After the ZF precoding technique is performed, the remaining interuser interference in each beamforming subgroup is mitigated by using the nonlinear THP precoding scheme. The THP precoded signal for each beamforming subgroup is generated as shown in Figure 3. It combines a unitary feedforward matrix, obtained from the LQ decomposition of the corresponding diagonal element of the effective channel matrix, with a data vector whose elements are calculated successively using the feedback matrix, the identity matrix, and a symmetric modulo function. The modulo function limits the transmitted power of the modulated data symbols and ensures that they lie inside the Voronoi region of the original constellation; it is applied to the real and imaginary parts of its complex argument. The main purpose of the feedback matrix is to cancel the interference caused by already detected data symbols; it is defined in terms of the diagonal scaling matrix and the lower triangular matrix derived from the LQ decomposition. The THP precoding matrix for each beamforming subgroup is then expressed through these feedforward and feedback matrices.
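The successive cancellation and modulo folding performed by THP can be sketched in a few lines of Python. The sketch assumes a square QAM constellation with points at odd integers (for 16-QAM: -3, -1, 1, 3 on each axis), so that the modulo interval per axis is [-sqrt(order), sqrt(order)), and a unit lower triangular feedback matrix; these conventions and all names are illustrative assumptions and may differ from the normalization used in this paper.

```python
import math


def symmetric_modulo(value: complex, mod_order: int = 16) -> complex:
    """Fold real and imaginary parts into [-sqrt(order), sqrt(order))."""
    period = 2 * math.isqrt(mod_order)  # 8 for 16-QAM

    def fold(x: float) -> float:
        return x - period * math.floor((x + period / 2) / period)

    return complex(fold(value.real), fold(value.imag))


def thp_feedback_loop(symbols, feedback, mod_order: int = 16):
    """Successively pre-subtract interference, then apply the modulo.

    symbols:  list of complex QAM symbols, in precoding order
    feedback: unit lower triangular matrix (list of lists); only the
              entries below the diagonal are used
    """
    precoded = []
    for k, symbol in enumerate(symbols):
        interference = sum(feedback[k][j] * precoded[j] for j in range(k))
        precoded.append(symmetric_modulo(symbol - interference, mod_order))
    return precoded


if __name__ == "__main__":
    s = [3 + 3j, 3 + 1j]
    b = [[1, 0], [-1, 1]]
    print(thp_feedback_loop(s, b))
```

Without the modulo step, the second symbol in this example would grow to 6+4j; the folding keeps every precoded value inside the same bounded region as the original constellation, which is what limits the transmit power.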
As can be seen, the proposed hybrid mechanism is based on two-stage precoding. The first stage consists of the linear precoder used to eliminate intergroup interference. To suppress interference inside every group, nonlinear precoding is employed in the second stage. In other words, the beamforming matrix consists of two parts: the linear ZF beamforming matrix and the cumulative nonlinear THP beamforming matrix.
The achievable sum rate of the proposed algorithm is calculated from the channel capacities of the overlapped beamforming subgroups, each defined as in [28] in a form that is equivalent to Equation (8) and derived for the case of perfect MIMO channel estimation. This formula was used for the performance evaluation of all existing algorithms and the proposed one in Section 5.
4. Computational Complexity Analysis
Computational complexity is an important design parameter, especially in the implementation of IoT-oriented 5G systems, where a massive number of IoT devices have limited battery lifetime. This section covers the complexity analysis of the proposed scheme with two-stage ZF-THP precoding based on the overlapping user grouping approach (denoted as the OUG ZF-THP algorithm). To this end, the computational complexity of the deployed overlapping user grouping method and of the two-stage ZF-THP precoding is derived. The total computational complexity is defined as the sum of these two parts (excluding the calculations from prior steps that can be reused in later steps). The complexity of the reference algorithms is also given for comparison. As reference algorithms, we consider the previously introduced OUG-Greedy grouping with linear ZF precoding (denoted as the OUG-Greedy ZF algorithm) proposed in [8], two-stage BD-THP precoding based on optimized K-means clustering (denoted as the K-means BD-THP algorithm) proposed in [21], and the linear ZFS algorithm proposed in [7]. We also consider a scheme with THP precoding based on the overlapping user grouping strategy (denoted as OUG THP), a combination that has not previously been studied in the literature. More on this reference scheme is given in the next section, where the capacity performance analysis is presented. Since all these algorithms also comprise a user grouping part and a precoding part, the computational complexity is presented for both parts separately, and the total complexity is given as their sum (in the same way as for the proposed algorithm).
The complexity of all the observed algorithms is quantified by the number of floating-point operations (FLOPs) [30] required for multiplication (division) and addition (subtraction) of complex-valued numbers. For the sake of accuracy, we use a common assumption and count each complex-valued multiplication as 6 FLOPs and each complex-valued addition as 2 FLOPs. The computational complexity required for the precoded data vectors transmitted within one channel coherence time interval is also considered.
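The counting convention above can be made concrete with a small Python helper. It applies the stated weights (6 FLOPs per complex multiplication, 2 FLOPs per complex addition) to a dense matrix-vector and matrix-matrix product; the function names and the assumption of a straightforward, unoptimized product are illustrative and are not the expressions derived in this section.

```python
COMPLEX_MUL_FLOPS = 6  # one complex multiplication
COMPLEX_ADD_FLOPS = 2  # one complex addition


def matvec_flops(rows: int, cols: int) -> int:
    """FLOPs for a dense (rows x cols) complex matrix times a vector."""
    multiplications = rows * cols
    additions = rows * (cols - 1)  # cols products are summed per output row
    return COMPLEX_MUL_FLOPS * multiplications + COMPLEX_ADD_FLOPS * additions


def matmul_flops(m: int, n: int, p: int) -> int:
    """FLOPs for a dense (m x n) times (n x p) complex matrix product."""
    return p * matvec_flops(m, n)


if __name__ == "__main__":
    print(matvec_flops(4, 4))     # 4 x 4 matrix times a vector
    print(matmul_flops(4, 32, 4))  # e.g. a 4 x 32 block times a 32 x 4 block
```

Under this convention a matrix-vector product costs 8 x rows x cols - 2 x rows FLOPs, which is why the dominant terms in such analyses scale with the product of the matrix dimensions.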
First, we consider the complexity of the overlapping user grouping method. For the sake of brevity, it is assumed that all beamforming subgroups have the same number of users. Derivation of the effective channel gains represents the most computationally expensive operation in the overall OUG-Greedy algorithm [7]. As an alternative to Equation (11), the simplified sequential water-filling (SWF) approach for calculating the channel matrix pseudoinverse (Equation (15)) and the channel gain (Equation (11)) is introduced in [4]. It relies on a projection matrix onto the orthogonal complement of the subspace spanned by the channels of the currently selected users in that beamforming subgroup. The cost of this step is dominated by the vector-matrix multiplication in Equation (21) [31]. In the worst-case scenario, this procedure is performed for all users in every iteration. Repetition over the beamforming subgroups gives the total number of FLOPs.
Hence, the computational complexity of the overlapping user grouping strategy is of the same order as ZFS with the SWF mechanism [4] and two orders lower than that of the conventional capacity-based ZFS algorithm [32], as outlined in Table 1. In practice, the number of beamforming subgroups will be more than 2; thus, the OUG-Greedy algorithm also has lower complexity than the optimized K-means clustering method used by the K-means BD-THP algorithm [21].
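The projection idea behind the SWF approach can be illustrated with a simplified greedy selection in Python. The sketch below is not the OUG-Greedy or ZFS algorithm itself: as a stated simplification, it picks at each step the user whose channel has the largest squared norm after projection onto the orthogonal complement of the already selected channels, and it omits the capacity evaluation, water-filling, and overlapping steps. All names are hypothetical.

```python
def inner(u, v):
    """Inner product <u, v> with the first argument conjugated."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def project_out(channel, basis):
    """Remove the components of channel along each orthonormal basis vector."""
    residual = list(channel)
    for q in basis:
        coefficient = inner(q, residual)
        residual = [r - coefficient * x for r, x in zip(residual, q)]
    return residual


def greedy_select(channels, group_size, tol=1e-12):
    """Greedily pick up to group_size users by projected channel gain."""
    selected, basis = [], []
    candidates = list(range(len(channels)))
    while candidates and len(selected) < group_size:
        best_user, best_gain, best_residual = None, tol, None
        for user in candidates:
            residual = project_out(channels[user], basis)
            gain = inner(residual, residual).real
            if gain > best_gain:
                best_user, best_gain, best_residual = user, gain, residual
        if best_user is None:
            break  # remaining channels lie in the span of the selected ones
        norm = best_gain ** 0.5
        basis.append([x / norm for x in best_residual])
        selected.append(best_user)
        candidates.remove(best_user)
    return selected


if __name__ == "__main__":
    h = [[2 + 0j, 0j], [1.9 + 0j, 0.1 + 0j], [0j, 1 + 0j]]
    print(greedy_select(h, group_size=2))
```

In the example, user 1 has a strong channel but is almost collinear with user 0, so the projection leaves it with very little gain and the weaker but orthogonal user 2 is selected instead. This is the same mechanism that keeps highly correlated users out of a common beamforming subgroup.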

Next, we derive the computational complexity of the two-stage ZF-THP precoding scheme. The FLOP count comprises the following operations. A matrix-matrix multiplication is executed to obtain the diagonal elements of the effective channel matrix; note that the complexity of calculating the Moore-Penrose pseudoinverses has already been evaluated as part of the user grouping procedure. The LQ decomposition of each diagonal element is then performed [33]. The diagonal scaling matrix is calculated and used to generate the feedback matrix, from which the identity matrix is subtracted. Next, the data vectors are calculated [34] and multiplied with the Hermitian of the unitary feedforward matrix. These steps are repeated for each beamforming subgroup.
Finally, additional FLOPs are needed to multiply the cumulative product with the data vectors for all users and generate the two-stage precoded data vectors. Summing these contributions gives the total number of FLOPs required for two-stage ZF-THP precoding, which can be written more concisely after the corresponding substitution.
As summarized in Table 2, the two-stage ZF-THP precoding scheme has the same computational complexity as the two-stage BD-THP scheme. The expressions for the complexity of the precoding techniques summarized in Table 2 are adopted from [20] in the case of the ZF and THP schemes, and from [21] for the BD-THP scheme.

Moreover, the two-stage ZF-THP technique has the lowest complexity among the conventional linear and nonlinear precoding schemes. The computational complexity required to generate one two-stage precoded data vector in the case of 32 antennas and IoT devices grouped in 4 beamforming subgroups is illustrated in Figure 4. For this choice of parameter values, the presented precoding schemes have similar complexity when the number of IoT devices is less than 20. As the number of users in the cell increases, the computational complexity of the ZF and THP precoders escalates substantially in comparison with ZF-THP and BD-THP. As expected, the nonlinear THP scheme has the highest complexity.
Based on the previously defined computational complexity for different user grouping and precoding schemes, the total complexity for all observed algorithms is presented in Table 3. The total complexity is calculated as the sum of the corresponding user grouping and precoding schemes for each algorithm (excluding the complexity related to the calculation of beamforming matrices when ZF precoding is employed, since these were calculated as part of the user grouping procedure in the ZFS, OUG ZF-THP, and OUG-Greedy ZF algorithms).

As evident in Table 3, the ZFS and OUG-Greedy ZF algorithms have somewhat lower computational complexity than the OUG ZF-THP algorithm. Such behaviour could be expected, since these two schemes employ only linear ZF precoding, which has lower complexity due to the reuse of beamforming matrices already calculated as part of the user grouping procedure (as is also done in the OUG ZF-THP algorithm). However, as will be presented in the next section, these schemes achieve lower overall MU-MIMO system capacity than OUG ZF-THP because ZF precoding is less efficient than ZF-THP precoding. This is particularly evident in the case of correlated MIMO channels (i.e., mutually dependent user channels), when linear ZF precoding cannot successfully mitigate all interuser interference, and the more complex two-stage ZF-THP precoding achieves significantly better performance and thus enables larger capacity gains.
On the other hand, the OUG ZF-THP algorithm has lower computational complexity than the OUG THP and K-means BD-THP algorithms. The higher complexity of the OUG THP algorithm is a consequence of using THP precoding, which possesses significantly higher complexity than ZF-THP precoding. However, this more complex precoding scheme enables somewhat higher capacity gains, as will be shown in the next section. Comparing OUG ZF-THP and K-means BD-THP, Table 2 and Figure 4 show that the ZF-THP and BD-THP precoding schemes require the same number of FLOPs. However, the complexity order of the optimized K-means clustering adopted in the K-means BD-THP algorithm increases linearly with the number of beamforming subgroups [21], which is not the case with the overlapping user grouping method deployed in the OUG ZF-THP algorithm. Hence, the overall OUG ZF-THP algorithm is more computationally efficient than the K-means BD-THP algorithm.
Also, it is worth mentioning that in means BDTHP algorithm, the computational complexity of singular value decomposition (SVD) procedure for the linear precoding part (i.e., BD procedure) is neglected since it has to be determined very infrequently from the longterm CSI. In here proposed OUG ZFTHP algorithm, calculation of beamforming matrices is done as a part of user grouping procedure which simplifies subsequent linear ZF precoding. This gives more realistic evaluation of OUG ZFTHP complexity.
5. Results and Discussion
To evaluate the performance of the proposed two-stage ZF-THP precoding based on the overlapping user grouping approach (OUG ZF-THP algorithm), we compared the MU-MIMO system capacity of this algorithm with that of the linear OUG-Greedy ZF algorithm [8] and of the two-stage BD-THP precoding based on optimized K-means clustering (K-means BD-THP) [21]. The former introduced the overlapping user grouping method, which showed good performance in IoT-oriented MU-MIMO systems. The latter was the first to consider a concrete user grouping strategy in conjunction with a two-stage precoding scheme. Thus, these algorithms are suitable candidates for performance benchmarking. For the sake of completeness, we also show simulation results for ZFS [7] and for THP precoding [12] combined with the overlapping user grouping strategy (OUG THP), which define the practical lower and upper bounds of the MU-MIMO capacity region for this particular case, respectively. A 2D MU-MIMO system environment was created using the MATLAB software package. The MU-MIMO system capacity for all the observed algorithms was estimated according to Equation (21), defined in Section 3. The Monte Carlo simulation of these algorithms was performed by averaging over 500 random channel realizations.
We assumed a single-cell MU-MIMO system with a base station located at the center of the cell and equipped with 128 omnidirectional antennas, which represents a typical configuration of a commercial massive MIMO antenna. It simultaneously transmits data in the 3.5 GHz band to 300 single-antenna IoT devices. This frequency band has been identified as a global International Mobile Telecommunications-2020 (IMT-2020) band for 5G NR deployment by the International Telecommunication Union Radiocommunication Sector (ITU-R) [35]. The planar antenna array configuration is 8 × 16 (i.e., 8 antenna elements vertically and 16 antenna elements horizontally), with the aim of exploiting 2D beamforming in the horizontal domain. The base station antenna spacing is normalized with respect to the wavelength and set to 0.5. Data symbols are modulated with the 16-QAM technique, which was shown to be sufficient for mMTC 5G NR use cases [36]. To simulate the transmit and receive antenna correlation, we adopted Jakes’ one-ring MIMO channel model [23] and Gudmundson’s shadowing model [25], respectively, both commonly used in cellular IoT scenarios.
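As a quick sanity check of the stated antenna configuration, the physical dimensions of the array can be worked out from the carrier frequency and the normalized spacing. The sketch below is illustrative only; the function name `array_geometry` and the returned field names are assumptions introduced here, not part of the simulation setup described above.

```python
# Illustrative geometry of an 8 x 16 planar array at 3.5 GHz.
# Function and field names are hypothetical, chosen for this sketch.
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def array_geometry(carrier_hz=3.5e9, rows=8, cols=16, spacing_wavelengths=0.5):
    """Return element count, wavelength, element spacing and aperture in metres."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    spacing = spacing_wavelengths * wavelength
    return {
        "elements": rows * cols,
        "wavelength_m": wavelength,
        "spacing_m": spacing,
        "height_m": (rows - 1) * spacing,  # vertical extent between outer elements
        "width_m": (cols - 1) * spacing,   # horizontal extent between outer elements
    }


geometry = array_geometry()
```

Under these assumptions the wavelength is about 8.6 cm, so the element spacing is about 4.3 cm and the array spans roughly 0.64 m horizontally and 0.30 m vertically.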
Parameter values for both correlation models were taken from [8], since the same type of propagation environment was considered. The scattering objects are located around the devices within a radius of 30 meters [24], the correlation distance between devices is 20 meters, and the shadow fading has a standard deviation of 0.4 [26]. IoT devices are uniformly distributed around the base station, with dedicated azimuth values, at distances between 100 and 300 meters. The angular spread of the signal transmitted from the base station is derived as in [24]. This is aligned with the expected beam arrival distance in a rich scattering radio environment for the chosen antenna configuration and operating frequency band. An overview of the main system parameters used in the Monte Carlo simulations is provided in Table 4.
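To illustrate how a correlation distance shapes the shadowing seen by nearby devices, the following sketch uses the commonly quoted exponential form of a Gudmundson-type model, in which the correlation between two locations decays as exp(-d / d_corr). Interpreting the 20 meter value as the 1/e decorrelation distance is an assumption made here, as are the function names; the exact parameterization used in [8] and [25] may differ.

```python
import math
import random


def shadowing_correlation(distance_m, decorrelation_m=20.0):
    """Exponential correlation between shadowing at two locations (assumed form)."""
    return math.exp(-abs(distance_m) / decorrelation_m)


def correlated_shadowing_pair(distance_m, sigma=0.4, decorrelation_m=20.0, rng=random):
    """Draw shadowing values for two devices separated by distance_m.

    Both values are zero-mean Gaussian with standard deviation sigma, and their
    correlation coefficient equals shadowing_correlation(distance_m).
    """
    rho = shadowing_correlation(distance_m, decorrelation_m)
    a = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, 1.0)
    first = sigma * a
    second = sigma * (rho * a + math.sqrt(1.0 - rho * rho) * b)
    return first, second
```

With these assumptions, two devices 20 meters apart have a shadowing correlation of about 0.37, while devices 200 meters apart are practically uncorrelated.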

In [8], it was shown mathematically that an overlapping user beamforming subgroup can select more users with a higher probability than the corresponding beamforming subgroup and that extending the search space always results in larger capacity. Here, we demonstrate numerically the superiority of the proposed approach, which combines the overlapping user grouping method with the two-stage ZF-THP precoding scheme, in terms of the achievable MU-MIMO system capacity.
First, we evaluated the performance of the proposed algorithm in an environment with uncorrelated Rayleigh fading, where the users’ channels are mutually independent. The equivalent channel-based received signal-to-noise ratio (SNR) to throughput mapping method adopted by 3GPP [19] is used for performance evaluation. The MU-MIMO system capacity comparison of the analyzed algorithms is shown in Figure 5.
It can be seen that the proposed OUG ZF-THP algorithm achieves approximately the same capacity as the OUG THP and OUG-Greedy ZF algorithms. The same finding holds for the conventional ZFS and K-means BD-THP algorithms. This is because linear precoding performs almost as well as nonlinear precoding when users have uncorrelated channel vectors. Hence, a linear precoder can efficiently suppress both inter-group and inter-user interference, and there is no need for a two-stage hybrid precoding mechanism. The K-means BD-THP algorithm has lower throughput because it lacks the overlapping approach, which could further populate the formed subgroups by exploiting the favorable propagation property. In the case of uncorrelated MIMO channels, the capacity improvement is mainly achieved by the overlapping user grouping strategy among beamforming subgroups.
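The favorable propagation property referred to here can be checked numerically: for independent Rayleigh channels, the normalized inner product between two users’ channel vectors shrinks as the number of base station antennas grows, which is why a linear precoder suffices in the uncorrelated case. The sketch below is a minimal illustration with hypothetical function names and an i.i.d. complex Gaussian channel, not the one-ring model used in the simulations.

```python
import math
import random


def random_rayleigh_channel(num_antennas, rng):
    """Draw an i.i.d. CN(0, 1) channel vector (assumed uncorrelated model)."""
    scale = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
            for _ in range(num_antennas)]


def normalized_correlation(h1, h2):
    """Return |h1^H h2| / (||h1|| ||h2||), a value between 0 and 1."""
    inner = sum(a.conjugate() * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(abs(a) ** 2 for a in h1))
    norm2 = math.sqrt(sum(abs(b) ** 2 for b in h2))
    return abs(inner) / (norm1 * norm2)


def mean_correlation(num_antennas, trials=200, seed=0):
    """Average the normalized correlation over random pairs of users."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h1 = random_rayleigh_channel(num_antennas, rng)
        h2 = random_rayleigh_channel(num_antennas, rng)
        total += normalized_correlation(h1, h2)
    return total / trials
```

Calling `mean_correlation(8)` and `mean_correlation(128)` shows the average correlation dropping markedly for the 128-antenna array, in line with the near-orthogonality of user channels assumed in the uncorrelated scenario.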
Next, we consider a more realistic scenario with correlated shadow fading, which imposes dependency between user channels. The results in Figure 6 show that the proposed OUG ZF-THP algorithm achieves a significant improvement in sum capacity over the existing suboptimal approaches. Compared to the OUG-Greedy ZF and K-means BD-THP algorithms, the capacity increases by 10% and 30%, respectively, in the low SNR regime (4 dB) and by 25% and 45%, respectively, in the high SNR regime (20 dB). The SNR reference values are chosen according to real 5G urban NLOS radio conditions at 3.5 GHz [37].
The large performance gain of the OUG ZF-THP algorithm results from the proposed combination of more advanced two-stage signal processing and overlapping among beamforming subgroups. The linear OUG-Greedy ZF algorithm achieves a lower sum rate due to the correlation of user channel vectors, whereas the poor performance of the K-means BD-THP algorithm comes from the random user selection within clusters. Additionally, the high SNR regime improves transmission reliability and beam steering, which contribute to the higher achievable system throughput. Algorithms that use overlapping user grouping based on greedy user selection exploit the favorable propagation property without generating much inter-user interference, which greatly enhances the MU-MIMO system capacity. From Figure 6, it can also be seen that nonlinear THP precoding based on the overlapping user grouping strategy (i.e., OUG THP) provides the best sum rate. However, the proposed algorithm requires significantly fewer FLOPs, as shown in Section 4. Thus, the OUG ZF-THP approach represents a good trade-off between computational complexity and MU-MIMO system performance in terms of capacity.
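The nonlinear THP stage that removes the remaining inter-user interference within a subgroup can be illustrated with a deliberately simplified sketch. It assumes a real-valued, noise-free, lower-triangular effective channel (such as one obtained after a triangular decomposition of the subgroup channel), one real dimension of 16-QAM with levels ±1 and ±3, and a modulo interval of [-4, 4). The feedforward filter and power normalization are omitted, and all function names are hypothetical; this is not the authors' implementation.

```python
import math


def thp_modulo(value, half_range=4.0):
    """Fold a real value into the interval [-half_range, half_range)."""
    width = 2.0 * half_range
    return value - width * math.floor((value + half_range) / width)


def thp_precode(symbols, lower):
    """Successively pre-subtract the interference from already precoded users.

    `lower` is the assumed lower-triangular effective channel; user k only
    receives interference from users 0..k-1.
    """
    transmitted = []
    for k, symbol in enumerate(symbols):
        interference = sum(lower[k][j] * transmitted[j] for j in range(k)) / lower[k][k]
        transmitted.append(thp_modulo(symbol - interference))
    return transmitted


def thp_receive(transmitted, lower):
    """Noise-free reception: scale by 1 / l_kk, then apply the same modulo."""
    received = []
    for k in range(len(transmitted)):
        y = sum(lower[k][j] * transmitted[j] for j in range(k + 1))
        received.append(thp_modulo(y / lower[k][k]))
    return received
```

For example, with `lower = [[1.0, 0.0, 0.0], [0.9, 0.5, 0.0], [-0.7, 0.8, 0.4]]` and `symbols = [3.0, -1.0, 1.0]`, the precoded values stay within the modulo interval and each receiver recovers its own symbol after the modulo operation, even though the off-diagonal terms are comparable in size to the diagonal ones.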
To further compare the observed algorithms, which comprise user grouping and precoding procedures, we considered the average uncoded bit error rate (BER) as the performance metric (i.e., the BER achieved prior to forward error correction decoding at the receiver), where averaging is performed over a sufficient number of channel realizations and over all users. The uncoded BER is calculated as in [16], which gives the upper bound on the symbol error rate (SER) for different precoding techniques, with additional averaging over all users in all beamforming subgroups. Gray coding is assumed for the adopted 16-QAM modulation technique in all schemes. It should be noted that these BER values are derived in [16] under the assumption that the transmitter, for each user in each subgroup, essentially fixes a minimum required SNR for which it encodes data at the rate corresponding to the possible capacity. If the actual SNR is smaller than this minimum required value, decoding errors occur with the probability given in [16]. Thus, the given BER represents an upper bound for the observed scenario, in which ideal CSI is used.
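For orientation, the relation between an SER upper bound and an uncoded BER estimate under Gray coding can be sketched with the standard textbook expression for square M-QAM over an AWGN channel. This is not the precoding-specific bound from [16]; it ignores modulo and power losses of THP and is given only to show how an SER bound is converted into a BER figure. The function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qam_ser_upper_bound(snr_db, order=16):
    """Upper bound on the SER of square M-QAM at the given symbol SNR (AWGN)."""
    snr = 10.0 ** (snr_db / 10.0)
    bound = 4.0 * (1.0 - 1.0 / math.sqrt(order)) * q_function(
        math.sqrt(3.0 * snr / (order - 1))
    )
    return min(1.0, bound)


def gray_ber_estimate(snr_db, order=16):
    """Approximate uncoded BER, assuming one bit error per symbol error (Gray coding)."""
    return qam_ser_upper_bound(snr_db, order) / math.log2(order)
```

The division by the number of bits per symbol relies on Gray coding, under which the most likely symbol errors flip a single bit; the approximation is tightest at moderate to high SNR.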
The estimated average uncoded BER of all the observed algorithms is compared in Figures 7 and 8 for environments with correlated and uncorrelated Rayleigh fading, respectively. As shown in Figures 7 and 8, the algorithms that deploy more complex nonlinear precoding (i.e., K-means BD-THP, OUG ZF-THP, and OUG THP) significantly outperform those with linear ZF precoding (i.e., ZFS and OUG-Greedy ZF). Such behaviour is expected because nonlinear precoding techniques mitigate inter-user interference more successfully, as already shown in the literature [20]. Also, all the observed algorithms achieve much better BER performance in the case of uncorrelated MIMO channels, due to the significantly lower inter-user interference. The best average uncoded BER is achieved by the OUG THP algorithm, while the proposed OUG ZF-THP algorithm has a somewhat higher BER in the case of uncorrelated MIMO channels and essentially the same BER as the OUG THP algorithm in the case of correlated MIMO channels.
These findings are summarized in Table 5, which presents the numerical performance of all the observed algorithms in good radio conditions (i.e., 20 dB for the achievable sum rate and 40 dB for the BER analysis).

When the number of users increases for the same number of base station antennas, it was shown that ZF-based user grouping strategies achieve a sum rate that approaches the capacity upper bound [4]. This comes from the fact that more combinations within the search space are covered. Asymptotically, as the number of users goes to infinity, the optimal sum rate is achieved, since all possible combinations are searched. These findings apply to our case as well, with an additional benefit from the introduction of the overlapping user grouping approach. In this approach, even more users can be added to unpopulated beamforming subgroups, since the probability that users’ channels are spatially uncorrelated increases. However, this leads to an increased number of beamforming subgroups that should be precoded, and hence the computational complexity becomes prohibitively high because of its polynomial relation with the number of users and their subgroups. To overcome this challenge, we can increase the number of base station antennas. In that way, the same number of users will be served and selected in a lower number of beamforming subgroups, keeping the complexity of signal processing at a reasonable level. In that case, we could exploit both beamforming and multiplexing gains. However, increasing the number of base station antennas results in increased hardware complexity on the base station side, especially in the observed 5G NR mid-band, where digital beamforming is envisioned. The trade-off between these two approaches is to choose a reasonably large number of base station antennas and numerous users in the cell, but to keep the ratio between them relatively small. The last statement holds for an IoT-oriented MU-MIMO system with numerous IoT devices served in a 5G cell, unlike the massive MIMO case, where the number of base station antennas is typically much larger than the number of users [28].
6. Conclusions
In this paper, we have studied the user grouping and scheduling problem in IoT-oriented 5G MU-MIMO systems. We have proposed a two-stage hybrid precoding scheme based on an overlapping user grouping strategy for the mMTC 5G NR use case. In this framework, user grouping is performed using a greedy approach that allows users with favorable channel conditions to be scheduled into multiple beamforming subgroups simultaneously. The two-stage hybrid precoding scheme is then applied to the created beamforming subgroups in order to minimize the interference in the MU-MIMO system. Linear ZF precoding cancels interference among beamforming subgroups, while nonlinear THP precoding reduces the remaining interference between scheduled users within each beamforming subgroup. A comparative analysis with other precoding schemes based on different user grouping methods has been presented. Numerical results demonstrate that the proposed algorithm achieves much higher MU-MIMO system capacity than the existing two-stage precoding schemes based on legacy user grouping strategies, especially in the high SNR regime (from 30% at 4 dB to 45% at 20 dB). Also, a thorough complexity analysis has shown that, despite its good throughput performance, the proposed approach has lower computational complexity than the existing algorithms that employ user grouping methods and two-stage precoding schemes. Finally, the proposed OUG ZF-THP algorithm achieves very good BER performance in the observed application scenario.
The obtained numerical results encourage further research in the area of user grouping and scheduling in 5G MU-MIMO systems. Future work will include the evaluation of the proposed two-stage precoding based on the overlapping user grouping approach in a heterogeneous 5G network consisting of both IoT devices and legacy users with different quality of service (QoS) requirements, and the assessment of its performance in a more realistic radio environment that imposes channel imperfections. In order to support the given QoS requirements for the observed users, the deployment of an adaptive modulation mechanism might be necessary. In that case, low complexity VP precoding techniques could be considered a promising solution instead of the THP schemes considered here.
Data Availability
The data generated from Monte Carlo simulations to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This work has been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia.
References
[1] A. Zaidi, A. Branneby, A. Nazari, M. Hogan, and C. Kuhlins, Cellular IoT in the 5G Era, Ericsson White paper, 2020.
[2] T. van Chien and E. Björnson, Massive MIMO Communications, 5G Mobile Communications, Springer, Basel, Switzerland, 2017.
[3] E. Castaneda, A. Silva, A. Gameiro, and M. Kountouris, “An overview on resource allocation techniques for multi-user MIMO systems,” IEEE Communications Surveys and Tutorials, vol. 19, no. 1, pp. 239–284, 2017.
[4] J. Wang, D. J. Love, and M. D. Zoltowski, “User selection with zero-forcing beamforming achieves the asymptotically optimal sum rate,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3713–3726, 2008.
[5] S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
[6] N. Ganganath, C.-T. Cheng, and C. K. Tse, “Data clustering with cluster size constraints using a modified k-means algorithm,” in 2014 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, pp. 158–161, Shanghai, China, 2014.
[7] G. Dimic and N. D. Sidiropoulos, “On downlink beamforming with greedy user selection: performance analysis and a simple new algorithm,” IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3857–3868, 2005.
[8] R. Tian, Y. Liang, X. Tan, and T. Li, “Overlapping user grouping in IoT oriented massive MIMO systems,” IEEE Access, vol. 5, pp. 14177–14186, 2017.
[9] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Aspects of favorable propagation in massive MIMO,” in Proc. 22nd Eur. Signal Process. Conf. (EUSIPCO), pp. 76–80, 2014.
[10] U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[11] M. H. M. Costa, “Writing on dirty paper (corresp.),” IEEE Transactions on Information Theory, vol. 29, no. 3, pp. 439–441, 1983.
[12] R. F. H. Fischer, C. Windpassinger, A. Lampe, and J. B. Huber, “Space-time transmission using Tomlinson-Harashima precoding,” in Proc. ITG SCC, pp. 139–147, 2002.
[13] B. M. Hochwald, C. B. Peel, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication—Part II: perturbation,” IEEE Transactions on Communications, vol. 53, no. 3, pp. 537–544, 2005.
[14] R. Chen, C. Li, J. Li, and Y. Zhang, “Low complexity user grouping vector perturbation,” IEEE Wireless Communications Letters, vol. 1, no. 3, pp. 189–192, 2012.
[15] A. Li and C. Masouros, “A constellation scaling approach to vector perturbation for adaptive modulation in MU-MIMO,” IEEE Wireless Communications Letters, vol. 4, no. 3, pp. 289–292, 2015.
[16] S. Zarei, W. Gerstacker, and R. Schober, “Comparison of lattice-reduction-aided vector perturbation and Tomlinson-Harashima precoding,” in Proc. 2019 IEEE WCNC, Marrakech, Morocco, 2019.
[17] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels,” IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 461–471, 2004.
[18] R1-1701678, “Nonlinear precoding for downlink multiuser MIMO,” Tech. Rep., Huawei, Hisilicon, 3GPP TSG RAN WG1 Meeting 88, Athens, Greece, 2017.
[19] R1-1703187, “On MU MIMO nonlinear precoding in NR,” Tech. Rep., Nokia, Alcatel-Lucent Shanghai Bell, 3GPP TSG RAN WG1 Meeting 88, Athens, Greece, 2017.
[20] S. Zarei, W. Gerstacker, and R. Schober, “Low complexity hybrid linear/Tomlinson–Harashima precoding for downlink large-scale MU-MIMO systems,” in Proc. 2016 IEEE Globecom Workshops, Washington DC, USA, 2016.
[21] R. F. Trifan, A. A. Enescu, and C. Paleologu, “Hybrid MU-MIMO precoding based on K-means user clustering,” Algorithms, vol. 12, no. 7, pp. 146–163, 2019.
[22] S. Sheikhzadeh, A. R. Forouzan, and F. Parvaresh, “Tomlinson–Harashima precoding for transmitter-side inter-symbol interference cancellation in PSK modulation,” IET Communications, vol. 13, no. 5, pp. 610–619, 2019.
[23] W. C. Jakes, Microwave Mobile Communications, Wiley, New York, USA, 1974.
[24] A. Adhikary, Junyoung Nam, Jae-Young Ahn, and G. Caire, “Joint spatial division and multiplexing—the large-scale array regime,” IEEE Transactions on Information Theory, vol. 59, no. 10, pp. 6441–6463, 2013.
[25] M. Gudmundson, “Correlation model for shadow fading in mobile radio systems,” Electronics Letters, vol. 27, no. 23, pp. 2145–2146, 1991.
[26] P. Agrawal and N. Patwari, “Correlated link shadow fading in multi-hop wireless networks,” IEEE Transactions on Wireless Communications, vol. 8, no. 8, pp. 4024–4036, 2009.
[27] K. Yu and B. Ottersten, “Models for MIMO propagation channels: a review,” Wireless Communications and Mobile Computing, vol. 2, no. 7, pp. 653–666, 2002.
[28] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO, Cambridge University Press, Cambridge, UK, 2016.
[29] P. He, L. Zhao, S. Zhou, and Z. Niu, “Water-filling: a geometric approach and its application to solve generalized radio resource allocation problems,” IEEE Transactions on Wireless Communications, vol. 12, no. 7, pp. 3637–3647, 2013.
[30] A. Wiesel, Y. C. Eldar, and S. Shamai, “Zero-forcing precoding and generalized inverses,” IEEE Transactions on Signal Processing, vol. 56, no. 9, pp. 4409–4418, 2008.
[31] R. Hunger, Floating Point Operations in Matrix-Vector Calculus, Technische Universität München, Associate Institute for Signal Processing, Tech. Rep. TUM-LNS-TR-05-05 Ver. 1.3, 2007.
[32] Z. Shen, R. Chen, J. G. Andrews, R. W. Heath, and B. L. Evans, “Low complexity user selection algorithms for multiuser MIMO systems with block diagonalization,” IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3658–3663, 2006.
[33] M. Arakawa, Computational Workloads for Commonly Used Signal Processing Kernels, MIT, Lincoln Laboratory, Tech. Rep. ESC-TR-2006-071, 2006.
[34] A. Garcia-Rodriguez and C. Masouros, “Power-efficient Tomlinson-Harashima precoding for the downlink of multi-user MISO systems,” IEEE Transactions on Communications, vol. 62, no. 6, pp. 1884–1896, 2014.
[35] J. Lee, E. Tejedor, K. Ranta-aho et al., “Spectrum for 5G: global status, challenges, and enabling technologies,” IEEE Communications Magazine, vol. 56, no. 3, pp. 12–18, 2018.
[36] C. Bockelmann, N. Pratas, H. Nikopour et al., “Massive machine-type communications in 5G: physical and MAC-layer solutions,” IEEE Communications Magazine, vol. 54, no. 9, pp. 59–65, 2016.
[37] J. Zhang, Z. Zheng, Y. Zhang, J. Xi, X. Zhao, and G. Gui, “3D MIMO for 5G NR: several observations from 32 to massive 256 antennas based on channel measurement,” IEEE Communications Magazine, vol. 56, no. 3, pp. 62–70, 2018.
Copyright
Copyright © 2021 Djordje B. Lukic et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.