Abstract

Coordinated Multipoint (CoMP) transmission and reception has been suggested as a key enabling technology of future cellular systems. To understand different CoMP configurations and to facilitate the configuration selection (and thus determine channel state information (CSI) feedback and data sharing requirements), performance benchmarks are needed to show what performance gains are possible. A unified approach is also needed to enable the cluster of cooperating cells to systematically take care of the transceiver design. To address these needs, the generalized iterative approach (GIA) is proposed as a unified approach for the minimum mean square error (MMSE) transceiver design of general multiple-transmitter multiple-receiver multiple-input-multiple-output (MIMO) systems subject to general linear power constraints. Moreover, the optimum decoder covariance optimization approach is proposed for downlink systems. Their optimality and relationships are established and shown numerically. Five CoMP configurations (Joint Processing-Equivalent Uplink, Joint Processing-Equivalent Downlink, Joint Processing-Equivalent Single User, Noncoordinated Multipoint, and Coordinated Beamforming) are studied and compared numerically. Physical insights, performance benchmarks, and some guidelines for CoMP configuration selection are presented.

1. Introduction

Although cellular systems face many challenges, such as multipath fading, cell edge interference, and scarce spectrum, there is demand for even better cellular performance than is achieved today. Meeting this demand requires revolutionary ideas. Coordinated Multipoint (CoMP) transmission and reception, a type of Network MIMO (multiple-input and multiple-output) in Long-Term Evolution-Advanced (LTE-A) [1], is one of those ideas and is a key enabling technology of future cellular systems. Being a MIMO technique, it exploits the multipath fading. Furthermore, it lowers the cell edge interference by having potentially interfering cells cooperate. Lastly, the reduced interference allows for better spectrum reuse and, therefore, better use of the scarce spectrum. Since there are various levels of cell cooperation, there are various CoMP configurations [14]. The following three categories of configurations are generally considered.

The first category is Noncoordinated Multipoint (Non-CoMP) and does not use CoMP at all. In it, each base station (BS) communicates with its own user(s) and does so without cooperating with the other cells in data sharing or channel state information (CSI) exchange. Each BS either ignores or tries to estimate the intercell interference. It has the lowest level of cooperation.

The second category is Coordinated Beamforming (CBF). (In LTE-A, it is also referred to as Coordinated Scheduling and Coordinated Beamforming (CS/CB).) Here, each BS again only communicates with its own user(s) and there is no data sharing between BSs and no data sharing between users. This time, though, the cells do cooperate to minimize the interference they cause to each other through coordination and joint transmitter and/or receiver design. It has the second lowest level of cooperation. Much work has been done for CBF configurations where each cell has one transmitter and receiver pair [5–12] and where each cell has one transmitter and multiple receivers [13–16]. There also are different CSI considerations (e.g., CSI only available at receivers [5–8, 16], full CSI available at a central processing unit [9–14], CSI available only on a per-cell basis [15]) and different design strategies (e.g., centralized [9–14] or distributed [15] designs).

The third category is Joint Processing (JP). Here, the cells fully cooperate; the BSs act as a single equivalent transmitter in downlink (the data is processed and transmitted jointly from the BSs) to form the Joint Processing-Equivalent Downlink (JP-DL) [17–19] and act as a single equivalent receiver in uplink (all received signals are shared and jointly processed) to form the Joint Processing-Equivalent Uplink (JP-UL) [20]. It is shown that JP-UL [20] and JP-DL [17] bring significant gains to both the cell average throughput and the cell edge user throughput. Note that JP-UL and JP-DL have a higher level of cooperation than the previous two categories (Non-CoMP and CBF). When the users act as a single equivalent receiver (resp., transmitter) in downlink (resp., uplink), the result is the Joint Processing-Equivalent Single User (JP-SU), which is essentially a point-to-point MIMO system. JP-SU has the highest level of cooperation and is only of theoretical interest.

In addition, a few attempts have also been made to jointly consider different categories/configurations. For example, joint precoder and decoder designs (e.g., SINR balancing, user rate balancing, and maximum sum rate) are proposed for Non-CoMP, JP-DL, and CBF, and a numerical comparison of their ergodic sum rates is made in [21–23]. But to the best of our knowledge, there are no comparisons of, or configuration selection guidelines for, the various CoMP configurations in the literature.

As seen from these previous works, the precoder and decoder designs and performance evaluation for CoMP systems can be very complex and diverse. This is due to the fact that there exist various CoMP configurations, design criteria, and constraints (e.g., the per-antenna power constraint, per-transmitter power constraint). There also exists a vast number of design approaches associated with each of the design criteria, each of the constraints, and each of the CoMP configurations. Moreover, CoMP was not considered mature and was not adopted by 3GPP in LTE release 10 [24]. Thus, performance benchmarks (which show what performance gains are possible) for CoMP configurations are needed to help determine rules for configuration selection. Since different CoMP configurations require different levels of CSI feedback and data sharing, these rules also help to determine CSI feedback and data sharing requirements. There is also a need for a unified approach to enable the cluster of cooperating cells to systematically take care of the transceiver design of whatever configuration they choose to implement. Both of these two needs will be addressed in this paper.

To address the need for performance benchmarks, we consider joint MMSE precoder and decoder designs for JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF. Firstly, this is because joint MMSE designs can be considered as performance benchmarks for other practical design criteria; an MMSE solution is near optimum in some other senses (e.g., maximum sum rate [25, 26], minimum BER [27]) as well. It has been shown that maximizing the sum rate is equivalent to minimizing the geometric mean of the MSEs of all data streams [25]. Moreover, minimizing the sum MSE is equivalent to minimizing an upper bound on the geometric mean of the MSEs. Thus, the MMSE results are nearly optimum in the maximum sum rate sense. Regarding BER, it has been shown that the MMSE design minimizes the lower bound of BER [27]. In addition, the BER results of the MMSE and minimum BER designs in [26] are very comparable. So, the MMSE results are nearly optimum in the minimum BER sense as well. Though the studies in [25–27] are for single-user systems, these remarks are also true for CoMP systems. Secondly, note that with full CSI, JP-SU provides a performance upper bound for all CoMP configurations with the same total number of transmit antennas and the same total number of receive antennas, as shown in Figure 1. Similarly, Non-CoMP and CBF, where each cell has one transmitter and receiver pair, provide performance upper bounds for their respective categories, given the same total number of transmit antennas and the same total number of receive antennas. Thus, the performance benchmarks can be set forth numerically for various simulation setups; these numerical performance benchmarks can then be used to compare the different configurations and/or categories.

Although not much MMSE work has been published for the CoMP configurations, joint MMSE transceiver designs for the single-user, multiuser downlink, multiuser uplink, and CBF MIMO systems have been studied. For example, for single-user MIMO systems, closed-form expressions of the MMSE design have been derived for the total power constraint [25, 26] and for the shaping constraints [28]. For uplink MIMO systems subject to the per-user power constraint, numerical solutions are provided mainly by the optimal transmit covariance optimization approach (TCOA) [29, 30] and suboptimal iterative approaches such as in [29]. For downlink systems, numerical solutions are provided mainly by iterative approaches such as in [31] for the total power constraint and in [18] for the per-antenna and per-cell power constraints. Dual uplink approaches [32–34] have also been employed for the total power constraint. Recently, for K-user MIMO interference channels (a case of CBF), a joint MMSE design subject to the per-transmitter power constraint, using a linear search for each Lagrange multiplier, was proposed [35].

Note that various CoMP configurations can be considered as special cases of general multiple-transmitter multiple-receiver (MTMR) systems. In this paper, the novel generalized iterative approach (GIA) is proposed as the unified approach to take care of the MMSE design of general MTMR MIMO systems subject to general linear power constraints, including the per-transmitter power constraint and the more practical per-antenna power constraint. The GIA can provide tradeoff between multiplexing and diversity gains. In addition, the optimum decoder covariance optimization approach (DCOA) for the MMSE design of downlink systems (i.e., JP-SU, JP-DL, and Non-CoMP) subject to general linear power constraints is also proposed so that the optimality of the GIA can be studied. For this purpose, the equivalence between the GIA and the optimum TCOA [29, 30] for the uplink or DCOA for the downlink is established in the respective configurations.

In the numerical simulations, firstly, aspects pertaining to the proposed approaches are investigated. The convergence properties of the proposed approaches are investigated; the optimality and diversity/multiplexing tradeoff of the GIA are verified numerically; numerical comparison between the GIA and the approach in [35] is investigated. Secondly, aspects pertaining to performance benchmark are investigated. To set forth a benchmark among different CoMP configurations, MSE and BER performances for the five CoMP configurations (JP-SU, JP-DL, JP-UL, CBF, and Non-CoMP) are compared. Since this paper is concerned with performance benchmarks (achievable theoretical upper bounds), fairness-type criteria, and practical issues such as synchronization required by different CoMP configurations are not considered here. Various important factors (level of cooperation, system load, system size, and path loss) are studied though. The performance benchmarks and the resulting physical insights (into the mechanisms and performances of CoMP configurations) are very useful. In particular, much needed guidelines for the configuration selection process are obtained.

Notations are as follows. All boldface letters indicate vectors (lower case) or matrices (upper case). $\mathbf{A}^{T}$, $\mathbf{A}^{H}$, $\mathbf{A}^{-1}$, $\operatorname{tr}(\mathbf{A})$, $E(\mathbf{A})$, $\operatorname{rank}(\mathbf{A})$, and $\|\mathbf{A}\|_{F}$ stand for the transpose, conjugate transpose, inverse, trace, expectation, rank, and Frobenius norm of $\mathbf{A}$, respectively. $\operatorname{abs}(\mathbf{A})$ denotes taking the absolute value element-wise of $\mathbf{A}$. $\operatorname{span}(\mathbf{A})$ represents the subspace spanned by the columns of $\mathbf{A}$. Matrix $\mathbf{I}_{a}$ signifies an identity matrix with rank $a$. Matrix $\mathbf{0}$ signifies a zero matrix with proper dimension. $\operatorname{diag}[\cdots]$ denotes the diagonal matrix with elements $[\cdots]$ on the main diagonal. $\mathbf{A}>\mathbf{B}$ ($\mathbf{A}\geq\mathbf{B}$) means that $\mathbf{A}-\mathbf{B}$ is positive definite (semidefinite). $\mathbf{A}\circ\mathbf{B}$ denotes the Schur product of $\mathbf{A}$ and $\mathbf{B}$ (element-wise product of $\mathbf{A}$ and $\mathbf{B}$). $CN(\mu,q)$ denotes a complex normal random variable with mean $\mu$ and variance $q$. Finally, i.i.d. stands for independent and identically distributed.

2. Formulation

2.1. A Single Formulation for General MTMR MIMO Systems

In this subsection, we derive a single formulation to describe a general MTMR MIMO system including the five CoMP configurations (JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF) investigated in this paper. Consider an MTMR MIMO system with $T$ transmitters and $R$ receivers. Let $\tau_{n}$ and $\gamma_{l}$ denote the numbers of antennas at the $n$th transmitter and the $l$th receiver, respectively. Accounting for the path loss (spatial correlation can be easily incorporated as well but has been omitted for simplicity), the channel from the $n$th transmitter to the $l$th receiver is modeled as
$$\mathbf{H}_{ln}=d_{ln}^{-\beta}\,\mathbf{H}_{w,ln}. \tag{1}$$
Here, $d_{ln}$ denotes the distance between the $l$th receiver and the $n$th transmitter, and $2\beta$ is the path loss exponent. The entries of $\mathbf{H}_{w,ln}$ are i.i.d. $CN(0,1)$. Here, the subscript $w$ indicates that the matrix is spatially white.
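As an illustration of the channel model in (1), the following minimal NumPy sketch draws one realization of every block $\mathbf{H}_{ln}$. The function name `mtmr_channel`, the distance matrix `d`, and the antenna counts are hypothetical choices made for this example only.

```python
import numpy as np

def mtmr_channel(d, tau, gamma, beta, rng):
    """Return H[l][n] = d[l, n]**(-beta) * H_w, with H_w having i.i.d. CN(0, 1) entries."""
    R, T = d.shape
    H = [[None] * T for _ in range(R)]
    for l in range(R):
        for n in range(T):
            shape = (gamma[l], tau[n])  # receive antennas x transmit antennas
            # CN(0, 1): real and imaginary parts are independent with variance 1/2.
            Hw = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
            H[l][n] = d[l, n] ** (-beta) * Hw
    return H

rng = np.random.default_rng(0)
d = np.array([[1.0, 2.0], [2.0, 1.0]])  # hypothetical receiver-to-transmitter distances
H = mtmr_channel(d, tau=[200, 200], gamma=[200, 200], beta=1.5, rng=rng)
# Empirical average gain of the cross link; expected value is d**(-2*beta) = 2**(-3).
avg_gain = np.mean(np.abs(H[0][1]) ** 2)
```

The large antenna counts are used here only so that the empirical average gain is a stable estimate of $d_{ln}^{-2\beta}$; any positive counts work.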

Some of the transmitters (resp., receivers) in the CoMP system may be sharing and jointly processing their data (resp., received signals). Such a collection of transmitters (resp., receivers), which are connected via backhaul, share CSI and data, and act like a single transmitter (resp., receiver) in transmission and data processing, is a composite transmitter (resp., receiver) and thus an equivalent transmitter (resp., receiver). For the sake of having a single formulation, a transmitter (resp., receiver) which does not collaborate with other transmitters (resp., receivers) in the above way is also considered to be an equivalent transmitter (resp., receiver). Thus, this MTMR MIMO system can also be (and will be) considered as having $C$ equivalent transmitters (eq-transmitters for short) and $K$ equivalent receivers (eq-receivers for short). Obviously, $C\leq T$ and $K\leq R$.

Let $t_{c}$ and $r_{i}$ denote the numbers of antennas at the $c$th eq-transmitter and the $i$th eq-receiver, respectively. Then, $t=\sum_{n=1}^{T}\tau_{n}=\sum_{c=1}^{C}t_{c}$ and $r=\sum_{l=1}^{R}\gamma_{l}=\sum_{i=1}^{K}r_{i}$ are the total numbers of transmit and receive antennas, respectively. Also let $\mathbf{H}_{ic}$ denote the composite channel matrix from the $c$th eq-transmitter to the $i$th eq-receiver. At the $c$th eq-transmitter, let $\mathbf{s}_{ic}$, $m_{ic}$, and $\mathbf{F}_{ic}$ denote the data, number of data streams, and precoder for the $i$th eq-receiver, respectively. Furthermore, let $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}=E(\mathbf{s}_{ic}\mathbf{s}_{ic}^{H})$ and $\mathbf{G}_{ic}$ be, respectively, the source covariance matrix for $\mathbf{s}_{ic}$ and the decoder for $\mathbf{s}_{ic}$. Which transmitter transmits to which receiver is configurable. When the $c$th eq-transmitter has no data to transmit to the $i$th eq-receiver, $\mathbf{s}_{ic}=\mathbf{0}$, $m_{ic}=0$, $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}=\mathbf{0}$, $\mathbf{F}_{ic}=\mathbf{0}$, and $\mathbf{G}_{ic}=\mathbf{0}$. When it does, $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}$ is positive definite and $\mathbf{F}_{ic}$ and $\mathbf{G}_{ic}$ must be designed.

In this system, there may be multiple clusters where each cluster jointly designs the MIMO processors for its own eq-transmitters and eq-receivers but does so independently of the other clusters. There is no CSI sharing between clusters and the intercluster interference is formulated as noise. Let $D$ and $S$ define one such cluster; $D$ being the set of eq-transmitter indices in the cluster and $S$ being the set of eq-receiver indices in the cluster. $D$ and $S$ are introduced to allow a single formulation to take care of the MMSE transceiver design for different CoMP configurations. At the $i$th eq-receiver, $i\in S$, the received signal is thus
$$\mathbf{y}_{i}=\sum_{c\in D}\mathbf{H}_{ic}\sum_{j\in S}\mathbf{F}_{jc}\mathbf{s}_{jc}+\mathbf{n}_{i}, \tag{2}$$
$$\mathbf{n}_{i}=\mathbf{a}_{i}+\mathbf{i}_{i},\qquad \mathbf{i}_{i}=\sum_{l\notin D}\mathbf{H}_{il}\sum_{j\notin S}\mathbf{F}_{jl}\mathbf{s}_{jl}. \tag{3}$$
Here, $\mathbf{n}_{i}$, $\mathbf{a}_{i}$, and $\mathbf{i}_{i}$ are the noise plus intercluster interference vector, the noise vector, and the intercluster interference vector, respectively, at the $i$th eq-receiver. The interference is from all of the eq-transmitters which do not belong to $D$. Thus, when there is only one cluster in the system, there is no interference and $\mathbf{n}_{i}=\mathbf{a}_{i}$, $\mathbf{i}_{i}=\mathbf{0}$ for every $i\in S$. Note that, except in Non-CoMP, the possible intercell interference is implicitly included in the first term in (2), and is considered to be manageable.

2.2. Five CoMP Configurations

The needed CSI feedback and data sharing in each CoMP configuration are assumed to be carried out over ideal links with zero delay. The above single formulation is able to describe any general MTMR MIMO system including JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF. There is only one cluster in JP-UL, JP-DL, JP-SU, and CBF, but there are $C$ clusters in Non-CoMP. Without loss of generality and for convenience, Non-CoMP and CBF considered in this paper have only one transmitter-receiver pair per cluster.

2.2.1. Configuration I: JP-UL

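In JP-UL, the system has only one cluster and is just an equivalent uplink MIMO system, that is, there are multiple transmitters (each being an eq-transmitter) but only one eq-receiver (full cooperation among all receivers). Thus,
$$D=\{1,2,\ldots,C\},\quad S=\{1\},\quad C=T,\quad K=1,\quad \mathbf{H}_{ic}=\begin{bmatrix}\mathbf{H}_{1c}\\ \vdots\\ \mathbf{H}_{Rc}\end{bmatrix},\quad \mathbf{n}_{i}=\mathbf{a}_{i},\quad \mathbf{i}_{i}=\mathbf{0},\quad c\in D,\ i\in S. \tag{4}$$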

For both FDD and TDD systems, each BS estimates all uplink CSI and sends the CSI to a central processing unit via the backhaul (if the BSs are colocated, the backhaul is not needed). The central processing unit performs the system-wide transceiver design and sends each user its optimized precoder through the serving BS. Each user uses the received precoder for transmitting data. Lastly, the BSs share their received signals with the central processing unit for joint decoding.

2.2.2. Configuration II: JP-DL

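In JP-DL, the system has only one cluster and is just an equivalent downlink MIMO system, that is, there are multiple receivers (each being an eq-receiver) but only one eq-transmitter (full cooperation among all transmitters). Thus,
$$D=\{1\},\quad S=\{1,2,\ldots,K\},\quad C=1,\quad K=R,\quad \mathbf{H}_{ic}=\begin{bmatrix}\mathbf{H}_{i1} & \cdots & \mathbf{H}_{iT}\end{bmatrix},\quad \mathbf{n}_{i}=\mathbf{a}_{i},\quad \mathbf{i}_{i}=\mathbf{0},\quad c\in D,\ i\in S. \tag{5}$$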

In TDD systems, the BSs estimate downlink CSI through reciprocity. In FDD systems, each user estimates all intracluster downlink CSI and feeds back the CSI to its serving BS. After obtaining the CSI, each BS sends the CSI to a central processing unit via the backhaul (if the BSs are co-located, the backhaul is not needed). The central processing unit performs the system-wide transceiver design and sends the optimized precoders and decoders to the BSs. Each BS uses the optimized precoder for transmitting data. Each BS also sends the decoder to its users for processing the received data.

2.2.3. Configuration III: JP-SU

In JP-SU, essentially a point-to-point MIMO system, there is only one eq-transmitter (full cooperation among all transmitters) and only one eq-receiver (full cooperation among all receivers). It is only of theoretical interest (showing a performance upper bound for all CoMP systems) and the signaling issues are irrelevant and omitted. It is assumed that a central processing unit knows all the channels and performs the system-wide transceiver design. Thus,
$$D=\{1\},\quad S=\{1\},\quad C=1,\quad K=1,\quad \mathbf{H}_{ic}=\begin{bmatrix}\mathbf{H}_{11} & \cdots & \mathbf{H}_{1T}\\ \vdots & \ddots & \vdots\\ \mathbf{H}_{R1} & \cdots & \mathbf{H}_{RT}\end{bmatrix},\quad \mathbf{n}_{i}=\mathbf{a}_{i},\quad \mathbf{i}_{i}=\mathbf{0},\quad c\in D,\ i\in S. \tag{6}$$

2.2.4. Configuration IV: Non-CoMP

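In Non-CoMP, each transmitter (being an eq-transmitter) is paired with a unique receiver (being an eq-receiver). Each pair is a cluster of the system, so the intercell interference is the intercluster interference. Thus, pairwise transceiver design is performed and the system with $C$ eq-transmitter eq-receiver pairs ($C=K=T=R$) is decoupled into $C$ single-user clusters with the $i$th one being
$$D=\{i\},\quad S=\{i\},\quad \mathbf{H}_{ii}=\mathbf{H}_{ii},\quad \mathbf{n}_{i}=\mathbf{a}_{i}+\mathbf{i}_{i},\quad \mathbf{i}_{i}=\sum_{l=1,\,l\neq i}^{C}\mathbf{H}_{il}\mathbf{F}_{ll}\mathbf{s}_{ll},\quad i\in S, \tag{7}$$
where the left-hand $\mathbf{H}_{ii}$ is the composite channel (eq-receiver and eq-transmitter indices) and the right-hand $\mathbf{H}_{ii}$ is the physical channel $\mathbf{H}_{ln}$ with $l=n=i$.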

In TDD systems, each transmitter estimates the forward link CSI through reciprocity. The transmitter performs the joint transceiver design and sends the decoder to the receiver. In FDD systems, each receiver estimates the forward link CSI and sends the estimated information to the transmitter. Both transmitter and receiver can independently perform the joint transceiver design. The transmitter will use the resulting precoder to transmit data and the receiver will use the decoder to process the received data.

2.2.5. Configuration V: CBF

Like Non-CoMP, there are multiple pairs of transmitters and receivers in CBF. However, unlike Non-CoMP, there is only one cluster here. Note that in CBF, $\mathbf{F}_{ic}=\mathbf{0}$ for $i\neq c$ and the BSs do not share data. The CSI acquisition and signaling requirements in uplink (resp., downlink) for a central processing unit are the same as in JP-UL (resp., JP-DL). The central processing unit performs the system-wide transceiver design. Thus,
$$D=\{1,2,\ldots,C\},\quad S=\{1,2,\ldots,C\},\quad \mathbf{H}_{ic}=\mathbf{H}_{ic},\quad \mathbf{n}_{i}=\mathbf{a}_{i},\quad \mathbf{i}_{i}=\mathbf{0},\quad c\in D,\ i\in S. \tag{8}$$
Note that, for the composite channel matrix $\mathbf{H}_{ic}$ in (4)–(8), the subscript $i$ is the eq-receiver index and the subscript $c$ is the eq-transmitter index. However, for the channel matrix $\mathbf{H}_{ln}$, the subscript $l$ is the receiver index and the subscript $n$ is the transmitter index.
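The composite channels in (4)–(6) are block arrangements of the physical channels $\mathbf{H}_{ln}$. A minimal NumPy sketch of that assembly follows; the helper name `composite_channel` and the antenna counts are assumptions made for illustration, and indices are zero-based in the code.

```python
import numpy as np

def composite_channel(H, rx_group, tx_group):
    """Stack physical blocks H[l][n] over the receivers and transmitters that form one eq-link."""
    return np.block([[H[l][n] for n in tx_group] for l in rx_group])

rng = np.random.default_rng(0)
gamma, tau = [2, 3], [4, 1, 2]  # hypothetical antenna counts (R = 2 receivers, T = 3 transmitters)
H = [[rng.standard_normal((g, t)) + 1j * rng.standard_normal((g, t)) for t in tau]
     for g in gamma]

# JP-UL, as in (4): all receivers stacked vertically for transmitter index 1.
H_jp_ul = composite_channel(H, rx_group=[0, 1], tx_group=[1])
# JP-DL, as in (5): all transmitters side by side for receiver index 0.
H_jp_dl = composite_channel(H, rx_group=[0], tx_group=[0, 1, 2])
# JP-SU, as in (6): the full R x T block matrix.
H_jp_su = composite_channel(H, rx_group=[0, 1], tx_group=[0, 1, 2])
```

For Non-CoMP and CBF, each composite channel is a single physical block, so `composite_channel(H, [i], [c])` simply returns `H[i][c]`.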

2.3. MMSE Design Subject to General Linear Power Constraints

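For a given cluster, define the MSE with respect to the $i$th eq-receiver and the $c$th eq-transmitter, $i\in S$, $c\in D$, as
$$\eta_{ic}=\operatorname{tr}\left\{E\left[\left(\mathbf{G}_{ic}\mathbf{y}_{i}-\mathbf{s}_{ic}\right)\left(\mathbf{G}_{ic}\mathbf{y}_{i}-\mathbf{s}_{ic}\right)^{H}\right]\right\}. \tag{9}$$
Note that when the $c$th eq-transmitter has no data for the $i$th eq-receiver, $\eta_{ic}=0$. The sum MSE $\eta$ is
$$\eta=\sum_{c\in D}\sum_{i\in S}\eta_{ic}. \tag{10}$$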

2.3.1. MMSE Problem

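We will jointly choose $\{\mathbf{F}_{ic},\mathbf{G}_{ic}\}_{i\in S,c\in D}$ to minimize the sum MSE $\eta$:
$$\left\{\mathbf{F}_{ic},\mathbf{G}_{ic}\right\}_{\mathrm{MMSE}}=\arg\min_{\{\mathbf{F}_{ic},\mathbf{G}_{ic}\}_{i\in S,c\in D}}\{\eta\}, \tag{11}$$
subject to general linear power constraints, for example, the per-antenna power constraint at the $c$th eq-transmitter
$$\mathbf{I}_{t_{c}}\circ\left(\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\right)=\operatorname{diag}\left[P_{c1},\ldots,P_{ct_{c}}\right],\quad P_{c1},\ldots,P_{ct_{c}}>0,\quad c\in D, \tag{12}$$
or the per-transmitter power constraint at the $n$th transmitter of the $c$th eq-transmitter,
$$\operatorname{tr}\left(\mathbf{Q}_{n}\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\right)=P_{b_{nc}}>0,\quad n\in J_{c},\ c\in D. \tag{13}$$
Here, $J_{c}$ denotes the set of all cooperating transmitters that form the $c$th eq-transmitter. When there is only one element in $J_{c}$, that is, $J_{c}=\{n\}$, $\mathbf{Q}_{n}=\mathbf{I}_{t_{c}}$ in (13). When there is more than one element in $J_{c}$, $\mathbf{Q}_{n}$ is a $t_{c}\times t_{c}$ matrix whose entries are all equal to zero except for the diagonal elements corresponding to the antennas of the $n$th transmitter. The values of these nonzero diagonal elements are equal to one.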
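To make the two constraint types concrete, the sketch below evaluates the left-hand sides of (12) and (13) for a randomly drawn precoder. The helper names `tx_covariance` and `selection_matrix`, and the choice of two cooperating transmitters with 2 and 3 antennas, are assumptions for this example.

```python
import numpy as np

def tx_covariance(F_list, Phi_list):
    """sum_i F_ic Phi_s_ic F_ic^H for one eq-transmitter."""
    return sum(F @ Phi @ F.conj().T for F, Phi in zip(F_list, Phi_list))

def selection_matrix(tau, n):
    """Q_n: diagonal 0/1 matrix selecting the antennas of the n-th cooperating transmitter."""
    q = np.zeros(sum(tau))
    start = sum(tau[:n])
    q[start:start + tau[n]] = 1.0
    return np.diag(q)

rng = np.random.default_rng(0)
tau = [2, 3]  # hypothetical: two transmitters in J_c, so t_c = 5
F = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
X = tx_covariance([F], [np.eye(2)])

per_antenna = np.real(np.diag(X))  # diagonal of I o X, the left side of (12)
per_tx = [np.real(np.trace(selection_matrix(tau, n) @ X)) for n in range(2)]  # left side of (13)
```

Each per-transmitter power is the sum of the per-antenna powers of that transmitter, which is why the per-antenna constraint is the more restrictive of the two.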

2.3.2. Augmented Cost Function

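To solve (11) subject to (12) or (13), one can use the method of Lagrange multipliers to set up the augmented cost function for general linear power constraints
$$\xi=\eta+\sum_{c\in D}\operatorname{tr}\left\{\boldsymbol{\Lambda}_{c}\left(\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}-\mathbf{P}_{c}\right)\right\}, \tag{14}$$
where $\boldsymbol{\Lambda}_{c}$ represents the Lagrange multipliers. Only the widely considered per-transmitter power constraint and the practical per-antenna power constraint are given as examples. For the per-antenna power constraint in (12),
$$\boldsymbol{\Lambda}_{c}=\operatorname{diag}\left[\lambda_{c1},\ldots,\lambda_{ct_{c}}\right],\quad \mathbf{P}_{c}=\operatorname{diag}\left[P_{c1},\ldots,P_{ct_{c}}\right],\quad c\in D. \tag{15}$$
For the per-transmitter power constraint in (13), let $\boldsymbol{\Delta}_{n}=\lambda_{nc}\mathbf{I}_{\tau_{n}}$ and $\boldsymbol{\Gamma}_{nc}=(P_{b_{nc}}/\tau_{n})\mathbf{I}_{\tau_{n}}$, $c\in D$. Thus
$$\boldsymbol{\Lambda}_{c}=\operatorname{diag}\left[\{\boldsymbol{\Delta}_{n}\}_{n\in J_{c}}\right],\quad \mathbf{P}_{c}=\operatorname{diag}\left[\{\boldsymbol{\Gamma}_{nc}\}_{n\in J_{c}}\right],\quad c\in D. \tag{16}$$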

2.4. MMSE Decoders and Precoders

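Define the noise covariance matrix and the noise plus interference covariance matrix at the $i$th eq-receiver as $\boldsymbol{\Phi}_{\mathbf{a}_{i}}=E(\mathbf{a}_{i}\mathbf{a}_{i}^{H})$ and $\boldsymbol{\Phi}_{\mathbf{n}_{i}}=E(\mathbf{n}_{i}\mathbf{n}_{i}^{H})$, respectively. Assume $\boldsymbol{\Phi}_{\mathbf{a}_{i}}$ is known. Therefore, $\boldsymbol{\Phi}_{\mathbf{n}_{i}}$ is also known in JP-SU, JP-UL, JP-DL, and CBF because $\boldsymbol{\Phi}_{\mathbf{n}_{i}}=\boldsymbol{\Phi}_{\mathbf{a}_{i}}$. In Non-CoMP, $\boldsymbol{\Phi}_{\mathbf{n}_{i}}$ can be estimated explicitly as $\boldsymbol{\Phi}_{\mathbf{n}_{i}}=\sum_{l=1,\,l\neq i}^{C}d_{il}^{-2\beta}P_{b_{ll}}\mathbf{I}_{r_{i}}+\boldsymbol{\Phi}_{\mathbf{a}_{i}}$, where $P_{b_{ll}}=\sum_{k=1}^{t_{l}}P_{lk}$ (see Appendix A).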
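One plausible route to the Non-CoMP estimate above is sketched here; it is not taken from Appendix A. It assumes that the interfering channels are averaged over their i.i.d. $CN(0,1)$ statistics, independently of the interfering precoders, and uses the fact that $E(\mathbf{H}_{w}\mathbf{A}\mathbf{H}_{w}^{H})=\operatorname{tr}(\mathbf{A})\,\mathbf{I}$ for a fixed matrix $\mathbf{A}$. The symbol $\mathbf{X}_{l}$ is introduced only for this sketch.

```latex
\begin{aligned}
\mathbf{X}_{l} &\triangleq \mathbf{F}_{ll}\boldsymbol{\Phi}_{\mathbf{s}_{ll}}\mathbf{F}_{ll}^{H},
\qquad \operatorname{tr}(\mathbf{X}_{l}) = \sum_{k=1}^{t_{l}} P_{lk} = P_{b_{ll}},\\
E\!\left(\mathbf{i}_{i}\mathbf{i}_{i}^{H}\right)
 &= \sum_{l=1,\,l\neq i}^{C} d_{il}^{-2\beta}\,
    E\!\left(\mathbf{H}_{w,il}\,\mathbf{X}_{l}\,\mathbf{H}_{w,il}^{H}\right)
  = \sum_{l=1,\,l\neq i}^{C} d_{il}^{-2\beta}\,\operatorname{tr}(\mathbf{X}_{l})\,\mathbf{I}_{r_{i}}
  = \sum_{l=1,\,l\neq i}^{C} d_{il}^{-2\beta}\,P_{b_{ll}}\,\mathbf{I}_{r_{i}} .
\end{aligned}
```

Adding $\boldsymbol{\Phi}_{\mathbf{a}_{i}}$, with noise and interference assumed uncorrelated, gives the stated expression for $\boldsymbol{\Phi}_{\mathbf{n}_{i}}$.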

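After some mathematical manipulation, (9) becomes
$$\eta_{ic}=\operatorname{tr}\left\{-\mathbf{G}_{ic}\mathbf{H}_{ic}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}-\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\mathbf{H}_{ic}^{H}\mathbf{G}_{ic}^{H}+\boldsymbol{\Phi}_{\mathbf{s}_{ic}}+\mathbf{G}_{ic}\left(\sum_{k\in D}\mathbf{H}_{ik}\sum_{j\in S}\mathbf{F}_{jk}\boldsymbol{\Phi}_{\mathbf{s}_{jk}}\mathbf{F}_{jk}^{H}\mathbf{H}_{ik}^{H}+\boldsymbol{\Phi}_{\mathbf{n}_{i}}\right)\mathbf{G}_{ic}^{H}\right\}. \tag{17}$$
There are two possible directions to solve the MMSE problem.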

2.4.1. MMSE Decoder

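On one hand, for a given set of precoders $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$, setting the gradient of $\eta$ in (10) with respect to $\mathbf{G}_{ic}$ equal to zero yields the MMSE decoder for $\mathbf{s}_{ic}$, $c\in D$, $i\in S$:
$$\mathbf{G}_{ic}=\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\mathbf{H}_{ic}^{H}\mathbf{M}_{i},\qquad \mathbf{M}_{i}=\left(\sum_{k\in D}\mathbf{H}_{ik}\sum_{j\in S}\mathbf{F}_{jk}\boldsymbol{\Phi}_{\mathbf{s}_{jk}}\mathbf{F}_{jk}^{H}\mathbf{H}_{ik}^{H}+\boldsymbol{\Phi}_{\mathbf{n}_{i}}\right)^{-1}. \tag{18}$$
Substituting (18) into (17), $\eta$ in (10) is reduced to
$$\eta_{1}=\sum_{c\in D}\sum_{i\in S}\operatorname{tr}\left\{-\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\mathbf{H}_{ic}^{H}\mathbf{M}_{i}\mathbf{H}_{ic}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}+\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\right\}. \tag{19}$$
The augmented cost function $\xi$ in (14) is also reduced to
$$\xi_{1}=\eta_{1}+\sum_{c\in D}\operatorname{tr}\left\{\boldsymbol{\Lambda}_{c}\left(\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}-\mathbf{P}_{c}\right)\right\}. \tag{20}$$
Note that $\eta_{1}$ in (19) and $\xi_{1}$ in (20) are merely functions of the precoders $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$ (and the Lagrange multipliers $\{\boldsymbol{\Lambda}_{c}\}_{c\in D}$).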
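The decoder in (18) and the reduced MSE in (19) can be checked numerically. The sketch below does so for the simplest case of one eq-transmitter and one eq-receiver; the dimensions, noise level, and helper names (`crandn`, `mse`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

t, r, m = 4, 3, 2                   # hypothetical antenna and stream counts
H, F = crandn(r, t), crandn(t, m)   # channel and an arbitrary precoder
Phi_s, Phi_n = np.eye(m), 0.1 * np.eye(r)

def mse(G):
    # Equation (17) specialised to one eq-transmitter and one eq-receiver.
    Fh, Hh, Gh = F.conj().T, H.conj().T, G.conj().T
    A = H @ F @ Phi_s @ Fh @ Hh + Phi_n
    E = -G @ H @ F @ Phi_s - Phi_s @ Fh @ Hh @ Gh + Phi_s + G @ A @ Gh
    return np.real(np.trace(E))

M = np.linalg.inv(H @ F @ Phi_s @ F.conj().T @ H.conj().T + Phi_n)
G = Phi_s @ F.conj().T @ H.conj().T @ M   # MMSE decoder, equation (18)
eta1 = np.real(np.trace(-Phi_s @ F.conj().T @ H.conj().T @ M @ H @ F @ Phi_s + Phi_s))  # (19)
```

Because the MSE is a strictly convex quadratic in the decoder when $\boldsymbol{\Phi}_{\mathbf{n}_{i}}$ is positive definite, any perturbation of `G` increases `mse(G)`, and `mse(G)` coincides with `eta1`.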

2.4.2. MMSE Precoder

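On the other hand, for a given set of decoders $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ and Lagrange multipliers $\{\boldsymbol{\Lambda}_{c}\}_{c\in D}$, setting the gradient of $\xi$ in (14) with respect to $\mathbf{F}_{ic}$ equal to zero yields the MMSE precoder for $\mathbf{s}_{ic}$, $c\in D$, $i\in S$:
$$\mathbf{F}_{ic}=\mathbf{N}_{c}\mathbf{H}_{ic}^{H}\mathbf{G}_{ic}^{H},\qquad \mathbf{N}_{c}=\left(\sum_{k\in D}\sum_{j\in S}\mathbf{H}_{jc}^{H}\mathbf{G}_{jk}^{H}\mathbf{G}_{jk}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_{c}\right)^{-1}. \tag{21}$$
Substituting (21) into (14), the augmented cost function $\xi$ in (14) is reduced to
$$\xi_{2}=\sum_{c\in D}\sum_{i\in S}\operatorname{tr}\left\{-\mathbf{G}_{ic}\mathbf{H}_{ic}\mathbf{N}_{c}\mathbf{H}_{ic}^{H}\mathbf{G}_{ic}^{H}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}+\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\right\}+\sum_{c\in D}\sum_{i\in S}\operatorname{tr}\left(\mathbf{G}_{ic}\boldsymbol{\Phi}_{\mathbf{n}_{i}}\mathbf{G}_{ic}^{H}\right)-\sum_{c\in D}\operatorname{tr}\left(\boldsymbol{\Lambda}_{c}\mathbf{P}_{c}\right). \tag{22}$$
Note that $\xi_{2}$ in (22) is merely a function of the decoders $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ and the Lagrange multipliers $\{\boldsymbol{\Lambda}_{c}\}_{c\in D}$.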
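The precoder in (21) and the reduced cost in (22) can be checked in the same way. The sketch below again uses one eq-transmitter and one eq-receiver with identity source covariance; the decoder `G`, the multipliers `Lam`, and the power targets `P` are arbitrary hypothetical values, not the outcome of any design.

```python
import numpy as np

rng = np.random.default_rng(2)

def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

t, r, m = 4, 3, 2
H, G = crandn(r, t), crandn(m, r)     # channel and an arbitrary fixed decoder
Phi_n = 0.1 * np.eye(r)
Lam = np.diag([0.5, 1.0, 1.5, 2.0])   # hypothetical multipliers (per-antenna case)
P = 0.25 * np.eye(t)                  # hypothetical per-antenna power targets

def xi(F):
    # Augmented cost (14) with eta from (17); Phi_s = I.
    Fh, Hh, Gh = F.conj().T, H.conj().T, G.conj().T
    E = -G @ H @ F - Fh @ Hh @ Gh + np.eye(m) + G @ (H @ F @ Fh @ Hh + Phi_n) @ Gh
    return np.real(np.trace(E) + np.trace(Lam @ (F @ Fh - P)))

N = np.linalg.inv(H.conj().T @ G.conj().T @ G @ H + Lam)
F = N @ H.conj().T @ G.conj().T       # MMSE precoder, equation (21)
xi2 = np.real(np.trace(-G @ H @ N @ H.conj().T @ G.conj().T + np.eye(m))
              + np.trace(G @ Phi_n @ G.conj().T) - np.trace(Lam @ P))  # equation (22)
```

With positive multipliers, the augmented cost is strictly convex in the precoder, so `F` from (21) is its unique minimizer for the given `G` and `Lam`.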

2.4.3. Transmit and Decoder Covariance Matrices

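When the nonzero source covariance matrices are diagonal matrices with the same diagonal elements (i.e., $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}=\sigma^{2}\mathbf{I}_{m_{ic}}$, $\forall i\in S$, $c\in D$, $\mathbf{s}_{ic}\neq\mathbf{0}$), replacing $\mathbf{F}_{ic}$ by $\mathbf{F}_{ic}\mathbf{A}_{ic}$ ($\mathbf{A}_{ic}$ is an arbitrary unitary matrix with proper dimension) does not change the power constraint (12) or (13). Furthermore, $\eta(\mathbf{F}_{ic},\mathbf{G}_{ic})=\eta(\mathbf{F}_{ic}\mathbf{A}_{ic},\mathbf{A}_{ic}^{H}\mathbf{G}_{ic})$. Define the transmit covariance matrices as
$$\mathbf{U}_{ic}=\mathbf{F}_{ic}\mathbf{F}_{ic}^{H}, \tag{23}$$
and the decoder covariance matrices as
$$\mathbf{V}_{ic}=\mathbf{G}_{ic}^{H}\mathbf{G}_{ic}. \tag{24}$$
Essentially, $\eta(\mathbf{U}_{ic},\mathbf{V}_{ic})=\eta(\mathbf{F}_{ic}\mathbf{A}_{ic},\mathbf{A}_{ic}^{H}\mathbf{G}_{ic})$ for arbitrary unitary matrices $\{\mathbf{A}_{ic}\}_{i\in S,c\in D}$. Therefore, the transmit and decoder covariance matrices $\{\mathbf{U}_{ic},\mathbf{V}_{ic}\}_{i\in S,c\in D}$ can be used to determine the MSE (in fact, they also determine the achievable sum rate) and consequently determine the precoders and decoders. Thus, if the transmit covariance matrices $\{\mathbf{U}_{ic}\}_{i\in S,c\in D}$ which minimize the MSE are found, the precoders $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$ can be obtained using (23) and the decoders $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ can be obtained from (18). Similarly, if the decoder covariance matrices $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$ which minimize the MSE are found, the decoders $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ can be obtained using (24) and the precoders $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$ can be obtained from (21).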
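Obtaining a precoder from a transmit covariance matrix via (23) amounts to a matrix factorization. One common way, shown here as a sketch rather than as the method used in the paper, is an eigendecomposition that keeps only the numerically nonzero modes; the function name `precoder_from_covariance` and the tolerance are assumptions.

```python
import numpy as np

def precoder_from_covariance(U, tol=1e-9):
    """Return (F, m) with F @ F^H ~= U and m = numerical rank of the Hermitian PSD matrix U."""
    w, Q = np.linalg.eigh(U)              # real eigenvalues in ascending order
    keep = w > tol * max(w.max(), 1.0)    # drop numerically zero modes
    F = Q[:, keep] * np.sqrt(w[keep])     # scale each kept eigenvector by sqrt(eigenvalue)
    return F, int(keep.sum())

rng = np.random.default_rng(3)
F0 = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
U = F0 @ F0.conj().T                      # rank-2 transmit covariance, as in (23)
F, m = precoder_from_covariance(U)
```

The recovered `F` generally differs from `F0` by a unitary rotation, which is harmless given the invariance noted above, and the number of columns `m` is the number of data streams implied by the rank of `U`. The same factorization applies to a decoder covariance matrix in (24), with the roles of rows and columns exchanged.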

3. Unified Approach for General MTMR MIMO Systems

The GIA is proposed as a unified approach for the MMSE design for general MTMR MIMO systems. It is motivated by the fact that, if the Lagrange multipliers $\boldsymbol{\Lambda}_{c}$ in (21) are known, we can solve the coupled equations (18) and (21) iteratively for the decoders $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ and precoders $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$. Note that, in most of the literature (e.g., [35]), the Lagrange multipliers are obtained through linear search, in which the search space increases significantly as the system size increases. We herein propose a much more efficient approach using an explicit expression for the Lagrange multipliers.

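To obtain an explicit expression for the Lagrange multipliers $\boldsymbol{\Lambda}_{c}$, $c\in D$, set the gradient of $\xi_{1}$ in (20) with respect to $\mathbf{F}_{ic}$ equal to zero and then left-multiply the resulting equation by $\mathbf{F}_{ic}$. Once this is done for each $i\in S$, sum them all up to obtain the following equation:
$$\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\boldsymbol{\Lambda}_{c}=\mathbf{B}_{c}, \tag{25}$$
$$\mathbf{B}_{c}=\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}^{2}\mathbf{F}_{ic}^{H}\mathbf{H}_{ic}^{H}\mathbf{M}_{i}\mathbf{H}_{ic}-\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\times\sum_{k\in D}\sum_{j\in S}\mathbf{H}_{jc}^{H}\mathbf{M}_{j}\mathbf{H}_{jk}\mathbf{F}_{jk}\boldsymbol{\Phi}_{\mathbf{s}_{jk}}^{2}\mathbf{F}_{jk}^{H}\mathbf{H}_{jk}^{H}\mathbf{M}_{j}\mathbf{H}_{jc}. \tag{26}$$
Utilizing (12), for the per-antenna power constraint,
$$\boldsymbol{\Lambda}_{c}=\mathbf{P}_{c}^{-1}\left(\mathbf{I}_{t_{c}}\circ\mathbf{B}_{c}\right). \tag{27}$$
Utilizing (13), for the per-transmitter power constraint,
$$\lambda_{nc}=P_{b_{nc}}^{-1}\operatorname{tr}\left(\mathbf{Q}_{n}\mathbf{B}_{c}\right),\quad n\in J_{c}. \tag{28}$$
Note that the usage of (27) or (28) enforces the corresponding complementary slackness conditions
$$\boldsymbol{\Lambda}_{c}\left[\mathbf{I}_{t_{c}}\circ\left(\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\right)-\mathbf{P}_{c}\right]=\mathbf{0}, \tag{29}$$
$$\lambda_{nc}\left[\operatorname{tr}\left(\mathbf{Q}_{n}\sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H}\right)-P_{b_{nc}}\right]=0,\quad n\in J_{c}. \tag{30}$$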
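The step from (25) to (27) and (28) uses only the diagonal structure of $\boldsymbol{\Lambda}_{c}$ and the power constraints. The following worked step spells this out; the shorthand $\mathbf{X}_{c}$ is introduced here for convenience and does not appear in the original formulation.

```latex
\begin{aligned}
\mathbf{X}_{c} &\triangleq \sum_{i\in S}\mathbf{F}_{ic}\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\mathbf{F}_{ic}^{H},
\qquad \mathbf{X}_{c}\boldsymbol{\Lambda}_{c}=\mathbf{B}_{c}\quad\text{(equation (25))},\\[2pt]
\text{per-antenna:}\quad
\left[\mathbf{X}_{c}\boldsymbol{\Lambda}_{c}\right]_{kk}
 &= \left[\mathbf{X}_{c}\right]_{kk}\lambda_{ck} = P_{ck}\,\lambda_{ck}
 \;\Longrightarrow\;
 \mathbf{I}_{t_{c}}\circ\mathbf{B}_{c}=\mathbf{P}_{c}\boldsymbol{\Lambda}_{c}
 \;\Longrightarrow\;
 \boldsymbol{\Lambda}_{c}=\mathbf{P}_{c}^{-1}\left(\mathbf{I}_{t_{c}}\circ\mathbf{B}_{c}\right),\\[2pt]
\text{per-transmitter:}\quad
\operatorname{tr}\left(\mathbf{Q}_{n}\mathbf{B}_{c}\right)
 &= \operatorname{tr}\left(\boldsymbol{\Lambda}_{c}\mathbf{Q}_{n}\mathbf{X}_{c}\right)
  = \lambda_{nc}\operatorname{tr}\left(\mathbf{Q}_{n}\mathbf{X}_{c}\right)
  = \lambda_{nc}P_{b_{nc}}
 \;\Longrightarrow\;
 \lambda_{nc}=P_{b_{nc}}^{-1}\operatorname{tr}\left(\mathbf{Q}_{n}\mathbf{B}_{c}\right).
\end{aligned}
```

The second line uses (12) for $[\mathbf{X}_{c}]_{kk}=P_{ck}$, and the third uses $\boldsymbol{\Lambda}_{c}\mathbf{Q}_{n}=\lambda_{nc}\mathbf{Q}_{n}$ from (16) together with (13).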

With the explicit expression for the Lagrange multipliers in (27) or (28) in hand, a GIA can be developed. There are three steps in each iteration of the GIA.

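Step 1. Given $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$, obtain $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ using (18).

Step 2. Given $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$, obtain $\{\boldsymbol{\Lambda}_{c}\}_{c\in D}$ using (27) or (28).

Step 3. Given $\{\mathbf{G}_{ic}\}_{i\in S,c\in D}$ and $\{\boldsymbol{\Lambda}_{c}\}_{c\in D}$, obtain $\{\mathbf{F}_{ic}\}_{i\in S,c\in D}$ using (21).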

The iterative procedure of the GIA stops when the Karush-Kuhn-Tucker (KKT) conditions are all satisfied, that is, when the following three requirements are fulfilled: one, the MSE no longer decreases; two, each precoder (decoder) converges; three, the transmission powers at the transmitter(s) meet the desired power constraints. Since the MSE has a lower bound at zero and each of the GIA steps enforces one of the KKT conditions of the MMSE problem, the GIA can converge quickly to a local minimum at low powers. At high transmit powers, a scaling initialization (scaling the MMSE MIMO precoders and decoders given by the GIA at lower powers) is very effective and efficient. Note that the GIA can deal with arbitrary source covariance matrices $\{\boldsymbol{\Phi}_{\mathbf{s}_{ic}}\}_{i\in S,c\in D}$, thus allowing $m_{ic}$, the number of data streams intended from the $c$th eq-transmitter to the $i$th eq-receiver, to be prespecified for all $i\in S$ and $c\in D$ with $\mathbf{s}_{ic}\neq\mathbf{0}$. Since the numbers of data streams can be prespecified, the GIA allows for a tradeoff between diversity and multiplexing gains.
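The three GIA steps can be sketched in a few lines for the simplest special case. The code below is an illustrative sketch only: it assumes one eq-transmitter and one eq-receiver, identity source covariance, and a single per-transmitter (total) power constraint so that (28) applies with $\mathbf{Q}_{n}=\mathbf{I}$. The function name `gia_single_link`, the random initialization, and the fixed iteration count are assumptions, and no stopping test is implemented.

```python
import numpy as np

def gia_single_link(H, Phi_n, P, m, iters=50, seed=0):
    """Sketch of GIA Steps 1-3 for one eq-transmitter and one eq-receiver, Phi_s = I,
    with a single per-transmitter (total) power constraint tr(F F^H) = P."""
    rng = np.random.default_rng(seed)
    r, t = H.shape
    F = rng.standard_normal((t, m)) + 1j * rng.standard_normal((t, m))
    F *= np.sqrt(P / np.real(np.trace(F @ F.conj().T)))  # start on the power constraint
    history = []
    for _ in range(iters):
        X = F @ F.conj().T
        M = np.linalg.inv(H @ X @ H.conj().T + Phi_n)
        G = F.conj().T @ H.conj().T @ M                   # Step 1, equation (18)
        Y = H.conj().T @ M @ H
        B = X @ Y - X @ Y @ X @ Y                         # equation (26) with Phi_s = I
        lam = np.real(np.trace(B)) / P                    # Step 2, equation (28) with Q_n = I
        N = np.linalg.inv(H.conj().T @ G.conj().T @ G @ H + lam * np.eye(t))
        F = N @ H.conj().T @ G.conj().T                   # Step 3, equation (21)
        history.append((lam, np.real(np.trace(F @ F.conj().T))))
    # G is the decoder from the last Step 1, i.e., matched to the previous precoder.
    return F, G, history

rng = np.random.default_rng(1)
H = (rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))) / np.sqrt(2)
F, G, history = gia_single_link(H, Phi_n=0.1 * np.eye(3), P=1.0, m=2)
```

Each entry of `history` holds the multiplier and the transmit power after one iteration, so the third stopping requirement can be monitored by comparing the recorded power with `P`. In this special case the multiplier from Step 2 equals $\operatorname{tr}(\mathbf{G}\boldsymbol{\Phi}_{\mathbf{n}}\mathbf{G}^{H})/P$ and is therefore positive whenever the decoder is nonzero.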

4. Optimum Approaches for Special MTMR Systems

When the source covariance matrices are diagonal matrices with the same diagonal elements, that is, $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}=\sigma^{2}\mathbf{I}_{m_{ic}}$, $\forall i\in S$, $c\in D$, optimum approaches for the MMSE design subject to the general linear power constraints may be developed for special MTMR systems: uplink systems (e.g., JP-UL, JP-SU, and Non-CoMP where $S$ has only one element) in Section 4.1 and downlink systems (e.g., JP-DL, JP-SU, and Non-CoMP where $D$ has only one element) in Section 4.2. For convenience and without loss of generality, in this section, we assume $\sigma^{2}=1$.

4.1. TCOA [29, 30] for Systems with One Eq-Receiver

The TCOA [29, 30] can be used for JP-UL, JP-SU, and Non-CoMP where $S$ has only one element (but not for JP-DL and CBF) under general linear power constraints. (Note that in [30, 31], the TCOA is only for the per-user power constraint. We use it here to deal with the per-antenna power constraint.) It is motivated by the fact that the MMSE problem may be solved by searching for the transmit covariance matrices $\{\mathbf{U}_{ic}\}_{i\in S,c\in D}$ to jointly minimize $\eta_{1}$ in (19). The optimum numbers of data streams $\{m_{ic}\}_{i\in S,c\in D}$ are determined by the rank of the optimum $\{\mathbf{U}_{ic}\}_{i\in S,c\in D}$. The TCOA [30] can be reformulated in terms of an SDP formulation which can be solved numerically by SDP solvers (such as SeDuMi [36] and Yalmip [37]) in polynomial time.

4.2. DCOA for Systems with One Eq-Transmitter

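The DCOA can be developed for JP-DL, JP-SU, and Non-CoMP where $D$ has only one element (but not for JP-UL and CBF). It is motivated by the fact that the MMSE problem may be solved by searching for the decoder covariance matrices $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$ to jointly minimize $\xi_{2}$ in (22). Using (24), $\xi_{2}$ in (22) becomes
$$\xi_{2}=\operatorname{tr}\left\{\boldsymbol{\Lambda}_{c}\left(\mathbf{N}_{c}-\mathbf{P}_{c}\right)\right\}+\sum_{i\in S}\operatorname{tr}\left(\mathbf{V}_{ic}\boldsymbol{\Phi}_{\mathbf{n}_{i}}\right)+\sum_{i\in S}m_{ic}-t_{c}, \tag{31}$$
where
$$\mathbf{N}_{c}=\left(\sum_{k\in D}\sum_{j\in S}\mathbf{H}_{jc}^{H}\mathbf{V}_{jk}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_{c}\right)^{-1}. \tag{32}$$
The MMSE transceiver design problem becomes
$$\min_{\{\mathbf{V}_{ic}\}_{i\in S}}\ \max_{\boldsymbol{\Lambda}_{c}}\ \xi_{2},\quad\text{subject to}\quad \mathbf{V}_{ic}\geq\mathbf{0},\ \operatorname{rank}(\mathbf{V}_{ic})=m_{ic},\ \boldsymbol{\Lambda}_{c}\geq\mathbf{0},\quad i\in S,\ c\in D. \tag{33}$$
The problem in (33) is not convex because of the constraints dealing with the numbers of data streams, that is, $\operatorname{rank}(\mathbf{V}_{ic})=m_{ic}$, $i\in S$, $c\in D$. Allowing $\{m_{ic}\}_{i\in S,c\in D}$ to be unspecified, we obtain the rank-relaxed decoder covariance optimization problem:
$$\min_{\mathbf{V}_{ic}\geq\mathbf{0},\,i\in S}\ \max_{\boldsymbol{\Lambda}_{c}\geq\mathbf{0}}\ \xi_{2,\mathrm{rel}},\quad c\in D,\qquad \xi_{2,\mathrm{rel}}=\operatorname{tr}\left(\boldsymbol{\Lambda}_{c}\mathbf{N}_{c}-\boldsymbol{\Lambda}_{c}\mathbf{P}_{c}\right)+\sum_{i\in S}\operatorname{tr}\left(\mathbf{V}_{ic}\boldsymbol{\Phi}_{\mathbf{n}_{i}}\right). \tag{34}$$
The cost function $\xi_{2,\mathrm{rel}}$ in (34) is convex with respect to $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$ and concave with respect to $\boldsymbol{\Lambda}_{c}$. Define $\min_{\mathbf{V}_{ic}\geq\mathbf{0},\,i\in S}\max_{\boldsymbol{\Lambda}_{c}\geq\mathbf{0}}\xi_{2,\mathrm{rel}}$ as the primal problem and $\max_{\boldsymbol{\Lambda}_{c}\geq\mathbf{0}}\min_{\mathbf{V}_{ic}\geq\mathbf{0},\,i\in S}\xi_{2,\mathrm{rel}}$ as the dual problem. Since both the primal problem and the dual problem are convex and strictly feasible, strong duality holds, that is, the optimum values of $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$, $\boldsymbol{\Lambda}_{c}$, and $\xi_{2,\mathrm{rel}}$ obtained from the primal problem are the same as those obtained from the dual problem.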
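The convex-concave structure of the rank-relaxed cost in (34) can be spot-checked numerically. The sketch below evaluates $\xi_{2,\mathrm{rel}}$ for one eq-transmitter and two eq-receivers and prepares midpoints between randomly drawn feasible points; all dimensions, the power targets, and the helper names (`xi2_rel`, `rand_psd`) are assumptions for illustration.

```python
import numpy as np

def xi2_rel(V, H, Lam, P, Phi_n):
    """Rank-relaxed cost (34) for one eq-transmitter; V, H, Phi_n are lists over eq-receivers."""
    A = sum(Hj.conj().T @ Vj @ Hj for Hj, Vj in zip(H, V))
    N = np.linalg.inv(A + Lam)                       # equation (32)
    noise = sum(np.trace(Vi @ Pn) for Vi, Pn in zip(V, Phi_n))
    return float(np.real(np.trace(Lam @ N - Lam @ P) + noise))

rng = np.random.default_rng(4)

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def rand_psd(n):
    Z = crandn(n, n)
    return Z.conj().T @ Z                            # Hermitian positive semidefinite

t, r = 4, [2, 2]                                     # hypothetical antenna counts
H = [crandn(ri, t) for ri in r]
Phi_n = [0.1 * np.eye(ri) for ri in r]
P = 0.25 * np.eye(t)
Lam1, Lam2 = np.diag(rng.uniform(0.1, 2.0, t)), np.diag(rng.uniform(0.1, 2.0, t))
V1, V2 = [rand_psd(ri) for ri in r], [rand_psd(ri) for ri in r]
V_mid = [(a + b) / 2 for a, b in zip(V1, V2)]        # midpoint in the primal domain
Lam_mid = (Lam1 + Lam2) / 2                          # midpoint in the dual domain
```

Convexity in the decoder covariances means the cost at `V_mid` does not exceed the average of the costs at `V1` and `V2` for fixed multipliers, and concavity in the multipliers means the cost at `Lam_mid` is at least the average of the costs at `Lam1` and `Lam2` for fixed covariances. A midpoint check of this kind is only a sanity test, not a proof.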

4.2.1. Primal-Dual Algorithm

We propose a novel primal-dual algorithm to solve the rank-relaxed decoder covariance optimization problem in (34). Denote the feasible set of values for $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$ as the primal domain and the feasible set of values for $\boldsymbol{\Lambda}_{c}$ as the dual domain. In short, the approach consists of iterating between a primal domain step and a dual domain step. (Both subproblems, defined in (35) and (36), are convex because their cost functions are convex and concave, respectively, and their constraints are all linear matrix inequalities. The solution of each subproblem is optimum for that subproblem.) For the $(j+1)$th iteration:

Primal Domain Substep
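Given $\boldsymbol{\Lambda}_{c}=\boldsymbol{\Lambda}_{c}^{(j)}$, find the $\{\mathbf{V}_{ic}^{(j+1)}\}_{i\in S,c\in D}$ which solves
$$\min_{\{\mathbf{V}_{ic}\}}\ \operatorname{tr}\left\{\left(\sum_{j\in S}\mathbf{H}_{jc}^{H}\mathbf{V}_{jc}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_{c}\right)^{-1}\boldsymbol{\Lambda}_{c}\right\}+\sum_{i\in S}\operatorname{tr}\left(\mathbf{V}_{ic}\boldsymbol{\Phi}_{\mathbf{n}_{i}}\right),\quad\text{subject to}\quad \mathbf{V}_{ic}\geq\mathbf{0},\ i\in S,\ c\in D. \tag{35}$$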

Dual Domain Substep
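Given $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}=\{\mathbf{V}_{ic}^{(j+1)}\}_{i\in S,c\in D}$, find the $\boldsymbol{\Lambda}_{c}^{(j+1)}$ which solves
$$\max_{\boldsymbol{\Lambda}_{c}}\ \operatorname{tr}\left\{\left(\sum_{j\in S}\mathbf{H}_{jc}^{H}\mathbf{V}_{jc}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_{c}\right)^{-1}\boldsymbol{\Lambda}_{c}-\boldsymbol{\Lambda}_{c}\mathbf{P}_{c}\right\},\quad\text{subject to}\quad \boldsymbol{\Lambda}_{c}\geq\mathbf{0},\ c\in D. \tag{36}$$
The convexity of the rank-relaxed decoder covariance optimization problem guarantees that the solution provided by the primal-dual algorithm is a global optimum. The iterative procedure stops when the values of $\xi_{2,\mathrm{rel}}$ corresponding to the primal domain step and the dual domain step converge to the same value and when $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$ and $\boldsymbol{\Lambda}_{c}$ converge. In practice, the DCOA given by solving (35) and (36) is considered to have converged at the $(j+1)$th iteration when $\{\|\mathbf{V}_{ic}^{(j+1)}-\mathbf{V}_{ic}^{(j)}\|_{F}\}_{i\in S,c\in D}$, $\|\boldsymbol{\Lambda}_{c}^{(j+1)}-\boldsymbol{\Lambda}_{c}^{(j)}\|_{F}$, and the duality gap of the values of $\xi_{2,\mathrm{rel}}$ derived from the two steps,
$$\mathrm{gap}^{(j+1)}=\xi_{2,\mathrm{rel}}\left(\mathbf{V}_{ic}^{(j+1)},\boldsymbol{\Lambda}_{c}^{(j+1)}\right)-\xi_{2,\mathrm{rel}}\left(\mathbf{V}_{ic}^{(j+1)},\boldsymbol{\Lambda}_{c}^{(j)}\right), \tag{37}$$
are less than some prespecified thresholds. Note that, in all this, the power constraints have been accounted for by the Lagrange multipliers. The optimum numbers of data streams $\{m_{ic}\}_{i\in S,c\in D}$ are determined by the rank of the optimum $\{\mathbf{V}_{ic}\}_{i\in S,c\in D}$.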

4.2.2. Two-Semidefinite Programming (Two-SDP) Procedure

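Similar to the TCOA [30] in uplink, (35) and (36) can be reformulated as the following SDPs:
$$\min_{\mathbf{W}_p,\{\mathbf{V}_{ic}\}_{i\in S}}\ \operatorname{tr}\big(\mathbf{W}_p\boldsymbol{\Lambda}_c\big)+\sum_{i\in S}\operatorname{tr}\big(\mathbf{V}_{ic}\boldsymbol{\Phi}_{\mathbf{n}_i}\big),\quad\text{subject to }\mathbf{V}_{ic}\succeq 0,\ i\in S,\ c\in D,\quad\begin{bmatrix}\mathbf{W}_p & \mathbf{I}_{t_c}\\ \mathbf{I}_{t_c} & \sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_c\end{bmatrix}\succeq 0,\tag{38}$$
$$\min_{\mathbf{W}_d,\boldsymbol{\Lambda}_c}\ \operatorname{tr}\Big(\mathbf{W}_d\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}\Big)+\operatorname{tr}\big(\boldsymbol{\Lambda}_c\mathbf{P}_c\big),\quad\text{subject to }\boldsymbol{\Lambda}_c\succeq 0,\ c\in D,\quad\begin{bmatrix}\mathbf{W}_d & \mathbf{I}_{t_c}\\ \mathbf{I}_{t_c} & \sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_c\end{bmatrix}\succeq 0.\tag{39}$$
Both (38) and (39) can be solved numerically in polynomial time with SDP software (such as SeDuMi [36] and Yalmip [37]). However, the primal-dual algorithm of the DCOA needs both the primal and dual subproblems to be solved in each iteration, which leads to high computational complexity. Furthermore, the Two-SDP Procedure is sensitive to the numerical precision of the SDP solvers. It works well at low transmit powers, but the duality gap cannot be made arbitrarily small at high transmit powers owing to the limited numerical precision of the publicly available SDP solvers. Nevertheless, a very important contribution here is that the MMSE transceiver design under general linear power constraints provided by the Two-SDP Procedure is optimal for downlink.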
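The block linear matrix inequality in the two SDPs rests on the Schur complement: for a positive definite lower-right block X, the block matrix [[W, I], [I, X]] is positive semidefinite exactly when W dominates the inverse of X. The following is a minimal sketch of the scalar case, where the condition reduces to w >= 1/x. The helper names `min_eig_sym2` and `lmi_holds` are hypothetical, and `x` merely stands in for the scalar counterpart of the lower-right block.

```python
import math


def min_eig_sym2(p, q, r):
    """Smallest eigenvalue of the real symmetric matrix [[p, q], [q, r]]."""
    return 0.5 * (p + r) - math.sqrt(0.25 * (p - r) ** 2 + q * q)


def lmi_holds(w, x, tol=1e-12):
    """True when [[w, 1], [1, x]] is positive semidefinite (up to tol)."""
    return min_eig_sym2(w, 1.0, x) >= -tol


x = 4.0                           # scalar stand-in for the lower-right block
on_boundary = lmi_holds(0.25, x)  # w = 1/x
inside = lmi_holds(0.30, x)       # w > 1/x
outside = lmi_holds(0.20, x)      # w < 1/x
print(on_boundary, inside, outside)
```

Because the objective weights the slack variable by a nonnegative definite matrix, a minimizer can push it down to the boundary, which is how the inverse in the original cost is recovered.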

4.2.3. Numerically Efficient Procedure

To reduce the computational complexity and improve the convergence properties of the Two-SDP Procedure, the SDP formulation in (38) is still employed to solve the primal domain step in (35), while the explicit expressions of Λ𝑐 derived as follows are employed for the dual domain step in (36).

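Substituting (18) into (24) and using (23), we obtain
$$\mathbf{V}_{ic}=\mathbf{M}_i\mathbf{H}_{ic}\mathbf{U}_{ic}\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i,\qquad \mathbf{M}_i=\Big(\sum_{k\in D}\mathbf{H}_{ik}\Big(\sum_{j\in S}\mathbf{U}_{jk}\Big)\mathbf{H}_{ik}^{\dagger}+\boldsymbol{\Phi}_{\mathbf{n}_i}\Big)^{-1}.\tag{40}$$
Similarly, substituting (21) into (23) and using (24), we obtain
$$\mathbf{U}_{ic}=\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}\mathbf{V}_{ic}\mathbf{H}_{ic}\mathbf{N}_c,\qquad \mathbf{N}_c=\Big(\sum_{k\in D}\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jk}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_c\Big)^{-1}.\tag{41}$$
To remove the dependence of $\{\mathbf{V}_{ic}\}_{i\in S,\,c\in D}$ on $\{\mathbf{U}_{ic}\}_{i\in S,\,c\in D}$, substitute (41) into (40) to yield
$$\mathbf{V}_{ic}=\mathbf{M}_i\mathbf{H}_{ic}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}\mathbf{V}_{ic}\mathbf{H}_{ic}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i,\qquad \mathbf{M}_i=\Big(\sum_{k\in D}\mathbf{H}_{ik}\Big(\sum_{j\in S}\mathbf{N}_k\mathbf{H}_{jk}^{\dagger}\mathbf{V}_{jk}\mathbf{H}_{jk}\mathbf{N}_k\Big)\mathbf{H}_{ik}^{\dagger}+\boldsymbol{\Phi}_{\mathbf{n}_i}\Big)^{-1}.\tag{42}$$
Similarly, substituting (23) into $\mathbf{B}_c$ in (26) and using (41), we can express the Lagrange multipliers $\{\boldsymbol{\Lambda}_c\}_{c\in D}$ in (27) or (28) in terms of $\{\mathbf{V}_{ic}\}_{i\in S,\,c\in D}$.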

5. Equivalence among the Proposed Approaches and Optimality of GIA

In this section, we focus the discussions on the optimality of and the relationships between the GIA, TCOA, and DCOA. Then, the optimality of the GIA can be established.

5.1. Equivalence of the TCOA and GIA for Systems with One Eq-Receiver

When the TCOA is applicable and the transmit covariance matrices {𝐔𝑖𝑐}𝑖𝑆,𝑐𝐷 obtained from the MMSE designs are of full rank, the TCOA and GIA are equivalent. Consequently, the solution of the GIA is actually optimum because the solution of the TCOA is optimum.

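To prove the equivalence between the TCOA and GIA, it suffices to show that the KKT conditions of the two approaches are equivalent. This is because the TCOA is a convex approach, so that its KKT conditions are sufficient conditions for optimality. The KKT conditions common to both approaches are (18), the power constraint (12) or (13), the complementary slackness condition (29) or (30), and the nonnegativeness of the Lagrange multipliers. To obtain the unique KKT condition of the TCOA, we set up the following augmented cost function to include the nonnegative definite constraint on $\{\mathbf{U}_{ic}\}_{i\in S,\,c\in D}$:
$$\zeta_1=\operatorname{tr}\big(\boldsymbol{\Phi}_{\mathbf{n}_i}\mathbf{M}_i\big)+\sum_{c\in D}\operatorname{tr}\big[\boldsymbol{\Lambda}_c(\mathbf{U}_{ic}-\mathbf{P}_c)-\boldsymbol{\Psi}_{u_{ic}}\mathbf{U}_{ic}\big],\tag{43}$$
where $\{\boldsymbol{\Psi}_{u_{ic}}\}_{i\in S,\,c\in D}$ are the Lagrange multipliers satisfying $\operatorname{tr}(\boldsymbol{\Psi}_{u_{ic}}\mathbf{U}_{ic})=0$, $\boldsymbol{\Psi}_{u_{ic}}\succeq 0$, $i\in S$, $c\in D$. When $\{\mathbf{U}_{ic}\}_{i\in S,\,c\in D}$ are of full rank, the Lagrange multipliers $\{\boldsymbol{\Psi}_{u_{ic}}\}_{i\in S,\,c\in D}$ are zero matrices. Setting the gradients of (43) with respect to $\{\mathbf{U}_{ic}\}_{i\in S,\,c\in D}$ to zero, we have
$$\boldsymbol{\Lambda}_c=\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\boldsymbol{\Phi}_{\mathbf{n}_i}\mathbf{M}_i\mathbf{H}_{ic},\quad i\in S,\ c\in D.\tag{44}$$
The task of showing the equivalence of the KKT conditions of the two approaches boils down to showing that the above KKT condition of the TCOA, (44), can be derived from (and can be used to derive) the KKT condition unique to the GIA, (21). Substitute (18) and (23) into (21) to obtain
$$\mathbf{F}_{ic}=\Big(\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\sum_{k\in D}\mathbf{H}_{ik}\mathbf{U}_{ik}\mathbf{H}_{ik}^{\dagger}\mathbf{M}_i\mathbf{H}_{ic}+\boldsymbol{\Lambda}_c\Big)^{-1}\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\mathbf{H}_{ic}\mathbf{F}_{ic}.\tag{45}$$
Then right-multiply (45) by $\mathbf{F}_{ic}^{\dagger}\mathbf{U}_{ic}^{-1}$ to get
$$\mathbf{I}_{t_c}=\Big(\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\sum_{k\in D}\mathbf{H}_{ik}\mathbf{U}_{ik}\mathbf{H}_{ik}^{\dagger}\mathbf{M}_i\mathbf{H}_{ic}+\boldsymbol{\Lambda}_c\Big)^{-1}\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\mathbf{H}_{ic}.\tag{46}$$
With some matrix manipulations, we can show that (46) and (44) are equivalent. Since (21) and (44) can be derived from each other, the proof is complete. The above proof assumes $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}=\sigma^2\mathbf{I}_{m_{ic}}$ with $\sigma^2=1$, $i\in S$, $c\in D$. It is also applicable when $\sigma^2\neq 1$.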
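The "matrix manipulations" linking the two conditions can be spelled out in a few lines. The following is a sketch of one possible route, assuming a single eq-receiver $i$, unit source covariance, and that the decoder-side matrix satisfies $\mathbf{M}_i^{-1}=\sum_{k\in D}\mathbf{H}_{ik}\mathbf{U}_{ik}\mathbf{H}_{ik}^{\dagger}+\boldsymbol{\Phi}_{\mathbf{n}_i}$, with $\dagger$ denoting the conjugate transpose. Each step is reversible, so the implication holds in both directions.

```latex
% Sketch only; assumes M_i^{-1} = \sum_k H_{ik} U_{ik} H_{ik}^\dagger + \Phi_{n_i}.
\begin{align*}
\mathbf{I}_{t_c}
  &= \Big(\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i
     \sum_{k\in D}\mathbf{H}_{ik}\mathbf{U}_{ik}\mathbf{H}_{ik}^{\dagger}
     \mathbf{M}_i\mathbf{H}_{ic}+\boldsymbol{\Lambda}_c\Big)^{-1}
     \mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\mathbf{H}_{ic} \\
\Longleftrightarrow\quad
\boldsymbol{\Lambda}_c
  &= \mathbf{H}_{ic}^{\dagger}\mathbf{M}_i\mathbf{H}_{ic}
     -\mathbf{H}_{ic}^{\dagger}\mathbf{M}_i
     \sum_{k\in D}\mathbf{H}_{ik}\mathbf{U}_{ik}\mathbf{H}_{ik}^{\dagger}
     \mathbf{M}_i\mathbf{H}_{ic} \\
  &= \mathbf{H}_{ic}^{\dagger}\mathbf{M}_i
     \Big(\mathbf{M}_i^{-1}
     -\sum_{k\in D}\mathbf{H}_{ik}\mathbf{U}_{ik}\mathbf{H}_{ik}^{\dagger}\Big)
     \mathbf{M}_i\mathbf{H}_{ic}
   = \mathbf{H}_{ic}^{\dagger}\mathbf{M}_i
     \boldsymbol{\Phi}_{\mathbf{n}_i}\mathbf{M}_i\mathbf{H}_{ic}.
\end{align*}
```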

5.2. Equivalence of the DCOA and GIA for Systems with One Eq-Transmitter

When the DCOA is applicable and the decoder covariance matrices {𝐕𝑖𝑐}𝑖𝑆,𝑐𝐷 obtained from the MMSE designs are of full rank, the DCOA and GIA are equivalent. Consequently, the solution of the GIA is actually optimum because the solution given by the DCOA is optimal.

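To prove the equivalence between the DCOA and GIA, it suffices to show that the KKT conditions of the two approaches are equivalent. This is because the DCOA is a convex approach, so that its KKT conditions are sufficient conditions for optimality. The KKT conditions common to both approaches are (21), the power constraint (12) or (13), the complementary slackness condition (29) or (30), and the nonnegativeness of the Lagrange multipliers. To obtain the unique KKT condition of the DCOA, we set up the following augmented cost function from (34) to include the nonnegative definite constraint on $\{\mathbf{V}_{ic}\}_{i\in S,\,c\in D}$:
$$\zeta_2=\operatorname{tr}\Big[\Big(\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_c\Big)^{-1}\boldsymbol{\Lambda}_c-\boldsymbol{\Lambda}_c\mathbf{P}_c\Big]+\sum_{i\in S}\operatorname{tr}\big(\mathbf{V}_{ic}\boldsymbol{\Phi}_{\mathbf{n}_i}-\boldsymbol{\Psi}_{v_{ic}}\mathbf{V}_{ic}\big),\tag{47}$$
where $\{\boldsymbol{\Psi}_{v_{ic}}\}_{i\in S,\,c\in D}$ are the Lagrange multipliers satisfying $\operatorname{tr}(\boldsymbol{\Psi}_{v_{ic}}\mathbf{V}_{ic})=0$, $\boldsymbol{\Psi}_{v_{ic}}\succeq 0$, $i\in S$, $c\in D$. When $\{\mathbf{V}_{ic}\}_{i\in S,\,c\in D}$ are of full rank, the Lagrange multipliers $\{\boldsymbol{\Psi}_{v_{ic}}\}_{i\in S,\,c\in D}$ are zero matrices. Setting the gradients of (47) with respect to $\{\mathbf{V}_{ic}\}_{i\in S,\,c\in D}$ to zero, we have
$$\mathbf{H}_{ic}\mathbf{N}_c\boldsymbol{\Lambda}_c\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}=\boldsymbol{\Phi}_{\mathbf{n}_i},\quad i\in S,\ c\in D.\tag{48}$$
The task of showing the equivalence of the KKT conditions of the two approaches boils down to showing that the above KKT condition of the DCOA, (48), can be derived from (and can be used to derive) the KKT condition unique to the GIA, (18). Substitute (21) and (24) into (18) to obtain
$$\mathbf{G}_{ic}=\mathbf{G}_{ic}\mathbf{H}_{ic}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}\Big(\mathbf{H}_{ic}\mathbf{N}_c\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}+\boldsymbol{\Phi}_{\mathbf{n}_i}\Big)^{-1}.\tag{49}$$
Then left-multiply (49) by $\mathbf{V}_{ic}^{-1}\mathbf{G}_{ic}^{\dagger}$ to get
$$\mathbf{I}_{r_i}=\mathbf{H}_{ic}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}\Big(\mathbf{H}_{ic}\mathbf{N}_c\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}+\boldsymbol{\Phi}_{\mathbf{n}_i}\Big)^{-1}.\tag{50}$$
With some matrix manipulations, we can show that (50) and (48) are equivalent. Since (18) and (48) can be derived from each other, the proof is complete. The above proof assumes $\boldsymbol{\Phi}_{\mathbf{s}_{ic}}=\sigma^2\mathbf{I}_{m_{ic}}$ with $\sigma^2=1$, $i\in S$, $c\in D$. It is also applicable when $\sigma^2\neq 1$.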
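As in the uplink case, the manipulation linking the two conditions is short. The following is a sketch of one possible route, assuming a single eq-transmitter $c$, unit source covariance, and that the precoder-side matrix satisfies $\mathbf{N}_c^{-1}=\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}+\boldsymbol{\Lambda}_c$, with $\dagger$ denoting the conjugate transpose. Each step is reversible.

```latex
% Sketch only; assumes N_c^{-1} = \sum_j H_{jc}^\dagger V_{jc} H_{jc} + \Lambda_c.
\begin{align*}
\mathbf{I}_{r_i}
  &= \mathbf{H}_{ic}\mathbf{N}_c\mathbf{H}_{ic}^{\dagger}
     \Big(\mathbf{H}_{ic}\mathbf{N}_c
     \sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}
     \mathbf{N}_c\mathbf{H}_{ic}^{\dagger}
     +\boldsymbol{\Phi}_{\mathbf{n}_i}\Big)^{-1} \\
\Longleftrightarrow\quad
\boldsymbol{\Phi}_{\mathbf{n}_i}
  &= \mathbf{H}_{ic}\mathbf{N}_c
     \Big(\mathbf{N}_c^{-1}
     -\sum_{j\in S}\mathbf{H}_{jc}^{\dagger}\mathbf{V}_{jc}\mathbf{H}_{jc}\Big)
     \mathbf{N}_c\mathbf{H}_{ic}^{\dagger}
   = \mathbf{H}_{ic}\mathbf{N}_c\boldsymbol{\Lambda}_c
     \mathbf{N}_c\mathbf{H}_{ic}^{\dagger}.
\end{align*}
```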

6. Simulation Setup

In all of the simulations, the noise and nonzero source covariance matrices, Φ𝐚𝑖 and Φ𝐬𝑖𝑐, are all identity matrices of dimension 𝑟𝑖 and 𝑚𝑖𝑐, respectively. The nonzero source (data) vectors consist entirely of uncoded binary phase shift keying (BPSK) modulated bits. For the per-antenna power constraint, 𝑃𝑐𝑑=𝑃, 𝑑=1,2,…,𝑡𝑐, 𝑐=1,2,…,𝐶 (see (12)), and for the per-transmitter power constraint, 𝑃𝑏𝑛𝑐=𝜏𝑛𝑃 for all 𝑛∈𝐽𝑐, 𝑐=1,2,…,𝐶 (see (13)). Thus, the maximum transmission power from the 𝑛th transmitter is always the same (i.e., 𝜏𝑛𝑃) for both power constraints in (12) and (13).

Without loss of generality, in all of the simulations, the numbers of transmitters and receivers are the same and each cell has only one transmitter and one receiver. Since the transmitter in the 𝑙th cell always (no matter which configuration) has data for the receiver in the 𝑙th cell, they are labeled the 𝑙th transmitter and receiver, respectively. Furthermore, for simplicity, 𝑑𝑙𝑙 (see (1)) is normalized to be equal to 1 for all 𝑙. Since all other links are possibly (depending on the configuration) interfering links, they are normalized such that 𝑑𝑙𝑛≥1, 𝑙≠𝑛. Again, for the sake of simplicity, all 𝑑𝑙𝑛’s, 𝑙≠𝑛, are set equal, thus giving rise to the parameter
$$\delta=\frac{d_{ll}^{-2\beta}}{d_{ln}^{-2\beta}}=d_{ln}^{2\beta}.\tag{51}$$
Note that, in a cellular context, the users (base stations) are the receivers (transmitters) in downlink and the transmitters (receivers) in uplink. Thus, 𝑑𝑙𝑛=1 (𝛿=1) means that all of the users are cell edge users (the system is in a cell edge scenario). Furthermore, as 𝑑𝑙𝑛 increases, 𝛿 increases and each user moves away from the cell edge toward its own base station. In all of the simulations, 2𝛽=4 in the path loss model of (1).
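To give a feel for how quickly the cross links weaken as the normalized distance grows, the path loss ratio can be tabulated directly. The following is a minimal sketch, assuming the ratio is the desired-link path gain divided by the cross-link path gain with the path loss exponent 2β = 4 used in the simulations. The function name `delta` and its argument names are hypothetical.

```python
def delta(d_ln, two_beta=4.0, d_ll=1.0):
    """Desired-link path gain over cross-link path gain: d_ll^(-2β) / d_ln^(-2β)."""
    return d_ll ** (-two_beta) / d_ln ** (-two_beta)


# Cross-link distances used in the path loss studies range from 1 to 4.
table = {d: delta(d) for d in (1.0, 2.0, 3.0, 4.0)}
for d, value in table.items():
    print(f"d_ln = {d:.0f}  ->  delta = {value:.0f}")
```

With 2β = 4, doubling the cross-link distance already reduces the cross-link power by a factor of 16 relative to the desired link.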

All of the setups (1a, 1b, …, 5b) used in these simulations for the five CoMP configurations are defined in Table 1. (Note though that the distances are not specified in these baseline setups because they are example dependent.) For each CoMP configuration, there are various setups. The differences between the different setups for a particular CoMP configuration are marked in bold. For example, for JP-UL, setups 1a and 1b are exactly the same except for the values of {𝑚𝑖𝑐} and 𝑚. Unlike setups 1a–3b, where each setup corresponds to only one configuration, setups 4a, 4b, 5a, and 5b can correspond to either Non-CoMP or CBF. Thus, to help distinguish whether a setup belongs to Non-CoMP or CBF, the name of the configuration is placed next to the setup number; for example, 5a (Non-CoMP) denotes setup 5a for Non-CoMP.

Note that not every approach can be used for every configuration and every setup in Table 1. Also note that the channel matrices generated numerically usually have full column and/or row rank. This in general results in maximum feasible rank transmit covariance matrices and/or decoder covariance matrices in the MMSE designs if the numbers of data streams are not pre-specified. Therefore, in such cases, the TCOA and DCOA are applicable in corresponding setups. The applicability of the proposed approaches in the setups is summarized in Table 2, where “Y” means an approach is applicable in a setup while “N” means it is not.

One last note, the results for setup 4b (Non-CoMP) under the per-antenna power constraint are obtained using the optimum closed-form solution (see Appendix B). The results for setups 5a (Non-CoMP) and 5b (Non-CoMP) can also be obtained by the optimum closed-form solution. But, they are omitted for the clarity of the figures.

7. Investigation into the Proposed Approaches

In this section, the convergence properties, optimality, and diversity/multiplexing tradeoff of the GIA, and numerical comparison of the GIA with the approach in [35] for CBF are investigated. All results except for the ones in Section 7.1 are obtained by averaging over 20 channel realizations. These results are consistent with those obtained by averaging over more channel realizations.

7.1. Convergence Properties of the Approaches

Consider setup 3a (JP-SU). All approaches are applicable. The convergence property (expressed as MSE, 𝑑𝐺, and 𝑑𝑃) of the GIA for the per-antenna power constraint for one set of channel realizations is shown in Figure 2. The difference in decoders 𝑑𝐺 and the difference in the per-antenna power constraint 𝑑𝑃 between the 𝑗th and (𝑗+1)th iteration are defined as
$$d_G^{(j)}=\big\|\mathbf{G}_{11}^{(j+1)}-\mathbf{G}_{11}^{(j)}\big\|_F,\qquad d_P^{(j)}=\frac{\operatorname{tr}\big[\operatorname{abs}\big(\mathbf{F}_{11}^{(j)}\mathbf{F}_{11}^{(j)\dagger}-\mathbf{P}_1\big)\big]}{P}.\tag{52}$$

The convergence property for the per-transmitter power constraint is similar and is omitted due to the page limit. As shown in Figure 2, both the MSE and 𝑑𝐺 converge quickly. Notably, the 𝑑𝑃’s converge much more slowly at higher power. This is due to the fact that, when 𝑃 increases, the Lagrange multipliers decrease quickly (see (27) or (28)). Note that the usage of (27) or (28) enforces the corresponding complementary slackness conditions (29) or (30). For large 𝑃’s, the Lagrange multipliers are very small. For example, when 10log10𝑃=30 dB, they can be as small as 10⁻¹⁰. Thus, the number of iterations increases drastically as 𝑃 increases if equality in the power constraints in (12) or (13) is insisted upon. The slow convergence behavior of the 𝑑𝑃’s is also observed in other configurations.

In Non-CoMP and CBF, the power constraints may not be met with equality for the MMSE results (where the corresponding Lagrange multipliers are essentially zeros). Although the Lagrange multipliers are formulated in this paper using equality power constraints to derive explicit expressions of the Lagrange multipliers, the GIA can be in fact used to solve inequality power constraints. When the equality of a particular power constraint is not met, the corresponding Lagrange multiplier becomes zero (which shows the complementary slackness condition).

For the DCOA, the convergence properties of the Two-SDP Procedure and Numerically Efficient Procedure, using SDP solvers SeDuMi [36] and Yalmip [37], are shown in Figure 3 for setup 3a (JP-SU) for the per-antenna power constraint for one set of channel realizations. It is found (from observing the convergence rates of the duality gap in (37) and the antenna powers in Figure 3) that the Numerically Efficient Procedure converges faster than the Two-SDP Procedure.

7.2. Optimality of the GIA

This sub-section investigates numerically the equivalence relationships stated in Section 5 and verifies the optimality of the GIA. Only examples for the per-antenna power constraints are shown for simplicity. In setup 1a (JP-UL), the MSE curves of the GIA and TCOA merge in the left sub-plot of Figure 4. The GIA is equivalent to the TCOA and yields the globally optimum solution. On the other hand, in setup 2a (JP-DL), the MSE curves of the GIA and DCOA merge in the right subplot of Figure 4. The GIA is equivalent to the DCOA and yields the globally optimum solution. Similarly, in setups 3a, 3c, and 3d (JP-SU) (see Figure 5), the MSE curves of all approaches merge. The GIA is equivalent to both the TCOA and DCOA and yields globally optimum solution.

7.3. Diversity/Multiplexing Tradeoff by the GIA

In setups 1a (JP-UL), 2a (JP-DL), and 3a (JP-SU), the GIA is able to transmit the maximum number of data streams as other proposed approaches. On the other hand, in setups 1b (JP-UL), 2b (JP-DL), and 3b (JP-SU), the GIA is also able to transmit a fewer number of data streams resulting in a lower MSE and BER performance (see the dashed curves in Figures 4 and 5), while the other proposed approaches are not applicable. In other words, the GIA is able to, unlike the other approaches, provide a tradeoff between multiplexing gain and diversity gain.

7.4. Comparison between the GIA and the Approach in [35]

As noted in Section 7.1, the proposed GIA can in fact handle inequality power constraints. Both the GIA and the approach in [35] are therefore 3-step iterative approaches applicable to CBF with the per-transmitter power constraint. The only difference is the way of finding the Lagrange multipliers: [35] uses a linear search method to find the Lagrange multipliers when the equality power constraint is enforced, while the GIA uses the more efficient explicit expression (28). In setup 5a (CBF), the MSE (BER) curves of the GIA and the approach in [35] merge, as shown in Figure 6. This shows that the GIA performs as well as the approach in [35] numerically but is more efficient. Furthermore, the approach in [35] is only applicable with the per-transmitter power constraint, while the GIA can also deal with the more practical per-antenna power constraint.

8. Performance Benchmark

As in the previous section, the proposed unified approach, the GIA, is applicable to all setups. It is optimal when the number of data streams is equal to the rank of the channel, and it provides diversity gain when the number of data streams is less than the rank of the channel (e.g., in setups 1b, 2b, and 3b). In this section, all results are generated using the GIA for simplicity. The performances of the five different CoMP configurations will be studied. In particular, the impacts of the level of cooperation (Section 8.1), system load (Sections 8.1 and 8.3), system size (Sections 8.2 and 8.3), and severity of the path loss (Section 8.3) on the performance are analyzed and used to come up with some guidelines for configuration selection (Section 8.4). All of the MSE and BER results are obtained by averaging over 20 channel realizations. These results are consistent with those obtained by averaging over more channel realizations.

8.1. Impact of the Level of Cooperation and System Load

To understand the impact of different levels of cooperation on the performance of MTMR MIMO systems, we compare the performance of the five configurations. Case A consists of setups 1a (JP-UL), 2a (JP-DL), 3a (JP-SU), 4a (Non-CoMP), and 4a (CBF), and Case B consists of setups 1b (JP-UL), 2b (JP-DL), 3b (JP-SU), 4b (Non-CoMP), and 4b (CBF). For all of the setups in Cases A and B, the total number of transmit (receive) antennas are the same, the power constraints are the same, and the distances are the same (𝑑𝑙𝑛=1 for 𝑙,𝑛=1,2). (Note that this choice of 𝑑𝑙𝑛 makes 𝛿=1. It also makes all of the users be at the cell edge). The difference between the two cases lies in the number of data streams transmitted; all setups in Case A have four data streams transmitted in total (i.e., fully loaded systems) while all setups in Case B have two data streams transmitted in total (i.e., partially loaded systems). Figures 7(a) and 7(b) show the MSE and BER results, respectively.

Before comparing the results of Case A and Case B, let us compare the individual setups within each case first. Firstly, observe that, in both cases, the performance order of the configurations is exactly the same as the level of cooperation order. The performance improves as the level of cooperation increases. Note that, the MSE and BER performance order agrees with that of the ergodic sum rate in [22, 23]. Secondly, note that in both cases, the per-transmitter power constraint in CBF does not usually meet with equality for every pair. However, it always does for the Non-CoMP one. The reason is quite interesting. In Non-CoMP, each pair designs its precoder and decoder to minimize its own MSE. Thus, there is no reason for any of the pairs to limit their transmit power. However, in CBF, all the pairs jointly design their precoders and decoders to minimize the system-wide MSE. Thus, it may not be always beneficial for all transmitters to transmit on full power since the mutual interference may be large. Thirdly, note that both the per-transmitter and per-antenna power constraints usually meet with equality for the three JP configurations.

With that done, let us now compare the results of Cases A and B. The first observation is that limiting the numbers of data streams is crucial for the performance. The second observation is that, in Case B, the MSE performances of CBF and the higher level of cooperation configurations (JP-UL, JP-DL, and JP-SU) are actually similar at high transmit power. The last observation, somewhat related to the first, is that the performances of Non-CoMP and CBF are much more dependent on the number of data streams than JP-UL, JP-DL, and JP-SU. Comments similar to this last observation are made in [22, 23] for the ergodic sum rate results of JP-DL and CBF with multiple receivers per cell.

The difference in the BERs of Non-CoMP and CBF between the two cases is remarkable and can be explained as follows. Using (2) and (3d), we have
$$\hat{\mathbf{s}}_{cc}=\mathbf{G}_{cc}\mathbf{H}_{cc}\mathbf{F}_{cc}\mathbf{s}_{cc}+\mathbf{G}_{cc}\mathbf{H}_{ck}\mathbf{F}_{kk}\mathbf{s}_{kk}+\mathbf{G}_{cc}\mathbf{a}_c,\quad c,k\in\{1,2\},\ c\neq k,\tag{53}$$
where $\hat{\mathbf{s}}_{cc}$ is the soft output data at the 𝑐th eq-receiver. As can be easily seen, 𝐆𝑐𝑐𝐇𝑐𝑐𝐅𝑐𝑐𝐬𝑐𝑐 is the desired term, 𝐆𝑐𝑐𝐇𝑐𝑘𝐅𝑘𝑘𝐬𝑘𝑘 is the interference term, and 𝐆𝑐𝑐𝐚𝑐 is the noise term. Since each of the channels is 2×2 and will be of full rank with probability 1, their nonsingularity will be assumed throughout this explanation.

In Case A, the 𝑐th receiver, 𝑐=1,2, needs 𝐆𝑐𝑐𝐇𝑐𝑐𝐅𝑐𝑐 (the effective channel from input data to output data) to be of full rank in order to successfully receive its two data streams. But, if 𝐆𝑐𝑐𝐇𝑐𝑐𝐅𝑐𝑐 is of full rank for both receivers (i.e., for 𝑐=1,2), then 𝐆𝑐𝑐𝐇𝑐𝑘𝐅𝑘𝑘, 𝑐,𝑘=1,2, 𝑘≠𝑐, are of full rank as well. Thus, the interference and desired signals cannot be separated. If the interference is significant, as is likely at the cell edge, the performance will suffer greatly. On the other hand, it is possible in Case B for both pairs to successfully receive each of their data streams and null out the interference. This is because rank(𝐇𝑐𝑐𝐅𝑐𝑐)=rank(𝐇𝑐𝑘𝐅𝑘𝑘)=1 and therefore span(𝐇𝑐𝑐𝐅𝑐𝑐) is not necessarily equal to span(𝐇𝑐𝑘𝐅𝑘𝑘), 𝑐,𝑘=1,2, 𝑘≠𝑐. In CBF, the precoders can be chosen to steer 𝐇𝑐𝑘𝐅𝑘𝑘, 𝑘≠𝑐, away from 𝐇𝑐𝑐𝐅𝑐𝑐 and the decoders can be chosen to sufficiently null out 𝐇𝑐𝑘𝐅𝑘𝑘, 𝑘≠𝑐. In Non-CoMP, the 𝑐th pair does not know 𝐇𝑐𝑘𝐅𝑘𝑘, 𝑘≠𝑐, but it knows the estimated noise plus interference covariance matrix Φ𝐧𝑐 (see Appendix A). It can therefore design 𝐅𝑐𝑐 and 𝐆𝑐𝑐 based on its knowledge of Φ𝐧𝑐. As can be seen, the performance of Non-CoMP is quite good under the per-transmitter power constraint; it is poor, though, under the more stringent per-antenna power constraint.
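The Case B argument can be made concrete with two receive antennas and one stream per pair: the signal and interference projections are then single vectors, and a decoder orthogonal to the interference vector removes it while keeping the desired stream whenever the two vectors are not parallel. The following is a minimal sketch that swaps in a pure nulling (zero-forcing) decoder for simplicity. It is not the MMSE decoder discussed in the text, which also weighs the noise. The vectors `h_sig` and `h_int` are made-up stand-ins for the signal and interference projections seen at one receiver.

```python
def null_decoder(h_int):
    """Row vector g with g · h_int = 0 for a 2-element interference direction."""
    return (h_int[1], -h_int[0])


def apply(g, h):
    """Scalar effective channel g · h seen after decoding."""
    return g[0] * h[0] + g[1] * h[1]


# Hypothetical rank-one projections at receiver c (complex baseband values).
h_sig = (1.0 + 0.5j, -0.3 + 0.2j)   # stand-in for the desired-signal projection
h_int = (0.4 - 0.1j, 0.9 + 0.7j)    # stand-in for the interference projection

g = null_decoder(h_int)
residual_interference = abs(apply(g, h_int))
desired_gain = abs(apply(g, h_sig))
print(residual_interference, desired_gain)
```

In Case A each projection is a full-rank 2×2 matrix, so no nonzero 2-element decoder row can be orthogonal to every interference column, which is the separation problem described above.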

8.2. Impact of System Size (the Number of Transmitter Receiver Pairs)

To gain some understanding on what happens when the number of transmitter receiver pairs increases, we consider five different setups: 4b (Non-CoMP), 5a (Non-CoMP), 4b (CBF), 5a (CBF), and 5b (CBF) in Table 1. For convenience, we choose 𝑑𝑙𝑛=1 for 𝑙,𝑛=1,2 (cell edge scenario). Figure 8 shows the resulting MSEs and BERs. Note that the maximum antenna power is 𝑃 in all of the setups. The normalized MSE shown in Figure 8 is defined to be the average MSE per data stream.

Firstly, we compare the results of the CBF setups 4b, 5a, and 5b to see the performance degradation when more transmitter receiver pairs join the wireless environment. Consider setup 4b (CBF) as a baseline system. We observe that setups 5a (CBF) and 5b (CBF) have losses of 2–5 dB and 7–14 dB, respectively, in the normalized MSE results. In addition, the BER results of setups 5a (CBF) and 5b (CBF) have smaller diversity gains (absolute values of the slopes) than setup 4b (CBF). However, more data streams are transmitted in setups 5a and 5b.

How does CBF handle the 𝐶=𝐾=3 (setup 5a) and 𝐶=𝐾=4 (setup 5b) systems when each node has only 2 antennas? Does it perform IA, that is, do its precoders and decoders satisfy rank(𝐆𝑐𝑐𝐇𝑐𝑐𝐅𝑐𝑐)=𝑚𝑐𝑐 and 𝐆𝑐𝑐𝐇𝑐𝑘𝐅𝑘𝑘=𝟎, 𝑐,𝑘=1,2,…,𝐶, 𝑘≠𝑐 [9–12, 38]? MMSE designs are more general than IA because IA is not always feasible and does not take into account arbitrary Φ𝐧𝑐. Even so, the MMSE design is seen, at times, to exhibit IA-like features, that is, the interference projections, 𝐇𝑐𝑘𝐅𝑘𝑘 for all 𝑘≠𝑐, are steered by the MMSE design such that they lie predominantly in a subspace not containing the signal projection, 𝐇𝑐𝑐𝐅𝑐𝑐. As is to be expected, the MMSE decoders take into account both the noise and the interference, rather than always nulling out the interference as the IA conditions would dictate. In addition, better IA is generally achieved at higher transmit SNRs due to the reduction in the significance of the noise. Furthermore, it is seen that our MMSE design supports more transmitter receiver pairs than the upper bound for IA designs in [38].

Secondly, we compare Non-CoMP and CBF to see how important joint system-wide transceiver design is to systems with more than 2 transmitter-receiver pairs. BER-wise, it can be seen that, under the per-transmitter power constraint, the best curve for Non-CoMP (the setup 4b (Non-CoMP) one) only has a 1 dB gain over the worst of CBF curves. Actually, only 2 transmitter receiver pairs are communicating in setup 4b (Non-CoMP) as opposed to the 4 transmitter receiver pairs in setup 5b (CBF). When under the per-antenna constraint, all of the CBF BER curves are better than the best Non-CoMP one. Furthermore, the performance for setup 5a (Non-CoMP) is terrible. Thus, it is clear that joint system-wide transceiver design can greatly help systems with multiple transmitter receiver pairs by mitigating multiple intercell interferences.

8.3. Impact of the Path Loss

Firstly, using Cases A and B (as defined in Section 8.1), the system performance of all five CoMP configurations under different path losses and system loads is studied. As such, 𝑑𝑙𝑛, 𝑙≠𝑛, varies between 1 and 4 (𝑑𝑙𝑙=1, 𝑙=1,2, as always). Figures 9(a) and 9(b) show, respectively, the MSE and BER results against 𝑑𝑙𝑛, 𝑙≠𝑛, for 10log10𝑃=5 dB.

In both Cases A and B, as 𝑑𝑙𝑛, 𝑙≠𝑛, (and thus 𝛿) gets larger, the performances of both Non-CoMP and CBF improve while the performances of JP-UL, JP-DL, and JP-SU worsen. This is because 𝑑𝑙𝑛, 𝑙≠𝑛, corresponds to interference channels (channels which do not carry desired data) in Non-CoMP and CBF and to desired channels (channels which can carry desired data) in JP-UL, JP-DL, and JP-SU. As 𝑑𝑙𝑛, 𝑙≠𝑛, (and thus 𝛿) increases, the path losses of the interference channels increase for Non-CoMP and CBF and the path losses of some of the desired channels increase for JP-UL, JP-DL, and JP-SU. Actually, the MSE performances of the five configurations eventually merge when 𝑑𝑙𝑛, 𝑙≠𝑛, (and thus 𝛿) is large. This is because the system essentially ends up consisting of two independent and interference-free transmitter-receiver pairs when 𝑑𝑙𝑛, 𝑙≠𝑛, (and thus 𝛿) is large enough. Notably, this merging of performances can already be seen when 𝑑𝑙𝑛=3, 𝑙≠𝑛, in Case A and when 𝑑𝑙𝑛=2, 𝑙≠𝑛, in Case B. It is also notable (but to be expected) that this merging phenomenon of JP-DL and CBF is also seen with ergodic sum rates in [22, 23].

Secondly, using the five setups (4b (Non-CoMP), 5a (Non-CoMP), 4b (CBF), 5a (CBF), and 5b (CBF)) employed in Section 8.2, further path loss studies are conducted for Non-CoMP and CBF with respect to different system sizes. With 𝑑𝑙𝑙=1 for all 𝑙 and 10log10𝑃=5 dB, Figure 10 shows the MSE and BER results against 𝑑𝑙𝑛, 𝑙≠𝑛. As 𝑑𝑙𝑛, 𝑙≠𝑛, (and thus 𝛿) gets larger, it is clearly seen that the performances of the setups improve and merge together. This behavior arises because 𝑑𝑙𝑛, 𝑙≠𝑛, corresponds to the interference channels for both Non-CoMP and CBF. As 𝑑𝑙𝑛, 𝑙≠𝑛, increases, both the inter-pair interference and the importance of joint design across the pairs decrease.

8.4. Guidelines for Configuration Selection

The purpose of this sub-section is to gain some understanding of when each configuration should be used. This understanding also helps to determine CSI feedback and data sharing requirements, since different CoMP configurations require different levels of CSI feedback and data sharing. For example, if, based on the BER performance, only Non-CoMP is needed, a downlink user only needs to feed back the desired channel and the inter-cluster interference covariance matrix but not the intercell channels.

To this end, consider the following example: there are two transmitters and two receivers (i.e., 𝑇=𝑅=2). The MMSE design of their precoders and decoders is subject to the per-transmitter power constraint with 10log10𝑃=5 dB. If the desired BER threshold is 3×10⁻², when should JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF be used?

Looking at Figures 9(a) and 9(b), it is surprising but clear that, for Case B (partially loaded systems), Non-CoMP should always be used, even at the cell edge. (Note though that, for the per-antenna power constraint, the performance of Non-CoMP is marginally acceptable at the cell edge.) Non-CoMP is good enough; the other configurations with their greater network overheads (e.g., information exchange and synchronization) are not needed. For Case A (fully loaded systems), on the other hand, which configuration should be used depends on 𝑑𝑙𝑛 (and thus 𝛿). For small enough 𝑑𝑙𝑛, 𝑙≠𝑛 (and thus small enough 𝛿), that is, for a cell edge type scenario, either JP-UL or JP-DL should be used. The interference is too much for Non-CoMP and CBF. However, for larger 𝑑𝑙𝑛, 𝑙≠𝑛, Non-CoMP should be used. With respect to JP-SU, it is notable that, in both Cases A and B, it has no significant performance advantage over JP-UL and JP-DL and is not needed here.

Looking at Figure 10, it is clear that CBF should be used when there are a few transmitter receiver pairs, all at the cell edge, that want to have 1 data stream each. In that case, CBF’s interference management capabilities enable it to satisfy the BER threshold when Non-CoMP cannot. It is also clear that, for any number of transmitter receiver pairs, there will be a 𝑑 such that, when 𝑑𝑙𝑛>𝑑, 𝑙≠𝑛, Non-CoMP is good enough and should be employed.

9. Conclusion

For developing a practical CoMP technology in future cellular systems, there are two crucial needs: a performance benchmark and a unified approach for different CoMP configurations. For the need of a performance benchmark, joint MMSE transceiver designs of various CoMP configurations are considered. The joint MMSE design is nearly optimum in maximizing sum rate. The MSE and BER performances of five CoMP systems (JP-SU, JP-DL, JP-UL, CBF, Non-CoMP) under various levels of cooperation, system loads, system sizes, and path losses are investigated thoroughly. Guidelines for CoMP configuration selection are then established. For the need of a unified approach, the GIA is proposed for performing joint MMSE transceiver designs for general MTMR MIMO systems subject to general linear power constraint. In addition, the optimum DCOA for downlink is developed to validate the optimality of the GIA results when applicable. Remarkably, the GIA is shown equivalent to the TCOA when each of them converges and the transmit covariance matrices obtained from them are of full rank. They are also shown equivalent to the DCOA when each of them converges and the decoder covariance matrices obtained from them are of full rank. This means that the GIA gives globally optimum results under the abovementioned special conditions. Convergence properties of the proposed approaches, optimality, and diversity/multiplexing tradeoff of the GIA are verified numerically.

The performance analysis of the five CoMP configurations is conducted using the GIA to provide physical insights and performance benchmark. Firstly, in the cell edge scenario, it is found that the higher the level of cooperation, the better the performance. Actually, JP-UL and JP-DL achieve essentially the same performance as JP-SU. Note that CBF and Non-CoMP considered in this paper give the achievable performance upper bound for the respective category, given same number of total transmit antennas and same number of total receive antennas.

Secondly, in the cell edge scenario, it is found that the performances of Non-CoMP and CBF are much more dependent on the number of data streams than JP-UL, JP-DL, and JP-SU. When the system is fully loaded, both Non-CoMP and CBF suffer severe interference and thus have poor performances. However, for a partially loaded, two transmitter receiver pairs, system, CBF is able to give good performances under both the per-transmitter and per-antenna power constraints. Non-CoMP also gives good performances, but only for the per-transmitter power constraint (the per-antenna power constraint turns out to be too stringent for it). Thirdly, CBF is able to take care of even more than two transmitter receiver pairs because of its superior interference management capabilities (such as its ability to perform IA-like maneuvers). Not only that, it can actually support more pairs than the upper bound for IA designs in [38]. Fourthly, it is found that the per-transmitter power constraint in the CBF configuration does not usually meet with equality for every pair. However, it always does for the Non-CoMP configuration. This phenomenon is due to the following: (a) in Non-CoMP, each pair cares only about its own MSE while, in CBF, each pair cares for the system-wide MSE and (b) increasing the power at a pair will always be good for the MSE of that pair but not necessarily good for the MSE of the entire system. Fifthly, for a given system, as the path loss of the channels corresponding to the interfering links of Non-CoMP and CBF increases, interesting trends are observed; the performances of CBF and Non-CoMP improve greatly whereas the performances of JP-UL, JP-DL, and JP-SU worsen. Actually, the MSE performances of the five configurations eventually merge together.

In addition to producing these findings, these simulations numerically put forth performance benchmarks for the JP, CBF, and Non-CoMP categories—actually, due to JP-SU, performance benchmarks are given for all CoMP configurations. Moreover, due to the use of the MMSE criterion, benchmarks are put forth for the transceiver designs under other criteria as well (such as maximum capacity and minimum BER). These simulations also provide some guidelines for configuration selection.

These performance benchmarks and guidelines are produced under ideal conditions; for example, the synchronization requirements, and so forth of the configurations are not taken into account. The modulation coding scheme (MCS) selection and CSI error are not accounted for either. Even so, they can be used to greatly simplify the complex configuration selection problem under practical conditions; they can help to show which schemes need or do not need to be considered in a particular scenario. Take, for example, the typical two BS-user pair downlink system with the users at the cell edge. In the partially loaded case, it is clear from this paper that Non-CoMP and CBF should be considered first. In the fully loaded case, it is even simpler: it is clear that JP-DL should be considered first. After such large reductions in scope as these, accounting for the various parameters (MCS, limited feedback, etc.) will thus be much more manageable to perform. Furthermore, one can use the guidelines to choose the CSI feedback and data sharing schemes, since different CoMP configurations require different levels of CSI feedback and data sharing. For example, in one of our papers, we demonstrate a practical scheme for decentralized CBF in TDD systems [39].

Appendices

A. Noise Plus Interference Covariance Matrix in Non-CoMP

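Since $E(\mathbf{H}_{W,ic}\mathbf{M}\mathbf{H}_{W,ic}^{\dagger}) = \operatorname{tr}(\mathbf{M})\,\mathbf{I}_{r_i}$ for any deterministic matrix $\mathbf{M}$, the noise plus interference covariance matrix for the $i$th eq-receiver in Non-CoMP can be expressed as

$$\mathbf{\Phi}_{\mathbf{n}_i} = \sum_{l=1,\,l\neq i}^{C} E\left(\mathbf{H}_{il}\mathbf{F}_{ll}\mathbf{\Phi}_{\mathbf{s}_{ll}}\mathbf{F}_{ll}^{\dagger}\mathbf{H}_{il}^{\dagger}\right) + \mathbf{\Phi}_{\mathbf{a}_i} = \sum_{l=1,\,l\neq i}^{C} d_{il}^{-2\beta}\operatorname{tr}\left(\mathbf{F}_{ll}\mathbf{\Phi}_{\mathbf{s}_{ll}}\mathbf{F}_{ll}^{\dagger}\right)\mathbf{I}_{r_i} + \mathbf{\Phi}_{\mathbf{a}_i}. \tag{A.1}$$

If each transmitter transmits with full power, the trace in (A.1) can be replaced by $P_{b_{ll}}$ and the following expression is exact:

$$\mathbf{\Phi}_{\mathbf{n}_i} = \sum_{l=1,\,l\neq i}^{C} d_{il}^{-2\beta} P_{b_{ll}}\,\mathbf{I}_{r_i} + \mathbf{\Phi}_{\mathbf{a}_i}. \tag{A.2}$$

Note that even when there is receive spatial correlation (not considered in (1)), (A.2) still holds. When some transmitters do not transmit with full power, (A.2) is a "worst case" approximation and is still used for the design in this paper.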
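The expectation identity used at the start of this appendix can be checked numerically. The following is a minimal Monte Carlo sketch, assuming that the entries of the random channel matrix are i.i.d. zero-mean, unit-variance, circularly symmetric complex Gaussian; this is an assumption made here for illustration. The function names, the matrix `M`, the dimensions, and the number of trials are hypothetical choices, not values from the paper.

```python
import random

def sample_hw(r, t, rng):
    """Draw an r x t matrix with i.i.d. CN(0, 1) entries (assumed model for H_W)."""
    s = 0.5 ** 0.5  # real and imaginary parts each have variance 1/2
    return [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(t)]
            for _ in range(r)]

def h_m_hh(H, M):
    """Return H M H^dagger for H (r x t) and M (t x t), as nested lists."""
    r, t = len(H), len(M)
    HM = [[sum(H[i][k] * M[k][j] for k in range(t)) for j in range(t)]
          for i in range(r)]
    return [[sum(HM[i][k] * H[j][k].conjugate() for k in range(t))
             for j in range(r)] for i in range(r)]

def monte_carlo_mean(M, r, trials, seed=0):
    """Average H M H^dagger over independent draws of H."""
    rng = random.Random(seed)
    acc = [[0j] * r for _ in range(r)]
    for _ in range(trials):
        X = h_m_hh(sample_hw(r, len(M), rng), M)
        for i in range(r):
            for j in range(r):
                acc[i][j] += X[i][j] / trials
    return acc

# Arbitrary deterministic 3 x 3 matrix M (illustrative values only).
M = [[2.0, 0.5 - 0.5j, 0.0],
     [0.5 + 0.5j, 1.0, 0.25j],
     [0.0, -0.25j, 0.5]]
trace_M = sum(M[k][k] for k in range(3))
est = monte_carlo_mean(M, r=2, trials=20000)
for row in est:
    print([f"{z.real:+.2f}{z.imag:+.2f}j" for z in row])
print("tr(M) =", trace_M)
```

Under the stated assumption, the sample average should approach tr(M) on the diagonal and zero off the diagonal as the number of trials grows, which is the structure that lets the interference term in (A.1) collapse to a scaled identity matrix.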

B. Alternative Approach to the MMSE Transceiver Design of Non-CoMP under the Per-Antenna Power Constraint

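For Non-CoMP with one data stream, this appendix shows a different approach to the MMSE transceiver design problem subject to the per-antenna power constraint. Without loss of generality, consider the $i$th eq-transmitter eq-receiver pair and let $\mathbf{\Phi}_{\mathbf{s}_{ii}} = \sigma_{ii}^{2}\mathbf{I}_{t_i}$ for all $i$. Given the MMSE decoder (18), the reduced MMSE problem can be written as

$$\min_{\mathbf{F}_{ii}} \left(\sigma_{ii}^{2}\,\mathbf{F}_{ii}^{\dagger}\mathbf{H}_{ii}^{\dagger}\mathbf{\Phi}_{\mathbf{n}_i}^{-1}\mathbf{H}_{ii}\mathbf{F}_{ii} + 1\right)^{-1}\sigma_{ii}^{2}, \tag{B.1}$$

or equivalently,

$$\max_{\mathbf{F}_{ii}} \mathbf{F}_{ii}^{\dagger}\mathbf{B}\mathbf{F}_{ii}, \qquad \mathbf{B} = \left[b_{mn}\right] = \mathbf{H}_{ii}^{\dagger}\mathbf{\Phi}_{\mathbf{n}_i}^{-1}\mathbf{H}_{ii}, \tag{B.2}$$

subject to (12). Here, $b_{mn}$ is the $mn$th element of the nonnegative definite Hermitian matrix $\mathbf{B}$. Expressing $\mathbf{F}_{ii}$ in polar form,

$$\mathbf{F}_{ii} = \left[a_1\sqrt{\frac{P_{i1}}{\sigma_{ii}^{2}}}\,e^{j\theta_1}, \;\ldots,\; a_{t_i}\sqrt{\frac{P_{it_i}}{\sigma_{ii}^{2}}}\,e^{j\theta_{t_i}}\right]^{T}, \tag{B.3}$$

the original problem is further reduced to

$$\max_{\substack{0\le\theta_1,\ldots,\theta_{t_i}\le 2\pi \\ 0\le a_1,\ldots,a_{t_i}\le 1}} \gamma, \qquad \gamma = \sum_{n=1}^{t_i}\sum_{m=1}^{t_i} a_n a_m \sqrt{P_{in}P_{im}}\; b_{mn}\, e^{j(\theta_n-\theta_m)}. \tag{B.4}$$

A closed-form solution to (B.4) can be easily obtained when $t_i = 2$. For $t_i > 2$, however, one generally needs to use a solver for nonlinear equations.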

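Let $t_i = 2$ and express $b_{12} = |b_{12}|\,e^{j\angle(b_{12})}$. Then,

$$\gamma = a_1^{2}P_{i1}b_{11} + a_2^{2}P_{i2}b_{22} + 2a_1a_2\sqrt{P_{i1}P_{i2}}\,|b_{12}|\cos\left(\theta_1-\theta_2-\angle b_{12}\right), \qquad b_{11},\, b_{22}\ge 0. \tag{B.5}$$

If $b_{12}\neq 0$, $\gamma$ is maximized if and only if

$$a_1 = a_2 = 1, \qquad \theta_1-\theta_2-\angle b_{12} = 2k\pi, \tag{B.6}$$

for some integer $k$. If $b_{12} = 0$, $\gamma$ is maximized if and only if $a_1 = a_2 = 1$. It is remarkable that, in this case, optimality happens only when the equality in the per-antenna power constraint in (12) is met.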
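The two-antenna closed form can be sanity-checked against a brute-force search over the feasible amplitudes and phases. The sketch below evaluates the objective of (B.4) directly and compares the choice in (B.6), taking k = 0 and fixing the second phase to zero, with a coarse grid search. The matrix `B`, the power limits `P`, the grid resolution, and the function names are hypothetical and chosen only for illustration.

```python
import cmath
import math

def gamma(a, theta, P, B):
    """Objective of (B.4) for amplitudes a, phases theta, powers P, Hermitian B."""
    t = len(a)
    total = 0j
    for n in range(t):
        for m in range(t):
            total += (a[n] * a[m] * math.sqrt(P[n] * P[m]) * B[m][n]
                      * cmath.exp(1j * (theta[n] - theta[m])))
    return total.real  # imaginary part cancels because B is Hermitian

def closed_form_two_antennas(B):
    """(B.6) with k = 0 and theta_2 fixed to 0: full power, phase aligned to b12."""
    theta1 = cmath.phase(B[0][1]) if B[0][1] != 0 else 0.0
    return [1.0, 1.0], [theta1, 0.0]

# Illustrative values (not from the paper): B is Hermitian and positive definite.
B = [[2.0, 0.6 + 0.8j],
     [0.6 - 0.8j, 1.0]]
P = [1.0, 0.5]  # per-antenna power limits P_i1, P_i2

a_opt, theta_opt = closed_form_two_antennas(B)
g_closed = gamma(a_opt, theta_opt, P, B)

# Coarse grid search; only theta_1 - theta_2 matters, so theta_2 is fixed to 0.
g_grid = max(
    gamma([i / 10, k / 10], [2 * math.pi * q / 72, 0.0], P, B)
    for i in range(11) for k in range(11) for q in range(72)
)
print(f"closed form: {g_closed:.4f}, best grid value: {g_grid:.4f}")
```

For this example, the grid search should not exceed the closed-form value, and its best point should sit at full amplitude on both antennas, in line with the observation that the per-antenna constraint is met with equality at the optimum.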

Acknowledgment

Note that different parts of this work have been published in our conference papers [40–47].