ISRN Communications and Networking

Volume 2012 (2012), Article ID 682090, 21 pages

http://dx.doi.org/10.5402/2012/682090

## Joint MMSE Transceiver Designs and Performance Benchmark for CoMP Transmission and Reception

Department of ECE, Polytechnic Institute of NYU, 6 Metrotech Center, Brooklyn, NY 11201, USA

Received 22 February 2012; Accepted 3 April 2012

Academic Editors: J. M. Bahi, R. Dinis, M. I. Hayee, and M. Potkonjak

Copyright © 2012 Jialing Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Coordinated Multipoint (CoMP) transmission and reception has been suggested as a key enabling technology of future cellular systems. To understand different CoMP configurations and to facilitate the configuration selection (and thus determine channel state information (CSI) feedback and data sharing requirements), performance benchmarks are needed to show what performance gains are possible. A unified approach is also needed to enable the cluster of cooperating cells to systematically take care of the transceiver design. To address these needs, the generalized iterative approach (GIA) is proposed as a unified approach for the minimum mean square error (MMSE) transceiver design of general multiple-transmitter multiple-receiver multiple-input-multiple-output (MIMO) systems subject to general linear power constraints. Moreover, the optimum decoder covariance optimization approach is proposed for downlink systems. Their optimality and relationships are established and shown numerically. Five CoMP configurations (Joint Processing-Equivalent Uplink, Joint Processing-Equivalent Downlink, Joint Processing-Equivalent Single User, Noncoordinated Multipoint, and Coordinated Beamforming) are studied and compared numerically. Physical insights, performance benchmarks, and some guidelines for CoMP configuration selection are presented.

#### 1. Introduction

Although cellular systems face many challenges, such as multipath fading, cell-edge interference, and scarce spectrum, there is demand for even better cellular performance than what is achieved today. Meeting this demand calls for revolutionary ideas. Coordinated Multipoint (CoMP) transmission and reception, a type of Network MIMO (multiple-input multiple-output) in Long-Term Evolution-Advanced (LTE-A) [1], is one of those ideas and is a key enabling technology of future cellular systems. Being a MIMO technique, it actually exploits multipath fading. Furthermore, it lowers cell-edge interference by having potentially interfering cells cooperate. Lastly, this lower interference allows better spectrum reuse and, therefore, better use of the scarce spectrum. Since there are various levels of cell cooperation, there are various CoMP configurations [1–4]. The following three categories of configurations are generally considered.

The first category is Noncoordinated Multipoint (Non-CoMP) and does not use CoMP at all. In it, each base station (BS) communicates with its own user(s) and does so without cooperating with the other cells in data sharing or channel state information (CSI) exchange. Each BS either ignores or tries to estimate the intercell interference. It has the lowest level of cooperation.

The second category is Coordinated Beamforming (CBF). (In LTE-A, it is also referred to as Coordinated Scheduling and Coordinated Beamforming (CS/CB).) Here, each BS again only communicates with its own user(s) and there is no data sharing between BSs and no data sharing between users. This time though, the cells do cooperate to minimize the interference they cause to each other through coordination and joint transmitter and/or receiver design. It has the second lowest level of cooperation. Much work has been done for CBF configurations where each cell has one transmitter and receiver pair [5–12] and where each cell has one transmitter and multiple receivers [13–16]. There also are different CSI considerations (e.g., CSI only available at receivers [5–8, 16], full CSI available at a central processing unit [9–14], CSI available only on a per-cell basis [15]) and different design strategies (e.g., centralized [9–14] or distributed [15] designs).

The third category is Joint Processing (JP). Here, the cells fully cooperate; the BSs act as a single equivalent transmitter in downlink (the data is processed and transmitted jointly from the BSs) to form the Joint Processing-Equivalent Downlink (JP-DL) [17–19] and act as a single equivalent receiver in uplink (all received signals are shared and jointly processed) to form the Joint Processing-Equivalent Uplink (JP-UL) [20]. It has been shown that JP-UL [20] and JP-DL [17] bring significant gains in both the cell-average throughput and the cell-edge user throughput. Note that JP-UL and JP-DL have a higher level of cooperation than the previous two categories (Non-CoMP and CBF). When the users act as a single equivalent receiver (resp., transmitter) in downlink (resp., uplink), they form the Joint Processing-Equivalent Single User (JP-SU), which is essentially a point-to-point MIMO system. JP-SU has the highest level of cooperation and is only of theoretical interest.

In addition, a few attempts have been made to jointly consider different categories/configurations. For example, joint precoder and decoder designs (e.g., SINR balancing, user rate balancing, and maximum sum rate) are proposed for Non-CoMP, JP-DL, and CBF, and numerical comparisons of their ergodic sum rates are made in [21–23]. However, to the best of our knowledge, neither comparisons nor configuration selection guidelines for the various CoMP configurations exist in the literature.

As seen from these previous works, the precoder and decoder designs and performance evaluation for CoMP systems can be very complex and diverse. This is because there exist various CoMP configurations, design criteria, and constraints (e.g., the per-antenna power constraint and the per-transmitter power constraint). There also exists a vast number of design approaches associated with each of the design criteria, each of the constraints, and each of the CoMP configurations. Moreover, CoMP was not considered mature and was not adopted by 3GPP in LTE Release 10 [24]. Thus, performance benchmarks (which show what performance gains are possible) for CoMP configurations are needed to help determine rules for configuration selection. Since different CoMP configurations require different levels of CSI feedback and data sharing, these rules also help to determine CSI feedback and data sharing requirements. There is also a need for a unified approach that enables the cluster of cooperating cells to systematically take care of the transceiver design of whatever configuration they choose to implement. Both needs are addressed in this paper.

To address the need for performance benchmarks, we consider joint MMSE precoder and decoder designs for JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF. First, joint MMSE designs can serve as performance benchmarks for other practical design criteria, because an MMSE solution is near optimum in some other senses as well (e.g., maximum sum rate [25, 26] and minimum BER [27]). It has been shown that maximizing the sum rate is equivalent to minimizing the *geometric mean* of the MSEs of all data streams [25]. Moreover, since the arithmetic mean of nonnegative numbers is never smaller than their geometric mean, minimizing the sum MSE is equivalent to minimizing an upper bound on the geometric mean of the MSEs. Thus, the MMSE results are nearly optimum in the maximum sum rate sense. Regarding BER, it has been shown that the MMSE design minimizes the lower bound of BER [27]. In addition, the BER results of the MMSE and minimum BER designs in [26] are very comparable. So the MMSE results are nearly optimum in the minimum BER sense as well. Though the studies in [25–27] are for single-user systems, these remarks also hold for CoMP systems. Second, note that with full CSI, JP-SU provides a performance upper bound for all CoMP configurations with the same total number of transmit antennas and the same total number of receive antennas, as shown in Figure 1. Similarly, Non-CoMP and CBF, where each cell has one transmitter and receiver pair, provide performance upper bounds for their respective categories, given the same total numbers of transmit and receive antennas. Thus, the performance benchmarks can be set forth numerically for various simulation setups; these numerical benchmarks can then be used to compare the different configurations and/or categories.

Although not much MMSE work has been published for the CoMP configurations, joint MMSE transceiver designs for single-user, multiuser downlink, multiuser uplink, and CBF MIMO systems have been studied. For example, for single-user MIMO systems, closed-form expressions of the MMSE design have been derived for the total power constraint [25, 26] and for the shaping constraints [28]. For uplink MIMO systems subject to the per-user power constraint, numerical solutions are provided mainly by the optimal *transmit covariance optimization approach* (*TCOA*) [29, 30] and by suboptimal iterative approaches such as in [29]. For downlink systems, numerical solutions are provided mainly by iterative approaches such as in [31] for the total power constraint and in [18] for the per-antenna and per-cell power constraints. Dual uplink approaches [32–34] have also been employed for the total power constraint. Recently, for K-user MIMO interference channels (a case of CBF), a joint MMSE design subject to the per-transmitter power constraint, using a linear search for each Lagrange multiplier, has been proposed [35].

Note that various CoMP configurations can be considered as special cases of general multiple-transmitter multiple-receiver (MTMR) systems. In this paper, the novel *generalized iterative approach* (*GIA*) is proposed as the unified approach to the MMSE design of general MTMR MIMO systems subject to general linear power constraints, including the per-transmitter power constraint and the more practical per-antenna power constraint. The *GIA* can provide a tradeoff between multiplexing and diversity gains. In addition, the optimum *decoder covariance optimization approach* (*DCOA*) for the MMSE design of downlink systems (i.e., JP-SU, JP-DL, and Non-CoMP) subject to general linear power constraints is also proposed so that the optimality of the *GIA* can be studied. For this purpose, the equivalence between the *GIA* and the optimum *TCOA* [29, 30] for the uplink, or the *DCOA* for the downlink, is established in the respective configurations.

In the numerical simulations, aspects pertaining to the proposed *approaches* are investigated first: the convergence properties of the proposed approaches are examined, the optimality and diversity/multiplexing tradeoff of the *GIA* are verified numerically, and the *GIA* is compared numerically with the approach in [35]. Second, aspects pertaining to performance benchmarks are investigated. To set forth a benchmark among different CoMP configurations, the MSE and BER performances of the five CoMP configurations (JP-SU, JP-DL, JP-UL, CBF, and Non-CoMP) are compared. Since this paper is concerned with performance benchmarks (achievable theoretical upper bounds), fairness-type criteria and practical issues such as the synchronization required by different CoMP configurations are not considered here. Various important factors (level of cooperation, system load, system size, and path loss) are, however, studied. The performance benchmarks and the resulting physical insights (into the mechanisms and performances of CoMP configurations) are very useful. In particular, much-needed guidelines for the configuration selection process are obtained.

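Notation is as follows. All boldface letters indicate vectors (lower case) or matrices (upper case). $\mathbf A^T$, $\mathbf A^H$, $\mathbf A^{-1}$, $\mathrm{tr}(\mathbf A)$, $E[\mathbf A]$, $\mathrm{rank}(\mathbf A)$, and $\|\mathbf A\|_F$ stand for the transpose, conjugate transpose, inverse, trace, expectation, rank, and Frobenius norm of $\mathbf A$, respectively. $|\mathbf A|$ denotes taking the absolute value of $\mathbf A$ element-wise. $\mathrm{span}(\mathbf A)$ represents the subspace spanned by the columns of $\mathbf A$. $[\mathbf A]_{m,m}$ denotes the $m$th diagonal element of $\mathbf A$. Matrix $\mathbf I_n$ signifies an identity matrix with rank $n$. Matrix $\mathbf 0$ signifies a zero matrix with proper dimension. $\mathrm{diag}(a_1,\dots,a_n)$ denotes the diagonal matrix with elements $a_1,\dots,a_n$ on the main diagonal. $\mathbf A\succ\mathbf 0$ ($\mathbf A\succeq\mathbf 0$) means that $\mathbf A$ is positive definite (semidefinite). $\mathbf A\circ\mathbf B$ denotes the Schur product of $\mathbf A$ and $\mathbf B$ (element-wise product of $\mathbf A$ and $\mathbf B$). $\mathcal{CN}(\mu,\sigma^2)$ denotes a complex normal random variable with mean $\mu$ and variance $\sigma^2$. Finally, i.i.d. stands for independent and identically distributed.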

#### 2. Formulation

##### 2.1. A Single Formulation for General MTMR MIMO Systems

In this subsection, we derive a single formulation to describe a general MTMR MIMO system, including the five CoMP configurations (JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF) investigated in this paper. Consider an MTMR MIMO system with $K$ transmitters and $L$ receivers. Let $M_k$ and $N_l$ denote the numbers of antennas at the $k$th transmitter and the $l$th receiver, respectively. Accounting for the path loss (spatial correlation can easily be incorporated as well but is omitted for simplicity), the channel from the $k$th transmitter to the $l$th receiver is modeled as

$$\mathbf H_{lk} = r_{lk}^{-\gamma/2}\,\mathbf H_{w,lk}\in\mathbb C^{N_l\times M_k}. \tag{1}$$

Here, $r_{lk}$ denotes the distance between the $l$th receiver and the $k$th transmitter, and $\gamma$ is the path loss exponent. The entries of $\mathbf H_{w,lk}$ are i.i.d. $\mathcal{CN}(0,1)$; the subscript $w$ indicates that $\mathbf H_{w,lk}$ is spatially white.
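The channel model can be sampled directly. The sketch below assumes NumPy and the amplitude convention $r^{-\gamma/2}$ (a received-power loss of $r^{-\gamma}$); the function name `path_loss_channel` and its parameters are illustrative, not from the paper.

```python
import numpy as np

def path_loss_channel(n_rx, n_tx, dist, gamma, rng):
    """Return H = dist**(-gamma / 2) * H_w, where H_w has i.i.d. CN(0, 1) entries.

    The amplitude factor dist**(-gamma / 2) is an assumed convention that
    corresponds to a received-power attenuation of dist**(-gamma).
    """
    # CN(0, 1): independent real and imaginary parts, each with variance 1/2.
    h_w = (rng.standard_normal((n_rx, n_tx))
           + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
    return dist ** (-gamma / 2) * h_w

# Example: a 2-antenna receiver and a 4-antenna transmitter at distance 0.5.
H = path_loss_channel(2, 4, dist=0.5, gamma=3.5, rng=np.random.default_rng(1))
print(H.shape)  # (2, 4)
```

Spatial correlation, which the text omits, could be added by pre- and post-multiplying `h_w` by square roots of receive and transmit correlation matrices.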

Some of the transmitters (resp., receivers) in the CoMP system may be sharing and jointly processing their data (resp., received signals). Such a collection of transmitters (resp., receivers), which are connected via backhaul, share CSI and data, and act like a single transmitter (resp., receiver) in transmission and data processing, is a *composite* transmitter (resp., receiver) and thus an *equivalent* transmitter (resp., receiver). For the sake of having a single formulation, a transmitter (resp., receiver) which does not collaborate with other transmitters (resp., receivers) in the above way is also considered to be an *equivalent* transmitter (resp., receiver). Thus, this MTMR MIMO system can also be (and will be) considered as having $\bar K$ *equivalent* transmitters (*eq*-transmitters for short) and $\bar L$ *equivalent* receivers (*eq*-receivers for short). Obviously, $\bar K\le K$ and $\bar L\le L$.

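Let $\bar M_k$ and $\bar N_l$ denote the numbers of antennas at the $k$th *eq*-transmitter and the $l$th *eq*-receiver, respectively. Then, $M=\sum_{k=1}^{\bar K}\bar M_k$ and $N=\sum_{l=1}^{\bar L}\bar N_l$ are the total numbers of transmit and receive antennas, respectively. Also let $\bar{\mathbf H}_{lk}$ denote the composite channel matrix from the $k$th *eq*-transmitter to the $l$th *eq*-receiver. At the $k$th *eq*-transmitter, let $\mathbf x_{lk}$, $d_{lk}$, and $\mathbf F_{lk}\in\mathbb C^{\bar M_k\times d_{lk}}$ denote the data, number of data streams, and precoder for the $l$th *eq*-receiver, respectively. Furthermore, let $\mathbf S_{lk}=E[\mathbf x_{lk}\mathbf x_{lk}^H]$ and $\mathbf G_{lk}\in\mathbb C^{d_{lk}\times\bar N_l}$ be, respectively, the source covariance matrix for $\mathbf x_{lk}$ and the decoder for $\mathbf x_{lk}$. Which transmitter transmits to which receiver is configurable. When the $k$th *eq*-transmitter has no data to transmit to the $l$th *eq*-receiver, $\mathbf x_{lk}$ is empty, $d_{lk}=0$, $\mathbf S_{lk}=\mathbf 0$, $\mathbf F_{lk}=\mathbf 0$, and $\mathbf G_{lk}=\mathbf 0$. When it does, $\mathbf S_{lk}$ is positive definite, and $\mathbf F_{lk}$ and $\mathbf G_{lk}$ must be designed.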

In this system, there may be multiple clusters, where each cluster jointly designs the MIMO processors for its own *eq*-transmitters and *eq*-receivers but does so independently of the other clusters. There is no CSI sharing between clusters, and the intercluster interference is treated as noise. Let $\mathcal K$ and $\mathcal L$ define one such cluster, $\mathcal K$ being the set of *eq*-transmitter indices in the cluster and $\mathcal L$ being the set of *eq*-receiver indices in the cluster. $\mathcal K$ and $\mathcal L$ are introduced to allow a single formulation to take care of the MMSE transceiver design for different CoMP configurations. At the $l$th *eq*-receiver, $l\in\mathcal L$, the received signal is thus

$$\mathbf y_l = \sum_{k\in\mathcal K}\bar{\mathbf H}_{lk}\sum_{j\in\mathcal L}\mathbf F_{jk}\mathbf x_{jk} + \mathbf n_l, \tag{2}$$

$$\mathbf n_l = \mathbf w_l + \mathbf i_l,\qquad \mathbf i_l = \sum_{k\notin\mathcal K}\bar{\mathbf H}_{lk}\sum_{j=1}^{\bar L}\mathbf F_{jk}\mathbf x_{jk}. \tag{3}$$

Here, $\mathbf n_l$, $\mathbf w_l$, and $\mathbf i_l$ are the noise-plus-intercluster-interference vector, the noise vector, and the intercluster interference vector, respectively, at the $l$th *eq*-receiver. The interference comes from all of the *eq*-transmitters which do not belong to $\mathcal K$. Thus, when there is only one cluster in the system, there is no intercluster interference, and $\mathbf i_l=\mathbf 0$ and $\mathbf n_l=\mathbf w_l$ for every $l$. Note that, except in Non-CoMP, the possible intercell interference is implicitly included in the first term in (2) and is considered to be manageable.

##### 2.2. Five CoMP Configurations

The CSI feedback and data sharing needed in each CoMP configuration are assumed to take place over ideal links with zero delay. The above single formulation can describe any general MTMR MIMO system, including JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF. There is only one cluster in JP-UL, JP-DL, JP-SU, and CBF, but there are $K$ clusters in Non-CoMP. Without loss of generality and for convenience, Non-CoMP and CBF considered in this paper have only one transmitter-receiver pair per cell.

###### 2.2.1. Configuration I: JP-UL

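In JP-UL, the system has only one cluster and is just an equivalent uplink MIMO system, that is, there are multiple transmitters (each being an *eq*-transmitter) but only one *eq*-receiver (full cooperation among all receivers). Thus,

$$\bar K=K,\quad \bar L=1,\quad \mathcal K=\{1,\dots,K\},\quad \mathcal L=\{1\},\quad \bar{\mathbf H}_{1k}=\begin{bmatrix}\mathbf H_{1k}\\ \vdots\\ \mathbf H_{Lk}\end{bmatrix},\quad k=1,\dots,K. \tag{4}$$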

For both FDD and TDD systems, each BS estimates all uplink CSI and sends the CSI to a central processing unit via the backhaul (if the BSs are colocated, the backhaul is not needed). The central processing unit performs the system-wide transceiver design and sends each user its optimized precoder through the serving BS. Each user uses the received precoder for transmitting data. Lastly, the BSs share their received signals with the central processing unit for joint decoding.

###### 2.2.2. Configuration II: JP-DL

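In JP-DL, the system has only one cluster and is just an equivalent downlink MIMO system, that is, there are multiple receivers (each being an *eq*-receiver) but only one *eq*-transmitter (full cooperation among all transmitters). Thus,

$$\bar K=1,\quad \bar L=L,\quad \mathcal K=\{1\},\quad \mathcal L=\{1,\dots,L\},\quad \bar{\mathbf H}_{l1}=\begin{bmatrix}\mathbf H_{l1} & \cdots & \mathbf H_{lK}\end{bmatrix},\quad l=1,\dots,L. \tag{5}$$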

In TDD systems, the BSs estimate downlink CSI through reciprocity. In FDD systems, each user estimates all intracluster downlink CSI and feeds back the CSI to its serving BS. After obtaining the CSI, each BS sends the CSI to a central processing unit via the backhaul (if the BSs are co-located, the backhaul is not needed). The central processing unit performs the system-wide transceiver design and sends the optimized precoders and decoders to the BSs. Each BS uses the optimized precoder for transmitting data. Each BS also sends the decoder to its users for processing the received data.

###### 2.2.3. Configuration III: JP-SU

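In JP-SU, essentially a point-to-point MIMO system, there is only one *eq*-transmitter (full cooperation among all transmitters) and only one *eq*-receiver (full cooperation among all receivers). It is only of theoretical interest (showing the performance upper bound for all CoMP systems), and the signaling issues are irrelevant and omitted. It is assumed that a central processing unit knows all the channels and performs the system-wide transceiver design. Thus,

$$\bar K=\bar L=1,\quad \mathcal K=\mathcal L=\{1\},\quad \bar{\mathbf H}_{11}=\begin{bmatrix}\mathbf H_{11} & \cdots & \mathbf H_{1K}\\ \vdots & \ddots & \vdots\\ \mathbf H_{L1} & \cdots & \mathbf H_{LK}\end{bmatrix}. \tag{6}$$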

###### 2.2.4. Configuration IV: Non-CoMP

In Non-CoMP, each transmitter (being an *eq*-transmitter) is paired with a unique receiver (being an *eq*-receiver). Each pair is a cluster of the system, so the intercell interference is the intercluster interference. Thus, pairwise transceiver design is performed, and the system with $K$ *eq*-transmitter/*eq*-receiver pairs ($\bar K=\bar L=K=L$) is decoupled into $K$ single-user clusters, with the $k$th one being

$$\mathcal K_k=\mathcal L_k=\{k\},\qquad \bar{\mathbf H}_{lk}=\mathbf H_{lk}. \tag{7}$$

In TDD systems, each transmitter estimates the forward link CSI through reciprocity. The transmitter performs the joint transceiver design and sends the decoder to the receiver. In FDD systems, each receiver estimates the forward link CSI and sends the estimated information to the transmitter. Both transmitter and receiver can independently perform the joint transceiver design. The transmitter will use the resulting precoder to transmit data and the receiver will use the decoder to process the received data.

###### 2.2.5. Configuration V: CBF

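Like Non-CoMP, there are multiple pairs of transmitters and receivers in CBF. However, unlike Non-CoMP, there is only one cluster here. Note that in CBF, $\mathbf S_{lk}=\mathbf 0$ for $l\neq k$, and the BSs do not share data. The CSI acquisition and signaling requirements in uplink (resp., downlink) for a central processing unit are the same as in JP-UL (resp., JP-DL). The central processing unit performs the system-wide transceiver design. Thus,

$$\bar K=\bar L=K=L,\quad \mathcal K=\mathcal L=\{1,\dots,K\},\quad \bar{\mathbf H}_{lk}=\mathbf H_{lk},\quad \mathbf S_{lk}=\mathbf 0\ \text{for}\ l\neq k. \tag{8}$$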
Note that, for the composite channel matrix $\bar{\mathbf H}_{lk}$ in (4)–(8), the subscript $l$ is the *eq*-receiver index and the subscript $k$ is the *eq*-transmitter index. However, for the channel matrix $\mathbf H_{lk}$, the subscript $l$ is the receiver index and the subscript $k$ is the transmitter index.
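The composite channels of the five configurations are just different groupings of the physical channel blocks. The sketch below assumes NumPy, real-valued toy channels for brevity, and a hypothetical helper `composite_channels` that returns a nested list indexed [eq-receiver][eq-transmitter].

```python
import numpy as np

def composite_channels(H, config):
    """Build composite channels Hbar[l][k] from physical channels H[l][k].

    H[l][k] is the N_l x M_k channel from transmitter k to receiver l.
    The result is a nested list indexed [eq-receiver][eq-transmitter].
    """
    L, K = len(H), len(H[0])
    if config == "JP-SU":   # one eq-transmitter and one eq-receiver: the full block matrix
        return [[np.block(H)]]
    if config == "JP-UL":   # receivers pooled: stack the channels of each transmitter
        return [[np.vstack([H[l][k] for l in range(L)]) for k in range(K)]]
    if config == "JP-DL":   # transmitters pooled: concatenate the channels of each receiver
        return [[np.hstack([H[l][k] for k in range(K)])] for l in range(L)]
    if config in ("CBF", "Non-CoMP"):  # eq-nodes coincide with the physical nodes
        return [[H[l][k] for k in range(K)] for l in range(L)]
    raise ValueError(f"unknown configuration: {config}")

# Two transmitters (4 and 1 antennas) and two receivers (2 and 3 antennas).
rng = np.random.default_rng(0)
N, M = [2, 3], [4, 1]
Hphys = [[rng.standard_normal((N[l], M[k])) for k in range(2)] for l in range(2)]
su = composite_channels(Hphys, "JP-SU")
ul = composite_channels(Hphys, "JP-UL")
dl = composite_channels(Hphys, "JP-DL")
cbf = composite_channels(Hphys, "CBF")
print(su[0][0].shape)  # (5, 5)
```

Non-CoMP and CBF share the same composite channels here; what distinguishes them is the clustering ($K$ clusters versus one) and hence whether the cross-link terms are treated as intercluster interference or designed jointly.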

##### 2.3. MMSE Design Subject to General Linear Power Constraints

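For a given cluster, define the MSE with respect to the $l$th *eq*-receiver and the $k$th *eq*-transmitter, $l\in\mathcal L$, $k\in\mathcal K$, as

$$\epsilon_{lk}=\mathrm{tr}(\mathbf E_{lk}),\qquad \mathbf E_{lk}=E\big[(\mathbf G_{lk}\mathbf y_l-\mathbf x_{lk})(\mathbf G_{lk}\mathbf y_l-\mathbf x_{lk})^H\big]. \tag{9}$$

Note that when the $k$th *eq*-transmitter has no data for the $l$th *eq*-receiver, $\epsilon_{lk}=0$. The sum MSE is

$$\eta=\sum_{l\in\mathcal L}\sum_{k\in\mathcal K}\epsilon_{lk}. \tag{10}$$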

###### 2.3.1. MMSE Problem

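We will jointly choose $\{\mathbf F_{lk},\mathbf G_{lk}\}$ to minimize the sum MSE *η*:

$$\min_{\{\mathbf F_{lk},\,\mathbf G_{lk}\}}\ \eta \tag{11}$$

subject to general linear power constraints, for example, the per-antenna power constraint at the $k$th *eq*-transmitter,

$$[\mathbf C_k]_{m,m}\le p_{k,m},\quad m=1,\dots,\bar M_k,\qquad \mathbf C_k=\sum_{l\in\mathcal L}\mathbf F_{lk}\mathbf S_{lk}\mathbf F_{lk}^H, \tag{12}$$

or the per-transmitter power constraint at the $t$th transmitter of the $k$th *eq*-transmitter,

$$\mathrm{tr}(\boldsymbol\Omega_{k,t}\mathbf C_k)\le P_{k,t},\quad t\in\mathcal T_k. \tag{13}$$

Here, $\mathcal T_k$ denotes the set of all cooperating transmitters that form the $k$th *eq*-transmitter. When $\mathcal T_k$ has only one element, $\boldsymbol\Omega_{k,t}=\mathbf I_{\bar M_k}$ in (13). When $\mathcal T_k$ has more than one element, $\boldsymbol\Omega_{k,t}$ is a matrix whose entries are all equal to zero except for the diagonal elements corresponding to the antennas of the $t$th transmitter, which are equal to one.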
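The two constraint families differ only in how the diagonal of the eq-transmitter covariance $\mathbf C_k=\sum_l \mathbf F_{lk}\mathbf S_{lk}\mathbf F_{lk}^H$ is grouped. This NumPy sketch assumes unit-power streams ($\mathbf S_{lk}=\mathbf I$); all helper names are hypothetical.

```python
import numpy as np

def selection_matrices(antennas_per_tx):
    """Omega_{k,t}: 0/1 diagonal matrices picking out the antennas of each
    cooperating transmitter t inside one eq-transmitter (cf. (13))."""
    total = sum(antennas_per_tx)
    omegas, start = [], 0
    for m_t in antennas_per_tx:
        mask = np.zeros(total)
        mask[start:start + m_t] = 1.0
        omegas.append(np.diag(mask))
        start += m_t
    return omegas

def tx_covariance(precoders):
    """C_k = sum_l F_lk F_lk^H, assuming unit-power streams (S_lk = I)."""
    return sum(F @ F.conj().T for F in precoders)

def per_antenna_ok(C, p, tol=1e-9):
    """Per-antenna constraint (12): [C_k]_{m,m} <= p_{k,m} for every antenna m."""
    return bool(np.all(np.real(np.diag(C)) <= np.asarray(p) + tol))

def per_transmitter_ok(C, omegas, P, tol=1e-9):
    """Per-transmitter constraint (13): tr(Omega_{k,t} C_k) <= P_{k,t} for every t."""
    return all(np.real(np.trace(om @ C)) <= P_t + tol for om, P_t in zip(omegas, P))

# An eq-transmitter formed by two BSs with 2 and 1 antennas, sending one stream.
omegas = selection_matrices([2, 1])
C = tx_covariance([np.full((3, 1), 0.5)])         # every entry of C equals 0.25
print(per_antenna_ok(C, [0.3, 0.3, 0.3]))         # True: each antenna uses 0.25
print(per_transmitter_ok(C, omegas, [0.4, 0.4]))  # False: the first BS uses 0.5
```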

###### 2.3.2. Augmented Cost Function

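To solve (11) subject to (12) or (13), one can use the method of Lagrange multipliers to set up the augmented cost function for general linear power constraints

$$\mathcal J=\eta+\sum_{k\in\mathcal K}\big[\mathrm{tr}(\boldsymbol\Lambda_k\mathbf C_k)-\pi_k\big], \tag{14}$$

where the diagonal matrices $\boldsymbol\Lambda_k\succeq\mathbf 0$ contain the Lagrange multipliers and $\pi_k$ is the corresponding multiplier-weighted power budget. Only the widely considered per-transmitter power constraint and the practical per-antenna power constraint are given as examples. For the per-antenna power constraint in (12),

$$\boldsymbol\Lambda_k=\mathrm{diag}(\lambda_{k,1},\dots,\lambda_{k,\bar M_k}),\qquad \pi_k=\sum_{m=1}^{\bar M_k}\lambda_{k,m}p_{k,m}. \tag{15}$$

For the per-transmitter power constraint in (13), let $\boldsymbol\Lambda_k=\sum_{t\in\mathcal T_k}\lambda_{k,t}\boldsymbol\Omega_{k,t}$. Thus

$$\mathrm{tr}(\boldsymbol\Lambda_k\mathbf C_k)-\pi_k=\sum_{t\in\mathcal T_k}\lambda_{k,t}\big(\mathrm{tr}(\boldsymbol\Omega_{k,t}\mathbf C_k)-P_{k,t}\big). \tag{16}$$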

##### 2.4. MMSE Decoders and Precoders

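Define the noise covariance matrix and the noise-plus-interference covariance matrix at the $l$th *eq*-receiver as $\mathbf R_{w,l}=E[\mathbf w_l\mathbf w_l^H]$ and $\mathbf R_{n,l}=E[\mathbf n_l\mathbf n_l^H]$, respectively. Assume $\mathbf R_{w,l}$ is known. Then $\mathbf R_{n,l}$ is also known in JP-SU, JP-UL, JP-DL, and CBF, because $\mathbf i_l=\mathbf 0$ and hence $\mathbf R_{n,l}=\mathbf R_{w,l}$. In Non-CoMP, $\mathbf R_{n,l}$ can be estimated explicitly (see Appendix A).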

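Assuming that the data streams are zero mean, mutually uncorrelated, and uncorrelated with the noise, after some manipulation (9) becomes

$$\mathbf E_{lk}=\mathbf G_{lk}\mathbf R_{y,l}\mathbf G_{lk}^H-\mathbf G_{lk}\bar{\mathbf H}_{lk}\mathbf F_{lk}\mathbf S_{lk}-\mathbf S_{lk}\mathbf F_{lk}^H\bar{\mathbf H}_{lk}^H\mathbf G_{lk}^H+\mathbf S_{lk},\qquad \mathbf R_{y,l}=E[\mathbf y_l\mathbf y_l^H]=\sum_{k'\in\mathcal K}\bar{\mathbf H}_{lk'}\mathbf C_{k'}\bar{\mathbf H}_{lk'}^H+\mathbf R_{n,l}, \tag{17}$$

with $\mathbf C_{k'}$ as in (12). There are two possible directions to solve the MMSE problem.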
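A quick way to sanity-check a closed-form MSE such as (17) is to compare it against a Monte Carlo average. The NumPy sketch below assumes unit-power, independent symbols ($\mathbf S=\mathbf I$), white noise, and an arbitrary (not necessarily MMSE) decoder; the toy dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def crandn(*shape):
    """Array with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# One 3-antenna eq-receiver: a desired 2-stream link plus one other intracluster stream.
H1, F1 = crandn(3, 2), crandn(2, 2)   # desired link (Hbar_lk, F_lk)
H2, F2 = crandn(3, 2), crandn(2, 1)   # another stream seen by the same eq-receiver
G = crandn(2, 3)                      # an arbitrary (not necessarily MMSE) decoder
noise_var = 0.3

# Closed form (17) with S = I and R_n = noise_var * I.
Ry = (H1 @ F1 @ F1.conj().T @ H1.conj().T + H2 @ F2 @ F2.conj().T @ H2.conj().T
      + noise_var * np.eye(3))
GHF = G @ H1 @ F1
mse_17 = np.trace(G @ Ry @ G.conj().T - GHF - GHF.conj().T + np.eye(2)).real

# Monte Carlo estimate of tr E[(G y - x)(G y - x)^H] with independent unit-power symbols.
n = 200_000
x1, x2 = crandn(2, n), crandn(1, n)
w = np.sqrt(noise_var) * crandn(3, n)
y = H1 @ F1 @ x1 + H2 @ F2 @ x2 + w
mse_mc = np.mean(np.sum(np.abs(G @ y - x1) ** 2, axis=0))
print(mse_17, mse_mc)
```

The two numbers should agree to within sampling error, which shrinks roughly as $1/\sqrt{n}$.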

###### 2.4.1. MMSE Decoder

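On one hand, for a given set of precoders $\{\mathbf F_{lk}\}$, setting the gradient of $\eta$ in (10) with respect to $\mathbf G_{lk}^*$ equal to zero yields the MMSE decoder for $\mathbf x_{lk}$, $l\in\mathcal L$, $k\in\mathcal K$:

$$\mathbf G_{lk}=\mathbf S_{lk}\mathbf F_{lk}^H\bar{\mathbf H}_{lk}^H\mathbf R_{y,l}^{-1}. \tag{18}$$

Substituting (18) into (17), $\eta$ in (10) is reduced to

$$\eta=\sum_{l\in\mathcal L}\sum_{k\in\mathcal K}\mathrm{tr}\big(\mathbf S_{lk}-\mathbf S_{lk}\mathbf F_{lk}^H\bar{\mathbf H}_{lk}^H\mathbf R_{y,l}^{-1}\bar{\mathbf H}_{lk}\mathbf F_{lk}\mathbf S_{lk}\big). \tag{19}$$

The augmented cost function in (14) is also reduced to

$$\mathcal J=\eta+\sum_{k\in\mathcal K}\big[\mathrm{tr}(\boldsymbol\Lambda_k\mathbf C_k)-\pi_k\big],\qquad \eta\ \text{given by (19)}. \tag{20}$$

Note that $\eta$ in (19) and $\mathcal J$ in (20) are merely functions of the precoders $\{\mathbf F_{lk}\}$ (and the Lagrange multipliers $\{\boldsymbol\Lambda_k\}$).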
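The MMSE decoder can be checked numerically in a toy JP-UL-like setting (two users, one pooled 3-antenna eq-receiver). This NumPy sketch assumes $\mathbf S=\mathbf I$ and $\mathbf R_n=0.1\,\mathbf I$; it confirms that the full expression (17) and the reduced form (19) coincide at the MMSE decoder (18), and that perturbing the decoder can only raise the MSE.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Array with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mse_matrix(G, H, F, Ry):
    """E_lk from (17) with S_lk = I: G Ry G^H - G H F - (G H F)^H + I."""
    GHF = G @ H @ F
    return G @ Ry @ G.conj().T - GHF - GHF.conj().T + np.eye(F.shape[1])

# JP-UL-like toy: two users (2 antennas each) and one 3-antenna eq-receiver.
H = [crandn(3, 2), crandn(3, 2)]
F = [crandn(2, 1), crandn(2, 2)]           # user 0: 1 stream, user 1: 2 streams
Rn = 0.1 * np.eye(3)
Ry = Rn + sum(Hk @ Fk @ Fk.conj().T @ Hk.conj().T for Hk, Fk in zip(H, F))

# MMSE decoders (18) with S_lk = I.
G = [Fk.conj().T @ Hk.conj().T @ np.linalg.inv(Ry) for Hk, Fk in zip(H, F)]

eta_17 = sum(np.trace(mse_matrix(Gk, Hk, Fk, Ry)).real for Gk, Hk, Fk in zip(G, H, F))
eta_19 = sum(np.trace(np.eye(Fk.shape[1])
                      - Fk.conj().T @ Hk.conj().T @ np.linalg.solve(Ry, Hk @ Fk)).real
             for Hk, Fk in zip(H, F))
print(np.isclose(eta_17, eta_19))
```

Because $\mathbf R_{y,l}\succ\mathbf 0$, perturbing $\mathbf G_{lk}$ by $\boldsymbol\Delta$ adds exactly $\mathrm{tr}(\boldsymbol\Delta\mathbf R_{y,l}\boldsymbol\Delta^H)>0$ to the MSE, which is what the second check in the test relies on.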

###### 2.4.2. MMSE Precoder

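On the other hand, for a given set of decoders $\{\mathbf G_{lk}\}$ and Lagrange multipliers $\{\boldsymbol\Lambda_k\}$, setting the gradient of $\mathcal J$ in (14) with respect to $\mathbf F_{lk}^*$ equal to zero yields the MMSE precoder for $\mathbf x_{lk}$:

$$\mathbf F_{lk}=\mathbf D_k^{-1}\bar{\mathbf H}_{lk}^H\mathbf G_{lk}^H,\qquad \mathbf D_k=\sum_{j\in\mathcal L}\bar{\mathbf H}_{jk}^H\mathbf Q_j\bar{\mathbf H}_{jk}+\boldsymbol\Lambda_k,\qquad \mathbf Q_j=\sum_{i\in\mathcal K}\mathbf G_{ji}^H\mathbf G_{ji}. \tag{21}$$

Substituting (21) into (14), the augmented cost function in (14) is reduced to

$$\mathcal J=\sum_{l\in\mathcal L}\sum_{k\in\mathcal K}\mathrm{tr}\big(\mathbf S_{lk}-\mathbf S_{lk}\mathbf G_{lk}\bar{\mathbf H}_{lk}\mathbf D_k^{-1}\bar{\mathbf H}_{lk}^H\mathbf G_{lk}^H\big)+\sum_{l\in\mathcal L}\mathrm{tr}(\mathbf Q_l\mathbf R_{n,l})-\sum_{k\in\mathcal K}\pi_k. \tag{22}$$

Note that $\mathcal J$ in (22) is merely a function of the decoders $\{\mathbf G_{lk}\}$ and the Lagrange multipliers $\{\boldsymbol\Lambda_k\}$.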
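With the decoders and multipliers held fixed, the augmented cost is a strictly convex quadratic in the precoders whenever $\mathbf D_k\succ\mathbf 0$, so the closed-form precoder should beat any perturbation of it. The NumPy sketch below tests this in a toy JP-DL-like setting (one 4-antenna eq-transmitter, two 2-antenna users, one stream each, $\mathbf S=\mathbf I$); the multiplier values are arbitrary and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def herm(X):
    return X.conj().T

def crandn(*shape):
    """Array with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def aug_cost(F, G, H, Rn, lam):
    """Augmented cost (14) for one eq-transmitter with S_l = I, dropping the constant pi_k."""
    C = sum(Fl @ herm(Fl) for Fl in F)
    cost = np.trace(np.diag(lam) @ C).real
    for Hl, Gl, Fl, Rl in zip(H, G, F, Rn):
        Ry = Hl @ C @ herm(Hl) + Rl
        GHF = Gl @ Hl @ Fl
        cost += np.trace(Gl @ Ry @ herm(Gl) - GHF - herm(GHF)).real + Fl.shape[1]
    return cost

def mmse_precoders(G, H, lam):
    """Precoders (21): F_l = (sum_j H_j^H G_j^H G_j H_j + Lambda)^{-1} H_l^H G_l^H."""
    D = sum(herm(Hl) @ herm(Gl) @ Gl @ Hl for Hl, Gl in zip(H, G)) + np.diag(lam)
    return [np.linalg.solve(D, herm(Hl) @ herm(Gl)) for Hl, Gl in zip(H, G)]

H = [crandn(2, 4), crandn(2, 4)]
G = [crandn(1, 2), crandn(1, 2)]
Rn = [0.1 * np.eye(2)] * 2
lam = np.array([0.2, 0.3, 0.1, 0.4])     # fixed positive multipliers (arbitrary)
F_opt = mmse_precoders(G, H, lam)
print(aug_cost(F_opt, G, H, Rn, lam))
```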

###### 2.4.3. Transmit and Decoder Covariance Matrices

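When the nonzero source covariance matrices are diagonal matrices with the same diagonal elements (i.e., $\mathbf S_{lk}=s\mathbf I_{d_{lk}}$), replacing $\mathbf F_{lk}$ by $\mathbf F_{lk}\mathbf U_{lk}$ and $\mathbf G_{lk}$ by $\mathbf U_{lk}^H\mathbf G_{lk}$ ($\mathbf U_{lk}$ is an arbitrary unitary matrix with proper dimension) does not change the power constraint (12) or (13). Furthermore, $\mathbf E_{lk}$ becomes $\mathbf U_{lk}^H\mathbf E_{lk}\mathbf U_{lk}$, so $\epsilon_{lk}$ is unchanged. Define the transmit covariance matrices as

$$\boldsymbol\Phi_{lk}=\mathbf F_{lk}\mathbf F_{lk}^H \tag{23}$$

and the decoder covariance matrices as

$$\boldsymbol\Psi_{lk}=\mathbf G_{lk}^H\mathbf G_{lk}. \tag{24}$$

Essentially, $\boldsymbol\Phi_{lk}$ and $\boldsymbol\Psi_{lk}$ are invariant to this replacement for arbitrary unitary matrices $\mathbf U_{lk}$. Therefore, the transmit and decoder covariance matrices can be used to determine the MSE (in fact, the transmit and decoder covariance matrices also determine the achievable sum rate) and consequently determine the precoders and decoders. Thus, if the transmit covariance matrices which minimize the MSE are found, the precoders can be obtained using (23) and the decoders can be obtained from (18). Similarly, if the decoder covariance matrices which minimize the MSE are found, the decoders can be obtained using (24) and the precoders can be obtained from (21).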
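The unitary invariance argument is easy to confirm numerically for a single link. This NumPy sketch assumes $\mathbf S=\mathbf I$ and draws a random unitary $\mathbf U$ from a QR factorization; it checks that the MMSE decoder for $\mathbf F\mathbf U$ is $\mathbf U^H\mathbf G$ and that the MSE, $\boldsymbol\Phi=\mathbf F\mathbf F^H$, and $\boldsymbol\Psi=\mathbf G^H\mathbf G$ are unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)

def herm(X):
    return X.conj().T

def crandn(*shape):
    """Array with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mmse_and_mse(H, F, Rn):
    """MMSE decoder (18) and MSE (9) for a single link with S = I."""
    Ry = H @ F @ herm(F) @ herm(H) + Rn
    G = herm(F) @ herm(H) @ np.linalg.inv(Ry)
    GHF = G @ H @ F
    E = G @ Ry @ herm(G) - GHF - herm(GHF) + np.eye(F.shape[1])
    return G, np.trace(E).real

H, F, Rn = crandn(3, 3), crandn(3, 2), 0.2 * np.eye(3)
U, _ = np.linalg.qr(crandn(2, 2))            # random 2 x 2 unitary matrix

G, mse = mmse_and_mse(H, F, Rn)
G_rot, mse_rot = mmse_and_mse(H, F @ U, Rn)  # MMSE decoder for the rotated precoder
print(mse, mse_rot)
```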

#### 3. Unified Approach for General MTMR MIMO Systems

The *GIA* is proposed as a unified approach for the MMSE design of general MTMR MIMO systems. It is motivated by the fact that, if the Lagrange multipliers $\{\boldsymbol\Lambda_k\}$ in (21) are known, the coupled equations (18) and (21) can be solved iteratively for the decoders $\{\mathbf G_{lk}\}$ and precoders $\{\mathbf F_{lk}\}$. Note that in most of the literature (e.g., [35]), the Lagrange multipliers are obtained through a linear search, whose search space grows significantly as the system size increases. We herein propose a much more efficient approach using an explicit expression for the Lagrange multipliers.

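To obtain an explicit expression for the Lagrange multipliers $\boldsymbol\Lambda_k$, $k\in\mathcal K$, set the gradient of $\mathcal J$ in (20) with respect to $\mathbf F_{lk}^*$ equal to zero and then right-multiply the resulting equation by $\mathbf F_{lk}^H$. Once this is done for each $l\in\mathcal L$, sum them all up to obtain the following equation:

$$\boldsymbol\Lambda_k\mathbf C_k=\mathbf B_k, \tag{25}$$

where

$$\mathbf B_k=\sum_{l\in\mathcal L}\bar{\mathbf H}_{lk}^H\mathbf R_{y,l}^{-1}\bar{\mathbf H}_{lk}\mathbf F_{lk}\mathbf S_{lk}^2\mathbf F_{lk}^H-\Big(\sum_{j\in\mathcal L}\bar{\mathbf H}_{jk}^H\mathbf R_{y,j}^{-1}\mathbf A_j\mathbf R_{y,j}^{-1}\bar{\mathbf H}_{jk}\Big)\mathbf C_k,\qquad \mathbf A_j=\sum_{i\in\mathcal K}\bar{\mathbf H}_{ji}\mathbf F_{ji}\mathbf S_{ji}^2\mathbf F_{ji}^H\bar{\mathbf H}_{ji}^H. \tag{26}$$

Utilizing (12) with equality, for the per-antenna power constraint,

$$\lambda_{k,m}=\frac{[\mathbf B_k]_{m,m}}{p_{k,m}},\quad m=1,\dots,\bar M_k. \tag{27}$$

Utilizing (13) with equality, for the per-transmitter power constraint,

$$\lambda_{k,t}=\frac{\mathrm{tr}(\boldsymbol\Omega_{k,t}\mathbf B_k)}{P_{k,t}},\quad t\in\mathcal T_k. \tag{28}$$

Note that the usage of (27) or (28) enforces the corresponding complementary slackness conditions

$$\lambda_{k,m}\big([\mathbf C_k]_{m,m}-p_{k,m}\big)=0 \tag{29}$$

or

$$\lambda_{k,t}\big(\mathrm{tr}(\boldsymbol\Omega_{k,t}\mathbf C_k)-P_{k,t}\big)=0. \tag{30}$$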
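For a single eq-transmitter and a single eq-receiver with unit-power streams, the explicit multiplier reduces to $\lambda_m=[\mathbf B]_{m,m}/p_m$ with $\mathbf B=\bar{\mathbf H}^H\mathbf R_y^{-1}\bar{\mathbf H}\mathbf C-\bar{\mathbf H}^H\mathbf R_y^{-1}\mathbf A\mathbf R_y^{-1}\bar{\mathbf H}\mathbf C$, $\mathbf C=\mathbf F\mathbf F^H$, and $\mathbf A=\bar{\mathbf H}\mathbf C\bar{\mathbf H}^H$. The NumPy sketch below evaluates that special case (the expression is reconstructed under the notation assumed here, and the function name is hypothetical) and compares it, in the SISO case, with the KKT multiplier $-\mathrm d\,\mathrm{MSE}/\mathrm dp$ for $\mathrm{MSE}(p)=\sigma^2/(|h|^2p+\sigma^2)$.

```python
import numpy as np

def herm(X):
    return X.conj().T

def per_antenna_multipliers(H, F, Rn, p):
    """Explicit per-antenna multipliers for one eq-transmitter and one eq-receiver
    with unit-power streams (S = I): lambda_m = [B]_mm / p_m."""
    C = F @ herm(F)                      # transmit covariance of the eq-transmitter
    Ry = H @ C @ herm(H) + Rn            # received covariance
    Rinv = np.linalg.inv(Ry)
    A = H @ C @ herm(H)                  # desired-signal covariance (single link)
    B = herm(H) @ Rinv @ H @ C - herm(H) @ Rinv @ A @ Rinv @ H @ C
    return np.real(np.diag(B)) / np.asarray(p)

# SISO check: h = 1.5, noise power 0.5, budget p = 2, precoder at full power.
h, s, p = 1.5, 0.5, 2.0
lam = per_antenna_multipliers(np.array([[h]]), np.array([[np.sqrt(p)]]),
                              np.array([[s]]), [p])
print(lam[0], h**2 * s / (h**2 * p + s) ** 2)
```

In a multi-link cluster, the sums over $l$ and $j$ in the general expression are needed because every receiver the eq-transmitter reaches contributes to its power sensitivity.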

With the explicit expression for the Lagrange multipliers in (27) or (28) in hand, a *GIA* can be developed. There are three steps in each iteration of the *GIA*.

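*Step 1.* Given $\{\mathbf F_{lk}\}$, obtain $\{\mathbf G_{lk}\}$ using (18).

*Step 2.* Given $\{\mathbf F_{lk}\}$, obtain $\{\boldsymbol\Lambda_k\}$ using (27) or (28).

*Step 3.* Given $\{\mathbf G_{lk}\}$ and $\{\boldsymbol\Lambda_k\}$, obtain $\{\mathbf F_{lk}\}$ using (21).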

The iterative procedure of the *GIA* stops when the Karush-Kuhn-Tucker (KKT) conditions are all satisfied, that is, when the following three requirements are fulfilled: one, the MSE no longer decreases; two, each precoder (decoder) converges; three, the transmission powers at the transmitter(s) meet the desired power constraints. Since the MSE is bounded below by zero and each of the *GIA* steps enforces one of the KKT conditions of the MMSE problem, the *GIA* can converge quickly to a local minimum at low powers. At high transmit powers, a scaling initialization (scaling the MMSE MIMO precoders and decoders given by the *GIA* at lower powers) is very effective and efficient. Note that the *GIA* can deal with arbitrary source covariance matrices $\{\mathbf S_{lk}\}$, thus allowing $d_{lk}$, the number of data streams intended from the $k$th *eq*-transmitter to the $l$th *eq*-receiver, to be prespecified for all $l$ and $k$. Since the numbers of data streams can be prespecified, the *GIA* allows for a tradeoff between diversity and multiplexing gains.
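The three GIA steps can be sketched for one cluster under per-antenna constraints as follows. This is a NumPy illustration under stated assumptions, not the authors' implementation: unit-power streams ($\mathbf S_{lk}=\mathbf I$), the explicit-multiplier expression reconstructed under the notation assumed here, a fixed number of iterations instead of the KKT-based stopping rule, caller-supplied initial precoders (no scaling initialization), and clipping of negative multipliers to zero (an added safeguard). The function name and argument layout are hypothetical.

```python
import numpy as np

def herm(X):
    return X.conj().T

def gia_per_antenna(H, d, p, Rn, F0, n_iter=200):
    """Sketch of the three-step GIA for one cluster under per-antenna constraints (12).

    H[l][k]: composite channel from eq-transmitter k to eq-receiver l.
    d[l][k]: number of streams from k to l (0 means no data).
    p[k]:    per-antenna power budgets of eq-transmitter k.
    Rn[l]:   noise-plus-interference covariance at eq-receiver l.
    F0:      dict {(l, k): initial precoder} for every link with d[l][k] > 0.
    Assumes unit-power, mutually uncorrelated streams (S_lk = I).
    """
    L, K = len(H), len(H[0])
    links = [(l, k) for l in range(L) for k in range(K) if d[l][k] > 0]
    F = dict(F0)

    def rx_cov(F):
        # R_y,l: all intracluster streams seen at eq-receiver l, plus R_n,l.
        return [Rn[l] + sum(H[l][k] @ F[j, k] @ herm(F[j, k]) @ herm(H[l][k])
                            for (j, k) in links) for l in range(L)]

    def decoders(F, Ry):
        # Step 1: MMSE decoders (18).
        return {(l, k): herm(F[l, k]) @ herm(H[l][k]) @ np.linalg.inv(Ry[l])
                for (l, k) in links}

    for _ in range(n_iter):
        Ry = rx_cov(F)
        Rinv = [np.linalg.inv(R) for R in Ry]
        G = decoders(F, Ry)
        # A_j: covariance of the desired streams at eq-receiver j.
        A = [sum((H[j][i] @ F[jj, i] @ herm(F[jj, i]) @ herm(H[j][i])
                  for (jj, i) in links if jj == j), np.zeros_like(Ry[j])) for j in range(L)]
        # Step 2: explicit per-antenna multipliers; negative values clipped (assumption).
        lam = []
        for k in range(K):
            Mk = H[0][k].shape[1]
            zero = np.zeros((Mk, Mk), dtype=complex)
            C = sum((F[l, kk] @ herm(F[l, kk]) for (l, kk) in links if kk == k), zero)
            B = sum((herm(H[l][k]) @ Rinv[l] @ H[l][k] @ F[l, kk] @ herm(F[l, kk])
                     for (l, kk) in links if kk == k), zero)
            B = B - sum(herm(H[j][k]) @ Rinv[j] @ A[j] @ Rinv[j] @ H[j][k]
                        for j in range(L)) @ C
            lam.append(np.maximum(np.real(np.diag(B)), 0.0) / np.asarray(p[k]))
        # Step 3: MMSE precoders (21).
        Q = [sum((herm(G[jj, i]) @ G[jj, i] for (jj, i) in links if jj == j),
                 np.zeros_like(Ry[j])) for j in range(L)]
        for (l, k) in links:
            D = sum(herm(H[j][k]) @ Q[j] @ H[j][k] for j in range(L)) + np.diag(lam[k])
            F[l, k] = np.linalg.solve(D, herm(H[l][k]) @ herm(G[l, k]))

    Ry = rx_cov(F)
    G = decoders(F, Ry)
    eta = 0.0
    for (l, k) in links:
        GHF = G[l, k] @ H[l][k] @ F[l, k]
        eta += np.trace(G[l, k] @ Ry[l] @ herm(G[l, k]) - GHF - herm(GHF)).real + d[l][k]
    return F, G, eta

# Decoupled check: diagonal 2 x 2 channel, two streams, unit per-antenna budgets.
H = [[np.diag([1.0, 2.0]).astype(complex)]]
F0 = {(0, 0): np.sqrt(0.5) * np.eye(2, dtype=complex)}
F, G, eta = gia_per_antenna(H, d=[[2]], p=[[1.0, 1.0]], Rn=[np.eye(2)], F0=F0)
print(eta)
```

In this decoupled example each stream reduces to a scalar fixed-point iteration whose fixed point uses full power on its antenna, so the sum MSE should approach $1/(1+1)+1/(1+4)=0.7$; convergence slows as the SNR grows, which is consistent with the remark above about scaling initialization at high powers.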

#### 4. Optimum Approaches for Special MTMR Systems

When the source covariance matrices are diagonal matrices with the same diagonal elements, that is, $\mathbf S_{lk}=s\mathbf I_{d_{lk}}$, optimum approaches for the MMSE design subject to general linear power constraints may be developed for special MTMR systems: uplink systems (e.g., JP-UL, JP-SU, and Non-CoMP, in which $\mathcal L$ has only one element) in Section 4.1 and downlink systems (e.g., JP-DL, JP-SU, and Non-CoMP, in which $\mathcal K$ has only one element) in Section 4.2. For convenience and without loss of generality, in this section we assume $s=1$.

##### 4.1. *TCOA* [29, 30] for Systems with One *Eq*-Receiver

The *TCOA* [29, 30] can be used for JP-UL, JP-SU, and Non-CoMP, in which $\mathcal L$ has only one element (but not for JP-DL and CBF), under general linear power constraints. (Note that in [29, 30], the *TCOA* is only for the per-user power constraint; we use it here to deal with the per-antenna power constraint.) It is motivated by the fact that the MMSE problem may be solved by searching for the transmit covariance matrices $\{\boldsymbol\Phi_{1k}\}$ that jointly minimize $\eta$ in (19). The optimum numbers of data streams are determined by the ranks of the optimum $\boldsymbol\Phi_{1k}$. The *TCOA* [30] can be reformulated as an SDP, which can be solved numerically by SDP solvers (such as SeDuMi [36] and Yalmip [37]) in polynomial time.

##### 4.2. *DCOA* for Systems with One *Eq*-Transmitter

The *DCOA* can be developed for JP-DL, JP-SU, and Non-CoMP, in which $\mathcal K$ has only one element (but not for JP-UL and CBF). It is motivated by the fact that the MMSE problem may be solved by searching for the decoder covariance matrices $\{\boldsymbol\Psi_{l1}\}$ that jointly minimize $\mathcal J$ in (22). Using (24), $\mathcal J$ in (22) becomes
where
The MMSE transceiver design problem becomes
The problem in (33) is not convex because of the constraints on the numbers of data streams, that is, $\mathrm{rank}(\boldsymbol\Psi_{l1})=d_{l1}$. Allowing $d_{l1}$ to be unspecified, we obtain the rank-relaxed decoder covariance optimization problem:
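The cost function in (34) is convex with respect to $\{\boldsymbol\Psi_{l1}\}$ and concave with respect to $\boldsymbol\Lambda_1$. Define $\min_{\{\boldsymbol\Psi_{l1}\}}\max_{\boldsymbol\Lambda_1}\mathcal J$ as the primal problem and $\max_{\boldsymbol\Lambda_1}\min_{\{\boldsymbol\Psi_{l1}\}}\mathcal J$ as the dual problem. Since both the primal problem and the dual problem are convex and strictly feasible, strong duality holds, that is, the optimum values of $\{\boldsymbol\Psi_{l1}\}$, $\boldsymbol\Lambda_1$, and $\mathcal J$ obtained from the primal problem are the same as those obtained from the dual problem.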
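One way to see the convex-concave structure is sketched below. It assumes one *eq*-transmitter (index 1), unit-power streams, the per-antenna constraint with $\mathbf P=\mathrm{diag}(p_{1,1},\dots,p_{1,\bar M_1})$ so that the weighted budget is $\mathrm{tr}(\boldsymbol\Lambda_1\mathbf P)$, $\mathbf D\succ\mathbf 0$, and the reduced cost obtained by substituting the precoder $\mathbf F_{l1}=\mathbf D^{-1}\bar{\mathbf H}_{l1}^H\mathbf G_{l1}^H$ into the augmented cost. It is a derivation sketch under that notation, not necessarily the authors' route to (31)–(34).

```latex
\begin{aligned}
\mathcal J &= \sum_{l\in\mathcal L} d_{l1}
  - \sum_{l\in\mathcal L}\mathrm{tr}\big(\mathbf D^{-1}\bar{\mathbf H}_{l1}^H\boldsymbol\Psi_{l1}\bar{\mathbf H}_{l1}\big)
  + \sum_{l\in\mathcal L}\mathrm{tr}\big(\boldsymbol\Psi_{l1}\mathbf R_{n,l}\big)
  - \mathrm{tr}(\boldsymbol\Lambda_1\mathbf P),
\qquad
\mathbf D = \underbrace{\sum_{l\in\mathcal L}\bar{\mathbf H}_{l1}^H\boldsymbol\Psi_{l1}\bar{\mathbf H}_{l1}}_{\mathbf X} + \boldsymbol\Lambda_1,\\[4pt]
\sum_{l\in\mathcal L}\mathrm{tr}\big(\mathbf D^{-1}\bar{\mathbf H}_{l1}^H\boldsymbol\Psi_{l1}\bar{\mathbf H}_{l1}\big)
 &= \mathrm{tr}\big(\mathbf D^{-1}(\mathbf D-\boldsymbol\Lambda_1)\big)
  = \bar M_1 - \mathrm{tr}\big(\boldsymbol\Lambda_1^{1/2}\mathbf D^{-1}\boldsymbol\Lambda_1^{1/2}\big),\\[4pt]
\Rightarrow\quad
\mathcal J &= \sum_{l\in\mathcal L} d_{l1} - \bar M_1
  + \underbrace{\mathrm{tr}\big(\boldsymbol\Lambda_1^{1/2}(\mathbf X+\boldsymbol\Lambda_1)^{-1}\boldsymbol\Lambda_1^{1/2}\big)}_{\text{convex in }\{\boldsymbol\Psi_{l1}\}}
  + \sum_{l\in\mathcal L}\mathrm{tr}\big(\boldsymbol\Psi_{l1}\mathbf R_{n,l}\big) - \mathrm{tr}(\boldsymbol\Lambda_1\mathbf P),\\[4pt]
\mathrm{tr}\big(\boldsymbol\Lambda_1^{1/2}(\mathbf X+\boldsymbol\Lambda_1)^{-1}\boldsymbol\Lambda_1^{1/2}\big)
 &= \bar M_1 - \underbrace{\mathrm{tr}\big(\mathbf X^{1/2}(\mathbf X+\boldsymbol\Lambda_1)^{-1}\mathbf X^{1/2}\big)}_{\text{convex in }\boldsymbol\Lambda_1}.
\end{aligned}
```

Since $\mathbf Y\mapsto\mathrm{tr}(\mathbf W\mathbf Y^{-1})$ is convex on positive definite matrices for any fixed $\mathbf W\succeq\mathbf 0$, and $\mathbf X+\boldsymbol\Lambda_1$ depends affinely on both $\{\boldsymbol\Psi_{l1}\}$ and $\boldsymbol\Lambda_1$, the cost is convex in $\{\boldsymbol\Psi_{l1}\}$ and concave in $\boldsymbol\Lambda_1$ once the stream numbers are fixed. The only rank-dependent piece is $\sum_l d_{l1}$, which is consistent with the non-convexity attributed to (33) above.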

###### 4.2.1. Primal-Dual Algorithm

We propose a novel *primal-dual algorithm* to solve the rank-relaxed decoder covariance optimization problem in (34). Denote the feasible set of values for $\{\boldsymbol\Psi_{l1}\}$ as the primal domain and the feasible set of values for $\boldsymbol\Lambda_1$ as the dual domain. In short, the approach iterates between a primal domain step and a dual domain step. (Both subproblems, defined in (35) and (36), are convex because their cost functions are convex and concave, respectively, and their constraints are all linear matrix inequalities. The solution of each subproblem is optimum for that subproblem.) For the $i$th iteration:

*Primal Domain Substep*

Given $\boldsymbol\Lambda_1^{(i)}$, find the $\{\boldsymbol\Psi_{l1}^{(i+1)}\}$ which solves

*Dual Domain Substep*

Given $\{\boldsymbol\Psi_{l1}^{(i+1)}\}$, find the $\boldsymbol\Lambda_1^{(i+1)}$ which solves
The convexity of the rank-relaxed decoder covariance optimization problem guarantees that the solution provided by the *primal-dual algorithm* is a global optimum. The iterative procedure stops when the values of $\mathcal J$ corresponding to the primal domain step and the dual domain step converge to the same value and when $\{\boldsymbol\Psi_{l1}^{(i)}\}$ and $\boldsymbol\Lambda_1^{(i)}$ converge. In practice, the *DCOA* given by solving (35) and (36) is considered to have converged at the $(i+1)$th iteration when the changes in $\{\boldsymbol\Psi_{l1}\}$ and $\boldsymbol\Lambda_1$ between iterations and the duality gap of the values of $\mathcal J$ derived from the two steps
are less than some prespecified thresholds. Note that, in all this, the power constraints have been accounted for by the Lagrange multipliers. The optimum numbers of data streams are determined by the ranks of the optimum $\boldsymbol\Psi_{l1}$.

###### 4.2.2. Two-Semidefinite Programming (Two-SDP) Procedure

Similar to the *TCOA* [30] in uplink, (35) and (36) can be reformulated in terms of the SDP formulation:
Both (38) and (39) can be solved numerically by SDP solvers (such as SeDuMi [36] and Yalmip [37]) in polynomial time. However, the *primal-dual algorithm* of the *DCOA* needs both the primal and dual subproblems to be solved in each iteration, which leads to high computational complexity. Furthermore, the *Two-SDP Procedure* is sensitive to the numerical precision of the SDP solvers. It works well at low transmit powers, but at high transmit powers the duality gap cannot be made arbitrarily small because of the insufficient numerical precision of publicly available SDP solvers. Nevertheless, an important contribution here is that the MMSE transceiver design under general linear power constraints provided by the *Two-SDP Procedure* is optimal for downlink.

###### 4.2.3. Numerically Efficient Procedure

To reduce the computational complexity and improve the convergence properties of the *Two-SDP Procedure*, the SDP formulation in (38) is still employed to solve the primal domain step in (35), while the explicit expressions of $\boldsymbol\Lambda_1$ derived below are employed for the dual domain step in (36).

Substituting (18) into (24) and using (23), we obtain (40). Similarly, substituting (21) into (23) and using (24), we obtain (41). To remove the dependence on $\{\boldsymbol\Phi_{l1}\}$, substitute (41) into (40) to yield (42). Similarly, substituting (23) into $\mathbf B_1$ in (26) and using (41), we can express the Lagrange multipliers in (27) or (28) in terms of $\{\boldsymbol\Psi_{l1}\}$.

#### 5. Equivalence among the Proposed Approaches and Optimality of *GIA*

In this section, we discuss the optimality of, and the relationships among, the *GIA*, *TCOA*, and *DCOA*, from which the optimality of the *GIA* is established.

##### 5.1. Equivalence of the *TCOA* and *GIA* for Systems with One *Eq*-Receiver

When the *TCOA* is applicable and the transmit covariance matrices obtained from the MMSE designs are of full rank, the *TCOA* and *GIA* are equivalent. Consequently, the solution of the *GIA* is actually optimum because the solution of the *TCOA* is optimum.

To prove the equivalence between the *TCOA* and *GIA*, it suffices to show that the KKT conditions of the two approaches are equivalent. This is because the *TCOA* is a convex approach, so its KKT conditions are sufficient conditions for optimality. The KKT conditions common to both approaches are (18), the power constraint (12) or (13), the complementary slackness condition (29) or (30), and the nonnegativity of the Lagrange multipliers. To obtain the KKT condition unique to the *TCOA*, we set up the following augmented cost function, (43), which includes the nonnegative definite constraint on the transmit covariance matrices:
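Here, the additional Lagrange multipliers are nonnegative definite matrices whose trace products with the corresponding transmit covariance matrices are zero. When the transmit covariance matrices are of full rank, these Lagrange multiplier matrices are zero, because a nonnegative definite matrix whose trace product with a positive definite matrix vanishes must itself be zero. Setting the gradients of (43) with respect to the transmit covariance matrices to zero, we obtain (44).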
The task of showing the equivalence of the KKT conditions of the two approaches boils down to showing that the above KKT condition of the *TCOA*, (44), can be derived from (and can be used to derive) the KKT conditions unique to the *GIA*, (21). Substitute (18) and (23) into (21) to obtain
Then right multiply (45) by to get
With some matrix manipulations, we can show that (46) and (44) are equivalent. Since (21) and (44) can be derived from each other, this proof is complete. The above proof is done assuming with . It is also applicable when .

##### 5.2. Equivalence of the *DCOA* and *GIA* for Systems with One *Eq*-Transmitter

When the *DCOA* is applicable and the decoder covariance matrices obtained from the MMSE designs are of full rank, the *DCOA* and *GIA* are equivalent. Consequently, the solution of the *GIA* is actually optimum because the solution given by the *DCOA* is optimal.

To prove the equivalence between the *DCOA* and *GIA*, it suffices to show that the KKT conditions of the two approaches are equivalent. This is because the *DCOA* is a convex approach, so its KKT conditions are sufficient conditions for optimality. The KKT conditions common to both approaches are (21), the power constraint (12) or (13), the complementary slackness condition (29) or (30), and the nonnegativity of the Lagrange multipliers. To obtain the KKT condition unique to the *DCOA*, we set up the following augmented cost function, (47), from (34), which includes the nonnegative definite constraint on the decoder covariance matrices:
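Here, the additional Lagrange multipliers are nonnegative definite matrices whose trace products with the corresponding decoder covariance matrices are zero. When the decoder covariance matrices are of full rank, these Lagrange multiplier matrices are zero, because a nonnegative definite matrix whose trace product with a positive definite matrix vanishes must itself be zero. Setting the gradients of (47) with respect to the decoder covariance matrices to zero, we obtain (48).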
The task of showing the equivalence of the KKT conditions of the two approaches boils down to showing that the above KKT condition of the *DCOA*, (48), can be derived from (and can be used to derive) the KKT conditions unique to the *GIA*, (18). Substitute (21) and (24) into (18) to obtain
Then left-multiply (49) by to get
With some matrix manipulations, we can show that (50) and (48) are equivalent. Since (18) and (48) can be derived from each other, this proof is complete. The above proof is done assuming with . It is also applicable when .

#### 6. Simulation Setup

In all of the simulations, the noise and nonzero source covariance matrices are identity matrices whose dimensions equal the number of receive antennas and the number of data streams, respectively. The nonzero source (data) vectors consist entirely of uncoded binary phase shift keying (BPSK) modulated bits. The per-antenna power limits in (12) and the per-transmitter power limit in (13) are chosen so that the maximum total transmission power of each transmitter is the same under both constraints.
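To make the two constraint types concrete, the sketch below checks a precoder against a per-antenna limit and against a per-transmitter limit. It rests on two illustrative assumptions. First, the transmit covariance of a precoder F carrying unit-power, uncorrelated data streams is F F^H. Second, the per-antenna limits split the transmitter's maximum power P equally over its N_t antennas, so both constraints allow the same maximum total power. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def antenna_powers(F):
    """Per-antenna powers diag(F F^H) for unit-power, uncorrelated data streams."""
    return np.sum(np.abs(F) ** 2, axis=1)

def meets_per_antenna(F, P, tol=1e-9):
    """Per-antenna limits, assumed here to be P / N_t on each of the N_t antennas."""
    n_t = F.shape[0]
    return bool(np.all(antenna_powers(F) <= P / n_t + tol))

def meets_per_transmitter(F, P, tol=1e-9):
    """Per-transmitter limit: total power tr(F F^H) <= P."""
    return bool(antenna_powers(F).sum() <= P + tol)

def bpsk_symbols(n_streams, n_symbols, rng):
    """Uncoded BPSK: bit 0 -> -1, bit 1 -> +1 (unit power per stream)."""
    bits = rng.integers(0, 2, size=(n_streams, n_symbols))
    return 2.0 * bits - 1.0

P = 2.0
F_split = np.sqrt(P / 2) * np.eye(2)        # power spread evenly over 2 antennas
F_one = np.array([[np.sqrt(P)], [0.0]])     # all power on antenna 1
print(meets_per_antenna(F_split, P), meets_per_transmitter(F_split, P))  # True True
print(meets_per_antenna(F_one, P), meets_per_transmitter(F_one, P))      # False True
```

Any precoder that meets the per-antenna limits also meets the per-transmitter limit. The converse does not hold, as the second precoder shows, which is why the per-antenna constraint is the more stringent of the two.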

Without loss of generality, in all of the simulations, the numbers of transmitters and receivers are the same, and each cell has only one transmitter and one receiver. Since the transmitter in a given cell always (no matter which configuration) has data for the receiver in the same cell, the two are labeled with the same index. Furthermore, for simplicity, the distance of each such direct link (see (1)) is normalized to 1. Since all other links are possibly (depending on the configuration) interfering links, their distances are normalized relative to the direct links. Again, for the sake of simplicity, the distances of all of these cross links are set equal, giving rise to a single distance parameter. Note that, in a cellular context, the users (base stations) are the receivers (transmitters) in downlink and the transmitters (receivers) in uplink. Thus, setting this parameter to 1 means that all of the users are cell edge users (the system is in a cell edge scenario). Furthermore, as the parameter increases, each user moves away from the cell edge toward its own base station. The same path loss exponent in the path loss model of (1) is used in all of the simulations.

All of the setups used in these simulations for the five CoMP configurations are defined in Table 1. (Note, though, that the distances are not specified in these baseline setups because they are example dependent.) Each CoMP configuration has several setups, and the differences between the setups of a particular configuration are marked in bold. For example, for JP-UL, setups 1a and 1b are exactly the same except for the numbers of data streams. Unlike setups 1a–3b, where each setup corresponds to only one configuration, setups 4a, 4b, 5a, and 5b can correspond to either Non-CoMP or CBF. Thus, to help distinguish whether a setup belongs to Non-CoMP or CBF, the name of the configuration is placed next to the setup number; for example, 5a (Non-CoMP) denotes setup 5a for Non-CoMP.

Note that not every approach can be used for every configuration and every setup in Table 1. Also note that the channel matrices generated numerically usually have full column and/or row rank. This in general results in maximum feasible rank transmit covariance matrices and/or decoder covariance matrices in the MMSE designs if the numbers of data streams are not pre-specified. Therefore, in such cases, the *TCOA* and *DCOA* are applicable in corresponding setups. The applicability of the proposed approaches in the setups is summarized in Table 2, where “Y” means an approach is applicable in a setup while “N” means it is not.

Finally, the results for setup 4b (Non-CoMP) under the per-antenna power constraint are obtained using the optimum closed-form solution (see Appendix B). The results for setups 5a (Non-CoMP) and 5b (Non-CoMP) can also be obtained by the optimum closed-form solution, but they are omitted for clarity of the figures.

#### 7. Investigation into the Proposed Approaches

In this section, the convergence properties, optimality, and diversity/multiplexing tradeoff of the *GIA*, and numerical comparison of the *GIA* with the approach in [35] for CBF are investigated. All results except for the ones in Section 7.1 are obtained by averaging over 20 channel realizations. These results are consistent with those obtained by averaging over more channel realizations.

##### 7.1. Convergence Properties of the Approaches

Consider setup 3a (JP-SU), for which all approaches are applicable. Figure 2 shows the convergence behavior of the *GIA* for the per-antenna power constraint and one set of channel realizations, expressed in terms of the MSE, the difference in the decoders, and the difference in the per-antenna powers between consecutive iterations.

The convergence behavior for the per-transmitter power constraint is similar and is omitted due to the page limit. As shown in Figure 2, both the MSE and the decoders converge quickly. Notably, the per-antenna powers converge much more slowly at higher transmit powers. This is because the Lagrange multipliers decrease quickly as the transmit power increases (see (27) or (28)). Note that the use of (27) or (28) enforces the corresponding complementary slackness condition (29) or (30). For large transmit powers, the Lagrange multipliers are very small. Thus, the number of iterations increases drastically as the transmit power increases if equality in the power constraints in (12) or (13) is insisted upon. The slow convergence of the per-antenna powers is also observed in other configurations.
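A stopping rule of the kind discussed above can be sketched as follows. Two choices here are assumptions made for illustration: the decoder difference is measured with the Frobenius norm, and the power difference is the largest absolute change in any per-antenna power. Neither is claimed to be the paper's exact definition, and the function names and thresholds are hypothetical.

```python
import numpy as np

def decoder_change(D_new, D_old):
    """Frobenius norm of the change in the decoder between consecutive iterations."""
    return float(np.linalg.norm(D_new - D_old, "fro"))

def antenna_power_change(F_new, F_old):
    """Largest absolute change in any per-antenna power diag(F F^H)."""
    p_new = np.sum(np.abs(F_new) ** 2, axis=1)
    p_old = np.sum(np.abs(F_old) ** 2, axis=1)
    return float(np.max(np.abs(p_new - p_old)))

def has_converged(D_new, D_old, F_new, F_old, tol_d=1e-6, tol_p=1e-6):
    """Stop only when both differences fall below their (hypothetical) thresholds."""
    return (decoder_change(D_new, D_old) < tol_d
            and antenna_power_change(F_new, F_old) < tol_p)
```

At high transmit power the per-antenna powers settle much more slowly than the MSE, so a rule like this one is typically limited by the power test in that regime. This matches the observation above that insisting on equality in the power constraints drives up the iteration count.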

In Non-CoMP and CBF, the power constraints may not be met with equality by the MMSE solutions (in which case the corresponding Lagrange multipliers are essentially zero). Although the Lagrange multipliers are formulated in this paper using equality power constraints in order to derive explicit expressions for them, the *GIA* can in fact be used with inequality power constraints. When a particular power constraint is not met with equality, the corresponding Lagrange multiplier becomes zero, as required by the complementary slackness condition.

For the *DCOA*, the convergence properties of the *Two-SDP Procedure* and *Numerically Efficient Procedure*, using SDP solvers SeDuMi [36] and Yalmip [37], are shown in Figure 3 for setup 3a (JP-SU) for the per-antenna power constraint for one set of channel realizations. It is found (from observing the convergence rates of the duality gap in (37) and the antenna powers in Figure 3) that the *Numerically Efficient Procedure* converges faster than the *Two-SDP Procedure*.

##### 7.2. Optimality of the *GIA*

This sub-section numerically investigates the equivalence relationships stated in Section 5 and verifies the optimality of the *GIA*. For simplicity, only examples for the per-antenna power constraint are shown. In setup 1a (JP-UL), the MSE curves of the *GIA* and *TCOA* merge in the left subplot of Figure 4: the *GIA* is equivalent to the *TCOA* and yields the globally optimum solution. In setup 2a (JP-DL), on the other hand, the MSE curves of the *GIA* and *DCOA* merge in the right subplot of Figure 4: the *GIA* is equivalent to the *DCOA* and yields the globally optimum solution. Similarly, in setups 3a, 3c, and 3d (JP-SU) (see Figure 5), the MSE curves of all approaches merge: the *GIA* is equivalent to both the *TCOA* and *DCOA* and yields the globally optimum solution.

##### 7.3. Diversity/Multiplexing Tradeoff by the *GIA*

In setups 1a (JP-UL), 2a (JP-DL), and 3a (JP-SU), the *GIA* is able to transmit the same maximum number of data streams as the other proposed approaches. In setups 1b (JP-UL), 2b (JP-DL), and 3b (JP-SU), on the other hand, the *GIA* is also able to transmit fewer data streams, resulting in lower MSE and BER (see the dashed curves in Figures 4 and 5), while the other proposed approaches are not applicable. In other words, unlike the other approaches, the *GIA* is able to provide a tradeoff between multiplexing gain and diversity gain.

##### 7.4. Comparison between the *GIA* and the Approach in [35]

As noted in Section 7.1, the proposed *GIA* can in fact handle inequality power constraints. Hence, both the *GIA* and the approach in [35] are three-step iterative approaches applicable to CBF with the per-transmitter power constraint. The only difference is how the Lagrange multipliers are found: [35] uses a linear search to find the Lagrange multipliers when the equality power constraint is enforced, while the *GIA* uses the more efficient explicit expression (28). In setup 5a (CBF), the MSE (BER) curves of the *GIA* and the approach in [35] merge, as shown in Figure 6. This shows that the *GIA* performs numerically as well as the approach in [35] while being more efficient. Furthermore, the approach in [35] is only applicable with the per-transmitter power constraint, whereas the *GIA* can also deal with the more practical per-antenna power constraint.
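The three-step structure can be illustrated with a minimal single-link sketch. It assumes the model y = H F s + n with E[s s^H] = I and E[n n^H] = I, a linear decoder applied as D^H y, and a single per-transmitter limit tr(F F^H) <= P. The Lagrange multiplier is found here by bisection, in the spirit of the search used in [35], rather than by the explicit expression (28). It is set to zero when the limit is slack, as complementary slackness requires. This is not the paper's multi-transmitter *GIA*, and all function names are hypothetical.

```python
import numpy as np

def mse(H, F, D):
    """Sum MSE of s_hat = D^H (H F s + n) with E[s s^H] = I and E[n n^H] = I."""
    E = D.conj().T @ H @ F
    I = np.eye(F.shape[1])
    C = I - E - E.conj().T + E @ E.conj().T + D.conj().T @ D
    return float(np.real(np.trace(C)))

def mmse_decoder(H, F):
    """Step 1: D = (H F F^H H^H + I)^{-1} H F."""
    G = H @ F
    return np.linalg.solve(G @ G.conj().T + np.eye(H.shape[0]), G)

def precoder_given(H, D, mu):
    """Step 3: F = (M M^H + mu I)^{-1} M with M = H^H D, via the thin SVD of M."""
    M = H.conj().T @ D
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    g = np.divide(s, s ** 2 + mu, out=np.zeros_like(s), where=s > 1e-12)
    return (U * g) @ Vh

def power_limited_precoder(H, D, P, iters=200):
    """Step 2: (approximately) smallest mu >= 0 with tr(F F^H) <= P."""
    def power(mu):
        return float(np.sum(np.abs(precoder_given(H, D, mu)) ** 2))
    if power(0.0) <= P:                 # limit is slack: multiplier is zero
        return precoder_given(H, D, 0.0), 0.0
    lo, hi = 0.0, 1.0
    while power(hi) > P:                # tr(F F^H) decreases monotonically in mu
        hi *= 2.0
    for _ in range(iters):              # plain bisection, keeping power(hi) <= P
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) > P else (lo, mid)
    return precoder_given(H, D, hi), hi

def three_step_mmse(H, n_streams, P, n_iter=50, seed=0):
    """Alternate the three steps from a random full-power starting precoder."""
    rng = np.random.default_rng(seed)
    n_t = H.shape[1]
    F = rng.standard_normal((n_t, n_streams)) + 1j * rng.standard_normal((n_t, n_streams))
    F *= np.sqrt(P / np.sum(np.abs(F) ** 2))
    history, mu = [], 0.0
    for _ in range(n_iter):
        D = mmse_decoder(H, F)                    # step 1
        F, mu = power_limited_precoder(H, D, P)   # steps 2 and 3
        history.append(mse(H, F, D))
    return F, mu, history

rng = np.random.default_rng(1)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
F, mu, history = three_step_mmse(H, n_streams=2, P=10.0)
print(f"final MSE {history[-1]:.4f}, multiplier {mu:.3e}")
```

The decoder step and the precoder step each minimize the MSE exactly over one block of variables; the precoder step does so up to the bisection tolerance. As a result, the recorded MSE cannot increase from one iteration to the next.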

#### 8. Performance Benchmark

As shown in the previous section, the proposed unified approach, the *GIA*, is applicable to all setups. It is optimal when the number of data streams is equal to the rank of the channel, and it provides diversity gain when the number of data streams is less than the rank of the channel (e.g., in setups 1b, 2b, and 3b). In this section, all results are generated using the *GIA* for simplicity, and the performances of the five CoMP configurations are studied. In particular, the impacts of the level of cooperation (Section 8.1), system load (Sections 8.1 and 8.3), system size (Sections 8.2 and 8.3), and severity of the path loss (Section 8.3) on the performance are analyzed and used to derive some guidelines for configuration selection (Section 8.4). All of the MSE and BER results are obtained by averaging over 20 channel realizations; these results are consistent with those obtained by averaging over more channel realizations.

##### 8.1. Impact of the Level of Cooperation and System Load

To understand the impact of different levels of cooperation on the performance of MTMR MIMO systems, we compare the performance of the five configurations. Case A consists of setups 1a (JP-UL), 2a (JP-DL), 3a (JP-SU), 4a (Non-CoMP), and 4a (CBF), and Case B consists of setups 1b (JP-UL), 2b (JP-DL), 3b (JP-SU), 4b (Non-CoMP), and 4b (CBF). For all of the setups in Cases A and B, the total numbers of transmit (receive) antennas are the same, the power constraints are the same, and the distances are the same and chosen so that all of the users are at the cell edge. The difference between the two cases lies in the number of data streams transmitted: all setups in Case A have four data streams transmitted in total (i.e., fully loaded systems), while all setups in Case B have two data streams transmitted in total (i.e., partially loaded systems). Figures 7(a) and 7(b) show the MSE and BER results, respectively.

Before comparing the results of Case A and Case B, let us first compare the individual setups within each case. Firstly, observe that, in both cases, the performance order of the configurations is exactly the same as the order of their levels of cooperation: the performance improves as the level of cooperation increases. Note that the MSE and BER performance order agrees with that of the ergodic sum rate in [22, 23]. Secondly, note that, in both cases, the per-transmitter power constraint in CBF is usually not met with equality for every pair, whereas it always is for Non-CoMP. The reason is quite interesting. In Non-CoMP, each pair designs its precoder and decoder to minimize its own MSE, so there is no reason for any of the pairs to limit their transmit power. In CBF, however, all the pairs jointly design their precoders and decoders to minimize the system-wide MSE, so it may not always be beneficial for all transmitters to transmit at full power, since the mutual interference may be large. Thirdly, note that both the per-transmitter and per-antenna power constraints are usually met with equality for the three JP configurations.

With that done, let us now compare the results of Cases A and B. The first observation is that limiting the numbers of data streams is crucial for the performance. The second observation is that, in Case B, the MSE performances of CBF and the *higher* level of cooperation configurations (JP-UL, JP-DL, and JP-SU) are actually similar at high transmit power. The last observation, somewhat related to the first, is that the performances of Non-CoMP and CBF are much more dependent on the number of data streams than JP-UL, JP-DL, and JP-SU. Comments similar to this last observation are made in [22, 23] for the ergodic sum rate results of JP-DL and CBF with multiple receivers per cell.

The difference in the BERs of Non-CoMP and CBF between the two cases is remarkable and can be explained as follows. Using (2) and (3d), the soft output data at each *eq-*receiver can be decomposed into three terms. The desired term is the receiver's own data passed through its precoder, direct channel, and decoder. The interference term is the other pair's data passed through the other precoder, the cross channel, and this receiver's decoder. The noise term is the decoded noise. Since each of the (square) channel matrices is of full rank with probability 1, their nonsingularity is assumed throughout this explanation.

In Case A, each receiver needs its effective channel (from its input data to its output data) to be of full rank in order to successfully receive its two data streams. But if this effective channel is of full rank for both receivers, the precoders and decoders are all nonsingular, so the effective interference channels (from the other pair's input data to this receiver's output data) are of full rank as well. Thus, the interference and desired signals *cannot* be separated. If the interference is significant, as is likely at the cell edge, the performance suffers greatly. On the other hand, in Case B it is possible for both pairs to successfully receive their data streams and null out the interference. This is because each pair transmits a single data stream. At each receiver, the desired and interfering signals therefore occupy one-dimensional subspaces of a two-dimensional space, and these subspaces are not necessarily the same. In CBF, the precoders can be chosen to steer the interference away from the desired signal, and the decoders can be chosen to sufficiently null out the interference. In Non-CoMP, each pair does not know the interfering channels, but it knows the estimated noise plus interference covariance matrix (see Appendix A) and can therefore design its precoder and decoder based on that matrix. As can be seen, the performance of Non-CoMP is quite good under the per-transmitter power constraint; under the more stringent per-antenna power constraint, however, it is poor.
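The span argument can be checked numerically. The sketch below draws random 2 × 2 channels for two pairs, consistent with the square, nonsingular channels assumed above. It splits each soft output into its desired, interference, and noise terms and then compares the two cases. With one stream per pair, a receive vector orthogonal to the interference direction removes the interference while keeping a nonzero desired term. With two streams per pair and full-rank precoders, the smallest achievable interference gain over all unit-norm receive vectors is the smallest singular value of the effective interference channel, which is positive. The orthogonal (zero-forcing) decoder is an illustrative stand-in, not the MMSE decoder used in the paper, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def cn(*shape):
    """Array of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# H[k][j]: 2 x 2 channel from transmitter j to receiver k (nonsingular w.p. 1)
H = [[cn(2, 2) for _ in range(2)] for _ in range(2)]
F_B = [cn(2), cn(2)]          # Case B: one stream per pair (2 x 1 precoders)
F_A = [cn(2, 2), cn(2, 2)]    # Case A: two streams per pair (2 x 2 precoders)

def null_decoder(u):
    """Unit-norm d with d^H u = 0 for a length-2 interference direction u."""
    d = np.array([np.conj(u[1]), -np.conj(u[0])])
    return d / np.linalg.norm(d)

def case_b_terms(k, s=(1.0, -1.0), noise_std=0.1):
    """Soft output d^H y_k and its desired, interference, and noise terms."""
    j = 1 - k
    d = null_decoder(H[k][j] @ F_B[j])
    n = noise_std * cn(2)
    y = H[k][k] @ F_B[k] * s[k] + H[k][j] @ F_B[j] * s[j] + n
    desired = np.vdot(d, H[k][k] @ F_B[k]) * s[k]
    interference = np.vdot(d, H[k][j] @ F_B[j]) * s[j]
    noise = np.vdot(d, n)
    return np.vdot(d, y), desired, interference, noise

def case_a_min_leakage(k):
    """Smallest ||d^H H_kj F_j|| over unit-norm d; positive means no nulling."""
    j = 1 - k
    return np.linalg.svd(H[k][j] @ F_A[j], compute_uv=False)[-1]

for k in range(2):
    out, des, itf, noi = case_b_terms(k)
    print(f"pair {k}: Case B |interference| = {abs(itf):.1e}, |desired| = {abs(des):.2f}; "
          f"Case A min leakage = {case_a_min_leakage(k):.2f}")
```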

##### 8.2. Impact of System Size (the Number of Transmitter Receiver Pairs)

To gain some understanding of what happens when the number of transmitter receiver pairs increases, we consider five different setups in Table 1: 4b (Non-CoMP), 5a (Non-CoMP), 4b (CBF), 5a (CBF), and 5b (CBF). For convenience, the distances are chosen so that all users are at the cell edge. Figure 8 shows the resulting MSEs and BERs. Note that the maximum antenna power is the same in all of the setups. The normalized MSE shown in Figure 8 is defined as the average MSE per data stream.

Firstly, we compare the results of CBF setups 4b, 5a, and 5b to see the performance degradation when more transmitter receiver pairs join the wireless environment. Taking setup 4b (CBF) as the baseline, setups 5a (CBF) and 5b (CBF) show losses of 2–5 dB and 7–14 dB, respectively, in the normalized MSE. In addition, the BER curves of setups 5a (CBF) and 5b (CBF) have smaller diversity gains (absolute values of the slopes) than that of setup 4b (CBF). However, more data streams are transmitted in setups 5a and 5b.
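The slope comparison can be made quantitative by fitting a straight line to log10(BER) against log10(SNR) over the high-SNR part of a curve. The sketch below is a generic estimator, not the procedure used to produce Figure 8, and the synthetic BER curve in it is hypothetical.

```python
import numpy as np

def diversity_order(snr_db, ber):
    """Least-squares estimate of -d log10(BER) / d log10(SNR) over the given points."""
    x = np.asarray(snr_db, dtype=float) / 10.0   # log10 of the linear SNR
    y = np.log10(np.asarray(ber, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return -slope

# Synthetic check: BER = 0.5 * SNR^(-2) has diversity order 2.
snr_db = np.array([10.0, 15.0, 20.0, 25.0])
ber = 0.5 * (10 ** (snr_db / 10)) ** -2.0
print(round(diversity_order(snr_db, ber), 6))   # 2.0
```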

How does CBF handle the larger systems of setups 5a and 5b when each node has only 2 antennas? Does it perform IA, that is, do its precoders and decoders satisfy the IA conditions [9–12, 38] (zero interference at every decoder output and a full-rank desired effective channel for every pair)? MMSE designs are more general than IA, because IA is not always feasible and does not take arbitrary noise covariance matrices into account. Even so, the MMSE design is seen, at times, to exhibit IA-like features. That is, the MMSE design steers the projections of the interference so that they lie *predominantly* in a subspace not containing the projection of the desired signal. As is to be expected, the MMSE decoders take into account both the noise and the interference, rather than always nulling out the interference as the IA conditions would dictate. In addition, better IA is generally achieved at higher transmit SNRs because the noise becomes less significant. Furthermore, our MMSE design is seen to support more transmitter receiver pairs than the upper bound for IA designs in [38].
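The two IA conditions can be written as numerical checks, which gives one way to quantify how IA-like a given set of precoders and decoders is. The leakage measure, the tolerance, and the function names are illustrative assumptions rather than quantities reported in the paper. Here H[k][j] denotes the channel from transmitter j to receiver k, F[j] a precoder, and D[k] a decoder applied as D[k]^H y.

```python
import numpy as np

def interference_leakage(H, F, D):
    """Total leakage sum_{k != j} ||D_k^H H_kj F_j||_F^2 (zero under perfect IA)."""
    K = len(F)
    return float(sum(np.linalg.norm(D[k].conj().T @ H[k][j] @ F[j], "fro") ** 2
                     for k in range(K) for j in range(K) if j != k))

def desired_full_rank(H, F, D, tol=1e-9):
    """Second IA condition: rank(D_k^H H_kk F_k) equals the stream count of pair k."""
    for k in range(len(F)):
        s = np.linalg.svd(D[k].conj().T @ H[k][k] @ F[k], compute_uv=False)
        if np.sum(s > tol) != F[k].shape[1]:
            return False
    return True

def satisfies_ia(H, F, D, tol=1e-9):
    """Both IA conditions, up to a numerical tolerance."""
    return interference_leakage(H, F, D) < tol and desired_full_rank(H, F, D, tol)
```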

Secondly, we compare Non-CoMP and CBF to see how important joint system-wide transceiver design is for systems with more than 2 transmitter receiver pairs. In terms of BER, under the per-transmitter power constraint, the best Non-CoMP curve (that of setup 4b (Non-CoMP)) has only a 1 dB gain over the worst of the CBF curves. This holds even though only 2 transmitter receiver pairs are communicating in setup 4b (Non-CoMP), as opposed to 4 in setup 5b (CBF). Under the per-antenna power constraint, *all* of the CBF BER curves are better than the *best* Non-CoMP one. Furthermore, the performance for setup 5a (Non-CoMP) is very poor. Thus, joint system-wide transceiver design can greatly help systems with multiple transmitter receiver pairs by mitigating intercell interference.

##### 8.3. Impact of the Path Loss

Firstly, using Cases A and B (as defined in Section 8.1), the system performance of all five CoMP configurations under different path losses and system loads is studied. To this end, the cross-link distance is varied between 1 and 4, while the direct-link distance remains normalized to 1, as always. Figures 9(a) and 9(b) show, respectively, the MSE and BER results against the cross-link distance.

In both Cases A and B, as the cross-link distance gets larger, the performances of Non-CoMP and CBF improve while those of JP-UL, JP-DL, and JP-SU worsen. This is because the cross links correspond to interference channels (channels which do not carry desired data) in Non-CoMP and CBF, but to desired channels (channels which can carry desired data) in JP-UL, JP-DL, and JP-SU. As the cross-link distance increases, the path losses of the interference channels increase for Non-CoMP and CBF, and the path losses of some of the desired channels increase for JP-UL, JP-DL, and JP-SU. In fact, the MSE performances of the five configurations eventually merge when the cross-link distance is large, because the system then essentially consists of two independent, interference-free transmitter-receiver pairs. It is remarkable that this merging can already be seen within the simulated range in both Case A and Case B. It is also remarkable (but expected) that the same merging of JP-DL and CBF is seen with ergodic sum rates in [22, 23].

Secondly, using the five setups (4b (Non-CoMP), 5a (Non-CoMP), 4b (CBF), 5a (CBF), and 5b (CBF)) employed in Section 8.2, further path loss studies are conducted for Non-CoMP and CBF with respect to different system sizes. With all cross-link distances set equal, Figure 10 shows the MSE and BER results against the cross-link distance. As this distance gets larger, the performances of the setups clearly improve and merge together. This is because the cross links correspond to the interference channels for both Non-CoMP and CBF. As the cross-link distance increases, both the inter-pair interference and the importance of joint design across the pairs decrease.

##### 8.4. Guidelines for Configuration Selection

The purpose of this sub-section is to gain some understanding of when each configuration should be used. This understanding also helps to determine the CSI feedback and data sharing requirements, since different CoMP configurations require different levels of CSI feedback and data sharing. For example, if, based on the BER performance, only Non-CoMP is needed, a downlink user only needs to feed back the desired channel and the inter-cluster interference covariance matrix, not the intercell channels.

To this end, consider the following example: there are two transmitters and two receivers, and the MMSE design of their precoders and decoders is subject to the per-transmitter power constraint. Given a desired BER threshold, when should JP-UL, JP-DL, JP-SU, Non-CoMP, and CBF be used?

Looking at Figures 9(a) and 9(b), it is surprising but clear that, for Case B (partially loaded systems), Non-CoMP should always be used, even at the cell edge. (Note, though, that for the per-antenna power constraint the performance of Non-CoMP is only marginally acceptable at the cell edge.) Non-CoMP is good enough; the other configurations, with their greater network overheads (e.g., information exchange and synchronization), are not needed. For Case A (fully loaded systems), on the other hand, which configuration should be used depends on the cross-link distance. For small enough distances, that is, for a cell edge type scenario, either JP-UL or JP-DL should be used, because the interference is too strong for Non-CoMP and CBF. For larger distances, however, Non-CoMP should be used. As for JP-SU, it is remarkable that, in both Cases A and B, it has no significant performance advantage over JP-UL and JP-DL and is not needed here.

Looking at Figure 10, it is clear that CBF should be used when there are a few transmitter receiver pairs, all at the cell edge, each wanting 1 data stream. In that case, CBF’s interference management capabilities enable it to satisfy the BER threshold when Non-CoMP cannot. It is also clear that, for any number of transmitter receiver pairs, there is a cross-link distance beyond which Non-CoMP is good enough and should be employed.
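The selection logic used in this sub-section can be written as a small helper: pick the least cooperative configuration that meets the BER threshold, since more cooperation brings more network overhead. The overhead ranking and the BER numbers below are hypothetical placeholders, not values read from Figures 9 or 10.

```python
# Hypothetical overhead ranking, least to most cooperation. JP-UL and JP-DL are
# placed in an arbitrary order; they are not ranked against each other here.
OVERHEAD_ORDER = ["Non-CoMP", "CBF", "JP-UL", "JP-DL", "JP-SU"]

def select_configuration(ber, threshold, order=OVERHEAD_ORDER):
    """Return the lowest-overhead configuration whose BER meets the threshold,
    or None if none of them does. ber maps configuration name -> BER."""
    for name in order:
        if name in ber and ber[name] <= threshold:
            return name
    return None

# Hypothetical BERs at one operating point (not values read from the figures).
example = {"Non-CoMP": 2e-2, "CBF": 8e-3, "JP-UL": 4e-4, "JP-DL": 3e-4, "JP-SU": 2e-4}
print(select_configuration(example, 1e-3))   # JP-UL
```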

#### 9. Conclusion

To develop a practical CoMP technology for future cellular systems, there are two crucial needs: a performance benchmark and a unified approach for different CoMP configurations. To meet the need for a performance benchmark, joint MMSE transceiver designs of various CoMP configurations are considered. The joint MMSE design is nearly optimum in maximizing the sum rate. The MSE and BER performances of five CoMP systems (JP-SU, JP-DL, JP-UL, CBF, Non-CoMP) under various levels of cooperation, system loads, system sizes, and path losses are investigated thoroughly, and guidelines for CoMP configuration selection are then established. To meet the need for a unified approach, the *GIA* is proposed for performing joint MMSE transceiver designs for general MTMR MIMO systems subject to general linear power constraints. In addition, the optimum *DCOA* for downlink is developed to validate the optimality of the *GIA* results when applicable. Remarkably, the *GIA* is shown to be equivalent to the *TCOA* when each of them converges and the transmit covariance matrices obtained from them are of full rank. They are also shown to be equivalent to the *DCOA* when each of them converges and the decoder covariance matrices obtained from them are of full rank. This means that the *GIA* gives globally optimum results under the above-mentioned special conditions. The convergence properties of the proposed approaches, and the optimality and diversity/multiplexing tradeoff of the *GIA*, are verified numerically.

The performance analysis of the five CoMP configurations is conducted using the *GIA* to provide physical insights and performance benchmarks. Firstly, in the cell edge scenario, it is found that the higher the level of cooperation, the better the performance. In fact, JP-UL and JP-DL achieve essentially the same performance as JP-SU. Note that the CBF and Non-CoMP schemes considered in this paper give the achievable performance upper bounds for their respective categories, given the same total number of transmit antennas and the same total number of receive antennas.

Secondly, in the cell edge scenario, it is found that the performances of Non-CoMP and CBF are much more dependent on the number of data streams than those of JP-UL, JP-DL, and JP-SU. When the system is fully loaded, both Non-CoMP and CBF suffer severe interference and thus perform poorly. However, for a partially loaded system with two transmitter receiver pairs, CBF is able to give good performance under both the per-transmitter and per-antenna power constraints. Non-CoMP also performs well, but only under the per-transmitter power constraint (the per-antenna power constraint turns out to be too stringent for it). Thirdly, CBF is able to handle even more than two transmitter receiver pairs because of its superior interference management capabilities (such as its ability to perform IA-like maneuvers). Moreover, it can actually support more pairs than the upper bound for IA designs in [38]. Fourthly, it is found that the per-transmitter power constraint in the CBF configuration is usually not met with equality for every pair, whereas it always is in the Non-CoMP configuration. This phenomenon is due to the following: (a) in Non-CoMP, each pair cares only about its own MSE, while, in CBF, each pair cares about the system-wide MSE; and (b) increasing the power of a pair is always good for the MSE of that pair but not necessarily for the MSE of the entire system. Fifthly, for a given system, as the path loss of the channels corresponding to the interfering links of Non-CoMP and CBF increases, interesting trends are observed: the performances of CBF and Non-CoMP improve greatly, whereas those of JP-UL, JP-DL, and JP-SU worsen. In fact, the MSE performances of the five configurations eventually merge.

In addition to producing these findings, these simulations numerically put forth performance benchmarks for the JP, CBF, and Non-CoMP categories; in fact, because JP-SU is included, performance benchmarks are given for all CoMP configurations. Moreover, due to the use of the MMSE criterion, benchmarks are put forth for transceiver designs under other criteria as well (such as maximum capacity and minimum BER). These simulations also provide some guidelines for configuration selection.

These performance benchmarks and guidelines are produced under ideal conditions. For example, the synchronization requirements of the configurations are not taken into account, and neither are the modulation and coding scheme (MCS) selection and CSI errors. Even so, they can be used to greatly simplify the complex configuration selection problem under practical conditions, because they show which schemes do or do not need to be considered in a particular scenario. Take, for example, the typical downlink system with two BS-user pairs and the users at the cell edge. In the partially loaded case, it is clear from this paper that Non-CoMP and CBF should be considered first. In the fully loaded case, it is even simpler: JP-DL should be considered first. After such large reductions in scope, accounting for the various parameters (MCS, limited feedback, etc.) becomes much more manageable. Furthermore, one can use the guidelines to choose the CSI feedback and data sharing schemes, since different CoMP configurations require different levels of CSI feedback and data sharing. For example, in one of our papers, we demonstrate a practical scheme for decentralized CBF in TDD systems [39].

#### Appendices

#### A. Noise Plus Interference Covariance Matrix in Non-CoMP

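Since $E[\mathbf{H}\mathbf{A}\mathbf{H}^H] = \mathrm{tr}(\mathbf{A})\,\mathbf{I}$ for any deterministic matrix $\mathbf{A}$ when $\mathbf{H}$ has i.i.d. zero-mean, unit-variance entries, the noise plus interference covariance matrix for a given *eq-*receiver in Non-CoMP can be expressed as (A.1).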
If each transmitter transmits with full power, the trace in (A.1) can be replaced by the corresponding maximum transmit power, and the resulting expression (A.2) is exact.
Note that even when there is receive spatial correlation (not considered in (1)), (A.2) still holds. When some transmitters do not transmit with full power, (A.2) is a “worst case” approximation and is still used for the design in this paper.
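The identity behind (A.1) and the full-power simplification in (A.2) can be checked numerically. For illustration, the sketch assumes channel matrices with i.i.d. CN(0, 1) entries, a large-scale gain beta_j for each interferer, and an identity noise covariance. The gain values and function names are hypothetical, and the paper's path loss model (1) is not reproduced.

```python
import numpy as np

def expected_HAHh(A, n_r, trials=20000, seed=0):
    """Monte Carlo estimate of E[H A H^H] for H (n_r x n_t) with i.i.d. CN(0, 1) entries."""
    rng = np.random.default_rng(seed)
    n_t = A.shape[0]
    H = (rng.standard_normal((trials, n_r, n_t))
         + 1j * rng.standard_normal((trials, n_r, n_t))) / np.sqrt(2)
    return np.mean(H @ A @ np.conj(np.transpose(H, (0, 2, 1))), axis=0)

def full_power_covariance(n_r, betas, powers):
    """Full-power (worst-case) form: I + sum_j beta_j * P_j * I, using tr(F_j F_j^H) = P_j.
    beta_j is a hypothetical large-scale gain of interferer j."""
    return (1.0 + sum(b * p for b, p in zip(betas, powers))) * np.eye(n_r)

F = np.array([[1.0, 0.5j], [0.2, 0.8]])
A = F @ F.conj().T                            # tr(A) = 1.93
print(np.round(expected_HAHh(A, n_r=2), 2))   # close to 1.93 * I
```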

#### B. Alternative Approach to the MMSE Transceiver Design of Non-CoMP under the Per-Antenna Power Constraint

For Non-CoMP with one data stream, this appendix shows a different approach to the MMSE transceiver design problem subject to the per-antenna power constraint. Without loss of generality, consider the th *eq-*transmitter *eq-*receiver pair and let for all . Given the MMSE decoder (18), the reduced MMSE problem can be written as
or equivalently,
subject to (12). Here, is the th element of the nonnegative definite Hermitian matrix . Expressing in polar form,
the original problem is further reduced to
A closed-form solution to (B.4) can easily be obtained for two transmit antennas. For more than two antennas, however, one generally needs a numerical solver for nonlinear equations.

Let = 2 and express . Then,
If , is maximized if and only if
for some integer . If , is maximized if and only if = = 1. It is remarkable that, in this case, optimality happens *only* when the equality in the per-antenna power constraint in (12) is met.
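For two antennas, the phase-alignment solution can be checked against a brute-force search. The sketch assumes the reduced problem is the maximization of f^H A f over precoders f with |f_i|^2 <= P_i, where A is a nonnegative definite Hermitian matrix. One example is A = H^H R^{-1} H, in which case the single-stream MMSE is 1/(1 + f^H A f). Writing A_12 = |A_12| e^{jφ} and f_i = |f_i| e^{jθ_i}, the cross term is 2|f_1||f_2||A_12| cos(θ_2 - θ_1 + φ). The maximum therefore uses full power on both antennas with θ_1 - θ_2 = φ + 2πn. This form and all names are assumptions for illustration; the paper's (B.1)–(B.4) are not reproduced.

```python
import numpy as np

def closed_form_two_antenna(A, P1, P2):
    """Maximize f^H A f (A Hermitian, nonnegative definite) s.t. |f_1|^2 <= P1, |f_2|^2 <= P2:
    full power on both antennas, with theta_1 - theta_2 = angle(A[0, 1])."""
    phi = np.angle(A[0, 1])
    return np.array([np.sqrt(P1) * np.exp(1j * phi), np.sqrt(P2) + 0j])

def quad(A, f):
    """f^H A f (real for Hermitian A)."""
    return float(np.real(np.vdot(f, A @ f)))

def grid_search(A, P1, P2, n_mag=11, n_phase=61):
    """Brute-force check over antenna magnitudes and the relative phase."""
    best = -np.inf
    for r1 in np.linspace(0.0, 1.0, n_mag):
        for r2 in np.linspace(0.0, 1.0, n_mag):
            for dth in np.linspace(-np.pi, np.pi, n_phase):
                f = np.array([r1 * np.sqrt(P1) * np.exp(1j * dth), r2 * np.sqrt(P2)])
                best = max(best, quad(A, f))
    return best

rng = np.random.default_rng(3)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
A = H.conj().T @ H                      # e.g. H^H R^{-1} H with R = I (assumption)
f = closed_form_two_antenna(A, 1.0, 2.0)
print(quad(A, f) >= grid_search(A, 1.0, 2.0) - 1e-9)   # True
print("single-stream MSE:", 1.0 / (1.0 + quad(A, f)))
```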

#### Acknowledgment

Note that different parts of the work have been published in our conference papers [40–47].

#### References

- “Further advancements for E-UTRA,” 3GPP TR36.814, 2009.
- F. Zheng, M. Wu, and H. Lu, “Coordinated multi-point transmission and reception for LTE-advanced,” in *Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '09)*, September 2009.
- S. Parkvall, E. Dahlman, A. Furuskär et al., “LTE-Advanced - Evolving LTE towards IMT-Advanced,” in *Proceedings of the 68th Semi-Annual IEEE Vehicular Technology Conference (VTC '08)*, September 2008.
- R. Irmer, H. Droste, P. Marsch et al., “Coordinated multipoint: concepts, performance, and field trial results,” *IEEE Communications Magazine*, vol. 49, no. 2, pp. 102–111, 2011.
- S. Catreux, P. F. Driessen, and L. J. Greenstein, “Simulation results for an interference-limited multiple-input multiple-output cellular system,” *IEEE Communications Letters*, vol. 4, no. 11, pp. 334–336, 2000.
- R. S. Blum, J. H. Winters, and N. R. Sollenberger, “On the capacity of cellular systems with MIMO,” *IEEE Communications Letters*, vol. 6, no. 6, pp. 242–244, 2002.
- H. Dai and H. V. Poor, “Asymptotic spectral efficiency of multicell MIMO systems with frequency-flat fading,” *IEEE Transactions on Signal Processing*, vol. 51, no. 11, pp. 2976–2988, 2003.
- M. Chiani, M. Z. Win, and H. Shin, “MIMO networks: the effects of interference,” *IEEE Transactions on Information Theory*, vol. 56, no. 1, pp. 336–349, 2010.
- V. R. Cadambe and S. A. Jafar, “Interference alignment and degrees of freedom of the K-user interference channel,” *IEEE Transactions on Information Theory*, vol. 54, no. 8, pp. 3425–3441, 2008.
- K. Gomadam, V. R. Cadambe, and S. A. Jafar, “Approaching the capacity of wireless networks through distributed interference alignment,” in *Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '08)*, pp. 4260–4265, December 2008.
- A. Özgür and D. Tse, “Achieving linear scaling with interference alignment,” in *Proceedings of the IEEE International Symposium on Information Theory (ISIT '09)*, pp. 1754–1758, July 2009.
- R. Tresch, M. Guillaud, and E. Riegler, “On the achievability of interference alignment in the K-user constant MIMO interference channel,” in
*Proceedings of the IEEE/SP 15th Workshop on Statistical Signal Processing (SSP '09)*, pp. 277–280, September 2009. View at Publisher · View at Google Scholar · View at Scopus - C. B. Chae, S. H. Kim, and R. W. Heath, “Linear network coordinated beamforming for cell-boundary users,” in
*Proceedings of the IEEE 10th Workshop on Signal Processing Advances in Wireless Communications (SPAWC '09)*, pp. 534–538, June 2009. View at Publisher · View at Google Scholar · View at Scopus - H. Dahrouj and W. Yu, “Coordinated beamforming for the multi-cell multi-antenna wireless system,” in
*Proceedings of the 42nd Annual Conference on Information Sciences and Systems (CISS '08)*, pp. 429–434, March 2008. View at Publisher · View at Google Scholar · View at Scopus - R. Zakhour, Z. K. M. Ho, and D. Gesbert, “Distributed beamforming coordination in multicell MIMO channels,” in
*Proceedings of the IEEE 69th Vehicular Technology Conference (VTC '09)*, April 2009. View at Publisher · View at Google Scholar · View at Scopus - H. Dai, A. F. Molisch, and H. V. Poor, “Downlink capacity of interference-limited MIMO systems with joint detection,”
*IEEE Transactions on Wireless Communications*, vol. 3, no. 2, pp. 442–453, 2004. View at Publisher · View at Google Scholar · View at Scopus - W. Qixing, J. Dajie, L. Guangyi, and Y. Zhigang, “Coordinated multiple points transmission for LTE-advanced systems,” in
*Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '09)*, September 2009. View at Publisher · View at Google Scholar · View at Scopus - S. Shi, M. Schubert, N. Vucic, and H. Boche, “MMSE optimization with per-base-station power constraints for network MIMO systems,” in
*Proceedings of the IEEE International Conference on Communications (ICC '08)*, pp. 4106–4110, May 2008. View at Publisher · View at Google Scholar · View at Scopus - J. Zhang, R. Chen, J. G. Andrews, and R. W. Heath, “Coordinated multi-cell MIMO systems with cellular block diagonalization,” in
*Proceedings of the 41st Asilomar Conference on Signals, Systems and Computers (ACSSC '07)*, pp. 1669–1673, November 2007. View at Publisher · View at Google Scholar · View at Scopus - J. Dajie, W. Qixing, L. Jianjun, L. Guangyi, and C. Chunfeng, “Uplink coordinated multi-point reception for LTE-advanced systems,” in
*Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '09)*, September 2009. View at Publisher · View at Google Scholar · View at Scopus - A. Tölli, M. Codreanu, and M. Juntti, “Linear multiuser MIMO transceiver optimization in cooperative networks,” in
*Proceedings of the 2nd International Conference on Communications and Networking in China (ChinaCom '07)*, pp. 513–517, August 2007. View at Publisher · View at Google Scholar · View at Scopus - A. Tölli, H. Pennanen, and P. Komulainen, “On the value of coherent and coordinated multi-cell transmission,” in
*Proceedings of the IEEE International Conference on Communications Workshops (ICC '09)*, June 2009. View at Publisher · View at Google Scholar · View at Scopus - A. Tölli, H. Pennanen, and P. Komulainen, “SINR balancing with coordinated multi-cell transmission,” in
*Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '09)*, April 2009. View at Publisher · View at Google Scholar · View at Scopus - M. Boldi, A. Tölli, M. Olsson et al., “Coordinated MultiPoint (CoMP) Systems,” in
*Mobile and Wireless Communications for IMT-Advanced and Beyond*, A. Osseiran, J. F. Monserrat, and W. Mohr, Eds., pp. 121–155, John Wiley & Sons, Chichester, UK, 2011. View at Google Scholar - D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint Tx-Rx beamforming design for multicarrier MIMO channels: a unified framework for convex optimization,”
*IEEE Transactions on Signal Processing*, vol. 51, no. 9, pp. 2381–2401, 2003. View at Publisher · View at Google Scholar · View at Scopus - A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath, “Optimal designs for space-time linear precoders and decoders,”
*IEEE Transactions on Signal Processing*, vol. 50, no. 5, pp. 1051–1064, 2002. View at Publisher · View at Google Scholar · View at Scopus - C.-C. Weng and P. P. Vaidyanathan, “MIMO transceiver optimization with linear constraints on transmitted signal covariance components,”
*IEEE Transactions on Signal Processing*, vol. 58, no. 1, pp. 458–462, 2010. View at Publisher · View at Google Scholar · View at Scopus - D. P. Palomar, “Unified framework for linear MIMO transceivers with shaping constraints,”
*IEEE Communications Letters*, vol. 8, no. 12, pp. 697–699, 2004. View at Publisher · View at Google Scholar · View at Scopus - S. Serbetli and A. Yener, “Transceiver optimization for multiuser MIMO systems,”
*IEEE Transactions on Signal Processing*, vol. 52, no. 1, pp. 214–226, 2004. View at Publisher · View at Google Scholar · View at Scopus - Z. Q. Luo, T. N. Davidson, G. B. Giannakis, and K. M. Wong, “Transceiver optimization for block-based multiple access through ISI channels,”
*IEEE Transactions on Signal Processing*, vol. 52, no. 4, pp. 1037–1052, 2004. View at Publisher · View at Google Scholar · View at Scopus - J. Zhang, Y. Wu, S. Zhou, and J. Wang, “Joint linear transmitter and receiver design for the downlink of multiuser MIMO systems,”
*IEEE Communications Letters*, vol. 9, no. 11, pp. 991–993, 2005. View at Publisher · View at Google Scholar · View at Scopus - M. Schubert, S. Shi, E. A. Jorswieck, and H. Boche, “Downlink sum-MSB transceiver optimization for linear multi-user MIMO systems,” in
*Proceedings of the 39th Asilomar Conference on Signals, Systems and Computers*, pp. 1424–1428, November 2005. View at Scopus - G. Zheng, T.-S. Ng, and K. K. Wong, “Optimal beamforming for sum-MSE minimization in MIMO downlink channels,” in
*Proceedings of the IEEE 63rd Vehicular Technology Conference (VTC '06)*, pp. 1830–1834, July 2006. View at Scopus - A. J. Tenenbaum and R. S. Adve, “Minimizing sum-MSE implies identical downlink and dual uplink power allocations,”
*IEEE Transactions on Communications*, vol. 59, no. 3, pp. 686–688, 2011. View at Publisher · View at Google Scholar · View at Scopus - S. W. Peters and R. W. Heath, “Cooperative algorithms for MIMO interference channels,”
*IEEE Transactions on Vehicular Technology*, vol. 60, no. 1, pp. 206–218, 2011. View at Publisher · View at Google Scholar · View at Scopus - J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,”
*Optimization Methods and Software*, vol. 11, no. 1, pp. 625–653, 1999. View at Google Scholar · View at Scopus - J. Lofberg, “YALMIP: a toolbox for modeling and optimization in MATLAB,” in
*Proceedings of the IEEE International Symposium on Computer-Aided Control System Design (CACSD '04)*, Taipei, Taiwan, 2004. - C. M. Yetis, S. A. Jafar, and A. H. Kayran, “Feasibility conditions for interference alignment,”
*IEEE Transactions on Signal Processing*, vol. 58, no. 9, pp. 4771–4782, 2010. View at Google Scholar - E. Lu and I-T. Lu, “Practical decentralized high-performance coordinated beamforming,” in
*Proceedings of the 34th IEEE Sarnoff Symposium, (SARNOFF '11)*, May 2011. View at Publisher · View at Google Scholar · View at Scopus - I-T. Lu, “Joint MMSE precoder and decoder design for downlink multiuser MIMO systems with arbitrary transmit power constraints,” in
*Proceedings of the IEEE Sarnoff Symposium (SARNOFF '09)*, April 2009. View at Publisher · View at Google Scholar · View at Scopus - I-T. Lu, “Joint MMSE precoder and decoder design subject to arbitrary power constraints for uplink multiuser MIMO systems,” in
*Proceedings of the IEEE 70th Vehicular Technology Conference Fall (VTC '09)*, September 2009. View at Publisher · View at Google Scholar · View at Scopus - I-T. Lu, J. Li, and E. Lu, “Novel MMSE precoder and decoder designs subject to per-antenna power constraint for uplink multiuser MIMO systems,” in
*Proceedings of the 3rd International Conference on Signal Processing and Communication Systems (ICSPCS'09)*, September 2009. View at Publisher · View at Google Scholar · View at Scopus - J. Li, I-T. Lu, and E. Lu, “Optimum mmse transceiver designs for the downlink of multicell mimo systems,” in
*Proceedings of the IEEE Military Communications Conference (MILCOM '09)*, October 2009. View at Publisher · View at Google Scholar · View at Scopus - J. Li, I-T. Lu, and E. Lu, “Unified framework and MMSE transceiver designs for multiple-transmitter- multiple-receiver MIMO systems,” in
*Proceedings of the 33rd IEEE Sarnoff Symposium*, April 2010. View at Publisher · View at Google Scholar · View at Scopus - E. Lu, J. Li, and I-T. Lu, “Comparison of coordinated beamforming and non-coordinated multipoint using MMSE transceiver designs,” in
*Proceedings of the 33rd IEEE Sarnoff Symposium*, April 2010. View at Publisher · View at Google Scholar · View at Scopus - J. Li, I-T. Lu, and E. Lu, “Novel MMSE precoder and decoder designs for single-user MIMO systems under general power constraints,” in
*Proceedings of the IEEE 71st Vehicular Technology Conference (VTC '10)*, May 2010. View at Publisher · View at Google Scholar · View at Scopus - J. Li, E. Lu, and I-T. Lu, “Performance benchmark for network MIMO systems: a unified approach for mmse transceiver design and performance analysis,” in
*Proceedings of the 53rd IEEE Global Communications Conference (GLOBECOM '10)*, December 2010. View at Publisher · View at Google Scholar · View at Scopus