Abstract

Heterogeneous networks (HetNets) can increase network capacity by complementing the macro base station with low-power nodes, in response to the ongoing exponential growth in data traffic demand. However, HetNets pose unprecedented challenges in planning, optimization, and maintenance; in particular, activities such as cell outage detection and mitigation are labor-intensive and costly. One potential solution is the self-organizing network (SON), which has attracted extensive attention. This paper focuses on cell outage detection and compensation methods in two-tier HetNets where macrocells and picocells coexist. A K-nearest neighbor (KNN) classification algorithm is employed to detect cell outages automatically. Since users of a failed picocell can recover their service from the overlapping macrocell via vertical handover, performance compensation is executed only for a failed macrocell. Power adjustment on each resource block is carried out via a Lagrange optimization algorithm to compensate the failed cell. Extensive numerical experiments show that, with the proposed scheme, the outage cells are detected successfully and the average cell throughput in the outage macrocell area recovers to 91.4% of its preoutage level when one third of the transmission time is used as compensation timeslots.

1. Introduction

The proliferation of traffic consumption, stimulated by a new generation of wireless devices, urges network operators to achieve dramatic capacity enhancement. To meet this overwhelming requirement cost-effectively, cellular network deployment is undergoing a paradigm shift towards heterogeneity, creating what are referred to as heterogeneous networks (HetNets) by increasing node density with low-power nodes (LPNs), such as pico-, femto-, and relay nodes [1]. These newly introduced LPNs usually provide small coverage with transmit power ranging from 250 mW to 2 W, in sharp contrast to high-power macronodes, which transmit at about 40 W. In addition, they are expected to be densely deployed in a targeted manner [2]. Femtocells are intended for indoor use with restricted association, picocells are popular for hotspot coverage, and relay nodes can be deployed where wired backhaul is not available. Such dynamic network topologies and the intense interactions between heterogeneous LPNs and the macrocell they attach to greatly complicate network operation and maintenance. Traditional fault management is a manual process, and growing network sizes and complexity make it unrealistic for humans to analyze such large amounts of information. Hence, to minimize human involvement and maximize maintenance efficiency, an intelligent network that automates most network procedures is desirable [3].

Self-organizing network (SON) is one of the most promising paradigms for future networks, proposed by the Next Generation Mobile Networks (NGMN) Alliance and the Third Generation Partnership Project (3GPP) in 2008 [3]. The principal objective of SON is to substantially reduce operational and capital expenditure by means of self-configuration, self-optimization, and self-healing functionalities [4], which minimize human involvement and improve service quality with lower investment. These advantages make SON attractive for HetNets, enabling operators to streamline operational activities and improve overall maintenance efficiency. Considerable research on self-configuration and self-optimization has been conducted recently, and a set of use cases including automatic physical cell identifier assignment, mobility management, and energy saving has been addressed in HetNets [4]. The study of self-healing in the context of HetNets, however, remains an open issue.

Self-healing is intended to handle outage cells, where a cell outage refers to the total loss of radio service in a cell's coverage area resulting from hardware/software failures or other functional faults [5]. Self-healing can be divided into two parts: cell outage detection (COD) and cell outage compensation (COC). The former detects and locates potential faults, and the latter alleviates the resulting performance degradation. Instead of relying on highly experienced staff, automated detection now depends on data mining or mathematical statistics [6, 7], which can transform large raw databases into meaningful information and identify possible faults. For performance compensation, optimization theory is applied to radio parameters of the surrounding cells, such as antenna tilt and cell transmit power [8]. It is worth noting that the capacity/coverage offered to the outage area should be retained as much as possible, without significantly affecting the neighboring cells [9].

To address the self-healing problem, most, if not all, conventional research works treat cell outage detection and cell outage compensation separately, and most proposed methods are designed for homogeneous networks in which only macrocells are considered. Setting aside the communication overhead incurred by dense LPN deployments, some centralized statistical analyses, such as [6], may be applicable to HetNets. However, the small coverage of LPNs sometimes leaves only sparse user statistics available, and the approach in [6] falls short because it considers only event-triggered measurements; in the worst case, the gathered measurements may be caused by momentary severe shadow fading. Therefore, self-healing schemes for homogeneous networks usually cannot be applied directly to HetNets. In this context, this paper focuses on the design of a self-healing mechanism for HetNets that addresses the aforementioned challenges. Only a few studies have been devoted to this area. For autonomous femtocell outage detection, a cooperative outage detection architecture for femto-macro cellular networks is proposed in [10]; it designs a trigger decision by investigating intracell correlations of RSRP (reference signal received power) statistics in the space domain and detects outage femtocells by extracting correlations of intercell RSRP statistics in both the space and time domains. As for mitigation, [11] proposes a cooperative resource allocation algorithm for subchannel and power resources based on cooperation among femtocells.

In this paper, we consider a systematic self-healing scheme in a two-tier macropico network, consisting of a detection stage and a compensation stage. In the detection stage, inspired by [12], K-nearest neighbor (KNN) classification is adopted to detect anomalous macrocells and picocells once the collected periodical measurements start to resemble the previously known radio link failure (RLF) samples. The F-measurement is then employed to further assess detection accuracy. In the compensation stage, only the outage macrocell is considered, since users in an outage picocell can hand over to an available macrocell. First, the unoccupied spectrum resources that once belonged to the outage macrocell are reallocated to users by the compensation picocells lying in the outage area. Power adjustment on each resource block (RB) is then carried out via a Lagrange optimization algorithm. Meanwhile, compensation gains in terms of average throughput per user and per cell are investigated.

The remainder of this paper is organized as follows. Section 2 presents the system model for achievable self-healing mechanism. Section 3 includes a detailed description for the outage detection and the corresponding performance compensation algorithms. Further, the simulation results to demonstrate the efficiency of our proposed methods are provided in Section 4. Finally, the conclusion of this paper is given in Section 5.

2. System Model

Self-healing is a functionality that aims to minimize network performance deterioration when failures occur in a network element (NE), through immediate, autonomous cell outage detection and compensation. First, performance parameters from NEs in both the access and core networks are collected. The outage detection entity then uses them to identify and localize problems during the current monitoring period. When a fault is detected, the outage compensation entity is activated in a timely manner to execute feasible recovery procedures and restore the degraded service.

To evaluate the self-healing algorithms in a two-tier LTE-Advanced system, picocells are deployed to eliminate coverage holes in hotspots. The simulation environment comprises 19 regular hexagonal macrocells and 76 overlaid picocells, with four picocells in each macrocell. The 12 macrocells and 48 picocells in the outer tier are used only to generate interference, with no user equipments (UEs) deployed. The 7 central macrocells and 28 picocells are the cells of interest; UEs are randomly distributed in them, and they work together for healing purposes. The system scenario is shown in Figure 1, where macrocell0 and picocell23 are configured as faulty cells. Once the configured failures are detected, the four normal picocells deployed in the outage area are treated as compensation cells to alleviate the degradation in coverage and quality.

2.1. Macropicocell Outage Detection Framework

The cell outage detection process in the two-tier macropico network has recently been studied in [12]. As shown in Figure 1, it consists of two main phases: a model-learning phase and a problem-detecting phase. To construct a robust learning model, a reference simulation is run in which the RLF event is triggered slightly more readily than usual, so that not only periodical measurements but also some RLF samples can be gathered. The model is created by labeling the training data as periodical or RLF-like. In the problem-detecting phase, hardware failure is simulated by a transmit power outage: in Figure 1, macrocell0 and picocell23 become faulty cells when their power is lowered at some point during normal operation. Consequently, periodical measurements and an increased amount of RLF-triggered data are collected as testing data for further analysis. The data consist of four numerical features: serving and maximum neighbor RSRP, and serving and maximum neighbor signal to interference plus noise ratio (SINR) [12]. Additional information, including position and serving cell global identification (CGI), is also obtained to demonstrate the detection performance.

2.2. Macropicocell Outage Compensation Framework

As mentioned above, compensation for outage picocell users is not considered in this paper. When an outage macrocell is detected, the picocells overlaid on that macrocell, rather than those in surrounding macrocells, are triggered to act as compensation cells. They are mainly responsible for resource reallocation and power optimization, as illustrated in Figure 1. Because numerous RLF events occur, the affected macrousers, having lost their connections to the previous serving cell, try to re-establish connections with picocell19–picocell22. The chosen picocells then allocate spectrum resources to the newly added users using RBs that previously belonged to the faulty macrocell. The power of each RB is initialized by average allocation. To maximize the throughput of all users, including previously served picousers and newly added users, each compensation picocell sequentially executes power optimization based on the specified compensation timeslot. Here, a compensation timeslot is defined as a timeslot during which neighboring macrocells transmit no data or control information, which makes partial interference cancelation and power optimization achievable. To confirm the feasibility of this scheme, compensation gains measured as average throughput per user and per cell are presented.

Further, some assumptions are made for simulation simplicity. First, the spectrum resources used by macrocells and picocells are orthogonal, so users in different types of cells do not interfere with each other. Second, a cell outage is emulated by reducing the transmit power of a cell to an extent that causes performance degradation. Third, during the outage, UEs located in the outage area can still receive weak signals, which is critical for anomaly detection [12].

3. Algorithms Description

This section describes the performance compensation algorithm in detail; the outage detection algorithm is given briefly, and more details can be found in [12]. In the detection stage, we first construct a training database and then treat the labeling of the testing database as a classification problem. Finally, an evaluation criterion for classification accuracy is provided. Since maximizing the throughput of users in compensation picocells is a constrained nonconvex problem, for which finding an optimal solution is NP-hard, the compensation stage is divided into two steps: RB reallocation and per-RB power optimization.

3.1. Cell Outage Detection Stage

For outage detection, collected measurements are often analyzed with knowledge-based approaches. Clustering has been applied in [13]; however, the number of clusters is usually hard to determine, and improved clustering algorithms may take relatively longer to run. We therefore consider classification, an area of machine learning that takes raw data and assigns it to a particular class [14]. K-nearest neighbor (KNN) is a supervised learning algorithm involving two steps: training model construction and testing data labeling. Here, the training data collected in the reference simulation are labeled as periodical or RLF-like, with RLF-like samples regarded as anomalies. Once the configured outages occur, the testing data are gathered and classified by finding their best possible matches in the training data.

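Assume the training data set is denoted as $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, where $\mathbf{x}_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4})$, a four-dimensional data vector, is the $i$th collected training sample and $N$ denotes the total number of training samples. The corresponding labels are $\{l_1, l_2, \dots, l_N\}$, where $l_i = c_p$ represents the periodical class and $l_i = c_r$ the RLF-like class. When cell outages occur in the simulation scenario, the testing data are collected as $\mathcal{Y} = \{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_M\}$, where $\mathbf{y}_j = (y_{j1}, y_{j2}, y_{j3}, y_{j4})$ is the $j$th collected testing sample, structured like $\mathbf{x}_i$, and $M$ is the number of testing samples. Before classification, both training and testing data are normalized to eliminate errors caused by nonuniform measurement units. To determine the label of each unknown testing sample $\mathbf{y}_j$, KNN relies on the set of its nearest neighbors in the training database. One way to find them is to calculate the Euclidean distance from the testing sample to every point in the training database; for testing sample $\mathbf{y}_j$ and training sample $\mathbf{x}_i$,

$$d(\mathbf{y}_j, \mathbf{x}_i) = \sqrt{\sum_{k=1}^{4} \left(y_{jk} - x_{ik}\right)^2}.$$

Sorting these distances in ascending order, the first $K$ elements form the neighbor set $\{\mathbf{x}_{(1)}, \dots, \mathbf{x}_{(K)}\}$ with corresponding label set $\{l_{(1)}, \dots, l_{(K)}\}$. The label of $\mathbf{y}_j$ is then decided by majority vote:

$$l(\mathbf{y}_j) = \arg\max_{c \in \{c_p, c_r\}} \sum_{k=1}^{K} I\left(l_{(k)} = c\right),$$

where the indicator function $I(\cdot)$ equals 0 if its argument is false and 1 otherwise. $K$ is an adjustable integer parameter, and different values of $K$ may lead to different classification results.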
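The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes min-max scaling as the normalization step (the text does not name one), encodes the periodical and RLF-like classes as the hypothetical integers 0 and 1, and uses made-up feature vectors ordered as (serving RSRP, max neighbor RSRP, serving SINR, max neighbor SINR).

```python
import math
from collections import Counter

PERIODICAL, RLF_LIKE = 0, 1  # hypothetical label encoding

def min_max_normalize(train, test):
    """Scale each feature to [0, 1] using the ranges seen in the training set."""
    dims = len(train[0])
    lo = [min(x[k] for x in train) for k in range(dims)]
    hi = [max(x[k] for x in train) for k in range(dims)]

    def scale(v):
        return [(v[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
                for k in range(dims)]

    return [scale(x) for x in train], [scale(y) for y in test]

def knn_label(y, train, labels, k):
    """Majority vote among the k training vectors closest to y (Euclidean)."""
    dists = sorted((math.dist(y, x), lab) for x, lab in zip(train, labels))
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Made-up samples: (serving RSRP dBm, max neighbor RSRP dBm, serving SINR dB, max neighbor SINR dB)
train = [(-80, -95, 15, -2), (-85, -98, 12, -4), (-82, -96, 14, -3),
         (-118, -105, -8, 3), (-120, -108, -9, 4)]
labels = [PERIODICAL, PERIODICAL, PERIODICAL, RLF_LIKE, RLF_LIKE]
test = [(-119, -106, -7, 2), (-83, -97, 13, -3)]

train_n, test_n = min_max_normalize(train, test)
pred = [knn_label(y, train_n, labels, k=3) for y in test_n]
print(pred)
```

With K = 3, the first testing vector, whose weak serving signal resembles the RLF-like samples, is labeled RLF-like and the second is labeled periodical. Choosing an odd K avoids voting ties between the two classes.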

To validate the KNN performance on cell outage detection, the F-measurement is computed for each serving cell, where the performance statistics collected in a cell, whether a macrocell or a picocell, are taken as a cluster. Based on the precision and recall defined in [12], the F-measurement takes the following form:

$$F(c_j, r) = \frac{2\,P(c_j, r)\,R(c_j, r)}{P(c_j, r) + R(c_j, r)}.$$

For cluster $c_j$ and RLF-like label $r$, the precision is $P(c_j, r) = n_{jr}/n_j$ and the recall is $R(c_j, r) = n_{jr}/n_r$, where $n_j$ denotes the total number of data in cluster $c_j$, $n_r$ denotes the number of RLF-like data in all clusters, and $n_{jr}$ is the number of RLF-like data in cluster $c_j$. The F-measurement can then be further expressed as

$$F(c_j, r) = \frac{2\,n_{jr}}{n_j + n_r}.$$
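Because the simplified F-measurement needs only three counts per cell, it can be computed directly from the KNN output. The sketch below is hypothetical: it assumes each testing sample carries its serving cell ID (as obtained from the CGI) and its predicted label, with 1 standing for RLF-like.

```python
from collections import Counter

RLF_LIKE = 1  # hypothetical encoding of the RLF-like label r

def f_measurement(serving_cells, predicted):
    """Return {cell_id: F(c_j, r)} using F = 2 * n_jr / (n_j + n_r)."""
    n_j = Counter(serving_cells)  # samples per cluster (serving cell)
    n_jr = Counter(c for c, lab in zip(serving_cells, predicted) if lab == RLF_LIKE)
    n_r = sum(n_jr.values())  # RLF-like samples over all clusters
    return {c: 2 * n_jr[c] / (n_j[c] + n_r) for c in n_j}

# Toy data: cell 0 is mostly RLF-like, cell 23 partly, cell 5 not at all.
cells = [0, 0, 0, 0, 23, 23, 5, 5]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = f_measurement(cells, labels)
print(scores)
```

For cell 0, precision and recall are both 3/4, so the harmonic-mean form and the simplified form agree on 0.75; a cell with no RLF-like samples scores 0.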

If the F-measurement of a cell in the problematic simulation is much larger than that in the reference simulation, the cell is very likely in outage, since it does not fit the normal observations.

3.2. Performance Compensation Stage

Upon detection of the anomalous macrocell and picocell, users in the picocell hand over smoothly to the overlapping macrocell, while users in the macrocell, which experience more severe performance degradation, attempt to re-establish connections with the compensation picocells. Each such user chooses the picocell offering the strongest signal power as its new serving cell. To provide satisfactory service quality, the compensation picocells allocate unoccupied RBs to the newly added users through a priority scheme. These RBs are orthogonal to those used in the picocells, as assumed in Section 2. The priority order is determined by the Manhattan distance from each outage user to its new serving picocell. Let the distance set be $\{d_1, d_2, \dots, d_U\}$, where $d_u$ refers to outage user $u$ and $U$ is the total number of outage users. A user with a larger $d_u$ gets a higher priority in choosing a vacant RB, so the distances are sorted in descending order before the picocells allocate RBs. Once an RB is selected, it is removed from the pool of remaining available RBs to avoid interference.
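The RB reallocation rule can be sketched as follows. The input layout, the coordinates, the RSRP values, and the choice of giving each user one RB are illustrative assumptions; the text fixes only the strongest-signal cell selection, the descending Manhattan-distance priority, and the removal of each chosen RB from the shared pool.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two 2-D positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def allocate_rbs(users, pico_pos, free_rbs):
    """users: {user_id: (position, {pico_id: rsrp_dbm})}.
    Returns {user_id: (serving_pico, rb)}; one RB per user (an assumption)."""
    # Each outage user re-attaches to the picocell with the strongest signal.
    serving = {u: max(rsrp, key=rsrp.get) for u, (_, rsrp) in users.items()}
    # Larger distance to the new serving picocell -> higher priority.
    order = sorted(users,
                   key=lambda u: manhattan(users[u][0], pico_pos[serving[u]]),
                   reverse=True)
    pool = list(free_rbs)  # RBs released by the outage macrocell
    assignment = {}
    for u in order:
        if not pool:
            break  # more outage users than vacant RBs
        rb = pool.pop(0)  # removed from the pool so no other user reuses it
        assignment[u] = (serving[u], rb)
    return assignment

pico_pos = {19: (0, 0), 20: (100, 0)}
users = {
    "a": ((10, 5), {19: -70, 20: -90}),
    "b": ((60, 30), {19: -88, 20: -80}),
    "c": ((95, 2), {19: -95, 20: -65}),
}
assignment = allocate_rbs(users, pico_pos, free_rbs=[3, 7, 11])
print(assignment)
```

User "b" is farthest from its new serving picocell (distance 70), so it picks first, followed by "a" (15) and "c" (7).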

The next step is power adjustment, which each compensation picocell performs separately. In a picocell, the power per RB is initialized by average allocation. The Lagrange optimization algorithm is then used to maximize the throughput of users in the compensation picocells, so as to provide the best possible compensation gains for outage users without much affecting the previously served picousers. However, outage users suffer much more interference from neighboring macrocells than picousers do, because the spectrum they use once belonged to the outage macrocell. Under the optimization algorithm, severely interfered users tend to be allocated low transmit power, which runs counter to the compensation goal. The remedy is therefore to reduce the intercell interference generated by neighboring macrocells. With the introduction of a compensation timeslot, during which neighboring macrocells are in sleep mode, outage users experience less interference and thus receive more power. A parameter $\alpha$ is defined as the proportion of compensation timeslots in the total transmission time. Take $\alpha = 1/3$ as an example, as shown in Figure 2: if the transmission spans 3 timeslots, the first is a compensation timeslot. In this case, the interference experienced by outage users is reduced by about 1/3.
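The arithmetic behind the 1/3 figure can be checked with a small sketch. It assumes, as a simplification, that the macro interference seen by an outage user is constant in normal timeslots and zero in compensation timeslots, and that the compensation timeslots come first in each frame as in the example; `compensation_pattern` and `mean_macro_interference` are hypothetical helper names.

```python
from fractions import Fraction

def compensation_pattern(alpha, n_slots):
    """True marks a compensation timeslot (neighboring macrocells silent)."""
    n_comp = Fraction(alpha) * n_slots
    if n_comp.denominator != 1:
        raise ValueError("alpha * n_slots must be a whole number of slots")
    return [i < n_comp for i in range(n_slots)]

def mean_macro_interference(pattern, interference_mw):
    """Time-averaged macro interference experienced by an outage user."""
    return sum(0.0 if comp else interference_mw for comp in pattern) / len(pattern)

pattern = compensation_pattern(Fraction(1, 3), 3)
avg = mean_macro_interference(pattern, interference_mw=1.0)
reduction = 1 - avg  # fraction of the interference removed over the frame
print(pattern, reduction)
```

Passing `alpha` as a `Fraction` keeps the slot count exact; with a float such as 0.1, binary rounding would make the whole-slot check fail.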

The intercell interference from neighboring macrocells can thus be alleviated dynamically by adjusting the proportion of compensation timeslots, which in turn increases the power that the optimization algorithm allocates to outage users. However, this proportion should be moderate rather than as large as possible: a tradeoff in serving quality among outage users, previously served picocell users, and users in surrounding macrocells must be taken into account.

Let the set of compensation picocells be denoted by CP; in our simulation scenario, four picocells are used for compensation. For picocell $m \in \mathrm{CP}$, the throughput of user $u$ occupying RB $n$ in its coverage area is determined by that user's SINR on the RB. For previously served picousers, the interference comes mainly from the other compensation picocells; for outage users, it comes from the surrounding macrocells, since their RBs once belonged to the outage macrocell. Here, $p_{u,n}^{m}$ is the power allocated to user $u$ on RB $n$ by picocell $m$, and $L_{u}^{m}$ is the path loss from user $u$ to its new serving cell $m$. During the compensation timeslot, the SINR of outage users reduces to an SNR; that is, these users no longer suffer from the severe interference caused by the surrounding macrocells.
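The effect of the compensation timeslot on an outage user's rate can be illustrated under standard assumptions that the text does not spell out: Shannon capacity per RB, the 180 kHz LTE RB bandwidth, path loss applied as a linear attenuation, and a rate averaged over the share alpha of compensation timeslots (SNR) and the remaining share (SINR). The function names and all numbers below are hypothetical.

```python
import math

RB_BANDWIDTH_HZ = 180e3  # one LTE resource block: 12 subcarriers x 15 kHz

def db_to_linear(db):
    return 10 ** (db / 10)

def rb_rate(p_mw, path_loss_db, interference_mw, noise_mw):
    """Shannon rate in bit/s on one RB; received power = p / path loss."""
    received_mw = p_mw / db_to_linear(path_loss_db)
    return RB_BANDWIDTH_HZ * math.log2(1 + received_mw / (interference_mw + noise_mw))

def outage_user_rate(p_mw, path_loss_db, macro_interf_mw, noise_mw, alpha):
    """Time-averaged rate: SNR in the alpha share of compensation timeslots,
    SINR (macro interference present) in the remaining 1 - alpha."""
    snr_rate = rb_rate(p_mw, path_loss_db, 0.0, noise_mw)
    sinr_rate = rb_rate(p_mw, path_loss_db, macro_interf_mw, noise_mw)
    return alpha * snr_rate + (1 - alpha) * sinr_rate

# Hypothetical link: 100 mW on the RB, 100 dB path loss, -90 dBm macro interference.
args = dict(p_mw=100.0, path_loss_db=100.0, macro_interf_mw=1e-9, noise_mw=1e-12)
for alpha in (0.0, 0.1, 1 / 3):
    print(f"alpha={alpha:.2f}: {outage_user_rate(alpha=alpha, **args) / 1e6:.2f} Mbit/s")
```

The rate grows with alpha because interference-free slots lift the SINR to an SNR; what this per-user view leaves out is the throughput lost by the silenced neighboring macrocells.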

Given these definitions, the compensation objective is to choose the per-RB transmit powers that maximize the total throughput of the users in the compensation picocells, subject to each picocell's transmit power constraint. Here, $\mathcal{N}_o^m$ and $\mathcal{N}_p^m$ denote the sets of RBs offered to outage users and to previously served picousers, respectively; $\mathcal{U}_o^m$ denotes the set of outage users now served by picocell $m$, and $\mathcal{U}_p^m$ the set of original picousers in $m$.
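The problem statement itself did not survive extraction. For reference only, a common sum-rate formulation consistent with the description reads as follows for compensation picocell $m$, where $\mathcal{N}_o^m$ and $\mathcal{N}_p^m$ are its RBs offered to outage users and to previously served picousers. It assumes Shannon rates, one user per RB, a per-picocell power budget $P_{\max}$, RB bandwidth $B$, channel gain $g_n$ (the inverse of the path loss on RB $n$), interference $I_n$, and noise power $N_0$; these symbols are introduced here and are not taken from the original.

```latex
\max_{\{p_n\}} \; \sum_{n \in \mathcal{N}_o^m \cup \mathcal{N}_p^m} B \log_2\!\left(1 + \frac{p_n g_n}{I_n + N_0}\right)
\quad \text{s.t.} \quad \sum_{n \in \mathcal{N}_o^m \cup \mathcal{N}_p^m} p_n \le P_{\max}, \qquad p_n \ge 0 .
```

For outage users, $I_n$ is the macro interference, which vanishes during compensation timeslots; using a time-averaged effective value is one simple modeling choice.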

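To solve this problem, the Lagrange optimization method is adopted, based on the well-known Lagrange function [15], in which $\lambda$ is a nonnegative Lagrange multiplier associated with the power constraint and the summation runs over all users in the compensation picocells. Taking the derivative of the Lagrange function with respect to each per-RB power and setting it to zero simplifies the problem to an optimality condition parameterized by $\lambda$.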
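Under a Shannon-rate model (bandwidth $B$, gain $g_n$, interference $I_n$, noise $N_0$, and a total power budget $P_{\max}$, all introduced here for illustration rather than taken from the original), the Lagrange function and its stationarity condition take the familiar water-filling form:

```latex
\mathcal{L}(\mathbf{p}, \lambda) = \sum_{n} B \log_2\!\left(1 + \frac{p_n g_n}{I_n + N_0}\right)
  - \lambda \left(\sum_{n} p_n - P_{\max}\right),

\frac{\partial \mathcal{L}}{\partial p_n} = \frac{B}{\ln 2} \cdot \frac{g_n}{I_n + N_0 + p_n g_n} - \lambda = 0
\;\Longrightarrow\;
p_n = \left[\frac{B}{\lambda \ln 2} - \frac{I_n + N_0}{g_n}\right]^{+},
```

where $[x]^+ = \max(x, 0)$ accounts for the constraint $p_n \ge 0$. The floor $(I_n + N_0)/g_n$ rises with interference, which is why severely interfered outage users tend to receive low power unless their interference is reduced.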

The unknown Lagrange multiplier can then be obtained by a bisection search.
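A bisection on the multiplier can be sketched as below. It assumes the water-filling form $p_n = [B/(\lambda \ln 2) - (I_n + N_0)/g_n]^+$ that follows from a Shannon-rate model with a total power budget (an assumption, since the original equations are not reproduced here), and relies on the total allocated power decreasing monotonically as $\lambda$ grows. The function name `water_fill` and its parameters are hypothetical.

```python
import math

def water_fill(gains, interference, noise, p_max, bandwidth=1.0, tol=1e-12):
    """Return (powers, lam) with p_n = max(0, B / (lam ln 2) - (I_n + N0) / g_n)
    and sum(p_n) equal to p_max up to the bisection tolerance."""
    floors = [(i + noise) / g for g, i in zip(gains, interference)]

    def powers(lam):
        level = bandwidth / (lam * math.log(2))  # the "water level"
        return [max(0.0, level - f) for f in floors]

    # At lam_hi the water level equals the lowest floor, so no power is used.
    lam_hi = bandwidth / (math.log(2) * min(floors))
    # Halve lambda until the allocation uses at least the whole budget.
    lam_lo = lam_hi
    while sum(powers(lam_lo)) < p_max:
        lam_lo /= 2
    # Total power decreases as lambda grows, so bisection keeps the root bracketed.
    while lam_hi - lam_lo > tol * lam_hi:
        mid = 0.5 * (lam_lo + lam_hi)
        if sum(powers(mid)) > p_max:
            lam_lo = mid
        else:
            lam_hi = mid
    return powers(lam_hi), lam_hi

# Three RBs; the third is heavily attenuated (for example, strongly interfered).
p, lam = water_fill(gains=[1.0, 0.5, 0.1], interference=[0.0, 0.0, 0.0],
                    noise=1.0, p_max=3.0)
print([round(x, 6) for x in p], round(lam, 6))
```

Here the floors are 1, 2, and 10, so the water level settles at 3: the first two RBs receive about 2 and 1, and the weakest RB receives nothing.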

As a supplement, the interference experienced by previously served picousers in each compensation picocell comes mainly from the other compensation picocells, so it changes whenever the power on an RB is altered. It therefore has to be updated after each iteration.
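The interference update can be sketched as iterative water-filling: picocells take turns re-optimizing their per-RB powers, each time recomputing the interference produced on each RB by the other compensation picocells' current powers, starting from the average allocation. This is a standard technique named here as an assumption about how the iteration proceeds; the gain matrices, noise level, and stopping rule are illustrative, and a compact water-filling helper (bandwidth normalized to 1) is included so that the block runs on its own.

```python
import math

def water_fill(gains, interference, noise, p_max, tol=1e-12):
    """Water-filling for one picocell (bandwidth normalized to 1), bisection on lambda."""
    floors = [(i + noise) / g for g, i in zip(gains, interference)]

    def powers(lam):
        level = 1.0 / (lam * math.log(2))
        return [max(0.0, level - f) for f in floors]

    lam_hi = 1.0 / (math.log(2) * min(floors))  # water level at the lowest floor
    lam_lo = lam_hi
    while sum(powers(lam_lo)) < p_max:
        lam_lo /= 2
    while lam_hi - lam_lo > tol * lam_hi:
        mid = 0.5 * (lam_lo + lam_hi)
        if sum(powers(mid)) > p_max:
            lam_lo = mid
        else:
            lam_hi = mid
    return powers(lam_hi)


def iterative_power_update(gain, cross, noise, p_max, sweeps=50, eps=1e-9):
    """gain[c][n]: own-link gain of picocell c on RB n.
    cross[c][d][n]: gain from picocell d to the user of picocell c on RB n."""
    cells, n_rb = len(gain), len(gain[0])
    power = [[p_max / n_rb] * n_rb for _ in range(cells)]  # average allocation
    for _ in range(sweeps):
        change = 0.0
        for c in range(cells):
            # Interference is re-read from the other cells' latest powers.
            interf = [sum(power[d][n] * cross[c][d][n] for d in range(cells) if d != c)
                      for n in range(n_rb)]
            new = water_fill(gain[c], interf, noise, p_max)
            change = max(change, max(abs(a - b) for a, b in zip(new, power[c])))
            power[c] = new
        if change < eps:
            break  # powers (and hence interference) have settled
    return power


gain = [[1.0, 0.5], [0.5, 1.0]]
cross = [[[0.0, 0.0], [0.05, 0.05]],
         [[0.05, 0.05], [0.0, 0.0]]]
power = iterative_power_update(gain, cross, noise=1.0, p_max=2.0)
print(power)
```

With weak cross gains the sweeps converge quickly; each picocell ends up favoring its stronger RB while spending its full budget. Convergence is not guaranteed when cross-cell interference is strong, so a sweep limit is kept.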

4. Simulation Results

For the simulation, a system-level simulation tool compliant with the 3GPP specifications [16] is employed. Based on the system model described in Section 2, macrocell0 and picocell23 are configured as faulty cells, and picocell19–picocell22 serve as compensation cells for the outage macrocell0. The detailed simulation parameters are listed in Table 1. The simulation starts in a proper operational state with shadow fading included. Normal periodical performance metrics and a small amount of RLF-triggered data are reported to construct the training model. At some point in the simulation, the transmit power of macrocell0 and picocell23 is decreased to 40 dBm to simulate hardware failures. Performance in the outage cells then breaks down dramatically: numerous RLF events occur, and most of the periodical data collected in the outage cells start to show signs of outage. Once the outage macrocell is discovered, picocell19–picocell22 allocate the unoccupied RBs and optimize the power per RB for their newly served users, so that the degraded performance is restored.

4.1. Cell Outage Detection Results

The training database, which defines the characteristics of the two categories, is constructed from 136 RLF-triggered data points and 1260 periodical data points. KNN is used to label the testing data. For comparison, not only the testing data but also the reference periodical data are labeled. Figure 3(a) shows that a few normal periodical data points are labeled as RLF-like, owing to shadow fading. In Figure 3(b), by contrast, two distinct clusters appear and more testing periodical data behave like RLF-triggered data, because of the transmit power outages.

After the KNN classifier is applied, the RLF-labeled data are correlated with the additionally collected information, such as position and serving cell global identification (CGI). We use the CGI to validate the classification results. Figure 4(a) shows that a few radio link failures do occur in the normal operational phase, especially in the cells with IDs 19, 25, 37, and 45, with around 1.11% of all training samples detected as RLF-like. Note that all cells, macrocells and picocells alike, are numbered sequentially for brevity. Figure 4(b) shows that about 1130 macrocell data points and 180 picocell data points are labeled as RLF-like, which is consistent with the preconfigured outages.

The F-measurement is then applied to further verify the performance of the KNN classifier. Figure 5 shows that, in the reference simulation, the F-measurement values for IDs 19, 25, 37, and 45 are relatively large but all below 0.05, whereas in the outage situation the F-measurement reaches 0.91 for ID 0 and 0.24 for ID 23, which correspond to the faulty macrocell0 and picocell23, respectively. Since a macrocell serves more users than a picocell, the upper bound of the F-measurement for a macrocell is higher than that for a picocell. It can therefore be concluded that macrocell0 and picocell23 are experiencing performance degradation.

4.2. Cell Outage Compensation Results

As described in Section 3.2, an adaptive parameter related to the compensation timeslot is introduced to mitigate the interference caused by surrounding macrocells. As a result, outage users now served by picocells can receive more power through Lagrange optimization. Figure 6 shows the average user throughput versus the compensation timeslot for various interference reduction strategies, in two system states: preoutage and postcompensation. Note that the x-axis represents the proportion of compensation timeslots in the total transmission timeslots.

Figure 6(a) depicts the situation in the outage area. Before the outage, the average throughput of users in the previously serving picocells and the macrocell remains flat, which indicates that the compensation timeslot does not affect the normal system state. Since the resources used by macrocells and picocells are orthogonal, a compensation timeslot that silences neighboring macrocells has little impact on previously served picousers. Thus, the average user throughput of the picocells remains almost unchanged before the outage and after compensation. For outage users, performance is compensated with a satisfactory average user throughput, though one lower than that of previously served picousers, because outage users still experience relatively more interference than picousers. In addition, the average outage-user throughput improves gradually as the proportion of compensation timeslots increases. As for the impact on surrounding macrocells, Figure 6(b) plots the average neighbor macrouser throughput for different proportions of compensation timeslots. Before the outage, the throughput of macrocell1–macrocell6 is unaffected by the proposed compensation mechanism. After the outage occurs and is compensated, the average user throughput increases as the proportion of compensation timeslots decreases. When the proportion reaches 1/10, the throughput approaches its preoutage level.

Figure 6 shows that the higher average throughput offered to outage users comes at the cost of degraded service for surrounding macrousers, so a balance must be struck to provide satisfactory service to both outage users and neighbor macrousers. This tradeoff can be achieved by adaptively adjusting the proportion of compensation timeslots. The performance gain can be presented in terms of average cell throughput for both the outage area and the surrounding macrocells. Figure 7 displays these changes as bars for three network states: preoutage, postoutage before compensation, and postcompensation. The two rightmost bars show the compensated results for compensation-timeslot proportions of 1/3 and 1/10, respectively.

Figure 7(a) shows the change in average cell throughput in the outage area. The degraded service for outage users is enhanced, although it remains worse than before the outage. This is readily understandable, since the affected performance cannot be entirely restored without external resources. With a compensation-timeslot proportion of 1/10, the average cell throughput after compensation is 88.0% of the preoutage value, while it reaches 91.4% with a proportion of 1/3. Thus, the larger the proportion of compensation timeslots, the better the compensated performance in the outage area. However, the impact on surrounding macrousers must also be taken into account. Figure 7(b) shows that the average surrounding-cell throughput increases in the postoutage-before-compensation stage, because the faulty macrocell0 no longer generates any interference. When the outage is compensated, the throughput is reduced to 72.9% with a 1/3 compensation timeslot and to 98.3% with a 1/10 compensation timeslot. Given this performance loss in neighboring macrocells, the proportion of compensation timeslots should be chosen to balance the two.

5. Conclusion

In this paper, we present a self-healing process for a macropico heterogeneous network based on key performance indicators. Using collected periodical and RLF-triggered data, a K-nearest neighbor (KNN) classifier has been implemented successfully to detect the outage macrocell and picocell. Because users in the anomalous picocell can restore their degraded service through smooth handover, only the outage macrocell is considered for performance compensation. Four picocells located in the outage macrocell are used as compensation cells. They allocate the RBs that once belonged to the outage macrocell to the affected users and use Lagrange optimization to optimize the power per RB. To reduce intercell interference, a new concept, the "compensation timeslot", is introduced. Finally, the KNN classifier is verified by means of the F-measurement, and the compensation mechanism in terms of compensation gains.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant no. 61361166005), the State Major Science and Technology Special Projects (Grant no. 2013ZX03001001), Beijing Natural Science Foundation (Grant no. 4131003), and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) (Grant no. 20120005140002).