Mobile Information Systems
Volume 2017, Article ID 7969102, 11 pages
Research Article

Estimating Spectral Efficiency Curves from Connection Traces in a Live LTE Network

1Departamento de Ingeniería de Comunicaciones, Universidad de Málaga, Málaga, Spain
2Ericsson, Madrid, Spain

Correspondence should be addressed to Matías Toril;

Received 3 February 2017; Revised 1 May 2017; Accepted 18 May 2017; Published 19 June 2017

Academic Editor: Quansheng Guan

Copyright © 2017 Matías Toril et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


In cellular networks, spectral efficiency is a key parameter when designing network infrastructure. Despite the existence of theoretical models for this parameter, experience shows that real spectral efficiency is influenced by multiple factors that greatly vary in space and time and are difficult to characterize. In this paper, an automatic method for deriving the real spectral efficiency curves of a Long Term Evolution (LTE) system on a per-cell basis is proposed. The method is based on a trace processing tool that makes the most of the detailed network performance measurements collected by base stations. The method is conceived as a centralized scheme that can be integrated in commercial network planning tools. Method assessment is carried out with a large dataset of connection traces taken from a live LTE system. Results show that spectral efficiency curves largely differ from cell to cell.

1. Introduction

In the coming years, an exponential growth of cellular traffic is expected. Specifically, a 10-fold increase in mobile data traffic is forecast from 2015 to 2021 [1]. Meanwhile, the proliferation of smartphones and tablets has changed the most demanded services in cellular networks. These changes will continue with the massive deployment of machine-type communications in Internet-of-Things applications [2]. To cope with these changes, future mobile networks will have to combine multiple technologies. Thus, service and network heterogeneity has been identified as a critical issue in future 5G networks [3, 4].

In parallel, the increasing size and complexity of cellular networks is making it very difficult for operators to manage their networks. Thus, network management is one of the main bottlenecks for the successful deployment of mobile networks. To tackle this problem, industry fora and standardization bodies set up activities in the field of Self-Organizing Networks (SON) while defining 4G networks [5]. Self-organization refers to the capability of network elements to self-plan, self-configure, self-tune, and self-heal [6]. This need for self-organization has also been identified by vendors, which now offer automated network management solutions to reduce the workload of operational staff.

Legacy SON solutions are restricted to the replication of routine tasks that were done manually in the past. Currently, network planning and optimization are mostly based on performance counters and alarms in the network management system [7–9]. Thus, other data from network equipment and interfaces that could give very detailed information are discarded. Such information is only used in rare cases for troubleshooting, after a tedious analysis. However, with recent advances in information technologies, it is now possible to process all these data on a regular basis by means of Big Data Analytics (BDA) techniques [10]. In cellular networks, “big data” refers to configuration parameter settings, performance counters, alarms, events, charging data records, or trouble tickets [11].

While BDA has long attracted the attention of the computing research community, this field is relatively new in the telecommunications industry. In [3], the authors propose a generic framework for improving SON algorithms with big data techniques to meet the requirements of future 5G networks. With a more limited scope, a self-tuning method based on call traces is proposed in [12] for adjusting antenna tilts in a Long Term Evolution (LTE) system on a cell basis. Likewise, a review of network data used for self-healing in cellular networks is presented in [13]. However, few works have used BDA for self-planning cellular networks.

In radio network planning, the key figure of merit to evaluate network (or channel) capacity is spectral efficiency (SE). A theoretical upper bound on the channel capacity of a single-input single-output wireless link is given by the Shannon capacity formula [14]. This formula can be adapted to approximate the maximum channel capacity under certain assumptions specific to each radio access technology [15–19]. However, even if channel capacity is mainly determined by signal quality, it is also affected by the radio environment (user speed, propagation channel, etc.), the traffic properties (service type, burstiness, etc.), and the techniques in the different communication layers (multiantenna configuration, interference cancellation, channel coding, radio resource management, etc.). As considering all these factors is extremely difficult, most network planning tools rely on mapping curves relating signal quality to SE (a.k.a. SE curves), generated by link-level simulators [20–22]. This approach is still limited, as simulators make simplifications for computational reasons, and there remains the problem of selecting the right combination of simulation parameters that closely matches reality.

In this work, a new automatic method for deriving the real SE mapping curves for the downlink of a LTE system on a cell-by-cell basis is proposed. The method is based on a trace processing tool that makes the most of detailed network performance measurements collected by base stations (specifically, signal strength, traffic, and resource utilization measurements). Method assessment is carried out with a large dataset of connection traces taken from a live LTE system. The main contributions of this work are (a) a data-driven methodology for deriving SE mapping curves from real network measurements, which can be integrated in commercial network planning tools, and (b) a set of SE curves obtained from connection traces collected in two live LTE systems.

The rest of the paper is organized as follows. Section 2 presents the classical approach to derive SE curves in radio network planning tools. Section 3 explains the trace collection process. Section 4 describes the new methodology to derive SE curves from user connection traces. Section 5 presents the results of the proposed method over a real trace dataset taken from the live network. Finally, Section 6 presents the main conclusions of the study.

2. Current Approach

In wireless technologies, SE is strongly affected by the link adaptation scheme. For clarity, a brief overview of the link adaptation process in LTE is first given. Then, the classical abstraction model of the link layer integrated in most network planning tools is explained.

2.1. Link Adaptation Process

Link Adaptation (LA) aims to ensure the most effective use of radio resources assigned to a user. In LTE, this is achieved by dynamically changing the Modulation and Coding Scheme (MCS) depending on radio link conditions. Figure 1 shows the structure of the classical LA scheme for the downlink of LTE [23]. LA is performed in the eNodeB based on the feedback from the UE. The UE estimates downlink channel quality based on the experienced Signal-to-Interference-plus-Noise Ratio (SINR), which is reported to the eNodeB in the form of a Channel Quality Indicator (CQI). The reported CQI value is processed at the eNodeB (eNB) to build an estimate of the measured downlink SINR. Such an estimate is corrected by an Outer Loop Link Adaptation (OLLA) mechanism to compensate for systematic errors, based on Hybrid Automatic Repeat reQuest (HARQ) positive and negative acknowledgments (ACKs/NACKs). Thus, a corrected SINR is obtained. Then, an Inner Loop Link Adaptation (ILLA) mechanism determines the MCS that the eNodeB should use from the corrected SINR, so that the UE is able to demodulate and decode the transmitted downlink data without exceeding a certain Block Error Rate (BLER) threshold, usually set to 10%.

Figure 1: Link adaptation process in LTE.
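As an illustration, the OLLA correction described above can be sketched as follows. The step sizes, function names, and update rule are the classical textbook design, not the vendor-specific implementation used in the traced network:

```python
# Illustrative sketch of an OLLA offset update driven by HARQ feedback.
# The 10% BLER target and step values are assumptions for illustration.

BLER_TARGET = 0.1   # typical BLER operating point
STEP_UP = 0.5       # offset increase (dB) on a NACK

def olla_update(offset_db: float, ack: bool) -> float:
    """Update the OLLA offset after one HARQ feedback.

    A NACK raises the offset by STEP_UP dB (more conservative MCS);
    an ACK lowers it by STEP_UP * BLER_TARGET / (1 - BLER_TARGET),
    so the offset converges when the measured BLER equals BLER_TARGET.
    """
    step_down = STEP_UP * BLER_TARGET / (1 - BLER_TARGET)
    return offset_db + STEP_UP if not ack else offset_db - step_down

def corrected_sinr(sinr_est_db: float, offset_db: float) -> float:
    # ILLA selects the MCS from this corrected SINR.
    return sinr_est_db - offset_db
```

The asymmetric step sizes are what tie the steady-state offset to the BLER target: in equilibrium, one NACK per 1/BLER_TARGET feedbacks leaves the offset unchanged on average.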

Better radio link conditions translate into a higher reported CQI, thus allowing the eNB to select more effective MCSs (i.e., higher order modulations with more bits per symbol and less redundancy). Conversely, in poor radio link conditions, a lower CQI is reported, and more robust MCSs are selected (i.e., lower order modulations with less bits per symbol and more redundancy).

The actual SINR values triggering the use of different MCSs in ILLA are vendor-specific and depend on the network conditions assumed by the vendor (radio environment, antenna configuration, traffic properties, network features, etc.).

2.2. Link Abstraction Model

As a result of LA, SE (and link capacity) can be treated as a function of SINR. In most network planning tools, SINR is estimated on a per-location basis. Then, the maximum SE of a single-input single-output system (in bits/s/Hz) for infinite block length and infinite decoding complexity in an Additive White Gaussian Noise (AWGN) channel can be obtained by the Shannon capacity formula [14] as

SE = log2(1 + SNR),  (1)

where SNR is the signal-to-noise ratio. For general multiple-input multiple-output systems with perfect channel knowledge at the transmitter, the Shannon capacity is [16]

SE = Σ_{i=1}^{min(Nt, Nr)} log2(1 + γ_i),  (2)

where Nt and Nr are the number of transmit and receive antennas, respectively, and γ_i is the SNR of the ith spatial subchannel. In practice, real implementations are below the theoretical limit given by (2). Thus, the real SE of the limited set of MCSs specified in the standard can be better approximated by the Truncated Shannon Bound (TSB) formula [24] suggested in [19]:

SE(γ) = 0                    for γ < γ_min,
SE(γ) = α · β · log2(1 + γ)  for γ_min ≤ γ < γ_max,
SE(γ) = SE_max               for γ ≥ γ_max,  (3)

where γ is the SINR of the link, γ_min is a lower limit on SINR below which SE is zero, γ_max is an upper limit on SINR associated with the SE of the highest implemented MCS (e.g., 64 QAM, rate 4/5, in this work), SE_max is the corresponding maximum SE, β is the system bandwidth efficiency that accounts for different overheads (pilots, cyclic prefix, control channels, etc.), and α is a correction factor to reflect implementation losses. The values of these parameters for different antenna configurations and packet scheduling schemes are presented in [24].
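A minimal sketch of the TSB mapping in (3) is given below. The default parameter values are placeholders for illustration; the values used for each antenna configuration and scheduler are those given in [24]:

```python
import math

def tsb_se(sinr_db, alpha=0.75, beta=1.0, se_max=4.4,
           sinr_min_db=-10.0, sinr_max_db=22.0):
    """Truncated Shannon Bound: map SINR (dB) to SE (bps/Hz).

    alpha: implementation-loss correction factor (placeholder value)
    beta: bandwidth efficiency (overheads); 1.0 here for simplicity
    se_max: SE of the highest MCS (placeholder value)
    """
    if sinr_db < sinr_min_db:
        return 0.0                       # below sensitivity: zero SE
    if sinr_db >= sinr_max_db:
        return se_max                    # capped at the highest MCS
    sinr = 10 ** (sinr_db / 10)          # dB -> linear
    return alpha * beta * math.log2(1 + sinr)
```

For example, with these placeholder values, a 0 dB SINR (linear SINR of 1) maps to alpha * log2(2) = 0.75 bps/Hz.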

The SE estimate in (3) is still an optimistic value of the link SE. Classical LA schemes based on adaptive thresholds (i.e., OLLA + ILLA) suffer from slow convergence when CQI reporting is strongly biased [25, 26]. Such slow convergence is a major issue in current LTE networks due to the prevalence of short connections [27]. Even if more realistic values of SE could be obtained from simulations, these cannot capture all possible factors, which greatly vary from cell to cell and dynamically change with time. As a result, SE and throughput measurements in live networks are much lower than expected [28].

Network planning is negatively affected by the overestimation of SE, as this parameter determines the expected demand for network resources. Underestimating the average cell load during network coverage planning might lead to an overly optimistic cell radius derived from unrealistic cell-edge performance. Likewise, underestimating cell load might yield an inadequate estimate of the traffic resources needed per cell during network capacity planning. All these problems can be solved by deriving a more realistic SINR-to-SE mapping from connection traces.

3. Connection Traces

Data for managing a radio access network includes:
(a) Configuration Management (CM) data, consisting of current network parameter settings;
(b) Performance Management (PM) data, consisting of counters reflecting the number of times some event has happened per network element and Reporting Output Period (ROP);
(c) Data Trace Files (DTFs), consisting of multiple records (known as events) with radio-related measurements stored when some event occurs for a single User Equipment (UE) or a base station.
DTFs can be further classified into User Equipment Traffic Recording (UETR) and Cell Traffic Recording (CTR) [29]. UETRs are used to single out a specific user, while CTRs are used to monitor cell performance by monitoring all (or a random subset of) anonymous connections [30]. The former are used for network troubleshooting, whereas the latter are used for network planning and optimization purposes.

Depending on the involved network entities, events can be classified as external or internal events. External events include signaling messages that eNBs exchange with other network elements (e.g., UE or eNB) through the Uu, X2, or S1 interfaces [31–33]. Internal events include vendor-specific information about the performance of the eNB.

3.1. Trace Collection

Figure 2 depicts the reference architecture for trace collection in LTE [30]. CTR collection starts by the operator preparing a Configuration Trace File (CTF) in the Operation Support System (OSS), with (a) the event(s) to be monitored, (b) the cells and the ratio of calls for which traces are collected (i.e., UE fraction), (c) the ROP (typically, 15 minutes), (d) the maximum number of traces activated simultaneously in the OSS, and (e) the time period when trace collection is enabled. After enabling trace collection, UEs transfer their event records to their serving eNB. When ROP is finished, the eNB generates CTR files, which are then sent to the OSS asynchronously.

Figure 2: Architecture for trace reporting.
3.2. Trace Preprocessing

Trace files are binary files encoded in ASN.1 format [29]. The structure of events consists of a header and a message container including different attributes (referred to as event parameters). The header contains general attributes associated with the event description, such as the timestamp, the eNB, the UE, the message type, or the event length, while the message container includes specific attributes associated with the message type.

Trace decoding is performed by a parsing tool that extracts the information contained in the event fields. In most cases, the output is one file per event type, eNB, and ROP. Then, traces are synchronized by merging files from different eNBs by event type and ROP and ordering events by the timestamp attribute. Thus, it is possible to link simultaneous events of the same type from different eNBs (e.g., incoming and outgoing handover events).
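The merge-and-sort step above can be sketched as follows; the event field names are illustrative, since the actual attribute names after parsing are vendor-specific:

```python
# Sketch of trace synchronization: merge per-eNB event lists (each
# already time-ordered, as output by the parser) into one stream
# ordered by timestamp, so simultaneous events from different eNBs
# (e.g., incoming/outgoing handovers) can be paired.

import heapq

def synchronize(per_enb_events):
    """per_enb_events: iterable of per-eNB lists of event dicts,
    each sorted by 'timestamp' (an assumed field name)."""
    return list(heapq.merge(*per_enb_events,
                            key=lambda e: e["timestamp"]))
```

Using a k-way heap merge keeps the step linear-logarithmic in the number of eNB files, which matters given the volume of CTR data per ROP.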

4. Estimating Spectral Efficiency from Traces

A method for building a link-layer abstraction model for the LTE downlink from network measurements is proposed here. The model relates SINR to SE based on signal strength, traffic, and radio resource measurements obtained from the live network. Such measurements are generated by the UE and the eNB and later uploaded to the OSS in the form of connection traces. The inputs to the algorithm are CTR files with the following events:
(a) UE Traffic Report: this internal (i.e., Ericsson-specific) event includes the total carried traffic volume and the total amount of used Resource Elements (REs) per connection. A RE consists of 1 subcarrier during 1 OFDM symbol. In most vendors, this event is reported once at the end of each connection, so that there is a one-to-one mapping between traffic reports and connections.
(b) RRC Measurement Report: this standard event includes Reference Signal Received Power (RSRP) measurements from cells detected by the UE. Each measurement record includes the pilot signal level from 1 serving cell and up to 8 neighbor cells [34]. It can be configured to be reported periodically or event-triggered. In the former case, each connection can comprise many records of this event. A measurement report is said to belong to a given connection if it is reported during such connection.

Figure 3 illustrates an example of how these events are distributed within a call. A call starts with a connection setup and ends with a connection release. While in a call, the UE may perform a handover between cells. The term “connection” refers to the time spent by a UE in a cell, until a handover is executed or the call is finished. Therefore, a call may contain more than one connection. A UE traffic event is reported at the end of each connection, while RRC measurements are generated periodically along a connection.

Figure 3: Example of events in a call.

Tables 1 and 2 present the most relevant parameters in the UE Traffic Report and RRC Measurement Report events. In the tables, subindex i refers to the traffic report (i.e., connection), and subindex j refers to the RRC Measurement Report. In Table 1, it is worth noting that the used-resources parameter, R_i, only counts REs used for user data transmission in the Physical Downlink Shared CHannel (PDSCH) and thus excludes REs used for Cell Reference Signals (CRS) and other signaling information (e.g., Physical Downlink Control CHannel, PDCCH) [35].

Table 1: Parameters in UE Traffic Report event.
Table 2: Parameters in RRC Measurement Report event.

Figure 4 shows the flow diagram of the proposed algorithm. In stage 1, the time distribution of cell load is calculated per cell as the percentage of used REs during a fixed time period based on the information in UE Traffic Report events. In stage 2, the average SINR per connection is calculated as the ratio between the average received power from the serving cell and the sum of the interference power plus background noise (in linear units). To estimate interference levels, RSRP samples in RRC Measurement Report events are combined with cell load estimates computed in stage 1. In stage 3, the average SE per connection is calculated as the ratio between the total carried traffic volume and the amount of used REs based on the information in UE Traffic Report events. In stage 4, a fitting curve is built relating average SINR and average SE estimates from stages 2 and 3. All these operations are described in more detail in the following paragraphs.

Figure 4: Flow diagram of the trace processing algorithm.
4.1. Stage  1: Estimation of Cell Load Distribution over Time

In this work, cell load is defined as the ratio of REs occupied for transmission. In the network, cell load changes every Transmission Time Interval (TTI). As the number of REs used per connection is only available at the end of the connection, cell load cannot be calculated on a TTI basis. Alternatively, cell load is estimated with a lower resolution by defining a fixed time granularity of several TTIs. Then, the total number of REs used by a connection is evenly distributed across the equally spaced time intervals from the start to the end of the connection.

First, the average resource usage rate r_i (in RE/s) in cell c from the ith connection (where c is the serving cell of that connection) is computed as

r_i = R_i / (t_end,i − t_start,i),  (4)

where R_i is the total amount of resources used by the ith connection (in REs), and t_start,i and t_end,i are the start and end times of the ith connection (in s), respectively, as illustrated in Figure 5.

Figure 5: Temporal distribution of resources in a connection.

By assuming equally spaced time intervals, the total amount of REs used by connection i in time interval k, R_i(k), is calculated as

R_i(k) = r_i · max(0, min(t_end,i, k·Δt) − max(t_start,i, (k − 1)·Δt)),  (5)

where r_i is the resource usage rate for the ith connection, t_start,i and t_end,i are the start and end points of the ith connection (in s), and Δt is the sampling period defining the time resolution (in s).

Finally, the sampled average load distribution of cell c, l_c(k), is calculated as the ratio between the sum of resources used by connections and the total amount of available resources in that cell in the kth period, as

l_c(k) = Σ_{i : c(i) = c} R_i(k) / (F · N_RE · (Δt / T_slot)),  (6)

where N_RE = N_sc · N_PRB · N_symb is the total number of available REs per time slot, N_sc is the number of subcarriers per Physical Resource Block (PRB), set to 12, N_PRB is the number of PRBs in the cell, given by the system bandwidth, N_symb is the number of OFDM symbols per slot (7 or 6 for normal or extended cyclic prefix, resp.), T_slot is the slot duration (0.5 ms), and Δt is the time interval duration (i.e., the sampling period). Also, F is a correcting factor that represents the traced connection ratio configured by the operator. If all connections are traced in the network (i.e., UE fraction is 100%), then F = 1.
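Stage 1 can be sketched as below. The tuple layout of the parsed traffic reports is an assumption; the computation follows the per-connection rate, per-interval distribution, and load-ratio steps described above (with 0-indexed intervals):

```python
def cell_load(connections, delta_t, n_prb, n_symb=7,
              trace_fraction=1.0, n_sc=12, t_slot=0.5e-3):
    """Sampled cell-load distribution from UE Traffic Reports.

    connections: list of (t_start, t_end, total_res) tuples for one
    cell, where total_res is the REs used by the connection (field
    layout is illustrative). Each connection's REs are spread
    uniformly over the intervals of length delta_t it overlaps.
    """
    t_max = max(t_end for _, t_end, _ in connections)
    n_int = int(t_max / delta_t) + 1
    used = [0.0] * n_int
    for t_start, t_end, total_res in connections:
        rate = total_res / (t_end - t_start)       # average REs per second
        for k in range(int(t_start / delta_t), int(t_end / delta_t) + 1):
            overlap = min(t_end, (k + 1) * delta_t) - max(t_start, k * delta_t)
            used[k] += rate * max(overlap, 0.0)
    # available REs per interval: subcarriers x PRBs x symbols x slots
    avail = n_sc * n_prb * n_symb * (delta_t / t_slot)
    return [u / (trace_fraction * avail) for u in used]
```

With Δt = 1 s, 1 PRB, and normal cyclic prefix, one interval offers 12 · 7 · 2000 = 168000 REs, so a connection using exactly that many REs over the interval yields a load of 1.0.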

4.2. Stage  2: Estimation of Average SINR per Connection

The SINR is defined as the ratio between the received power from the serving cell and the sum of the interference power (i.e., received power from adjacent cells) plus background noise (in linear units). In LTE, different REs transmit different signals, so not all REs in a resource block experience the same SINR. 3GPP specifications do not standardize how SINR is measured, so the actual definition is vendor-specific. It can be measured on data REs or on reference-signal REs. However, SINR is generally calculated on the REs carrying reference signals [36]. In our case, the average SINR (in natural units) for the jth measurement report can be estimated as

SINR_j = RSRP_serv(j) / ( Σ_{n=1}^{N_j} l_n(k_j) · RSRP_n(j) + P_N ),  (7)

where RSRP_serv(j) is the RSRP (in mW) of the serving cell in the jth measurement report, RSRP_n(j) is the RSRP (in mW) of the nth neighbor cell in the jth measurement report, N_j is the number of neighbor cells in the jth measurement report, l_n(k_j) is the average load of the nth interfering cell at the time interval k_j when the jth measurement report was sent, and P_N is the background noise (in mW).

As previously stated, the UE may send more than one RRC Measurement Report per connection. Therefore, it is necessary to obtain an average SINR per connection. The average SINR for the ith connection is obtained as

SINR_i = (1 / M_i) · Σ_{j : c(j) = i} SINR_j,  (8)

where c(j) is the connection to which the jth measurement report belongs and M_i is the number of measurement reports in the ith connection.
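Stage 2 reduces to a few lines per report and per connection. A sketch under the definitions above (all powers in mW, loads as ratios; names are illustrative):

```python
def report_sinr(rsrp_serving_mw, rsrp_neighbors_mw, neighbor_loads, noise_mw):
    """Average SINR (linear) for one RRC Measurement Report:
    serving-cell RSRP over load-weighted neighbor RSRP plus noise.
    neighbor_loads are the cell-load estimates from stage 1 at the
    interval when the report was sent."""
    interference = sum(l * p for l, p in zip(neighbor_loads, rsrp_neighbors_mw))
    return rsrp_serving_mw / (interference + noise_mw)

def connection_sinr(report_sinrs):
    """Average SINR per connection over its M measurement reports."""
    return sum(report_sinrs) / len(report_sinrs)
```

Weighting each neighbor's RSRP by its load is what distinguishes this estimate from a plain RSRQ-style ratio: a lightly loaded neighbor contributes little interference even if its pilot is strong.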

4.3. Stage 3: Estimation of Average SE per Connection

SE is defined as the data rate that can be transmitted over a given bandwidth in a communication system. Based on the UE Traffic Report, the average SE (in bps/Hz) in the REs assigned to the ith connection can be estimated as

SE_i = 8 · V_i / (R_i · B_sc · (T_slot / N_symb)),  (9)

where V_i is the traffic volume in the ith connection (in bytes), R_i is the total amount of resources used in the ith connection (in REs), B_sc is the subcarrier bandwidth (15 kHz in LTE), N_symb is the number of OFDM symbols per slot (7 or 6 for normal or extended cyclic prefix, resp.), and T_slot is the slot duration (0.5 ms).

Note that (9) is restricted to data REs. Thus, it considers the loss of SE due to cyclic prefix, but does not take into account other factors such as (a) the limited BW occupancy to satisfy the Adjacent Channel Leakage Ratio (ACLR), (b) the pilot overhead due to CRSs, and (c) the dedicated and common control channel overhead. All these factors can be added later if needed for planning purposes, based on the values suggested in [19].
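The per-connection SE computation in (9) is a direct translation of the definitions; the parameter names are assumptions:

```python
def connection_se(volume_bytes, used_res, n_symb=7,
                  b_sc=15e3, t_slot=0.5e-3):
    """Average SE (bps/Hz) over the data REs of a connection.

    Each RE spans one subcarrier (b_sc Hz) for one OFDM symbol
    (t_slot / n_symb seconds), so SE = bits / (REs * Hz * s per RE).
    """
    bits = 8 * volume_bytes
    re_bw_time = b_sc * (t_slot / n_symb)   # Hz * s occupied by one RE
    return bits / (used_res * re_bw_time)
```

As a sanity check, a connection carrying 6 bits in every RE (the 64 QAM symbol size before coding) gives 6 / (15e3 · 0.5e-3 / 7) = 5.6 bps/Hz over its data REs.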

4.4. Stage  4: Construction of Link-Level Mapping Curves

The SINR-to-SE curve is computed by regression analysis of the scatter plot built with the average SINR and average SE estimated on a per-connection basis. Depending on the aggregation level, the output of the regression analysis is a single mapping curve for the whole network or a set of curves constructed on a per-cell basis.

In principle, any regression method could be applied as long as it provides good fitting. Previous studies suggest a logarithmic fitting, based on the expression of the Shannon bound [19], or an arctangent-based approach, based on empirical results [37]. In this work, a simple polynomial regression from logarithmic SINR values is used for simplicity and flexibility, as it is included in most statistical analysis packages and does not presume any shape of the mapping function.

Several factors may add dispersion to the SINR-to-SE estimates, causing two connections with the same average SINR to have different average SE. A first reason is instantaneous SINR fluctuations due to fading, multipath, and other propagation phenomena, which are not reflected in SINR averages. A second reason is the limited time resolution of RRC measurements, which may cause the average SINR estimate not to reflect the true average SINR of the connection. Another reason is the service type, as the LA scheme requires a certain time to converge, which might not be reached in short connections. All these factors degrade regression performance.

To increase the robustness of the regression, several actions are taken. To improve the accuracy of SINR measurements per connection, regression analysis is carried out only over connections with more than 1 RRC Measurement Report. Likewise, piecewise regression is used to prevent the most populated SINR values from dominating the regression equation. Thus, SINR measurements are divided into bins of 1 dB, centered at integer SINR values (i.e., the bin centered at k dB covers [k − 0.5, k + 0.5) dB). Then, a single SE value is computed per bin by averaging the SE of all connections in the bin. Bins with fewer than 50 samples (connections) are discarded from the regression analysis.
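The binning-then-fitting procedure can be sketched as follows, using NumPy for the polynomial regression. The polynomial degree is an assumption, as the paper does not state the one used:

```python
import numpy as np

def binned_fit(sinr_db, se, min_samples=50, degree=3):
    """Bin connections into 1 dB SINR bins centered at integer values,
    average the SE per bin (bins with fewer than min_samples connections
    are discarded), then fit a polynomial to the surviving bin means.
    Returns bin centers, per-bin mean SE, and polynomial coefficients."""
    sinr_db = np.asarray(sinr_db, dtype=float)
    se = np.asarray(se, dtype=float)
    centers = np.round(sinr_db).astype(int)   # 1 dB bins at integer SINR
    xs, ys = [], []
    for c in np.unique(centers):
        mask = centers == c
        if mask.sum() >= min_samples:
            xs.append(c)
            ys.append(se[mask].mean())
    coeffs = np.polyfit(xs, ys, deg=min(degree, len(xs) - 1))
    return np.array(xs), np.array(ys), coeffs
```

Averaging within bins before fitting gives every SINR value equal weight in the regression, which is exactly the rationale stated above for the piecewise step.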

It should be pointed out that the output of the method is a curve relating the average SINR and SE of a connection. This is the information needed by a network planning tool, where SINR and SE are calculated per location in the form of averages. Thus, the resulting curve might differ from the curves used in system-level simulators, where the link-layer model considers instantaneous SINR and SE values.

5. Results

The proposed method is tested with trace datasets taken from a live LTE network. For clarity, the analysis methodology is first described and results are presented later. Finally, implementation issues are discussed.

5.1. Analysis Setup

Two trace datasets are used in the analysis, taken from different networks (referred to as Network 1 and Network 2). Table 3 describes their main parameters. The bulk of the analysis is carried out on Network 1, and Network 2 is only used to check the impact of the network configuration and service mix. Even if traces include both downlink and uplink measurements, the analysis presented here is restricted to the downlink.

Table 3: Trace datasets.

The proposed trace-based approach to derive the SINR-SE curves is compared with a theoretical bound in the absence of a dynamic system-level simulator that captures the diversity of services, radio environments, and features in the real network. Specifically, the following approaches are evaluated:
(a) TSB-MIMO: modified Truncated Shannon Bound adjusted for best fit to link-level simulation curves for a 2 × 2 multiple-input multiple-output antenna configuration with Alamouti Space-Time Coding under a Typical Urban (TU) channel at 3 km/h and Proportional Fair-Time Dependent Packet Scheduling (PF-TDPS) [38]; it corresponds to transmission mode 3 (open-loop spatial multiplexing) with Rank 2, with the corresponding values of α, β, γ_min, γ_max, and SE_max (in bps/Hz) taken from [38];
(b) TB-N: the proposed trace-based approach applied to the complete set of traces, resulting in a single mapping curve valid for the whole network;
(c) TB-C: the proposed trace-based approach applied to the traces of a single cell, resulting in a mapping curve per cell.

Method assessment is carried out by comparing the shape of the SINR-SE curves. For a fair comparison, the SE of all methods is restricted to data REs. Thus, the bandwidth efficiency parameter β in TSB-MIMO only considers the loss due to the cyclic prefix (the long prefix in this case). Hence, the maximum achievable SE in antenna configurations with 1 spatial stream, corresponding to the highest MCS (i.e., 64 QAM, rate 4/5), is 6 bits/(ms·subc.).

5.2. Results

Figures 6(a) and 6(b) illustrate how the trace-based approaches work. Figure 6(a) shows the original SINR-SE scatter plot together with a simple polynomial regression. In the figure, each point is a connection. It is observed that connections with the same SINR can have very different SE, which explains the low coefficient of determination, R², of the fit. Figure 6(b) shows the simplified scatter plot obtained by discretizing SINR values and computing a piecewise regression of order 0 (denoted as piecewise regression). To aid comparison, the curves obtained by polynomial regression on the original and simplified data are also superimposed (denoted as original and piecewise, resp.), and the x-axis is restricted to a limited SINR range (in dB). From the figure, it is clear that the regression curve derived from the points computed by piecewise regression better captures the average SE trend. This is confirmed by its much larger R² value.

Figure 6: Influence of SINR discretization in trace-based approach.

To show the benefit of using real traces, Figure 7 compares the TSB-MIMO (theoretical) and TB-N (practical) approaches. For TB-N, 95% confidence intervals for the average SE in each SINR band are included. Note that both methods result in a single curve for the whole network. It is observed that SE values in traces are consistently below the maximum theoretical values suggested by TSB-MIMO. This gives clear evidence of the need for computing SINR-SE curves from real connection traces.

Figure 7: Comparison between theoretical and real spectral efficiency curves.

The reasons for such differences are the link adaptation process and the transport protocol. In [28], it was shown that connection length has a strong impact on user throughput. Short connections, prevailing in current LTE networks, suffer from reduced user throughput. This is due to the slow convergence of the Outer Loop Link Adaptation (OLLA) process and the slow-start feature of the Transmission Control Protocol (TCP), which causes throughput to ramp up gradually. Figure 8 confirms this observation by showing the SE curve obtained by TB-N for short and long connections. In this work, a connection with fewer than 20 ACKs + NACKs is classified as a short connection. Conversely, a connection with more than 100 ACKs + NACKs is classified as a long connection. In the figure, it is observed that the maximum SE for long connections is more than three times larger than for short connections (1.45 versus 0.45 bps/Hz). This is mainly due to OLLA convergence issues, as traffic burstiness caused by TCP ramp-up should not affect the selected MCS. By comparing Figures 7 and 8, it can be deduced that, even for long connections, the theoretical curve is a loose upper bound for the average SE under good radio link conditions.

Figure 8: Impact of connection length on spectral efficiency.

To show the benefit of computing a curve per cell, Figure 9 compares the output of the trace-based approach executed on a cell basis, TB-C, for two cells in the system. The network-wide curve obtained by TB-N is also included as a reference. It is observed that SE values may differ from cell to cell by up to 150% for the same SINR value. A closer analysis (not presented here) shows that this is because the ratio of long connections is 41% in Cell A and only 20% in Cell B. Recall that connection length has a strong impact on user throughput due to OLLA convergence issues and the TCP slow-start feature. Thus, the connection length distribution in a cell strongly influences the spectral efficiency curve measured for that cell. The observed differences justify the need for deriving SE curves on a cell basis.

Figure 9: Impact of selected cell on spectral efficiency.

Finally, Figure 10 compares the results of the trace-based method in the two datasets from different networks. For brevity, the analysis is restricted to the network-wide solution for long connections. In the figure, even if trends are similar, Network 1 shows a lower SE than Network 2 for the same SINR. This might be due to the different service mix in the two networks. To back up this statement, a deeper analysis of radio network measurements is carried out. On the one hand, traces show that 50% of long connections in Network 1 have fewer than 300 ACKs + NACKs, compared to only 10% in Network 2. Thus, the probability that OLLA has reached steady state before the end of a connection is higher in Network 2. On the other hand, network counters show that the percentage of active TTIs where the user buffer is emptied (i.e., last-TTI transmissions [34]) is 41% for Network 1 and only 25% for Network 2. In last-TTI transmissions, some REs in the PRBs assigned to the user might not carry data because there is not enough data to transmit, decreasing the link SE. Thus, the number of underutilized resources for this reason should be larger in Network 1. Both effects indicate that traffic in Network 1 is burstier than in Network 2. These differences justify the need for deriving a specific SE curve for each network.

Figure 10: Impact of selected network on spectral efficiency.
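The two diagnostics used above (the share of connections too short for OLLA to converge and the share of last-TTI transmissions) are simple ratios over per-connection counters. A minimal sketch follows; the field names are hypothetical and not taken from any vendor's trace format:

```python
def burstiness_diagnostics(connections, ack_nack_threshold=300):
    """Return (fraction of connections with fewer ACKs+NACKs than the
    threshold, fraction of active TTIs that empty the user buffer)."""
    short = sum(1 for c in connections if c["acks_plus_nacks"] < ack_nack_threshold)
    frac_short = short / len(connections)
    last_tti = sum(c["last_tti_ttis"] for c in connections)
    active = sum(c["active_ttis"] for c in connections)
    return frac_short, last_tti / active

# Toy example: one short, bursty connection and one long one.
conns = [{"acks_plus_nacks": 120, "last_tti_ttis": 40, "active_ttis": 100},
         {"acks_plus_nacks": 900, "last_tti_ttis": 10, "active_ttis": 100}]
frac_short, frac_last_tti = burstiness_diagnostics(conns)
```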
5.3. Implementation Issues

The method is designed as a centralized scheme that can be integrated into a commercial radio network planning tool. Its low computational load makes it a strong candidate for improving measurement-based replanning algorithms. The worst-case time complexity is linear in the product of the number of cells and the number of trace collection periods. In practice, the most time-consuming step is parsing and synchronizing the traces, which can be done with trace processing tools provided by OSS vendors. The rest of the method can be implemented in any programming language (in this work, R [39]). Specifically, the total execution time for the Network 1 dataset on a laptop with a 2.6-GHz quad-core processor is less than 780 s (3 s per 1000 connections).
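As a rough consistency check on the reported figures, the per-connection processing rate implies the size of the dataset covered by the 780-s bound (back-of-envelope arithmetic only, not a figure stated in the paper):

```python
SECS_PER_1000_CONNECTIONS = 3.0  # processing rate reported in the text
TOTAL_SECS = 780.0               # execution-time bound for the Network 1 dataset

# Implied number of processed connections under a linear pass over the traces.
approx_connections = TOTAL_SECS / SECS_PER_1000_CONNECTIONS * 1000
```

That is, the 780-s bound corresponds to roughly 260,000 connections, consistent with a single linear pass over the traces.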

6. Conclusions

Link spectral efficiency is a key parameter when designing and optimizing cellular networks. Unfortunately, it is difficult to estimate, as it depends on multiple factors that cannot be monitored and vary greatly from cell to cell. In this work, a data-driven methodology has been proposed for deriving the SINR-to-spectral-efficiency mapping curves for the LTE downlink on a per-cell basis from connection traces. The method relies on the activation of standard periodic RSRP measurements and the provision of user traffic reporting events by the base station vendor. As these requirements are common, the method can easily be adapted to other radio access technologies, even though it was initially conceived for LTE. The method has been tested with a large dataset of connection traces taken from a live LTE system. Results have shown that the current approach to deriving spectral efficiency curves, based on the Truncated Shannon Bound formula, is too optimistic. Differences with real traces are most significant at large SINR values, for which a fourfold reduction in spectral efficiency has been observed. Likewise, active connection length has been shown to have a strong impact on spectral efficiency due to the OLLA convergence process. In particular, it has been observed that spectral efficiency can be up to three times larger for long connections than for short ones at the same average SINR. Finally, it has been shown that average connection length differs largely between cells and network operators, which is one of the reasons for the differences in spectral efficiency curves at the cell and network levels.

The proposed method can be used to build link-layer mapping curves from traces on a per-cell or per-network basis. Mapping curves derived from connections with similar radio and traffic conditions are expected to reflect similar link-level performance and therefore provide more accurate results. Thus, it seems reasonable to define clusters of cells with the same properties, segregating data by cell type depending on multiantenna configuration, user mobility, terrain, or service mix. Nonetheless, defining too many cell groups might leave some groups with insufficient data for a reliable regression, so a trade-off must be struck.
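The grouping trade-off just described can be handled with a minimum-sample guard. The sketch below is purely illustrative (the attribute and field names are assumptions): groups whose pooled sample count is too small fall back to a catch-all set that would instead use the network-wide curve.

```python
def group_cells(cells, keys=("antenna", "mobility"), min_samples=1000):
    """Cluster cells by shared attributes; groups whose pooled sample
    count is below min_samples are returned separately as a fallback."""
    groups = {}
    for cell in cells:
        key = tuple(cell[k] for k in keys)
        groups.setdefault(key, []).append(cell)
    reliable, fallback = {}, []
    for key, members in groups.items():
        if sum(c["num_samples"] for c in members) >= min_samples:
            reliable[key] = members
        else:
            fallback.extend(members)
    return reliable, fallback

cells = [{"id": 1, "antenna": "2x2", "mobility": "low", "num_samples": 800},
         {"id": 2, "antenna": "2x2", "mobility": "low", "num_samples": 700},
         {"id": 3, "antenna": "4x4", "mobility": "high", "num_samples": 200}]
reliable, fallback = group_cells(cells)
```

Here the two 2x2/low-mobility cells pool enough samples for their own curve, while the isolated 4x4 cell falls back to the network-wide fit.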

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments


This work has been funded by the Spanish Ministry of Economy and Competitiveness (TEC2015-69982-R) and Ericsson Spain.

References


  1. Ericsson, "Ericsson Mobility Report," Tech. Rep., Jun 2016.
  2. A. Gupta and R. K. Jha, "A survey of 5G network: architecture and emerging technologies," IEEE Access, vol. 3, pp. 1206–1232, 2015.
  3. A. Imran and A. Zoha, "Challenges in 5G: how to empower SON with big data for enabling 5G," IEEE Network, vol. 28, no. 6, pp. 27–33, 2014.
  4. E. Hossain and M. Hasan, "5G cellular: key enabling technologies and research challenges," IEEE Instrumentation & Measurement Magazine, vol. 18, no. 3, pp. 11–21, 2015.
  5. NGMN, "Radio access performance evaluation methodology," Version 1.0, Jan 2008.
  6. J. Ramiro and K. Hamied, Self-Organizing Networks: Self-Planning, Self-Optimization and Self-Healing for GSM, UMTS and LTE, John Wiley and Sons, New York, NY, USA, 2011.
  7. A. R. Mishra, Advanced Cellular Network Planning and Optimisation: 2G/2.5G/3G. Evolution to 4G, John Wiley and Sons, New York, NY, USA, 2007.
  8. M. J. Nawrocki, M. Dohler, and A. H. Aghvami, Understanding UMTS Radio Network Modelling, Planning and Automated Optimisation, John Wiley and Sons, 2006.
  9. L. Song and J. Shen, Evolved Cellular Network Planning and Optimization for UMTS and LTE, CRC Press, 2011.
  10. I. H. Witten, E. Frank, and M. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011.
  11. N. Baldo, L. Giupponi, and J. Mangues-Bafalluy, "Big data empowered self organized networks," in Proceedings of the 20th European Wireless Conference (EW '14), pp. 1–8, May 2014.
  12. V. Buenestado, M. Toril, S. Luna-Ramirez, J. M. Ruiz-Aviles, and A. Mendo, "Self-tuning of remote electrical tilts based on call traces for coverage and capacity optimization in LTE," IEEE Transactions on Vehicular Technology, 2016.
  13. E. J. Khatib, R. Barco, P. Munoz, I. de la Bandera, and I. Serrano, "Self-healing in mobile networks with big data," IEEE Communications Magazine, vol. 54, no. 1, pp. 114–120, 2016.
  14. C. E. Shannon, Claude Elwood Shannon: Collected Papers, IEEE Press, New York, NY, USA, 1993.
  15. S. Verdú and S. Shamai, "Spectral efficiency of CDMA with random spreading," IEEE Transactions on Information Theory, vol. 45, pp. 622–640, 1999.
  16. E. Telatar, "Capacity of multi-antenna Gaussian channels," European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, 1999.
  17. S. Verdú, "Spectral efficiency in the wideband regime," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1319–1343, 2002.
  18. S. Shamai and S. Verdú, "The impact of frequency-flat fading on the spectral efficiency of CDMA," IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1302–1327, 2001.
  19. P. Mogensen, W. Na, I. Z. Kovács et al., "LTE capacity compared to the Shannon bound," in Proceedings of the IEEE 65th Vehicular Technology Conference (VTC-Spring '07), pp. 1234–1238, Dublin, Ireland, April 2007.
  20. S. Schwarz, C. Mehlführer, and M. Rupp, "Calculation of the spatial preprocessing and link adaption feedback for 3GPP UMTS/LTE," in Proceedings of the 6th Conference on Wireless Advanced (WiAD 2010), June 2010.
  21. C. Mehlführer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO 2009), pp. 1471–1478, August 2009.
  22. G. Gómez, D. Morales-Jiménez, J. J. Sánchez-Sánchez, and J. T. Entrambasaguas, "A next generation wireless simulator based on MIMO-OFDM: LTE case study," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 161642, 2010.
  23. K. I. Pedersen, G. Monghal, I. Z. Kovács et al., "Frequency domain scheduling for OFDMA with limited and noisy channel feedback," in Proceedings of the IEEE 66th Vehicular Technology Conference (VTC 2007-Fall), pp. 1792–1796, October 2007.
  24. 3rd Generation Partnership Project, "TS 36.942 v9.0.1; Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Frequency (RF) system scenarios (Release 9)," Tech. Rep., 2009.
  25. H.-J. Su, "On adaptive threshold adjustment with error rate constraints for adaptive modulation and coding systems with hybrid ARQ," in Proceedings of the 5th International Conference on Information, Communications and Signal Processing, pp. 786–790, Bangkok, Thailand, December 2005.
  26. K. Aho, O. Alanen, and J. Kaikkonen, "CQI reporting imperfections and their consequences in LTE networks," in Proceedings of the 10th International Conference on Networks, pp. 241–245, Taipei, Taiwan, 2011.
  27. A. Durán, M. Toril, F. Ruiz, and A. Mendo, "Self-optimization algorithm for outer loop link adaptation in LTE," IEEE Communications Letters, vol. 19, no. 11, pp. 2005–2008, 2015.
  28. V. Buenestado, J. M. Ruiz-Aviles, M. Toril, S. Luna-Ramirez, and A. Mendo, "Analysis of throughput performance statistics for benchmarking LTE networks," IEEE Communications Letters, vol. 18, no. 9, pp. 1607–1610, 2014.
  29. V. Niemi and K. Nyberg, Universal Mobile Telecommunications System Security, John Wiley & Sons, Chichester, UK, 2003.
  30. 3rd Generation Partnership Project, "TS 32.421; Telecommunication management; Subscriber and equipment trace; Trace concepts and requirements (Release 6)," Tech. Rep., 2012.
  31. 3rd Generation Partnership Project, "TS 25.331; Technical Specification Group Radio Access Network; Radio Resource Control (RRC); Protocol specification v11.4.0 (Release 11)," Tech. Rep., 2012.
  32. 3rd Generation Partnership Project, "TS 36.413; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access Network (E-UTRAN); S1 Application Protocol (S1AP); v8.4.0 (Release 8)," Tech. Rep., 2008–2012.
  33. 3rd Generation Partnership Project, "TS 36.423; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access Network (E-UTRAN); X2 Application Protocol (X2AP); v9.2.0 (Release 9)," Tech. Rep., 2010–2013.
  34. 3rd Generation Partnership Project, "TS 36.331; LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Resource Control (RRC); Protocol specification; v10.7.0 (Release 10)," Tech. Rep., 2012.
  35. S. Sesia, I. Toufik, and M. Baker, LTE: The UMTS Long Term Evolution, John Wiley & Sons, 2009.
  36. A. Engels, M. Reyer, X. Xu, R. Mathar, J. Zhang, and H. Zhuang, "Autonomous self-optimization of coverage and capacity in LTE cellular networks," IEEE Transactions on Vehicular Technology, vol. 62, no. 5, pp. 1989–2004, 2013.
  37. W. Guo, S. Wang, and X. Chu, "Capacity expression and power allocation for arbitrary modulation and coding rates," in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '13), pp. 3294–3299, Shanghai, China, April 2013.
  38. N. Wei, A. Pokhariyal, C. Rom et al., "Baseline E-UTRA downlink spectral efficiency evaluation," in Proceedings of the IEEE 64th Vehicular Technology Conference (VTC-2006 Fall), pp. 2131–2135, Canada, September 2006.
  39. R Core Team, "The R Project for Statistical Computing."