Abstract

We seek to provide practical lower bounds on the prediction accuracy of path loss models. We describe and implement 30 propagation models of varying popularity that have been proposed over the last 70 years. Our analysis is performed using a large corpus of measurements collected on production networks operating in the 2.4 GHz ISM, 5.8 GHz UNII, and 900 MHz ISM bands in a diverse set of rural and urban environments. We find that the landscape of path loss models is precarious: typical best-case accuracy of these models is on the order of 12–15 dB root mean square error (RMSE), and in practice it can be much worse. Models that can be tuned with measurements and explicit data fitting approaches enable a reduction in RMSE to 8–9 dB. These bounds on modeling error appear to be relatively constant, even in differing environments and at differing frequencies. Based on our findings, we recommend the use of a few well-accepted and well-performing standard models in scenarios where a priori predictions are needed and argue for the use of well-validated, measurement-driven methods whenever possible.

1. Introduction

Predicting the attenuation of a radio signal between two points in a realistic environment has entertained scientists and experimenters for more than 70 years. This is for good reason: accurate predictions of path loss and propagation have many important applications in the design, rollout, and maintenance of all types of wireless networks. As a result, there has been no shortage of models proposed in the literature that claim to predict path loss within some set of constraints. Yet, despite the large quantity of work done on modeling path loss, there is an important shortcoming that this paper begins to address: there have been relatively few comparative evaluations of path loss prediction models using a sufficiently representative data set as a basis for evaluation. Those studies that do exist make comparisons between a small number of similar models. And, where there has been substantial work of serious rigor done, for instance in the VHF bands where solid work in the 1960s produced well-validated results for analog television (TV) propagation, it is not clear how well these models work for predicting propagation in different types of systems operating at different frequencies. The result is that wireless researchers are left without proper guidance in picking among dozens of propagation models. Further, among the available models it is not clear which is best or what the penalty is for using a model outside of its intended coverage. In [1], for instance, Camp et al. show that a wireless mesh network planned with a given path loss model can be massively under- or overprovisioned as a result of small changes to model parameters.

In this paper, we implement and analyze 30 propagation models spanning 65 years of publications using five novel metrics to gauge performance. Although many of these models are quite different from one another, they all make use of the same basic variables on which to base their predictions: position (including height and orientation) of the transmitter and receiver, carrier frequency, and the digital elevation model and land cover classification along the main line-of-sight (LOS) transmit path. These models are a mix of approaches: purely empirical, analytical, stochastic, or some combination thereof. In addition, we make use of explicit measurement-based approaches to put a lower bound on the accuracy of direct fitting methods. The present study does not include ray-tracing models (e.g., [2]) or partition-based models (e.g., [3]) that require substantial knowledge of the environment, which is seldom available at all and rarely at the precision required to make useful predictions. We are also not considering active-measurement models (e.g., [4]), which make use of directed in situ measurements to correct their predictions.

The focus of this paper is the efficacy of these models at predicting median path loss values in environments with representative terrain and a large range of equipment and link lengths. We focus our analysis on a class of networks of particular interest: point-to-point and infrastructure data networks operating in the 2.4 GHz ISM and 5.8 GHz UNII bands using the widely adopted 802.11x family of protocols. Many authors have considered the problem of predicting outdoor path loss in uncluttered environments to be solved. We will see this is far from true: making accurate a priori predictions about path loss with the available models, and without in situ measurements, is a very difficult task even in “simple” environments.

In the end, the results show that no single model is able to predict path loss consistently well. Even for the seemingly simple case of long links between well-positioned antennas in a rural environment, the available models are unable to predict path loss at an accuracy that is usable for any more than crude estimates. Indeed, no model is able to achieve a Root Mean Square Error (RMSE) of less than 14 dB in rural environments and 8–9 dB in urban environments, and this performance is only achieved after substantial hand tuning. Explicit data-fitting approaches do not perform better, producing 8–9 dB RMSE as well. This conclusion motivates further work on more rigorous measurement-based approaches and the use of well-validated industry-standard models when it is not possible to use measurements.

2. Path Loss Models

Table 1 provides details of the models evaluated in this study. In the following subsections we will briefly discuss each major category of model within our proposed taxonomy and list notable examples. Due to space constraints, we are unable to discuss each model that we implement and instead focus on describing the most prevalent themes: Theoretical Models, Basic Models, Terrain Models, Supplementary Models, and Advanced Models. The models implemented here are described in more detail in [33].

At a high level, a model's task is to predict the total path loss $L$ used in computing the power observed at the receiver, $P_{rx} = P_{tx} - L$, where $P_{tx}$ is the transmitted power. The total path loss ($L$) is the sum of the free-space path loss ($L_{fs}$), the loss due to shadowing/slow fading ($L_{shadow}$, i.e., from large fixed obstacles like mountains and buildings), and the small-scale/fast fading ($L_{fast}$) due to destructive interference from multipath effects and small scatterers:
$$L = L_{fs} + L_{shadow} + L_{fast}.$$
Models cannot, without perfect knowledge of the environment, be expected to predict the quantity $L_{fast}$. In most applications, this additional error is instead drawn from a probability distribution (often Rayleigh, although Rician and Nakagami-m are popular).

It is worth noting that among the models we have implemented, very few were designed for exactly the sort of networks we are testing them against. Indeed, some are very specific about the type of environment in which they are to be used. Table 1 lists the coverage domain of each model we have implemented, when available. In this work we do not strictly adhere to these coverage requirements because we observe that they are largely not followed in the literature (the Longley-Rice Irregular Terrain Model, in particular, is frequently used well outside of its intended coverage domain). In this study both appropriate and “inappropriate” models are given an equal chance at making predictions for our network. We have no starting bias about which should perform best.

2.1. Theoretical/Foundational Models

The first models worth considering are purely analytical models derived from the theory of idealized electromagnetic propagation. These models are simple to understand and implement, and as a result they have been widely adopted into network simulators and other applications and often function at the center of more complex models. Important examples include the Friis equation for free-space path loss between isotropic antennas [5] and the two-ray ground-reflection model [7, 17]. The Friis equation is an integral component of many of the more complex models. It observes that power in free space is attenuated proportionally to the square of the distance: the ratio between received power and transmitted power is given as a function of distance ($d$) and wavelength ($\lambda$) by $P_{rx}/P_{tx} = \left(\lambda/4\pi d\right)^{2}$. More commonly this equation is given in the logarithmic domain as
$$L_{fs} = 32.45 + 20\log_{10}(d) + 20\log_{10}(f),$$
where distance ($d$) is given in km, carrier frequency ($f$) in MHz, and power in units of decibels relative to a mW (dBm).
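For concreteness, here is a minimal Ruby sketch of the logarithmic form (Ruby being the language of our implementations; the function name is illustrative):

```ruby
# Free-space path loss in dB; distance in km, carrier frequency in MHz.
def free_space_loss_db(d_km, f_mhz)
  32.45 + 20 * Math.log10(d_km) + 20 * Math.log10(f_mhz)
end

free_space_loss_db(40, 5800) # => ~139.8 dB for a 40 km link at 5.8 GHz
```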

2.2. Basic Models

The models that we call “basic” models are the most numerous. They compute path loss along a single path and often use corrections based on measurements made in one or more environments. In general, they use the distance, carrier frequency, and transmitter and receiver heights as input. Some models also have their own parameters to select between different modes of computation or to allow fine tuning. Here we subdivide these models into deterministic and stochastic categories. The stochastic models use one or more random variables to account for channel variation (and hence are able to predict a distribution instead of a median value). The Egli model [6, 34, 35], the Green-Obaidat model [25], the Hata-Okumura model [8, 35] (and its many derivatives [16, 22, 29, 36]), and the Walfisch-Ikegami model [37] are good examples of deterministic basic models. Stochastic models include the recent Herring models [32] and the Erceg models [20, 24]. Because we are concerned with predicting median path loss, we disable the stochastic element of these models and simply use their median prediction.

2.3. Terrain Models

Terrain models are similar to the basic models but also attempt to compute diffraction losses along the line-of-sight path due to obstructions (terrain or buildings, for instance). They are an order of magnitude more complex but are immensely popular, especially for long propagation distances at high power in the VHF band (i.e., television transmitters). Important examples include the ITM [12, 38], which is widely used in propagation planning software (e.g., [39, 40]), the ITU-R 452 model, which is quite similar but with some added complexities [28], and the straightforward ITU Terrain Model [7, 23].

2.4. Supplementary Models

Supplementary models cannot stand on their own but are instead intended to make corrections to existing models. These models are best subdivided by the phenomenon they aim to correct for: stochastic fading [35, 41, 42], frequency [27], atmospheric gases [43], terrain roughness [34], and antenna directivity [21, 44] cover the majority of models. When appropriate, we use these models to correct the other models (e.g., frequency correction for the Hata model (hata.fc) or directivity correction for the CU-WART measurements).

2.5. More-Advanced Models

There are also two major categories of models that we are not considering in this study: many-ray (ray-tracing) models and active-measurement models. Although to some extent these models typify the state of the art in propagation modeling, they are not the models that are widely used in simulators and propagation planning tools, largely because they have greater data requirements. Many-ray models, which predict the summed path loss along many paths using the uniform theory of diffraction (or similar) [2, 45, 46], require high-resolution data describing the environment and substantial computation time.

Active-measurement models take the perspective that the only way to make realistic predictions is to combine an a priori model with in situ measurements. The development of these models is fairly immature but there are front-runners, including the proposal of Robinson et al. in [4]. A related set of “partition” models, most well known in indoor propagation applications, combines the multiray approach with some direct measurement of losses due to obstacles [3].

3. Existing Comparative Studies

The vast majority of existing work analyzing the efficacy of path loss models has been carried out by those authors who are proposing their own improved algorithm. In such cases, the authors often collect data in an environment of interest and then show that their model is better able to describe this data than one or two competing models. Unfortunately, this data is rarely published to the community, which makes comparative evaluations impossible. One noteworthy exception is the work of the European Cooperation in the field of Scientific and Technical Research Action 231 (COST-231) group in the early 1990s, which published a benchmark data set (900 MHz measurements taken in European cities) and produced a number of competing models that were well-performing with respect to this reference [16]. We consider all of the proposed COST-231 models and data in the analysis that follows.

Similarly, there was substantial work done in the USA, Japan, and several other countries in the 1960s and 1970s to derive accurate models for predicting the propagation of analog TV signals (e.g., [47]). This flurry of work produced many of the models that are still used today in network simulators and wireless planning tools: the Irregular Terrain Model (ITM) [12], the Egli Model [6], and the Hata-Okumura model [8], to name a few. However, it is unclear what the implications are of using these models, which were created for use in a specific domain, to make predictions about another domain. Understanding and quantifying the error associated with such applications is a central goal of our work here.

There are several studies similar to this work, which compare a number of models with respect to some data. In [34], the authors compare five models with respect to data collected in rural and suburban environments with a mobile receiver at 910 MHz. They discuss the abilities of each model, but abstain from picking a winner. In [48], the authors compare three popular models to measurements collected at 3.5 GHz by comparing a least squares fit of measurements to model predictions. The authors highlight the best of the three, which turns out to be the ECC-33 model proposed in [24]. In [49], Sharma and Singh do a very similar analysis but instead focus on measurements made in India at 900 and 1800 MHz. In contrast to [48], they find that the Stanford University Interim model (SUI) and COST-231 models perform best. The work presented here is the first to do an in-depth and rigorous analysis of a large number of diverse propagation models using a large and realistic data set from a production network. And, it is the first such comparative study looking at results for the widely used 2.4 and 5.8 GHz bands.

4. Measurement

In this section, we describe data sets collected to act as a ground truth basis for comparison to model predictions. These measurements were collected over the course of several years in multiple environments and with differing (but consistent) hardware. They range from “clean” measurements taken in rural New Zealand to “noisy” measurements collected in the urban center of a large US city along with some special measurements to investigate points of particular interest, such as measurements with phased-array and directional antennas, and some in suburban environments. Overall, these data sets combine to paint a unique picture of the real-world wireless radio environment at varying levels of complexity. Table 2 provides a summary of these data sets.

4.1. Packet-Based Measurements

With the exception of the COST-231 data (campaign D in Table 2), all data sets used here were collected using commodity hardware, and packet-based measurements were used to determine received signal strength. This approach differs from some prior work on path loss modeling that uses continuous wave (CW) measurements [16, 50]. When using packet-based methods to collect information about received signal strength and path loss, a transmitter is configured to transmit “beacon” frames periodically. A receiver (often mobile) records these beacon frames. Using an open-source driver, such as MadWifi [51], and a compatible chipset, frames can be recorded in their entirety to the hard disk in real time using any number of user-space software tools (e.g., tcpdump). If these frames are recorded with the optional Radiotap header [52] (or equivalently, the more archaic Prism II header), then the record will include information about the physical layer, such as the received signal strength of the frame, any Frame Check Sequence (FCS) errors, and a noise floor measurement. Using this approach, inexpensive commodity hardware can be used to make extensive passive measurements of a wireless network.

To get an idea of how accurate commodity radios are in measuring Received Signal Strength (RSS), some calibration experiments were performed in a conducted (cabled) setting. Each of four radio cards was directly connected to an Agilent E4438C Vector Signal Generator (VSG). The cards were all Atheros-based, Lenovo-rebranded Mini-PCI Express cards with chipsets of the same family (brand and model line) as those used for all of our packet-based measurements. The VSG was configured to generate 802.11 frames and the laptop to receive them. For each of the four cards, many samples were collected while varying the transmit power of the VSG between −20 dBm and −95 dBm (lower than the receive sensitivity threshold of just about any commodity 802.11 radio) in 5 dB increments. Finally, a linear least squares fit was performed, finding a slope of 0.9602 and an adjusted $R^2$ value of 0.9894 (indicating a strong fit to the data). Thus, the commodity radios perform remarkably well in terms of RSS measurement. To correct for the minor error they do exhibit, we can use the slope of this fit to adjust our measurements, dividing each measurement by the slope value.
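The calibration amounts to an ordinary least squares fit; a small illustrative Ruby sketch follows (the sample values are made up for demonstration, not the measured data):

```ruby
# Ordinary least squares fit of y on x; returns [slope, intercept].
def linear_fit(xs, ys)
  n  = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  slope = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } /
          xs.sum { |x| (x - mx)**2 }
  [slope, my - slope * mx]
end

vsg_power_dbm = [-90, -80, -70, -60, -50, -40, -30, -20]
reported_rss  = [-87, -77, -68, -58, -49, -39, -30, -21]  # illustrative values

slope, _intercept = linear_fit(vsg_power_dbm, reported_rss)
corrected_rss = reported_rss.map { |rss| rss / slope }  # slope correction described above
```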

However, there is a drawback to this approach. Packet-based methods necessarily “drop” measurements for packets that cannot be demodulated. All receivers have fundamental limits in their receive sensitivity that are a function of their design. However, because packet-based measurement techniques rely on demodulation of packets to determine the received signal strength, they have necessarily lower sensitivity than receivers that calculate received power from the pure signal (continuous wave measurements). Additionally, without driver modification, commodity receivers generally update noise floor measurements infrequently. For the purpose of analyzing the accuracy of median path loss prediction, these limitations are not problematic. In one sense, commodity hardware “loses” only the least interesting measurements: if we are unable to decode the signal at a given point, we at least know that the signal is below the minimum detectable signal for basic modulation schemes and is, as a result, unlikely to be usable for many applications.

It should be noted that packet-based measurement methods are not appropriate for all modeling tasks—the tradeoff between the convenience and affordability of commodity hardware and the completeness of the measurements must be considered. For instance, if the goal of a measurement campaign is to sense signals or interference near the noise floor in order to predict capacity for next generation protocols, or if the goal is to model delay spread or Doppler shift, then packet-based measurements will not be sufficient. However, our work here has less demanding data requirements than these applications. For the purpose of measuring median Signal-to-Noise Ratio (SNR) at a given point in space from the perspective of a typical receiver, packet-based measurements made with commodity hardware are both sufficiently accurate and convincingly representative.

4.2. Rural Measurements

In cooperation with the Waikato Applied Network Dynamics (WAND) research group at the University of Waikato [53] and the RuralLink wireless internet service provider (WISP) [54], we acquired a large set of measurements from a commercial network in rural New Zealand. These measurements were collected for the Wireless Measurement Project (WMP) [55] and are labeled campaign E in Table 2. Rural environments are simpler than cluttered urban environments because there are fewer obstacles to cause fading. Those obstacles that do exist are typically large and constant (e.g., mountains and terrain features) which produce only large-scale shadowing and minimal small scale (fast) fading. Moreover, the isolated nature of rural networks results in less interference from neighboring competing networks, which can create random fades that are difficult to predict and model. Hence, our measurements here are intended to form a comparative baseline for the measurements in more complex environments.

The network used in our study is a large commercial network that provides Internet access to rural segments of the Waikato region in New Zealand (as well as some in other regions). Our overall approach to measurement involves periodically broadcasting measurement frames from all nodes while recording any overheard measurement frames. Every two minutes, each device on the network transmits a measurement frame at each supported bitrate. Meanwhile, each device uses a monitor-mode interface to log packets. Because this is a production network, privacy concerns are of clear importance, which is why all measurements are made with injected packets and why a Nondisclosure Agreement (NDA) was required for use of the parts of the data that contain sensitive information (principally client locations).

The network is arranged in the typical hub-and-spoke topology, as can be seen in Figure 1. The backhaul network is composed of long-distance 802.11a links operating at 5.8 GHz (“wmp/a” in Table 2). Atypically liberal power regulations in New Zealand and Australia around 5.8 GHz allow for much longer links than can be seen in most other places in the world; 40 km is a typical link length in this network. (Fixed radio links (Unlicensed National Information Infrastructure (U-NII) devices) operating between 5.725 and 5.825 GHz that use wideband digital modulation are allowed an Effective Isotropic Radiated Power (EIRP) of 200 W [56].) These are commonly point-to-point links that use highly directional antennas that are carefully steered. The local access network is composed of predominantly 802.11b/g links that provide connectivity to Client Premises Equipment (CPE) (“wmp/g” in Table 2). Often, an 802.11g Access Point (AP) with an omnidirectional or sector antenna will provide access to a dozen or more CPE devices that have directional (patch panel) antennas pointing back to the AP. With few exceptions, each node in the network is an embedded computer running the Linux operating system, which allows us to use standard open-source tools to perform measurement and monitoring. All nodes under measurement use an Atheros-brand radio, and the MadWifi driver [51] is used to collect frames in monitor mode and record received signal strengths using the radiotap extension to libpcap [52].

After collection, the data requires scrubbing to discard frames that have arrived with errors. Because there is substantial redundancy in measurements (many measurements are made between every pair of participating nodes), discarding some small fraction of (presumably randomly) damaged frames is unlikely to harm the integrity of the data overall. As a rule, any frame that arrives with its checksum in error, or from a source that produces fewer than 100 packets, is discarded. For the work here, one representative week of data collected between July 25th, 2010, and August 2nd, 2010, is used. Because detailed documentation about each node simply did not exist, some assumptions were made for analysis. The locations of nodes for which there is no specific Global Positioning System (GPS) reading are either hand-coded or, in the case of some client devices, geocoded using an address. Antenna orientations for directional antennas are assumed to be ideal: pointing in the exact bearing of their mate. All nodes are assumed to be positioned 3 m off the ground, which is correct for the vast majority of nodes. While these assumptions are not perfect and are clearly a source of error, they are reasonably accurate for a network of this size and complexity. Certainly, any errors in antenna heights, locations, or orientations are on the same scale as those errors would be for anyone using one of the analyzed propagation models to make predictions about their own network of interest.

In the end, our scrubbed data for a single week constitutes 19,235,611 measurements taken on 1328 links (1262 802.11b/g links at 2.4 GHz and 464 802.11a links at 5.8 GHz) from 368 participating nodes. Of these nodes, the vast majority are clients, and hence many of the antennas are of the patch panel variety (70%). Of the remaining 30%, 21% are highly directional point-to-point parabolic dishes and 4.5% each are omnidirectional and sector antennas.

4.3. Urban Measurements

In addition to the “baseline” measurements in a rural setting, we collected measurements in three additional environments to complete the picture of the urban/suburban wireless propagation environment. The three campaigns cover the three transceiver configurations that are most important in the urban wireless environment (see Figure 2). The first, A, concerns well-positioned (i.e., tower- or rooftop-mounted) fixed wireless transceivers. This sort of link is typically used for backhaul or long-distance connections (e.g., [57]). The second, B, concerns propagation between a single fixed ground-level node (i.e., on a utility pole) and mobile ground-level client devices. Finally, the third, C, concerns infrastructure network configurations where one fixed, well-positioned transmitter (AP) is responsible for serving multiple ground-level mobile nodes.

4.3.1. Backhaul

The first data set, A, was collected using the University of Colorado at Boulder (CU) Wide Area Radio Testbed (WART), which is composed of six 8-element uniform circular phased-array antennas [58]. The devices are mounted on rooftops on the CU campus and in the surrounding city of Boulder, CO (see Figure 3). These devices can electronically change their antenna pattern, which allows them to operate as a directional wireless network with a main lobe pointed in one of 16 directions or as an omnidirectional antenna whose gain is (approximately) uniform in the azimuth plane. To collect this data, an “N × N scan” is done of the sort proposed in [59], which results in RSS measurements for every combination of transmitter, receiver, and antenna pattern. In short, this works by having each AP take a turn transmitting in each state while all other nodes listen and log packets. Identical measurements were collected during the winter (no leaves), during a snowstorm, and during the summer of 2010. These network measurements are applicable to rooftop-to-rooftop communication systems, including cell networks, and point-to-point or point-to-multipoint wireless backhaul networks both with directional antennas and with omnidirectional antennas. Although this is a reasonably small network, the representativeness of the environment (a typical urban/suburban campus) and the large number of effective antenna patterns that can be tested provide a strong argument for the generalizability of this data. This data is available at [60].

4.3.2. Street-Level Infrastructure

The second set of urban measurements, B, involves three data sets from three urban municipal wireless networks: a (now defunct) municipal wireless mesh network in Portland, OR, the Google WiFi network in Mountain View, CA, and the Technology For All (TFA) network in Houston, TX. All three data sets involve data collected with a mobile client. As a standard practice we truncate the precision of the GPS coordinates to five significant digits, which has the effect of averaging measurements within a 0.74 m (six-wavelength) circle (a conservative averaging by the standard of [61]).

(a) Portland, OR
In this network, 70 APs are deployed on utility poles in a 2 km by 2 km square region. Each AP has a 7.4 dBi omnidirectional antenna that provides local coverage in infrastructure mode. These measurements were collected during the summer of 2007. This data set, which consists of both laborious point testing and extensive war-driving data, is most representative of ground-to-ground links in urban environments. Collection involved a two-stage process. First, a mobile receiver was driven on all publicly accessible streets in the 2 km by 2 km region. The receiver was a Netgear WGT-634u wireless router running OpenWrt Linux [62] and the open-source sniffing tool Kismet [63]. The Kismet tool performs channel hopping to record measurements on all 11 802.11b/g channels, which imposes a uniform random sampling (in time) on the observed measurements. The receiver radio is an Atheros-brand chipset, with an external 5 dBi magnetic roof-mount “rubber duck” antenna and a Universal Serial Bus (USB) GPS receiver. Passive measurements of overheard management frames (beacons) were recorded to a USB compact flash dongle. This results in a large set of measurements that is referred to as “pdx/stumble” here. After this initial stage, 250 additional locations were selected at random from within the region and tested more rigorously with a state-based point tester. At each of these points, physical layer information was recorded (i.e., SNR) along with results from higher layer tests. This smaller data set is called “pdx” in the remainder of the paper; the data collection procedure is described in more detail in [64], and the data is available for download at [65].

(b) Mountain View, CA
The Google WiFi network [66], deployed in Mountain View, CA, covers much of the city (31 km²) with 447 Tropos-brand [67] 2.4 GHz 802.11 mesh routers (see Figure 4). The measurements used here were collected by Robinson et al. between October 3rd and 10th, 2007, for their work in [4]. These measurements were made publicly available at [68] and involve passive measurements over a subset of the coverage area (12 km²) encompassing 168 mesh nodes. These nodes are mounted on light poles, as in the Portland measurements, and have a 7.4 dBi omnidirectional antenna for local coverage in addition to the backhaul network. The measurements were made with an IBM T42 laptop with a 3 dBi antenna and GPS receiver running the NetStumbler sniffing software [69]. As with the Portland measurements, these are all passive measurements of management frames (beacons), and the sniffer employs channel hopping to make a uniform random sample (in time) of all 11 channels. The Received Signal Strength Indicator (RSSI) and noise values are recorded for each packet overheard, along with a time-stamp and GPS location. Some minor anonymization of the data has been done to remove unique identifiers (basic service set identifiers (BSSIDs)). RSSI is converted to RSS by subtracting 149 from each value [70]. Precise height and transmit power control information was not recorded for this data, so in our application we use the reasonable constant values of 20 dBm (100 mW) transmit power (as extracted from Tropos product white-paper specifications) and 12 m for the utility pole height.

(c) Houston, TX
The final set of street-level infrastructure measurements comes from the community wireless mesh network constructed by Rice University and the TFA nonprofit organization in Houston, TX [71]. Figure 5 shows a heat map of the measurements. These measurements were collected by Robinson et al. and Camp et al. for their work in [1, 4]. The measurements have been made publicly available at [72]. This network involves 18 wireless nodes in a residential area in Southeast Houston, providing coverage to approximately 3 km² and more than 4000 users. In the data collection, the NetStumbler software was used on a laptop with a GPS device and an Orinoco Gold 802.11b wireless interface (Atheros chipset) connected to a car-roof-mounted 7 dBi omnidirectional antenna. As with the other measurements, all data collection is passive, and the software channel hops to record a random sample of overheard management frames (beacons) on each of the 11 channels. The drive test covers all city streets in the region and was carried out 15 times between the hours of 10 AM and 6 PM between December 15th, 2006, and February 15th, 2007. Although this is a winter data collection, Houston has a tropical climate, so it is presumed that there is fading due to foliage throughout the year. The measurements contain signal strength, noise, and location values as well as the vehicle’s average velocity at the point of measurement.

4.3.3. Wide-Area Infrastructure

The final data set, C, involves two sets of measurements: one from the CU WART and one set of published measurements from a well-placed transmitter in Munich, Germany.

(a) Boulder, CO
The first data set was collected using a mobile node (a Samsung-brand “netbook”) with a pair of diversity antennas. In this experiment, the six rooftop CU WART nodes were configured to transmit 80-byte “beacon” packets every $t$ seconds, where $t$ is a uniformly distributed random number. Beacons are transmitted at 1 Mbps so that possible effects of Doppler spread on higher data rate waveforms are avoided. Similarly, the mobile device was configured to transmit beacons at the same rate. Meanwhile, each rooftop testbed node was configured to use its 9 dBi omnidirectional antenna pattern.
All nodes, including the mobile node, were configured to log packets using a second monitor-mode (promiscuous) wireless interface. The mobile node was additionally instrumented with a USB GPS receiver that was used both to keep a log of position and to synchronize the system clock so that the wireless trace was in sync with the GPS position log. These measurements were collected during the summer of 2010. During the experiment, the mobile node was attached to an elevated (nonconducting) platform on the front of a bicycle. The bicycle was pedaled around the CU campus on pedestrian paths, streets, and in parking lots. This data set is most representative of infrastructure wireless networks where a well-positioned static transmitter must serve mobile clients on the ground. This data set is subdivided into the upstream part (boulder/gtp) and the downstream part (boulder/ptg).

(b) Munich, Germany
The second group of measurements comes from a reference data set collected by the COST-231 group at 900 MHz [16] in Munich in 1996. This data set, which provides path loss measurements collected by a mobile receiver from three well-placed (rooftop) transmitters, is the closest in intent to our data set C but does not include upstream measurements. It is fundamentally different from our other data sets in that it involves continuous wave (CW) measurements instead of packet-based measurements and was collected with differing hardware at a much lower frequency. Because quality reference data sets of this sort are few and far between, and the COST-231 campaign is an early example of such an effort, we have included it for completeness and comparison.

5. Implementation

Each of the 30 models is implemented from its respective publication in the Ruby programming language. Only one of the models, the ITM [12], has a reference implementation. Hence, there are fundamental concerns about correctness. To address this, basic sanity checking of model output is performed. However, without access to the data sets on which the models were derived, or reference implementations, it is impossible to make a more rigorous verification than this.

5.1. Terrain Databases

Terrain models require access to a Digital Elevation Model (DEM). In the case of the International Telecommunications Union Radiocommunication Sector (ITU-R) 452 model, a Landcover Classification Database (LCDB) is required as well. The DEM used for the networks in the United States is a publicly available raster data set from the United States Geological Survey (USGS) Seamless Map Server, providing 1/3-arcsecond spatial resolution. The US LCDB is also provided by the USGS as a raster data set, which is generated by the USGS using a trained decision tree algorithm. For the New Zealand data sets, DEM and LCDB data are provided by the Environment Waikato organization. The DEM has a vertical precision of 1 m and an estimated accuracy of 5–6 m RMSE. The GDAL library [73] is used to perform coordinate conversions and data extraction to generate path profiles for the terrain algorithms.

5.2. Corrections for Hata-Okumura

In our implementation of Hata-Okumura and its derivative models, a few crude corrections are made to antenna heights in the event that they fall outside of the model's coverage (and would therefore produce anomalous results). First, the minimum of the two heights is subtracted from both so that they are relative. For instance, antenna heights of 30 and 40 m become 0 and 10 m. Then, the heights are swapped if necessary so that the transmitter height is always higher than the receiver height (at this point the receiver height will be zero). Next, one is added to both heights, keeping the relative difference but setting the receiver height to 1 m. For instance, 0 and 10 m would become 1 and 11 m. Finally, the transmitter height is decreased or increased as necessary so that it is above the minimum (30 m) and below the maximum (200 m).
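The following sketch summarizes the correction procedure (a hypothetical helper written for illustration, not code from our implementation; heights in meters):

```ruby
def correct_hata_heights(h_tx, h_rx)
  base = [h_tx, h_rx].min
  h_tx -= base                            # make heights relative
  h_rx -= base
  h_tx, h_rx = h_rx, h_tx if h_rx > h_tx  # transmitter must be the higher antenna
  h_tx += 1                               # shift both, placing the receiver at 1 m
  h_rx += 1
  h_tx = h_tx.clamp(30, 200)              # force the transmitter into the valid range
  [h_tx, h_rx]
end

correct_hata_heights(30, 40) # => [30, 1]  (30/40 -> 0/10 -> 1/11 -> clamp to 30)
```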

These corrections are necessary to use the Hata-Okumura model with transmitter or receiver heights that would otherwise produce meaningless (infinite) results. It is not certain what impact these corrections have on model performance. However, it stands to reason that even if performance is negatively impacted, an inaccurate prediction is still closer to the true answer than an anomalous (infinite) one. We believe this to be acceptable due diligence in terms of applying the model outside of its domain of coverage (where the accuracy of predictions is already questionable).

6. Method

To obtain results, we ask each model to offer a prediction of median path loss for each link in the data. The model is fed whatever information it requires, including DEM and LCDB information. The model produces an estimate of the loss, $\hat{L}$, that is combined with known values to calculate the predicted received signal strength
$$\hat{P}_{rx} = P_{tx} + G_{tx}(\theta) + G_{rx}(\phi) - \hat{L},$$
where $G_{tx}(\theta)$ is the antenna gain of the transmitter in the azimuthal direction ($\theta$) of the receiver and $G_{rx}(\phi)$ is the antenna gain of the receiver in the azimuthal direction ($\phi$) of the transmitter. These gains are drawn from measured antenna patterns. The antenna patterns were derived for each directional antenna empirically, using the procedure described in [74], or from manufacturer specifications. Omnidirectional antennas were modeled as constant gain (isotropic). The transmit power ($P_{tx}$) is set to 18 dBm for all nodes, which is the maximum transmit power of the Atheros radios that all measurement nodes use. For a given link, we calculate the median received signal strength value across all measurements ($\tilde{P}_{rx}$). Then, the prediction error, $\epsilon$, is the difference between this median measured value and the prediction:
$$\epsilon = \tilde{P}_{rx} - \hat{P}_{rx}.$$
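For illustration, a minimal Ruby sketch of this prediction and error computation follows (the names are ours; the gain values are assumed to come from the measured antenna patterns):

```ruby
def predicted_rss(p_tx_dbm, g_tx_toward_rx_dbi, g_rx_toward_tx_dbi, loss_hat_db)
  p_tx_dbm + g_tx_toward_rx_dbi + g_rx_toward_tx_dbi - loss_hat_db
end

# Median of the measured RSS values (upper-middle element for even counts).
def median(values)
  sorted = values.sort
  sorted[sorted.size / 2]
end

def prediction_error(measured_rss_dbm, rss_hat_dbm)
  median(measured_rss_dbm) - rss_hat_dbm
end
```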

Some models come with tunable parameters of varying esotericism. For these models, we try a range of reasonable parameter values without bias towards those expected to perform best.

This entire process requires a substantial amount of computation but is trivially parallelizable. To make the computation of results tractable, we subdivide the task of prediction into a large number of simultaneously executing threads and merge the results after completion. This must occur in two sequential stages. During the first stage, path profile information is extracted and prepared for each link in parallel, and during the second stage this information is provided to each algorithm for each link, which can also be done in parallel. With the merged data in hand, each prediction is compared with an oracle value for the link. This oracle value is computed from the measured received signal strength for the link as well as known values for the transmitter power and antenna gain.
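A schematic sketch of this two-stage fan-out follows; the profile extraction and prediction calls are placeholders for the real implementations (note that the standard Ruby interpreter does not run CPU-bound threads truly in parallel, so separate processes may be preferable in practice):

```ruby
def parallel_map(items, workers: 8)
  queue   = Queue.new
  items.each_with_index { |item, i| queue << [item, i] }
  results = Array.new(items.size)
  threads = Array.new(workers) do
    Thread.new do
      begin
        loop do
          item, i = queue.pop(true)   # raises ThreadError once the queue drains
          results[i] = yield(item)
        end
      rescue ThreadError
        # queue empty; worker finishes
      end
    end
  end
  threads.each(&:join)
  results
end

# Stage 1 (placeholder call): profiles = parallel_map(links) { |l| extract_path_profile(l) }
# Stage 2: every (model, link, profile) combination is an independent prediction job,
#          which can be fanned out through parallel_map in the same way.
```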


6.1. Five Metrics

The performance of the models is analyzed with respect to five metrics, in order of decreasing stringency:
(1) RMSE and spread-corrected root mean square error (SC-RMSE),
(2) competitive success,
(3) individual accuracy relative to spread,
(4) skewness,
(5) rank correlation.

RMSE is the most obvious and straightforward metric for analyzing the error of a predictive model of this sort. As discussed above, for a given model we compute an error value ($\epsilon_{m,l}$, as in (5)) for each prediction for each link in each data set. For a given data set $D$ and a given model $m$, we define the overall RMSE as
$$\mathrm{RMSE}_{m,D} = \sqrt{\frac{1}{|D|}\sum_{l \in D} \epsilon_{m,l}^{2}},$$
where $\epsilon_{m,l}$ is the error of model $m$ for link $l$ and $|D|$ is the number of links in the data set $D$. SC-RMSE is a version of RMSE that subtracts off the expected spread in the measurements from the error. This way, if a given link has large variation in our measurements, then the error a model obtains on that link is reduced by a proportional amount. This has the effect of reducing the error associated with especially noisy links. Figure 6 provides an explanatory diagram comparing normal error ($\epsilon$) and spread-corrected error ($\tilde{\epsilon}$). The spread-corrected error for a given model and link is the absolute value of the error, reduced by the standard deviation ($\sigma_l$) of measurements on link $l$, with a floor at zero:
$$\tilde{\epsilon}_{m,l} = \max\left(\left|\epsilon_{m,l}\right| - \sigma_l,\ 0\right).$$

Computing SC-RMSE is identical to RMSE as shown in (6), except that $\tilde{\epsilon}_{m,l}$ is substituted for $\epsilon_{m,l}$.
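Both quantities are straightforward to compute; an illustrative sketch (array arguments hold one value per link):

```ruby
# errors: one epsilon per link (dB); sigmas: per-link standard deviation (dB).
def rmse(errors)
  Math.sqrt(errors.sum { |e| e**2 } / errors.size.to_f)
end

def sc_rmse(errors, sigmas)
  corrected = errors.zip(sigmas).map { |e, s| [e.abs - s, 0.0].max }
  rmse(corrected)
end
```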

The competitive success metric is the percentage of links in a given data set that a given model has made the best prediction for. For each link we keep track of the model that makes the prediction with the smallest $|\epsilon_{m,l}|$, count the number of best predictions for each model, and then divide this count by the total number of links:
$$\mathrm{CS}_{m,D} = \frac{100}{|D|}\left|\left\{\, l \in D : \left|\epsilon_{m,l}\right| \le \left|\epsilon_{m',l}\right|\ \forall m' \right\}\right|.$$

We would expect that when analyzing many models, if one model (or a set of related models) is dominant for a given environment, then it would score near 100 on this metric. Because the percentage points are divided evenly between all models tested, if we test a large number of models, this metric may be spread too thinly to be useful for analysis (i.e., too many similar models share the winnings and no single model comes out on top).
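For reference, a small illustrative sketch of this tally:

```ruby
# errors_by_model maps each model name to its per-link errors, with links in
# the same order for every model; returns the percentage of wins per model.
def competitive_success(errors_by_model)
  n_links = errors_by_model.values.first.size
  wins    = Hash.new(0)
  n_links.times do |l|
    winner, = errors_by_model.min_by { |_name, errs| errs[l].abs }
    wins[winner] += 1
  end
  wins.transform_values { |w| 100.0 * w / n_links }
end
```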

The individual accuracy metric is the percentage of links where the given model is able to make a prediction within one or two standard deviations of the measured spread:
$$\mathrm{IA}_{m,D,k} = \frac{100}{|D|}\left|\left\{\, l \in D : \left|\epsilon_{m,l}\right| \le k\,\sigma_l \right\}\right|,$$
where $k$ is how many standard deviations to use for the metric. In the following analysis, we present results for $k = 1$ and $k = 2$.

The fourth metric is skewness, which is simply the sum of model error across all links:
$$S_{m,D} = \sum_{l \in D} \epsilon_{m,l}.$$

This metric highlights those models that systematically over- and underpredict. Some applications may have a particular cost/benefit for under- or overpredictions. Models that systematically overpredict path loss (and therefore underpredict received signal strength) score a high value on this metric. Models that systematically underpredict score a large negative value. And, models that make an equal amount of under- and overpredictions will score a value of zero.

Our final metric is rank correlation using Spearman's $\rho$ (Kendall's $\tau$ would be an equally appropriate metric but is slower to compute). In some applications, predicting an accurate median path loss value might not be necessary so long as a model is able to put links in the correct order from best to worst (consider, e.g., the application of dynamic routing). Spearman's $\rho$ is a nonparametric measure of statistical dependence and in this application describes the relationship between ranked predictions and oracle values using a value between −1.0 (strong negative correlation) and 1.0 (strong positive correlation).
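An illustrative sketch of this computation, ranking both series (ties receive averaged ranks) and then taking the Pearson correlation of the ranks:

```ruby
def ranks(values)
  sorted = values.each_with_index.sort_by(&:first)  # [value, original index] pairs
  result = Array.new(values.size)
  i = 0
  while i < sorted.size
    j = i
    j += 1 while j + 1 < sorted.size && sorted[j + 1].first == sorted[i].first
    avg_rank = (i + j) / 2.0 + 1                    # average rank of the tie group
    (i..j).each { |k| result[sorted[k].last] = avg_rank }
    i = j + 1
  end
  result
end

def pearson(xs, ys)
  n  = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  cov / Math.sqrt(xs.sum { |x| (x - mx)**2 } * ys.sum { |y| (y - my)**2 })
end

def spearman_rho(predicted, measured)
  pearson(ranks(predicted), ranks(measured))
end
```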

7. Results

We begin by explicitly fitting the data to a theoretical model and looking at the number of measurements required for a fit. This gives an initial estimate of expected error for direct (naïve) fits to the collected data. Then, to analyze the performance of the algorithms, we apply five domain-oriented metrics of decreasing stringency. We discuss the performance results for each data set with respect to these metrics, as well as general trends and possible sources of systematic error. Finally, explicit parameter fitting of the best models is performed, and this best-case performance is used to define practical lower bounds on model prediction error.

7.1. Explicit Power Law Fitting

In this section we attempt to explicitly fit the relationship between attenuation and distance as a straight line on a log/log plot. To this end, we extend the classic equation for free-space path loss from [5] to allow for a fitted path loss exponent ($\alpha$) and offset ($\beta$) and proceed with least squares fitting:
$$L = 10\,\alpha\log_{10}(d) + \beta.$$
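An illustrative sketch of this fit, regressing measured path loss (in dB) on $10\log_{10}(d)$ to recover the exponent and offset:

```ruby
def fit_log_distance(distances_km, losses_db)
  xs = distances_km.map { |d| 10 * Math.log10(d) }
  n  = xs.size.to_f
  mx = xs.sum / n
  my = losses_db.sum / n
  alpha = xs.zip(losses_db).sum { |x, y| (x - mx) * (y - my) } /
          xs.sum { |x| (x - mx)**2 }                  # fitted path loss exponent
  beta  = my - alpha * mx                             # fitted offset (dB)
  residuals = xs.zip(losses_db).map { |x, y| y - (alpha * x + beta) }
  [alpha, beta, Math.sqrt(residuals.sum { |r| r**2 } / n)]  # last value ~ RMSE
end
```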

Figure 7 shows the resulting fits using this method for a subset of our data sets. One unavoidable side effect of packet-based measurements is that it is impossible to record SNR values for packets that fail to demodulate. Hence, because the 2.4 and 5.8 GHz data is derived from packet-based measurements, low SNR values (and therefore high path loss values) are underrepresented here, which leads to “shallow” fits and unrealistically low values of $\alpha$. As a result, while it is safe to make comparisons between the 2.4/5.8 GHz data sets, it is not safe to directly compare the slope of the 900 MHz and 2.4/5.8 GHz fits.

Table 3 lists the fitted parameters ($\alpha$ and $\beta$) and residual standard error (for all intents and purposes, residual standard error and RMSE are interchangeable). Between the 2.4 GHz data sets, we can see that there is little consensus about the slope or intercept of this power law relationship. All fits are noisy, with standard error around 8.68 dB on average for the urban data sets. This residual error tends to be Gaussian, which is also in agreement with previously published measurements (e.g., [17]). However, the size of this error is almost two orders of magnitude larger than the 3 dB that Rizk et al. suggest as an expected repeated-measures variance for outdoor urban environments (and hence the expected magnitude of the error due to temporally varying fast fading) [75]. Looking at Figure 7, it is easy to see that the 2.4 GHz measurements are substantially less well behaved than the 900 MHz COST-231 data, even in comparable environments.

In order to understand how many measurements are needed to create a fit of this sort, we take successively increasing random samples of the data sets and use these subsets to generate a fit. We then look at how the residual error of the model (with respect to the complete data set) converges as the subsample size increases. Figure 8 shows this plot for one representative data set. However, all plots follow a similar trend: the eventual model is closely matched with approximately 20, or at most 40, data points. Table 3 gives an approximate minimum sample size for each data set, derived from these plots.

7.2. Performance Comparison

Figure 9 shows the results of the five metrics for all data sets combined. To conserve space, we have omitted results for the individual data sets and instead have summarized the important results in the following discussion. Also, to simplify the plots, we have only included results from the 18 best performing models.

Looking first at the results for the rural (WMP) data, the best performing models achieve an RMSE on the order of 15 dB. The best models are the Allsebrook model (with its terrain roughness parameter set to 200 m) at just under 18 dB RMSE (16.7 dB when corrected) and the Flat-Edge model (with 10 “buildings” presumed) at 16.5 dB RMSE (15.3 dB when corrected). The urban data sets are modeled much better in terms of RMSE: the best models achieve an RMSE on the order of 10 dB, while the worst (of the best) produce errors upwards of 50 dB. The overall winners are the Hata model, the Allsebrook model, the Flat-Edge model, and the ITU-R model. This follows from expectations because all of these models were derived for predicting path loss in urban environments. The Hata model and Allsebrook model are based on measurements from Japanese and British cities, respectively. The Flat-Edge model is a purely theoretical model based on the Walfisch-Bertoni model, which computes loss due to diffraction over a set of uniform screens (simulating buildings separated by streets). Table 3 provides the top three models by SC-RMSE for each data set and their corresponding error.

For the second metric, competitive success, look to the leftmost (red) bar in the second of the plots. For most of the data sets, there is no clear winner, with the best models sharing between 10 and 15 percent of the winnings. This indicates that there is no single model that outperforms all others. There are a few exceptions: for the PDX data set, the Davidson model takes 40% of the winnings; in the COST-231 data set, the ITU-R 25 model takes 30%; in the Google data set, the Davidson model takes more than 30%; and, in the downstream Boulder measurements (boulder/gtp), the Davidson model again takes 25% of the winnings. There is not, however, a single model or two that outperform all others in a large subset of our data. Hence, we can conclude that the choice of the most winning model is environment dependent.

The third metric is percentage of predictions within one (or two) standard deviation of the true median value. This metric requires multiple measurements at each point in order to estimate temporal variation in the channel. Of our data sets, six have this data available: WMP, COST-231, PDX/Stumble, Google, TFA, and WART. For the WMP data the best performing models (Allsebrook, Flat Edge, Herring Air-to-Ground, and ITU-R) score between 10% (for within one standard deviation) and 20% (for within two standard deviations) on this metric. We see similar results for our other data sets but different winners. For the PDX/Stumble data the winners are Herring Air-to-Ground, Hata, and ITU-R 25. For the WART data set, the winners are the ITM, ITU-Terrain, and Blomquist. For the COST-231 data set the winners are Herring Air-to-Ground, Hata, and Allsebrook. Again, the best performing model appears to be largely environment dependent.

Our fourth metric is skewness. The interpretation of this metric is largely application dependent; it is hard to know in advance whether over- or underestimates are more harmful. If a model makes an equal number of over- and underestimates (resulting in zero skewness) but has a large RMSE, is it better than a model that systematically overestimates but has a small RMSE? The Hata model is particularly well behaved by this metric, producing a value near zero for all data sets. As one would expect, the Hata-derived models perform similarly (i.e., ITU-R 25, Davidson, etc.). The rest of the models vary largely from data set to data set, although ITU-R 452 performs well for some data sets.

The final metric is rank correlation. For just about all of the models we see a rank correlation around 0.5, which indicates a moderate (but not strong) correlation between measured and predicted rank orderings. Models that perform particularly poorly by this metric occasionally achieve much lower values; a result near zero indicates that there is no noticeable correlation between rank orderings. The COST-231 rank correlations are substantially higher than those of all other data sets. We believe this is related to the fact that the COST-231 data more closely fits theoretical expectations of the relationship of path loss to distance. Hence, models that use something like the Friis equation at their core will produce rank orderings that are closer to those of this data set. Overall, however, there does not seem to be a consensus about which model performs best at rank ordering; the winners are different for each data set.

7.3. Explicit Parameter Fitting

In order to determine the minimum obtainable error with these models, we take two well-performing models that have tunable parameters, Allsebrook-Parsons and Flat-Edge, and proceed by searching the parameter space to find the best possible configuration (data from the Boulder, WART, and PDX data sets were used for this experiment). The Allsebrook-Parsons model takes three parameters (besides carrier frequency, which is common to nearly all the models): a terrain roughness parameter (in m), the average height of buildings (in m), and the average width of streets (in m). The Flat-Edge model also takes three parameters: the number of buildings between the transmitter and receiver, the average height of these buildings (in m), and the street width (in m). After sweeping the parameter space, we use an ANOVA to determine the parameters that best explain the variance in the data.

For the Allsebrook model, two of the three parameters prove important, and, for the Flat-Edge model, only one does. Figure 10 shows the response (in terms of RMSE) to tuning these parameters. The optimal values can be determined from the minima of these plots, and a similar approach could be carried out with any subset of our data. However, the optimal parameters for one data set are not usually in agreement with those for the others, forcing a compromise in terms of accuracy and specificity. Even with cherry-picked parameters, the RMSE is still in the neighborhood of 9–12 dB, which is too large for many applications.

If we consider 9 dB to be the minimum achievable error of a well-tuned model, it is interesting to note that approximately the same performance can be achieved with a straight line fit through a small number (20) of measurements as was done in Section 7.1. In [76], the authors found similar bounds on error (6–10 dB) attempting to fit a single model to substantial measurement data at 1900 MHz.

7.4. Factors Correlated with Error

In order to understand which variables may serve to explain model error, we performed a factorial analysis of variance (ANOVA) using spread-corrected error as the fitted value and transmitter height, receiver height, distance, line of sight (a boolean value based on the path elevation profile), and data set name as factors. Although all of these variables show moderate correlations (which speaks to the fact that many models add corrections based on these variables), some are much better explanations of variance than others. Perhaps not surprisingly, distance and data set name are the biggest winners, with extremely large $F$-values (16687.34 and 52375.54, resp., and 14156.54 when combined). Figure 11 plots the relationship between error and link distance for each of the best performing models for two representative data sets; the relationship is plain to see. This leads to the conclusion that the best results can be obtained when an appropriate model is known for a given environment and when the model is designed for the same distances of links being modeled.

One conclusion from this is that hybrid models, which combine the strengths of multiple simpler models, may perform better than any one model alone. To understand the possible benefit of hybridized models, we implemented three hybrid models and applied them to the WMP data. The WMP data was chosen because it includes the largest variety of link lengths. The first uses the Hata model (for medium cities) for links under 500 m (where it is well-performing) and the Flat-Edge model (with 10 “buildings”) for longer links (hatam.flatedge10). This model performs marginally better than all other models, producing a corrected RMSE of 14.3 dB. Very slightly better performance is achieved by combining the Hata model with the Egli model (14.2 dB RMSE). We also tried using the TM90 model for links less than 10 miles and the ITM for longer links (tm90.itmtem), but this combination is not well-performing with respect to our measurements. Treating this tuning and hybridization as an optimization problem, with the goal of producing the best performing configuration of the existing models, is a promising project for future work. Taking this approach, however, one must be careful to avoid overfitting a model to the data available.
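The hybrids themselves are trivial to express; a sketch with placeholder model callables (the 500 m threshold is the one used for hatam.flatedge10):

```ruby
# short_model and long_model are placeholders for real model implementations
# (e.g., Hata for medium cities and Flat-Edge with 10 buildings).
def hybrid_loss(link, short_model, long_model, threshold_km: 0.5)
  model = link[:distance_km] < threshold_km ? short_model : long_model
  model.call(link)
end
```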

7.5. Practical Interpretation

As an example of what these results mean for real applications, consider Figure 12, which shows a predicted coverage map for the Portland MetroFi network using two well-performing models tuned to their best performing configurations. We have also included versions of these maps with zero-mean 12 dB Gaussian noise added, which approximates the expected residual error from these models. To generate these maps, the 2 km by 2 km coverage area was divided into a 500 × 500 raster, and each pixel was colored based on the predicted received signal strength, linearly interpolated between red (at −95 dBm) and green (at −30 dBm). For each pixel, we compute the predicted received signal strength from each of the 72 APs, and the maximum value is used to color the pixel.
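An illustrative sketch of this rasterization, where predict_rss is a placeholder callable returning a predicted received signal strength (in dBm) for a given AP and pixel:

```ruby
SIZE = 500
def coverage_raster(aps, predict_rss)
  Array.new(SIZE) do |row|
    Array.new(SIZE) do |col|
      best = aps.map { |ap| predict_rss.call(ap, row, col) }.max
      (best.clamp(-95, -30) + 95) / 65.0  # 0.0 = red (-95 dBm), 1.0 = green (-30 dBm)
    end
  end
end
```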

Comparing these maps to the empirical and operator-assumed coverage maps shown in Figure 13, it is clear that there is no consensus on what the propagation environment looks like. The Hata model may produce the picture that is closest to the measurements, but our results show that it is not the best performing model overall. Moreover, the Allsebrook-Parsons model, which is well-performing overall and has been tuned to its best configuration, produces a map that is in stark disagreement with reality.

Yet, the future holds promise. Consider the final column in Table 3, which gives the RMSE for each data set if we choose to take only the best prediction among all the predictions made by the 30 models and their configurations. This represents one version of a minimal achievable error in a world with a perfectly hybridized model that always knows which model to use when. In this scenario, we can see a very attractive bound on error—as low as 1 dB. This indicates that there is still room for improvement. If we were able to determine the situations when each model is likely to succeed, then it is reasonable to assume that it is possible to construct a single hybrid model that is more accurate than any one model alone.
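This "perfect hybrid" bound is straightforward to compute: for each link, keep the single prediction (across all models and configurations) that is closest to the measurement, then take the RMSE of those residuals. A sketch, with hypothetical array names:

```python
import numpy as np

def oracle_rmse(predictions_db, measured_db):
    """predictions_db: (n_models, n_links) array of path loss predictions;
    measured_db: (n_links,) measured path loss. Returns the RMSE achieved
    by an oracle that always picks the closest model for each link."""
    errors = predictions_db - measured_db        # broadcasts over models
    idx = np.abs(errors).argmin(axis=0)          # best model per link
    best = errors[idx, np.arange(errors.shape[1])]
    return np.sqrt(np.mean(best ** 2))
```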

7.6. Additional Observations

In this section, we discuss several important observations based on the results above.

7.6.1. Modeling Directional Antennas Is Challenging

One interesting additional observation from this data is that modeling path loss from directional transmitters is especially difficult. This can be seen in the fact that our data from the directional CU-WART testbed is particularly noisy. There have been attempts to model this phenomenon explicitly in the past [21, 44], but even with these corrections, the error in predicting directional propagation is still much greater than for omnidirectional transmitters.

7.6.2. Models That Generate Errors

It is worth noting that some algorithms will generate errors when used outside of their intended coverage. If we give these models the benefit of the doubt and only make use of those predictions where no errors or warnings were generated, the overall performance looks better. For instance, the corrected RMSE for ITM (with parameters for a temperate environment) on the WMP data set improves from 28.2 dB to 23.1 dB if the most egregious errors are discarded (these stem from problems predicting refraction over certain terrain types and affect only 290 of 2492 predictions) and down to 17.3 dB when only the 696 of 2492 predictions that generate zero warnings are used (warnings usually stem from links that are too short). This is a substantial improvement—at 17.3 dB corrected RMSE, the ITM performs on par with the best of the other models.
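Concretely, this filtering is a masked recomputation of the error statistic. A sketch follows, under two assumptions that are ours rather than details from the ITM code: that "corrected" RMSE removes the mean (systematic) error first, and that per-prediction error/warning flags are available as boolean arrays:

```python
import numpy as np

def corrected_rmse(pred_db, meas_db, keep):
    """RMSE over the kept predictions after removing the mean residual
    (an assumed reading of 'corrected' RMSE)."""
    resid = (np.asarray(pred_db) - np.asarray(meas_db))[keep]
    return np.sqrt(np.mean((resid - resid.mean()) ** 2))

# Usage with hypothetical flag arrays had_error / had_warning:
# corrected_rmse(pred, meas, np.ones(len(pred), bool))      # all 2492 links
# corrected_rmse(pred, meas, ~had_error)                    # drop the 290 errors
# corrected_rmse(pred, meas, ~had_error & ~had_warning)     # the 696 clean links
```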

7.6.3. Prediction in Rural Environments Is Challenging

In a result that appears counterintuitive, the rural data set is much more difficult to model than our urban data sets. To look for sources of systematic error, we analyzed the correlation between “best prediction error” (the error of the best prediction from all models) and various possible factors. There appears to be no significant correlation between error and carrier frequency (and therefore neither modulation scheme nor protocol) or antenna geometry. However, there is a large correlation between error and distance. It is our hypothesis that the WMP data is especially difficult to model for two reasons.
(i) Because researchers have assumed that rural environments are “easy” or “solved,” there has been substantially more work in developing (empirical) models for urban environments. The majority of state-of-the-art rural models, on the other hand, are largely analytical and were mostly developed 30 or more years ago (e.g., the ITM).
(ii) This data set has an exceptionally large variety of link lengths, and, as has been shown, prediction error is strongly correlated with distance for many models.

8. Conclusion

In this work, we have performed the first rigorous evaluation of a large number of path loss models from the literature using a sufficiently representative data set from real (production) networks. Besides providing guidance in the choice of an appropriate model when one is needed, this work was largely motivated by a need to establish baseline performance values. Without a well-established error bound for these approaches, it is impossible to evaluate the success (or failure) of more complex approaches to path loss modeling (and coverage mapping). For the models implemented here and the data sets analyzed, we can say that a priori path loss modeling will incur at least 8-9 dB RMSE in urban environments and 15 dB RMSE in rural environments, almost regardless of the model selected, how complex it is, or how well it is tuned. This bound agrees with prior work at other frequencies in similar environments, which has also produced results with RMSE in the neighborhood of 9 dB (e.g., [76, 77]).

Direct approaches to data fitting, such as a straight line fit to the log/log relationship between path loss and distance, produce a similar level of error: 8-9 dB for urban environments and 15 dB for rural environments. Fits of this quality can be obtained after only 20–40 measurements. Hence, we can say with some confidence that whether a network operator does a small random sampling and basic fit or carefully tunes an a priori model to their environment, they can still expect approximately the same magnitude of error. We believe that there is substantial opportunity for future work in the area of measurement-driven path loss modeling and coverage mapping. Although there has been some solid preliminary work in this area (e.g., [4]), there appears to be substantial room for improvement in terms of developing robust statistical methods for sampling and interpolating between measurements.

Among the most important outcomes of this work is a set of guidelines for researchers, which can help provide direction in the complicated landscape of path loss prediction models. As a general rule, when it is feasible to make direct measurements of a network, one should do so. We have shown that a small number of measurements can have substantial power in tuning the models we have studied and in fitting parameters for basic empirical models. When it is not possible to make measurements of a network, the careful researcher should choose from standard, well-accepted models such as Okumura-Hata or Davidson, which generally have the least systematic skew in their predictions and are among the best performing models overall. In simulation studies, we advocate using stochastic models in a repeated-measures/Monte Carlo experimental design so that realistic channel variance can be modeled. For this application, the recent proposal of Herring appears to be a good choice or, for the greatest comparability, the Hata model with stochastic lognormal fading. Although there are a large number of models from which to choose, our work here shows that in many cases the most important factors a researcher should consider are having a realistic expectation of error and choosing a model that enables repeatability and comparability of results.