Abstract

Numerical models are presently applied in many fields for simulation, prediction, operation, and research. The output from these models normally contains both systematic and random errors. This study compared January 2015 temperature data for Uganda simulated using the Weather Research and Forecast (WRF) model with actual observed station temperature data to analyze the bias using parametric methods (the root mean square error (RMSE), the mean absolute error (MAE), the mean error (ME), skewness, and the best easy systematic estimate (BES)) and a nonparametric method (the sign test, STM). The RMSE normally overestimates the error compared to the MAE. The RMSE and MAE are not sensitive to the direction of the bias. The ME gives both the direction and magnitude of the bias but can be distorted by extreme values, while the BES is insensitive to extreme values. The STM is robust for giving the direction of the bias; it is not sensitive to extreme values, but it does not give the magnitude of the bias. Graphical tools (such as time series and cumulative curves) show the performance of the model with time. It is recommended to integrate parametric and nonparametric methods along with graphical methods for a comprehensive analysis of the bias of a numerical model.

1. Introduction

Models are used in many fields, such as engineering, agriculture, health, business, and weather and climate, for simulation and prediction. They help in understanding the different subprocesses underlying a given process and have undergone tremendous improvements due to developments in computing technology. These models range from simple (e.g., linear regression models) to complex (e.g., weather and climate prediction models); Glahn and Lowry [1] categorized the models as dynamical and statistical. A combination of dynamical and statistical models is also used in operational forecasting, especially where statistical techniques are used to correct output from a dynamical model.

National meteorological services usually operate high-resolution numerical weather prediction models so as to give accurate guidance to users of weather information [2]. The accuracy of a given model is a measure of how close the model predicted fields are to independently observed atmospheric fields [3, 4], but it can be affected by errors in initial conditions, imperfections in the model, and inappropriate parameterizations. When a model agrees with observations, confidence in using the model is higher [5], but present agreement does not necessarily guarantee skill in future model predictions.

The main advantage of models is their objectivity [1]. However, systematic errors manifest as bias [6], which occurs due to differences in the model response to external forcing [7], such as errors in initial conditions. This bias can manifest as overprediction or underprediction and is defined by the World Meteorological Organization as the mean difference between forecast values and mean actual observations [8], while Haerter et al. [9] define bias as the time-independent component of the error in model output.

Several methods have been proposed to correct for the bias. Maraun [10] used the quantile-quantile method and found that uncorrected regional climate models underestimated precipitation and produced many drizzle cases. Durai and Bhradwaj [11] investigated four statistical bias correction methods (namely, the best easy systematic method, lagged linear regression, nearest neighbor, and running mean removal) and noted that the running mean and nearest neighbor methods improved the forecast skill. These methods attempt to reduce the bias in the next forecast using information from the bias of the previous forecast [12]; however, they influence the model output if prediction is based on bias-corrected data [8], and they cannot correct improper representation of the processes producing the model output [9].

Many studies have employed parametric methods such as the RMSE [13–15], MAE [14, 15], ME [16], and relative error [13, 16] to analyze the bias of numerical models but have put less emphasis on graphical tools as well as nonparametric methods. In the present study, we investigate the performance of the bias analysis methods on actual January 2015 temperature data and temperature data simulated using the Weather Research and Forecast (WRF) model (Tables 1 and 2). The rest of the paper is organized as follows: Section 2 describes the data sources, Section 3 presents an overview of the methods of bias analysis, Section 4 presents results and discussion, and Section 5 gives the summary and conclusions.

2. Data

We simulate January 2015 temperature using WRF model version 3.7 [17], with the following parameterization schemes: the WRF single-moment 6-class microphysics scheme, the Kain-Fritsch cumulus parameterization, the Asymmetric Convective Model option for the planetary boundary layer, the Rapid Radiative Transfer Model for longwave radiation, and the Dudhia scheme for shortwave radiation. These data are compared with observed January 2015 temperature (maximum and minimum temperature) data obtained from the Uganda National Meteorological Authority (UNMA). We use six stations, namely, Arua (arua), Entebbe (ebb), Kasese (ksse), Jinja (jinja), Mbarara (mbra), and Gulu (gulu). For a given day and station, the maximum simulated temperature is compared with the maximum observed temperature, and the minimum simulated temperature is compared with the minimum observed temperature.

3. Methods of Bias Analysis

In order to comprehensively investigate the performance of numerical models, it is important to evaluate them on multiple metrics rather than a single method [5]. In this section, we present the popular methods for analyzing the bias of numerical models. The parametric methods are presented in Sections 3.1–3.6, while the nonparametric method considered is described in Section 3.7.

3.1. The Difference Measures

Willmott et al. [3] suggested a difference variable, $d$, given by the difference between the model predicted value, $P$, and the observed value, $O$; that is,

$$d = P - O. \quad (1)$$

This is appropriate for point measurements. It is this measure that gives rise to other measures such as the root mean square error (RMSE), the bias or mean error (ME), and the mean absolute error (MAE).

For a model $M$, with time-ordered data set $\{P_1, P_2, \ldots, P_n\}$, we define the difference as follows:

$$d_i = P_i - O_i, \quad (2)$$

where $P_i$ is the $i$th data point and $O_i$ is the corresponding $i$th observed value from the time-ordered actual observed data set $\{O_1, O_2, \ldots, O_n\}$. A positive (negative) value of $d_i$ indicates that the model output is higher (lower) than the actual value.
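
As a minimal sketch, assuming NumPy and hypothetical temperature values (the array names P and O are illustrative, not from the study data), the difference series of (2) can be computed directly from paired model and observed arrays:

```python
import numpy as np

# Hypothetical daily maximum temperatures (degrees C) for one station:
# P holds model-simulated values P_i, O the corresponding observations O_i.
P = np.array([29.1, 30.4, 28.7, 31.0, 29.8])
O = np.array([30.2, 31.1, 29.5, 30.6, 31.4])

d = P - O  # difference series d_i = P_i - O_i
# Positive entries: model above observation; negative entries: model below.
print(d)
```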

3.2. The RMSE

The RMSE is the square root of the average of the squared differences ($d_i^2$) and is a popular statistical measure of the performance of a numerical model in atmospheric research [15]. For a model, $M$, the RMSE is thus defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}. \quad (3)$$

The RMSE is a good criterion for classifying the accuracy of a model, and a low index indicates higher accuracy.
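
A short illustrative implementation of (3), assuming the paired arrays introduced above (the function name rmse is ours, for illustration only), could be:

```python
import numpy as np

def rmse(P, O):
    """Root mean square error of model values P against observations O, eq. (3)."""
    d = np.asarray(P, dtype=float) - np.asarray(O, dtype=float)
    return np.sqrt(np.mean(d ** 2))
```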

3.3. The MAE

The MAE is the average of the magnitudes of the differences ($|d_i|$) and is also a popular index for estimating bias in atmospheric studies. For a model, $M$, the MAE is defined as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |d_i|, \quad (4)$$

and, just like the RMSE, a low index indicates higher accuracy.
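
A corresponding sketch of (4), under the same illustrative naming assumptions, is:

```python
import numpy as np

def mae(P, O):
    """Mean absolute error of model values P against observations O, eq. (4)."""
    d = np.asarray(P, dtype=float) - np.asarray(O, dtype=float)
    return np.mean(np.abs(d))
```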

3.4. The Bias

The bias, also known as the mean error (ME), is obtained by averaging the differences ($d_i$) over the number of cases. For a given model output, $M$, the ME is calculated from

$$\mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} d_i. \quad (5)$$

The magnitude of the ME is equal to the MAE if all the predicted values of the model are higher (or lower) than the actual values. A value of bias close to zero indicates that model values are in fair agreement with the actual values, with zero implying no bias.

The relative bias is another bias measure, suggested by Christakis et al. [16], in which the ME is divided by the average of the observations and given as follows:

$$\mathrm{ME}_{\mathrm{rel}} = \frac{\mathrm{ME}}{\bar{O}}, \quad (6)$$

where $\bar{O}$ is the mean of the observed values. The bias given by (5) and (6) gives both the direction and probable magnitude of the error.
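
A minimal sketch of (5) and (6), again with illustrative function names and assuming the paired arrays defined earlier, might be:

```python
import numpy as np

def mean_error(P, O):
    """Mean error (bias), eq. (5): average of d_i = P_i - O_i; sign gives direction."""
    return np.mean(np.asarray(P, dtype=float) - np.asarray(O, dtype=float))

def relative_bias(P, O):
    """Relative bias, eq. (6): mean error divided by the mean of the observations."""
    return mean_error(P, O) / np.mean(np.asarray(O, dtype=float))
```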

3.5. The Skewness Coefficient

The skewness coefficient is a moment measure based on symmetry [18]. Having obtained the differences between the model and actual values ($d_i$), positive (or negative) skewness indicates that model outputs are largely lower (or higher) than actual observations. The skewness coefficient is defined as follows:

$$\gamma = \frac{(1/n) \sum_{i=1}^{n} (d_i - \bar{d})^3}{s^3}, \quad (7)$$

with $s$ as the standard deviation of the sample biases forming the distribution, calculated as follows:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2}. \quad (8)$$
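
As an illustration only (the exact normalization in (7) is a reconstruction; the standard moment coefficient is used here), the skewness of the difference series can be computed as:

```python
import numpy as np

def skewness(d):
    """Moment coefficient of skewness of the difference series d_i, eqs. (7)-(8)."""
    d = np.asarray(d, dtype=float)
    s = np.std(d, ddof=1)                        # sample standard deviation, eq. (8)
    return np.mean((d - d.mean()) ** 3) / s ** 3
```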

3.6. The Best Easy Systematic Method

The best easy systematic (BES) method considers location measures (especially quartiles) and is given by Durai and Bhradwaj [11] as follows:

$$\mathrm{BES} = \frac{Q_1 + 2Q_2 + Q_3}{4}, \quad (9)$$

where $Q_1$, $Q_2$, and $Q_3$ are the sample lower quartile, median, and upper quartile, respectively, of the differences, $d_i$. It is commended by Woodcock and Engel [12] for its robustness in handling extreme values.
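
A sketch of (9), assuming the conventional BES weighting of the quartiles (the function name bes is illustrative), could look like:

```python
import numpy as np

def bes(d):
    """Best easy systematic estimate of the bias from the quartiles of d_i, eq. (9)."""
    q1, q2, q3 = np.percentile(np.asarray(d, dtype=float), [25, 50, 75])
    return (q1 + 2.0 * q2 + q3) / 4.0
```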

3.7. Sign Test Method

The sign test method (STM) is a nonparametric method based on assigning a score, $s_i$, that compares the prediction, $P_i$, and the observation, $O_i$, at a given point. If the model predicts a higher value than observed ($P_i > O_i$), we assign positive one (i.e., $s_i = +1$); if the model prediction is equal to the observed value ($P_i = O_i$), we assign zero (i.e., $s_i = 0$); and if the model predicts a value lower than observed ($P_i < O_i$), we assign negative one (i.e., $s_i = -1$); thus

$$s_i = \begin{cases} +1, & P_i > O_i, \\ 0, & P_i = O_i, \\ -1, & P_i < O_i. \end{cases} \quad (10)$$

For a model forming a distribution of scores, $s_i$, of size $n$, such that $-1 \le s_i \le 1$, the mean is computed as follows:

$$\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s_i. \quad (11)$$

If the mean score, $\bar{s}$, for a given model is positive, the model is generally considered to overpredict; if it is negative, then the model underpredicts; otherwise there is no significant bias. We suggest the hypotheses as

$$H_0\colon \mu_s = 0 \quad \text{against} \quad H_1\colon \mu_s \neq 0, \quad (12)$$

and consider $\mu_s = 0$ for an unbiased model (i.e., zero bias). For a distribution of sample size less than 30 ($n < 30$), we propose the use of Student's $t$-distribution and make an approximation to the normal distribution for large samples ($n \geq 30$). The standard error is computed using

$$SE = \frac{s_s}{\sqrt{n}}, \quad (13)$$

where $s_s$ is the standard deviation of the scores. The nonparametric statistic for measuring bias is then calculated using

$$t = \frac{\bar{s} - \mu_s}{SE}. \quad (14)$$

We can then test this for a given significance level and make statistical inferences.
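
A compact sketch of the STM computation in (10)-(14), assuming NumPy and SciPy and the paired arrays defined earlier (the function name sign_test is illustrative, and the sketch assumes the scores are not all identical), could be:

```python
import numpy as np
from scipy import stats

def sign_test(P, O):
    """Sign test for model bias: scores s_i in {+1, 0, -1}, their mean, and a
    one-sample Student's t test of H0: mean score = 0."""
    s = np.sign(np.asarray(P, dtype=float) - np.asarray(O, dtype=float))  # eq. (10)
    n = len(s)
    s_bar = s.mean()                       # mean score, eq. (11): sign gives direction
    se = s.std(ddof=1) / np.sqrt(n)        # standard error of the mean score, eq. (13)
    t_stat = s_bar / se                    # test statistic under H0 (mu_s = 0), eq. (14)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value
    return s_bar, t_stat, p_value
```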

4. Results and Discussion

In comparing model results with observations, we assume that observed values are accurate and that it is the model predicted values that contain error because, as explained by Piani et al. [19], models have inconsistencies that are sometimes not resolved by bias correction. This brings the necessity of clearly determining the direction and magnitude of the bias. The magnitude of the bias can be affected by other factors, namely, geographical location and season [11]. These factors are not considered in this study, but it is possible to compare spatial and temporal bias using the different bias analysis methods.

Table 1 presents bias estimation using maximum temperatures as simulated by the WRF model and actual observed values for maximum temperature, while Table 2 presents bias estimation using model simulated values and actual observed values for minimum temperature. These tables help to explore the different possible cases: we obtain a negative bias for all maximum temperatures (Table 1), while some stations have a positive bias for some minimum temperatures (Table 2). These cases are also presented using time series figures (Figures 1–12). The time series figures help to investigate how the biases change with time; the greater the departure between the curves (model simulated curve and observed curve), the greater the bias. For Gulu (Figures 11 and 12), the model and actual observations follow roughly the same trend. For Kasese (Figure 6), there is high variability in actual minimum temperatures compared to those given by the model. For Jinja (Figure 7), actual observations have an increasing trend while model values have a decreasing trend over the period (days 20–30). These results imply that a given model can perform differently in different geographical regions, hence the bias.

4.1. Traditional Bias Analysis Methods

The popular traditional parametric bias analysis tools were presented in Section 3. A discussion of these methods is presented below.

The RMSE and MAE vary with both the magnitude of the error and the sample size [15]. If an extreme event happens and is not correctly predicted (simulated) by the model, a large error will result and can manifest as an outlier, thus distorting the index. The problems with estimating bias using the RMSE and MAE are as follows: (i) they do not show the direction of the bias and (ii) they treat all the biases as if they were in one direction, thus amplifying the bias. The bias given by (5) and the relative bias defined by (6) are of great importance as they suggest both the magnitude and the probable direction of the bias. This is helpful as it indicates whether the model overpredicts or underpredicts the field being predicted, but, as explained by Knutti et al. [5], simple averaging (e.g., the bias or ME) is not effective as it is affected by extremes and by biases in different directions canceling. The BES, however, is a location measure and is less affected by extreme values.

4.2. The Sign Test Method (STM)

In this method, we assign positive (or negative) one depending on the direction of the bias and then compute the mean of the assigned values. A mean value greater (less) than zero indicates positive (negative) bias. With the STM, we believe that the direction of the bias is preserved while not being influenced by extreme values, which occur rarely. For example, a model can have many drizzle days when in reality the days are dry but underpredict a heavy rainfall event [8]. Aggregating these results using traditional bias estimation methods can lead to confusing results suggesting that the model has no bias or less bias than should be expected.

If the number of biases in opposite directions is equal, the STM will give a zero score. Although this may appear to be a drawback, its meaning is easily understood: it simply means that the model is equally likely to overpredict or underpredict, which rarely occurs in numerical models. In contrast, if the other methods gave zero, the meaning would not immediately imply that the number of biases in one direction is exactly equal to the number of biases in the other direction and that there has been an offset; it could imply that the model is unbiased, which could be misleading. The inferences made using the STM statistic are based on general assumptions that lead to some function of the sample observations whose sampling distribution can be determined without knowledge of the specific distribution function underlying the population [20]. The STM is also less concerned with the distribution of the population, which is why it is not affected by extreme values.

It is possible for the STM and the parametric methods to disagree (Table 2). In the results presented in Table 2, for Gulu, the STM gives a negative index while the ME gives a positive index. By the STM, this means that the model had more values underpredicted than overpredicted, which, unfortunately, was weakly resolved by the ME. This probably means that there are cases of partial cancelation of values in the ME, which is why it gives a positive bias.

In principle, we believe that the direction presented by the STM should approach the direction presented by ME for a large sample of values.

5. Summary and Conclusions

Numerical models normally have both systematic and nonsystematic errors. The systematic errors manifest as bias in the model, which may lead to either overprediction or underprediction. In this study, we analyzed the parametric methods of analyzing bias and compared them with the STM but have not considered spatial bias or methods of correcting the bias.

The parametric methods are based on a difference measure, while the STM is based on assigning a score of +1 to positive biases and −1 to negative biases and then taking the average of these scores. We believe that the STM is ideal for estimating bias in the prediction or simulation of scalar geophysical variables (e.g., wind speed, rainfall amount, and temperature) by numerical models and that it is reliable and robust because the values presented are clear to interpret as far as determining the direction of the bias is concerned. The direction of the bias is needed in order to tune the model to correct for future biases. With the STM, a value of +1 (−1) indicates that all the model values are higher (lower) than the actual ones. The STM can be used in inference, thus reducing uncertainty, and is also based on a simple algorithm.

However, we do not suggest neglecting the other measures but rather propose complementing them because, in order to get a complete analysis of the data, it is important to compare both parametric and nonparametric tools [3]. We also recommend the use of graphical tools, especially density plots, and investigating the skewness as well as the tail properties. The time series plots can be used to investigate the performance of the model over an extended period of time, with the intention of ascertaining whether the model worsens or improves with time. Lastly, while assigning scores loses the magnitude of the bias, the STM still helps to determine the direction of the bias.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors appreciate the WIMEA-ICT project for the support and the Uganda National Meteorological Authority for availing the temperature data used for model comparison. They also express sincere thanks to Godfrey Mujuni for organizing the temperature data used.