Abstract

Daily gridded precipitation data are needed for investigating spatiotemporal variability of precipitation, including extremes; however, uncertainties related to daily precipitation products are large. Here, we compare a range of precipitation grids for Australia. These datasets include products derived solely from in situ observations (interpolated datasets) and two products that combine both remote sensed data and in situ observations. We find that all precipitation grids have similar climatologies for annual aggregated precipitation totals and annual maximum precipitation. The temporal correlations of daily precipitation values are higher between the interpolated datasets, but the correlations between the most widely used interpolated product (AWAP) and the two remotely sensed products (TRMM and GPCP) are still reasonable. Our results, however, point to distinct structural uncertainties between those datasets gridding in situ observations and those datasets deriving precipitation estimates primarily from satellite measurements. All datasets analysed agree well for low to moderate daily precipitation amounts up to about 20 mm but diverge at upper quantiles, indicating that substantial uncertainty exists in gridded precipitation extremes over Australia.

1. Introduction

Rainfall is vital to maintaining our water resources and sustaining life on our planet. Surprisingly, however, many existing rainfall products exhibit differences that are often larger than can be explained by observational or methodological uncertainties [1, 2]. In addition, many products perform poorly at capturing daily rainfall variations and particularly rainfall extremes [35]. This has made it difficult to provide the essential underpinning that is required for model evaluation or detection and attribution studies, for example.

Point-based precipitation measurements are often gridded to create high-resolution daily regional datasets that are crucial for climate monitoring at national scales and for model evaluation. There are now a plethora of different gridded products, developed with different resolutions and construction methods for many regions of the world, for example, E-OBS for Europe [6], APHRODITE for Asia [7], AWAP for Australia [8], and CLARIS for South America [9]. Users of datasets, however, are not always aware of their inherent uncertainties and often they might not be fit for the purpose for which they are being used [2, 10, 11]. For instance, uncertainties arise around the representativeness of the gridded values produced [12], the sensitivity of the chosen interpolation technique to changing network density [5], and/or the ability of the gridded product to represent precipitation extremes [4]. For example, Hofstra et al. [5] used the European E-OBS high-resolution gridded dataset to illustrate the effect of station network density on daily precipitation estimates, noting problems especially when examining extreme rainfall characteristics. In addition, Dunn et al. [13] investigated structural and methodological uncertainties when creating global grids of climate extremes and found that variations in station network and the interpolation method used account for the largest uncertainties. A particular challenge when calculating grids of daily precipitation is the high spatial variability of precipitation fields. In particular, on shorter time scales, precipitation fields are spatially discontinuous, although they are more continuous on longer time scales [14].

In addition to daily gauge-based rainfall products, there are also many remotely sensed or merged products which are often also used for monitoring and evaluation (e.g., CMORPH [15], PERSIANN [16], and TRMM [17]). Many of these products, however, have been shown to exhibit substantial differences in their representation of rainfall and especially rainfall extremes [3, 18]. Given the considerable use of these products within the meteorological, hydrological, and climate communities, it would seem prudent to better quantify some of these uncertainties and provide guidelines for users on the potential use and misuse of these datasets, particularly in relation to long-term assessment.

Using Australia as a case study, the purpose of this paper is to highlight some of the structural uncertainty that is inherent in daily precipitation products through use of common interpolation schemes and also to assess how this compares with existing gridded in situ-based datasets and satellite-derived products, which are often used in hydrometeorological applications. In addition, we show how structural uncertainty manifests between gauge-based and satellite-based products and provide an assessment of how gridded products perform at capturing local precipitation extremes.

2. Materials and Methods

2.1. Data

We use several different gridded datasets in order to represent daily precipitation estimates for Australia from in situ, satellite-based, and merged products. Datasets chosen were the in situ-based Australian Water Availability Project (AWAP) and the merged in situ and satellite-based datasets from the Tropical Rainfall Measuring Mission (TRMM) and the Global Precipitation Climatology Project One-Degree Daily (GPCP 1DD) product. AWAP [8, 19] is a dataset created by the Australian Bureau of Meteorology by gridding daily gauge data on a 5 km × 5 km grid using the Barnes successive correction method (http://www.bom.gov.au/climate/). TRMM 3B42 Daily V7 [17, 20] is a 0.25° × 0.25° resolution dataset spanning 50°N–50°S that combines precipitation estimates from multiple satellite sources to create a daily precipitation estimate in the period from 1998 to present. This estimate is rescaled with respect to monthly gauge-based precipitation totals, where possible, for improved accuracy. Finally, GPCP 1DD version 1.2 [21, 22] combines remote sensed data with monthly gauge data to create monthly global precipitation estimates with a resolution. This dataset is then combined with high spatial and temporal resolution infrared data from various satellites to create a global daily precipitation estimate at a resolution for the period from 1996 to present. The data from all three datasets were regridded to a common grid over Australia using a first-order area-conservative remapping technique [23]. Subsequently throughout the paper, the datasets will be referred to as AWAP, TRMM, and GPCP, with TRMM and GPCP being collectively referred to as “merged” datasets since they combine remotely sensed and in situ-based data. It should be noted that TRMM and GPCP use monthly gauge analysis; so the intensity and occurrence of daily rainfall are determined by remote sensing. As a result, TRMM and GPCP are likely to be reasonably independent of AWAP. We compare the different datasets for the 16-year period, 1998–2013, as this period is common to all datasets (e.g., TRMM was only available from 1998).

In addition to the above, we calculate six daily precipitation grids using different interpolation schemes and based on point-based observational data derived from the Global Historical Climatology Network Daily (GHCN-Daily) archive [24, 25], a database of daily weather and climatic information from land-based stations across the globe. We do this to try to understand how uncertainty might manifest through the use of common interpolation techniques and understand how this uncertainty differs in comparison to the merged products. In addition to these six interpolated datasets we also calculate a dataset, Grid [Station] Average (GA), which is simply an unweighted average of all stations inside a grid cell, without interpolation. This dataset provides a reference for comparison to all of the nine other datasets.

Across Australia, GHCN-Daily currently contains daily precipitation values for 8361 stations (Figure 1) over the period of 1998 to 2013 of which only a subset of stations (4500 to 6500, depending on the day) were used for gridding each day. This is because the number of operational and reporting stations varies over the period of 1998 to 2013 (data downloaded in January 2015). GHCN-Daily data for Australia were interpolated on a regular 0.5° × 0.5° grid covering Australia. A comparison of GHCN-Daily stations with the stations used by the merged datasets TRMM and GPCP showed that on average there were approximately 1000 additional GHCN-Daily stations available, with the most prominent differences around the north and northwest coastlines (not shown).

Note that although all datasets represent cumulative daily precipitation over a 24 h window, the start and end times of this window vary. For example, the time coverage of observations for AWAP starts at 0900 h local time in Australia on the day before the time stamp to 0900 h local time on the day named. For TRMM and GPCP on the other hand, the time coverage starts at 2230 h GMT on the day before to 2229 h GMT on the day of the time stamp. This is equivalent to 0830 h Australian Eastern Standard Time (AEST) on the day of the time stamp to 0829 h AEST on the day after the time stamp in Sydney, NSW. This time difference varies from summer to winter due to daylight savings across Australia and the use of varying time zones in the country. These time differences were considered before calculating correlations by shifting the TRMM and GPCP data by a day to match as close as possible the time intervals for comparison with the gauge-based grids.

Also note that AWAP takes advantage of anomaly based interpolation rather than interpolating the raw gauge observations directly. Anomalies are calculated relative to daily climatologies. This has the advantage of transforming the highly varying observations into a set to which a smoother curve can be fitted [26]. Jones et al. [26] also note that the climatology provides more information than contained in individual observations. Furthermore, they note that anomalies are weakly related to elevation making it more suitable to interpolate station data using a two-dimensional analysis.

Finally, although our comparisons give an indication of the skill of various datasets in capturing daily extremes, we acknowledge that the merged datasets are developed with the intention of preserving the mean behaviour, rather than extremes. For example, GPCP 1DD preserves the total monthly precipitation derived from a combined satellite-gauge analysis [27] by multiplying a single conditional rain rate, , by the 3-hourly fractional occurrence of rain in a day derived from infrared radiometers on geosynchronous satellites. Throughout this study, we mention numerous reasons for the differing behaviour of precipitation estimates from merged datasets, all of which contribute to the uncertainties in the merged dataset extremes.

2.2. Interpolation Methods

The GHCN-Daily data was interpolated using six interpolation methods that were chosen because they are widely used in climate or environmental modelling and easily available with commonly used software packages such as NCAR command language (NCL), R, Climate Data Operators (CDO), and Python. The implementation of these methods often follows the default setup that is common in many standard software packages. This is done as people frequently use the default case. Regardless, “optimum” griding solutions often only consider mean values rather than extremes. Thus, our attempt here is to include some uncertainty in the results as might be present in a non-optimised implementation of a gridded product (such as not taking account of distribution tails). The methods are described as follows.

2.2.1. Inverse Distance Weighting (IDW)

This is a popular interpolation method that has been used in many environmental applications (e.g., [2832]) and it is applied as follows: let be a set of observed values at locations . Then, the predicted value, , at a location is given bywhere is the weight given by . Here, represents the Euclidian distance between and . In general, the distance in the weighting parameter is raised to a power, , where [28]. In our study, we use . This results in a higher weight for stations closer to the point of interpolation than the ones further away, thus accounting for the localized nature of precipitation. At any given point of time, the total number of gauge observations is significantly higher than nonzero gauge observations. As such using a weighted average of all stations would lower the predicted values of the wet locations greatly and conversely increase the estimate of precipitation in dry regions to a nonzero value. This was avoided by introducing a cut-off radius of 1.5 degrees latitude around the interpolated point, beyond which the weights were set to zero. This value was chosen as it is small enough to avoid stations from different climatic regions influencing grid values in regions of different precipitation climatology (e.g., tropics versus desert).

2.2.2. Cubic Spline (CS)

The Akima spline interpolation method is implemented as per [33]. The method is a bivariate generalisation of the cubic spline interpolation method described in [34]. This is the most common cubic spline method implemented in R. Cubic splines are very common in modelling climate variables [29, 30, 35]; “thin plate splines” and “spline with tension” only differ from Akima splines by the smoothing algorithm used (but exhibit similar properties overall) and are commonly available in climate-specific packages such as NCL and CDO. The method is as follows. The - plane is triangulated based on the input data points and for each triangle a fifth degree polynomial in and is determined: The 21 coefficients are obtained by deriving the function values and first- and second-order derivatives at the vertices of the triangle. This involves fitting a cubic polynomial to the function values and deriving the partial derivatives of the function differentiated in the direction perpendicular to each side of the triangle [36]. This method yields a result with the accuracy of a cubic polynomial [33]. Unlike IDW and other weighted average methods, cubic spline interpolation is capable of extrapolating to values higher than those observed and characteristically does so especially around observations with steep gradients. As a result, it generates unreasonably high extrapolated values in some regions and negative precipitation in some other regions. This was mitigated by interpolating instead of .

2.2.3. Triangulation with Linear Interpolation (TLI)

This method is analogous to cubic spline interpolation and even uses the same function as CS above, as described by [33]; however, instead of fitting cubic splines, it performs linear interpolation between the triangle’s vertices. This is an exact interpolator and is capable of handling steep gradients. The returned function, however, is not as smooth as CS interpolation.

2.2.4. Ordinary Kriging (OK)

Ordinary kriging and the following two methods are weighted average methods similar to IDW; that is, the predicted values are defined as a weighted average of observed values. Ordinary kriging and other variants of kriging (universal kriging [6, 37], cokriging [37], indicator kriging [6], simple kriging [37], etc.) are all stochastic methods; that is, they incorporate the concept of randomness in the spatial processes [28]. Ordinary kriging is commonly available in high level GIS software packages [28] and Stahl et al. [31] and Jarvis and Stuart [35] among others have used it to grid daily air temperature. The weights for kriging depend on the spatial correlation between the data, which is estimated by a theoretical semivariogram (a function which indicates the degree of spatial dependence). The theoretical semivariogram is obtained by fitting a nonlinear function (model) to the experimental semivariogram. This experimental semivariogram relates the average variance in observations to distances between the points of interest and surrounding gauges and is defined byHere, is the set of all distances, is the Euclidean distance between and , and and are the observed values. In this study, we have fitted the semivariogram to a spherical model. The characteristic trait of a spherical model is that, beyond a decorrelation length scale, the variance remains unchanged. This is suitable for precipitation systems, which are often highly localised.

2.2.5. Natural Neighbour Interpolation (NNI)

This is another weighted average interpolation method (e.g., [6, 38]), where the weights are calculated as follows. Let be a polygon around the gauge location such that any locations inside are closer to than any other gauge locations. Such polygons are known as Thiessen polygons. Let be a set of such polygons for the gauge locations, , and let be another set of Thiessen polygons for the set , where is the location where interpolation needs to be performed. Then, the set of weights, , are defined by the ratio:

2.2.6. Barnes Objective Analysis (BOA)

BOA is another popular weighted average method (e.g., [8, 39, 40]) where the weights, , are calculated usingwhere is the Euclidean distance between the point of interpolation and the gauge locations , and is the length scale parameter that governs the rate of fall-off of the weights. Subsequently, two more “passes” were performed, using the slightly modified formula for predicted values and successively smaller values of :where is the predicted value after the previous pass. We chose a three-pass scheme as it has been shown to produce a more accurate analysis compared to a two-pass analysis while a four-pass scheme shows only marginal improvement [41]. For each subsequent pass, is reduced by a factor of . The permissible values for according to Barnes [42] are between 0.2 and 0.4 while Koch et al. [39] define these in the range of 0.2 and 1. A smaller produces a more detailed analysis; however, the level of detail is limited by various factors such as density of observations, errors in data, and the presence of subgrid scale atmospheric processes [39]. Our is in agreement with Weymouth et al. [41] who also grid daily precipitation for Australia using BOA. Based on Koch’s analysis and setting to 0.3, the original value for the length scale can be derived fromwhere is the average separation between stations. Assuming a regular grid, this separation can be calculated by which results in a value of approximately equal to 50 km. We have chosen, however, a more generous value (as in [41]) of 1° latitude accounting for the irregularity of the station network, especially the large data-sparse regions in the centre of Australia. Another practical reason for choosing a larger is that it avoids excessively large regions with missing values in the centre of Australia. As per our settings, these missing value regions are comparable in size and shape to AWAP (see Figures 2, 3, and 5).

The purpose of this study is not to find the best method of interpolation for gridding daily in situ precipitation observations or even to optimise the methods described here to find the most accurate or consistent daily gridded dataset. This is primarily due to the fact that the “true” values of area-averaged daily precipitation are unknown across space, since we do not have observations at all locations. We simply aim to quantify the envelope of uncertainty resulting from common choices made in interpolation techniques, using methods that are readily available in common software packages. We note that there are indications for which methods can be further pursued and optimised, as well as indications for which methods can be ruled out.

The next section delineates the results from comparing all of the ten daily precipitation datasets for Australia. Where correlations are shown, they were calculated using the Spearman rank correlation function to account for the non-Gaussian nature of daily precipitation.

3. Results

Figure 2 shows maps of average annual precipitation total over the period from 1998 to 2013 for the ten datasets compared in this study. Broadly, the spatial patterns of precipitation from each product agree reasonably well and appear “realistic.” That is, high annual rainfall is observed primarily in the tropical north and Tasmania with reasonably high amounts on the coastlines, especially the east coast. All datasets show that precipitation is concentrated in the north, east, south-east, and a small region around Perth in south-western Australia. The datasets also well represent Australia’s arid and semiarid regions in central and western parts of the continent.

There are, however, some noteworthy differences. Due to the lower density of stations, AWAP masks out the precipitation estimates in some parts of western Australia. Due to its cut-off radius, IDW produces regions of missing values similar to AWAP and BOA, whereas the other gridding methods return values in the data-sparse regions. Without interpolation, however, spatial coverage would be substantially reduced, as can be seen from the GA maps in Figures 2 and 3. The merged products also provide data everywhere. Precipitation estimates from CS are unreasonably high in the data-sparse parts of central western Australia, since this is the arid region of Australia. Ordinary kriging also returned negative daily precipitation estimates in 24% of cases, of which less than half a percent were under −1 mm. The negative values greater than −1 mm have been set to 0 mm throughout this study while the remaining negative values have been masked out. The negative precipitation values occurred in all regions and no particular areas were affected more severely than the others. These negative values are an outcome of the negative weights in kriging. Kriging allows the weights to be negative since the only condition on the weights of kriging is unbiasedness; that is, the weights must add up to 1. Methods to deal with negative weights exist [43] and have been used in other studies [31].

The annual maximum daily precipitation for all datasets averaged over the period of 1998 to 2013 is shown in Figure 3. All datasets follow a similar spatial structure to the average precipitation shown in Figure 2, that is, wetter annual maxima in the north and the east coast, gradually decreasing towards the south. The lowest annual maximum precipitation is observed in the regions around the Great Australian Bight, in South Australia and parts of western Australia, and in all but the CS dataset. This region represents the “extensive arid core” of the country and furthermore much of South East Australia experienced an extended dry spell from 1996 to 2009 [44]. Note that although all but the CS datasets show similar spatial distributions of precipitation on a continental scale, they vary in their patterns on regional scales. The varying distributions of the driest regions around the Great Australian Bight are a good example of this. Larger differences, however, can also be found, most notably in the CS and OK datasets in comparison with the remaining datasets. The maximum precipitation from the CS dataset is 424 mm at its maximum, which is four times higher than the other datasets at the northern coastline of Australia. The highest annual maximum of GA at the same location is 109 mm. Comparison of CS and OK with GA indicates that CS struggles in regions of low station density, whereas surprisingly OK underperforms on the east coast where the station density is high.

In the regions around Sydney, on the east coast, and going slightly inland, OK’s annual maximum rainfall is above 500 mm at a few selected points (9 in total) reaching 1240 mm in one case. These numbers are an order of magnitude higher than the other datasets as seen in Figure 3 (the averaged annual maximum of GA is 132 mm at the grid location where OK records 1240 mm). Furthermore, the GPCP annual maximum precipitation on the northern coastline of Australia is lower than the other datasets. This may be due to the fact that the original GPCP 1DD dataset has a resolution of 1° × 1°, which was regridded, albeit conservatively on a grid. Another reason for this could be that Huffman et al. 2001 remove the highest 10% of conditional rain rates in order to remove outliers, in the process probably also removing real precipitation values [21].

To quantify the differences and agreement in temporal variability of precipitation between various datasets, we investigate local correlations between the different datasets (Figures 4 and 5). Figure 4 shows the correlation of the daily precipitation time series over the period of 1998 to 2013, for each grid cell comparing AWAP, TRMM, and GPCP. In Figures 4 and 5, correlations are only performed at grid locations where at least 90% of time series is common between the two datasets. This means that correlations are calculated at each grid point on at least 5258 data values, accounting for daily values over the 16-year period. We note that all three datasets are highly correlated, although the correlations between AWAP and TRMM and TRMM and GPCP are slightly higher than the correlations between AWAP and GPCP. It comes as no surprise that the correlations of the merged datasets with AWAP are lower in central Australia since the density of observations is lower in this region. Lower correlations are also seen in the region around Lake Eyre between all datasets including TRMM and GPCP. Lake Eyre is southeast of Alice Springs (see Figure 1). As noted earlier, this region is one of the driest regions in the country. Furthermore, the correlation between the point-based AWAP and the merged datasets is higher in the north and gradually decreases moving south, exhibiting a somewhat banded structure. Remote sensed signals are often weaker at high latitudes related to attenuation issues due to shallowness of the angle of measurement between the satellite and the surface. This could therefore account for this banded correlation that we see between TRMM and GPCP. Furthermore, it is also well known that both TRMM and GPCP miss light rainfall events [45, 46]. Since northern Australia receives higher rainfall than the South (Figures 2 and 3), the uncertainty in GPCP and TRMM’s rainfall estimates may also increase towards the South.

Figure 5 shows correlation maps of the daily precipitation time series over the period of 1998 to 2013 at each grid location, between AWAP, TRMM, GPCP, and the seven new gridded datasets of GHCN-Daily data created for the purpose of this study. We immediately notice that the new gridded datasets show higher correlation with AWAP than with the two merged products, TRMM and GPCP. The mean correlations between the six new interpolated datasets and TRMM are in the range of 0.37 and 0.58, and the mean correlations with GPCP are in the range of 0.30 and 0.51; whereas, the mean correlations between the interpolated datasets and AWAP are in the range of 0.58 and 0.86, with 4 of them above 0.80 (IDW, TLI, NNI, and BOA). The mean correlation between CS and GPCP is 0.30 and the mean correlation between CS and TRMM is 0.37. Reminiscent of the correlations between AWAP and the merged datasets, the correlations of the six new interpolated datasets with TRMM and GPCP also show a banded structure; higher correlations in the north and lower correlations in the south. Moreover, similar to AWAP, the correlations with TRMM are also higher than those with GPCP. Hence, we conclude that all gridded datasets, whether interpolated from in situ data or remote sensing based, are broadly similar in terms of their temporal variability, although the correlations also point to structural uncertainties between the interpolated fields on the one hand and the satellite-based datasets on the other hand. We note that the annual seasonal cycle of precipitation may contribute to the positive correlation; however, a separate investigation proved that this effect is not very strong. The mean correlations with the annual cycles removed (not shown) were not more than 0.05 lower than the mean correlations with the annual cycle. More importantly, we believe we do not have a large enough sample size in this study to calculate the annual cycle and anomalies properly. Finally we note that, the correlations of GA are similar to those of the interpolated datasets. Note, however, that the mean correlations of GA were calculated on a different domain compared to the other datasets due to the sparser spatial coverage.

Despite the broad similarity in the general variations of daily precipitation, we discovered that the extremes derived from these can be quite different. To demonstrate this, the daily precipitation quantiles of five stations (see Figure 1), representing different climatic zones in Australia, were compared to the quantiles of the corresponding grid box from the various gridded datasets (Figure 6). We see that the quantiles of all datasets, up to 20 mm, are similar to the gauge observations, as they lie on or near the 1-1 line, shown in black. Subsequently, quantiles related to higher precipitation amounts from all datasets increasingly diverge from each other, with the maximum variance observed in the highest quantiles for all five stations. In the case of Cairns located in tropical Northeast Australia (see Figure 1), where maximum precipitation reaches 250 mm, all datasets, with the exception of CS interpolation, sit around 100 mm. This may be an outcome of higher precipitation in Cairns and its immediate vicinity compared to the surrounding areas (both the annual average precipitation total and the maximum daily precipitation of the grid cell representing Cairns in Figures 2 and 3 are higher than the surrounding grids). It must be noted that we are comparing a point to a grid; so values lower than station values in the gridded products are expected due to averaging daily totals from several stations. This varying interpretation between area and point value is commonly known as the problem of change of support in statistics [47]. Assuming a grid box value is representative of the (average) precipitation in the area of the grid box, a reasonable expectation would be that the gridded value would be close to the average of the different station values, if stations were sufficiently dense and well-spaced. Since stations, however, are often sparsely and irregularly distributed and precipitation fields are often spatially discontinuous, the average of all stations may not represent the actual grid box average. Since we cannot know the true average over a grid and hence its extreme values, we aim instead to illustrate the envelope of uncertainty between the gridded datasets pertaining to the extreme daily values. Finally, we see that TRMM overestimates the high rainfall percentiles around Perth (see Figure 1) by a third relative to AWAP; however, GPCP does not. Based on Figure 3, we see that both merged datasets overestimate the maximum annual precipitation compared to the interpolated datasets in Southwest Australia. In case of GPCP, however, this inconsistency is not as spatially far reaching as Perth.

As an additional quantification of performance, we check how well the different gridded datasets preserve the area without rain on each day, using the simple station average per grid box (GA) as a reference. Note that none of the gridding methods have an explicit criterion to preserve this quantity. We test the percentage of days when the total number of dry grid cells (i.e., daily precipitation below 1 mm) is conserved within 10% of the GA dataset (Table 1). This test was only performed on the cells where GA had nonmissing values. BOA and NNI preserve the total number of dry grid cells on more than 99% of days. In contrast, AWAP and GPCP preserve the dry area on less than 45% percent of days.

4. Discussion

Merged products portray a similar climatology for Australia compared to the gridded datasets obtained by interpolating station data (Figures 2 and 3); however, the correlations between the gridded station-based datasets were lower with TRMM and GPCP than with the in situ-based AWAP. The obvious reason for this is the structural uncertainty in the two different types of datasets; one is created by interpolation of point-based data while the other involves deriving precipitation estimates from observables related to precipitation, instead of measuring daily cumulative rainfall directly. For example, both TRMM and GPCP miss many light rain events which may account for the lower correlations between AWAP and the merged datasets over the most arid regions of Australia [45, 46]. Infrared (IR) estimates can also experience false positives from non-precipitating clouds and experience the problem of attenuation of signals at higher latitudes [21]. Furthermore, in order to remove unrealistic values (outliers), the top 10% of conditional rain rates from the IR estimates are removed in the production of GPCP. All of these issues with merged satellite-gauge datasets can result in inconsistencies when comparing with gauge-based datasets on daily timescales. We note that the authors of these datasets are aware of their lack of skill when compared to gauge datasets on a daily scale [21, 48]. The intention of releasing the daily satellite-gauge analysis is to offer users the flexibility to compute temporal aggregates according to their own requirements [21].

Structural differences between similar datasets, such as between merged satellite-gauge products or interpolated datasets, may also exist. For example, as seen throughout this study, regional differences are present between datasets created with different interpolation methods. Furthermore, satellites make indirect measurements which are converted into precipitation estimates. For example, satellites measure the brightness temperature of the microwave and infrared signals, and radar signals measure the amplitude of the radio waves scattered off the clouds and rain drops. These measurements are related to precipitation amount via complex relationships and the variation in the algorithms used by TRMM and GPCP to estimate these relationships and combine them may also contribute to the difference between them. For example, the primary source for GPCP is infrared radiometers on geosynchronous satellites (geo-IR) [21], whereas TRMM uses microwave data from multiple low earth orbit satellites and uses the geo-IR data to fill in the gaps in coverage [17]. Note, however, that both datasets use identical methods to create a monthly combined satellite-gauge analysis [27] and use a similar algorithm to rescale the daily estimates based on the monthly analysis [17]. Microwave radiometers also tend to be more accurate than IR data when predicting precipitation, which may account for the slightly better correlations between AWAP and TRMM compared to correlations between AWAP and GPCP.

Caution must also be exercised when comparing datasets with different timings of observations, especially on a daily scale when differences in time zones may decrease the overlap between the respective datasets. In this study, shifting the TRMM and GPCP data a day backward relative to the interpolated datasets resulted in (at best) a 23.5-hour overlap between the datasets over Sydney. This overlap would reduce by 2 hours in Perth and change by an hour depending on the use of daylight savings in the region. Furthermore, the inconsistencies in the grid locations and resolutions also be a source of difference. This is seen in Figures 3 and 6 where the GPCP daily estimates, regridded from resolution to 0.5° × 0.5°, are lower compared to the other datasets. All of these possible reasons for differences mentioned above act as caveats when performing comparisons between various datasets on a daily timescale.

The comparison of the new interpolated datasets with established datasets provides an indication of the suitability (or lack thereof) of various schemes to interpolate daily precipitation. Throughout this study, CS interpolation has consistently returned overestimates of high precipitation quantiles in some regions and shown lower overall correlation with AWAP, TRMM, and GPCP. This is because CS interpolation characteristically overestimates the interpolated values, especially when interpolating noisy data with steep gradients. As such CS interpolation may be more suitable to regularly spaced, dense data with smooth changes between neighbouring observations and is often used for regridding fields from climate models [49]. It is also known that temporal averaging minimises this small scale variability of daily precipitation while reducing the effect of random observational errors on the interpolation [26]. As a result, spline interpolation with a smoothing algorithm such as thin plate splines (TPS) is popular for interpolating monthly data [26, 30, 35].

Ordinary kriging on the other hand assumes some partially correlated structure over space, which is useful for applications in geology and mining [50]. Mines are spatially slowly varying, smaller in area compared to the Australian continent considered here, and the density of observations within the decorrelation length scale is usually high. On the other hand, the characteristic length scales of precipitation systems are much smaller than even, in many cases, the distance between neighbouring gauge observations. This strong spatial variability of precipitation complicates the process of semivariogram fitting, making automatically fitted semivariograms (as in this study) less skilled. It is possible, however, to conduct more complex semivariogram estimation such as 3D semivariograms or anisotropic semivariograms [51, 52]. Multivariate kriging methods which use external data such as radar data to calculate semivariograms can also be employed to improve the estimates [6, 53]; however, Dirks et al. [54] discovered that, for high density networks, kriging does not improve upon simpler methods such as IDW. Furthermore, kriging is not an exact interpolator, which results in unrealistic maxima on the east coast in our study (Figure 3). Another disadvantage of kriging is that it assumes (often falsely) that the interpolant is a random process that is stationary in space. This can sometimes be corrected by using different size and shapes of the search neighbourhood which in turn creates complications in stitching the analyses together so as to avoid discontinuities. Finally, kriging is a computationally taxing scheme and took up to five times longer than the other methods in our case. This may be an important factor to consider if the aim is to generate near real time analysis or incorporate much larger data networks. There are many variants of kriging, with each purpose designed to improve some aspect of the analyses. Our intention is to investigate these methods further to understand how extremes can be preserved in the interpolation process.

The remaining methods (IDW, TLI, NNI, and BOA) exhibited high skill in interpolating daily gauge data as they all showed high correlations with AWAP and correlations similar to AWAP with the merged datasets. All methods with the exception of TLI are weighted average methods similar to AWAP which in part may explain the similar correlations; however, all methods including AWAP modelled the extremes disparately as seen in the Q-Q plots (Figure 6), adding to their uncertainty. We note that the weighted average methods cannot shoot above the observations and neither can TLI as it is an exact interpolator. The advantages of these methods are that they are highly computationally efficient and handle irregular and noisy data well, as highlighted by this study.

The interpolation for AWAP was done with Barnes objective analysis as well [26]. There are, however, differences between the new BOA dataset and AWAP (Figure 5). This may partly be due to the difference in the parameters used to determine the weights. More importantly, differences also arise due to the application of anomaly interpolation for AWAP, as mentioned in Section 2.1. The exact effect of using anomalies for gridding daily precipitation is still unclear, particularly with regard to extremes, and shall be another focus for further research. Ultimately, despite not interpolating anomalies, natural neighbour and linear and inverse distance weighting interpolation give near identical results to AWAP.

5. Conclusions

We compare a range of gridded daily precipitation data for Australia over the period of 1998 to 2013. Three existing data products, AWAP, TRMM, and GPCP, which represent in situ and merged in situ/remotely sensed products, respectively, were compared with six datasets that the authors calculated using in situ observations with six different interpolation techniques. For comparison, we also added a grid that simply averages all station values within a grid box, without interpolation. All datasets show similar spatial patterns of precipitation climatologies, with the highest precipitation amounts occurring in the tropical north of Australia and along the east coast and the lowest amounts in central western parts of the continent.

In terms of temporal variability, there is generally good agreement between most of the different datasets interpolated from station observations; albeit there are regionally larger differences (lower correlations) for grids interpolated by ordinary kriging and cubic spline interpolation. Correlations are low between gridded observations and the merged products, but the two merged datasets are positively correlated with each other.

The local distributions of precipitation are very similar between all datasets for low to moderate daily precipitation amounts of up to approximately 20 mm. Larger differences, in some locations of up to a factor of five, are found for the most extreme daily precipitation amounts. This has implications for investigations of precipitation extremes in terms of sensitivity of the results to the specific datasets used.

There is no gridding method that consistently performs closest to station observations, but in particular cubic spline interpolation shows a tendency to “overshoot” in comparison to station data and the other gridded products, in particular, in regions with steep spatial gradients of precipitation.

Our results indicate that all datasets are able to simulate the broad rainfall patterns across Australia. It is clear that the merged products are as similar to each other as are the datasets interpolated from in situ data but the two types of datasets (in situ-based and remotely sensed) are structurally dissimilar on daily timescales particularly outside of the tropics. Our study shows that daily precipitation extremes are subject to large uncertainties across all gridded products analysed here, irrespective of whether the datasets are derived from in situ data or also contain remotely sensed data. Our recommendation to those investigating precipitation extremes is that a combination of different methods and products should be used in order to capture the level of uncertainty that clearly exists at these higher quantiles. Using a single dataset would not faithfully capture this uncertainty creating false confidence in conclusions regarding the extreme end of the distribution.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study was supported by Australian Research Council (ARC) Grant CE110001028 and contributes to the World Climate Research Programme (WCRP) Grand Challenge on Extremes. Markus Donat was also supported by ARC Grant DE150100456. Analysis was performed using Climate Data Operators software (https://code.zmaw.de/projects/cdo), R [55], and NCL [56].