Abstract

Lightning is one of the most spectacular phenomena in nature. It is produced when the electrical resistance of the air breaks down in the electric field between the ground and an electrically charged cloud. Simple observation shows that precipitation, especially the most intense, is often accompanied by lightning. Given this observation, lightning has been employed to estimate convective precipitation since 1969. In early studies, mathematical models were derived to quantify this relationship and used to estimate precipitation. Currently, the combined use of several techniques to estimate precipitation is gaining momentum, and lightning is one of the novel techniques that complement the traditional ones for Quantitative Precipitation Estimation. In this paper, we provide a survey of the mathematical methods employed to estimate precipitation through the use of cloud-to-ground lightning. We also offer a perspective on future research to this end.

1. Introduction

The estimation of precipitation is important because of its impact on several aspects of human life [1–5]. Depending on the spatial and temporal scales, the accuracy of Quantitative Precipitation Estimation (QPE) can impact several fields such as hydrology, water resources, agriculture, natural hazards, drought, climatology, and meteorology, among many others [2, 3, 6, 7].

Rain gauges and radars have traditionally been used to monitor and estimate precipitation [8–11]. Both instruments have their own strengths and weaknesses, which have been documented by a number of authors [6, 12–18], among many others. As a result, researchers and agencies have opted to combine these two measurements in order to obtain more accurate QPE [6, 8–11, 19]. Multisensor precipitation estimation (MPE) consists of merging network radar data with gauge bias correction [6] plus model inputs and quality-controlled data. Examples of these data are the stage IV data [20] and NMQ precipitation data [21], among others. Although MPE represents progress in QPE, the accuracy of these estimations, especially in mountainous regions, has been questioned [22].

Satellite data have also been proposed as an alternative to complement ground-based MPE. They have almost global coverage [7, 23], can estimate precipitation at relatively high time and space resolution, and are not affected by topography as are other observational platforms [5, 24]. In fact, there are currently some data products that combine ground-based and satellite precipitation [25, 26], such as PERSIANN [27], CMORPH [28], and TMPA-RT [29], which represent an improvement in monitoring precipitation [25, 26]. However, satellite data suffer from their own limitations [7]. Reference [7] evaluated parameters such as bias, probability of detection, and false alarm ratio, among others, and concluded that satellite precipitation products perform poorly when high precipitation events occur over complex terrain. Coincidentally, in many places of the world, these kinds of events occur over mountainous zones where ground-based sensors do not have spatial coverage (e.g., the southwestern USA) [22, 30]. Furthermore, satellite data latency may be inadequate for real-time forecasting.

Considering that the problem of sensor coverage is recurrent, even with multisensor QPE products (either ground-based or space-based), alternative techniques to address this problem are required.

Figure 1 shows the precipitation distribution of a thunderstorm over Midland, Texas. It is clear that regions with higher precipitation coincide with those with more lightning events (dots); based on this kind of observation, the relationship between cloud-to-ground lightning and convective precipitation has been proposed since the late 1950s to complement QPE [31–35]. Many authors have demonstrated an empirical correlation between these two variables [32–36]. In contrast with the use of radar, the Lightning-Precipitation Relationship (LPR) has the advantage that, with a proper network, it can estimate precipitation in real time at high resolution without spatial coverage problems such as those found in mountainous regions with sparse data and weather radar blockage [22, 34]. However, this potential has not been fully explored, and most operational QPE methods limit the use of lightning to locating thunderstorms [37].

When studying precipitation, it is appropriate to focus on the two basic types, convective and stratiform. Convective precipitation occurs predominantly in the form of localized rain showers and thunderstorms and may have greater intensity. Stratiform precipitation tends to be larger in scale and longer lasting, with lesser intensity. Disaggregated precipitation data can be used for parametrizing climate models which simulate convective and stratiform (large-scale) precipitation separately [38], as well as for analyzing climate change effects on precipitation patterns. While the recorded precipitation is often a mixture of both, convective precipitation rates are generally much higher [39]. There are a variety of algorithms to discriminate convective precipitation, ranging from subjective methods to more sophisticated techniques (e.g., radar reflectivity [40]).

There is scientific evidence that the occurrence of lightning is linked with convective precipitation. This can be physically explained by the fact that cloud electrification processes need supercooled water, ice particles, and larger, heavier graupel to coexist in a region experiencing moderately high updraft velocities [36, 41]. Observational experiments (e.g., [42–46]) and some modeling studies (e.g., [47–49]) have established that the rate of all lightning discharges (intracloud and cloud-to-ground flashes) is strongly correlated with the rate of cloud electrification, and the latter is controlled by the convective cycles, the updraft mass flux, and the mass of ice-phase precipitation. The storm severity or the updraft intensity may not be correlated with the rates of CG flashes; however, they are correlated with the formation of precipitation and its descent to lower levels of the storm [46, 50].

Lightning data have also been used in models as a proxy for deep convection to improve parameters related to cold clouds and precipitation [51–53]. More recently, researchers have investigated the utility of lightning observations for identifying convective events at several different spatial scales and conclude that the assimilation of lightning favorably impacts several model variables as well as initial conditions [54, 55].

In view of the scientific evidence, lightning-based methods have been proposed as a complement or alternative for QPE [22, 32–35]. A number of studies have demonstrated the relationship between cloud-to-ground (CG) lightning and convective rainfall [22, 32–36]. More recently, in [5], the authors demonstrated the value of combining satellite infrared and lightning information to estimate rainfall, at least for large time periods (hours to days).

In practice, many regions of the world have ground lightning detection networks [56–61]; global ground-based detection networks are emerging and growing, with WWLLN [62] and GLD360 from Vaisala Inc. [63] as examples. Furthermore, NASA is launching a Geostationary Lightning Mapper (GLM) which will detect total lightning (intracloud and cloud-to-ground) over the western hemisphere. Given this technological infrastructure and the demonstrated weaknesses of the current MPE methods discussed previously, we believe that the use of lightning-based methods as a complement to convective precipitation estimation and the integration of this estimate into the current MPE may improve rainfall estimates, warranting further discussion of such methods.

This paper attempts to contribute to the discussion of methods for QPE using lightning. We present a brief survey of the mathematical models used over the years and provide an overview of future research geared towards the development of a system that may provide a basis to incorporate lightning-derived precipitation into the current multisensor precipitation products.

2. Estimation of Lightning-Derived Precipitation

2.1. Simple Models

Let $D$ be a spatial domain or region, divided into a gridded partition, where $G_{ij}$ is the grid located in the $i$th row and $j$th column of the grid partition. Let $s$ be a location in $D$ and $p(s, t)$ the rainfall in a fixed time interval at location $s$ at time $t$. The QPE problem resides in estimating the spatial precipitation

$$P_{ij}(t) = \frac{1}{A_{ij}} \int_{G_{ij}} p(s, t)\, ds,$$

where $A_{ij}$ is the area of the grid $G_{ij}$.

In much of the previous work investigating the LPR, the mathematical model was a simple linear least squares regression [35, 36]. In general, the linearly estimated precipitation at time $t$ and grid $G_{ij}$ can be defined as

$$\hat{P}_{ij}(t) = x_{ij}(t)^{T} \theta, \qquad (2)$$

where $x_{ij}(t)$ is a feature vector of dimension $n$ at time $t$ obtained from lightning observation data and $\theta$ is a vector of estimation parameters.

The feature vector can be calculated in different ways. For instance, a model can be defined by comparing the mean of convective precipitation at every time step with the corresponding lightning accumulation in the same domain. For this case, $x_{ij}(t)$ is established with only one feature, the number of lightning occurrences $N_{ij}(t)$ in grid $G_{ij}$ at time $t$.
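As an illustration of the single-feature case, the following minimal Python sketch fits one parameter by least squares. It assumes a model without intercept (precipitation proportional to the lightning count); the function name and the sample values are hypothetical and are not taken from the reviewed studies.

```python
def fit_single_feature(counts, precip):
    """Least-squares slope theta for the model precip ~ theta * counts."""
    sxx = sum(n * n for n in counts)
    if sxx == 0:
        raise ValueError("at least one nonzero lightning count is required")
    sxy = sum(n * p for n, p in zip(counts, precip))
    return sxy / sxx


# Hypothetical lightning counts and mean convective precipitation (mm)
# for four time steps.
counts = [2, 5, 9, 14]
precip = [1.0, 2.5, 4.5, 7.0]

theta = fit_single_feature(counts, precip)
estimate = theta * 7  # estimated precipitation for 7 lightning events
```

Adding a constant feature to the vector would give a model with an intercept; whether that is appropriate depends on the dataset.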

In [22], the model parameters are estimated in two ways. In the first, a vector of mean values of precipitation and lightning is obtained at each time step by averaging over $L(t)$, the set of grids in the domain with nonzero lightning counts at time $t$, and normalizing by the area of $L(t)$. The parameters are then obtained by solving, by least squares, the linear model (2) written for these mean values. In the second, the model parameters are estimated by comparing the total seasonal convective precipitation accumulation with the total lightning strikes accumulated per grid.

Up to this point, all research on the LPR had correlated precipitation with discrete lightning counts. In [22] the authors proposed a method to obtain LPR relationships at higher time and space resolutions by using what they called Gaussian counts (GC). They noted that lightning counts and precipitation are not variables of the same nature: precipitation is a continuous quantity, whereas the lightning count is a discrete variable. This can be a problem when using gridded data, because many lightning events can fall in a grid adjacent to the one where the precipitation is recorded, and the correlation can therefore be affected. A GC is simply defined as the convolution of every discrete count with a Gaussian distribution, assuming an uncorrelated identical variance $\sigma^{2}$ in latitude and longitude. The lightning Gaussian counts are obtained by

$$GC_{ij}(t) = \sum_{k=1}^{K(t)} \int_{G_{ij}} \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{\lVert s - s_{k} \rVert^{2}}{2\sigma^{2}}\right) ds,$$

where $s_{k}$, $k = 1, \ldots, K(t)$, are the locations of the lightning events recorded at time $t$. The values of $GC_{ij}(t)$ are obtained by a simple numerical integration procedure.
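The idea of Gaussian counts can be sketched as follows. This is only an illustration under stated assumptions: the grid is rectangular and aligned with the coordinate axes, and, instead of the numerical integration used in [22], the integral of the isotropic Gaussian over each cell is evaluated in closed form as a product of two one-dimensional normal probabilities. All names are hypothetical.

```python
import math


def _normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_counts(flashes, x_edges, y_edges, sigma):
    """Spread every flash over the grid with an isotropic Gaussian kernel.

    flashes : list of (x, y) flash locations
    x_edges, y_edges : increasing cell boundaries along each axis
    Returns gc[i][j], the Gaussian count of the cell in row i (y) and column j (x).
    """
    rows, cols = len(y_edges) - 1, len(x_edges) - 1
    gc = [[0.0] * cols for _ in range(rows)]
    for fx, fy in flashes:
        # Probability mass of the kernel inside each column and each row.
        wx = [_normal_cdf(x_edges[j + 1], fx, sigma) - _normal_cdf(x_edges[j], fx, sigma)
              for j in range(cols)]
        wy = [_normal_cdf(y_edges[i + 1], fy, sigma) - _normal_cdf(y_edges[i], fy, sigma)
              for i in range(rows)]
        for i in range(rows):
            for j in range(cols):
                gc[i][j] += wy[i] * wx[j]
    return gc


# A flash close to a cell boundary contributes to both neighboring cells.
demo = gaussian_counts([(1.95, 0.5)], [0, 1, 2, 3, 4], [0, 1], sigma=0.5)
```

With discrete counts the flash in the demo would be assigned entirely to one cell; with Gaussian counts a comparable share goes to the adjacent cell, which is the effect the method is meant to capture.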

2.2. Power Law Model

The power law relationship between radar reflectivity and precipitation rate, which is used to estimate precipitation from radar [32, 33, 64, 65], has also been used as a model to estimate lightning-derived precipitation. The model is given by

$$\hat{P}_{ij}(t) = a\, N_{ij}(t)^{b},$$

where $N_{ij}(t)$ is the number of lightning occurrences in grid $G_{ij}$ at time $t$ and $a$ and $b$ are the optimal parameters of the power law. Taking logarithms, $\log \hat{P}_{ij}(t) = \log a + b \log N_{ij}(t)$, so this power law is equivalent to estimating a simple linear model where $x_{ij}(t) = [1, \log N_{ij}(t)]^{T}$ and $\theta = [\log a, b]^{T}$. In [22] the authors did not find any benefit in using a power law relationship for a complex terrain domain.
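The equivalence between the power law and a linear model in logarithmic variables can be sketched as below. The sketch assumes a multiplicative coefficient `a` and an exponent `b` (names chosen here for illustration) and discards samples with zero counts or zero precipitation, for which the logarithm is undefined.

```python
import math


def fit_power_law(counts, precip):
    """Fit precip = a * counts**b by ordinary least squares in log-log space."""
    pairs = [(math.log(n), math.log(p))
             for n, p in zip(counts, precip) if n > 0 and p > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two samples with positive values")
    m = len(pairs)
    mean_x = sum(x for x, _ in pairs) / m
    mean_y = sum(y for _, y in pairs) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("lightning counts must not all be equal")
    b = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical data generated from precip = 2 * sqrt(counts).
a, b = fit_power_law([1, 4, 9, 16], [2.0, 4.0, 6.0, 8.0])
```

Note that least squares in logarithmic space minimizes relative rather than absolute errors, so the fitted parameters generally differ from those of a direct nonlinear fit.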

2.3. Space-Time Invariant (STI) Model

Overall, previous results demonstrate that the LPR is reliable when one compares relatively large regions and/or longer time periods. However, if this relationship is tested at higher resolutions (such as those of the new precipitation products), lightning events and convective precipitation may not be colocated. In addition, some results report a time lag between lightning and precipitation [35, 36, 66]. To address these problems, in [22], the authors proposed a model that considers spatial and temporal neighbors, which is described next.

Let $g_{ij}^{v}(t)$ be the vector of Gaussian lightning counts in the spatial neighborhood of $G_{ij}$ at time $t$ with vicinity $v$. Let $x_{ij}(t)$ be the temporally associated vector of lightning observations

$$x_{ij}(t) = \left[ g_{ij}^{v}(t - nl)^{T}, \ldots, g_{ij}^{v}(t)^{T}, \ldots, g_{ij}^{v}(t + pl)^{T} \right]^{T},$$

where nl and pl are the negative and positive time lags, respectively. Using the model from equation (2), the vector $\theta$ is adjusted by the least squares criterion with all the grids covered by convective precipitation in the relevant dataset. As will be noted further in Figure 2, the STI model improves the results with respect to the simple linear models because it has more parameters, which account for the time lag and the spatial relationship of the LPR.
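One way to assemble such a space-time feature vector is sketched below. The sketch assumes that the neighborhood of vicinity `v` is the square window of (2v + 1) by (2v + 1) grids centered on the target grid, and that values outside the domain or outside the recorded period are taken as zero; both choices, and all names, are assumptions made for illustration.

```python
def sti_features(gc, t, i, j, v, nl, pl):
    """Flattened window of Gaussian counts around grid (i, j) and time t.

    gc[t][i][j] : Gaussian counts indexed by time step, row, and column
    v           : spatial vicinity (window half-width, in grids)
    nl, pl      : number of negative and positive time lags
    """
    n_times, n_rows, n_cols = len(gc), len(gc[0]), len(gc[0][0])
    feats = []
    for tt in range(t - nl, t + pl + 1):
        for ii in range(i - v, i + v + 1):
            for jj in range(j - v, j + v + 1):
                inside = 0 <= tt < n_times and 0 <= ii < n_rows and 0 <= jj < n_cols
                feats.append(float(gc[tt][ii][jj]) if inside else 0.0)
    return feats


# Hypothetical 3 x 3 domain over 3 time steps; each value encodes its own index.
gc = [[[100 * t + 10 * i + j for j in range(3)] for i in range(3)] for t in range(3)]
x = sti_features(gc, t=1, i=1, j=1, v=1, nl=1, pl=1)
```

The resulting vector has (2v + 1)^2 (nl + pl + 1) entries, so the number of parameters to fit grows quickly with the vicinity and the lags.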

2.4. Dynamical Linear Models

The STI model is a set of linear parameters fixed in time. Physically, the LPR changes from one storm to another or even within the same convective event. Therefore, a model fixed in time may work well when the thunderstorms are close to the average but fails for atypical storms. This suggests the need to develop a method to model the changes of the LPR in time. Let

$$\hat{P}_{ij}(t) = x_{ij}(t)^{T} \theta(t) \qquad (9)$$

be model (2) with time-varying model parameters. Assume that the parameters evolve from one time step to the next one as

$$\theta(t + 1) = \theta(t) + w(t), \qquad (10)$$

where $w(t)$ is a parameter estimation error modeled as a Gaussian random variable with zero mean and covariance matrix $Q$.

The estimation should be carried out only with the covered grids (grids with valid precipitation observations). Let $c(t)$ be the vector of grids with observed precipitation, so

$$y(t) = \left[ P_{ij}(t) \right]_{(i,j) \in c(t)} + v(t)$$

is the observed precipitation vector at the covered grids, where $v(t)$ is the observation noise, assumed to be Gaussian with zero mean and covariance $R$. The estimated precipitation of the covered grids is obtained from (9) as

$$\hat{y}(t) = X(t)\, \theta(t), \qquad (12)$$

where $X(t)$ is the matrix whose rows are the feature vectors of the observed grids.

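Equations (10) and (12) define a discrete-time linear system, and thus the parameter vector can be estimated by means of a Kalman filter [67] as

$$\begin{aligned}
\Sigma(t \mid t-1) &= \Sigma(t-1) + Q, \\
K(t) &= \Sigma(t \mid t-1)\, X(t)^{T} \left[ X(t)\, \Sigma(t \mid t-1)\, X(t)^{T} + R \right]^{-1}, \\
\hat{\theta}(t) &= \hat{\theta}(t-1) + K(t) \left[ y(t) - X(t)\, \hat{\theta}(t-1) \right], \\
\Sigma(t) &= \left[ I - K(t)\, X(t) \right] \Sigma(t \mid t-1),
\end{aligned}$$

where $\Sigma(t)$ is the covariance estimate and $K(t)$ is the Kalman gain matrix. At each time step $\hat{\theta}(t)$ is estimated, and then the precipitation in nonobserved grids is estimated by (9).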
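The recursion can be illustrated with a deliberately reduced sketch: a single time-varying parameter (one lightning feature per grid), a random-walk evolution, and independent observation noise at each covered grid. Under these assumptions the matrix update is equivalent to processing the observations one at a time, which avoids any matrix inversion. The function name, the noise variances `q` and `r`, and the data are hypothetical.

```python
def kalman_step(theta, var, xs, ys, q, r):
    """One time step of a scalar random-walk Kalman filter.

    theta, var : previous parameter estimate and its variance
    xs, ys     : lightning feature and observed precipitation at each covered grid
    q, r       : process and observation noise variances (assumed known)
    """
    var = var + q  # prediction step: the parameter follows a random walk
    for x, y in zip(xs, ys):
        # Sequential update, valid because observation errors are independent.
        gain = var * x / (x * x * var + r)
        theta = theta + gain * (y - x * theta)
        var = (1.0 - gain * x) * var
    return theta, var


# Two hypothetical time steps; the slope drifts from 0.5 to 0.6.
theta, var = 0.0, 1e6  # diffuse initial guess
history = []
for xs, ys in [([1.0, 2.0, 3.0], [0.5, 1.0, 1.5]), ([2.0, 4.0], [1.2, 2.4])]:
    theta, var = kalman_step(theta, var, xs, ys, q=0.01, r=1.0)
    history.append(theta)
```

Larger values of `q` let the estimate follow changes in the LPR more quickly at the cost of noisier parameters; in practice both variances would have to be tuned experimentally.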

In order to apply the Kalman filter for estimation, some conditions must be assumed, such as a zero-mean Gaussian distribution of the parameter variations. Also, fixed and known covariance matrices $Q$ and $R$ are usually assumed and computed experimentally. Future research on time-varying STI models and their experimental validation is required, not only for lightning-based QPE but in general for a large class of remote sensing models.

Figure 2 shows the benefit of the fixed STI model and the dynamic Kalman filter estimations. Compared with the simple linear model, the proposed methods reduce the total mean square error to about half.

3. Geostatistical Approach

3.1. Geostatistical Basic Model

A random field $\{Z(s) : s \in D\}$ is a set of random variables (precipitation in our case) parametrized by some set $D \subset \mathbb{R}^{d}$, where $s$ is a spatial coordinate. An extensive treatment of random field theory can be found in [68–71].

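The construction of optimal predictors from a single and partial realization of a random field is based on some form of stationarity. A random field is called second-order stationary if

$$E[Z(s)] = m, \qquad \mathrm{Cov}[Z(s_{1}), Z(s_{2})] = C(s_{1} - s_{2}) \quad \text{for all } s, s_{1}, s_{2} \in D,$$

where $E$ is the expected value operator and $\mathrm{Cov}$ is the covariance operator.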

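For processes for which the above conditions do not hold (i.e., the covariance function does not exist) another hypothesis is introduced. A random field is called intrinsically stationary if

$$E[Z(s_{1}) - Z(s_{2})] = 0, \qquad \mathrm{Var}[Z(s_{1}) - Z(s_{2})] = 2\gamma(s_{1} - s_{2}).$$

If, additionally, the covariance depends only on the separation distance $\lVert s_{1} - s_{2} \rVert$ between $s_{1}$ and $s_{2}$, then the random field is called isotropic.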

Let $2\gamma(h)$ be the variance of the difference between the precipitation at two spatial coordinates $s_{1}$ and $s_{2}$, and let $h = \lVert s_{1} - s_{2} \rVert$ be the distance between $s_{1}$ and $s_{2}$. The semivariogram is expressed by means of the expected value operator as

$$\gamma(h) = \frac{1}{2} E\left[ \left( Z(s_{1}) - Z(s_{2}) \right)^{2} \right].$$

The semivariogram, as a structural function of intrinsically stationary random fields, describes a broader class of phenomena, in which the covariance may not exist. Additionally, the semivariogram does not require the mean value of a random field to be known; therefore it became the preferred function of geostatisticians. In the case of second-order stationary spatial processes, the covariance operator and the semivariogram are related as

$$\gamma(h) = C(0) - C(h).$$

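The semivariogram is a measure of dissimilarity between a pair of observations. As a function, it provides information on the spatial continuity and variability of a random field. The inference of its shape is based on the empirical semivariogram and some a priori knowledge of the behavior of the phenomenon. In [72] the author presents three types of models for fitting the experimental semivariogram that are well adapted for precipitation estimation (written here with unit sill):

(i) The spherical model with range $a$: $\gamma(h) = 1.5\,(h/a) - 0.5\,(h/a)^{3}$ if $h \le a$, and $\gamma(h) = 1$ otherwise.

(ii) The cubic model with range $a$: $\gamma(h) = 7\,(h/a)^{2} - 8.75\,(h/a)^{3} + 3.5\,(h/a)^{5} - 0.75\,(h/a)^{7}$ if $h \le a$, and $\gamma(h) = 1$ otherwise.

(iii) The dampened hole effect model: $\gamma(h) = 1 - \exp(-3h/d)\cos(\pi h/a)$, where $d$ is the distance at which 95% of the hole effect is dampened out.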
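For reference, the three semivariogram models can be written as short Python functions. The sketch assumes the commonly used unit-sill forms of the spherical, cubic, and dampened hole effect models, with `a` the range (or the hole-effect scale) and `d` the dampening distance; a fitted model would be multiplied by the sill estimated from the empirical semivariogram.

```python
import math


def spherical(h, a):
    """Spherical semivariogram with range a and unit sill."""
    if h >= a:
        return 1.0
    u = h / a
    return 1.5 * u - 0.5 * u ** 3


def cubic(h, a):
    """Cubic semivariogram with range a and unit sill."""
    if h >= a:
        return 1.0
    u = h / a
    return 7.0 * u ** 2 - 8.75 * u ** 3 + 3.5 * u ** 5 - 0.75 * u ** 7


def dampened_hole_effect(h, a, d):
    """Dampened hole effect semivariogram with unit sill.

    The factor exp(-3h/d) falls to about 5% at h = d, i.e. 95% of the
    hole effect is dampened out at distance d.
    """
    return 1.0 - math.exp(-3.0 * h / d) * math.cos(math.pi * h / a)


# Values of the three models at half the range (a = 10, d = 30).
half_range = (spherical(5.0, 10.0), cubic(5.0, 10.0), dampened_hole_effect(5.0, 10.0, 30.0))
```

The cubic model is smoother than the spherical one near the origin (zero slope at h = 0), which makes it more suitable for fields that vary smoothly at short distances.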

The most common algorithm of geostatistical estimation is the so-called ordinary kriging, which can be viewed as a heterogeneously linear estimator [69]

$$\hat{Z}(s_{0}) = \lambda(s_{0})^{T} z = \sum_{i=1}^{n} \lambda_{i}(s_{0})\, Z(s_{i}),$$

assuming $Z$ is a second-order stationary random field with constant unknown mean value, $z = [Z(s_{1}), \ldots, Z(s_{n})]^{T}$ is the vector of observed data, and $\lambda(s_{0})$ is the parameter vector for the spatial location $s_{0}$.

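Ordinary kriging is a minimal variance estimator, given by

$$\min_{\lambda} \; \mathrm{Var}\left[ \hat{Z}(s_{0}) - Z(s_{0}) \right]$$

under the unbiasedness condition (15):

$$\sum_{i=1}^{n} \lambda_{i} = 1.$$

Hence, the objective function to be minimized through Lagrange multipliers can be expressed as

$$L(\lambda, \mu) = \mathrm{Var}\left[ \hat{Z}(s_{0}) - Z(s_{0}) \right] - 2\mu \left( \sum_{i=1}^{n} \lambda_{i} - 1 \right).$$

Taking partial derivatives with respect to $\lambda_{i}$ and $\mu$, and setting them to zero, provides the system of $n + 1$ equations with $n + 1$ unknowns:

$$\sum_{j=1}^{n} \lambda_{j}\, \gamma(\lVert s_{i} - s_{j} \rVert) + \mu = \gamma(\lVert s_{i} - s_{0} \rVert), \quad i = 1, \ldots, n, \qquad \sum_{j=1}^{n} \lambda_{j} = 1.$$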
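A small numerical sketch may help to make the ordinary kriging system concrete. It assumes a spherical semivariogram with unit sill and no nugget, two-dimensional coordinates, and a plain Gaussian elimination in place of a linear algebra library; the function names and the gauge values are hypothetical.

```python
import math


def spherical(h, a):
    """Spherical semivariogram with range a and unit sill."""
    u = min(h / a, 1.0)
    return 1.5 * u - 0.5 * u ** 3


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, a):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariogram matrix bordered by the unbiasedness constraint.
    lhs = [[spherical(dist(points[i], points[j]), a) for j in range(n)] + [1.0]
           for i in range(n)]
    lhs.append([1.0] * n + [0.0])
    rhs = [spherical(dist(p, target), a) for p in points] + [1.0]
    sol = solve(lhs, rhs)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, rhs[:n])) + mu
    return estimate, variance, weights


# Four hypothetical gauges at the corners of a 10 x 10 square.
gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
rain = [1.0, 2.0, 3.0, 4.0]
center_estimate, center_variance, center_weights = ordinary_kriging(gauges, rain, (5.0, 5.0), 20.0)
```

At the center of the square the four weights are equal by symmetry, so the estimate is the plain average of the gauges; at a gauge location the estimator reproduces the observed value with zero kriging variance.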

The ordinary kriging approach is the simplest practical model for geostatistical estimation. There are several modifications of ordinary kriging; some deal with a nonconstant mean, like kriging with a trend or universal kriging; others deal with nonlinear transformations of the random field, such as indicator kriging or log-normal kriging [69, 73, 74]. In general, these approaches are straightforward extensions of ordinary kriging.

3.2. Block Kriging

Ordinary kriging is a point estimator, whereas QPE is a spatial estimation. Therefore, it is necessary to modify the method. Let $B$ be a defined area; then

$$Z(B) = \frac{1}{|B|} \int_{B} Z(s)\, ds$$

is the area average of $Z$ over $B$. Intuitively, $Z(B)$ can be estimated by first estimating $Z(s)$ for a large number of locations in $B$ and then using the average of these values. That suggests a solution to the problem of how to estimate an area average by ordinary kriging,

$$\hat{Z}(B) = \sum_{i=1}^{n} \lambda_{i}(B)\, Z(s_{i}),$$

and, again, $\lambda(B)$ is determined by imposing the same conditions of some kriging approach. If the constraint used in ordinary kriging is applied, then the equations to solve the ordinary block kriging are

$$\sum_{j=1}^{n} \lambda_{j}(B)\, \gamma(\lVert s_{i} - s_{j} \rVert) + \mu = \bar{\gamma}(s_{i}, B), \quad i = 1, \ldots, n, \qquad \sum_{j=1}^{n} \lambda_{j}(B) = 1,$$

where $\bar{\gamma}(s_{i}, B) = \frac{1}{|B|} \int_{B} \gamma(\lVert s_{i} - u \rVert)\, du$ is the point-to-block average variogram. Thus, instead of averaging the estimated values in $B$, the values of the variogram are averaged. In practice, these point-to-block average variograms are obtained numerically from an empirical variogram on a regularly spaced grid in $B$.

It can be shown that this is equivalent to using ordinary kriging at each grid point and then averaging the estimated values. However, there are several important differences. First, since the distribution of the unknown values is itself unknown, it is not possible to predict the appropriate choice of the grid to obtain a given error tolerance when averaging the kriged estimates. In contrast, it is possible to estimate an appropriate grid when numerically integrating a known function. Second, averaging kriged estimates results in many estimation errors (one for each kriged estimate) and many kriging standard deviations, whereas block kriging results in a single kriging standard deviation for a given area and controllable numerical integration errors.

3.3. Kriging with External Drift

Kriging with external drift is a method to merge two sources of spatial information: a primary variable that is precise but only known at a few locations and a secondary variable that is available over the whole spatial domain [75]. The primary variable is considered a random field $Z(s)$ whose mean $m(s)$ is not constant but depends on the location. In particular, it is assumed that $m(s)$ is a function of a secondary variable $Y(s)$. The simplest representation might be

$$m(s) = a + b\, Y(s), \qquad (30)$$

where the coefficients $a$ and $b$ are assumed unknown. The kriging estimator form would not change from (23). For $b = 0$ the following constraint is sufficient:

$$\sum_{i=1}^{n} \lambda_{i} = 1,$$

whereas for $b \neq 0$ the additional constraint

$$\sum_{i=1}^{n} \lambda_{i}\, Y(s_{i}) = Y(s_{0})$$

would be necessary. To minimize the estimation variance subject to these constraints, several Lagrange multipliers would be necessary. Note that (30) might be thought of as a regression model; the residuals are then used to estimate and model the variogram function of $Z$.

For QPE, several authors have used block kriging with external drift with elevation as the secondary variable. In the reviewed literature, there is a dearth of research on the use of lightning data to carry out kriging with external drift. Nevertheless, there is physical and modeling evidence that justifies the assumption that the expected value of the QPE depends on the lightning values.

4. Probabilistic Quantitative Precipitation Estimation

4.1. Simulation Approach

Quantitative precipitation estimates often have significant uncertainty. Stochastic precipitation models provide an alternative framework for Quantitative Precipitation Estimation [76–78]. Most stochastic precipitation models are developed for the purposes of precipitation simulation rather than conditional precipitation estimation. The parameters of such simulation models are estimated from station data, but the temporal variability is generally not constrained to fit station observations.

In [79] a method for conditional precipitation estimation is proposed, based on a locally weighted regression, in which observed grid information is used as explanatory variables to predict the spatial variability in precipitation. For each time step, regression models are used to estimate the conditional cumulative distribution function of precipitation at each grid cell, and ensembles are generated by sampling values from the gridded precipitation cumulative distribution function.

Based on this idea, one approach to perform probabilistic QPE based on lightning data is to compute an empirical climatological cumulative distribution function of precipitation in grids with observed precipitation and to use a lightning-based kriging with external drift to estimate a locally weighted regression for each grid. This model can then be used to estimate a conditional cumulative distribution function of precipitation at each grid.

4.2. Conditional Random Fields

Another approach to probabilistic QPE is to model the conditional probability distribution function of each grid as a continuous conditional random field [80]. Let

$$p(y \mid x) = \frac{1}{Z(x, \alpha)} \exp\left( \sum_{k} \alpha_{k} f_{k}(y, x) \right),$$

where $y$ is the precipitation at a grid, $x$ is the corresponding vector of lightning features, $f_{k}$ are known as feature functions, $\alpha$ is the coefficients vector, and $Z$ is a normalization function

$$Z(x, \alpha) = \int \exp\left( \sum_{k} \alpha_{k} f_{k}(y, x) \right) dy.$$

The coefficients vector is obtained by maximization of the log-likelihood criterion

$$\hat{\alpha} = \arg\max_{\alpha} \sum_{(i,j) \in O} \log p(y_{ij} \mid x_{ij}),$$

where $O$ is the set of grids in the domain with observed precipitation. In general, to evaluate $Z$ for inference or optimization, one would need to use time-consuming sampling methods such as Markov Chain Monte Carlo-based algorithms. However, in [80], it is argued that there is an efficient algorithm for optimization if the feature functions are defined as

$$f_{k}(y, x) = -\delta_{k}(x) \left( y - g_{k}(x) \right)^{2},$$

where $g_{k}(x)$ is a baseline predictor of $y$ and $\delta_{k}$ is an indicator function, based on the expert knowledge of the problem. By introducing indicator functions we essentially partition the whole data set of observed precipitation into smaller subsets. For each subset the learning problem is convex and a global optimal solution can be estimated by an EM approach. The estimated $\alpha_{k}$ represents our belief in $g_{k}$ in the different subsets, corresponding to different prediction conditions.

5. Storm Tracking

The benefit of using a dynamic model to track the LPR is evident in Figure 2. However, even in this case, the spatial extent of the domain may contain more than one storm, and each of these storms may be described by a different model. For instance, Figure 3 shows a case in southern Arizona with two clearly distinct LPR behaviors: one in the southern domain (the non-sensor-covered domain), characterized by a larger number of lightning events, and another in the central north area, where there were not many lightning events but there was intense precipitation. To address this problem, one additional improvement to the dynamic model is to develop a method to follow the LPR in space and time. Clustering methods may be employed to recognize storms in the spatial domain.

In the clustering problem, we are given a training set (in this case, the lightning locations in one time interval) and want to group the data into a few cohesive clusters [81]. Clustering can be achieved by various algorithms that differ significantly in their notion of what constitutes a cohesive cluster. Popular notions include groups with small distances among the cluster members, dense areas of the data space, intervals, or particular statistical distributions.
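As an illustration of the first notion (small distances among cluster members), the sketch below groups lightning locations by single linkage with a distance threshold, using a union-find structure. The threshold `max_gap`, the function name, and the coordinates are hypothetical; this is not the clustering method of any of the reviewed works.

```python
import math


def cluster_flashes(flashes, max_gap):
    """Label flashes so that any two closer than max_gap share a cluster.

    Clusters are the connected components of the 'closer than max_gap'
    graph (single linkage). Labels are integers starting at 0.
    """
    parent = list(range(len(flashes)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path compression
            k = parent[k]
        return k

    for a in range(len(flashes)):
        for b in range(a + 1, len(flashes)):
            gap = math.hypot(flashes[a][0] - flashes[b][0],
                             flashes[a][1] - flashes[b][1])
            if gap <= max_gap:
                parent[find(a)] = find(b)

    labels = {}
    return [labels.setdefault(find(k), len(labels)) for k in range(len(flashes))]


# Two hypothetical storms, about 14 distance units apart.
flashes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
storm_labels = cluster_flashes(flashes, max_gap=2.0)
```

The pairwise loop is quadratic in the number of flashes, which is acceptable for a single time interval but would need a spatial index for large datasets.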

Since storms change with time, it is important to capture the main transitions. In recent years, there has been an increasing interest in tracking scenarios in which a very large number of coordinated objects evolve and interact. It should be noted that clusters can be thought of as extended objects that produce a large number of observations. In recent work [82] merging and splitting objects are modeled using point processes. This is a fundamental issue characterizing storm behavior.

Assume that at time $t$ there are $K_{t}$ storms, or clusters, at unknown locations. Assume that the storms can be adequately represented by a parametric statistical model $f(\cdot \mid \phi_{t})$. Each storm may produce more than one lightning event, yielding the realization set $\{z_{t}^{1}, \ldots, z_{t}^{m_{t}}\}$, where typically $m_{t} \gg K_{t}$. Let $Z_{1:t}$ be the lightning location history up to time $t$ and $z_{1:t}$ its realization. The storm tracking problem could be defined as the estimation of the posterior distribution $p(\phi_{t} \mid z_{1:t})$ of the random set of unknown parameters, from which point estimates for $\phi_{t}$ and posterior confidence intervals can be extracted. In [83] a filtering algorithm for tracking multiple clusters of coordinated objects is presented. The algorithm is based on an MCMC mechanism. A dynamic Gaussian mixture model is utilized for representing the time-varying clustering structure.

6. Conclusions

In this paper, several approaches for QPE from lightning measurements were reviewed. We also reviewed the existing techniques for storm tracking, since these methods can be used in conjunction with linear models, allowing a better parametrization of the models for convective events.

Linear models implicitly assume that the data used to parametrize the model are independent, normal, and homogeneous in variance. The simplest models are suitable for a first approach or when the data are expensive or limited. On the other hand, the STI models provide a powerful tool to easily express heuristic knowledge about the space-time relation of lightning with QPE. Dynamical STI models allow adjustment of the STI model in response to changes in the LPR from one storm to another (or even within the same convective event). However, for these models to be effective it is necessary to carry out an adjustment phase of the tracking parameters, which is critical for the quality of the estimation.

Geostatistical methods generate smooth interpolated surfaces, where the estimation errors depend strongly on the assumed probability distribution, derived from the variogram model. Kriging methods are suitable when there are sufficient data to establish (and statistically verify) a variogram function. On the other hand, discriminative models (such as conditional random fields) require fewer assumptions about the distribution of the data and the structure of the model, so it is possible to reduce estimation errors. However, discriminative models do not offer clear representations of the relations between lightning and convective precipitation. These models are suitable for large regions with a large amount of historical data.

As mentioned at the end of Section 1, efforts to detect lightning globally with both ground-based and space-based sensors have increased in recent years. An example is the Vaisala GLD360 network [63], which is capable of detecting approximately 80% of the events occurring over the planet. In addition, NASA’s next mission to put the new generation of GOES satellites into orbit will be launched in late 2017 [85]. This new generation of satellites carries a lightning mapper, so that the combined network will be able to monitor lightning events at a global level. As never before, the electrical activity of thunderstorms can be studied and observed over the entire planet.

This infrastructure represents a great opportunity to investigate the relationship between convective precipitation and the occurrence of lightning at a global level, and to investigate the differences that may exist in the LPR depending on the geographic location as well as the nature of the different convective events. The development of new algorithms and mathematical models for the estimation of convective precipitation, as well as those that emerge from other investigations, will be of great importance to develop systems for the prediction of severe storms, to study the physical basis of the LPR for convective events of different nature, or simply to complement existing methods and techniques for estimating precipitation.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.