Abstract

Missing data are an inevitable problem when CO2, water, and energy fluxes between the biosphere and the atmosphere are measured with eddy covariance systems. To find the optimum gap-filling method for short vegetation, we review three methods, mean diurnal variation (MDV), look-up tables (LUT), and nonlinear regression (NLR), for estimating missing values of net ecosystem CO2 exchange (NEE) in eddy covariance time series and evaluate their performance for different artificial gap scenarios based on benchmark datasets from marsh and cropland sites in China. The cumulative errors of the three methods showed no consistent bias trend and ranged between −30 and +30 mgCO2 m−2 from May to October at the three sites. To minimize the sum bias, a combined gap-filling method was selected for short vegetation: the NLR or LUT method was applied after the period of rapid plant growth in spring and before the end of plant growth, and the MDV method was applied in the other stages. The sum relative error (SRE) of the optimum method ranged between −2 and +4% for the four gap levels at the three sites, except for the 55% gap level at the soybean site, and the optimum method also clearly reduced the standard deviation of the error.

1. Introduction

The eddy covariance technique for measuring CO2, water, and energy fluxes between the biosphere and the atmosphere is widely used in various regional networks [1]. At present, over 600 tower sites are operating on a long-term and continuous basis around the world, covering different climate conditions and land use and land cover changes, and some of them have been running continuously for more than 10 years (http://fluxnet.ornl.gov/). However, missing or rejected data in these measurements are an unavoidable problem caused by equipment failures (system/sensor breakdown), maintenance and calibration, spikes in the raw data, and physical and biological constraints (e.g., storms, hurricanes, and nonoptimal wind directions) [2]. In general, about 17–50% of the observations of net ecosystem CO2 exchange (NEE) are reported as missing or rejected at FluxNet sites [3]. The gaps in observed data cause at least three problems: (1) difficulty in estimating annual NEE, (2) biased relationships between NEE and climatic variables, and (3) low-quality data for model validation [2].

To accurately calculate annual values of NEE at sites, gap-filling to account for the missing data is imperative. The commonly used methods for filling missing data include mean diurnal variation (MDV) [3], look-up table (LUT) [3], nonlinear regression (NLR) [3–5], marginal distribution sampling [6], multiple imputation model [7], artificial neural network [8–11], and terrestrial biosphere model [12]. This diversity hinders synthesis activities because the biases and uncertainties associated with each technique are unknown [13, 14].

In a comprehensive study, Falge et al. [3] compared three methods (MDV, LUT, and NLR) on the annual sum of NEE for 28 datasets from 18 FluxNet sites and found that the differences in annual NEE estimated by the different gap-filling methods ranged from −45 to 200 gC m−2 per year. Their study also emphasized the importance of standardizing the method during the data postprocessing phase, so that comparable data can be obtained for intercomparisons across different ecosystems, climatic conditions, and multiple years. Richardson and Hollinger [15] used a simple model and data assimilation techniques to quantify the uncertainties in annual NEE that are due both to random measurement error and to gap filling, including the additional uncertainty that can be attributed to long gaps and the relationship between gap length and uncertainty in NEE. Their CO2 flux data came from one coniferous, two deciduous, two mixed-species, and two Mediterranean sites. Moffat et al. [9] reviewed 15 techniques for estimating missing values of NEE in eddy covariance time series and evaluated their performance for different artificial gap scenarios based on a set of 10 benchmark datasets from six forested sites in Europe, which were the same as those used by Richardson and Hollinger [15]. Papale et al. [2] introduced a new standardized set of corrections and assessed the uncertainties associated with these corrections for eight different forest sites in Europe with a total of 12 yearly datasets.

However, most comparisons of gap-filling methods have targeted tall vegetation, that is, forests. Less research has focused on short vegetation, that is, croplands or marshlands. The structure of short vegetation changes more rapidly during the growing season, which may affect the performance of gap-filling methods. It is therefore important to evaluate the performance of gap-filling methods and to identify the optimum methods for short vegetation.

In this study, we reviewed three methods (MDV, LUT, and NLR) and applied the techniques to a set of benchmark datasets from marshland and croplands (rice and soybean) in China. Artificial gaps were added to observed NEE time series based on Falge et al. [3], and the ability of different gap-filling techniques to replicate the missing data was evaluated using statistical analysis. The objective of this paper is to find the optimum method for short vegetation.

2. Methods

2.1. Data Basis

For this analysis, we used half-hourly eddy flux measurements of the net ecosystem exchange of CO2 from three different ecosystem types. As case studies, we chose CO2 flux data from May to October 2005 from marshland and agricultural (rice and soybean cropland) sites in the Sanjiang Plain. The marshland site is located at 47°35′N, 133°31′E, and its field area is approximately 105 ha. The rice and soybean sites are located approximately 1.5 km west and 500 m north of the marshland site, respectively. The field areas are on the order of thousands of hectares for the rice site and 25 ha for the soybean site. The altitude is 55.4–57.9 m. More detailed information is available in Zhao et al. [16].

The EC system consisted of a triaxial sonic anemometer (CSAT3, Campbell Scientific, USA) and a fast-response open-path CO2/H2O infrared gas analyzer (LI-7500, LI-COR Inc., USA). Meteorological parameters, including air humidity, air temperature, wind speed, precipitation, soil temperature, and water content, were also measured [16]. Raw data acquired at 10 Hz were postprocessed with spike removal, frequency response correction [17], sonic virtual temperature correction [18], planar fit coordinate rotation [19], and correction for density fluctuations (WPL correction) [20].

The quality control of the half-hourly flux data was carried out as follows: (i) data from periods of sensor malfunction were rejected (e.g., when there was a faulty diagnostic signal), (ii) data within 1 h before or after precipitation were rejected, (iii) incomplete 30 min data were rejected when the missing data constituted more than 3% of the 30 min raw record, and (iv) data were rejected when the value fell outside the mean ± 3 standard deviations. Information on the original gaps in the NEE measurements is shown in Table 1. The overall gap percentages were 25.4%, 18.2%, and 20.5% at the marsh, rice, and soybean sites, respectively. Gap percentages at nighttime (ranging from 26.7% to 36.2%) were higher than those at daytime (ranging from 12.4% to 18.4%) (Table 1).

For this comparison, four artificial datasets were created, containing 35%, 45%, 55%, and 65% of gaps [3]. A set of random numbers was generated with the random function RAND, one for each NEE measurement outside the original gaps. The number of additional gaps required was the difference between the target number of gaps and the number of original gaps; that many records were selected according to their random numbers, and new gaps were created by deleting the corresponding NEE values. Starting from the original gap percentage, artificial gaps were created separately for daytime and nighttime, until the dataset contained the given percentage of gaps at both daytime and nighttime [3]. To avoid underestimation of CO2 flux during calm conditions at night, a friction velocity ($u_*$) threshold was applied at nighttime [21, 22]. Nighttime data were rejected when $u_*$ was below 0.10 m s−1. Because the $u_*$ filter removed about 10% of the data at the three sites, applying it to the original data would have produced a high percentage of original gaps at nighttime. The $u_*$ correction was therefore applied to the artificial datasets instead of the original data. The percentage of filtered data ranged from 2.1% to 13.5% across the artificial datasets (Table 2).
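The gap-generation procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `add_artificial_gaps` and `apply_ustar_filter`, the use of `None` to mark a gap, and the toy series are all assumptions made for the example.

```python
import random

def add_artificial_gaps(nee, is_day, target_fraction, seed=0):
    """Return a copy of `nee` with extra gaps (None) added at random.

    Gaps are added separately for daytime and nighttime records until each
    period contains `target_fraction` of gaps (original gaps included).
    """
    rng = random.Random(seed)
    gappy = list(nee)
    for period in (True, False):  # True = daytime, False = nighttime
        records = [i for i, day in enumerate(is_day) if day == period]
        valid = [i for i in records if gappy[i] is not None]
        n_original = len(records) - len(valid)
        # Additional gaps needed on top of the original ones.
        n_new = round(target_fraction * len(records)) - n_original
        n_new = max(0, min(n_new, len(valid)))
        for i in rng.sample(valid, n_new):
            gappy[i] = None
    return gappy

def apply_ustar_filter(nee, ustar, is_day, threshold=0.10):
    """Reject nighttime records whose friction velocity is below `threshold`."""
    return [None if (not day and u is not None and u < threshold) else v
            for v, u, day in zip(nee, ustar, is_day)]

# Toy series (hypothetical): 200 half-hours, every 10th record is an original gap.
nee = [None if i % 10 == 0 else float(i % 7) for i in range(200)]
is_day = [12 <= i % 48 < 36 for i in range(200)]
gappy = add_artificial_gaps(nee, is_day, target_fraction=0.35)
```

Original gaps are never refilled, and the observed values hidden by the new gaps remain available in `nee` for validating each filling method.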

After introducing artificial gaps for each of the four datasets, the respective gap-filling methods were parameterized with the remaining data and applied to fill the artificial datasets. The gap-filling error was calculated using the observed fluxes in these artificial gaps to validate the predictions of each filling technique.

2.2. Filling Methods

Three gap-filling methods were applied here, including mean diurnal variation (MDV), look-up tables (LUT), and nonlinear regression (NLR) methods.

2.2.1. Mean Diurnal Variation

MDV is an interpolation technique where the missing NEE value for a certain time period (half-hour) is replaced with the averaged value of the adjacent days at exactly that time of day. Data windows of 7 days during daytime and 14 days during nighttime were chosen for averaging in the application.
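As a rough illustration of the MDV idea, the sketch below fills each gap with the mean of the valid values recorded at the same time of day on neighbouring days. It assumes that the window is centred on the gap and that `window_days` is the total window length; the text does not specify either point, and the function name `fill_mdv` is hypothetical.

```python
def fill_mdv(nee, steps_per_day=48, window_days=7):
    """Fill gaps (None) with the mean of same-time-of-day values nearby.

    Assumption: the window is centred on the gap, so `window_days // 2`
    days on either side are searched.
    """
    half = window_days // 2
    filled = list(nee)
    for i, value in enumerate(nee):
        if value is not None:
            continue
        same_time = []
        for offset in range(-half, half + 1):
            j = i + offset * steps_per_day
            if offset != 0 and 0 <= j < len(nee) and nee[j] is not None:
                same_time.append(nee[j])
        if same_time:  # leave the gap open if no neighbour is available
            filled[i] = sum(same_time) / len(same_time)
    return filled

# Toy example: 4 records per day over 5 days; value = time slot + day number.
series = [float(slot + day) for day in range(5) for slot in range(4)]
series[9] = None  # day 2, slot 1 (true value 3.0)
filled = fill_mdv(series, steps_per_day=4, window_days=5)
```

For half-hourly data the function could be called once with `window_days=7` for daytime records and once with `window_days=14` for nighttime records, following the window lengths given above.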

2.2.2. Look-Up Tables

In a look-up table, the NEE data are binned by variables such as light and temperature that represent similar meteorological conditions, so that a missing NEE value with similar meteorological conditions can be “looked up” [3]. Tables were created to represent changing environmental conditions for monthly periods, sorted by photosynthetic photon flux density (PPFD) and air temperature (Ta) during the day and by relative humidity (RH) and Ta during the night. For the daytime look-up tables, the average NEE was compiled for six monthly periods × 11 PPFD-classes × 36 Ta-classes. The PPFD-classes consisted of 200 μmol m−2 s−1 intervals from 0 to 2000 μmol m−2 s−1. Similarly, Ta-classes were defined through 1°C intervals ranging from −5°C to 31°C. For nighttime, the average NEE was compiled for six monthly periods × 8 RH-classes × 19 Ta-classes. RH-classes ranged from 20% to 100% with 10% intervals, and Ta-classes were defined as for daytime.
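A daytime look-up table of this kind can be sketched with a dictionary keyed by (month, PPFD class, Ta class). The code below is only an illustration under assumed names (`build_lut`, `look_up`) and made-up records; the class widths follow the 200 μmol m−2 s−1 and 1°C intervals stated above, and a nighttime table could be built the same way with RH in place of PPFD.

```python
from collections import defaultdict

def build_lut(records, ppfd_step=200.0, ta_step=1.0):
    """Average NEE per (month, PPFD class, Ta class) bin.

    `records` is an iterable of (month, ppfd, ta, nee); nee may be None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, ppfd, ta, nee in records:
        if nee is None:
            continue  # gaps do not contribute to the table
        key = (month, int(ppfd // ppfd_step), int(ta // ta_step))
        sums[key] += nee
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def look_up(lut, month, ppfd, ta, ppfd_step=200.0, ta_step=1.0):
    """Return the binned mean NEE, or None if the bin holds no data."""
    key = (month, int(ppfd // ppfd_step), int(ta // ta_step))
    return lut.get(key)

# Hypothetical July records: (month, PPFD, Ta, NEE).
records = [
    (7, 850.0, 22.3, -0.60),
    (7, 900.0, 22.8, -0.80),
    (7, 150.0, 18.1, -0.10),
    (7, 880.0, 22.5, None),  # the gap to be filled
]
lut = build_lut(records)
estimate = look_up(lut, 7, 880.0, 22.5)
```

Here the gap falls into the same bin as the first two records, so the estimate is their mean; an empty bin returns `None`, in which case another method would be needed.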

2.2.3. Nonlinear Regression Methods

The nonlinear regressions are based on parameterized nonlinear equations which express (semi-)empirical relationships between the CO2 flux and environmental variables such as temperature and light.

For filling daytime gaps, the light response function of Michaelis–Menten [3, 23] was selected as follows:

$$NEE = -\frac{\alpha \cdot PPFD \cdot GPP_{max}}{\alpha \cdot PPFD + GPP_{max}} + R_e$$

where NEE is the net ecosystem exchange (mgCO2 m−2 s−1) and $R_e$ is the ecosystem respiration rate (mgCO2 m−2 s−1) during the day. PPFD is the photosynthetic photon flux density (μmol m−2 s−1), and $\alpha$ is the ecosystem quantum yield (mgCO2 μmol−1 quantum). $GPP_{max}$ is the gross primary productivity at “saturating” light (mgCO2 m−2 s−1). The light response function was fitted with window sizes of 15 days from June to mid-September, and the seasonal variation of the parameters is shown in Figure 1. The parameters in Figure 1 were calculated from the original NEE datasets before the artificial gaps were introduced. The $GPP_{max}$ at the rice and soybean sites ranged from 0.17 to 2.0 mgCO2 m−2 s−1, which was larger than that at the marsh site (from 0.04 to 0.7 mgCO2 m−2 s−1). The $R_e$ ranged from 0.05 to 0.37 mgCO2 m−2 s−1 at all three sites, which was lower than the $GPP_{max}$. The correlation coefficients between observed and simulated values during this period were 0.69, 0.83, and 0.81 at the marsh, rice, and soybean sites, respectively. For each artificial dataset, the parameters of the Michaelis–Menten function were recalculated and applied to the artificial gaps.
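To make the light response concrete, the sketch below evaluates a Michaelis–Menten function of the form NEE = Re − α·PPFD·GPPmax/(α·PPFD + GPPmax) and uses it to fill daytime gaps. The sign convention (uptake negative), the function names, and the parameter values are assumptions for illustration only; in practice the parameters would be fitted to the remaining data in each 15-day window.

```python
def nee_light_response(ppfd, alpha, gpp_max, re_day):
    """Michaelis-Menten light response (assumed sign: uptake is negative)."""
    uptake = alpha * ppfd * gpp_max / (alpha * ppfd + gpp_max)
    return re_day - uptake

def fill_daytime(nee, ppfd, alpha, gpp_max, re_day):
    """Replace gaps (None) with the modelled NEE at the measured PPFD."""
    return [nee_light_response(q, alpha, gpp_max, re_day) if v is None else v
            for v, q in zip(nee, ppfd)]

# Hypothetical parameters, not fitted values from the study.
alpha, gpp_max, re_day = 0.002, 1.0, 0.2
dark = nee_light_response(0.0, alpha, gpp_max, re_day)       # only respiration
bright = nee_light_response(2000.0, alpha, gpp_max, re_day)  # near saturation
filled = fill_daytime([-0.3, None], [500.0, 2000.0], alpha, gpp_max, re_day)
```

In darkness the function reduces to the respiration term, and at high light it approaches `re_day - gpp_max`, which is why the parameter is described as productivity at "saturating" light.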

The net ecosystem CO2 exchange (NEE) at nighttime represents the ecosystem respiration ($R_{eco}$) because there is no photosynthesis. $R_{eco}$ is conceptualized as the sum of soil respiration, $R_s$, and an above-ground component attributed to respiration by the various plant components, $R_p$. For nighttime NEE, the temperature response function was selected based on Wohlfahrt et al. [24]:

$$R_{eco} = R_s + R_p = R_{ref,s} \exp\left[\frac{E_s}{R\,T_{ref}}\left(1 - \frac{T_{ref}}{T_a}\right)\right] + LAI \cdot R_{ref,p} \exp\left[\frac{E_p}{R\,T_{ref}}\left(1 - \frac{T_{ref}}{T_a}\right)\right]$$

where $R_{eco}$ is ecosystem respiration at nighttime (mgCO2 m−2 s−1), which includes soil respiration ($R_s$) and plant respiration ($R_p$); $R_{ref}$ is the respiration rate (mgCO2 m−2 s−1) at the reference temperature $T_{ref}$ of 10°C; $E$ denotes an activation energy (J mol−1); the subscripts $s$ and $p$ refer to the soil and plant components, respectively; $R$ is the universal gas constant, 8.314 J mol−1 K−1; $T_a$ is air temperature (°C, converted to K in the exponential terms); and $LAI$ denotes leaf area index. The respiration function was fitted for the whole growing season, and its parameters are shown in Table 3. The parameters in Table 3 were calculated from the original NEE datasets before the artificial gaps were introduced. For each artificial dataset, the parameters of the temperature response function were recalculated and applied to the artificial gaps.
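The nighttime model can be sketched as the sum of two Arrhenius-type terms, one for soil and one for plants scaled by LAI. The exact functional form is an assumption here (a standard Arrhenius expression referenced to 10°C, with temperatures converted to kelvin), and all parameter values in the example are hypothetical rather than taken from Table 3.

```python
import math

GAS_CONSTANT = 8.314     # J mol-1 K-1
T_REF_K = 10.0 + 273.15  # reference temperature of 10 degC in kelvin

def arrhenius(r_ref, activation_energy, ta_celsius):
    """Respiration at air temperature `ta_celsius`, given the rate at 10 degC."""
    t_k = ta_celsius + 273.15
    exponent = activation_energy / (GAS_CONSTANT * T_REF_K) * (1.0 - T_REF_K / t_k)
    return r_ref * math.exp(exponent)

def night_respiration(ta_celsius, lai, r_ref_s, e_s, r_ref_p, e_p):
    """Assumed form: soil term plus LAI-scaled plant term."""
    soil = arrhenius(r_ref_s, e_s, ta_celsius)
    plant = lai * arrhenius(r_ref_p, e_p, ta_celsius)
    return soil + plant

# Hypothetical parameters for illustration only.
at_ref = night_respiration(10.0, 2.0, r_ref_s=0.05, e_s=60000.0,
                           r_ref_p=0.02, e_p=50000.0)
warmer = night_respiration(20.0, 2.0, r_ref_s=0.05, e_s=60000.0,
                           r_ref_p=0.02, e_p=50000.0)
```

At the reference temperature both exponentials equal one, so the result is simply the soil reference rate plus LAI times the plant reference rate; warmer air gives a higher respiration estimate.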

2.3. Error Assessment

To assess the applicability of a standard data filling method at the three sites, we examined the potential bias error associated with each method. The bias errors for the different methods were calculated as the observed value minus the predicted value for each gap level. For daytime carbon uptake, a positive error therefore indicates an overestimation and a negative error an underestimation by the respective method.

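The statistical measures were calculated from the individual observed NEE data ($O_i$) and the predicted values ($P_i$) over the $N$ artificial gaps. The mean bias error (MBE), mean absolute error (MAE), and sum relative error (SRE) were calculated as follows:

$$MBE = \frac{1}{N}\sum_{i=1}^{N}\left(O_i - P_i\right)$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|O_i - P_i\right|$$

$$SRE = \frac{\sum_{i=1}^{N}\left(O_i - P_i\right)}{\sum_{i=1}^{N} O_i} \times 100\%$$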
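The three statistics can be computed directly from paired observed and predicted values. In the sketch below, MBE and MAE follow the observed-minus-predicted convention stated in this section, while the SRE definition (sum of errors divided by the sum of observations, in percent) is an assumption, as is the function name `error_statistics`.

```python
def error_statistics(observed, predicted):
    """Return (MBE, MAE, SRE) for paired observed and predicted NEE values.

    Errors are observed minus predicted. SRE is assumed to be the summed
    error relative to the summed observations, expressed in percent.
    """
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mbe = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    sre = 100.0 * sum(diffs) / sum(observed)
    return mbe, mae, sre

# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [0.5, 2.5, 2.0]
mbe, mae, sre = error_statistics(observed, predicted)
```

Because positive and negative errors cancel in MBE and SRE but not in MAE, a method can show a small cumulative bias while still having large half-hourly errors.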

3. Results

3.1. Frequency of Gaps

The gap distribution for the benchmark sets showed that the number of gaps decreased with gap length (Figure 2). However, the majority of gaps in the 35% artificial dataset were short (less than 10 half-hours), and very short gaps (less than 2 half-hours) were more frequent than in the other three benchmark sets. Although the number of long gaps (more than 20 half-hours) in the 65% artificial dataset was similar to that in the benchmark sets, the numbers of short and medium gaps were higher.

3.2. Error Analysis in Half-Hourly Scale

At the half-hourly scale, the frequency distribution of the gap-filling error was close to normal (Figure 3), which indicates an unbiased estimate. The mean and standard deviation of the bias error for each gap-filling method at the three sites are shown in Table 4. The count for nighttime data was lower than that for daytime data because of the $u_*$ correction. The sign of the error was not consistent across the three methods or the four gap levels. MBE showed no tendency to increase with gap percentage during either daytime or nighttime at the three sites; on the contrary, the error was sometimes small at a high gap level, for example, for 65% gaps filled with the MDV and LUT methods at the marsh site at daytime and for 65% gaps filled with all three methods at the rice site at nighttime. For daytime data, the standard deviation was largest for the LUT method and lowest for the NLR method at each gap level. For nighttime data, the standard deviation was largest for the MDV method, especially at the soybean site.

3.3. Seasonal Variation of Error

The seasonal patterns of daily MBE, taking the 65% gap level as an example, differed among the sites (Figure 4). These patterns were affected by the method and the growth stage. In general, all methods performed well before the germination or sowing stage and after complete wilting or harvesting; in these stages, the daily MBE for each method was around zero, and the MDV method showed less fluctuation. The daily MBE was large at the peak of the growing season because of the strong CO2 assimilation capacity, and spikes often occurred for the MDV and LUT methods in this stage at the three sites. This result agrees with the large standard deviations for MDV and LUT in Table 4. The methods differed significantly in the fast growth stage of spring (when LAI increased rapidly), that is, late May at the marshland site, early June at the rice site, and mid-June at the soybean site (Figure 4). However, the MDV method performed well in this stage.

The variation of the cumulative error is shown in Figure 5, taking the 65% gap level as an example. The cumulative error fluctuated more strongly at daytime than at nighttime at the three sites, as a result of the smaller amount of data and the smaller errors at nighttime. The cumulative error for the three methods showed no consistent bias at the marsh site (Figure 5). However, positive bias errors were observed for the three methods during the day at the rice site, and negative bias errors were observed at the soybean site. This suggests that the methods may have complicated, site-dependent effects. A large bias error for the LUT method was observed at each site, especially in the rapid growth stage in spring. A large bias error for the NLR method was also observed after August at the soybean site, whereas this did not occur for 35% and 45% gaps. The cumulative error from May to October indicates that the MDV method performed well, with an especially smooth trend at the end of the growing season at the three sites. Overall, the cumulative error at the three sites ranged between −30 and +30 mgCO2 m−2.

The SRE is shown in Table 5; it provides a convenient way to evaluate the performance of gap-filling methods and to compare it with other sites. In general, the SRE for 35% and 45% artificial gaps filled by the three methods was smaller than that for 55% and 65% gaps at daytime, whereas this pattern was not evident at nighttime. The gap-filling methods performed distinctly differently at the three sites; for example, the MDV method showed a small SRE at daytime at the rice site, while the LUT and NLR methods performed well at daytime at the marsh site. The majority of SRE values ranged from −10 to 10% over the whole day, except for 55% and 65% artificial gaps filled by the NLR method at the soybean site, which was caused by the large bias after August (shown in Figure 5).

3.4. Error Analysis in Gap Size Class

Gap size and distribution were generated at random, yet they greatly affected the performance of the gap-filling methods. The colored surface plots in Figure 6, with 65% gaps taken as an example, provide a visual means of qualitatively assessing the impact of gap length on NEE uncertainty. For short vegetation, MAE was small in the dormant season (early spring and late autumn), regardless of method and gap length. This is related to the fact that measured fluxes at this stage tended to be smaller. Large errors for the three methods were concentrated in gaps with a length of less than 5 during the growing season, especially in the stage of rapid plant growth. Among the methods, the LUT method resulted in the largest error for short to long gaps, followed by the MDV and NLR methods. Although the patterns did not show that MAE increased with gap length for all methods, long gaps added appreciably to the uncertainty of gap filling (results not shown).

3.5. Optimum Gap Filling Method for Short Vegetation

Selection of methods was based on the most stable performance and the smallest errors; however, according to the above analysis, no single method performed best throughout the measurement period. To minimize the sum bias, a combined gap-filling method was selected for short vegetation: the NLR or LUT method was used after the period of rapid plant growth in spring and before the end of plant growth, and the MDV method was used in the other stages. Based on the growth stages of the different vegetation types, the gap-filling strategies for the three ecosystems are shown in Table 6. With the optimum method, the SRE was reduced to between −2 and +4% for the four gap levels at the three sites (Figure 7), except for 55% gaps at the soybean site. The optimum method also reduced the standard deviation of the error, which was around 0.07, 0.11, and 0.12 mgCO2 m−2 s−1 at the marsh, rice, and soybean sites, respectively, with no significant differences among the four gap levels.
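The combined strategy amounts to choosing a method by growth stage. The sketch below expresses that rule as stated in the Abstract and Conclusion (a regression-type method between the end of rapid spring growth and the end of plant growth, MDV otherwise). The function name and the day-of-year boundaries are hypothetical; the actual stage dates for each site are those of Table 6, which is not reproduced here.

```python
def choose_method(day_of_year, rapid_growth_end, growth_end, regression="NLR"):
    """Pick a gap-filling method from the growth stage.

    `rapid_growth_end` and `growth_end` are site-specific days of year
    (hypothetical inputs here). `regression` may be "NLR" or "LUT".
    """
    if rapid_growth_end < day_of_year < growth_end:
        return regression
    return "MDV"

# Hypothetical stage boundaries for one site.
rapid_growth_end, growth_end = 160, 260
schedule = {doy: choose_method(doy, rapid_growth_end, growth_end)
            for doy in (130, 200, 280)}
```

Each gap would then be filled by the method returned for its date, so that MDV covers the periods of rapid structural change and low fluxes while NLR or LUT covers the main growing season.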

4. Discussion

4.1. The Response of Error on Environmental and Biological Factors

The performance of gap-filling methods was affected by climatic and biological variables such as PPFD and LAI [6, 23]. The LUT and NLR methods both account for the effect of PPFD. The residual error of the NLR method was distributed evenly around zero across the PPFD range with small magnitude (Figure 8), whereas the residual error of the LUT method scattered strongly around zero. This explains the high standard deviation of the LUT method at daytime (Table 4). The residual error of the MDV method showed a positive bias when PPFD was less than 500 μmol m−2 s−1.

The residual errors of the NLR method were distributed evenly around zero across the LAI range, whereas those of the MDV and LUT methods were more scattered; moreover, a significant negative error occurred around LAI = 1 (Figure 8). The large scatter for the MDV method arose because the method does not consider LAI when filling gaps. Although the LUT method filled gaps per half month, the relationship between LAI and NEE was weak, especially when LAI = 1 (Figure 8). This result is to be expected from potential changes in the ecosystem properties, particularly those related to canopy development and senescence [6, 25].

4.2. The Selection of Gap-Filling Methods for Short Vegetations

In this study of short vegetation, the error introduced by gap filling differed between methods at different gap levels (Table 4). The choice of a technique should be based on the application. Moffat et al. [9] considered that the NLR method can serve well for an annual sum estimate, whereas an artificial neural network best reproduces the half-hourly profile of the flux. Falge et al. [3] also commended semiempirical methods because they preserve the response of NEE to the main meteorological conditions. However, the NLR method in our study performed well for the variation of daily NEE (Figure 4) but caused a large bias in cumulative NEE, especially at high gap levels (Figure 5). This can be explained by the large uncertainty introduced when few data were available for fitting the nonlinear function.

The MDV method had a large error in half-hourly NEE (Figure 4) but performed consistently and reliably for the NEE sum (Figure 5). The MDV method does not make use of the ancillary meteorological data and can be expected to have additional problems filling gaps of more than 3–7 days in length, as synoptic changes in weather are strongly linked to changes in the diurnal cycles of photosynthesis and respiration [1, 9]. Therefore, to reduce the error in both half-hourly and annual NEE, the combined method of MDV and NLR was selected in our study (Figure 7) and performed well for short vegetation.

The methods caused a large bias during periods of active change in ecosystem properties (Figure 4), because when the flux data are missing, it is impossible to know the timing or magnitude of the change [3, 15]. The magnitudes of NEE for short vegetation, such as marsh or cropland, were smaller than those for forests; for soybean cropland in particular, high GPP and high Re resulted in low NEE during the growing season. Thus even a small error may cause a large bias in cumulative NEE, and underestimating NEE or overestimating Re may turn an apparent carbon sink into a carbon source. In this study, the optimum gap-filling method partly resolved this problem.

5. Conclusion

The three major gap-filling methods (mean diurnal variation, look-up table, and nonlinear regression) for estimating net carbon fluxes (NEE) were reviewed, and their gap-filling performance was evaluated based on datasets from three short-vegetation sites (marsh, rice, and soybean). The performance of the filling techniques depended on the time scale, gap length, and time of day (day or night). At the half-hourly scale, the standard deviation for the NLR method was the smallest among the three methods for each gap level. The MDV method performed well at the seasonal scale, especially before germination or sowing and after complete wilting or harvesting. Although the LUT and NLR methods showed a small daily mean error during the peak of the growing season, a large bias was observed in cumulative NEE for both methods. A combined gap-filling method was therefore used for short vegetation, in which the NLR or LUT method was applied after the period of rapid plant growth in spring and before the end of plant growth, and the MDV method was applied in the other stages. This combined method distinctly reduced the sum bias and the deviation of gap-filled NEE.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (no. 41471022).