#### Abstract

A heavy rainfall case on July 5, 2007, is studied with the Backward Four-Dimensional Variational Data Assimilation (Backward-4DVar) system of the Advanced Regional Eta-coordinate Model (AREM), which is capable of assimilating radio occultation data, using GPS radio occultation (GPS RO) data and routine GTS data. The case study indicates that assimilating radio occultation data after quality control brings the analysis closer to the observations and thereby improves the 24-hour rainfall forecast. Batch tests over 119 days from May to August during the 2009 flood season show that assimilating GPS RO data alone improves both 24-hour and 48-hour regional rainfall forecasts, with a better B score for the 24-hour forecasts and a better TS score for the 48-hour forecasts. When radio occultation refractivity data are assimilated together with conventional radiosonde data, the refractivity data improve the 48-hour forecasts of light rain and heavy rain.

#### 1. Introduction

After decades of development, the accuracy of numerical weather prediction has greatly improved. Two factors have contributed most: first, the improvement of the dynamic framework of the numerical models and the refinement of their physical processes; second, the rapid development of observation technologies and the application of data assimilation. The assimilation of unconventional observation data, such as satellite remote sensing data, plays an important role in improving forecast accuracy, especially in the Southern Hemisphere. Therefore, obtaining more detailed information on the atmospheric state from various observation methods, and developing advanced assimilation methods that use all of this information effectively to improve the initial conditions, is a critical way to improve the accuracy of numerical weather prediction at this stage.

Among the various new observation methods, occultation detection based on the global positioning system (GPS) and small-satellite technology is a new way of obtaining atmospheric information. Compared with conventional observations and other satellite data, occultation data have the advantages of high vertical resolution, uniform global coverage, and weak sensitivity to aerosols, clouds, and precipitation [1]. In theory, assimilating these data can improve the vertical distribution of the physical fields, especially temperature and humidity near the observations, and thus improve the quality of the initial analysis to some extent.

In studies of GPS RO data assimilation, refractivity data are simpler and more feasible for routine assimilation because their observation operators are simple and computationally cheap. Previously, Zou et al. [2] and Kuo et al. [3] performed observing system tests in a 4DVar assimilation system; Kursinski et al. [4] and Poli et al. [5] performed assimilation experiments on local refractivity data in a one-dimensional variational assimilation framework. Huang et al. [6] used GPS RO refractivity data to test typhoon predictions based on the WRF 3DVar assimilation system. Their results showed that the use of GPS RO refractivity data has a positive effect on the simulation of typhoon precipitation.

This article introduces a quality control scheme for GPS RO refractivity data in the AREM-B4DVar system and evaluates the role of GPS RO refractivity data assimilation in regional numerical weather prediction. Through an actual rainstorm case and batch experiments during the flood season, it explores an effective way of using GPS RO refractivity data to improve the forecasting ability of regional numerical forecast models. The aim is to provide a theoretical basis and technical support for the development of occultation data assimilation methods in regional numerical forecast models and in short-range weather forecasting.

#### 2. AREM-B4DVar Data Assimilation System and Observational Operators of GPS RO Data

##### 2.1. Introduction of the AREM-B4DVar Data Assimilation System

Four-dimensional variational data assimilation (4DVar) is one of the most promising methods for providing optimal analyses for numerical weather prediction, and it is the preferred development path of most numerical weather prediction centres in the world. In principle, subject to the dynamic and physical constraints of the numerical model, the variational method finds the best fit to all observations within the assimilation time window and yields the optimal analysis at the beginning of that window. Under the model constraint, the trajectory of the analysis within the window is consistent with the observed evolution, which improves the accuracy of the subsequent forecast.

Developing an operational 4DVar system is a very large undertaking. It requires not only the corresponding tangent-linear and adjoint models but also rigorous correctness and accuracy tests. Together with the “on-off” problem in complex physical processes [7, 8], these requirements have become a bottleneck that hampers the development of 4DVar and limits its wide application in operational numerical models. Many scholars have proposed effective solutions to this problem. Among them, Wang and Zhao [9] proposed the 3D mapping variation method (3DVM), which places the initial value of the assimilation at the end of the assimilation window; its use of mapped observations avoids the need for an adjoint model. Wang et al. [10] proposed a dimension-reduced projection four-dimensional variational assimilation method (DRP-4DVar) based on historical sample fitting and dimensionality reduction projection techniques. Selecting perturbed samples that depend on the analysis time solves the problem that the background error covariance matrix is not explicitly evolved in 4DVar. Recently, Wang et al. [11] combined the advantages of the 3DVM and DRP-4DVar to propose a backward mapping four-dimensional variational assimilation method (Backward-4DVar, referred to as B-4DVar) and established the B-4DVar system for the AREM model (termed the AREM-B4DVar system). This method not only avoids tangent-linear and adjoint models but also reduces the computational cost over the assimilation window. Because the initial value is generated at the end of the assimilation window, it also reduces the accumulation of prediction error throughout the window, which plays an important role in short-term forecasting and nowcasting; this has been verified in observing system experiments.

The B-4DVar problem reduces to minimizing the cost function in the *m*-dimensional sample space (where *m* is the number of samples) [11]; the classic 4DVar problem, which is defined in the high-dimensional control variable space, is thus solved in the *m*-dimensional sample space.

The solution to the minimization problem mentioned above can be expressed as

Because the matrix to be inverted in the above expression has dimension *m*, which is small (under 100), the inverse is relatively easy to calculate. To mitigate the underestimation of the B matrix and spurious teleconnections between grid-point variables and observed variables, the B matrix is expanded so that the above optimized solution is localized [10, 11].
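As an illustration of why only an *m* × *m* inverse is needed, a generic sample-space formulation can be written as follows. This is a sketch under stated assumptions, not necessarily the expression used in [11]: the symbols $P_x$ (model-state perturbation samples), $P_y$ (the same samples mapped to observation space), $y'$ (the innovation), $R$ (the observation error covariance), and the approximation $B \approx P_x P_x^{\mathrm{T}}/(m-1)$ are all introduced here for illustration, and localization is omitted.

```latex
% Analysis increment expressed in the m-dimensional sample space
x' = P_x \alpha, \qquad \alpha \in \mathbb{R}^{m}

% Cost function in the sample space, assuming B \approx P_x P_x^{T} / (m-1)
J(\alpha) = \frac{m-1}{2}\,\alpha^{\mathrm{T}}\alpha
          + \frac{1}{2}\left(P_y\alpha - y'\right)^{\mathrm{T}} R^{-1} \left(P_y\alpha - y'\right)

% Setting \nabla_{\alpha} J = 0 gives a linear system with an m x m matrix
\alpha^{*} = \left[(m-1)\,I_m + P_y^{\mathrm{T}} R^{-1} P_y\right]^{-1} P_y^{\mathrm{T}} R^{-1}\, y'
```

In this form the only matrix that has to be inverted is $(m-1)\,I_m + P_y^{\mathrm{T}} R^{-1} P_y$, whose size is set by the number of samples rather than by the dimension of the model state.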

The AREM-B4DVar system is based on AREM (version 2.4.0) and the Backward-4DVar method (Wang et al. [11]). Cheng et al. [12] used this system to establish local and nonlocal observation operators for GPS RO refractivity data. Assimilation experiments with the different observation operators were carried out, and the positive contribution of the nonlocal observation operator to heavy rain forecasts was verified. The assimilation system uses the model surfaces of AREM as its coordinate surfaces, and the assimilation control variables include the forecasted temperature, zonal wind, meridional wind, specific humidity, surface pressure, and geopotential height. The assimilated observation data include conventional surface and upper-air observations. The large-scale background field is taken from global medium-range numerical forecast products.

##### 2.2. The Observation Operator of the GPS RO Refractivity Data

In GPS RO data, refractivity is usually provided as an atmospheric observation product, and the local observation operator links it to the control variables through *p*, *T*, and *q*, where *p* represents the air pressure (hPa), *T* represents the air temperature (K), and *q* represents the specific humidity. Because the refractivity data are retrieved under the assumption of spherical symmetry, gradient information along the ray is not considered; the accuracy of the local operator is therefore lower, especially in the vicinity of severe weather. For this reason, nonlocal operators are also considered, which in theory can partially compensate for the inadequacy of local observation operators.
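A minimal sketch of a local refractivity operator is given below, assuming the widely used two-term Smith-Weintraub relation with constants 77.6 and 3.73 × 10⁵ and a conversion from specific humidity to water vapour pressure with ε = 0.622. These constants and the function names are assumptions made for illustration; the exact form used in AREM-B4DVar may differ.

```python
EPSILON = 0.622  # ratio of the gas constants of dry air and water vapour (assumed)


def water_vapour_pressure(p_hpa, q):
    """Water vapour partial pressure (hPa) from pressure (hPa) and specific humidity (kg/kg)."""
    return p_hpa * q / (EPSILON + (1.0 - EPSILON) * q)


def local_refractivity(p_hpa, t_k, q):
    """Refractivity N (N-units) at a single point from p (hPa), T (K), and q (kg/kg).

    Assumed two-term Smith-Weintraub form: a dry term proportional to p/T
    plus a moist term proportional to e/T^2.
    """
    e_hpa = water_vapour_pressure(p_hpa, q)
    dry_term = 77.6 * p_hpa / t_k
    moist_term = 3.73e5 * e_hpa / t_k ** 2
    return dry_term + moist_term


if __name__ == "__main__":
    # A warm, moist low-level point versus a cold, dry upper-level point.
    print(local_refractivity(1000.0, 300.0, 0.010))
    print(local_refractivity(300.0, 230.0, 0.0001))
```

Because the operator uses only the values at the observation point, it cannot represent horizontal gradients along the ray path, which is the limitation that nonlocal operators are meant to address.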

#### 3. Case Description and Experimental Design

The case selected for this study is a rainstorm that occurred from July 4 to July 5, 2007, in the Yangtze-Huaihe River Basin in China. The rainstorm developed under a large-scale circulation pattern at 500 hPa in which heights were greater to the south than over East Asia; there was a weak ridge over the Hetao region, and the Yangtze-Huaihe River Basin was influenced by a shear line at 700 hPa. Such a circulation pattern was highly favourable for the formation of convection. As the tropical system around Hainan moved northward, a twisting of the subtropical high occurred, which strengthened the northward propagation of warm air on the western side of the subtropical high. In summary, the rainstorm was caused by the interaction between two air currents.

Forecast experiments were designed for the heavy rainfall that occurred from July 4, 2007, to July 5, 2007, in the Yangtze-Huaihe River Basin in China. They use data from the Constellation Observing System for Meteorology, Ionosphere and Climate (COSMIC), which include GPS RO refractivity data and bending angles, together with routine GTS data. The assimilation time window runs from 18 UTC on July 3, 2007, to 00 UTC on July 4, 2007. The data in the window include the occultation data at each occultation detection time (Figure 1), conventional surface data at 18 UTC and 00 UTC, and upper-air observation data.

The forecast experiments use AREM (version 2.4.0), a limited-area regional numerical weather forecasting model. Its horizontal resolution is 30 km, the model top is at 10 hPa, and the model domain (14°N-51°N, 74°E-136°E) covers China and the surrounding areas. A detailed description of the experiment is given in Table 1, including the time-varying boundary conditions and the explicit cloud physics, which is based on a parameterization of cold cloud precipitation processes [13].

#### 4. A Case Study of the GPS RO Refractivity Data Quality Control Scheme

Because GPS RO data cannot be regarded as uncorrelated and are very dense in the vertical direction (Chen et al. [12, 14]), the resolution, coverage, and density of the data can contribute differently to the analysis and the forecast [15, 16]. Quality control is therefore an essential step before data assimilation. With efficient quality control, data with excessively large observation errors, data for which the observation operator cannot simulate reasonable values, and data that represent small-scale processes unresolved by the model can be removed so that they do not reduce the positive effect of the data assimilation.

Identifying outliers is a common quality control step. It relies on the statistical characteristics and distributions obtained from a large amount of observation data and from the differences between observations and model simulations. In this paper, the standard deviation *S* of the observation *x* is used as the main quality control criterion.
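For reference, the conventional sample standard deviation and a simple rejection rule based on it can be written as below. The paper does not state whether the divisor is $n$ or $n-1$, nor the rejection multiple, so the $n-1$ form and the factor $k$ are assumptions made for illustration.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}, \qquad
\text{reject } x_i \ \text{if} \ \left|x_i - \bar{x}\right| > k\,S
```

Here $x_i$ may be either an observed value or an observation-minus-background difference, and $k$ is a tunable multiple (values of 2 to 3 are common choices in outlier screening).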

The GPS RO refractivity data are derived under the spherical symmetry assumption. The closer to the ground, the less uniformly the water vapor is distributed, so the data errors grow nonlinearly toward the surface. To improve the assimilation and forecasting effect of the GPS RO refractivity data, this paper adopts a simple quality control scheme for refractivity data: (1) exclude observations below 3 km whose O-B errors are too large; (2) because the GPS RO refractivity data have a high vertical resolution (intervals of about 500 m in the troposphere and nearly 1 km from the upper troposphere to the stratosphere) while the model has relatively few vertical levels (usually only about 30), eliminate the redundant observations between two model levels to match the resolution of the model; (3) use the standard deviation of the differences between the observation data and the model simulations to remove outliers.
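The three steps can be sketched for a single profile as follows. This is an illustrative sketch rather than the AREM-B4DVar implementation: the data layout (dictionaries with `z_km`, `o`, and `b`), the relative O-B limit `max_rel_omb`, the multiple `k`, and the choice of keeping the observation nearest to each model level are all assumptions.

```python
from statistics import mean, stdev


def quality_control(profile, model_levels_km, low_km=3.0, max_rel_omb=0.05, k=3.0):
    """Apply a three-step quality control to one refractivity profile.

    profile: list of dicts with height 'z_km', observation 'o', and background 'b'.
    model_levels_km: heights of the model levels (km).
    """
    # Step 1: below low_km, drop observations whose relative O-B is too large.
    kept = [
        ob for ob in profile
        if not (ob["z_km"] < low_km and abs(ob["o"] - ob["b"]) / ob["b"] > max_rel_omb)
    ]

    # Step 2: thin to the model resolution by keeping, for each model level,
    # only the observation closest to that level.
    nearest = {}
    for ob in kept:
        level = min(model_levels_km, key=lambda z: abs(z - ob["z_km"]))
        distance = abs(level - ob["z_km"])
        if level not in nearest or distance < nearest[level][0]:
            nearest[level] = (distance, ob)
    thinned = sorted((ob for _, ob in nearest.values()), key=lambda ob: ob["z_km"])

    # Step 3: remove outliers whose O-B departs from the profile mean
    # by more than k standard deviations.
    if len(thinned) < 3:
        return thinned
    omb = [ob["o"] - ob["b"] for ob in thinned]
    centre, spread = mean(omb), stdev(omb)
    return [ob for ob, d in zip(thinned, omb) if abs(d - centre) <= k * spread]
```

With only a handful of observations per profile, a single outlier can never exceed a large multiple of the sample standard deviation, so in practice the statistics in step 3 would more plausibly be accumulated over many profiles than over one.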

To determine the influence of the quality control scheme on the assimilation effect, four experiments (Table 2) were designed: the control run test (CTRL), the GPS RO refractivity data local observational operator assimilation scheme (REF_NQC), the GTS conventional radiosonde observation data assimilation scheme (STN), and the GPS RO refractivity data local operator plus quality control assimilation scheme (REF_QC).

Comparing the differences between the initial analysis values and the GPS RO refractivity observations before and after quality control shows that the analysis is closer to the observations in the regions where quality-controlled data are used. Near the ground, however, where data were excluded by the quality control and no observational constraint remained, the analysis values were far from the observed values (Figure 2). In terms of the relative deviation between the analysis with quality control and the actual observations, the differences were relatively small in the upper troposphere, and the largest errors were still concentrated in the lower troposphere, which reflects the large errors in the low-level refractivity observations.

The deviation introduced by the quality control does not imply a poorer prediction ability, as can be seen from the difference field of the 24-hour accumulated precipitation forecasts. After quality control, the assimilation increment in the northwest part of the main rain-belt moves southward and strengthens (Figure 3). The assimilation increments after quality control (Figure 3) at 700 hPa and 500 hPa are mainly concentrated to the northeast and to the south of the low-pressure system, respectively. With the increase to the northeast of the low and the weakening to its south, the southward airflow near the rain-belt is enhanced, which shifts the rain-belt southward as a whole, consistent with the actual situation.

#### 5. A Batch Test for Occultation Data Assimilations and Forecasts during the Flood Season in 2009

The batch assimilation/prediction programmes during the flood season are as follows:

(1) Batch tests are performed for 119 days from May 4, 2009, to August 30, 2009.

(2) The observation data used for assimilation are the COSMIC occultation refractivity data and conventional observation data at 6-hour intervals, from 18 UTC to 00 UTC.

(3) Global medium-range numerical forecast products are used as background field data.

(4) The B-4DVar batch assimilation/prediction test schemes include the control run (CTRL), the GTS conventional radiosonde data assimilation test (stn_b4dvar), the GPS RO refractivity data assimilation test (gps_only), and the test that uses GPS RO refractivity data together with GTS conventional radiosonde data (stn_gps_b4dvar).
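The batch design can be laid out as a simple schedule, as in the sketch below. It assumes one assimilation/forecast cycle per day for each scheme, which the text implies but does not state explicitly; the function and variable names are illustrative.

```python
from datetime import date, timedelta

# Experiment names as given in the batch test description.
EXPERIMENTS = ("CTRL", "stn_b4dvar", "gps_only", "stn_gps_b4dvar")


def batch_dates(start, end):
    """Return every calendar day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


days = batch_dates(date(2009, 5, 4), date(2009, 8, 30))
runs = [(day, name) for day in days for name in EXPERIMENTS]

if __name__ == "__main__":
    print(len(days), "days,", len(runs), "runs in total")
```

Counting both end dates, May 4 to August 30, 2009, spans 119 days, which matches the length of the batch test given in item (1).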

All of the observation operators for GPS RO refractivity data assimilation are local observation operators.

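To evaluate the contribution of the assimilation to the model forecast, we use the Threat Score (TS) and the Bias score (B) of the precipitation forecasts.

The formula for computing the TS is

$$\mathrm{TS} = \frac{\text{correct}}{\text{forecast} + \text{observed} - \text{correct}}$$

where, for a specified precipitation threshold, forecast is the number of stations (or the area) at which the threshold was forecast to be reached, observed is the number at which it was observed to be reached, and correct is the number at which it was both forecast and observed. For a perfect forecast, correct = forecast = observed, which yields a TS of 1. The worst possible forecast, with correct = 0, yields a TS of zero.

The basic formula for computing the Bias is

$$B = \frac{\text{forecast}}{\text{observed}}$$

This quantity gauges the accuracy of the areal/station coverage of a specified precipitation threshold amount, regardless of the accuracy in location. An ideal forecast would have forecast = observed, which yields a Bias of 1.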
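The two scores can be computed from paired station values as in the following sketch. It takes TS = correct / (forecast + observed - correct) and Bias = forecast / observed, the conventional definitions that match the limiting cases described above; the function names, the station-list layout, and the use of `>=` for the threshold are assumptions.

```python
def contingency_counts(forecast_mm, observed_mm, threshold_mm):
    """Count stations reaching the threshold in the forecast, the observations, and both."""
    forecast = sum(f >= threshold_mm for f in forecast_mm)
    observed = sum(o >= threshold_mm for o in observed_mm)
    correct = sum(
        f >= threshold_mm and o >= threshold_mm
        for f, o in zip(forecast_mm, observed_mm)
    )
    return correct, forecast, observed


def threat_score(correct, forecast, observed):
    """TS = correct / (forecast + observed - correct); NaN if the event never occurs."""
    denominator = forecast + observed - correct
    return correct / denominator if denominator else float("nan")


def bias_score(forecast, observed):
    """Bias = forecast / observed; NaN if the event was never observed."""
    return forecast / observed if observed else float("nan")


if __name__ == "__main__":
    fcst = [0.0, 12.0, 30.0, 5.0, 26.0]   # 24-hour forecast totals (mm), made up
    obs = [0.0, 11.0, 2.0, 15.0, 40.0]    # matching observed totals (mm), made up
    c, f, o = contingency_counts(fcst, obs, threshold_mm=10.0)
    print(threat_score(c, f, o), bias_score(f, o))
```

A Bias above 1 means the threshold is forecast at more stations than it is observed (over-forecasting), and a Bias below 1 means the opposite, independently of whether the locations match.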

The TS and B scores of the GPS RO refractivity data assimilation test (gps_only) and the control run (CTRL) are compared in Figures 4 and 5. In the 24-hour forecast, the TS score shows some improvement at the light rain, moderate rain, and heavy rain levels, but the score for heavy rain is slightly worse. Excluding heavy rainfall, the B score improved somewhat and was better than that of the reference test. At 48 hours, the improvement in the TS score was more obvious, and the B score was similar to that of the reference test for light rain and moderate rain. In contrast, the B score for heavy rain and rainstorms was larger than that of the reference test.

As a whole, data assimilation using only radio occultation data can make positive improvements to both 24-hour and 48-hour rainfall forecasts; it can obtain a better B score in the 24-hour forecast, and the TS score is better in the 48-hour forecast.

The results of three experiments were compared (Figures 6 and 7): the test that assimilates GPS RO refractivity data together with GTS conventional radiosonde data (stn_gps_b4dvar), the GTS conventional radiosonde data assimilation test (stn_b4dvar), and the control run (CTRL). Compared with CTRL, stn_b4dvar shows some improvement in the TS scores of both the 24-hour and 48-hour forecasts at the light rain, moderate rain, and heavy rain levels. In addition, its B scores are better than those of the control run at all levels (except that the B score is larger for heavy rain in the 48-hour forecast). Comparing stn_gps_b4dvar with stn_b4dvar to isolate the effect of adding GPS RO refractivity data to GTS conventional radiosonde data, the TS score did not improve in the 24-hour forecast, and there was only a slight improvement in the B score. In the 48-hour forecast, however, the TS score improved slightly for light rain and heavy rain; moderate rain and heavy rain were comparable to the reference test, and the corresponding B score was larger.

Overall, when GPS RO refractivity data are assimilated together with GTS conventional radiosonde data, the refractivity data improve the 48-hour forecasts of light rain and heavy rain, but their effect on the 24-hour forecast is less positive.

#### 6. Summary

The experiments above show that the use of GPS RO refractivity data in the AREM-B4DVar data assimilation system can improve the prediction of heavy rain-belts and of regional rainfall intensity. By comparing the various test schemes, the following conclusions are obtained:

(1) Both the case study and the batch tests show that this new method can improve regional rainfall forecasts by using GPS RO refractivity data.

(2) Assimilating GPS RO refractivity data alone improves both 24-hour and 48-hour rainfall forecasts, with better B scores in the 24-hour forecasts and better TS scores in the 48-hour forecasts.

(3) When GPS RO refractivity data are assimilated together with GTS conventional radiosonde data, the refractivity data improve the 48-hour forecasts of light rain and heavy rain, but their effect on the 24-hour forecasts is less positive.

#### Data Availability

The figure of the GPS RO observation locations for the heavy rainfall case on July 3, 2007, was used to support this study and is available at DOI: 10.1360/012012-17. This prior study is cited as [12] within the text.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors are grateful for the GPS RO refractivity data from the COSMIC Data Analysis and Archive Center. This research was financially supported by the National Key Research and Development Program of China (Project No. 2017YFB1002702).