A variety of climate factors influence the precision of the long-term Global Navigation Satellite System (GNSS) monitoring data. To precisely analyze the effect of different climate factors on long-term GNSS monitoring records, this study combines the extended seven-parameter Helmert transformation and a machine learning algorithm named Extreme Gradient boosting (XGboost) to establish a hybrid model. We established a local-scale reference frame called stable Puerto Rico and Virgin Islands reference frame of 2019 (PRVI19) using ten continuously operating long-term GNSS sites located in the rigid portion of the Puerto Rico and Virgin Islands (PRVI) microplate. The stability of PRVI19 is approximately 0.4 mm/year and 0.5 mm/year in the horizontal and vertical directions, respectively. The stable reference frame PRVI19 can avoid the risk of bias due to long-term plate motions when studying localized ground deformation. Furthermore, we applied the XGBoost algorithm to the postprocessed long-term GNSS records and daily climate data to train the model. We quantitatively evaluated the importance of various daily climate factors on the GNSS time series. The results show that wind is the most influential factor with a unit-less index of 0.013. Notably, we used the model with climate and GNSS records to predict the GNSS-derived displacements. The results show that the predicted displacements have a slightly lower root mean square error compared to the fitted results using spline method (prediction: 0.22 versus fitted: 0.31). It indicates that the proposed model considering the climate records has the appropriate predict results for long-term GNSS monitoring.

1. Introduction

Within the various remote sensing technologies, Global Navigation Satellite Systems (GNSS) plays an important role in providing fundamental infrastructure and has been successfully implemented in deformation monitoring. The global GNSS, such as the United States’ Global Positioning System (GPS), Russia’s Global Navigation Satellite System (GLONASS), the European Union’s Galileo, and China’s Beidou Navigation Satellite System (BDS), serve as highly efficient monitoring tools for precise geodetic surveying. Different unions or countries have installed numerous Continuously Operating Reference Stations (CORS) for various monitoring purposes, including the Plate Boundary Observatory (PBO) maintained by National Science Foundation (NSF) EarthScope, the CORS GPS network maintained by the U.S. National Geodetic Survey, and the GPS Earth Observation Network (GEONET) of Japan. More than 506 worldwide permanent GNSS stations are managed by the International GNSS Service (IGS) group as of December 08, 2019. The original RINEX (Receiver Independent Exchange Format) files for the GNSS stations in different CORS networks are free to download through University NAVSTAR Consortium (UNAVCO) or National Geodetic Survey (NGS) data archiving facilities [1, 2]. The GPS, originally NAVSTAR GPS, is a widely used satellite positioning system, and it especially refers to the GNSS owned and operated by the United States. In the United States, the GPS signal has been widely used in the monitoring service since 1989 [3]. Although this study only used GPS signals, GNSS is currently used as an umbrella term for all the aforementioned global satellite positioning systems. For this reason, this study uses the term GNSS.

Understanding the climate variations of GNSS time series are important for monitoring applications. In practice, during the period of half, one, or two years, the long-term monitoring time series have cyclical fluctuation and rebound characteristics triggered by climate influence, which makes the users who do not major in geodesy confuse about the precision of GNSS monitoring. The climate factors, mainly including the rainfall, temperature difference, wind speed, visibility, dew, and humidity, have always been a question among the long-term monitoring GNSS operators and data users. Xu et al. [4] proved that climate change led to the periodical variety of thermal expansion of bedrock (TEB). Dong et al. [5] established the hybrid model to remove the seasonal variations from long-term time series. Yan et al. [6] combined the thermal expansion model with the mass loading model to study the observed annual GNSS height changes. Munekane [7] proposed that the large temperature difference induced the maximum vertical displacements of three millimeters within 24 hours and established a model to move it. The climate variations are recognized as impact factors of the target deformation. Moreover, since the climate includes a variety of parts, the effects due to local climate variations on GNSS observations need to be carefully further studied. Thus, it is necessary to quantitatively analyze the weights of impacts caused by different climate factors on the long-term GNSS monitoring time series.

To achieve the quantitative analysis results of the impact on the GNSS time series caused by different daily climate factors, we need high-precision GNSS records. The accuracy and precision of GNSS observations are impacted by the types of GNSS equipment used and the processing method applied. In general, there are two widely used GNSS data postprocessing approaches: relative positioning and absolute positioning. The relative positioning approach uses the carrier-phase double-difference (DD) method to fix differenced phase ambiguities to integer values between stations and between satellites [8]. The DD method uses simultaneous GNSS records from at least two GNSS stations, and at least one of the GNSS stations should be installed in a stable foundation or known position with respect to a specific reference frame. The absolute positioning approach only needs a single GNSS station to determine the position without using any synchronous data from other stations. This method used undifferenced dual-frequency pseudo-range and carrier-phase observations in addition to precise satellite orbits and clock information to determine the position of a stand-alone GNSS station [9].

Precise Point Positioning (PPP) is a typical GNSS postprocessing absolute positioning approach, which uses a single-receiver phase-ambiguity-fixed resolution to calculate daily original raw data. The precision of the PPP solutions has dramatically increased during the last decade, which primarily attributes to highly precise satellite orbits and clock data provided by the International GNSS Service (IGS) and new algorithms used to resolve phase ambiguity within a single receiver [10]. Moreover, the PPP algorithm has operational simplicity and can provide consistent accuracy [11, 12]. Considering the risk of lacking high-quality reference GNSS station when using the relative DD method, it is convenient to use the PPP method to process the GNSS data that monitor the long-term structural deformation or surface land movements. In this study, the software package GIPSY/OASIS (V6.4) employing the PPP processing strategy was applied [13]. The estimation of wide lane and phase bias was maintained by the Jet Propulsion Laboratory (JPL). When applying the GIPSY/OASIS software to solve daily positions, they are processed within the , , and geocentric Cartesian coordinate systems. The widely accepted global reference named the International Terrestrial Reference Frame (ITRF) is derived from the method of minimizing the overall horizontal movements of selected global permanent stations [14]. Using the GNSS to study the global tectonic movement, the most common global reference frame based on GNSS sites is required. Currently, the latest earth-centered, earth-fixed global reference frame International GNSS service of 2014 (IGS14) is established and aligned to ITRF2014 datum, which was updated from ITRF2008 in January 2017 [15]. However, the reference frame selection is motivated by the purpose of the study. In a continental-scale study area, such as North American, the NA12 was defined by 299 GPS stations and designed to corotate with the stable interior of the North American tectonic plate [16]. Also, another North American Plate fixed reference frame (NAD83 reference frame) was established resulting from requirements of the coordinates for sites located in the Conterminous United States (CONUS), Alaska, and US territories in the Caribbean. However, the high-precision GNSS monitoring time series inevitably include background tectonic motion within the subcentimeter level in the continental scale reference frame [17]. The millimeter-level internal tectonic movements could be easily obscured and biased within an inappropriate reference frame [18]. For this reason, there is a need to establish a stable regional or local reference frame for studies focusing on regional or local scales.

The scale, the orientation, the origin, and the change of these parameters over time are the main physical and mathematical properties of a reference frame. In geodetic applications, a stable local reference frame is primarily transformed from the latest and well-established global reference frame using the Helmert transformation. A group of GNSS reference stations (common points) are used to tie the target regional or local reference frames to a global reference frame (e.g., IGS14). Pearson and Snay [19] used a 14-parameter transformation method to transform coordinates from the global reference frame IGS08 to the regional reference frame NAD83, which is maintained by NGS. Theoretically, the coordinates of a point referred to the IGS14 global reference frame vary when they are transformed to a regional or local reference frame. Nevertheless, deformation monitoring mainly focuses on the deformation rates/displacements of the target study area rather than the coordinates themselves in both reference frames. In general, relative high precision displacements or deformation velocities are needed in different area monitoring applications.

In practice, after the highly precise GNSS records are processed, it is still a challenge to quantitatively analyze the weights of impact from different climate factors on the GNSS time series. However, with computing science development, the machine learning approach achieves a dramatic developing rate and provides a new tool to explore new analysis methods in geodesy and geosciences. The approach can quantitatively analyze the hypothesis and assist to capture high-dimensional data sets [20, 21]. Furthermore, machine learning methods have been widely applied in geosciences. Rouet et al. [22] evaluate the fault movements in the Cascadia subduction zone. Phrampus et al. [23] predict the probability of encountering global SEAFLEA. Anderson and Lucas [24] proposed the climate ensemble method to predict weather and establish various climate models. Ren et al. [25] used the machine learning algorithm to estimate the fault friction with high precision. The machine learning algorithm has the potential to help researchers make further explanations and explore the theories behind the questions in geosciences. Therefore, we applied the ensemble learning method to analyze the weights of impact from various daily climate factors on the GNSS monitoring time series. In this study, we used an appropriate supervised machine learning (SML) algorithm, Extreme Gradient Boosting (XGBoost), to determine the relationship between various daily climate factors and highly precise GNSS data.

In this paper, we proposed a hybrid method to evaluate the impact of daily climate factors on the GNSS time series, using the extended Helmert transformation method and XGBoost algorithm. The model is trained by high-precision GNSS records and various long-term climate records. The contributions of this paper are shown as follows: (1)To remove the background tectonic movements when monitoring local ground deformation, we proposed the extended Helmert transformation to establish the highly stable PRVI19 local reference frame based on ten well-distributed continuously operating GNSS stations with at least five years of data(2)By combining the GNSS records with millimeter accuracy and the local climate data with a span of at least five years, we applied the XGboost machine learning algorithm to derive the quantitative results of the weights of impact from different daily climate factors on the GNSS time series(3)Based on the model, we predicted the GNSS records and validate them with the real raw GNSS data. The results show that the high accuracy of the prediction and it is expected that this study can provide a new prospect to explore the potential deformation monitoring problem

2. Data and Methods

2.1. Coordinate Transformation Using the Extended Helmert Transformation Method

The Helmert transformation, also called a 7-parameter transformation, is used to conduct distortion-free reference frame transformations within a three-dimensional (3D) space in the geodesy area. The 7-parameter approach employs three parameters from translations, three parameters from rotations, and one parameter from the scale at a selected epoch and the rates of these seven parameters over time. For daily GNSS positional coordinate transformation from the PPP solutions, the geocentric coordinates of a site with respect to the local reference frame can be approximated by the following formula: where , , and are counterclockwise rotations along axis, axis, and axis of the geocentric coordinate system; , , and are translations about these three axes; and is a differential scale change between IGS14 and local reference frame.

Theoretically, the seven parameters for each epoch/day are different. Therefore, to obtain the positional time series in the local reference frame, the seven parameters at each epoch have to be provided. Currently, there are two strategies that can be employed to perform the transformation, daily seven-parameter transformation, and 14-parameter transformation [19]. Continuous observations have made it possible to calculate the seven transformation parameters on a daily basis and transform the daily positional time series from one reference frame to another. However, calculating daily transformation parameters is too complex for most end-users, since users have to include a large number of reference stations for calculating transformation parameters at each epoch. Besides, it is difficult to ensure the data quality and availability for all the reference stations at each day. Thus, the GNSS time series of these seven transformation parameters can be simulated by the linear method with equations:

Here, denotes a specific epoch, which is set as specific epoch. , , , , , , and are the seven Helmert transformation parameters at specific epoch . , , , , , , and are the first derivatives (rates) of these seven parameters, which are assumed to be constant; the units of the translational components (, , ) are meters; the units of the three rotational components (, , ) are radians; the units of the rates of translational components (, , ) are meter/year; the units of the rates of the rotational components (, , ) are radian/year, and is a unitless scale. The local reference frame is tied to the same origin and scale of the original global reference frame, which is the IGS14 global reference frame in this study. The above formulas also require the use of a scale factor to minimize the distortion of point-to-point distances between the two reference frames. In general, the scale factor can be set as zero in frame transformations between a global reference frame and a local reference frame [26]. The common points are used to solve the inverse problem by the least square method.

Then, the coordinates of the target GNSS stations referred to the local reference frame can be obtained through:

Also, since a linear model is assumed, the changing rates can be easily calculated with two sets of transformation parameters at two different epochs. In this study, the epochs are set as (i.e., 2015) and (i.e., 2018). So, the coordinates in both frames at epoch are the same, and the seven parameters are all zeros. The method aligns the two frames at the epoch . The coordinates at epoch are calculated with the following equation:

In order to calculate the transformation parameters at another epoch , the coordinates at epoch in IGS14 and local reference frame can be derived by using the following equations:

Here, the velocity in local reference frame is regarded as zero since local reference frame is designed to have a velocity of zero relative to the rigid part of the region. With the coordinates at epoch in both IGS14 and local reference frame, equations (1), (2), and (3) can be solved for the seven parameters at epoch . By knowing the six parameters for epoch and , the parameter rates can be obtained by a simple differential method:

Then, the coordinates of a GNSS site at any epoch can be transformed from IGS14 to the local reference frame with , , , , , and parameters.

2.2. Extreme Gradient Boosting (XGboost) Algorithm

In this study, we applied the Extreme Gradient Boosting (XGboost) algorithm to predict ground displacements and to understand which climate factors have more impact on the GNSS monitoring time series. XGBoost was proposed by Chen et al. [27], which can combine many regression trees into one strong ensemble learner. Because of the ensemble, XGBoost can sufficiently capture complex interaction of variables in monitoring time series and then fit the nonlinear dynamic changes of displacements. This is one of the motivations for us to select XGBoost in this paper. Another reason is that the strong explanation power of XGBoost can help us understand the relationship between different daily climate variables and predict the high-precision GNSS displacements.

In XGBoost, additive regression trees are together to predict displacements. That is: where is the space of used regression trees, is a data set containing historical displacements, each represents an independent regression tree structure and leaf weights , and represents the number of leaves of the regression tree.

The optimization function of XGBoost can be written as: where β and λ are coefficients, is a differentiable convex loss function which represents the differences between measured displacements and predicted displacements, and is the penalty term of the objective function which helps avoid the overfitting problem.

However, because of the complex architecture, it is difficult to train the ensemble learner once. An additive strategy has been widely applied which means trees are trained one by one. The trees that have already been trained will be fixed, and then, a new tree is added at one step. Suppose that the predicted displacement at step is , then the optimization function at this step can be written as: where is a constant. To approximate the optimization function using a second-order Taylor expansion, it can be rewritten as: where

With equations (19) and (20), the optimization function can be revised as: where . Given a tree structure , the weight of the th node and extreme value of can be obtained as:

As studied by Breiman and Friedman [28], for a single regression tree , the following equation can be used to measure the relative importance of independent variables in the tree: where is a nonterminal node and is the terminal node of tree , is the splitting variable of node , and is the improvement of square error of the prediction if is used as splitting variable. For an ensemble learner, which is a collection of regression trees , the relative importance of variable can be obtained by its average over all additive trees:

The XGBoost is implemented using the “xgboost” package in the R software [29].

2.3. Data and Selection

In geodesy, a terrestrial reference frame is realized by selecting a set of reference stations and defining their positions and velocities. The selection of the reference stations is critical for establishing a stable reference frame. Here, it is very hard to set any mathematical or technical standard for selecting appropriate GNSS reference stations. In general, with some previously proposed guidelines, the reference stations are selected based on overall geographic distribution and long-term (e.g., >5 years) continuous records [30, 31]. Also, it requires no considerable subsidence or uplift with vertical velocity magnitude less than 0.5 mm/year referred to the global reference frame IGS14. A reference station that is not locally stable will degrade the overall stability or precision of the reference frame. The selection is mostly based on the availability of long-term CORS in the study area. In addition, if the selected GNSS sites have detailed station logs, which it may help explain the unexpected steps. The step means that the GNSS time series have sudden ascending or descending jump induced by the earthquake, volcano eruption, or/and GNSS equipment change. In this paper, we selected Puerto Rico and the Northern Virgin Islands (PRVI) to be the target area. In the tectonic setting, the motivation results from the results that researchers proposed the exist of rigid block of Puerto Rico and the Northern Virgin Islands (PRVI) between the Caribbean Plate and the North American Plate [32, 33]. The area is recognized as a tectonic stable microplate without notably tectonic movements in the Caribbean area and can minimize the influence from tectonic movements. The PRVI region is recognized as an appropriate area to apply the method.

Also, the PRVI region has established an appropriate GNSS infrastructure and has a long-term land monitoring history. Since 1986, GNSS stations were installed by researchers for studying Caribbean plate tectonic movements [34]. In the PRVI region, as of 2020, 28 GNSS stations are recording the displacement observations with 15 seconds positioning. GNSS station coverage within this region is exceptionally dense. The PRVI region owns the densest GNSS stations in the United States and has been well monitored by numerous permanent GNSS stations [30]. For example, the coverage along the coast of Puerto Rico is approximately 20.0 stations per kilometer. Eight GNSS stations within the PRVI region were installed by the Puerto Rico Seismic Network (PRSN) at the University of Puerto Rico at Mayaguez (UPRM). The installation was primarily funded by a National Science Foundation (NSF) Major Research Instrumentation (MRI) project (EAR-0722540). The daily GNSS records with a sample rate of 15 seconds are archived at UNAVCO public data archiving facility. The HLCM Group Inc. also installed eight GNSS stations in the PRVI region for land surveying. The HLCM Group Inc. is a private surveying company, and it archives the GNSS data at the NGS data archiving facility. Other public agencies, such as the U.S. Coast Guard, Federal Aviation Administration (FAA), and Jet Propulsion Laboratory (JPL), also operate GNSS stations in the PRVI region. Therefore, the well-established GNSS network and the complete equipment maintenance record offer a strong foundation to apply the experiment (Figure 1). Moreover, since the GNSS signals are likely to be blocked by the circumstance, the GNSS stations need to be installed in the open area. The open area, such as the building roof or top of the mountain, could keep minimum interference from GNSS multiple paths and signal block. Figure 2 shows the site views at four typical permanent GNSS stations in the PRVI region, all of which were installed in the open area to avoid the GNSS signal block or multipath effects, with solid monuments, i.e., the short-drilled braced monument, and building mount monument.

The climate impacts the seasonal ground deformation, and different climate factors perform different impact weights in the GNSS time series. In this study, PRVI region that presently follows a tropical rainforest climate is selected to test the hybrid method. The area has recorded an annual mean temperature of 28°C, and it has a trend of increasing since the 1950s [35]. The precipitation records show a long-term trend of decreasing precipitation in the northern area and show historically drier with a positive rainfall trend in the southern area [36]. Since the PRVI is one of the islands located in the Caribbean region, it has a wind speed of 10 mph~14 mph which is recorded at 10 meters above the ground. However, hurricane activity has increased since 1995. This increase in hurricane activity may also be a result of natural variability [37]. Also, historical records from tide gauges since 1900 show a Sea Level Rise (SLR) of 1.7 mm/year. Recent satellite-based remote sensing technology shows a SLR rate of 3.2 mm/year since 1992 [38]. Moreover, since the snow/ice has different characteristics in a variety of areas (such as Alaska and Texas in the U.S), it leads to a tremendously different impact weight in GNSS time series [39]. Notably, we expect the hybrid method to be fully tested and have the characteristics of generality and efficiency. Thus, we select the location without snow/ice to test the hybrid method. Besides, to precisely evaluate the impact caused by different climate factors, the selected study area needs to have consistent climate data. We choose to use the original daily climate monitoring continuous station which is installed near the permanent continuous GNSS station in the PRVI region.

3. Results

3.1. PRVI19 Local Reference Frame

This study used the GIPSY-OASIS software package (V6.4) to obtain daily solutions using the PPP method. Firuzabadi and King [40] and Wang [41] proposed an outlier detection and removing algorithm to clean the positional or displacement time series. On average, 6 ~ 7% of the total samples have been removed as outliers in this study. According to the previous investigations, daily PPP solutions would achieve 3-5 mm horizontal accuracy and 5-8 mm vertical accuracy in the PRVI area [30]. The PPP solutions of the GNSS stations are provided with respect to a global reference frame. The positional time series referred to the IGS14 global reference frame are shown in Figure 3. Also, the raw PPP solutions inevitably include the common background of global tectonic movements. Thus, the GNSS observations referred to a stable local reference frame could minimize the influence induced from the global background movements.

In this study, we used the extended Helmert transformation and selected ten permanent stations to realize the PRVI19 reference frame. The MIDAS method was used to calculate velocity and uncertainty [42]. The velocities of the ten selected reference stations with respect to different reference frames are shown in Figure 4. The locations of these permanent stations and corresponding site velocities referred to IGS14 and PRVI19, respectively, are listed in Table 1. The seven Helmert parameters for transforming the global IGS14 reference frame to the local PRVI19 reference frame are listed in Table 2. Moreover, the plate reconstructions and geodynamic modeling are defined by the rigidity of tectonic plates. The stability of the local reference frame also relies on the assumption of the rigidity of the tectonic plate. The stability or the accuracy of the reference frame can be assessed by the average velocities of all reference stations with respect to the reference frame. Since the reference frame is defined by the reference stations, in an ideal situation, all reference stations should have no relative movement between each other with respect to the defined reference frame. Thus, if the PRVI region is recognized as a rigid microplate in tectonics, the stability of the local reference frame is to be 0 mm/year. However, in practice, a tectonic plate is not completely rigid, and it cannot be strictly stable. For PRVI19, the stability reaches approximately 0.4 mm/year in horizontal and approximately 0.5 mm/year in vertical. This level of accuracy is essential for studying millimeter-per-year ground motions in the PRVI region.

Theoretically, the longer length of GNSS data and better geographical distribution of reference stations can improve the stability of a reference frame. Wang et al. [18] updated the PRVI14 local reference frame to PRVI18 with the latest global reference frame (from IGS08 to IGS14) and reference stations. However, it indicated that there was no considerable improvement in reference frame stability. In this study, we established the local reference frame to PRVI19 with ten reference stations and a longer length of GNSS data. The PRGY, PRLT, and PRMI GNSS stations, which are located at southwestern and southern of the PRVI region, are excluded from reference stations. Those stations are located nearby the South Lajas fault which is an active Holocene structure within the past 5,000 years [43]. The Lajas Valley currently experiences a 1.5 mm/year north-south direction extension and minor right-lateral strike-slip [9]. Also, the MIPR station is excluded from the reference stations because it is close to P780 and PRN4 in geographic distribution, and the GNSS station stops recording data since October of 2017. Though the location of MIPR station can form well geographical distribution for the local reference frame, we still excluded it from the reference stations because P780 station has better quality and longer length of data and locates in the same area. However, the stability of the PRVI19 still does not considerably increase compared with the previous two local reference frames, i.e., PRVI14 and PRVI18 [18, 31]. Based on the previous versions of local reference frames and PRVI19, it proves that the stability of the PRVI19 local reference frame is approaching the rigidity of the PRVI tectonic block.

Figure 5 depicts nearly 13-year North-South, East-West, and vertical displacement time series at BYSP (2008-2020) with respect to the IGS14 and the PRVI19 reference frames. It shows a notable difference in horizontal velocities when referred to the global (IGS14) and local (PRVI19) reference frames. The 13 years of BYSP observations show that it has a velocity of  mm/year in NS and a velocity of  mm/year in EW with respect to the global IGS14 reference frame; it has a velocity of  mm/year in NS and a velocity of  mm/year in EW with respect to the local PRVI19 reference frame, respectively. In general, most engineer monitoring applications only focus on the local target deformation displacements and do not consider the movements with respect to a global reference frame. The comparison of the BYSP GNSS time series within IGS14 and PRVI19 reference frames demonstrates that the background global or regional tectonic movements can bias or obscure the local ground deformation when using a global or regional reference frame.

3.2. Daily Climate Impact Factors

Climate change is considered as an important external impact factor influencing the GNSS data precision. However, it is a challenge to clarify the relationship between daily climate change and daily GNSS records. The main reasons are because the daily climate change influence could be partially removed by the 24-hour GNSS processing method. Thus, we selected the continuous operating GNSS station BYSP, which is installed nearby a real-time climate-monitoring device. The 5-year continuous climate data are used in the model, which is collected by the weather station (TJSJ) nearby the BYSP (NOAA National Weather Service). Here, we established two models between which the only difference is that one considers the daily climate features and the other not. We used the model to evaluate whether climate change can influence the precision of the 24-hour GNSS time series, which has been transformed to the PRVI19 reference frame using the extended Helmert transformation method. The dimensionless index shows that the model without considering the daily climate change is 0.32 and the other one considering the daily climate change is 0.25. The lower dimensionless index means that the model has better performance. The results prove that daily climate change is one of the impact factors in the GNSS time series.

Furthermore, we determined the quantitative weights of impact from different daily climate factors on the GNSS time series. Figure 6 shows the importance of the 20 selected climate factors on the GNSS time series using the XGBoost model. The XGBoost parameters used to evaluate the displacements are shown in Table 3. Each row displays the impact of the feature, in which the contribution weight is shown on the -axis. Here, we used two complementary features, North-South (NS) feature and East-West (EW) feature, which are the physical movements due to the vertical displacements. Theoretically, the NS and EW observations should change when the GNSS station has vertical movements. For this reason, we involve the two features to help the hybrid model precisely evaluate different climate features and corresponding impact weights. The relationship between these two physical features is in good agreement with the results reported in Li et al. [44], which indicated that the key feature for the NS movements was the variance of the subsidence or uplift area. Based on the five years of monitoring climate records, we analyze the weight of different climate factors influencing the GNSS displacements. We found that the wind speed (average) and temperature (minimum) were the two climate factors that have the biggest impact on the daily GNSS time series. Interestingly, the humidity, temperature (average), and dew are not sensitive in the model. In the daily GNSS postprocessing records, except for the mega weather conditions, it is hard to determine the relationship between these features that can impact the vertical displacements.

3.3. GNSS Time Series Prediction

The hybrid model forecasted the GNSS monitoring displacements by learning the previous data and expected to explore the potential deformation problem in various monitoring applications. The prediction of GNSS observations is derived from the previous displacements that are referred to the stable local PRVI19 reference frame. Here, we used the BYSP GNSS postprocessing records referred to PRVI19 to train the model. The hybrid model was trained by the 1823 days of GNSS displacements. Also, we used the root mean square error (RMSE) as an indicator to evaluate the forecasting precision [45, 46]: where is the prediction value, is the measured value, and is the sample size of test dataset. In the experiment, we need to use the prediction results to compare with real deformation results monitored by the GNSS station. However, the GNSS time series still inevitably include the nonlinear change, such as the noise, which is complex and hard to remove. Gazeaux et al. [47] tested different models and methods to minimize the white noise and colored noise such as the first-order Gauss Markov model and auto-regressive moving average model. However, the best noise model is location dependent, which is impacted by the location of the GNSS site, GNSS raw data processing method, and monument designs [48]. Thus, in the comparison, we used cubic spline function interpolation to fit the GNSS measured solutions, which have been transformed to PRVI19 [49]. Figure 7 shows the predicted model results compared with the postfitted GNSS displacements using the cubic spline function in the experiment. We used optimized parameters for the cubic spline method and found the predicted values performed well. It shows that the RMSE of the forecasting method is 0.22, which is slightly lower than 0.31 from the fitted results using the cubic spline function. Moreover, we tested all parameters enrolled in the cubic spline method. It also shows that the precision of the forecasted results meets well with the spline fitted results, which are processed using the real raw GNSS records.

4. Conclusions

To precisely analyze the impact of various daily climate factors on the GNSS time series, we proposed a hybrid method and applied it in the PRVI area. We used the extended Helmert transformation method to establish the PRVI19 local reference frame, which could help avoid the bias of background global or regional tectonic movements in the GNSS time series when studying local ground deformation. The stability of the PRVI19 reference frame is approximately 0.4 mm/year and 0.5 mm/year in the horizontal and vertical directions, respectively. Also, we adopted the XGBoost algorithm and the highly stable PRVI19 local reference frame to quantitatively assess the effects of daily climate factors on the GNSS daily (24 hours) observations. Based on the 13 years of GNSS records referred to PRVI19 and climate data recorded by a nearby climate-monitoring device, we observed that the wind had the biggest impact on the GNSS time series. The results show that the average, lowest, and highest wind speeds are the first, second, and fourth-largest weights among all the climate factors. Besides, the result also shows that the lowest temperature also greatly affects the GNSS displacements, which is the third-largest weight among all climate factors. This paper introduces a new method that can quantitatively determine the impact weights of different climate factors on the GNSS time series. Moreover, we used the model to predict the GNSS records and indicate users to explore potential deformation risk. It is hoped that this study can promote the applications of the GNSS techniques and improve the understanding of the impact of different climate factors on the GNSS monitoring time series.

Data Availability

Data available in a publicly accessible repository. Data is available through http://geodesy.unr.edu/ and also appreciates the climate data from NOAA National Weather Service https://www.weather.gov/.

Conflicts of Interest

The authors declared that they have no conflicts of interest to this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Authors’ Contributions

Linchao Li and Hanlin Liu contributed to the conceptualization, methodology, data curation, and writing—original draft preparation. Hanlin Liu contributed to the visualization and investigation. Linchao Li contributed to the supervision. Linchao Li and Linqiang Yang contributed to the writing—reviewing and editing.


The authors acknowledge Guoquan Wang (University of Houston) for providing the PPP solutions data quality check for this study. The first author also appreciates the UNAVCO and the Nevada Geodetic Laboratory (NGL) at the University of Nevada for sharing their GPS products with the public. Data is available through http://geodesy.unr.edu/ and also appreciates the climate data from NOAA National Weather Service https://www.weather.gov/. This research was funded by the National Key R&D Program of China (grant number 2019YFB2102700). This study was supported by the research funds from China Postdoctoral Science Foundation funded project (grant number 2020M682883), the Shenzhen Science and Technology program (grant number KQTD20180412181337494), and the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (Grant No. 2019KQNCX126).