Abstract

Because the number of weather stations is limited and data collection is sometimes interrupted, temperature field data may be incomplete. Spatial interpolation has usually been used to fill such data gaps, but it does not work well when the data loss is large in scale. Matrix completion, which has emerged recently, provides a global optimization approach to temperature field data reconstruction. A recovery method based on sparse low-rank matrix completion (SLR-MC) is proposed to improve the accuracy of reconstructed temperature field data. The method is tested using continuous gridded data provided by ERA Interim and station temperature data provided by Jiangxi Meteorological Bureau. Experimental results show that, compared with the matrix completion (MC) method, the average signal-to-noise ratio is increased by 12.5% and the average reconstruction error is reduced by 29.3%.

1. Introduction

Temperature field data, measured at a height of at least 1.5 m above the ground, are an important parameter for describing the environmental conditions of the land [1] and are widely used in weather forecasting. Early research on the urban thermal environment mainly employed temperature data from meteorological stations [2]. However, the number of meteorological stations is often limited, and the data are continuous in time but not in space. These characteristics make it challenging to investigate temperature-related problems over large areas. Because sparse temperature field data are highly correlated and have low spatial variability, interpolation is usually used to obtain the temperature data missing in a region. So far, no research on the sparse property of continuous temperature data has been conducted.

With the rapid development of sparse representation, the matrix completion (MC) method [3–5], which extends the idea of compressed sensing to matrices, has been proposed recently. Matrix completion aims to recover a corrupted matrix from a small subset of its entries. This is impossible without some assumption about the matrix. Candès and Recht [3] found that if the given matrix is low rank or approximately low rank, the missing entries of the corrupted matrix can be recovered by minimizing the matrix rank. Mathematically, for a corrupted matrix $M$, the low-rank matrix completion problem can usually be formulated as

$$\min_{X} \ \operatorname{rank}(X) \quad \text{s.t.} \quad P_{\Omega}(X) = P_{\Omega}(M), \tag{1}$$

where $\operatorname{rank}(X)$ denotes the rank of matrix $X$, $\Omega$ represents the set of sampled locations in the matrix, $|\Omega|$ is the number of known entries, and $P_{\Omega}$ is the sampling operator, which retains only the entries indexed by $\Omega$.

Unfortunately, the matrix completion problem is NP-hard because the rank function is nonconvex and discontinuous. To address this, Candès and Tao proposed replacing the rank with the convex nuclear norm [6]. When the known entries are sampled randomly and uniformly from the unknown matrix, the missing entries can be recovered accurately if the matrix satisfies the low-rank structure and the incoherence condition [7]. The nuclear norm is the sum of the singular values and can be seen as the $\ell_1$ norm of the vector of singular values. It is widely adopted as a convex surrogate for the rank [8], so the resulting problem can be solved via convex optimization. Semidefinite programming (SDP) was first proposed to solve this convex problem. Because SDP has a high computational cost, several more computationally efficient algorithms have since been proposed, such as singular value thresholding (SVT) [9], singular value projection (SVP) [10], and the inexact augmented Lagrangian method (IALM) [11].
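As a small illustration of the nuclear norm as a surrogate for the rank, the following Python sketch computes it as the sum of the singular values. NumPy is assumed, and the helper name `nuclear_norm` and the example matrix are hypothetical, chosen only for illustration.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(singular_values))

# Hypothetical rank-1 example: the outer product of two vectors.
u = np.array([3.0, 4.0])        # Euclidean length 5
v = np.array([1.0, 2.0, 2.0])   # Euclidean length 3
X = np.outer(u, v)

print(np.linalg.matrix_rank(X))  # rank of X
print(nuclear_norm(X))           # for a rank-1 matrix this equals ||u|| * ||v||
```

The rank counts the nonzero singular values, while the nuclear norm adds up their magnitudes, which is what makes it convex and amenable to the solvers listed above.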

Unlike the interpolation methods presented in [12–14], matrix completion requires the corrupted matrix to be low rank, and it works well when a large portion of the data is lost. By exploiting the low rank and spatiotemporal correlation of a matrix, MC can achieve good interpolation performance. Compared with traditional spatial interpolation, MC makes good use of the correlation between the data and can reconstruct the global temperature field from only a few temperature field samples, with reconstruction quality comparable to that of spatial interpolation. According to matrix completion theory, the temperature matrix needs to be low rank to obtain good reconstruction resolution. However, the temperature field data matrix does not have a stable rank, and its rank varies with time. We therefore arrange the gridded temperature field data into a new matrix whose rank is more stable. Although matrix completion can recover the incomplete temperature field data well, some information is still lost in the process. In [15], the data matrix was assumed to be the sum of a low-rank part and a sparse part, which can be recovered individually by solving a convenient convex program under suitable assumptions. To recover the sparse and low-rank components of a matrix efficiently, the alternating direction method (ADM) was proposed in [16]. However, the sparse part of gridded temperature field data is well suited to compressed sensing (CS), because extensive spatiotemporal correlations result in sparser representations. Combining compressed sensing with low-rank matrix completion is therefore an attractive way to further improve reconstruction.

In this paper, a method based on matrix completion and compressed sensing [17, 18] is presented and referred to as sparse low-rank matrix completion (SLR-MC). Unlike the method proposed in [16], the low-rank part and the sparse part of the corrupted matrix are recovered separately, by matrix completion and compressed sensing, respectively. First, the temperature field data matrix is decomposed into a low-rank (or approximately low-rank) matrix and a sparse matrix. Then, the low-rank matrix is reconstructed using the matrix completion method, and the sparse part is recovered using compressed sensing. The method is tested using the incomplete gridded temperature field data provided by ERA Interim and the station temperature data provided by Jiangxi Meteorological Bureau.

The rest of this paper is organized as follows. The temperature field matrix model and the proposed SLR-MC method are described in Section 2. The gridded experimental data are described in Section 3, and the results, including the performance comparison between MC and SLR-MC reconstruction, are presented in Section 4. Section 5 summarizes the work.

2. Method

In this section, we first describe the temperature field matrix model, in which the data matrix is decomposed into a low-rank part and a sparse part. We then introduce the fundamentals of matrix completion and finally present the SLR-MC method.

2.1. The Temperature Field Matrix Model

In order to overcome the influence of rank, the gridded temperature field data at each time can be regarded as a new low-rank matrix. According to [19], the gridded temperature field data T1, T2, …, TL collected over a period can be arranged in rows to a large matrix T, as shown in Figure 1.

Assume that the size of each matrix is $m \times n$ and that the ranks of T1, T2, …, TL are r1, r2, …, rL, respectively. A single temperature field observation matrix may not satisfy the low-rank property, so the MC method cannot be used directly to reconstruct missing or lost data. However, due to the structural similarity and strong correlation among the matrices T1, T2, …, TL, the rank R of matrix T is smaller than max (r1, r2, …, rL), and the matrix T can be decomposed into two parts, a low-rank matrix $T_M$ (few nonzero singular values) and a sparse matrix $T_S$ (few nonzero entries):

$$T = T_M + T_S, \tag{2}$$

where $\operatorname{rank}(T_M) \ll \min(m, n)$ and $\operatorname{sparsity}(T_S) \ll m \times n$.
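To make the low-rank plus sparse model concrete, the following Python sketch builds a synthetic matrix of this form and checks the two structural properties. NumPy is assumed, and the sizes, rank, number of spikes, and variable names are hypothetical values chosen for illustration; they are not taken from the temperature data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 40, 30, 3          # hypothetical sizes and rank, for illustration only

# Low-rank part: the product of two thin random factors has rank r (almost surely).
T_M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Sparse part: a handful of large local deviations at random positions.
n_spikes = 20
flat_idx = rng.choice(m * n, size=n_spikes, replace=False)
rows, cols = np.unravel_index(flat_idx, (m, n))
T_S = np.zeros((m, n))
T_S[rows, cols] = rng.uniform(5.0, 10.0, size=n_spikes)

# Observed matrix: sum of the two parts.
T = T_M + T_S

rank_T_M = np.linalg.matrix_rank(T_M)
nonzeros_T_S = np.count_nonzero(T_S)
print(rank_T_M, "is much smaller than", min(m, n))
print(nonzeros_T_S, "is much smaller than", m * n)
```

In this toy setting the low-rank part carries the smooth global structure, while the sparse part carries a few localized deviations, which is the role the two components play in the model above.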

Figure 2 illustrates an example of the decomposition result.

2.2. Fundamentals of Matrix Completion

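Matrix completion is the technique of filling in the missing values of a matrix from a subset of entries selected randomly and uniformly from a low-rank or approximately low-rank matrix [3, 15]. The incomplete matrix M can be recovered by solving the following rank minimization problem [3]:

$$\min_{X} \ \operatorname{rank}(X) \quad \text{s.t.} \quad P_{\Omega}(X) = P_{\Omega}(M), \tag{3}$$

where rank (X) denotes the rank of a matrix X, and the sampling operator $P_{\Omega}: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}^{m \times n}$ is defined as follows:

$$[P_{\Omega}(X)]_{ij} = \begin{cases} X_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$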

We use $|\Omega|$ to represent the cardinality of $\Omega$, which is the number of known entries. For example, suppose the matrix X is

If we have three elements known as we can have
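The action of the sampling operator can be sketched in Python as follows. NumPy is assumed; the 3 × 3 matrix, the index set `omega` (zero-based indices), and the function name `sampling_operator` are hypothetical and are not the values of the original example.

```python
import numpy as np

def sampling_operator(X, omega):
    """P_Omega: keep the entries of X indexed by omega and set the rest to zero."""
    idx = np.array(sorted(omega))
    out = np.zeros_like(X)
    out[idx[:, 0], idx[:, 1]] = X[idx[:, 0], idx[:, 1]]
    return out

# Hypothetical matrix with three known entries (zero-based row/column indices).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
omega = {(0, 0), (1, 2), (2, 1)}

P = sampling_operator(X, omega)
print(P)            # only the three sampled entries survive
print(len(omega))   # cardinality |Omega|, the number of known entries
```

Here the three sampled entries are kept in place and every other entry is replaced by zero, so the result has the same shape as X.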

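However, the problem in equation (3) is NP-hard and intractable in practice. Candès and Recht proposed replacing the rank minimization model with the following nuclear norm minimization model:

$$\min_{X} \ \|X\|_{*} \quad \text{s.t.} \quad P_{\Omega}(X) = P_{\Omega}(M), \tag{7}$$

where the nuclear norm $\|X\|_{*} = \sum_{i} \sigma_i(X)$ is the sum of the singular values of X.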
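A simple way to see nuclear norm regularization at work is iterative singular value shrinkage, in which the missing entries are filled with the current estimate and the singular values are then soft-thresholded. The Python sketch below uses this generic scheme (often called soft-impute), which solves a penalized variant of the problem rather than the equality-constrained one. It is a swapped-in illustration, not the solver used in this paper, and the function names, threshold `tau`, iteration count, and synthetic test matrix are all assumptions.

```python
import numpy as np

def svd_soft_threshold(Z, tau):
    """Shrink every singular value of Z by tau (negative results are set to 0)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_impute(M_obs, mask, tau=1.0, n_iter=300):
    """Approximate nuclear-norm completion by iterative singular value shrinkage."""
    X = np.zeros_like(M_obs)
    for _ in range(n_iter):
        # Keep the known entries, fill the unknown ones with the current estimate.
        Z = np.where(mask, M_obs, X)
        X = svd_soft_threshold(Z, tau)
    return X

# Hypothetical test: a random rank-2 matrix with about 60% of its entries observed.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(M.shape) < 0.6
X_hat = soft_impute(np.where(mask, M, 0.0), mask)

rel_err = np.linalg.norm(X_hat - M) / np.linalg.norm(M)
print(rel_err)   # relative error over all entries, including the unobserved ones
```

A larger `tau` favors lower rank at the cost of more shrinkage bias, so in practice it would be tuned to the data.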

Unfortunately, no low-rank matrix (even one of rank 1) can be recovered if the samples in some row or column are completely missing. For example, if a matrix is of rank 1 and we have no samples from its second column, the matrix cannot be recovered, because no method can determine the entries of the second column exactly. To recover an unknown matrix, at least one observation in each row and each column must be available. Candès and Recht [3] proved that if $\Omega$ is sampled uniformly at random among all subsets of cardinality m, the solution of problem (7) recovers the matrix with high probability provided that the number of samples obeys $m \geq C n^{6/5} r \log n$ for some positive constant $C$, where $n$ is the larger matrix dimension and $r$ is the rank, assumed to be sufficiently small.
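The missing-column argument can be checked numerically. In the Python sketch below (NumPy assumed; the vectors and the helper name `every_row_and_column_observed` are hypothetical), two different rank-1 matrices agree on every observed entry when the second column is unobserved, so no method could tell them apart.

```python
import numpy as np

def every_row_and_column_observed(mask):
    """True if each row and each column contains at least one observed entry."""
    mask = np.asarray(mask, dtype=bool)
    return bool(mask.any(axis=1).all() and mask.any(axis=0).all())

# Two rank-1 matrices that differ only in the second column.
u = np.array([1.0, 2.0, 3.0])
v1 = np.array([1.0, 5.0, 2.0])
v2 = np.array([1.0, -4.0, 2.0])
X1, X2 = np.outer(u, v1), np.outer(u, v2)

mask = np.ones((3, 3), dtype=bool)
mask[:, 1] = False   # no samples from the second column

print(every_row_and_column_observed(mask))   # necessary condition is violated
print(np.array_equal(X1[mask], X2[mask]))    # identical observations
print(np.array_equal(X1, X2))                # but different matrices
```

A check of this kind can be run on a sampling pattern before attempting completion, since the condition is necessary regardless of the solver.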

In order to recover the incomplete matrix exactly, there is a restriction on the range of the rank r. The choice of rank has a great influence on the recovery of a low-rank matrix. We therefore test a small range of rank values and choose the value that gives the best performance (the rank is selected as 7 in the experiments).
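One plausible way to scan a small range of ranks is to hold out part of the known entries and pick the rank with the smallest held-out error. The Python sketch below illustrates this idea with a simple fixed-rank imputation loop. The selection criterion, the completion routine, the function names, and the synthetic test matrix are all assumptions made for illustration, since the text does not specify how the best-performing rank was determined.

```python
import numpy as np

def complete_fixed_rank(M_obs, mask, rank, n_iter=200):
    """Fill missing entries by repeated truncation to the given rank."""
    X = np.where(mask, M_obs, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, M_obs, low_rank)   # keep the known entries fixed
    return low_rank

def select_rank(M_obs, mask, candidates, holdout_fraction=0.2, seed=0):
    """Return the candidate rank with the smallest error on held-out known entries."""
    rng = np.random.default_rng(seed)
    holdout = mask & (rng.random(mask.shape) < holdout_fraction)
    train = mask & ~holdout
    errors = {}
    for r in candidates:
        X = complete_fixed_rank(M_obs, train, r)
        diff = (X - M_obs)[holdout]
        errors[r] = float(np.sqrt(np.mean(diff ** 2)))
    return min(errors, key=errors.get), errors

# Hypothetical test: a random rank-3 matrix with about 70% of its entries known.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 40))
mask = rng.random(M.shape) < 0.7
candidates = list(range(1, 7))
best_rank, errors = select_rank(np.where(mask, M, 0.0), mask, candidates)
print(best_rank, errors)
```

Too small a rank underfits the data and too large a rank tends to overfit the observed entries, so the held-out error usually has a clear minimum.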

2.3. Proposed Method

Considering model (2), both the low-rank part and the sparse part of the corrupted matrix T need to be recovered. According to [20], a low-rank or approximately low-rank matrix can be reconstructed using the MC method. As shown in [21], a sparse matrix can be recovered with compressed sensing. Therefore, $T_M$ and $T_S$ in (2) can be obtained through

$$\min_{T_M,\, T_S} \ \operatorname{rank}(T_M) + \lambda \|T_S\|_{0} \quad \text{s.t.} \quad T = T_M + T_S. \tag{8}$$

Problem (8) is a nonconvex optimization problem, where $\|\cdot\|_{0}$ denotes the number of nonzero values, and $\lambda > 0$ is a tuning weight that balances the contribution of the $\ell_0$-norm term relative to the rank minimization term. Problem (8) is NP-hard and extremely difficult to solve, so it is relaxed to the following convex optimization problem:

$$\min_{T_M,\, T_S} \ \|T_M\|_{*} + \lambda \|T_S\|_{1} \quad \text{s.t.} \quad T = T_M + T_S, \tag{9}$$

where $\|T_M\|_{*} = \sum_{i} \sigma_i(T_M)$ is the nuclear norm of $T_M$, $\sigma_i(T_M)$ represents the ith singular value of $T_M$ (sorted in decreasing order), and $\|T_S\|_{1}$ is the sum of the absolute values of the entries of $T_S$. Problem (9) is also known as principal component pursuit (PCP), which can be solved by the augmented Lagrange multiplier (ALM) algorithm based on the following function:

$$L(A, E, Y) = \|A\|_{*} + \lambda \|E\|_{1} + \langle Y, D - A - E \rangle + \frac{\mu}{2} \|D - A - E\|_{F}^{2}, \tag{10}$$

where A, E, and D correspond to $T_M$, $T_S$, and T, respectively, μ is a positive scalar, λ is a positive weighting parameter, the Lagrange multiplier Y is introduced to remove the equality constraint A + E = D, $\|X\|_{F} = \sqrt{\sum_{i,j} X_{ij}^{2}}$ denotes the Frobenius norm, and $\langle \cdot, \cdot \rangle$ represents the inner product operator. For a given Y, A and E are determined as the values that minimize L (A, E, Y). Thus, $T_M$ can be recovered by solving problem (10). Unlike the methods proposed in [15, 16] for sparse and low-rank matrix decomposition, the sparse part is obtained here by the compressed sensing method. The proposed method thus combines the augmented Lagrange multiplier algorithm, used for matrix completion, with compressed sensing, used for sparse reconstruction.
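The ALM iteration for principal component pursuit can be sketched in Python as follows. This is a minimal version of the widely used inexact ALM scheme for problem (9), in which A and E are updated in closed form by singular value shrinkage and entrywise shrinkage, and Y is then updated with the constraint residual. It covers only the low-rank plus sparse splitting and does not include the compressed sensing step of SLR-MC, whose details are not given here. The default λ, the initial values of μ and Y, the growth factor `rho`, and the synthetic test data are common choices assumed for illustration.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Soft-thresholding applied to the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp_alm(D, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Inexact ALM iterations for min ||A||_* + lam * ||E||_1 s.t. A + E = D."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam   # assumed default
    norm_two = np.linalg.norm(D, 2)
    Y = D / max(norm_two, np.abs(D).max() / lam)   # common multiplier initialization
    mu = 1.25 / norm_two                           # common initial penalty
    mu_max = mu * 1e7
    A = np.zeros_like(D)
    d_norm = np.linalg.norm(D)
    for _ in range(max_iter):
        E = shrink(D - A + Y / mu, lam / mu)       # minimize L over E
        A = svd_shrink(D - E + Y / mu, 1.0 / mu)   # minimize L over A
        residual = D - A - E
        Y = Y + mu * residual                      # multiplier update
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(residual) / d_norm < tol:
            break
    return A, E

# Hypothetical test: rank-2 matrix plus about 5% sparse corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
S0 = np.zeros((50, 50))
spikes = rng.random((50, 50)) < 0.05
S0[spikes] = rng.uniform(-10.0, 10.0, size=spikes.sum())
D = L0 + S0

A, E = pcp_alm(D)
low_rank_err = np.linalg.norm(A - L0) / np.linalg.norm(L0)
constraint_err = np.linalg.norm(D - A - E) / np.linalg.norm(D)
print(low_rank_err, constraint_err)
```

Increasing μ geometrically drives the constraint residual toward zero, so A + E matches D increasingly closely as the iterations proceed.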

3. Experimental Results

3.1. Gridded Temperature Field Data

We implemented our algorithms in MATLAB 2016. The experimental temperature field data used for testing the method are provided by ERA Interim of ECMWF (European Centre for Medium-Range Weather Forecasts) and can be obtained from the following website: https://apps.ecmwf.int/datasets. The data were collected from Asia at 00:00, 06:00, 12:00, and 18:00 on January 1, 2014, at a height of 2 m above the ground, with a grid resolution of 0.75 degrees. The region of the study is at and , and the size of the region is 200 × 200.

Figure 3 shows the gridded temperature field data selected at 06:00 on January 1, 2014, which are represented by a matrix of size 200 × 200. The temperature values range from 210 K to 320 K. Figure 4 shows the sampled data of the global temperature field at 06:00 on January 1, 2014 (15680 sampled entries, i.e., 39.2% of the 40000 grid points), to which the reconstruction methods are applied.
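A sampling pattern with the stated number of samples can be generated as in the following Python sketch. NumPy is assumed, and the function name, the seed, and the assumption that the sampled positions are drawn uniformly at random without replacement are illustrative rather than taken from the experiment.

```python
import numpy as np

def uniform_sampling_mask(shape, n_samples, seed=0):
    """Boolean mask with exactly n_samples True entries at uniformly random positions."""
    rng = np.random.default_rng(seed)
    total = shape[0] * shape[1]
    chosen = rng.choice(total, size=n_samples, replace=False)
    mask = np.zeros(total, dtype=bool)
    mask[chosen] = True
    return mask.reshape(shape)

mask = uniform_sampling_mask((200, 200), 15680)
print(mask.sum())    # number of sampled grid points
print(mask.mean())   # sampling ratio, 15680 / 40000
```

Fixing the seed makes the sampling pattern reproducible, so that different reconstruction methods can be compared on the same observed entries.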

4. Results

Both the MC method and the proposed algorithm are tested in this section. Figure 5(b) shows that the global temperature field at 06:00 on January 1, 2014, recovered using the MC method agrees well with the original one (the rank of the matrix in Figure 5(a) can be estimated by LMaFit [22] and is selected as 7 in this section).

The results at high latitudes near the North Pole and at low latitudes near the equator are both satisfactory overall. However, although the global temperature field recovered from the low-rank matrix using the MC method is good, the recovery results in some areas are less satisfactory because the local properties of the temperature data are not considered. For example, the red rectangle in Figure 3 marks a circumpolar area whose temperature varies from 220 K to 250 K, whereas the corresponding recovered values range from 230 K to 250 K. In other words, temperatures lower than 230 K have not been recovered successfully. The data in the red rectangle are therefore selected for further analysis.

Table 1 ((a) and (b)) shows the original and recovered temperature fields, respectively, in the red rectangle at 06:00 on January 1, 2014. Comparing the values at the same positions in Table 1 ((a) and (b)) shows a difference of about 5 K to 9 K. The low-latitude area in the black rectangle in Figure 3 shows similar behavior. The true temperature data range from 300 K to 310 K. However, the values recovered using the MC method are all below 307 K, i.e., temperatures higher than 307 K have not been recovered. From Table 2 ((a) and (b)), it can be seen that the recovered temperature field data are all lower than the corresponding original data, with an average temperature difference of 4 K. This test shows that the performance of the MC method using the low-rank matrix alone is not ideal.

As mentioned earlier, the SLR-MC method can improve the reconstruction performance for the global temperature field. Temperature field data collected at different times were used to test the proposed method. The original gridded temperature data (see Figure 6) at four moments (00:00, 06:00, 12:00, and 18:00) on January 1, 2014, were studied. The sampling number is 15680, and the matrix rank is 7. The same data shown in Figure 3 (i.e., 06:00 in Figure 6) were studied first. For the high-latitude region in the red rectangle, the temperature recovered using the proposed method varies from 225 K to 250 K.

The point-to-point comparison is shown in Table 1 ((a) and (c)). The temperature difference is reduced from 7 K for the MC result in Table 1 (b) to 3 K. Using the SLR-MC method, the reconstruction error is reduced significantly, which means that the recovered temperature field is closer to the original one. Similarly, the SLR-MC method can recover temperature field data higher than 307 K (in the black rectangle).

As illustrated in Table 2 ((a) and (c)), the recovered and original temperature field data at 06:00 on January 1, 2014, are very close to each other. The average error is 1 K, which is less than that of MC. It can be concluded that the reconstruction results using SLR-MC are more accurate. The recovery performance is more satisfactory in regions with large temperature variation.

In this work, both the reconstruction error (RE) and the signal-to-noise ratio (SNR) are used to evaluate the recovery performance of the two methods. The RE is defined as follows:

$$\mathrm{RE} = \frac{\|T - T_r\|_{2}}{\|T\|_{2}}, \tag{11}$$

where T is the original temperature field data, Tr is the reconstructed data, and $\|\cdot\|_{2}$ represents the 2-norm. The SNR is defined aswhere .
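A relative reconstruction error of this kind can be computed as in the following Python sketch. It assumes that RE is the 2-norm of the difference divided by the 2-norm of the original data, with the norm taken over all matrix entries (the Frobenius norm of the matrix). The function name and the small example values are hypothetical.

```python
import numpy as np

def reconstruction_error(T, T_r):
    """Relative error ||T - T_r|| / ||T||, with the norm taken over all entries."""
    T = np.asarray(T, dtype=float)
    T_r = np.asarray(T_r, dtype=float)
    return float(np.linalg.norm(T - T_r) / np.linalg.norm(T))

# Hypothetical temperatures in K, with every reconstructed entry off by 3 K.
T = np.array([[300.0, 310.0],
              [220.0, 250.0]])
T_r = T + 3.0

re_value = reconstruction_error(T, T_r)
print(re_value)
```

Because the error is normalized by the magnitude of the original data, deviations of a few kelvin on temperatures of a few hundred kelvin give values on the order of 0.01.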

Figures 7 and 8 show that, compared with the MC method, the SLR-MC method has a lower RE and a slightly higher SNR; the details are provided in Table 3. Averaged over the four moments, the SNR is increased by 12.5% and the reconstruction error is reduced by 29.3% using the proposed method.

4.1. Station Temperature Data

In this section, we evaluate the performance of the SLR-MC method on the reconstruction of station temperature data. In this experiment, we collected the temperature data at 92 national weather stations in Jiangxi, China. Figure 9 shows the longitude and latitude of the stations, where the blue points represent the locations of the national weather stations in Jiangxi. Each station reports its temperature data to the monitoring center once a day, and we downloaded the data from January 2017 to March 2017. We put the data of each station into a vector and arrange the vectors into a large matrix. The dimensions of the data matrix are set to M = 87 (only the data from 87 stations are used) and T = 90 (the number of days from January 2017 to March 2017).

As shown in Figure 10, the row number of the temperature matrix represents the time from January 2017 to March 2017 and the column number represents the locations of 87 stations. Figure 11 shows the sampled temperature data matrix, whose entries are selected randomly and uniformly (the sampling number is 4698, i.e., 60% of the 7830 entries); the blue dots represent the corrupted temperature data and the red dots represent the sampled temperature data. The size of the matrix is 87 × 90, and the temperature values range from 0 K to 300 K. The reconstructed temperature data are shown in Figure 12. Both the MC and SLR-MC methods capture the main features of the original temperature data matrix.

The recovery results of the SLR-MC method capture the local features of the original matrix and more of the key variation details, while the MC method often loses this information. Over the whole matrix, the SLR-MC method may not show a significant improvement over the MC method, because the changed temperature values occupy only a small portion of all temperature values in the matrix. Thus, the data in the white rectangle in Figure 12 are selected for further analysis.

The white rectangle in Figure 12 marks an area with significant temperature variation, from 270 K to 290 K. Table 4 shows the original and reconstructed temperature data in the white rectangle. The white rectangle is 8 × 6 in size, corresponding to the data obtained by 8 stations (see Table 4 (a)) over time slots 30 to 35. Comparing the values at the same positions in Table 4 ((a), (b), and (c)) shows differences of about 1 K to 7 K, which means that both the MC and SLR-MC methods capture most of the information in the original matrix. Compared with MC, the data matrix in Table 4 (c) recovered by the SLR-MC method is closer to the original data matrix in Table 4 (a). For example, the reconstruction error (RE) between Table 4 (a) and Table 4 (b) is $1.08 \times 10^{-2}$, while that between Table 4 (a) and Table 4 (c) is $9.43 \times 10^{-3}$, about 12.7% lower. This test shows that the SLR-MC method performs better than the MC method.

5. Conclusion

In this paper, the MC and SLR-MC methods were examined to determine which technique is more appropriate for recovering missing temperature data. Instead of using the alternating direction method (ADM) proposed in [16] to recover the original data from the corrupted matrix, the SLR-MC method effectively separates the clean low-rank matrix from the corrupted data and applies matrix completion to fully exploit the low-rank features of temperature field data. The sparse matrix is reconstructed using compressed sensing to capture the sparse features of temperature field data. We have demonstrated that the SLR-MC method performs better on both gridded temperature field data and station temperature data with corrupted observations. Experimental results from gridded temperature field data confirm that the average SNR is increased by 12.5% and the average error is reduced by 29.3% using the SLR-MC method. The SLR-MC method can also be applied to many other types of meteorological data with appropriate modification.

Data Availability

The supplementary materials were provided by ERA Interim of ECMWF (European Centre for Medium-Range Weather Forecasts) and Jiangxi Meteorological Bureau. The data provided by ERA Interim were collected from Asia at 00:00, 06:00, 12:00, and 18:00 on January 1, 2014, with a spatial resolution of 0.75 degrees, and the data provided by Jiangxi Meteorological Bureau were collected from 92 national weather stations in Jiangxi from January 2017 to March 2017.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors appreciate ERA Interim of ECMWF for providing observation data. This study was supported in part by the Major Program of National Natural Science Foundation of China (no. 91437220), Jiangxi Province Science Foundation for Youths (no. 20171ACB21038), JiangXi Municipal Science and Technology Project (no. 20171ACG70017), and China Scholarship Council (no. 201808360089).

Supplementary Materials

Array size: 200 × 200; variable: 2 metre temperature; unit: K. The original temperature data have been recorded in Tabel original.xlsx. The original temperature data tested by the MC method have been recorded in Tabel MC.xlsx. The original temperature data tested by the SLR-MC method have been recorded in Tabel LRC-MC.xlsx.