Abstract
Due to the limitations of data collection methods and complex external factors, missing entries in traffic data are inevitable. However, complete traffic information is a necessary input for route planning and forecasting tasks. To reduce the impact of missing data, this paper completes missing spatiotemporal traffic data with a low-rank tensor completion framework based on t-SVD, whose aim is to recover a low-rank tensor from a tensor with partially observed entries, and proposes the WLRTCP model. We use direction weighting to remove the original model's dependence on the data input direction and to extract the correlation information along each direction of the spatiotemporal traffic tensor, and we replace the tensor average-rank minimization problem with the p-shrinkage norm, which is shown to be a tighter surrogate than the tensor nuclear norm; finally, the model is solved with the alternating direction method of multipliers. Experiments on two publicly available spatiotemporal traffic datasets verify the conjectured influence of the data input direction on completion accuracy and show that, compared with existing classical models, WLRTCP has high precision and strong generalization ability.
1. Introduction
With the rapid development of Internet technology and traffic informatization, the scale of traffic data [1] keeps growing, and intelligent transportation systems (ITS) have been widely used in road management and control as an important means to improve the efficiency of the transportation system and achieve its sustainable development [2, 3]. In recent decades, research on ITS, which integrates roads, vehicles, traffic participants, and high-tech factors, has flourished and made great progress, and ITS is widely regarded as the future direction of transportation system development.
As the core of ITS, the traffic information acquisition system collects road traffic flow data from various sensors to form a complete urban traffic dataset. At present, there are four main ways of collecting traffic flow data on urban expressways: induction coils, remote microwave sensors, floating-vehicle detection, and video detection. However, because the acquisition devices are high-precision equipment, they are vulnerable to external factors such as weather, temperature, sensor faults, and communication equipment failures, which cause anomalies in the collected traffic flow data; consequently, missing data [4, 5] in sensors and traffic information systems are very common. Missing data seriously affect a series of activities in intelligent transportation, such as monitoring traffic situations, predicting traffic flow [6–8], and deploying traffic planning.
Therefore, restoring missing traffic flow data is necessary, and how to recover missing data quickly has become a hot issue in the field of intelligent transportation. For the common missing-data phenomena in the traffic data collection process, domestic and foreign scholars have proposed a series of recovery methods and achieved certain results. However, many open problems remain in the field of traffic data recovery. For example, with rapid urbanization, urban traffic becomes more complex and more factors affect traffic conditions; finding and using the most relevant features among many factors for data modeling, and further recovering the missing data, is currently a hot research topic in traffic data recovery. For this reason, effective completion of missing traffic data is of great significance at both the theoretical and practical levels.
2. Related Work
Spatiotemporal traffic data recovery methods can be divided into three categories: prediction-based methods, statistical learning methods, and other machine learning methods. Prediction-based methods generate large processing errors when repairing a missing dataset, while most other machine-learning-based methods have low interpretability; therefore, existing research mainly focuses on statistical learning methods for traffic data restoration. Over time, statistical-learning-based methods have gone through three stages: vector-based, matrix-based, and tensor-based, and research shows that matrix- and tensor-based repair achieves higher accuracy.
Among the matrix-based methods, Bayesian principal component analysis (BPCA) and probabilistic principal component analysis (PPCA), proposed early on by Qu et al. [9, 10], are the most representative. Subsequently, Tan et al. [11, 12] first introduced the tensor model to traffic data completion and proposed the TDI model based on Tucker decomposition, showing that a tensor model can cover more spatiotemporal traffic information; they studied the mode correlation information of tensor traffic data and tried to recommend the most suitable tensor model for traffic flow data completion. After that, tensor-based completion methods began to be applied in the traffic data completion field. Liu et al. [13, 14] proposed three classical low-rank tensor completion models, SiLRTC (simple low-rank tensor completion), FaLRTC (fast low-rank tensor completion), and HaLRTC (high-accuracy low-rank tensor completion), which gave a generalized computational method for the completion problem and highlighted the accuracy advantage of tensor models over matrix completion on high-dimensional data. Tan et al. [15] further combined Tucker decomposition with Wopt to propose the Tucker–Wopt model, which was shown to successfully reconstruct tensors with noise and up to 95% missing data. Chen et al. [16, 17] then extended Bayesian matrix factorization to the tensor model and, combining it with the CP decomposition of the tensor, proposed the BGCP model; after adding a temporal regularization parameter, they proposed the BTTF model. Chen et al. [18–20] have since refined and optimized the HaLRTC model and proposed the LRTC-TNN (low-rank tensor completion based on the truncated nuclear norm), LSTC-Tubal (low-tubal-rank smoothing tensor completion), and LATC (low-rank autoregressive tensor completion) models, obtaining good results in experiments on real spatiotemporal traffic datasets. Zemin et al. [21] proposed the second, t-SVD-based LRTC framework using the concept of tensor average rank, which transfers the tensor to the Fourier domain for solving and does not need to unfold the tensor, thereby preserving the spatiotemporal correlation information of multidimensional channels. Song et al. [22] proposed a weighted residual model of the tensor (T-WTNNR) on this basis, which was solved by gradient descent and obtained good results in the image domain. Cl et al. [23] proposed the LRTC-p model based on the p-shrinkage norm, which replaces the original tensor nuclear norm, and proved that the p-shrinkage norm is tighter and can achieve better results. Kong et al. [24] proposed the Schatten-p norm and used it to replace the multiple TNNs or nuclear norms of tensors, showing strong applicability in both LRTC frameworks.
However, the LRTC framework based on multiple TNNs requires unfolding the tensor into multiple weighted matrices for the completion computation, which may break the modal correlation of the tensor and lose key information; the t-SVD-based LRTC framework operates only on the frontal slices of the input data and has no precedent in spatiotemporal traffic data restoration. To address these problems, we propose a weighted optimization model based on the p-shrinkage norm within the second LRTC framework, which solves the above problems, and apply it to the completion of traffic data.
The main contributions of this paper are twofold: (1) we propose to apply the t-SVD-based LRTC framework for modeling, integrating the weighting idea of the first framework to extract the correlation information in each direction of the tensor and reduce the dependence on the data input direction, applying the p-shrinkage norm to preserve the strong internal correlation information, defining the resulting model as WLRTCP, and finally solving it with the alternating direction method of multipliers.
(2) Two publicly available spatiotemporal traffic datasets are selected for experimental analysis, with the missing rate set to 20%–80% under different missing scenarios. Experiment 1 verifies the influence of the data input direction on completion accuracy; Experiment 2 compares WLRTCP with existing high-precision completion models and highlights its accuracy advantage and generalization capability.
The rest of the paper is organized as follows: in Section 3, we briefly introduce tensor basics and the classical tensor completion model. In Section 4, the tensor p-shrinkage norm and the external weighting scheme are introduced, and the WLRTCP model is proposed and solved. In Section 5, the effect of the data input direction is verified, experiments are conducted on several publicly available datasets, and comparisons are made with several state-of-the-art baseline models. Section 6 concludes the study.
3. Tensor Notation and Tensor Completion Model
3.1. Tensor Basis
In this section, some basic notations are first introduced, and then some necessary definitions [25] are briefly provided.
Bold Euler script letters denote tensors, e.g., ; bold capital letters denote matrices, e.g., ; bold lowercase letters denote vectors, e.g., ; scalars are represented by lowercase letters, e.g., ; a unit matrix of size is denoted by , and the real and complex domains are denoted by and , respectively.
For a 3-way tensor , the elements are indexed in the same way as for matrices; for example, denotes the th entry of the tensor . For the slices of the tensor, , , and denote the th horizontal, lateral, and frontal slices, respectively, and each slice of the tensor is a matrix. Besides, the th frontal slice is denoted by , and denotes the complex conjugate of , obtained by taking the complex conjugate of every element of .
The inner product of tensors and is defined as ; the trace of a tensor is defined as ; the Frobenius norm is defined as ; and the expansion and collapse mapping operators are defined, respectively, as
The operator is an expansion operator that unfolds the tensor into a matrix, and the size after unfolding is .
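As a concrete illustration, the mode-k unfolding and its inverse can be sketched in a few lines of NumPy; the function names `unfold` and `fold` are ours, not the paper's:

```python
import numpy as np

def unfold(X, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest,
    giving a matrix of size n_mode x (product of the remaining dimensions)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse mapping: reshape the matrix back and restore the axis order."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)
```

For any mode k, `fold(unfold(X, k), k, X.shape)` recovers the original tensor.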
For , denotes the result of the discrete Fourier transform (DFT) of along the third dimension, and the tensor can be recovered from the Fourier domain to the real domain by the inverse discrete Fourier transform (IDFT), . For , we define as the block diagonal matrix whose diagonal blocks are the frontal slices of , denoted as
The block circulant matrix of tensor is defined and is noted as
An important property is that the block circulant matrix can be block-diagonalized in the Fourier domain as follows, where denotes the discrete Fourier transform matrix of size , denotes a matrix of size , and denotes the Kronecker product.
Using the properties of the discrete Fourier transform, for , the frontal slices are taken by default, so the DFT is carried out along the third dimension; the conjugate symmetry of a real-valued signal transformed to the Fourier domain is
This conjugate symmetry of real-valued signals in the frequency domain follows from the properties of the discrete Fourier transform and helps to avoid redundant calculations.
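Concretely, the DFT along the third dimension, its inverse, and the block-diagonal matrix of Fourier-domain frontal slices can be sketched as follows (the helper names are ours):

```python
import numpy as np

def dft3(X):
    """DFT of each tube X[i, j, :] along the third dimension."""
    return np.fft.fft(X, axis=2)

def idft3(Xf):
    """Inverse DFT; the imaginary part is zero (up to roundoff) for real input."""
    return np.real(np.fft.ifft(Xf, axis=2))

def bdiag(Xf):
    """Place the n3 frontal slices of a Fourier-domain tensor on a block diagonal."""
    n1, n2, n3 = Xf.shape
    D = np.zeros((n1 * n3, n2 * n3), dtype=complex)
    for k in range(n3):
        D[k * n1:(k + 1) * n1, k * n2:(k + 1) * n2] = Xf[:, :, k]
    return D
```

For real input, the conjugate symmetry reads `dft3(X)[:, :, n3 - k] == conj(dft3(X)[:, :, k])` for k = 1, …, n3 − 1, so roughly half of the Fourier-domain slices determine the rest.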
Definition 1 (t-product [26]). The t-product between and is defined as a 3-way tensor of size . The t-product has the same form as matrix multiplication, except that multiplication between elements is replaced by circular convolution via the circulant expansion operations above. It is worth noting that the t-product reduces to matrix multiplication when .
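A minimal sketch of the t-product, computed as slice-wise matrix products in the Fourier domain (assuming compatible sizes n1 × n2 × n3 and n2 × l × n3; the function name is ours):

```python
import numpy as np

def tprod(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x l x n3): DFT along the
    third dimension, frontal-slice matrix products, then inverse DFT."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], A.shape[2]), dtype=complex)
    for k in range(A.shape[2]):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))
```

When n3 = 1 the DFT is the identity, so `tprod` reduces to ordinary matrix multiplication, matching the remark above.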
Definition 2 (Conjugate transpose and orthogonality of a tensor). Given a tensor , its conjugate transpose is , obtained by conjugate-transposing each frontal slice of and then reversing the order of the transposed frontal slices 2 through . Given a tensor , if it satisfies the orthogonality condition, then , where denotes the t-product.
Definition 3 (Unit tensor and f-diagonal tensor). Given a unit tensor of size , its first frontal slice is a unit matrix of size , and all other frontal slices are zero. Given a tensor , if each of its frontal slices is a diagonal matrix, then this tensor is called an f-diagonal tensor.
Theorem 1 (t-SVD [26]). For a 3-way tensor , there exists a tensor singular value decomposition, where and are orthogonal tensors and is an f-diagonal tensor. According to (4), the t-SVD can be computed efficiently via matrix singular value decompositions in the Fourier domain: each frontal slice can be decomposed independently, . Applying this property, we obtain in the Fourier domain and finally use the inverse discrete Fourier transform to recover . The tensor singular value decomposition is shown in Figure 1.
The decomposition of a third-order tensor through the tensor t-product is called the t-SVD, and the algorithm is summarized as follows:

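The algorithm box did not survive extraction; as a hedged sketch, the standard t-SVD procedure (slice-wise SVDs in the Fourier domain) can be written as below. The factors returned here are Fourier-domain slices; production implementations additionally exploit the conjugate symmetry to halve the work and return real tensors:

```python
import numpy as np

def tsvd(X):
    """t-SVD sketch: SVD each Fourier-domain frontal slice independently.
    Returns Fourier-domain factors (Uf, Sf, Vhf); taking the inverse DFT
    of each would give the tensors of Theorem 1."""
    n1, n2, n3 = X.shape
    Xf = np.fft.fft(X, axis=2)
    r = min(n1, n2)
    Uf = np.empty((n1, r, n3), dtype=complex)
    Sf = np.zeros((r, r, n3), dtype=complex)
    Vhf = np.empty((r, n2, n3), dtype=complex)
    for k in range(n3):
        u, s, vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        Uf[:, :, k], Vhf[:, :, k] = u, vh
        np.fill_diagonal(Sf[:, :, k], s)  # f-diagonal slice of singular values
    return Uf, Sf, Vhf
```

Multiplying the three factors slice by slice and applying the inverse DFT reconstructs the original tensor exactly.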
Definition 4 (Tensor tubal rank and average rank [21]). Given a tensor with t-SVD , the tubal rank of , denoted , is defined as the number of nonzero tubes of ; i.e., Given a tensor , its average rank, denoted , is defined as follows:
Definition 5 (Spectral norm and nuclear norm). For a given tensor , its tensor spectral norm is defined as .
For the tensor nuclear norm, assuming the t-SVD of the tensor is , the tensor nuclear norm is defined as the sum of the tensor singular values: However, the tensor nuclear norm is determined only by the first frontal slice , which differs from the matrix nuclear norm. Note that is induced by the t-product and is the dual norm of the tensor spectral norm. is the convex envelope of the tensor average rank, and using (3), the following relation (11) can be obtained:
Theorem 2 (Tensor singular value thresholding [27]). Given a tensor , its t-SVD is .
Given parameters and , the tensor singular value thresholding is defined as in (12). For any > 0 and , tensor singular value thresholding is related to the nuclear norm as follows:
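Operationally, the tensor singular value thresholding of Theorem 2 soft-thresholds the singular values of every Fourier-domain frontal slice. A sketch follows; note that scaling conventions for the threshold vary between papers, so τ here is treated as the per-slice threshold:

```python
import numpy as np

def tsvt(X, tau):
    """Tensor singular value thresholding: shrink the singular values of each
    Fourier-domain frontal slice by tau, then map back to the real domain."""
    Xf = np.fft.fft(X, axis=2)
    Yf = np.empty_like(Xf)
    for k in range(X.shape[2]):
        u, s, vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        Yf[:, :, k] = (u * np.maximum(s - tau, 0.0)) @ vh  # soft thresholding
    return np.real(np.fft.ifft(Yf, axis=2))
```

Sanity checks: `tsvt(X, 0)` returns X unchanged, while a very large τ maps X to the zero tensor.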
3.2. Tensor Completion Model
The tensor completion framework based on t-SVD was first proposed by Zemin et al., who introduced the tensor average-rank minimization problem, which can be regarded as a high-dimensional extension of matrix completion. Similar to the classical HaLRTC model, it aims to accurately fill the positions of unobserved elements using the correlation information inside the data. Experiments on image and video processing show that low-rank tensor completion is numerically superior to matrix completion and other tensor completion methods. However, the t-SVD-based LRTC framework has no precedent for spatiotemporal traffic data; therefore, partially observed spatiotemporal traffic data are constructed as a tensor , and the concept of tensor average rank is applied for modeling and analysis as follows, where is the tensor to be recovered, is the observation tensor with missing elements, and is the average-rank minimization problem for the completion tensor. The constraints require that, while minimizing the rank, the observed entries of the original tensor equal the entries at the corresponding positions of the completed tensor , where is the index set of observed entries.
For an arbitrary tensor , denotes the set of positions of observed entries and denotes the set of positions of unobserved entries; the two projection operators and are complementary, .
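With a boolean observation mask standing in for Ω, the two complementary projection operators can be sketched as follows (the function names are ours):

```python
import numpy as np

def p_obs(X, mask):
    """P_Omega: keep the observed entries (mask True), zero out the rest."""
    return np.where(mask, X, 0.0)

def p_miss(X, mask):
    """Complementary projection onto the unobserved entries."""
    return np.where(mask, 0.0, X)
```

Adding the two projections, `p_obs(X, mask) + p_miss(X, mask)`, reconstructs X, mirroring the complementarity stated above.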
The rank minimization problem in (14) for tensor completion is NP-hard and computationally intractable. According to Definition 5, the convex envelope of the average rank is the tensor nuclear norm ; therefore, the tensor nuclear norm is used to replace the tensor average-rank minimization problem, as the convex envelope of average-rank minimization, where means minimizing the nuclear norm of the tensor , thus transforming the original problem into a solvable form:
4. WLRTCP Model
4.1. p-Shrinkage Norm
First, we recall the definition of the matrix p-shrinkage norm [28]: given a matrix , as a map , the proximal operator of the p-shrinkage norm is , and the p-shrinkage norm is defined as follows, where denotes the p-shrinkage norm of the matrix, denotes the th singular value of , and the singular values of are ordered as .
Extending the p-shrinkage norm to the tensor model, for a tensor , the p-shrinkage norm under the tensor singular value decomposition can be defined as follows:
Corollary 1 (Optimal solution). For any , , and , consider the problem: Following the generalized singular value thresholding [29], define the threshold operator of the p-shrinkage norm as ; then is a thresholding of each frontal slice of the tensor, and the whole process can be done in the Fourier domain, so we have , where denotes the tensor singular values of , and for any , the additive function is defined as . The p-shrinkage norm preserves the features carried by large singular values: it shrinks larger singular values less while penalizing smaller ones more heavily, and can be seen as another form of threshold shrinkage.
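For intuition, one common form of the p-shrinkage operator on nonnegative singular values, following Chartrand [28], is sketched below; for p = 1 it reduces to ordinary soft thresholding, while for p < 1 the shrinkage amount decreases as the singular value grows, exactly the behavior described above:

```python
import numpy as np

def p_shrink(s, tau, p):
    """Chartrand-style p-shrinkage of nonnegative singular values:
    max(s - tau**(2 - p) * s**(p - 1), 0). For p = 1 this is soft thresholding."""
    s = np.asarray(s, dtype=float)
    base = np.where(s > 0, s, 1.0)  # avoid 0**(p - 1) blowing up for p < 1
    return np.maximum(s - tau ** (2.0 - p) * base ** (p - 1.0), 0.0)
```

Applying `p_shrink` to the singular values of each Fourier-domain frontal slice, in place of plain soft thresholding, gives the threshold operator of the corollary.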
Based on the inference in [23], when , is a nonconvex envelope of the tensor average rank within the unit ball of the spectral norm, and it is tighter than the tensor nuclear norm; we have .
4.2. WLRTCP Model
In the original low-rank tensor completion framework based on t-SVD, as in (14), the nuclear norm of the input tensor is related only to its first frontal slice, and the lateral and horizontal slices are not processed; however, a frontal slice contains data information in only two dimensions, which runs counter to our motivation for tensor modeling. Thus, to avoid the model's dependence on the data input direction and to address the problem that one-way slicing does not fully utilize the correlation information within the data, inspired by Liu et al.'s use of multiple TNNs in place of the tensor rank minimization problem, we take the weighted nuclear norm over the three slicing directions in place of the tensor average-rank minimization, reducing the influence of the data input direction as much as possible. Define three tensors [30] with sizes as shown in Figure 2: , , ; weighting these three tensors in place of the original tensor is intended to retain the correlation information of the data channels in each direction and improve completion accuracy. We therefore weighted the original tensor completion model and optimized its parameters to rebuild the model. For the original problem (14), the model can be rewritten as follows:
The weights sum to 1. For the external three-way weighting, the tensor nuclear norm is taken in three directions, and the nuclear norm in each direction is given as follows, where are the corresponding block-diagonal matrices obtained from the tensor in each direction by (2).
Adding the idea of the p-shrinkage norm, the model is rebuilt and defined as WLRTCP. To reduce the dependence effect, three third-order auxiliary tensor variables and an additional set of constraints are introduced, transforming the above problem into a tractable form as follows:
To solve this optimization problem, a straightforward and widely used approach is the alternating direction method of multipliers (ADMM) [31]. ADMM is mainly applied to large-scale problems and to optimization problems whose objective contains multiple nonsmooth terms; the general idea is to decompose the large global problem into multiple smaller, easier-to-solve local subproblems and to obtain the global optimum by coordinating the solutions of the subproblems. Define the augmented Lagrangian function as follows, where is the Lagrange multiplier, is the penalty parameter for the added constraint, and the auxiliary variables are used for the dual update under ADMM. Following the ADMM framework, the problem is decomposed into local subproblems, and the optimal solution is approached by alternating updates, fixing the other variables at each step. The order of the iterative updates is as follows: (1) First update , fixing the parameters and ; is the tensor singular value thresholding decomposition, where . (2) Update , fixing the parameters and ; adding the constraint , we have (3) Finally, solve for :
Set the number of iterations, update each iteration according to the above steps, and seek the optimal solution of the model. Thus, the steps of the WLRTCP algorithm are as follows:

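The algorithm box is missing from the extracted text; as a rough, hedged sketch of the iteration (using plain soft thresholding, i.e., the p = 1 special case, equal weights, and a rotation of the tensor so that each direction in turn supplies the frontal slices — parameter names and defaults are ours, not the paper's):

```python
import numpy as np

def svt_fourier(T, tau):
    """Soft-threshold the singular values of each Fourier-domain frontal slice."""
    Tf = np.fft.fft(T, axis=2)
    for k in range(T.shape[2]):
        u, s, vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        Tf[:, :, k] = (u * np.maximum(s - tau, 0.0)) @ vh
    return np.real(np.fft.ifft(Tf, axis=2))

def wlrtc_sketch(Y, mask, alpha=(1/3, 1/3, 1/3), rho=1.0, rho_max=1e5,
                 gamma=1.05, n_iter=200, tol=1e-6):
    """ADMM-style iteration: three auxiliary tensors M[i], one per slicing
    direction, with dual tensors Q[i]; observed entries are re-imposed
    after every averaging step."""
    X = np.where(mask, Y, 0.0)
    M = [X.copy() for _ in range(3)]
    Q = [np.zeros_like(X) for _ in range(3)]
    for _ in range(n_iter):
        X_old = X.copy()
        for i in range(3):
            # rotate so that direction i provides the frontal slices
            R = np.moveaxis(X + Q[i] / rho, i, 2)
            M[i] = np.moveaxis(svt_fourier(R, alpha[i] / rho), 2, i)
        X = sum(M[i] - Q[i] / rho for i in range(3)) / 3.0
        X = np.where(mask, Y, X)          # keep observed entries fixed
        for i in range(3):
            Q[i] += rho * (X - M[i])      # dual update
        rho = min(gamma * rho, rho_max)
        if np.linalg.norm(X - X_old) <= tol * max(np.linalg.norm(X_old), 1.0):
            break
    return X
```

The sketch treats all three directions symmetrically; the paper's model additionally tunes the weights and uses the p-shrinkage operator in place of soft thresholding in the singular-value step.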
Throughout the solution process, the influence of noise is not considered, and the WLRTCP model only considers the reconstruction accuracy for the given data. Like most statistical learning models, it improves the recovery accuracy of traffic flow data by iterating on the low-rank structure of the observed data, so no data crossover occurs during transfer, and it is suitable for most low-rank traffic flow data restoration problems.
5. Experiment
In this section, experiments are conducted on the WLRTCP model using publicly available spatiotemporal traffic datasets to verify the conjectured influence of the data input direction and to evaluate the algorithm's efficacy and generalization capability.
5.1. Data Set Settings
Experiments are conducted using two publicly available datasets, both collected from real spatiotemporal traffic systems; experiments on real datasets can verify the applicability of the algorithm in real scenarios. Both datasets have the same tensor pattern, “location/sensor × day × time,” differing only in tensor size. (1) An urban traffic dataset for Guangzhou, China, covering 61 days from August 1, 2016, to September 30, 2016, at 10-minute intervals, consisting of 214 anonymous road segments (mainly urban highways and arterial roads). According to the spatiotemporal attributes, a third-order tensor of size 214 × 61 × 144 is established, whose dimensions are road segments, dates, and time windows, respectively. For simplicity, this dataset is referred to as “G.” (2) Seattle freeway traffic speed data collected by Cui et al. [32]. This dataset contains the freeway traffic speed in Seattle, USA, for the whole of 2015 from 323 loop detectors at a 5-minute resolution. A subset for January (the 4 weeks from January 1 to January 28) is selected as experimental data, and tensor data of size 323 × 28 × 288 are built according to the spatiotemporal attributes, where the dimensions are road segments, dates, and time windows, respectively. For simplicity, this dataset is referred to as “S.”
Constructing the traffic data as a tensor with the pattern links/sensors × dates × time windows ensures that the tensor data are sufficiently low-rank on each slice, maximizing the use of internal correlation information and thus improving recovery accuracy, where the link dimension represents the spatial-level features of the traffic data and the date and time-window dimensions represent the temporal-level features.
5.2. Baseline Model
We compare the WLRTCP model with the following baseline models:
HaLRTC [14]: high-accuracy low-rank tensor completion, the most classical low-rank tensor completion (LRTC) model, based on minimizing the tensor nuclear norm and solved within the alternating direction method of multipliers (ADMM) framework.
BGCP [16]: Bayesian Gaussian CP decomposition, a fully Bayesian tensor decomposition model that uses Markov chain Monte Carlo to learn the latent factor matrices (low-rank structure).
CP_ALS [33]: CP decomposition, a classical tensor completion model solved by alternating least squares and gradient descent, which is more accurate than the original solution method.
TRMF [34]: temporal regularized matrix factorization, a temporal matrix factorization model that applies a multiple autoregressive (AR) process to model the underlying temporal factors.
For the selection of baseline models, two data construction methods are considered. For matrix-based completion, the TRMF model is selected for comparison, with the data arranged as “location/sensor × time”; for tensor-based completion, the tensor data are constructed as “location/sensor × day × time,” and the classical CP_ALS, HaLRTC, and BGCP models are selected for comparison. In addition, since the original t-SVD model does not consider the directionality of the input tensor slices, it is also compared with the proposed model to show the importance of spatiotemporal directional information.
5.3. Experimental Setup
The datasets used in the experiments are almost complete, and comparing completion accuracy requires both the true and completed values to compute the metrics; therefore, when conducting the experiments, entries are removed from the data according to the set missing scenario and missing rate, and the missing values are then filled in by the completion algorithm. Completion accuracy is judged by comparing the values of MAPE (%) and RMSE, which are defined in (29).
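The formulas in (29) did not survive extraction, but both metrics are standard; a sketch, where MAPE is computed over nonzero ground-truth entries to avoid division by zero:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (%), over nonzero ground-truth entries."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    m = y_true != 0
    return 100.0 * np.mean(np.abs((y_true[m] - y_pred[m]) / y_true[m]))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

In the experiments, both metrics are evaluated only on the entries that were artificially removed, comparing the completed values against the held-out ground truth.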
In real life, traffic data are missing in two ways: random missing (RM) and nonrandom missing (NM). The RM case is aimless, mostly caused by unstable signals during communication and transmission, while in the NM case the data are corrupted in a correlated way, mostly caused by sensor damage. Setting these two missing scenarios allows a better evaluation of the performance and effectiveness of the different models.
Taking the two publicly available datasets as examples, a link is randomly selected to show the data sparsity within a day under 30% and 60% missing spatiotemporal traffic data, as shown in Figure 3.
5.3.1. Parameter Setting
We follow the setup of the original LRTC framework. For the iteration step size , the initial value is set to 1e-8/1e-9, the maximum value of is set to 1e5, and is updated by at each iteration; for the weighting parameters , the parameters sum to 1, with ; is set to ; is set to 0.2; for the convergence condition, is used to determine whether the algorithm has converged; the convergence tolerance is set to 1e-4; and the number of iterations is set to 200, which guarantees convergence of the model.
5.3.2. Experimental Scheme
According to the actual collection of traffic data, two groups of experiments are set up, in the RM and NM scenarios. In both missing scenarios, the missing rates of G and S are set to 20%–80%. The first experiment judges the influence of the data input direction (slices in different directions) on completion accuracy by comparing three input directions on the same data in the same scenario, to verify our conjecture. The second conducts accuracy comparison experiments against the baseline models in the same scenarios and analyzes the performance of the model. In both sets of experiments, the smaller the values of RMSE and MAPE (%), the higher the completion accuracy of the algorithm and the stronger its accuracy advantage and generalization ability.
5.4. Experimental Results
Experiment 1. Influence of data input direction (slicing) on completion accuracy:
In this experiment, with the above parameter settings, we run the original one-way model (14) with data input directions , , and , so that the completion accuracy of the traffic data corresponds to frontal, lateral, and horizontal slicing; in the experiment these are denoted direction 1, direction 2, and direction 3, respectively. Figure 4 shows the comparison of slices in the three directions under different missing rates and missing scenarios.
As can be seen, in most cases the accuracy is highest when the data input direction is direction 1, and the completion accuracy for direction 2 is close to that of direction 1. In the NM scenario on the S dataset, direction 2 outperforms direction 1 at some missing rates, and direction 3 is the worst overall. For this reason, in the subsequent WLRTCP model we incorporate the correlation information of the other directions through direction weighting to improve accuracy. These experiments verify our conjecture that different input directions of the traffic data significantly influence the subsequent completion accuracy.
Experiment 2. Comparison of completion accuracy under different missing scenarios and missing degrees:
The missing rates of G and S are set to 20%–80%, respectively. The completion accuracy of the algorithms is compared under different scenarios and missing degrees, and the performance of the WLRTCP model is evaluated against the baseline models.
For the baseline models, to represent the completion performance of the matrix factorization model (TRMF) and the tensor decomposition models (CP_ALS, BGCP, and HaLRTC) on the spatiotemporal datasets, the same settings as in previous work are followed: all matrix and tensor decomposition models are configured with the same rank and number of iterations, with the rank of CP_ALS, BGCP, and TRMF set to 50 in the random missing case and 10 in the nonrandom missing case. The comparison results are shown in Tables 1 and 2.
5.4.1. Experimental Analysis in the RM Scenario
In the random missing case, on the G and S datasets, the RMSE/MAPE (%) values show that the completion performance of WLRTCP is significantly better than the other classical baseline models at all missing rates (20%–80%). As the missing rate increases, the advantage of the WLRTCP model over HaLRTC and BGCP becomes more obvious, while CP_ALS and TRMF remain relatively stable, with an accuracy decline similar to that of WLRTCP. Overall, the WLRTCP model performs best in the RM case, and in extreme cases its completion is more stable than that of the baselines, with strong generalization and robustness.
5.4.2. Experimental Analysis in the NM Scenario
In the nonrandom missing case, on the G and S datasets, the completion accuracy of WLRTCP is also consistently better than the other baselines. When the missing rate is 20%–60%, the advantage of WLRTCP is not obvious, but as the missing rate increases to 70%–80%, its accuracy advantage begins to show, and most of the baselines (CP_ALS, HaLRTC, and BGCP) exhibit distortion under extreme missingness. The completion accuracy of TRMF is also high under extreme missingness on the G dataset, but its generalization is lower than that of the WLRTCP model, and it performs poorly in the extreme case on the S dataset. WLRTCP uses the p-shrinkage norm to suppress the less informative singular values, retain the dominant data features, and compress enough correlation information, so it can still ensure high accuracy in the extreme nonrandom missing case.
Overall, under the RM and NM scenarios on both datasets, the WLRTCP model performs best on the whole spatiotemporal traffic data recovery problem compared with the baselines, with the highest recovery accuracy and generalization capability, and its advantage is more significant under extreme data missingness. The three-way weighting approach extracts correlation information for each slicing direction of the tensor data, and the p-shrinkage norm in each direction replaces the nuclear norm, providing a tighter envelope for the rank minimization problem and compressing more internal correlation information; these two optimizations make the completion accuracy of the WLRTCP model higher than that of most baseline models.
The completion results of the two datasets at a 60% missing rate, under the two missing scenarios, are shown in Figure 5.
6. Conclusions
In this paper, we proposed the WLRTCP model, which reduces the dependence of the completion accuracy on the data input direction by direction weighting, extracts the data correlation information, improves the completion accuracy by using the p-shrinkage norm as a surrogate for the tensor average rank, and solves the model by the alternating direction method of multipliers. Experiments are conducted on two publicly available datasets under random and nonrandom missing scenarios. The data input direction experiment shows the necessity of weighting, and the subsequent comparison of completion accuracy with current classical models shows that the proposed weighted model achieves good results in most scenarios.
Data Availability
The experimental datasets are publicly available: the urban traffic dataset from Guangzhou, China, and the freeway traffic speed data from Seattle, USA. The tensor data used to support the findings of this study have been deposited in the transdim repository (https://github.com/xinychen/transdim).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Authors’ Contributions
Mr. Wu and Ms. Zhao were the main authors of the article. Mr. Wu put forward innovative ideas and jointly deduced them with Ms. Zhao and Mr. Hu. Among them, Mr. Wu and Ms. Zhao contributed the most. Then, Professor Zhang was the mentor, and he guided our creation together with Mr. Zeng. During the experiment, Mr. Li pointed out the problems in the representation of data features and the setting of real data scenarios.