Abstract

Data injection attacks in a cyber-physical system aim to manipulate a number of measurements in order to alter the estimated real-time system states. Many researchers have recently focused on how to detect such attacks. However, most existing detection methods do not work well for nonlinear systems. In this paper, we present a compressive sampling methodology to identify the attack, which allows determining how many and which measurement signals are attacked. The methodology exploits the sparsity of the attack: only a small fraction of the measurements are corrupted at any given time. In general, our methodology can be applied to both linear and nonlinear systems. Experimental testing, which includes realistic load patterns from NYISO with various attack scenarios in the IEEE 14-bus system, confirms that our detector identifies the attacks with high precision and recall.

1. Introduction

A cyber-physical system (CPS) is a dynamical system that integrates computational components (i.e., real-time operations) with physical components (i.e., hardware facilities). Examples of CPS include large-scale distributed systems, such as smart grids, transportation networks, railway control systems, and medical monitoring systems. The design of a CPS involves various disciplines, such as control engineering, software engineering, mechanics, and networks. In particular, control engineering relies on a communication network for transmitting sensor data (measurements) so that the system operator can monitor the production process in real time. Among the control schemes, a bad data detector (BDD) is applied to detect whether there exists a disruption of sensor data caused by generic malfunctions or malicious attacks. The classical BDD technique utilizes the “residual principle,” which calculates the difference between the observed readings and the readings computed from the estimated system states. When an attack is injected into the system, the BDD removes those readings (collected from the sensors) whose residuals are larger than a threshold.

With the increased vulnerabilities exposed by recent discoveries of system malware, concerns about the security of CPS are growing. In 2011, a piece of malware known as Stuxnet [1] successfully penetrated the networks of Iran’s uranium enrichment infrastructure via programmable logic controllers. This instance shows that it is possible for an attacker to introduce errors into physical readings. Inspired by this attack strategy, a class of attacks named data injection attacks has been proposed in recent years, which can affect the system control algorithms and thus lead to abnormal operations [2, 3]. Hence, sufficient attention should be paid to detection techniques against this attack, which is easy to implement for strong adversaries who are knowledgeable about the targeted systems.

To counter this attack, existing works focus on the detection of data injection attacks and the protection of nonlinear measurements [4, 5]. Detectors utilizing the sparsity and low rank of the system topology are proposed in [6–8]. Greedy and game-theoretic methods have been used to optimize the placement of devices [9] so as to lower the possibility of constructing data injection attacks. Applying machine learning techniques to perform the classification is proposed in [10], whose authors introduce a “first difference aware” machine learning (FDML) classifier to detect cyber attacks. A graph theory-based algorithm is proposed in [11] to determine which measurement signals an attacker will alter. However, we notice that all detection models except [11, 12] are developed in a constrained setting, by assuming that the functions from system states to measurements are linear. This assumption is too stringent for some nonlinear systems, for example, the alternating current (AC) model in power grids.

This paper investigates an alternative approach to detecting data injection attacks in nonlinear systems. We propose a detector framework named F-DDIA that reconstructs the initial states of the plant from the corrupted observations, which leads to an error correction problem. In particular, we notice that, due to the nature of data injection attacks, only a small fraction of the observations are expected to be attacked at a given time instance. Thus, we formulate the error correction problem as a sparse optimization problem, which can be solved with the general $\ell_1$-minimization technique. Among $\ell_1$-minimization techniques, we apply the Douglas-Rachford splitting algorithm [13]. Furthermore, we employ the “divide-and-conquer” principle to construct a compressive sensing model of a linear subspace, which is of interest in general mathematical settings.

To validate and illustrate our algorithm, we use real-world CPS power grids as a case study. In particular, we use the data injection attacks model proposed in [2], where the attacks are directed by injecting false data into the sensors. Simulations based on IEEE 14-bus test systems validate the effectiveness of our methodology. The results show that the proposed algorithm can efficiently identify the data injection attacks (i.e., with high precision and recall values) and recover the initial system states (i.e., with small average phase error).

The rest of this paper is organized as follows. Section 2 presents the system model in a nonlinear system, including preliminaries related to a broad class of attacks. Section 3 states the problem and derives a theoretical justification of the efficacy of the security algorithm in a general cyber-physical system model. Section 4 analyzes the performance of the proposed approach through simulations. Section 5 gives concluding remarks.

2. Preliminaries

2.1. System Model and Bad Data Detector

A cyber-physical system is usually described by the following widely adopted discrete-time nonlinear dynamical model:where at time : is the system state; is the bounded input vector; is the measurement vector (data collected by the sensors); denotes the state noise (i.e., Gaussian with known statistics); and denotes measurement errors. Here the matrix is a constant matrix, denotes the state transition function and denotes the topology of the system, which are the nonlinear functions with respect to the states. The process of estimating system states from the measurements is called state estimation.

In traditional weighted least squares (WLS) state estimation, the system states are valid only if the measurement residual vector is less than a threshold [14],where is the estimated system state after the process of state estimation. Specifically, the presence of bad measurements is inferred if , where is a chosen identification threshold. Upon detection of bad data, two kinds of methods, named the largest normalized residual test () and hypothesis testing identification (HTI) method, are widely used to identify whether the measurements contain bad data.
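As an illustration of the residual principle, the following minimal Python sketch runs a residual test on a small linear measurement model. It is a toy under stated assumptions, not the detector evaluated in this paper: the linear model, the matrix `H`, the noise level `sigma`, the threshold `tau`, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def wls_estimate(H, z, sigma):
    """Weighted least squares estimate for the assumed linear model z = H x + e."""
    W = np.diag(1.0 / sigma**2)            # weights = inverse noise variances
    G = H.T @ W @ H                        # gain matrix
    return np.linalg.solve(G, H.T @ W @ z)

def bad_data_detected(H, z, sigma, tau):
    """Return (flag, norm): flag is True when the normalized residual norm exceeds tau."""
    x_hat = wls_estimate(H, z, sigma)
    r = (z - H @ x_hat) / sigma            # noise-normalized residual
    norm = float(np.linalg.norm(r))
    return norm > tau, norm

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 3))                # hypothetical: 8 sensors, 3 states
x_true = np.array([1.0, -0.5, 0.25])
sigma = np.full(8, 0.01)
z_clean = H @ x_true + sigma * rng.normal(size=8)

z_bad = z_clean.copy()
z_bad[2] += 1.0                            # gross error on a single sensor

tau = 10.0                                 # hypothetical identification threshold
flag_clean, norm_clean = bad_data_detected(H, z_clean, sigma, tau)
flag_bad, norm_bad = bad_data_detected(H, z_bad, sigma, tau)
```

In this toy setting the clean measurements pass the test, while the single gross error inflates the residual norm well beyond the threshold.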

2.2. Data Injection Attack

Data injection attacks are also commonly known as false data injection attacks [2] or data framing attacks [3, 15], in the sense of the following definition.

Definition 1. A vector is called a -data injection attack if there exists an index set , where is the set of manipulated measurements and , such that(i);(ii);(iii).

Implementing this class of attack requires the attacker to know either the measurement information () or the topology configuration (). Specifically, a data injection attack can be written in the form ofwhere is the injected false measurement data. There are many ways to generate this type of attack. For example, if is available to the attacker, the attack can be constructed in the following form (namely, a false data injection attack in a linear system):where is the error injected on the system state and is the Jacobian matrix. However, to implement this attack, the attacker needs to take control of at least sensors, where .
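To make the linear construction concrete, the sketch below builds an attack as the Jacobian times a chosen state error, in the spirit of [2], and checks that the least-squares residual is unchanged while the state estimate shifts by exactly the injected error. The matrix `H`, the vector `c`, and the helper names are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(8, 3))          # hypothetical Jacobian: 8 meters, 3 states
x_true = np.array([1.0, -0.5, 0.25])
z = H @ x_true + 0.01 * rng.normal(size=8)

def ls_estimate(H, z):
    """Ordinary least-squares state estimate."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return x_hat

def residual_norm(H, z):
    """Norm of the measurement residual after state estimation."""
    return float(np.linalg.norm(z - H @ ls_estimate(H, z)))

c = np.array([0.2, 0.0, -0.1])       # error the attacker wants on the state
a = H @ c                            # injected measurement vector
z_attacked = z + a

shift = ls_estimate(H, z_attacked) - ls_estimate(H, z)
r_before = residual_norm(H, z)
r_after = residual_norm(H, z_attacked)
```

Because the injected vector lies in the column space of `H`, a residual test sees the same residual before and after the attack, which is why residual-based detectors miss this class of attack.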

2.3. Measurement Dynamics

We can use the polynomial regression approach to fit the measurement dynamics,where denotes the dynamics of the measurements. Furthermore, we define as the th corrupted measurement at time . That is, a polynomial regression model, which expresses the dynamics of the th measurement can be given as follows:where is called the degree of the polynomial and . We denote . As can be expressed in matrix form in terms of a response vector and a parameter vector , where , we can rewrite as a system of linear equations:where . Thus, the dynamical matrix can be estimated as
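The least-squares step behind this polynomial fit can be sketched in Python as follows. The sketch assumes a single measurement series sampled at known times; the names `fit_polynomial`, `t`, and `beta_true` and the synthetic quadratic are hypothetical and are not the paper's notation or data.

```python
import numpy as np

def fit_polynomial(t, y, degree):
    """Least-squares fit of y ~ sum_k beta_k * t**k; returns beta, lowest order first."""
    T = np.vander(t, N=degree + 1, increasing=True)   # design matrix [1, t, t^2, ...]
    beta, *_ = np.linalg.lstsq(T, y, rcond=None)
    return beta

t = np.linspace(0.0, 1.0, 25)            # hypothetical normalized time stamps
beta_true = np.array([2.0, -1.0, 0.5])   # hypothetical quadratic dynamics
y = np.vander(t, N=3, increasing=True) @ beta_true
beta_hat = fit_polynomial(t, y, degree=2)
```

On noise-free data the fit returns the generating coefficients; with noisy or partially attacked samples it returns the least-squares compromise.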

3. Our Methodologies

In this section, we formulate the detection problem as an error correction problem. We then describe and explain why an $\ell_1$-norm minimization technique (including Douglas-Rachford) can be used to solve the detection problem.

3.1. Sparse Optimization Problem Formulation

In this paper, we consider the scenario that an attacker is limited to the resources of sensors and possesses the knowledge of system topology , as well as the historical measurements . Denote as the initial measurements (without attacks) in time base. The obtained temporal observations can be expressed aswhere . Remark that, due to the property of data injection attacks, only a small fraction of the observations are supposed to be attacked at a given time instance. Hence, noticing the sparsity of vector , the detection problem can be converted towhere is the maximum number of the meters that can be compromised. Under certain conditions which are explained above, we will focus on the problem of recovering the sparse vector from . And we denote the optimal solution of problem (10) as .

3.2. Subproblem Formulation

In the rest of this paper, we define the matrices , , and . We further define the matrices , , and in the following forms:We can further obtain the following formulation among , , and :

We denote by the columns of the matrix . Hence, problem (10) is equivalent toNote that ; we can further solve problem (13) by seeking the locally optimal choice for each with the hope of finding a globally optimal solution ():The solution of this subproblem (14) will be given in Section 3.3. After solving the above optimization problems, the optimal solution is checked against the following constraints: For any , if , an attack exists; otherwise, no data injection attack exists.

3.3. Solving Subproblem by $\ell_1$-Minimization

Recall that the dynamical coefficients are obtained (by polynomially fitting in Section 2.3). In view of adversary, can be rewritten as Then we use the notation as follows: where the matrices and are

In this paper, we use the approximation . The reason for this approximation is that the difference of and is For example, when . Since the values of are small , . Our experiments support this approximation. Then, in (17) can be updated asWe can further take the QR decomposition of [16]:where , , , , and is orthogonal. Multiplying (17) by , we obtainBy using the second block row, we can solve the following problem to obtain the sparse solution , instead of :Hence, the problem is reduced to reconstructing a sparse vector from the observations . Problem (14) is equivalent to the following problem:where . Solving problem (24) is in general NP-hard since it requires a search over all subsets of columns of , a procedure which has exponential complexity. To overcome this problem, a frequently discussed approach considers a similar program in the $\ell_1$-norm:This operation is common and can be found in [13, 17, 18]. Throughout this paper, we use the Douglas-Rachford splitting algorithm [13] to solve the above $\ell_1$-minimization.
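The two steps described above, annihilating the state-dependent part with a QR decomposition and then recovering the sparse error by $\ell_1$-minimization, can be sketched in Python as follows. This is a toy under stated assumptions rather than the F-DDIA implementation: the linear model `y = A x + e`, the problem sizes, the step parameter `gamma`, the iteration count, and all names are hypothetical.

```python
import numpy as np

def soft_threshold(v, gamma):
    """Proximal operator of gamma * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def douglas_rachford_bp(F, b, gamma=1.0, iters=3000):
    """Approximately solve min ||e||_1 subject to F e = b by Douglas-Rachford splitting."""
    FFt_inv = np.linalg.inv(F @ F.T)

    def project(v):
        # Euclidean projection onto the affine set {e : F e = b}
        return v + F.T @ (FFt_inv @ (b - F @ v))

    s = np.zeros(F.shape[1])
    for _ in range(iters):
        e = project(s)
        s = s + soft_threshold(2.0 * e - s, gamma) - e
    return project(s)

rng = np.random.default_rng(0)
m, n = 60, 20                              # hypothetical: 60 observations, 20 unknowns
A = rng.normal(size=(m, n))
x = rng.normal(size=n)
e_true = np.zeros(m)
e_true[[5, 17, 42]] = [2.0, -3.0, 1.5]     # sparse attack on three observations
y = A @ x + e_true

# Full QR: the last m - n columns of Q span the orthogonal complement of range(A),
# so F @ A = 0 and F @ y = F @ e_true.
Q, _ = np.linalg.qr(A, mode="complete")
F = Q[:, n:].T
e_hat = douglas_rachford_bp(F, F @ y)
support_hat = np.flatnonzero(np.abs(e_hat) > 0.5)
```

In this toy instance the recovered support matches the three attacked observations. The number of iterations needed in practice depends on `gamma` and on the problem geometry.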

3.4. Theoretical Guarantee

In this paper, we are also interested in studying the theoretical conditions under which obtaining the solution of the problem is guaranteed. It is well known that an inverse problem of finding the solution to the compressive sensing problem involves mathematical questions on the existence, uniqueness, and stability of the solution. On the other hand, the equivalence of the solution between (13) and (25) is not very clear and proof may be needed. We therefore consider two questions for a given and signal : (i) uniqueness: under which conditions a possible sparsest solution is necessarily unique to problem (13)/(25)? and (ii) equivalence: under which conditions a sparse solution to problem (13) is also equivalent to the solution of problem (25)?

3.4.1. Uniqueness

As is described in Section 3.3, solving problem (24) requires exhaustive searches over all subsets of columns of . Actually, it is a combinatorial procedure in nature and has exponential complexity. Inspired by [7, 17], Theorem 3 provides a sufficient condition for a unique solution to problem (24). It guarantees obtaining a unique sparse vector (i.e., ) from the corrupted observations (i.e., ) for the minimization. We denote by the rows of the matrix . Before giving the theorem, we need to first introduce the following definition [17].

Definition 2 (see [17, Definition ]). Let be the matrix with the finite collection of vectors as columns. For every integer , we define the -restricted isometry constants to be the smallest quantity such that obeysfor all real coefficients .

The number measures how close the vectors are to behaving like an orthonormal system. In particular, for , we can have
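As a numerical illustration of Definition 2, the restricted isometry constant of a very small matrix can be computed by brute force over all column subsets. The sketch below is exponential in the subset size and is meant only to make the definition tangible; the function name and the example matrices are hypothetical.

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(F, S):
    """Brute-force S-restricted isometry constant of F (columns as vectors)."""
    n_cols = F.shape[1]
    delta = 0.0
    for size in range(1, S + 1):
        for T in combinations(range(n_cols), size):
            sv = np.linalg.svd(F[:, list(T)], compute_uv=False)
            # Extreme squared singular values bound ||F_T c||^2 / ||c||^2.
            delta = max(delta, sv[0] ** 2 - 1.0, 1.0 - sv[-1] ** 2)
    return delta

# Orthonormal columns: the constant is zero.
delta_identity = restricted_isometry_constant(np.eye(4), 2)

# Two unit-norm columns with inner product 0.6: the Gram matrix has
# eigenvalues 1.6 and 0.4, so the 2-restricted constant is 0.6.
F_pair = np.array([[1.0, 0.6],
                   [0.0, 0.8]])
delta_pair = restricted_isometry_constant(F_pair, 2)
```

The closer the constant is to zero, the more every small subset of columns resembles an orthonormal system.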

To see the relevance of to the error recovery problem, we consider the following theorem.

Theorem 3. In a cyber-physical system, let , , , , , and be specified as above. A sparse solution can be uniquely recovered from solving the optimization problem (13), if , and .

Proof. We first prove that if , there exists a unique to problem (24). Suppose for the sake of contradiction that the solution is not unique; then there exist two solutions . Thus, there exists at least one variable such thatwhere . Then we can haveBy construction is of size less than or equal to . Applying (27) and the hypothesis , we conclude that , contradicting the hypothesis that and are distinct.
Then we prove that is unique to problem (13). Given the proof that , or equivalently , can be uniquely obtained by solving problem (24) and , we conclude that is unique to the following problem:And given the condition that , we can conclude that is also unique to problem (13).

In the literature, considerable effort has been made to determine how sparse the desired corrected error must be for equivalence to hold. As we use $\ell_1$-minimization instead of $\ell_0$-minimization (to obtain the desired error), the conditions in the above theorem may not be guaranteed. Thus, Theorem 4 gives a general condition that guarantees a unique solution for the $\ell_1$-minimization problem.

Theorem 4. In a cyber-physical system, let , , and be specified as above. A sparse solution can be uniquely recovered from solving the optimization problemif, for all , we have and , where and are the support of vectors and , respectively.

Proof. We prove that given any and and , we can always uniquely recover from (31). Suppose for the sake of contradiction that the solution is not unique; then there exist two distinct solutions such that but . We use the vectors and instead of and , respectively.contradicting the hypothesis that . Therefore, we conclude that is unique to problem (25). Equivalently, is unique to the following problem:Furthermore, given the condition that , we conclude that is unique to problem (31).

In conclusion, Theorems 3 and 4 show that the hypothesis of our theorem holds provided that the sparse error can be uniquely corrected. Naturally, if the assumption does not hold, then neither does (13) or (31).

3.4.2. Equivalence

Next, we discuss the conditions under which it is theoretically possible to use $\ell_1$-minimization to obtain the sparse solution (or ) instead of $\ell_0$-minimization. We derive an algorithm for precisely verifying $\ell_0$-$\ell_1$ equivalence, using the following definition and proposition proposed in [19].

Definition 5 (see [19, Definition  ]). We define as the collection of all -dimensional faces of the -ball :where .

Proposition 6 (see [19, Proposition ]). In a cyber-physical system, let , , and be specified as above. For every and , the following implication holds:if and only if and , where number of columns of that are linearly independent.

Proof. See Proposition in [19].

Note that implication (35) is the condition that we want to verify. As we need to deal with high-dimensional matrices (e.g., ), we need to give asymptotic guarantees of equivalence, which is described in Proposition 6. In our experiments, it is confirmed that we can benefit from this equivalence, even when the matrices are in high dimensions.

4. Experimental Results

4.1. Case Study: Power Network

We use a real-world power grid as the case study. A state-space control model in a smart grid consists of buses connected to transmission lines. We use the IEEE 14-bus system as the test system [20]. Moreover, we use the real load data in year 2016 from the New York Independent System Operator (NYISO). The NYISO load data include the regions (namely, A-H). Similar to [12], the following procedures are used to estimate the 5-minute system state () using the load pattern from NYISO.
(1) Link each load bus of the IEEE 14-bus system to one region of NYISO using the following matrix: The first row of the matrix is the bus number of the IEEE 14-bus system and the second row represents the corresponding NYISO region index.
(2) Normalize the load data collected from NYISO to the initial real and reactive load of the corresponding IEEE 14-bus system. Due to the lack of reactive load information in the NYISO database, we use the direct current (DC) power flow model to estimate system states. This condition can be relaxed when the reactive load data is available.
(3) Add the normalized load data on the IEEE 14-bus system.
(4) Estimate the system state () from the solution of power flow analysis for benchmarking purposes. In this paper, we apply the Newton-Raphson algorithm for estimating .
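To illustrate how a DC power flow model maps bus loads to phase angles, the sketch below solves a three-bus toy network. It is not the IEEE 14-bus case and does not use MATPOWER or NYISO data; the line reactances, the injections, and the function name are hypothetical.

```python
import numpy as np

def dc_power_flow(n_bus, lines, injections, slack=0):
    """Solve the DC power flow equations for bus angles (radians), slack angle fixed at 0."""
    B = np.zeros((n_bus, n_bus))
    for i, j, x in lines:                  # (from bus, to bus, reactance in p.u.)
        b = 1.0 / x
        B[i, i] += b
        B[j, j] += b
        B[i, j] -= b
        B[j, i] -= b
    keep = [k for k in range(n_bus) if k != slack]
    theta = np.zeros(n_bus)
    theta[keep] = np.linalg.solve(B[np.ix_(keep, keep)], injections[keep])
    return theta, B

lines = [(0, 1, 0.1), (1, 2, 0.2), (0, 2, 0.25)]   # hypothetical network
P = np.array([0.0, -0.5, -0.3])    # loads at buses 1 and 2; the slack entry is ignored
theta, B = dc_power_flow(3, lines, P)
line_flows = {(i, j): (theta[i] - theta[j]) / x for i, j, x in lines}
slack_injection = float((B @ theta)[0])
```

The slack bus supplies the total load, and each new load sample in such a model yields a new vector of phase angles, that is, a new operating point.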

Similar to [12], we estimate operating points of the system state by adding the normalized 5-minute load data on the MATPOWER IEEE 14-bus case file [21]. In this paper, we use one-day NYISO data as the testing set. With one sample every 5 minutes, one day yields 288 operating points. So, we set to construct the F-DDIA method. Next, we prepare the attacked samples as follows. We let the parameter range from to in the IEEE 14-bus test system. For each , we simulate -specific meters to attempt the attack construction () with a randomly injected error . Thus, at most, a total of labeled samples, which includes attack samples and initial samples (without attacks), are prepared.

4.2. Parameters in Load Fitting

According to Section 2.3, the in (6) is the parameter of the measurement (load) dynamical model for the power grid system. We estimate by polynomial regression using data traces of . The historical load data from NYISO and the attack samples prepared in the previous section are used to construct the matrix (i.e., polynomial regression in order of ) in (6). The measurement dynamics at each time are estimated from the data of the 24 hours prior to that time. For example, to estimate the load dynamics at 0:05 am on Jun 30 in Zone F, the load data samples (which may contain attacks) from 0:05 am, Jun 29 to 0:00 am, Jun 30 are used.

We next consider which regression order is appropriate for fitting the dynamics of the system. The experimental results show that is a suitable regression order. As an increase of improves the load fitting accuracy at the cost of computation time, we will use in the rest of our experiments. Table 1 gives the regression results for predicting the dynamical model by using the load data on Jun 30, 2016.
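The trade-off between regression order and fitting accuracy can be explored with a short Python sketch such as the one below. It uses a synthetic load curve, not NYISO data, and the candidate orders, the noise level, and the function name are hypothetical.

```python
import numpy as np

def fit_rss(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(t, y, degree)
    return float(np.sum((np.polyval(coeffs, t) - y) ** 2))

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 288)     # 24 h of 5-minute samples, rescaled to [0, 1]
# Synthetic quadratic load with small noise (assumption made for illustration only).
load = 1.0 + 0.8 * t - 0.9 * t**2 + 0.01 * rng.normal(size=t.size)
rss = {d: fit_rss(t, load, d) for d in (1, 2, 3, 4)}
```

For this synthetic curve the residual drops sharply from order 1 to order 2 and only marginally afterwards, which is the kind of pattern that motivates stopping at a low order.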

Taking Zone F as an example, Figure 1 shows a quadratic polynomial fit of the load in Zone F with confidence bounds (the interval indicates that we have a chance that a new observation will fall within the bounds). We collect the hourly data to fit the model, where the blue “+” markers represent the actual hourly load and the green curve describes the fitted model.

4.3. Performance Metrics

When is calculated by our detector, we set the following rule to identify whether the system is attacked:where is the observation threshold when detecting data injection attacks. The parameter will be discussed later in this section. We denote the user-defined threshold when is identified as attacked. Then, we identify whether is attacked by aggregating the values of (). We predict as attacked (denoted as ) if the sum of is larger than the all-users-defined threshold , and secure (denoted as ) otherwise:
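One plausible reading of this two-level rule, sketched in Python under stated assumptions: an entry of the recovered error vector is flagged when its magnitude exceeds an observation threshold, and the sample is labeled attacked when the number of flagged entries exceeds a second threshold. The names `tau` and `kappa`, the use of the absolute value, and the example vectors are hypothetical and may differ from the exact rule used in the experiments.

```python
import numpy as np

def classify_sample(e_hat, tau, kappa):
    """Flag entries with |e_hat| > tau; label 1 (attacked) if more than kappa are flagged."""
    flags = (np.abs(e_hat) > tau).astype(int)
    label = int(flags.sum() > kappa)
    return label, flags

# Hypothetical recovered error vectors for one attacked and one secure sample.
label_a, flags_a = classify_sample(np.array([0.0, 0.9, 0.0, -1.2, 0.01]), tau=0.1, kappa=0)
label_s, flags_s = classify_sample(np.array([0.0, 0.02, 0.0, -0.03, 0.01]), tau=0.1, kappa=0)
```

Raising `tau` makes the detector less sensitive to small recovered errors, which trades missed attacks against false alarms.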

In smart grid networks, the major concern is not only the detection of attack cases but also that of the secure cases. In other words, after applying rule (38), we need high precision and recall for both classes in order to avoid false alarms. Therefore, we utilize precision and recall metrics, which are commonly used for classification tasks [10]. Specifically, as Table 2 defines, we denote by CA the number of attacked samples that are identified as attacked, by WA the number of secure samples that are identified as attacked, by CS the number of secure samples that are identified as secure, and by WS the number of attacked samples that are identified as secure.

In addition, the performance of the proposed detector can be measured by the precision and recall metrics:where and () indicate the precision and recall values for the class attacked (secure), respectively. Precision values give information about the decision performance of the algorithms among identified class. And recall values measure the degree of attack retrieval.
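Using the counts CA, WA, CS, and WS defined above, the standard precision and recall of each class can be computed as in the following sketch. The dictionary keys and the example counts are hypothetical.

```python
def precision_recall(CA, WA, CS, WS):
    """Per-class precision and recall from the four confusion counts."""
    return {
        "precision_attacked": CA / (CA + WA),  # identified as attacked and truly attacked
        "recall_attacked": CA / (CA + WS),     # attacked samples that were found
        "precision_secure": CS / (CS + WS),    # identified as secure and truly secure
        "recall_secure": CS / (CS + WA),       # secure samples that were found
    }

metrics = precision_recall(CA=90, WA=5, CS=95, WS=10)   # hypothetical counts
```

A false alarm (WA) lowers the precision of the attacked class and the recall of the secure class, while a missed attack (WS) lowers the other two values.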

4.4. Performance on Detecting Attacks

We first analyze the performance of the proposed algorithm against the attacks, which are made from a set of false data injection attacks when . In the experiments, we observe that the selection of threshold parameter does affect the precision and recall performances. Table 3 shows the comparison for different values. and increase as increases and remain when . In addition, and decrease as increases. Note that the precision value at is and the recall value at is lower than for class attacked. Thus, the optimal value should be in range . Note that the performance at is quite similar to that at ; thus we do not draw the performance at in Figures 2, 3, 4, and 5 to avoid unreadability.

The performance of different values for identifying attacked samples is compared in Figures 2 and 3, where . We observe that increases and decreases when increases. The precision value of attacked class is approximately when and . The recall value of the attacked class increases with rising values and is approximately when is larger than . Although the proposed algorithm at and may correctly detect the attacked samples as increases, the secure variables are incorrectly labeled as attacked and therefore give more false alarms.

Meanwhile, the performance of identifying secure samples is compared in Figures 4 and 5. Both values (precision and recall) of the secure class are high (i.e., near ). Summing up, the above experimental results show that if we choose the parameter , our methodology can efficiently detect the data injection attacks.

4.5. Performance on Recovering System States

In this part, we compare our detector with the residual-based approach in terms of recovering the initial system states. We first introduce how we evaluate the performance of an algorithm. In the IEEE 14-bus system, the state vector has bus voltage magnitudes and phase angles, where the phase angle of one bus is set as the reference. If the system is observable [14], the state vector can be represented as follows: , where is voltage magnitude and voltage angle at bus . Therefore, the average absolute phase error for bus , denoted as , can be described as follows:where is the number of testing samples; is the absolute phase error of the th bus at time under the th attack in the testing samples; is the recovered phase angle of the th bus at time under the th attack in the testing samples; and is the true phase angle of the th bus at time .
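A simplified version of this metric can be sketched in Python as follows. The sketch averages the absolute difference between recovered and true phase angles over samples for each bus and flattens the time and attack indices into a single sample axis, which is an assumption made for brevity; the array names and values are hypothetical.

```python
import numpy as np

def average_abs_phase_error(theta_rec, theta_true):
    """Per-bus mean absolute phase error; inputs have shape (n_samples, n_buses), in radians."""
    return np.mean(np.abs(theta_rec - theta_true), axis=0)

# Hypothetical true and recovered angles for two samples and three buses.
theta_true = np.array([[0.00, -0.10, -0.20],
                       [0.00, -0.12, -0.22]])
theta_rec = theta_true + np.array([[0.0, 0.01, -0.02],
                                   [0.0, -0.03, 0.02]])
err = average_abs_phase_error(theta_rec, theta_true)
```

The reference bus has zero error by construction, so only the remaining buses carry information about recovery quality.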

The proposed algorithm and the residual-based algorithm have been tested under various attack scenarios (i.e., ). Table 4 presents the results when and , respectively. First, we can see the superiority of our methodology compared with the residual-based algorithm. For example, when , the average phase error of our proposed algorithm on bus is , whereas the error is for the residual-based algorithm, which is times larger. Second, we can see that the average phase errors for are in general smaller than those for , which means that both algorithms perform better when is small. Third, we see that the F-DDIA result of bus 11 (or bus 13) is quite different from that of bus 12 (or 14). We attribute this phenomenon to the value. When , the results for buses 11–14 differ little. To sum up, the causes of this difference in the average absolute phase error are complex, and the F-DDIA performance depends on a series of parameters (i.e., , etc.).

4.6. Comparison on Execution Time

In our experiments, we find that the proposed approach is faster than the residual-based fault detector. The residual-based fault detector takes around min ( s per sample), while the proposed approach takes only min ( s per sample). The min of our approach includes load dynamics fitting and the Douglas-Rachford iterations. The main computational burden of our proposed approach is running the Douglas-Rachford iterations for the basis pursuit process. Our approach does not involve the state estimation process, which is why it is faster than the residual-based detector.

5. Conclusions

This paper examines the problem of detecting data injection attacks in smart grid networks. We propose a detection framework named F-DDIA, which can recover the initial system state as well as the real measurement readings. Due to the sparse nature of data injection attacks, $\ell_1$-minimization techniques (including Douglas-Rachford) can be applied. The proposed detection algorithm is validated using load data from NYISO on the IEEE 14-bus system. The methodology is applicable to both linear and nonlinear systems.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported in part by National High Technology Research and Development Program of China (no. 2015AA016008).