Abstract

We study the consequences of cyberattack, defense, and recovery in systems for which a physical system is enabled by a cyber system by extending previous applications of models from the population biology of disease to the cyber system and coupling the state of the cyber system to the physical system, using the synchronous model for the electric grid. In analogy to disease models in which individuals are susceptible, infected, or recovered, in the cyber system, components can be uncompromised and vulnerable to attack, uncompromised and temporarily invulnerable to attack, compromised, or reset and thus not able to contribute to the performance of the physical system. We model cyber defensive countermeasures in analogy to the adaptive immune system. We link the physical and cyber systems through a metric of performance of the physical system that depends upon the state of the cyber system using (i) a generic nonlinear relationship between the state of the cyber system and the performance of the physical system and (ii) the synchronous motor model of an electric grid consisting of a utility with many customers whose smart meters can become compromised, in which a steady state in the difference in rotor angles is the metric of performance. We use the coupled models, both of which have emergent properties, to investigate two situations. First, when an attacker that relies on stealth compromise is hidden until it is either detected during routine maintenance or an attack is initiated. The probability that compromise remains undetected declines with time and the level of compromise increases with time. Because of these dynamics, an optimal time of attack emerges, and we explore how it varies with parameters of the cyber system. Second, we illustrate one of the Electric Power Research Institute scenarios for the reverse engineering of Advanced Metering Infrastructure (AMI) by coupling the synchronous motor equations for the generator and utility to the model of compromise. We derive a canonical condition for grid failure that relates the level of compromise at the time of detection of compromise and the dissipation parameter in the synchronous motor model. We conclude by discussing the innovative aspects of our methods, which include (i) a fraction of decoy components in the cyber system, which are not connected to the rest of the cyber system or the physical system and thus do not spread compromise but increase the probability of detection of compromise, (ii) allowing components of the cyber system to return to the un-compromised state either temporarily invulnerable or immediately vulnerable, (iii) adaptive Defensive Counter Measures that respond in a nonlinear fashion to attack and compromise (in analogy to killer T cells of the immune system), (iv) a generic metric of performance of the physical system that depends upon the state of the cyber system, and (v) coupling a model of the electric grid to the model of compromise of the cyber system that leads to a condition for failure of the grid in terms of parameters of both compromise and the synchronous motor model, directions for future investigations, and connections to recent studies on broadly the same topics. We include a pseudocode as an Appendix and indicate how to obtain R script for the models from the first author.

1. Introduction

A recent survey [1] of cyberattack in complex systems with both physical and cyber components in which the cyber system enables functionality of the physical system emphasizes the importance of improving our ability to understand this class of systems, and that it is critical to go beyond the attack-defense dynamics in cyberspace, but to include how changes in the cyber domain generate changes in physical outcomes (and possibly vice versa). That is, one wishes to assure that the physical system remains in the desired operating regime or out of undesired ones and needs to discover methods to achieve this goal. Modern power and communication systems are canonical examples of such systems [2, 3] and in [1], the particular case of a smart grid subject disruption and stabilization through a Distributed Denial of Service (DDOS) or deception attacks that alters sensor information [4, 5] is treated.

In this paper, we extend the ideas in [1] in a new direction by explicitly coupling the models of attack dynamics and the physical functional response of the system under attack. This allows us to analytically explore the system dynamics of the coupled nonlinear systems, in particular how parameterized attacks of certain types and behaviors result in effects observed in performance and functionality of the resulting system (in this case, a smart power grid). In doing so, we develop a modeling approach that creates the ability for attackers to optimize the timing and pace of their attacks, and for defenders and designers to select defense strategies, deploy defensive assets, and make architectural and design choices tuned to the coupled cyber-physical system.

To do this, we build on existing nonlinear dynamic models of compromise of cyber systems that import ideas from the population biology of disease [611], characterize performance using either a nonlinear metric for a generic physical system (cf. [610]) or the nonlinear dynamics of an electric grid, and introduce counter measures having nonlinear dynamics based on the immune systems [12, 13].

Our goal is to develop a systems dynamics model to explore the roles of attack and maintenance rates, detection capability, and other design parameters on the dynamics, particularly how compromise of the cyber system propagates, and performance of the linked physical system responds. Since our goal is to understand the important variables and how they affect performance, we build a heuristic model [14] that is not specific to any particular situation but has commonalities with many complex systems. Using the model will help identify what to measure to be able to assess vulnerability to cyberattack, the consequences of attack on performance of the physical system, and to identify design tradeoffs and routes to defense.

We link the physical and cyber systems through a metric of performance of the physical system that depends upon the state of the cyber system, using (i) a generic nonlinear relationship between the state of the cyber system and the performance of the physical system and (ii) the synchronous motor model of an electric grid in which the load is a utility having consumers that used Advanced Metering Infrastructure (AMI; smart meters) that can be compromised.

In the next section, we develop the model for compromise of the cyber system, after which we introduce two metrics of performance of the physical system. We then illustrate the dynamics of the model for the cyber system, after which we treat two examples. First we show how the timing of attack when the attacker relies on stealth emerges from the nonlinear nature of both compromise and performance. We explore how performance of the physical system depends on parameters of compromise and performance and show how the optimal time of attack depends upon the parameters of the physical system and rates of external compromise and internal co-compromise.

Second, we explore compromise of Advanced Metering Infrastructure (AMI) and failure of the electric grid. AMI is an example of the Internet of Things, which increases the risk of cyber compromise because it increases the number of points of access for attackers [1419]. In this example, we couple the model of compromise to the synchronous motor model of an electric grid to illustrate Electric Power Research Institute scenario AMI.27 [20] for reverse engineering of AMI, and a specific case of how compromising smart meters can lead to load-side failure of an electric grid. Because power grids are now considered critical infrastructure, data about them are often treated in a confidential manner [21, 22]. By using a generally accessible model for the grid, we are able to clearly see how compromise propagates and interacts with performance of the grid.

2. Materials and Methods

We envision a complex system consisting of a physical system that is enabled by a cyber system that takes commands, exchanges information, and more generally interacts with the external world. Attacks on the cyber system lead to compromise of its components, which has an effect on the physical system. Our goal is to provide a framework for modeling compromise in the cyber system, linking the cyber and physical systems (both generically and specifically [the electric grid]), and use the model to explore the dynamics of compromise, attack, and recovery of the cyber system and the related performance of the physical system.

We first describe the conceptual framework for the model of the cyber system (Section 2.1), after which we derive the equations for the dynamics of compromise and recovery (Section 2.2). We then show how the dynamics simplify when the attacker relies on stealth (Section 2.3), in which compromise is built until either it is discovered during regular maintenance or an attack is executed. We consider two metrics of performance (Section 2.4): (i) a generic metric that depends upon the number of uncompromised cyber components, the number of compromised cyber components, and the level of Defensive Counter Measures (DCM) and (ii) a synchronous motor model for the electric grid for which the metric of performance is the stability of the grid.

2.1. Conceptual Description of the Cyber Subsystem

An attack on a cyber system first requires external compromise: gaining access to components that interface with the external world. Once “inside” the cyber system, the second stage of internal co-compromise can commence, in which compromised components infect noncompromised components. Compromised components may immediately reduce the performance of the physical system or the adversary may hide compromise until ready to execute the attack.

In many cases, cyber components that interface with external world can be protected by external hardness [14] in which case antimalware prevents attackers from entering the cyber system. Cyber components that do not interface with the external world can similarly be protected from co-compromise by internal hardness [14]. External and internal hardness rely on a variety of mechanisms [21, 23, 24] that we do not model explicitly. However, it is now clear that external and internal hardness are necessary but not sufficient for both a reliable and resilient cyber system [21] and that resilience of the cyber system (and thus performance of the physical system) requires some form of Defensive Counter Measure (DCM) that returns the system to a state closer to the one before the attack. Protection from external compromise and internal co-compromise may not be effective (e.g., the installing antimalware does not defend) or may lose its effectiveness over time (e.g., the attacker discovers a way to circumvent the antimalware currently installed).

Thus, at any time, the cyber system consists of five classes of components (Figure 1): First, uncompromised and vulnerable components can be compromised either externally or internally. We allow a fraction of these components to be decoys, with no functionality, but instrumented to detect compromise with high probability. Second, uncompromised cyber components that are currently invulnerable to either external or internal compromise are temporarily protected against malware, but as time progresses their antimalware software ages and is no longer effective. Third, compromised cyber components are infected by malware. Fourth, once compromised components are discovered, they are temporarily removed from the cyber system to be restored or reset later [21]. These components do not contribute to performance of the physical system. After some amount of time, components that are being reset return to the cyber system temporarily invulnerable (effective antimalware installed) or still vulnerable (either no antimalware installed or the installed antimalware is ineffective). Fifth, once compromise is detected, DCM are activated to discover and send compromised components into the resetting phase. Since DCM use cyber resources, we assume that their presence reduces the functionality of the physical system.

2.2. The Dynamics of Compromise, Defense, and Detection
2.2.1. Dynamics of the Cyber System Components

We assume that the cyber system has components, with dynamics characterized by mass action in a mean field approximation [2529].Uncompromised and vulnerable cyber components, denoted by transition to become compromised components (i) because of external compromise at a rate proportional to their numbers and (ii) because of internal co-compromise by previously compromised cyber components at a rate proportional to the numbers of both kinds of components, accounting for the decoy cyber components. In addition, temporarily invulnerable cyber components lose their protection and compromised cyber components that are reset without any protection increase the number of uncompromised and vulnerable components. We assume that the rate of these transitions is proportional to their number. Hence the dynamics areOn the right hand side of equation (1), is dimensionless; is the rate of external compromise so that the fraction of uncompromised cyber components surviving external compromise from to is ; is the rate of co-compromise so that when there are compromised cyber components at time , the fraction of uncompromised cyber components surviving co-compromise to is ; is the rate at which temporarily invulnerable cyber components become vulnerable; and is the rate at which compromised components being reset return vulnerable to compromise.Uncompromised and temporarily invulnerable cyber components, denoted by , transition to the uncompromised and vulnerable state as they lose protection. Compromised cyber components that reset with temporary protection from external or internal compromise transition to the uncompromised and temporarily invulnerable states. We assume that the rates of these transitions are proportional to the number of components so that the dynamics areIn this equation, is the rate at which compromised cyber components are reset and returned to the cyber system temporarily invulnerable. If and , then all cyber components that are reset return vulnerable to attack. On the other hand, if and , then all cyber components return temporarily invulnerable to attack. In the most general case, both and are nonzero, so that returned cyber components have a mixture of vulnerabilities.Compromised cyber components, denoted by , increase in number due to the processes in the first term on the right hand side of equation (1). Compromised cyber components transition to the resetting state due to regular maintenance during which compromised components may be discovered and when removed by DCM, denoted by . We assume that the rates of these transitions are proportional to their number for regular maintenance and to the their number and the level of DCM when DCM are activated. We let denote an indicator function that is 1 if DCM are activated at time and is 0 otherwise. The dynamics of compromised cyber components are thenwhere is the rate of transition of compromised cyber components to resetting during regular maintenance and is the rate of transition of compromised cyber components to resetting by DCM.Defensive Counter Measures are intensive malware eradication efforts. We do not specify the mechanism of the DCM, since they are situation dependent [23]. DCM are active only after compromise is detected, which may occur during regular maintenance, through recognition of external compromise, or anomalous performance of the physical system [3032]. After compromise is detected, we model the dynamics of DCM, denoted by , in analogy to T cells of the immune system [12, 13].

If in equation (4), DCM are active only when compromised cyber components are present and in absence of compromised cyber components, the only steady state is . This is the situation we assume for computations. When , there is a positive steady-state level of DCM even when there is no compromise in the system, providing ready defense of the cyber system at the possible expense of performance of the physical system (see below). The parameter controls the rate of increase of DCM; in particular, when , the rate of increase of DCM declines as the level of DCM increases.

Because cyber components are neither created nor destroyed, the numbers of uncompromised and vulnerable, uncompromised and temporarily invulnerable, compromised, and resetting components sums to . This is captured by the second terms on the right hand sides of equations (1) and (2).

2.2.2. The Detection of Compromise

A variety of methods have been proposed for detecting compromise [30], but in general the detection of compromise will be imperfect, because regardless of the specific process (e.g., signal matching or anomalous behavior) there will be both false positives and false negatives [14]. We let denote the probability that compromise remains undetected at time .

We assume that the probability of discovering compromise in the next units of time when the rate of external compromise is and cyber components are compromised is where and are the rates of detection of compromised nondecoy and compromised decoy cyber components, respectively, and is the rate of detection of compromising activity when the rate of external compromise is . In general, we anticipate that . For simplicity, for computations here we set .

Assuming that compromise is undetected at the start, ; subsequently

Equation (5) generates a trajectory for the probability that compromise is not detected by time . We use this trajectory to characterize the stochastic process of detection. That is, from , we generate realizations of the time at which compromise is detected, and explore the consequences of different times of detection on the performance of the physical system.

2.2.3. Summary: a Mixed Deterministic-Stochastic Nonlinear Model

The full model for dynamics of the cyber system consists of equations (1)–(5). Because of equations (1)–(4), the model is inherently nonlinear. Because of equation (5), it is stochastic, even though the rest of the dynamics are deterministic. We implement this mixed deterministic-stochastic model by solving equations (1)–(5) with to generate the trajectory of . Given this trajectory, we generate times of detection, denoted by by drawing uniformly distributed random variables and comparing them with . Then, for every , we solve equations (1)–(5) to determine the entire trajectory of compromise, detection, and recovery from compromise.

For the base case, we used these parameters: , . All computations were done in R Studio Version 1.0.143 with underlying R version 3.6.1.

2.3. Simplification When the Attacker Relies on Stealth

When an adversary relies on stealth, compromise is hidden until the attack is executed. That is, cyber compromise may be ongoing for a long period of time and undetected [14, 21]. In general, the longer an attacker waits to initiate the attack, the more likely it is that compromise will be detected and removed [3335], suggesting that an optimal time of attack will emerge from the dynamics of compromise and detection.

To model this situation, we assume that compromise (both external and internal) increases until (i) it is detected through regular maintenance or (ii) the attacker decides to launch an attack at time . We assume that if compromise is detected before the attack is launched, the value to the attacker is 0. Otherwise, when the hidden compromise becomes active, assessing the value of the attack requires a metric of utility for the attack, which we consider to be (1) the number of compromised cyber components or (2) the reduction in performance of the physical system.

2.3.1. Dynamics before Attack or Detection

For simplicity, we assume all cyber components are initially vulnerable to compromise. Since compromise is hidden until attack or discovery, always and no cyber components are being reset. Consequently before attack or detection, only equations (1), (3), and (5) are relevant. That is, cyber components are either un-compromised and vulnerable or compromised, but compromised cyber components are not recognized until either detected or the attack is executed. We let denote the number of un-compromised and vulnerable cyber components so that equations (1) and (3) reduce to

Since which has solutionfrom which follows directly.

2.3.2. The Value of Attack

The probability that compromise remains hidden until the time of attack is , determined by the solution of equation (5). The value of an attack requires that we introduce a utility function for the attacker. We will consider two choices. First, the value of attack may be measured in terms of the number of compromised cyber components at the time of attack. Second, the value of attack may be measured in terms of reduced performance of the physical system after the attack is executed.

The value of attack measured in terms of the expected number of compromised cyber components is

Since is an increasing function of time and is a decreasing function of time, their product will have a peak, leading to an optimal time of attack and times of attack around that optimum that are “pretty” good, in the sense of giving nearly the same value as the value at the optimal time of attack.

Assessing the value of attack measured in reduced performance of the physical system due to compromised cyber components requires a metric for the performance of the physical system when the cyber system has uncompromised and compromised components, and level of DCM . We describe such a performance function in the next section.

Given , the value to the attacker is the difference between – performance of the physical system in the absence of compromised cyber components and DCM–and –the performance of the physical subsystem when there are un-compromised cyber components and compromised cyber components at the time of attack, and no active DCM. The expected value of this performance function is

Other choices of utility functions are possible, although as will be seen in the next section, a generic metric of performance of the physical system can capture a wide range of utility functions, from threshold relationships to linear ones. We also note that the value of attack to the adversary may not be the same as the cost to the owner of the system.

2.4. Metrics of Performance of the Cyber Physical System

In general, the cyber and physical systems are coupled nonlinearly. We first consider a generic measure of performance in which performance of the physical system is a function of the number of uncompromised cyber components, the fraction of decoy cyber components, the number of compromised cyber components, and the level of DCM. Performance increases as the total number of uncompromised, nondecoy cyber components increases, and decreases as both the total number of compromised cyber components increases and the level of DCM increases.

We then turn to the synchronous motor model of the electric grid, with stability as the metric [21, 32], although we will show that both reliability and resilience of the grid emerge from the model.

2.4.1. A Generic Metric of Performance

We model performance function of the physical system as follows. First, we assume that performance of the physical system increases with the number of uncompromised cyber components according towhere is the number of uncompromised cyber components at which performance of the physical system is 50% of its maximum value (set to 1 without loss of generality) and captures the dispersal in performance of the physical system (Figure 2).

Equation (11) can accommodate a variety of assumptions about performance of the physical system. For example, if performance of the physical system is approximately a linear function of the total number of uncompromised cyber components, then parameters giving the dotted line in Figure 2(c) are appropriate and can be chosen to give the 50% point of performance. On the other hand, if the physical system is a communications system and only one or a few cyber components need to be uncompromised for a message to get through, then a small value of is appropriate. In communications systems, performance will be determined by bandwidth, connectivity, and message accuracy, all of which can be compromised; mission performance metrics are accuracy, timeliness, and completeness of the information. Mapping these onto the sigmoidal functions requires detailed knowledge of the particular system and is thus beyond the scope of this paper.

We use a similar kind of sigmoid to characterize the reduction in performance due to compromise of the cyber system and assume that DCM use computational bandwidth, thus also reducing performance of the physical system. We assume that when the level of DCM is , performance of the physical system is reduced by a factor , where .

A generic metric of performance of the physical system when a fraction of the cyber components are decoys, and there are uncompromised cyber components, compromised cyber components, and the level of DCM is , is then

If the attacker needs to compromise many cyber components to degrade performance of the physical system, we set to a moderately large value. On the other hand, if the attacker can shut down the physical system by compromising a few cyber components, then will be small. The terms in equation (12) capture that decoy components in the cyber system are not part of the functionality of the physical system.

For computations using this generic performance function in the baseline case, we used .

2.4.2. The One Generator-One Load-Many Consumers Electric Grid

One approach to modelling complex electric grids is to use second-order oscillator equations derived from Kirchoff’s laws that balance power at each node of the grid [3355]. That is, a power grid is modeled as a system of electrically coupled devices that deliver power from generators to machines/loads via transmission lines and a key objective of management of electrical grids is that generators and loads are properly synchronized. When a load is too strong, loads are unevenly distributed, or a major disruption occurs, and a generator or load may lose synchrony [20]. If this is sufficiently strong, it may lead to grid failure [5055].

In the synchronous motor model, the generator or load is characterized by a power balance equation [53] of the form where the terms on the right hand side are, respectively, the rate at which the motor dissipates energy, the rate at which it accumulates energy, and the rate at which it transmits energy. The state of each generator or load is described by the rotor angle, measured relative to a reference device rotating at a standardized frequency, and satisfying the swing equation [5054]where is the deviation of the rotor from the reference frequency, is the moment of inertia of the rotor times the reference frequency, is the coefficient of damping/friction as the rotor revolves, and and are, respectively, the mechanical power generated and the net electrical power transmitted through transmission lines.

A power grid is modeled as a coupled collection of equations similar to equation (13); and the deviation from the reference frequency for the generator or load, denoted by , satisfies [53].where is the dissipation occurring in the generator or load, is the coupling strength of the transmission line, and is a matrix whose entries are 1 if generators/loads and are connected, and 0 otherwise. for generators and for loads. The power network will be in a steady state when consumption and generation of energy balance, so that if there are in total generators and loads .

In order to focus on compromise in the demand-side cyber subsystem, we use the simplest grid, as in [56], consisting of a single generator () and a single load () (Figure 3) envisioned as a utility with many customers having AMI. The physical system consists of generators, substations, transformers, and towers and transmission lines; there are two cyber systems coupled to it. The first cyber system is the Supervisory Control And Data Acquisition (SCADA) [2, 3739] system that collects measurements from substations and sends out control signals to equipment, substation automation or protection systems, and energy management systems. The second is the cyber system of consumers with AMI. To simplify the analysis, we focus on compromise of the latter cyber system and how compromise in it can destabilize the grid.

We let and denote the phase deviations for the generator and utility. In this case, reduces to a single value; we replace by , for simplicity set the dissipation parameters , replace by for the generator and for the load, so that the equations for generator and load are

Subtracting the second equation from the first equation, we obtain a single second-order equation for the phase difference , written as the first-order system

For this model, the transmitted power is [53], and a natural metric of performance of this grid model is that the transmitted power is in a steady state. If , this grid has a steady state at . If , there is no steady state; instead the solution cycles. However, when but close to , if the dissipation coefficient is small enough, a periodic solution (and therefore unstable operation of the grid) coexists with a steady state. In particular [5759], for each value of , there is a critical value of the dissipation coefficient, which we denote by so that if then there is no periodic solution and the grid will be globally stable. However, when , there is also a periodic solution ([52], Figure 2) so that the operation of the grid will be unstable; Rohden et al. [52] identify this situation with a power outage. Major power grids are often operated in the region in which is on the order of to reduce the cost of overcapacity in transmission lines [53], which makes the possibility of transition from a steady state to oscillatory or chaotic behavior more likely.

For computations in the base case, we used .

3. Results and Discussion

We begin (Section 3.1) by illustrating the basic dynamics of the model and associated performance of the physical system using the generic performance metric. These show that a steady-state level of compromise is reached, even with active defense, consistent with the observation that one should assume that cyber systems have been penetrated and focus on determining the consequences of the penetration [14]. Because of the mixed deterministic-stochastic nature of the model of compromise, we report distributions for the time of detection and level of compromise at the time of detection and then show realizations of the dynamics. As a sensitivity analysis (Section 3.2), we explore how the dynamics of compromise and performance of the physical system depends upon the fraction of decoy cyber components . We then (Section 3.3) show results when an attacker relies on stealth, particularly how the optimal time of attack emerges and how it depends upon parameters of the system and the choice of metric for value to the attacker. We begin the section on the grid model by briefly summarizing (Section 3.4.1) the dynamics of the one generator-one load grid in the absence of compromise and then (Section 3.4.2) couple the model of compromise to the model of the grid. We illustrate EPRI scenario AMI.27 [20] for reverse engineering AMI by showing how smart meters sending misleading signals about power demand can lead to load-side failure of the electric grid and derive a canonical condition for failure of the grid in terms of the number of compromised components at the time of detection and the dissipation parameter of the synchronous motor model.

3.1. Dynamics of Compromise and Recovery

For these results, we set . Absent DCM, compromise is removed only through regular maintenance. In Figure 4, we show the dynamics of the number of cyber components (Figure 4(a)), and performance of the physical system and the probability that compromise is detected (Figure 4(b)). We use the dynamics of the probability of detection to create stochastic realizations of the time of detection (Figure 4(c)) and the number of compromised cyber components at the time of detection of compromise (Figure 4(d)).

The cyber system reaches a steady state in which compromised cyber components co-exist with vulnerable and temporarily invulnerable cyber components. Consequently, performance of the physical system declines to a steady state.

Performance in Figure 4(b) can be compared directly to Figure 3.14 in [21], showing the time course of the fraction of the load delivered before, during, and after a cyberattack or Figures 2.1–2.3 in [32] characterizing the resilience of a system in response to a disruption/attack. When DCM become active, although performance of the physical system may increase because of the removal of compromise, performance of the physical system need not return to its previous state following disruption/attack (see below).

Conditioned on time of detection, the behavior of the full system is captured by equations (1)–(5), which we illustrate by choosing 8 times of detection that capture most of the range in Figure 4(c).

The cyber system recovers to more than 90% of its initial state (left panels in Figure 5). That is in the steady state, and regardless of the time of detection of compromise, there are uncompromised and vulnerable, uncompromised and currently hardened, and compromised cyber components (with the remainder resetting). Compromised components of the cyber system persist in the steady state because both external attack and internal co-compromise continue. The number of compromised components remains at a low level because DCM are maintained at a nonzero level (Figure 5, right hand panels, blue lines). The cost of a nonzero steady state of DCM is reduced performance. To illustrate this point, in Figure 6, we plot performance for the same times of detection as in Figure 5. Performance drops as compromise builds before detection of compromise and the minimum level of performance depends upon the time of detection of compromise. However, in the steady state, performance only returns to about 70% of its initial value.

3.2. The Role of Decoy Cyber Components

To explore the role of decoy cyber components, we swept over eight values of , ranging from 0 to 0.25 (in the spirit of [60]). For each value of , we determined 300 times of detection. Conditioned on both and the time of detection, we computed the dynamics of the cyber system and the performance of the physical system.

Because of the formulation (Equations (1), (3), and (5)) as increases, the probability of detecting compromise by a given time will increase, and the number of compromised cyber components at a given time will decline (Figures 7(a) and 7(b)). What cannot be predicted is that the standard deviation of both the time of detection and the number of compromised components at the time of detection of compromise decline as increases.

In light of the results shown in Figures 5 and 6, we anticipate once DCM are activated, the system will recover and reach a steady state that is independent of both the time at which compromise is detected (the stochastic component) and varies in a deterministic way with . The consequence (Figure 7(c)) is that although the mean of the steady-state performance of the physical system declines with , the standard deviation of steady-state performance is virtually 0.

On the other hand, the minimum performance of the physical system depends upon the time at which compromise is detected (as in Figure 6), and thus on the value of . Minimum performance is determined both by the level of compromise of cyber components at the time that compromise is detected and the response of DCM. In Figure 7(d), we show the mean and standard deviation of the minimum performance of the physical system. The mean of minimum performance is an increasing function of and the standard deviation of minimum performance is a decreasing function of .

3.3. Stealth and the Optimal Time of Attack

When the attacker relies on stealth, the results become deterministic because the value to the attacker is an expectation over the time of detection In Figure 8(a), we show the time course of the two value functions. An optimal time of attack clearly emerges from these figures, as does a range of times of attack for which the value of attack is “pretty good” [61]. Conditioned on the other parameters, we predict the attack will occur earlier when value to the attacker is measured in terms of loss of performance of the physical system rather than the number of compromised cyber components. In Figures 8(b) and 8(c) we show the optimal time of attack across a range of values of and .

3.4. Load Side Failure of the One Generator-One Load Grid due to Compromise in AMI

We begin by summarizing the properties of the grid model in the absence of any compromise and then link it to the model of compromise.

3.4.1. Properties of the Grid Model in the Absence of Compromise

In the absence of compromise, the parameters , and and initial conditions and lead to a steady state for equation (16). After transient initial dynamics, the steady state given by is reached. Since transmitted power is and the steady-state value of increases with , transmitted power also increases with .

As discussed above, as approaches from below, instability may occur. For example, with all other parameters as above, for equal to or , the grid reaches a steady state, but for an oscillatory solution develops. Conditioned on the other parameters, the critical value for falls between 0.177 (oscillations) and 0.178 (steady state approached).

These results suggest that one route to the instability of the electric grid (which we explore in detail below) is increased demands of power. To illustrate the concept without the complexity of the model for compromise and detection, we solved equation (16) for and where , 0.8, 0.85, 0.9, and 0.95 characterizes the increase in demanded power. In this case, so that a steady state exists. However, we expect instability before reaches 1. This is indeed the case–for a value of between 0.831 and 0.832, the grid loses stability. Furthermore, increasing makes the system more stable: when , the value of leading to instability falls between 0.8588 and 0.8589 and when the value of leading to instability falls between 0.8844 and 0.8845. This is an example of criticality in physical systems as they operate closer to their capacity and become more fragile to perturbations. One of the roles of models such as we develop here is to help recognize and characterize potential modes of failure of the physical system.

3.4.2. Load Side Failure of the One Generator-One Load Grid due to Compromise of AMI

We couple the models for compromise and its detection and for the electric grid by replacing by , where is how much one compromised AMI increases the demand for power, and , computed from equation (3), is the number of compromised AMI at time . We assumed for computations. In Figure 9, we show transmitted power assuming that the dynamics of compromise and times of detection are the same as in shown in Figure 5.

The grid shows a signal of compromise with a secular, almost linear increase in the demand for power (the portion of the trajectories before the time of detection represented by the vertical dotted line). If detection occurs early enough (as in the first six panels), DCM are activated, compromise is reduced, and the grid returns to a stable steady state. However, as shown in the bottom two panels, if detection occurs too late, then the grid enters a region of instability even after DCM are activated.

To compute the risk to the grid due to compromise of AMI in the cyber system, we note that when power demanded by the load is and compromise is detected at time , the condition for instability of the grid iswhich also can be written as

Equation (18) is a canonical relationship linking compromise and load-side failure of the grid. Since detection of compromise is a stochastic process, the time of detection and level of compromise at the time of detection vary. Because detection of compromise is a random process, the level of compromise on the left-hand side of Equation (18) is a random variable. The risk of grid failure is the probability that the level of compromise on the left-hand side of Equation (18) exceeds the ratio on the right-hand side of that equation. In Figure 10, we show risk to the grid as a function of the per-compromised AMI increase in demanded power. If the per-compromised AMI increase in demanded power is small enough, the risk to the grid from compromise is virtually 0. There is a range in which the risk to the grid is a linear function of . For large enough values of , failure of the grid is guaranteed.

Thus, knowing the increase in demanded power by a compromised cyber component can be useful for predicting grid failure or not. For example, drawing a horizontal line in Figure 10(a) or 10(b) at the level of risk to the grid that can be tolerated and locating the value of at which the line intersects the curve will provide information on the value of per AMI increased demand in power that can be tolerated. The increase in demand is an anomalous operating condition, which may be used for detection of compromise ([21], Recommendation 4.10, pg 92).

4. Conclusions

Our work involves coupling of a set of nonlinear differential equations for the dynamics of compromise and defense of the cyber system with either (i) a nonlinear generic performance function to understand cyberattack via stealth or (ii) the nonlinear rotor equations for an electric grid to understand the role of load-side compromise on the stability of the grid. Doing so allowed us to explore how an optimal time of attack when using stealth emerges (Figures 8 and 9) and to derive a canonical condition (Equation (17) or (18)) for stability of the electric grid in the face of load-side compromise.

4.1. Novel Contributions of this Paper and Connection to Recent Other Work

The innovations of our work include (i) a fraction of decoy components in the cyber system, which are not connected to the rest of the cyber system or the physical system and thus do not spread compromise but increase the probability of detection of compromise, (ii) allowing components of the cyber system to return to the un-compromised state either temporarily invulnerable or immediately vulnerable, (iii) adaptive Defensive Counter Measures that respond adaptively to attack and compromise, (iv) a generic metric of performance of the physical system that depends upon the state of the cyber system, and (v) coupling a model of the electric grid to the model of compromise of the cyber system that leads to a condition for failure of the grid in terms of parameters of both compromise and the synchronous motor model.

4.2. Directions for Future Work

We discuss future work that is related to the model of cyber compromise and the generic performance function, the model of cyber compromise and the electric grid, and how the methods and results presented in our paper relate to recent studies [1, 62] on the general topic of security in systems with cyber and physical components.

4.2.1. Related to Cyber Compromise and the Generic Performance Function

Topics emerging from the model for cyber compromise and generic performance function warranting future investigation include: (i) The dynamics of compromise and recovery of the cyber system show that there is an opportunity for further investigation of operational resilience by planning for degradation of the cyber system when designing the linked physical system and (ii) when the attacker relies on stealth, a natural extension of our results is to explore how Figure 8 changes when the probability of detection depends upon the rate of external compromise, which sometimes generates a “wake” that is detectable.

4.2.2. Related to Cyber Compromise and the Electric Grid

We have shown how load-side compromise can destabilize an electric grid. If the metric of performance for an electric grid is the fraction of the load delivered, then equation (11) does not need any kind of rescaling to capture performance of the physical system, as can be seen by comparison of our Figure 2 with Figure 3.14 in [21].

Although defending the grid by disconnecting the utility from the generator is one route to defense, it is an extreme solution. On the other hand, developing a mechanism to recognize when individual consumers are demanding an anomalous amount of power and then disconnecting those consumers from the utility is a potential defense. Smart meters report on regular schedule with a defined message structure in a predictable way [21]. Signs of compromise or malfunction could include another reporting structure, reports at unusual times, or reported values that are outside of the usual range [18]. An implication of these results is that it is important to match the detection strategy to the willingness to accept risk when the underlying demand for power has a temporal pattern. That is, at times of high demand for power, compromise in the cyber components of AMI may lead to instability of the grid, suggesting that the strategy for detection of compromise should be adjusted to the pattern of demands for power. Models such as we have developed here can help in the allocation of resources for detection. To make this idea fully operational, one would include a detection model [30] with false positives (consumers are not requesting additional power but are seen to be doing so), false negatives (consumers with compromised AMI are not detected), and an economic model to assess the costs of power disruptions [2]. Doing so is beyond the scope of this paper.

At the level of the utility, one defense is to have a mechanism, via generalized governors or electric springs [63], that increases the value of as the perceived demand for power increases. In this case, the cost is that the utility operates less efficiently as increases. As in the case of disconnecting consumers, developing the requisite models is feasible but beyond the scope of this paper.

Another future step is to expand the number of generators and loads. The Zealand grid [53] is an excellent candidate for such a next step. Alternatively, one of the recommendations concerning resilience of electric grids is to rely on various types of micro-grids [21]. By their very structure, such grids will have many points of access that interface with the external world [51], thus raising an entirely new set of issues for which the models in this paper can be applied. Similarly, the development of consumer-generated solar power that can be moved to the electric company (thus changing the historical one way distribution of electricity from the power company to the consumer to bi-directional transfer of power) raises issues for which the models developed here can be adapted. Furthermore, as the electric grid becomes more and more dispersed, resistance to and recovery from cyberattacks will increasingly depend upon rapid or even real-time measurements and responses [22].

4.2.3. Connections to Other Recent Work

We now briefly discuss how the ideas in our paper link to [1] and an additional recent survey [62] of filtering in networked nonlinear systems. In [1], the authors describe approaches to tolerate cyberattacks (based approaches such as game theory or control theory); these can be explicitly modeled and tested for functional effectiveness of the cyber system using our approach. Our work complements that in [1] and in the future, our work can be used to explore the impact of cyberattack on a smart grid, examining the potential for access-and-maneuver types of attacks to disrupt system control and place the power system into ineffective and, in some cases, destructive states.

The survey in [62] raises a number of potential extensions of our work and directions for future exploration including.(i)Our methods can be used for the analysis of particular communications and contract protocols, investigating within the system context which protocols are more or less resilient than others and the performance and effectiveness of different communications protocols/policies under different network conditions.(ii)Using our methods, one can directly experiment with computational representation of more granular network-induced complexities, and validate the generality of the analytic method and associated assumptions.(iii)A natural extension of our analytic methods is to treat other types or sources of volatility that may propagate or affect nodes in different ways.(iv)Because of the explicit representation of the compromise of the cyber system, our methods can be used to explore how filtering methods can be used to improve response in the face of loss of nodes when nodes play both sensing and communication roles and to explore the consequences when sensor and communications functions are impacted asymmetrically [64].(v)At a more theoretical level, our methods can be used to examine applicability of set-membership filtering to deal with system information censored or otherwise unavailable due to cyberattack [65].

4.3. Final Conclusion

In this paper, we provided a framework based on the population biology of disease for the analysis of compromise in complex systems with cyber and physical systems, gave examples of how the framework could be used for both a generic metric of performance and for a metric of performance involving the electric grid, and suggested opportunities for further explanation. Much remains to be done using this framework.

Appendix

A. Pseudocode for the Computations

The pseudocode for the computations given here follows the code available from the first author. Both the code and pseudocode are written in R, and organized in a way to allow other individuals to access the ideas directly. Thus, efficiency of computation has been sacrificed for clarity of development and presentation. In general, we adapt the form of the pseudocode from [65] and continue to use the mathematical notation from the paper, which is modified in clear ways in the actual code. Rscript can be obtained from the first author at [email protected].

A.1. Create the Generic Performance Function

To create the generic performance function in equation (12), use these steps.(i)Specify the parameters , and .(ii)Cycle over to .(iii)Cycle over to .(iv)For each combination of and , set and (v)The generic performance function in equation (12) for is then the product .

A.2. Specify the Remainder of the Parameters

The remainder of the parameters are as follows:(i)Those for the dynamics of compromise, defense, and detection (equations (1)–(5)): , .(ii)Those for the one-generator one-load model for an electric grid (Equation (16)): .

A.3. Run the Model of Compromise in the Absence of Defensive Countermeasure

In order to produce the results shown in Figure 4(a), solve equations (1)–(3) and (5) in the absence of DCM (that is, set ). We used the order Runge Kutta scheme in the package deSolve in R. The steps are as follows:(i)Specify the time increment when solving equations (1)–(3) and (5).(ii)Dimension the dynamic variables , and as vectors of length .(iii)Specify the initial conditions; if all components are initially uncompromised (as in this paper) these are and .(iv)To produce Figure 4(b), solve the differential equations and link to the generic performance function from Section A.1.

A.4. Create the Distribution of Times at which Compromise is Detected and the Level of Compromise at the Time of Detection

In order to produce the results shown in Figures 4(c) and 4(d), follow the steps given below:(i)Specify the number of replicates of the time of detection and create vectors and of length (ii)Cycle from 1 to (iii)For each , draw a random variable uniformly distributed on [0, 1] and determine the time for which and . Set equal to this time and , where is computed from Section A. 3.

A.5. Run the Model of Compromise when Defensive Countermeasures are Activated

In order to produce the results shown in Figures 5 and 6 in which a range of times of detection is systematically evaluated, we now solve the full deterministic-stochastic model, which requires the indicator function , which will be 0 for times less than the time of detection and 1 for times greater than it. We model this as the cumulative Gaussian distribution function, which is essentially a step function for small enough standard deviations. Thus, in the code, we compute by by adding one more equation to equations (1)–(5):where is the standard deviation (for computations, we used ) and is the time at which compromise is detected. Once equation (A.1) is appended to equations (1)–(5), one proceeds as follows: (i)Specify the value of and the range of times of detection; for the results shown in the paper, we used the vector .(ii)Cycle over detection times.(iii)Follow the same steps as in Section A.3 with equation (A.1) appended.(iv)For each value of , confirm that the so generated is essentially a step function at .

A.6. Sweeping Over the Fraction of Decoy Components

In order to explore the role of decoy components (Figure 7), we convert from a scalar to a vector; for computations we used and then proceed as follows: (i)Convert the fraction of decoy components to a vector with the range to be explored.(ii)Cycle over (iii)For each value of repeat the steps in Sections A.3–A.5; doing so generates all of the data needed for Figure 7.

A.7. When the Attacker Relies on Stealth

When the attacker relies on stealth (Equation (8), Figure 8), we have an explicit solution for the number of uncompromised cyber components ( in equation (8)) and the number of compromised cyber components (. This allows rapid exploration by sweeping over the rate of external compromise and rate of co-compromise , implemented with these steps.(i)Specify the range over which the rates of compromise and co-compromise are explored and create or replace the scalars and by vectors. Specify a vector for the time at which attack is initiated.(ii)Cycle over the rate of external compromise and the rate of co-compromise (iii)Cycle over time from to .(iv)For each time, compute from equation (8) and .(v)Cycle over each time of attack.(vi)Compute the performance directly from equations (9), (10), and (12).

A.8. The One Generator-One Load Model for the Electric Grid in the Absence of Compromise

As noted in the text, the one generator-one load model may show instability depending on the parameter values in equation (16), for which the transmitted power is proportional to . In order to explore the nature of this instability, proceed with these steps.(i)Specify the scalars and and a vector for the values of in equation (16).(ii)As above, dimension vectors for and of length and specify their initial values. Solve equation (16) (we used deSolve in R).(iii)Repeat the previous step with different values of to numerically determine the value of at which an instability develops for given values of and .(iv)In order to initially explore the failure of the grid due to increasing demand, choose values of , and for which the solution of equation (16) is stable. Create a vector that increases the demanded power.(v)Cycle over and for each value of it, solve equation (16) as above to determine whether the grid has stable or oscillatory behavior.

A.9. Couple the Model of Compromise and the Mode of the Electric Grid

One is now in a position to couple the models for compromise and the electric grid, in which in equation (16) is replaced by where is determined during the solution of equations (1)–(5) and proceeds with these steps.(i)Specify the value of (ii)Meld Sections A.5 and A.8 above, with in equation (16) being replaced by . This will be sufficient to produce the results shown in Figure 9. To produce the results shown in Figure 10, one adds the steps.(iii)Specify a vector for the increase in demanded power (which provides the x-axis in Figure 10) and critical values for determined from Section A.8; denote this vector by .(iv)For each combination of time of detection , , and , find the fraction of that exceeds the threshold given in equation (18).

Data Availability

All the results in this paper were generated using R version 3.6.1 (available at the CRAN website https://cran.r-project.org/) in RStudio version 1.0.143. R script for the models used in this paper can be obtained directly from the first author.

Conflicts of Interest

The authors have no conflicts of interest.

Acknowledgments

The authors thank Jeff Dunne for the suggestion of using a double logistic metric of performance, rather than a single logistic that depends only upon un-compromised cyber components; Sue Lee for modeling load-side failure of the electric grid due to compromised AMI; Paul Velez for the suggestion of adding the analogue of killer T cells; and colleagues in the Theoretical Ecology Group, Department of Biology, University of Bergen for comments in response to a talk by MM. The authors thank an anonymous referee for pointing out references [1, 62]. This work was performed as part of the employment of Jimmie McEver at the Johns Hopkins University Applied Physics Laboratory [JHU-APL]. Marc Mangel was supported by contract 15–1015 with JHU-APL.