Abstract

This paper proposes a conceptual model to simulate the response of sociotechnical systems to crisis. The model draws on a concept of “sociotechnical resilience” as the theoretical framework, which underscores the hybrid nature of sociotechnical systems. Revolving around the notion of transformability, the concept considers sociotechnical resilience to be constitutive of three fundamental attributes, namely, informational relations, sociomaterial structures, and anticipatory practices. Our model aims to capture the complex interactions within a sociotechnical system during a recovery process by incorporating these core attributes in the operational units embedded in a multilevel directed acyclic graph, information networks, and recovery strategies. Furthermore, the model emphasizes specifically the role of informational configuration during a disruption. We introduce two recovery strategies in our simulation, namely, random recovery and informed recovery. The former represents the unprepared responses to crisis, while the latter incorporates the reporting process to support the command centre in making optimum decisions. The simulation results suggest the importance of system flexibility to allow structural reconfiguration at the organizational level. Our proposed model complements the theoretical principles of sociotechnical resilience while laying a practical foundation of sociotechnical modeling for resilience enhancement in real-world settings.

1. Introduction

Since the ecologist Holling propounded the concept of resilience [1], the notion of resilience as a form of system stability has ramified into various fields. As the world sees more turbulence and disruption caused by ecological and human-made disasters, studies on resilience have burgeoned in fields such as social systems [2], supply chains [3], enterprise management [4], catastrophe management [5], and coastal engineering [6], to name a few. Generally, resilience is defined as a system’s capability to survive and maintain its function by absorbing or recovering from internal or external changes [7–10]. It is generally distinguished from the traditional concept of safety, which seeks to identify and eliminate negative behaviors within the system that result in accidents. In contrast, the resilience concept recognizes the coexistence of both negative and positive behaviors within the system and focuses on improving the probability of positive outcomes while reducing the probability of negative outcomes [11].

Recently, the concept of resilience has been adopted into the study of sociotechnical systems [12] to analyze the resilient capacity of critical urban infrastructures such as power grids, water supply networks, and telecommunication/cyber infrastructures [13, 14]. It is driven by a realization that these critical urban infrastructures are fundamentally sociotechnical systems; they are composed of technical components while being run by human organizations. Conceptually, the sociotechnical framework helps to uncover minute interactions between human agents and technical apparatuses. In this area of study, the emphasis is placed on technical aspects [5, 15, 16], as well as on individuals and organizational entities [17, 18]. In advanced urban environments, sociotechnical systems are designed and built as complex adaptive systems that consist of multiple agents organized around a specific hierarchy, contain feedback loops, and embody emergent properties [19–22]. Due to the complex nature of sociotechnical systems, the behavior of these systems cannot be adequately understood through a linear formulation because the components are interconnected in a nonlinear fashion. In such multiplex interactions, the behavior of sociotechnical systems is influenced by emergent properties that shape the dynamic movement of the system.

Following Hettinger et al., we consider computational modeling and simulation to be an effective method to study sociotechnical behavior [23]. Furthermore, given the hyperdynamic nature of sociotechnical systems, the agent-based model is a suitable method for simulation because of its flexibility to incorporate complex agent interactions. The merit of this method is that it can support the decision-making process when a structural change in a sociotechnical system is taking place [24]. Today, agent-based methods are common in various fields such as air traffic control [25–27], health care [28, 29], energy systems [30, 31], and complex organizations [32]. However, the application of agent-based modeling in sociotechnical systems for resilience analysis remains relatively limited. This is likely caused by the limited availability of social and technical data for model validation, as well as the lack of a solid theoretical framework in the field. While sociotechnical modeling and resilience studies such as power grid networks in South Korea [33], Twitter interactions and epidemic processes [34], as well as supply chains and CO2 policies [35] are well validated by technical and organizational data, there is still a gap in incorporating the notion of sociotechnical systems as a hybrid entity.

Works on sociotechnical modeling continue to grow. However, we note substantial shortcomings in existing models, in which the hybrid nature of sociotechnical systems is not strongly reflected. Some of these models are nearly completely devoid of social elements or at best take into account marginal social variables [36–39]. It is this gap that we wish to fill in this paper by offering a new approach to developing a sociotechnical model and, at the same time, using this model to create a computational simulation of sociotechnical systems. Since the existing framework of resilience remains fragmented between those emphasizing engineered features and those focusing on social and organizational conditions, we develop our model by emphasizing the hybrid nature of sociotechnical systems as they consist of the social constructs of people and technologies [12]. Thus, the concept of sociotechnical resilience is adopted in our model. This model also incorporates the paradigm of resilience as the complementary attribute of risk management, which emphasizes strategies for minimizing loss or increasing the recovery rate [40]. The quantification of the system’s resilience adopts the use of critical functionality, a concept that is embodied in operational resilience [41]. This concept will be discussed in detail later in the paper. The following section introduces the theoretical framework that we have adopted in our model.

2. Conceptualizing Sociotechnical Resilience

Studies on sociotechnical systems are abundant, and conceptualizations of resilience are plenty. Yet, studies that combine the two are few. One of them is the work of Amir and Kant, who have proposed the concept of sociotechnical resilience [12]. Recognizing the hybrid nature of sociotechnical systems, sociotechnical resilience is characterized as an inherent capacity built around transformability. In contrast to Walker et al., who define resilience, adaptability, and transformability in the context of socioecological systems [42], Amir and Kant place transformability at the core of sociotechnical resilience. They maintain a distinction between transformability in sociotechnical systems and transformability in socioecological systems. This distinction is the result of the difference in temporal and spatial scale of the systems of interest. While Walker et al. focus on socioecological systems that centre on social and natural environments, sociotechnical systems are artificial systems where humans and machines interact in a structured configuration such as transportation systems, water supply systems, telecommunication systems, and energy systems. As a result, sociotechnical resilience revolves around the idea that the “building blocks” of sociotechnical systems are intentional hybrids [43–45], meaning they are technical and social at the same time; both are entangled entities where humans and technologies are social constructs.

Given its unique characterization of sociotechnical resilience, our model adopts this concept and frames the resilience of sociotechnical systems as an integrative capacity to cope with internal failures or external shock. This capacity lies in the system’s agility to transform its configuration from one form to another. The process of transformation is crucial because it facilitates repair and adaptation in the aftermath of crisis and disruption. The aftermath transformations involve technical, organizational, and even institutional reconfiguration following the changing environment after a disruption. Thus, it is transformability, the ability to transform, which constitutes sociotechnical resilience [12]. A distinctive feature of this concept is its emphasis on key attributes of sociotechnical networks rather than on the protocols and processes of resilience enhancement found in mainstream resilience analysis [46–48]. Looking further into the way in which transformability is internally built within a sociotechnical system, there are three key attributes of transformability, namely, informational relations, sociomaterial structures, and anticipatory practices. We found the composition of these attributes to be well suited to our purpose of developing a sociotechnical model of resilience. To grasp the meaning of these attributes and how they can be translated into a computational model, it is instructive to elaborate on each of them as follows.

2.1. Informational Relations

Informational relations represent the production and distribution of information, which are crucial in crisis response. This attribute “deals with how information flows in the systems to support continued operations.” As information is instrumental in determining how effectively and efficiently the coordination responds to crisis [49–52], no sociotechnical system can afford to have weak informational relations. In the concept of sociotechnical resilience, informational relations refer to the pathways of information between machines, individual operators and managers, subsystems, and/or organizations. Informational links between various types of elements carry a specific meaning or context, which defines how the information is received. The information exchange between machine and human may involve an engineering medium and require technical knowledge (e.g., temperature level monitoring in a chemical plant or electrical load monitoring of a power grid by engineers), while information sharing between humans serves to manage interdependencies or coordination purposes [53, 54]. In our sociotechnical model, we treat informational relations as the reporting lines from operational units to the command centre or local coordinator, which are used to report the disruption impacts. The information allows the command centre to decide on and perform the optimum system recovery.

2.2. Sociomaterial Structures

The constitutive entanglement of materials and human organization in sociotechnical systems creates what Amir and Kant call sociomaterial structures. They are “structures” because they are defined by how each entity is interconnected with the others in a hybrid configuration. The entities in sociotechnical systems belong to the social realm, such as individuals, groups, and organizations, while at the same time belonging to the material realm; they are thus hybrid in nature. An interesting example is a study by Orlikowski et al. on the use of Blackberry phones in a firm called Plymouth Investments. Communication using the Blackberry, which has the “push email” capability, changed the organization’s communication norms by altering people’s expectations of availability, intensifying interactions, and redefining the boundaries of working time [55, 56]. This example of sociomaterial practice shows how a Blackberry, a technological unit that is designed and configured by humans, in turn changes the organization’s communication culture.

Another aspect that comes along with hybridity is interpretive flexibility [57]. In the previous example, while the Blackberry was intentionally designed to ease email communication, pressure in the workplace may push people to use it beyond traditional communication norms. In terms of sociotechnical systems, this aspect determines how flexibly the entities are structured or undergo reconfiguration during disruptions. Therefore, in order to improve the resilience of sociotechnical systems, it is important to incorporate the characteristics of hybridity and interpretive flexibility when optimizing correct functioning and minimizing the malfunction probability of the technical dimension. In our proposed model, sociomaterial structures are embodied in agents as the operational units and represented as a network with a specific topology.

2.3. Anticipatory Practices

The last attribute of sociotechnical resilience is anticipatory practices, defined as a set of recovery protocols designed for an organization to rapidly bounce back from crisis or disruption to the normal operational state. In addition to this definition, the scope of anticipatory practices includes routine activities aiming to anticipate possible future events [58, 59]. Since the way anticipatory practices function in disaster prevention and management is highly dependent on the context of a system, the conceptual model we designed only incorporates anticipatory practices as recovery strategies during a disruption. In this way, the recovery protocols are reflected in the strategies of the command centre for determining the order in which nodes are repaired.

In this paper, we propose a computational model to simulate sociotechnical resilience, taking into account the three core attributes discussed above. Each of these attributes is translated into the model as a multilevel directed acyclic graph (DAG) of sociotechnical units, reporting lines, and recovery protocols. Furthermore, the simulation is designed to show the performance and resilience of various information flow strategies for a given disruption scenario and physical network configuration. By incorporating information flow in the model, we are aiming to expand the understanding of the complexity of sociotechnical resilience thus helping researchers and practitioners to plan, design, build, and develop organizational and technical aspects in infrastructural systems.

3. Modeling Sociotechnical Resilience

Following the concept of sociotechnical resilience discussed above, our model of sociotechnical resilience is constituted by a graph consisting of a set of nodes connected by a set of links, with the nodes arranged in multiple levels. Each node represents a basic structure of a typical control loop as illustrated by Leveson, in which an automated controller is supervised by a human controller [60]. For our purpose, we simplified this control loop into what we call an “operational unit” that serves as the building block of sociotechnical systems in our model. Figure 1 shows our model of the operational unit, which consists of a human operator and a machine. Each unit serves a certain amount of demand in its service area, producing output for the system. The operation of the unit may depend on the service of other units. In that case, a directional link is created from the “dependee” node to the “dependent” node. A link can only be created from the upstream unit (upper level) to the downstream unit (lower level). Creating a link between units at the same level or from a lower to a higher level is not allowed. When a unit is disrupted, all of the downstream units which are connected to the disrupted unit will be disabled. Consequently, units that are disrupted, disabled, or both cannot produce output or provide service to other units. For simplicity, we only consider one type of system, e.g., power systems that serve populations/households in many cities, or subway systems operating across a city to serve the mobility demands of commercial, industrial, or residential areas. Further modification and adjustment of the model will be needed to consider multiple interdependent infrastructures.
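The cascading rule described above can be illustrated with a minimal Python sketch. The function name `downstream_units`, the `children` adjacency map, and the five-unit example network are hypothetical and only serve as an illustration; they are not taken from the original implementation.

```python
from collections import deque


def downstream_units(sources, children):
    """Return every unit reachable from `sources` by following service links.

    `children` maps a unit to the units that depend on it directly. Because
    links only point from upper to lower levels, the graph is acyclic.
    """
    reached = set()
    queue = deque(sources)
    while queue:
        unit = queue.popleft()
        for child in children.get(unit, ()):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached


# Hypothetical three-level network: unit 1 serves units 2 and 3,
# unit 2 serves unit 4, and unit 3 serves units 4 and 5.
children = {1: [2, 3], 2: [4], 3: [4, 5]}

disrupted = {2}
disabled = downstream_units(disrupted, children)  # units cut off from service
inactive = disrupted | disabled                   # units producing no output
print(sorted(inactive))  # prints [2, 4]
```

In this sketch, a unit that is itself disrupted and also lies downstream of another disrupted unit appears in both sets, which corresponds to the "disrupted, disabled, or both" case in the text.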

The performance, or critical functionality, of our model at a given time is defined as the normalized total output of the active nodes in the system (as shown in equation (1)). Active nodes are the nodes that are in neither a disrupted nor a disabled state. The output of each node is generated randomly between 0.01 and 1. The output becomes 0 when the node is in a disrupted or disabled state and returns to its initial value once the node is recovered and active again. The resilience for each recovery strategy is calculated using equation (1), which takes into account the simulation duration and the number of simulations.
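As a rough illustration of these two quantities, the following Python sketch computes the performance at one time step and a resilience score. Two points are assumptions rather than statements from the text: the normalization divides by the total output of all nodes, and resilience is taken as the time-averaged performance, averaged again over simulation runs. The function names and the sample values are hypothetical.

```python
def performance(outputs, inactive):
    """Normalized total output of the active nodes at one time step.

    `outputs` maps each node to its output; `inactive` is the set of nodes
    that are disrupted or disabled. Assumption: normalize by the total
    output of all nodes, so an undisturbed system scores 1.0.
    """
    total = sum(outputs.values())
    active = sum(value for node, value in outputs.items() if node not in inactive)
    return active / total


def resilience(curves):
    """Assumed form: mean over runs of the time-averaged performance.

    `curves` is a list of performance curves, one list of values per run.
    """
    per_run = [sum(curve) / len(curve) for curve in curves]
    return sum(per_run) / len(per_run)


outputs = {1: 0.5, 2: 0.25, 3: 0.25}            # hypothetical node outputs
print(performance(outputs, inactive={2}))        # prints 0.75
print(resilience([[1.0, 0.5], [0.5, 0.5]]))      # prints 0.625
```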

In this model, we applied two recovery strategies, namely, random recovery and informed recovery. The former represents the less prepared mode of response during a disruption. For example, when some sections in a subway system are disrupted, the recovery team may be sent to the disruption sites without any particular pattern. The dispatch is primarily based on received emergency reports, without considering the number of passengers in each affected station and train at that time. This is due to the lack of information and coordination at the organizational level. As a result, the system responds to crisis in a suboptimal manner. The latter refers to a situation in which the system, before taking action, first considers the impact of each disrupted node on the system performance, thus allowing the command centre to prioritise the most “rewarding” nodes to be repaired. Using the previous example, the command centre of the subway system will take action based on information about the number of affected passengers, sent by all station heads and train staff, to determine the most critical section to be repaired first. The fundamental distinction between these two strategies lies in the amount of information used to make decisions and take actions in response to disruption.

In the random recovery strategy, disrupted nodes are chosen for repair in random order, and each node takes a fixed recovery time to repair. This strategy is illustrated in Figure 2. The second recovery strategy considers the impact of each disrupted node on the system performance before deciding the repair order. The impact of a disrupted node is calculated by summing the output of the node and the outputs of all of the nodes disabled as the result of its disruption (equation (2)). In other words, the disabled nodes are the descendants of the disrupted node. Subsequently, the information on the disrupted node’s impact is passed to the command centre or local coordinator. Sending this information takes a fixed reporting duration.
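The impact calculation and the random repair order can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names (`descendants`, `impact`, `random_repair_order`), the small example network, and the unit outputs of 1.0 are all hypothetical.

```python
import random
from collections import deque


def descendants(node, children):
    """All units that depend on `node` directly or indirectly."""
    reached = set()
    queue = deque([node])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached


def impact(node, outputs, children):
    """Output of the disrupted node plus the output of every node it disables."""
    return outputs[node] + sum(outputs[d] for d in descendants(node, children))


def random_repair_order(disrupted, rng):
    """Random recovery: repair the disrupted nodes in an arbitrary order."""
    order = sorted(disrupted)  # sort first so a seeded run is reproducible
    rng.shuffle(order)
    return order


# Hypothetical network in which every unit produces an output of 1.0.
children = {1: [2, 3], 2: [4], 3: [4]}
outputs = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

print(impact(1, outputs, children))  # prints 4.0
print(impact(2, outputs, children))  # prints 2.0
print(random_repair_order({2, 3}, random.Random(0)))
```

Note that a shared descendant (unit 4 above) is counted once per disrupted ancestor, since each impact value describes the loss attributable to that node's disruption alone.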

Since there are multiple ways in which informational links are structured and used, we decided to develop three different simulations of reporting line configurations. The first is the direct reporting mode, where all of the nodes in the system report to the command centre directly. The second configuration is the hierarchy reporting mode, where each node reports to its local coordinator. There is one local coordinator for each level. After it receives information from all disrupted nodes of its respective level, the local coordinator sends all of the information it has collected to the command centre, which takes the same reporting duration. The third configuration is the hybrid mode, which has the same configuration as the hierarchy reporting mode, except that a certain percentage of disrupted nodes report directly to the command centre. These three configurations of reporting lines are shown in Figure 3. It should be noted that, for the local coordinator or command centre, the process of sending or receiving information cannot be executed in parallel with another process. Thus, the local coordinator or command centre can only receive information from one unit at a time.
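To make the effect of the three configurations concrete, the sketch below estimates when the command centre has received every report. It rests on several assumptions that the text does not spell out: each report takes `t_s` time steps (set to 1 here), every disrupted node is ready to report at time 0, a coordinator forwards its collected information as a single message, a coordinator with no disrupted nodes sends nothing, and the command centre handles whichever message became available first. The function name and the example counts are hypothetical.

```python
def reporting_finish_time(direct_reports, reports_per_coordinator, t_s=1):
    """Time at which the command centre holds all disruption reports.

    direct_reports: number of nodes reporting straight to the command centre.
    reports_per_coordinator: number of disrupted nodes reporting to each
        local coordinator (one entry per level).
    """
    # Direct reports are available immediately. A coordinator's bundle is
    # available once it has received its own reports one at a time.
    release_times = [0] * direct_reports
    release_times += [n * t_s for n in reports_per_coordinator if n > 0]

    clock = 0
    for release in sorted(release_times):
        # The command centre receives one message at a time.
        clock = max(clock, release) + t_s
    return clock


# Twelve disrupted nodes spread evenly over four levels (hypothetical).
print(reporting_finish_time(12, []))            # direct mode, prints 12
print(reporting_finish_time(0, [3, 3, 3, 3]))   # hierarchy mode, prints 7
print(reporting_finish_time(2, [3, 3, 2, 2]))   # hybrid mode, prints 6
```

Under these assumptions the direct mode scales linearly with the number of disrupted nodes, while the hierarchy mode lets the coordinators collect reports in parallel; the hybrid mode helps only when the share of direct reports is small enough not to recreate the queue at the command centre.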

After the command centre receives all of the information directly from nodes or from local coordinators, it makes an optimum decision on the order in which the disrupted nodes are repaired. The decision is based on the projected gain obtained when a disrupted node is repaired. The gain is calculated using equation (3), which depends on the current time and on the number of disrupted nodes in the upstream levels that provide service to the node in question directly or indirectly. In this scenario, a disrupted upstream node is counted if its disruption causes that node to be disabled.

Figure 4 illustrates the informed recovery strategy using a simple example, where we assume that , and each unit has the same output . In this case, the disrupted nodes are node 2, node 3, and node 6. For the case of the direct reporting mode, the command centre will receive information from all disrupted nodes at since each unit takes time step to report to the command centre. The information received by the command centre is the impact value of each node. Node 2 has 7 direct and indirect dependent nodes, thus having an impact , while node 3 and node 6 have impact values of 4 and 1, respectively. And then, the gain for each node is calculated by the command centre. Repairing node 2, 3, and 6 will gain , , and . Therefore, the repair order will be node 2, node 3, and then node 6. Since the time to repair a node is 10 time steps, the system will get fully recovered at .

4. Results and Discussion

In each simulation, we generated a network composed of 300 nodes in four levels : . Level 1 is the highest level and level 4 is the lowest level, meaning there will not be any incoming link to any node in level 1 and outgoing link from any node in level 4 since a node cannot provide service to the same or higher-level nodes. In generating the network, each node from level 2 and lower will establish one link randomly with a higher-level node. There is a probability that the node can have additional links. The links are directional and always flow from the higher-level nodes to the lower-level nodes. For example, a node from level 3 may have a link from a level 1 node and two links from level 2 nodes. In our model, we set the probability . Each simulation is started by generating the output value for each node randomly between 0.01 to 1. Afterwards, a disruption event is generated, causing percent of nodes in each level to be disrupted which consequently disables all of their dependent nodes. The recovery time and sending information duration are set to and , respectively.
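The network generation and disruption steps described above might be sketched as follows. The sketch makes assumptions where the text leaves details open: additional links are added one at a time, each with probability `p_extra`, to distinct higher-level nodes, and the number of disrupted nodes per level is rounded to the nearest integer. The function names, level sizes, and parameter values in the example are hypothetical and do not reproduce the settings used in the paper.

```python
import random


def generate_network(level_sizes, p_extra, rng):
    """Build a multilevel DAG; returns (levels, children).

    levels[0] is the highest level. Every node below it gets one link from
    a randomly chosen higher-level node, plus possible additional links.
    """
    levels, next_id = [], 0
    for size in level_sizes:
        levels.append(list(range(next_id, next_id + size)))
        next_id += size

    children = {node: [] for level in levels for node in level}
    for depth in range(1, len(levels)):
        upstream = [node for level in levels[:depth] for node in level]
        for node in levels[depth]:
            parents = {rng.choice(upstream)}
            # Assumption: keep adding distinct parents while the draw succeeds.
            while len(parents) < len(upstream) and rng.random() < p_extra:
                candidates = [u for u in upstream if u not in parents]
                parents.add(rng.choice(candidates))
            for parent in sorted(parents):
                children[parent].append(node)
    return levels, children


def disrupt(levels, fraction, rng):
    """Disrupt the same fraction of nodes in every level."""
    disrupted = set()
    for level in levels:
        disrupted.update(rng.sample(level, round(fraction * len(level))))
    return disrupted


rng = random.Random(42)
levels, children = generate_network([2, 4, 6], p_extra=0.3, rng=rng)
disrupted = disrupt(levels, fraction=0.5, rng=rng)
print(len(disrupted))  # prints 6
```

Because links are only drawn from strictly higher levels, the generated graph cannot contain a cycle, which matches the constraint that a node never serves nodes at the same or a higher level.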

We started by simulating the random recovery strategy for initial damage to 10%, 20%, and 30%. For each disruption level, we averaged the results over 1000 simulations. Figure 5 shows the performance of this strategy. This result shows the impact of initial damage to the system, where a 30% initial damage can cause a drop in system performance to be as low as 0.2. In this strategy, disrupted nodes are recovered in a random order.

The comparison between the performance of the random and informed (direct reporting mode) recovery strategies for a given initial disruption is shown in Figure 6(a). This series of simulations demonstrates that the informed recovery strategy outperforms the random strategy since it takes into account the output and impact value of each node before determining the order in which to repair the disrupted nodes. The process of receiving information is reflected in the early phase of the informed strategy, where the performance does not increase since no disrupted node is being repaired. After that, the performance rises sharply once repairs start, because the nodes with the highest gain are repaired first. Interestingly, in the case of a small initial disruption, as shown in Figure 6(b), the performance of the two strategies does not differ significantly. In fact, the random strategy may be slightly better off because it does not spend time on reporting.
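The qualitative shape of these curves, a flat early phase followed by a rise once repairs begin, can be reproduced with a small Python sketch. It is an illustration under assumptions, not the simulation used for the figures: nodes are repaired one at a time, each repair takes `t_r` time steps, repairs start only after a reporting `delay`, and the repair order is supplied by the caller rather than derived from the gain in equation (3). All names and values are hypothetical.

```python
from collections import deque


def downstream(sources, children):
    """Units reachable from `sources` through service links."""
    reached, queue = set(), deque(sources)
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached


def performance_curve(order, outputs, children, t_r, delay, horizon):
    """Performance at t = 0..horizon when nodes are repaired in `order`."""
    total = sum(outputs.values())
    curve = []
    for t in range(horizon + 1):
        repaired = min(len(order), max(0, t - delay) // t_r)
        still_disrupted = set(order[repaired:])
        inactive = still_disrupted | downstream(still_disrupted, children)
        active = sum(v for node, v in outputs.items() if node not in inactive)
        curve.append(active / total)
    return curve


# Hypothetical network: disrupting unit 2 also disables unit 4.
children = {1: [2, 3], 2: [4], 3: [5]}
outputs = {n: 1.0 for n in range(1, 6)}

high_impact_first = performance_curve([2, 5], outputs, children, t_r=2, delay=0, horizon=4)
low_impact_first = performance_curve([5, 2], outputs, children, t_r=2, delay=0, horizon=4)
print(high_impact_first)  # prints [0.4, 0.4, 0.8, 0.8, 1.0]
print(low_impact_first)   # prints [0.4, 0.4, 0.6, 0.6, 1.0]
```

Raising `delay` shifts the whole curve to the right, which is why an informed strategy that waits for reports can lose its advantage when only a few nodes are disrupted.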

For a large-scale disruption, the high number of disrupted nodes that need to report to the command centre may cause a bottleneck effect, delaying the decision to start the recovery process. For example, Preece et al. identify the potential for a bottleneck effect in the information system of the UK’s emergency call centre in the case of a large-scale disaster [61]. On an individual level, a time-pressured, complex, and information-intensive environment can result in cognitive overload due to limited mental capacity, a constraint known as bounded rationality [62, 63].

Regarding the bottleneck issue, the reporting line in the system can be reconfigured to adopt a hierarchical structure. This structure is common in disaster management, which utilizes incident management structure as the command centre to coordinate ambulance, hospitals, police, and fire department [64]. Another example also includes the tiered-structure systems in medical surge capacity management, where the medical response and resource allocation are distributed based on the disaster severity which can be a local, state, interstate, or national level [65]. Our results in Figure 7 demonstrate the advantage in applying the hierarchical reporting mode compared to random and informed-direct strategies. The gap between the resilience value of the hierarchical and direct reporting mode becomes wider as disruption percentage increases, indicating a better performance of the hierarchical reporting mode during major disruptions.

While the hierarchical structure is the most practical and widely adopted, we also investigated the hybrid reporting mode. This mode is a mix of direct and hierarchical structures. It corresponds to the case in which the command centre not only follows the standard rules of the hierarchical structure by waiting for reports from the local coordinator but also proactively seeks information directly by itself. This type of information flow can be applied in specific scenarios. For example, when a disaster happens in a certain area, rapid health assessment (RHA) teams are sent to the location to assess the condition and to measure the medical logistics needed for that area. The RHA teams can be provided by the local or national government. These combined resources accelerate the information gathering that the central facility needs to make the best decisions regarding the allocation of medical resources to the affected areas. In establishing the reporting line configuration of the hybrid mode, we used a parameter that gives the probability that a disrupted node reports to the local coordinator rather than directly to the command centre. Figure 8 shows the mean resilience of the informed-hybrid recovery strategy for various values of this probability. The optimum value of the parameter for this network is 0.65, meaning that a disrupted node reports to the local coordinator with a probability of 65%. Using this value, we compare the resilience curves of all of the recovery strategies (Figure 9) to show how system resilience can be improved by configuring the informational structure.

In practice, our model emphasizes the importance of flexibility in informational relations between entities in a sociotechnical system. Flexibility allows adaptation at an organizational level during a crisis, which can be embodied through different configurations of reporting lines. For example, the simulation results can be used to guide stakeholders in designing a system that adapts its information flow to the severity of the disruption. Such a system could use the direct reporting mode for low disruption severity to minimize the cost of manpower, change to the hierarchical reporting mode for rapid and efficient information processing in the case of moderate to high disruption severity, or direct resources to partially bypass the hierarchical structure for even faster information collection during large-scale disruptions.

Furthermore, our model and simulations are meant to capture the complex interactions in sociotechnical systems while incorporating the core attributes of sociotechnical resilience. We model the sociomaterial structures as the network of operational (sociotechnical) units representing the human operator and the machine, along with the service dependencies between those units. Each unit also has an output value to represent the demand to be served in its respective area. We model informational relations as the configuration of reporting lines, which allows each unit to report the impact of disruption to the local coordinator or command centre. The decision, in the form of a recovery order, is based on this information about disruption impact. The anticipatory practices are embodied in the recovery strategy employed by the command centre. In this model, we demonstrated how the information network has a substantial effect on the system’s capability to respond to a crisis. While adaptation capability is not implemented in the simulations, our model provides insights into how adaptation can be undertaken in real-world settings through information network reconfiguration at the organizational level, based on the existing sociomaterial structure and the scale of the crisis.

5. Conclusions

A sociotechnical system is not simply an aggregation of social and technical aspects; it is hybrid in nature. This is the underlying feature of sociotechnical resilience. In this study, a conceptual model has been proposed to lay a stronger foundation for translating the abstract concept of sociotechnical resilience into practical forms. As explained throughout the paper, our research introduced a novel way of modeling sociotechnical resilience as a hybrid phenomenon reflected in network-based interactions, which emphasizes the role of informational flows in the recovery process of sociotechnical systems. The model allows the adjustment of various configurations of informational relations, making it useful when stakeholders wish to enhance infrastructure resilience. This is achieved through computational modeling that informs the design of better sociotechnical structures, information-sharing networks, and recovery strategies.

By taking into account the behavior of complex sociotechnical systems, we incorporated the attributes of sociotechnical resilience, namely, informational relations, sociomaterial structures, and anticipatory practices. Our model shows the interplay between these resilience factors through a multilevel directed acyclic graph, reporting lines configurations, and recovery strategies. The practical implication of our model is to guide stakeholders to efficiently and effectively plan their resources to be used in response to a crisis. For example, a direct reporting structure can be applied for a small-scale disruption. Actions can even be taken immediately for a very small disruption as shown in the random recovery scenario. For larger scales of disruption, the reporting lines should be reconfigured to a hierarchical structure to prevent a bottleneck, thus increasing information processing for command centre. The system performance can be further improved in case of large-scale disruption by having flexible procedures, where the command centre utilizes its human resources to proactively receive or seek information on the impact of each disrupted unit.

It should be noted that our model is applicable under three conditions. First, the system has to be hybrid: the operation does not take place in a purely technical realm, such as electrical circuits or mechanical devices, but involves organizational interactions in which information and coordination are shared and negotiated. Consequently, the model may have some degree of unpredictability because it deals with organizational behavior. This is reflected especially in the random recovery scenario. At the same time, as simulated in the informed recovery scenario, we showed that the degree of unpredictability can be minimized through information and coordination that can only be provided by human operators and managers, not by sensors or automated mechanisms alone. The second condition is that the system must have dynamic properties, such as the demands of the population it seeks to serve in its specific operational area. This entails that these properties change from time to time. Lastly, our model is more suitable for systems that are constructed around networks, where the degree of complexity is high. Having said that, we need to note that our present study did not observe the resilient behavior of sociotechnical networks across different systems. Therefore, further study must address this limitation in order to model large-scale interdependent sociotechnical systems.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is an outcome of the Future Resilient Systems project at the Singapore-ETH Centre (SEC), which was funded by the National Research Foundation of Singapore (NRF) under its Campus for Research Excellence and Technological Enterprise (CREATE) programme (FI 370074011).