Abstract

Change-oriented risk management is a key element of civil aviation safety management, and hazard identification is considered one of its most difficult and flexible parts. To address the risk management of changes introduced in existing systems, this paper first proposes a system change-oriented hazard identification (SCOHI) model. The SCOHI model identifies hazards by integrating the “5M” (mission-man-machine-management-medium) model with hazard and operability (HAZOP) techniques to specify changes in a system and the associated impacts on the surrounding environment. Compared with the traditional brainstorming process, the SCOHI model provides an explicit way to identify hazards in a dynamic environment. Then, taking an air navigation service provider (ANSP) in Northwest China as an example, a case study of system changes from nonradar control operations to radar control operations is analyzed. The effectiveness and applicability of the SCOHI model are tested with a risk assessment. The results from the preliminary evaluation show that the four key system change-oriented hazards are air traffic control (ATC) skills, staff capacity, control procedures, and airspace structure. In addition, the “Man” category accounts for around 55% of the total risk, ranking first, followed by the “Management,” “Medium,” and “Machine” categories. Finally, a sound risk control strategy is provided to the ANSP to help control the risk and maintain an acceptable level of safety during system changes.

1. Introduction

Safety assessment and risk management play an important role in civil aviation safety. They continuously help identify and trace hazards and suggest mitigation against risks in order to maintain an acceptable level of safety and enable systems to function properly. Hazard identification aims to find adverse sources and unsafe conditions that may lead to the occurrence of undesired events. It is considered one of the most difficult and flexible parts of safety analysis and hazard prevention [1].

An air transportation system comprises a set of procedures, people, and equipment. Hazard identification has been explored by numerous researchers, scholars, aviation experts, airline managers, and policy makers. Depending on the hazard identification sources and the approach taken, three groups of methods for identifying hazards in civil aviation are defined in the International Civil Aviation Organization (ICAO) Doc. 9859 Safety Management Manual (SMM). (1) Reactive: a reactive method collects hazards by looking into incidents and accidents that have already occurred. (2) Proactive: a proactive method uses all possible means to address hazards before they cause any adverse effect. These techniques may include safety surveys, safety audits, or voluntary safety reporting systems. (3) Predictive: a predictive method applies statistics to predict potential future hazards [2]. However, because no two systems in the world are identical, a one-size-fits-all hazard identification technique does not exist. As a result, researchers, practitioners, and aviation organizations such as airports, airlines, and air navigation service providers (ANSPs) have developed their own methods and techniques for hazard identification. For example, in the field of ANSPs, the European Organization for the Safety of Air Navigation (Eurocontrol) released its regulatory document named risk assessment and mitigation in air traffic management (ATM) in early 2001, which mandated safety assessment in the ATM industry. Eurocontrol also established a set of methodologies and tools called the safety assessment methodology (SAM) to guide the implementation of ATM safety assessment in Europe [3]. In the U.S., the System Safety Handbook (SSH) was published by the Department of Defense (DoD) and the Federal Aviation Administration (FAA) [4].

As air transportation is a highly technology-driven industry, existing systems in operational centers are upgraded frequently. Changes to a system will lead to changes in system risk. Thus, safety assessment in civil aviation should determine what new risks system changes could cause and to what extent the system safety output could be affected. Conventional hazard identification techniques, such as failure mode and effect analysis (FMEA), fall short for two reasons. First, current hazard identification techniques are designed to apply to an existing system, and the changing risk and impacts are generally not included in hazard identification procedures. Second, the aviation operation system is a large-scale, embedded, real-time, and safety-critical system with complex human-machine-environment interaction. System changes are recognized as a difficult and costly problem and a major source of risk in terms of cost, schedule, and quality, yet change analysis is generally conducted only at the last stage of current safety assessment. A more proactive approach should be taken to the hazard identification and analysis of changes and the associated risk.

Therefore, the objective of this study is to propose an effective risk management method for hazard identification and risk control under system changing circumstances. Taking an ANSP center in Northwest China as a study case, the main tasks are to identify new risks associated with the operational changes of the existing system, subsystem, or system components, measure the associated risk, and finally provide an efficient guideline for risk control.

2. State of the Art

2.1. Civil Aviation Safety Management System

A great number of methods and techniques have been successfully developed for safety practitioners to enhance aviation safety in real-world applications [5–8]. In particular, the PDCA cycle (plan-do-check-action), total quality management (TQM), quality management system (QMS), and safety management system (SMS) have achieved great impacts on air safety improvement [9–11]. The ICAO has mandated the implementation of an SMS in airlines, airports, ANSPs, and aircraft manufacturers [12]. An SMS includes the necessary organization structure, accountability, policy, and procedures [13]. It not only employs the PDCA cycle and deals with safety issues in the quality, environment, and finance sectors but also incorporates safety under a general management framework [14]. In 2006, the ICAO developed a comprehensive framework for SMS named Doc. 9859 Safety Management Manual. Then, in 2013, the ICAO further upgraded the requirements of SMS by releasing a new Annex 19, Safety Management. Annex 19 discusses the state safety program (SSP), SMS, and other safety management practices and establishes an aviation safety management framework for the ICAO’s contracting states [2, 13]. In the SMM and Annex 19, the ICAO outlines four fundamental pillars of SMS: safety policies and objectives, safety risk management, safety assurance, and safety promotion. Safety policies and objectives provide a framework and benchmark for SMS achievement. Safety risk management plays an important role in identifying hazards, assessing related risks, and developing appropriate mitigation. Safety assurance monitors compliance with standards and regulations in conjunction with the routine use of gap analysis (GA). It also provides a confidence level for SMS operations and evaluates the effectiveness of SMS strategies. Safety promotion provides training and other necessary activities in order to increase safety awareness and foster a positive safety culture within the organization [2, 13].

2.2. Safety Assessment Process

As shown in Figure 1, the general process of safety assessment includes hazard identification, risk analysis, risk control, and assessment documentation or reporting [1]. Theoretically, identified hazards are assessed in terms of the severity and the likelihood/probability of their consequences, and they are prioritized in order of risk-bearing potential. Hazards are generally assessed by a group of experienced professionals through standardized techniques and analytical procedures. If the risk is considered acceptable, operation continues without any intervention. If it is not acceptable, a risk mitigation process is engaged. During these processes, documents that record the whole process are generally produced as evidence that all risks have been identified and managed and will not bring any unexpected consequences [1, 4].

2.3. Conventional Hazard Identification

Among the large body of research dealing with hazard identification and management [1, 15, 16], the predictive techniques commonly used in the aviation industry are as follows:

(i) Functional hazard assessment (FHA). The FHA is a predictive technique that attempts to explore the effects of functional failures of parts of a system. Hazards are extracted through consequence analysis of functions that are lost or degraded. Eurocontrol uses the FHA as part of the safety assessment methodology (SAM). As a primary hazard identification tool, the FHA is usually used in the early stage of system design. It is straightforward and does not require extensive information. However, it has limitations: for instance, it may not cover the system thoroughly, and external conditions are not fully considered.

(ii) Hazard and operability (HAZOP). The HAZOP methodology is a process hazard analysis (PHA) technique used worldwide for studying not only the hazards of a system but also its operability problems, by exploring the effects of any deviations from design conditions [17]. The technique takes different parts of a system into consideration, such as hardware, software, procedures, and human operators. An important feature of the technique is the combination of parameters and guide words, which is used as the hazard index. The parameter pressure, for example, is generally combined with the guide words “more,” “less,” or “other than.” The HAZOP is widely adopted in safety-critical industries; however, its results depend on the participants’ expertise and experience.

(iii) Failure mode and effect analysis (FMEA). The FMEA aims at analyzing potential failure modes of a system and evaluating possible negative effects related to systems, designs, and processes [18, 19]. The FMEA generally produces a worksheet, which includes descriptions of components, failure modes, failure rates, causal factors, effects, detection, and actions. The FMEA is one of the earliest structured reliability and risk analysis methods, and its advantages and disadvantages are well known. Decades of application provide helpful guidance to users. However, properly executing an FMEA generally involves a great deal of paperwork and is time-consuming. In some instances, missing information or incorrect output may also result from an expert’s “blind spots.”

(iv) Fault tree analysis (FTA). The FTA uses a tree-structured notation based on Boolean logic to identify the root causes of an undesired event and to calculate the related probability. The purpose of the technique is to graphically present a tree of possible normal and abnormal events that can result in a top-level undesired event. The FTA is commonly regarded as a classic quantitative technique in reliability and risk analysis. It is a powerful tool for showing the paths between causes and accidents and thus for finding the key points for accident prevention. However, the FTA starts with a fault or a failure, not a process or a part of the system, so its result may not present a holistic view. In addition, when a system is large and/or complex, the FTA is difficult to complete without professional software.
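The Boolean-logic calculation behind an FTA can be illustrated with a short Python sketch. It assumes statistically independent basic events, and the event names and probabilities below are hypothetical values chosen only for illustration; they do not come from the case study.

```python
from math import prod


def and_gate(probabilities):
    """Probability that ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Probability that AT LEAST ONE independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities (illustrative only).
p_radar_failure = 0.01
p_backup_failure = 0.05
p_controller_error = 0.002

# Intermediate event: surveillance is lost only if radar AND backup both fail.
p_surveillance_lost = and_gate([p_radar_failure, p_backup_failure])

# Top event: surveillance lost OR a controller error.
p_top_event = or_gate([p_surveillance_lost, p_controller_error])

print(f"P(surveillance lost) = {p_surveillance_lost:.6f}")
print(f"P(top event)         = {p_top_event:.6f}")
```

Real fault trees often contain repeated or dependent events, in which case this simple gate-by-gate multiplication no longer holds and minimal cut set methods are used instead.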

Most of these hazard identification and analysis techniques originated in industry and work well for hardware systems. However, things become quite different when they are applied to a complex system with more human-machine-environment interaction. Moreover, these techniques are generally designed to apply to an existing system or to daily operations. For safety assessment prompted by system changes, especially changes in complex systems such as aviation, these techniques normally fall short.

3. System Change-Oriented Hazard Identification

3.1. System Changes

In this paper, a conceptual framework for system change-oriented hazard identification (SCOHI) is developed. It is intended to facilitate the move from the current 5M model to one in which change is anticipated and managed in an informed way (see Figure 2). The framework illustrates how the 5M elements relate to one another under the influence of system changes.

The “5M” refers to mission, man, machine, management, and medium; these are the five core areas in which accident/incident causal factors may appear. The “5M” provides a clear frame for describing the system and its working environment. Each element of the “5M” can be broken down into subelements or factors based on the specific system that needs to be assessed. Table 1 lists the relevant factors in the ANSP field as an example. The “C” refers to changes. Changes are a difficult and costly element because of the uncertainties and risks associated with them. The hazard and operability (HAZOP) methodology is applied to support the system change analysis. First, a list of key features or elements is developed to identify the malfunction of a specific process. Second, a set of guide words, such as “more or less,” “earlier or later,” and “increased or decreased,” is used to reflect the changes of the system in the different 5M areas. Table 2 provides a framework for identifying any changes of a system in the civil aviation ANSP field.
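The pairing of 5M factors with HAZOP guide words can be sketched as a blank change identification worksheet. The Python below is a minimal illustration rather than the authors' tool: the factor lists are assumptions made for this example (they are not the contents of Table 1), the `Machine` entry is hypothetical, and `build_worksheet` is a name introduced here.

```python
from itertools import product

# Illustrative 5M factors; assumed for this sketch, not taken from Table 1.
FACTORS = {
    "Mission": ["functionality", "task"],
    "Man": ["ATC skills", "staff capacity"],
    "Machine": ["surveillance equipment"],  # hypothetical factor
    "Management": ["control procedures"],
    "Medium": ["airspace structure"],
}

# Guide words quoted in the text.
GUIDE_WORDS = ["more or less", "earlier or later", "increased or decreased"]


def build_worksheet(factors, guide_words):
    """Return one blank worksheet row per (factor, guide word) pair."""
    rows = []
    for category, names in factors.items():
        for name, guide in product(names, guide_words):
            rows.append({
                "category": category,
                "factor": name,
                "guide_word": guide,
                "change": None,   # filled in by the assessment team
                "hazard": None,   # derived hazard, if any
            })
    return rows


worksheet = build_worksheet(FACTORS, GUIDE_WORDS)
for row in worksheet[:3]:
    print(row["category"], "|", row["factor"], "|", row["guide_word"])
```

Each row prompts the team with one question, for example whether ATC skills need to be "more or less" after the change; rows where no change is found stay empty.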

3.2. Hazard Identification and Risk Assessment

Hazard identification is regarded as the key to safety assessment. Developing a rational change identification worksheet is extremely important for hazard identification when using the SCOHI. To assess the risk associated with a change, it is necessary to assess both the probability of the change and the impact of that change [20]. Therefore, the change analysis should consist of both a sensitivity analysis and an impact analysis. The sensitivity analysis predicts which changes the system is highly sensitive to. The impact analysis predicts the consequences of a change. The combination of sensitivity and impact provides a measure of risk consequence. Based on the general safety assessment procedure, the SCOHI model employs a three-step hazard identification approach (see Figure 3). In the first step, the system and its environment are clearly described so that the system, its subsystems, and its components are well understood by the people carrying out the safety assessment. All factors within the working environment that may affect the operational result must be clearly identified and defined as well. The second step works on the change identification worksheet, i.e., it identifies changes that might happen to the system and its working environment. The third step uses the information provided by the sensitivity analysis and impact analysis of changes to define the consequence score of the risk.

After the application of the SCOHI model, the risk assessment can be conducted. Assuming that a hazard has a risk consequence score $c$ and an associated likelihood score $l$, the normalized risk score $r$ of this hazard is

$$r = \frac{c \times l}{M},$$

where $M$ is a measure of scale; here, we use the maximum value $M = 25$. If $c = 0$ and $l = 0$, no system-oriented change is happening. Then, for every $r$, one of three risk levels $R$ is assigned to the hazard: “Acceptable,” “Tolerant,” or “Unacceptable.”
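The scoring step can be illustrated with a small Python sketch. It rests on two assumptions that should be read as such: the normalized score is taken to be the product of the consequence and likelihood scores divided by the scale of 25, and the two thresholds separating the risk levels are hypothetical placeholders, because the threshold values are not given in the text above.

```python
MAX_SCORE = 25  # maximum value of the scale, as stated in the text


def normalized_risk(consequence, likelihood, scale=MAX_SCORE):
    """Return r = consequence * likelihood / scale (assumed form)."""
    if consequence < 0 or likelihood < 0:
        raise ValueError("scores must be non-negative")
    return consequence * likelihood / scale


def risk_level(r, tolerable_from=0.2, unacceptable_from=0.5):
    """Map r to a risk level; both thresholds are hypothetical placeholders."""
    if r >= unacceptable_from:
        return "Unacceptable"
    if r >= tolerable_from:
        return "Tolerant"
    return "Acceptable"


# Example: consequence 4 and likelihood 3 (illustrative scores).
r_example = normalized_risk(4, 3)
level_example = risk_level(r_example)
print(r_example, level_example)

# No change identified: both scores are zero, so r = 0.
print(normalized_risk(0, 0), risk_level(normalized_risk(0, 0)))
```

Keeping the thresholds as parameters makes it easy to substitute the values from whichever risk classification matrix an organization actually applies.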

4. Case Study

The case study is conducted on an air navigation service provider (ANSP) named “Z” in China. An ANSP is an organization that provides air navigation services for managing aircraft in flight or on the maneuvering area of an airport. Air traffic control (ATC) is the most important service provided by an ANSP; its purpose is to prevent collisions, organize and expedite the flow of air traffic, and provide information and other support to pilots. Controllers provide instructions, clearances, and flight information to guide flights from one point to another. The separation between aircraft depends on the communication, navigation, and surveillance technologies. The accuracy of the aircraft position provided to controllers directly affects the minimum separation between aircraft and thus the number of aircraft that can be handled by one controller. In our case study, “Z” manages one of the busiest airspaces in China. With the fast growth of daily flights, air traffic controllers’ workloads have become complex and stressful. Therefore, operational optimization and effective means are needed to maintain or even smoothly expedite air traffic flow. The transition from traditional procedural control (or nonradar control) to radar control is considered one of the important approaches for accommodating the rapid growth of air traffic in the airspace. Thus, radar control implementation is identified as a vital change to the current system.

Some background knowledge is needed before the SCOHI method is applied. Traditional nonradar control relies on pilots’ position reports and time-speed calculations from point to point and resolves conflicts between aircraft by applying complex separation standards. Under nonradar control, controllers require pilots to report their position whenever they pass specific navaids or waypoints and send instructions to pilots based on these reports. Because the aircraft are not visible to them, controllers have only a limited picture of the overall air traffic situation. For safety purposes, controllers have to separate aircraft with a little more separation than needed (a safety margin), which means less capacity in the airspace. In contrast, when radar control is applied, position reports are no longer a mandatory action for pilots. By directly monitoring aircraft on the radar screen, controllers can vector aircraft effectively and manage more aircraft, which greatly reduces air-to-ground communication. Moreover, as radar provides controllers with a much more precise position than pilots’ oral position reports, the required minimum separation between aircraft is greatly reduced. Consequently, the capacity of the airspace can be increased.

4.1. Application of SCOHI

First, a group of 12 participants was formed as a safety assessment working team. The relevant air traffic managerial department was notified to provide support for the working team. The working team consisted of air traffic controllers and aviation experts (their expertise ranged from aeronautical telecommunication, navigation, and radar to the ATC-integrated automation system). In addition, safety experts from universities and experienced air traffic controllers from other air traffic control centers were invited to work together. Second, a general safety assessment procedure was followed in the safety assessment for the transition of the air traffic control surveillance method in the “Z” air traffic control center. Together with the proposed SCOHI model, the safety assessment requirements for ANSPs issued by the Civil Aviation Administration of China (CAAC) were applied in the assessment, particularly for the classification of hazard severity and likelihood and for the risk classification matrix. Third, within the “5M” framework, a safety assessment worksheet was developed to find changes at different levels of the system. Part of the safety assessment worksheet and its outputs, the “Man” part, is shown in Table 3. Several brainstorming sessions involving controllers, technicians, and safety experts were held, and possible changes and derivative hazards were addressed and analyzed through free and open discussions.

4.2. Results and Discussion

The safety assessment results were obtained after several meetings that were guided by safety experts, and documents were recorded afterwards. They are shown in Table 4 and Figures 4(a) and 4(b).

As shown in Table 4, four key hazards were identified with the risk level labelled as “Unacceptable,” and they cover the man, management, and medium (environment) categories. The four key system change-oriented hazards are ATC skills, staff capacity, control procedures, and airspace structure. One hazard, training, was identified with a risk level labelled as “Tolerant.” For the risk titles functionality and task, the risk score r is 0, which means that no changes are found in this area, so the risk level R is acceptable. The risk assessment heat map with all thirteen risk items is shown in Figure 4(a), which makes it easier to see the relative position of the different risk titles in terms of likelihood and consequence. In addition, different risk categories have different risk impacts due to system changes. As shown in Figure 4(b), the “Man” category accounts for around 55% of the total risk, ranking first, followed by the “Management,” “Medium,” and “Machine” categories.
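A category share such as the one reported for Figure 4(b) can be computed by summing the risk scores of each category and dividing by the total. The sketch below assumes this simple aggregation, which the text does not spell out, and uses made-up scores that are not the values of Table 4; `category_shares` is a name introduced for this example.

```python
from collections import defaultdict


def category_shares(items):
    """Sum normalized risk scores per category and return each share of the total."""
    totals = defaultdict(float)
    for category, score in items:
        totals[category] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in totals}
    return {category: value / grand_total for category, value in totals.items()}


# Hypothetical (category, normalized risk score) pairs, for illustration only.
risk_items = [
    ("Man", 0.6),
    ("Man", 0.4),
    ("Management", 0.5),
    ("Medium", 0.3),
    ("Machine", 0.2),
]

shares = category_shares(risk_items)
ranking = sorted(shares, key=shares.get, reverse=True)
for category in ranking:
    print(f"{category}: {shares[category]:.0%}")
```

Ranking the categories by share then shows directly where risk control effort is likely to matter most.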

The five most important hazards associated with the transformation from nonradar control operations to radar control operations, and the risk control suggestions to mitigate them, are described as follows:

(i) ATC skills: controllers’ previous working experience is not suited to radar control operations; they need to improve their skills in radiotelephony communication, conflict detection and resolution with the radar separation standard, situation awareness, and radar screen scanning. To control the risk, first, the associated simulation training must be completed before radar control is implemented on-site. Second, several supervisor positions should be opened at the beginning of the test phase of radar control operations. Third, training should be updated to target new problems that emerge during radar operations.

(ii) Staff capacity: in radar control operations, the traffic flow volume will be much higher than in nonradar control operations. Because of the workload, the required number of controllers must be increased. However, training a controller in the field takes a long time; thus, it is necessary to develop the workforce gradually over several years. The number of sectors in the future radar-control airspace should be carefully considered in light of staff capacity.

(iii) Control procedures: most of the control procedures will be modified to adapt to radar control operations, especially the transfer of flights between two sectors, coordination between different control units, flow management procedures, minimum radar vectoring altitude, and flight procedures. To control the risk, simulation training and theoretical assessment are necessary.

(iv) Airspace structure: in radar control airspace, sufficient maneuvering airspace is necessary to resolve conflicts. A well-designed airspace structure, with its routes and sectors, lays the foundation for future operations. A better airspace structure will increase airspace capacity and safety. To mitigate the risk associated with airspace structure, a team consisting of controllers, airspace design experts, and different airspace users should be set up to discuss and find a suitable solution for the future airspace structure.

(v) Training: training is vital for the successful transformation from nonradar operations to radar operations. The training schedule and topics should be designed with controllers’ workload in mind. According to the CAAC aviation law, no less than 40 hours of radar training is mandatory for each controller. The performance of each controller must be assessed at the end of training.

Following the CAAC aviation law, planning the training program is important for controlling the associated risk because the aforesaid risk control processes take effect over different time scales. Based on experience, it may take several years to mitigate these hazards from the “Unacceptable” or “Tolerant” level to an “Acceptable” level. In the case study, the risk control plan for the next five years is illustrated in Figure 5. In this plan, we attempt to control the risk step by step. Within one year, the four key hazards should be mitigated to a “Tolerant” level and then to an “Acceptable” level in the following two or three years. It should be emphasized that the staff shortage is a mid- to long-term issue, so it takes more time to finally decrease the risk of “Staff capacity” to an “Acceptable” level.

5. Conclusion

This paper focused on hazard identification, which is commonly regarded as one of the most important considerations in aviation risk management and safety assessment. To deal with system-oriented changes and the associated risk, the SCOHI hazard identification model, which combines the “5M” model and HAZOP techniques, was proposed. By applying the proposed method with professionals at a real ANSP in China, it was found that the “Man” category requires the most attention for risk control in comparison with the other four categories: machine, management, medium, and mission. Moreover, a risk analysis and a control plan were discussed for the four key system change-oriented hazards, namely ATC skills, staff capacity, control procedures, and airspace structure, and for the one tolerant hazard, training. In the end, the SCOHI model was regarded as effective in an on-site safety assessment activity at an air traffic operation center in China. This study is one of the first safety assessment studies of its kind in China’s civil aviation industry, and it was regarded as very useful for the upgrade of air traffic control operations in other regions.

Data Availability

The datasets generated and/or analyzed during the current study are not publicly available due to security issues but are available from the corresponding author upon reasonable request.

Disclosure

The views expressed in this paper are entirely those of Man Liang and do not necessarily reflect UniSA or Australian government policy.

Conflicts of Interest

Man Liang works for the University of South Australia (UniSA).

Authors’ Contributions

Yiran Xie assisted in some of the paper draft preparation.

Acknowledgments

Le-ping Yuan has received research grants from the Civil Aviation University of China. This work was supported by the National Key Research and Development Program of China (Grant no. 2016YFB0502405).