Abstract

The International Civil Aviation Organization (ICAO) has mapped out Single-Pilot Operations (SPO) as the core development direction for the next generation of commercial aircraft operations in 2030. Safety is a key airworthiness factor in commercial aircraft design. Because the SPO mode involves a higher degree of air-ground task collaboration and complexity, the traditional safety analysis methods applied in the two-pilot mode cannot effectively identify the potential hazard patterns in the system. To address this problem, a safety analysis method that combines model-based safety analysis (MBSA) with hazard pattern mining is introduced, and a differential bicluster mining algorithm named TFCluster is proposed to identify maximum differential biclusters from real-valued function-resource matrices without candidate maintenance. Experimental studies on public datasets indicate that TFCluster is efficient and scalable and outperforms the existing differential bicluster algorithms. Taking a typical operating scenario, midterm conflict resolution in the SPO mode, as an example, a safety analysis of air-ground task collaboration for flight conflict in the SPO air-ground collaborative architecture is carried out. The results show that the proposed method can effectively identify potential hazard patterns, feed them back into the system architecture design, and assist safety analysis.

1. Introduction

With the enhanced capability of airborne systems, the crew size of commercial aircraft has been correspondingly reduced. As an important direction for future aviation technology, single-pilot operations (SPO) of commercial aircraft fundamentally change the traditional dual-pilot operating mode and have drawn widespread attention from major manufacturers, airlines, and scholars worldwide.

In the SPO mode, the original two pilots are reduced to one. In case of a flight conflict, the single pilot may not be able to respond promptly, posing a threat to flight safety. Two measures are considered to avoid accidents. First, from the system architecture design perspective, we refer to the Class G Concept System proposed by NASA [1], which introduces a ground operator to support air-ground collaborative decision-making in nonnominal flight. Second, from the viewpoint of enhancing airborne system capability, a high level of system integration is required to support air-ground task collaboration. For example, such integration can enhance air and ground surveillance capabilities over the full flight schedule and help avoid conflicts.

To ensure the safety of a flight task, multiple subtasks must be coordinated, that is, task collaboration is required. In the SPO mode, flight tasks are accomplished through the collaboration between one pilot on board and one ground operator at the ground station. Air-ground task collaboration is therefore a key feature of SPO. However, while improving system capability, air-ground task collaboration also brings safety issues in terms of fault mixing. For this reason, this paper carries out a safety analysis of air-ground task collaboration for flight conflict in SPO.

Recent research on SPO of commercial aircraft has focused on operating architecture design [2, 3], human-machine function allocation [4], single-pilot workload [5, 6], air-ground communication links [7], etc. Lachter et al. [8] and Schmid and Korn [9] summarize the existing conceptual designs of SPO, which are mainly divided into alternative design [10, 11] and displacement design [12]. The current mainstream approach, proposed by NASA [1], is an advanced airborne automation system assisted by a ground station. Faulhaber and Friedrich and Sprengart et al. [5, 13] assess single-pilot workload. Other researchers have developed intelligent cognitive human-machine interfaces [4, 14, 15] based on artificial intelligence technology to sense and analyze pilot workload in real time and to manage that workload in real time to ensure flight safety [2, 16]. Min et al. [17] studied SPO human-machine task allocation by comparing and analyzing the current task allocation in two-crew operations. Carloni and Manica [7] studied the air-ground communication link requirements in the SPO mode. The literature has rarely focused on SPO safety. Yong et al. [18] studied task synthesis safety analysis in the SPO mode based on hazard pattern mining, but their algorithm is based on discrete data. Yue et al. [19] proposed a real-valued mining algorithm for SPO safety analysis; however, the algorithm mechanism is not described in detail and there is no safety verification in conjunction with SPO scenarios. Both gaps are addressed in this paper.

The avionics system in SPO mode has a high degree of integration and complexity, which makes traditional safety analysis methods unable to effectively identify the potential hazard elements. In view of the sharp rise in the number and complexity of system failures under SPO air-ground task collaboration, in this paper, a safety analysis method that combines model-based safety analysis (MBSA) with hazard pattern mining is proposed. Through mining a large number of simulation scenarios and historical test data in the SPO mode, the hidden hazard patterns and propagation mechanisms behind the model data could be uncovered and transformed into knowledge for safety analysis. Thus, comprehensive safety management of SPO air-ground collaborative architecture could be achieved, ultimately improving the overall safety of the aircraft.

The contributions of this paper mainly include the following aspects:

(1) In response to the significant increase in the number and complexity of system failures under SPO air-ground task collaboration, this paper proposes a safety analysis method that combines model-based analysis with hazard pattern mining. By mining and analyzing a large amount of model operating data, the method can be used to guide safety analysis and system architecture design.

(2) A real-valued differential bicluster mining algorithm, TFCluster, is proposed for the analysis of function-resource allocation under different tasks. Compared with the conference paper [19], this paper provides a more detailed description of the mechanism of TFCluster and tests its efficiency on both public and SPO experimental datasets.

(3) A safety verification framework that combines model-based analysis with hazard pattern mining is constructed. Taking the midterm conflict handling process in the SPO mode as an example, the model-based safety verification of SPO air-ground task collaboration is carried out. The mining results obtained from the TFCluster algorithm can effectively identify potential hazard patterns, feed back into the system architecture design, and assist safety analysis.

The remaining sections of this paper are organized as follows. Section 2 first introduces conflict detection in SPO, then analyzes safety issues, and finally proposes the safety analysis method combining MBSA and hazard pattern mining. Section 3 describes the proposed TFCluster algorithm in detail. Section 4 compares the mining efficiency of the proposed algorithm with that of existing algorithms, and uses a typical operating scenario, midterm conflict resolution in SPO, for the safety analysis of air-ground task collaboration. Conclusions are drawn in the final section.

2. SPO Safety Analysis Method

2.1. Conflict Detection in SPO

In the SPO mode, visual information is lost because no copilot observes the situation outside the aircraft, so surveillance capabilities need to be further enhanced to avoid the resulting conflicts. The next-generation surveillance system in SPO involves the airborne automation system, the onboard single pilot, the ground station, and air traffic control (ATC), and achieves coordination through the air-ground data link and the ground network.

To avoid collision accidents, it is necessary to predict the flight trajectory and calculate the probability of conflict in advance. According to the prediction horizon, conflict detection is divided into long-term, midterm, and near-term conflict detection, as shown in Figure 1. Near-term conflict refers to potential conflicts within the next five minutes and is addressed through the traffic collision avoidance system (TCAS), which alerts the onboard pilot and provides guidance for resolving the conflict. TCAS also supports emergency maneuvering, such as emergency collision avoidance. Long-term conflict detection targets potential conflicts that may occur more than several hours ahead. Aircraft in the airspace report their status to ATC, and ATC identifies the potential conflict through analysis. Coordinated resolution of a long-term conflict is the responsibility of ATC. If adjustments by the aircraft are required, ATC negotiates with the ground operator in advance to avoid the conflict by adjusting flight routes or modifying the required time of arrival (RTA). Midterm conflict detection targets potential conflicts in the next 10 to 30 minutes. The airborne surveillance system senses and identifies traffic threats and meteorological constraints and then transmits them to the ground station through the air-ground data link. The ground operator makes collaborative decisions with the onboard pilot and interacts with ATC to organize the maneuvering process and adjust the flight status, for example through autonomous crossing and altitude adjustments. Midterm conflict detection is accomplished through air-ground collaboration among the airborne automation system, the onboard pilot, and the ground operator. Therefore, we choose midterm conflict for the follow-up study because it best reflects the characteristics of SPO air-ground coordination.
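The three detection horizons described above can be illustrated with a minimal Python sketch. The function name and the `long_term_threshold_min` parameter are hypothetical: the text only says that long-term detection looks several hours ahead, so the caller must supply that boundary. Look-ahead times that the text does not assign to any horizon are returned as `None` rather than guessed.

```python
from typing import Optional


def classify_conflict_horizon(minutes_to_conflict: float,
                              long_term_threshold_min: float) -> Optional[str]:
    """Map a predicted time-to-conflict (in minutes) onto a detection horizon.

    long_term_threshold_min is an assumed, caller-supplied boundary; the
    source text gives no exact figure for the long-term horizon.
    """
    if minutes_to_conflict < 0:
        raise ValueError("time to conflict must be non-negative")
    if minutes_to_conflict <= 5:
        return "near-term"   # handled through TCAS alerts to the onboard pilot
    if 10 <= minutes_to_conflict <= 30:
        return "midterm"     # air-ground collaborative decision-making
    if minutes_to_conflict >= long_term_threshold_min:
        return "long-term"   # coordinated by ATC with the ground operator
    return None              # horizon not assigned by the text


horizon = classify_conflict_horizon(20, long_term_threshold_min=180)
```

In this sketch a conflict predicted 20 minutes ahead falls into the midterm horizon, which is the case studied in the rest of the paper.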

Midterm conflict does not require an immediate response from the pilot, but needs to be handled jointly by the pilot on board, the ground operator, and ATC. The relevant flight tasks are as follows.

(1) Establish collaborative surveillance among the pilot on board, the ground operator, and ATC. The airborne data fusion and intelligent analysis system supports the onboard pilot in analyzing the surveillance data and transmitting the processed data to the ground station. The ground operator receives the surveillance data through the air-ground data link and receives airspace traffic and meteorological information from ATC through the ground network. The data are visualized through ground simulation software to help the ground operator make collaborative decisions. All three parties simultaneously monitor the operating status of the aircraft, identify potential hazards, and provide threat warnings.

(2) Determine collaborative handling measures among the onboard pilot, the ground operator, and ATC. The onboard pilot senses and identifies the airspace traffic and route meteorology with the assistance of airborne automatic systems, ATC monitors the operating status of aircraft in the airspace and completes flight constraint management, and the ground operator is responsible for communicating and negotiating with ATC to finalize the conflict resolution, such as horizontal crossing and interval maintenance. The criteria and constraints for handling, such as minimum safety separation and horizontal offset distance, should be established to form a three-way collaborative handling method.

2.2. Safety Issues in SPO Mode

In the absence of the complementary capabilities, interactive decision-making, and status confirmation of another captain, safety is the first problem faced by SPO. For example, in midterm conflict detection, the lack of visual information from the copilot and the lack of face-to-face decision-making may lead to safety hazards. To make up for the absence of another captain's abilities, a ground operator is introduced to provide flight decision assistance and flight emergency response. In addition, a more integrated and advanced airborne automation system is necessary to assist the onboard pilot. Hence, air-ground task collaboration can be achieved through task synthesis, function fusion, and resource integration.

Among them, resource integration is achieved through the process of resource organization, operation, and management, which enables resource capability sharing, resource operation reuse, and resource status management, thereby improving device resource utilization, operating efficiency, and availability. With the improvement of hardware computing capabilities, more and more functions are implemented through embedded software. Based on modularized function design, resource integration enables the same hardware platform to load multiple function modules, such as a data processing function and a display executing function, ultimately achieving process-based resource reuse and goal-based resource sharing. Function fusion enhances system task execution through dynamic collaboration between different functions. For example, the surveillance data of Automatic Dependent Surveillance-Broadcast (ADS-B) and the Traffic Collision Avoidance System (TCAS) can be fused and displayed comprehensively to achieve more accurate flight situation awareness [20] in support of conflict detection. Task synthesis supports the operation of nonnominal flight processes and part of the automated decision-making based on the current system capability status and the detected external environmental parameters, thereby reducing the workload of the single pilot.

Even though air-ground task collaboration brings the above benefits, it also increases system complexity, such as resource sharing, function cross-linking, and software and hardware interaction, making system faults spread and mix in the integration, fusion and synthesis, which has a great impact on system safety. In particular, multiple tasks may occur during midterm conflict detection in SPO, where air-ground task collaboration not only brings efficiency improvements, but also makes the resources lose their closeness. Resources are interconnected in terms of professional capabilities, resource operations, and organizational aspects, which results in safety issues where the system state is difficult to determine and fault composition is challenging to diagnose.

2.3. Safety Analysis Method Combining MBSA and Hazard Pattern Mining

For the abovementioned safety issues resulting from air-ground task collaboration in SPO, the traditional safety methods are inadequate. Traditional safety analysis methods such as FTA and FMEA rely mainly on engineering experience, are generally based on static logical reasoning, and are not carried out simultaneously with system design. In the SPO mode, as the complexity of the avionics system increases, it is difficult to list all the system failure modes and effects. In addition, because the system design is iterated, the failure modes may no longer match the system architecture, which makes it difficult to carry out the collaborative design of system safety and functional performance. To address the inconsistency between the system architecture design model and the safety analysis model, model-based safety analysis (MBSA) [21] is applied in this paper. The system design process is combined with the safety analysis process to ensure data consistency at the model level, and the complex integrated system is then modeled through layered modeling [22].

To cope with the increasing number and complexity of system failure modes in the context of air-ground task collaboration in SPO, a new safety analysis method combining model-based safety analysis (MBSA) and hazard pattern mining is proposed, as shown in Figure 2. In accordance with the top-down system design process, various system models are built in different design stages, such as the flight scenario model in the requirements definition stage, the organization structure model in the system architecture design stage, the operating process model in the system operating analysis stage, and the function decomposition model and resource allocation model in the architecture decomposition stage. Then, model data will be extracted and the hazard pattern mining algorithms will be designed for analysis. Through the mining and analysis of a large number of model operating data, potential hazard patterns or more reasonable function-resource allocation scheme can be obtained to assist safety analysis and system architecture design.

The new safety analysis method combining MBSA and hazard pattern mining does not abandon the traditional safety analysis method, but serves as an auxiliary means to deal with the significant increase in the number and complexity of system failure modes in SPO task collaboration. The complex integrated system model is constructed through hierarchical modeling [22], which realizes the synchronization of the system design and the safety analysis. Different safety analysts could use the same research object, and adopt different analysis methods, such as FTA and FMEA, based on different analysis focuses. As an aid to traditional safety analysis, the hazard pattern mining method can make up for the lack of manual analysis and make safety analysis and verification more reliable.

3. TFCluster Algorithm

Based on the proposed safety analysis method, this section presents a hazard pattern mining algorithm called TFCluster, which can facilitate the analysis of function-resource organization during the execution of different tasks, thereby establishing a task-function-resource safety relationship and enabling the safety analysis of SPO air-ground task collaboration.

3.1. Problem Description

When dealing with a midterm conflict in SPO, multiple tasks may run concurrently. For example, while the onboard pilot collaborates with the ground operator and ATC to resolve conflicts, the pilot also needs to monitor the airspace traffic and meteorological conditions to prevent new conflicts. Hence, this section uses hazard pattern mining to analyze whether a specific function-resource allocation scheme contains potential hazards under the requirements of multiple concurrent tasks.

The execution of a task is the organization and operation of system functions, while the execution of a function is the organization and operation of system resources. Therefore, the completion of top-level tasks is inseparable from the support of resources and functions. This section focuses on air-ground task collaboration in SPO. The execution of each task can be abstracted as a matrix called the function-resource matrix, where each row represents a resource and each column represents a function. The values in the matrix indicate the extent to which a function uses a resource, and they can be represented as discrete or real-valued data. In DFCluster [18], the values in the function-resource matrix are discretized as 1, -1, and 0, which only reflects whether a function uses a resource, similar to a switch, and cannot represent the degree of use. In real systems, however, the extent to which a function uses a resource varies from case to case. Hence, this section targets real-valued function-resource matrices for mining. The values in the real-valued function-resource matrix range from 0 to 1: 0 means that the resource does not need to be called when a certain function is executed, and 1 means that the resource must be called when a certain function is executed. For example, if a communication resource has a bandwidth of 20 kHz and a function needs to occupy 5 kHz of it to transmit data, the occupation degree of the communication resource by the function is 5/20 = 0.25. Tables 1 and 2 show the function-resource matrices when the two tasks are executing, respectively.
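The bandwidth example above can be turned into a small Python sketch that builds a real-valued function-resource matrix with one row per resource and one column per function. The resource names, function names, and the processor figures are invented for illustration; only the 5 kHz out of 20 kHz case comes from the text.

```python
from typing import Dict, List


def occupation_degree(required: float, capacity: float) -> float:
    """Fraction of a resource's capacity that one function occupies (0 to 1)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0 <= required <= capacity:
        raise ValueError("required amount must lie between 0 and capacity")
    return required / capacity


# Hypothetical capacities and per-function demands (same unit per resource).
capacities: Dict[str, float] = {"comm_link": 20.0, "processor": 100.0}
demands: Dict[str, Dict[str, float]] = {
    "comm_link": {"F1": 5.0, "F2": 0.0},     # 5 kHz of 20 kHz, as in the text
    "processor": {"F1": 50.0, "F2": 100.0},  # assumed values
}
functions: List[str] = ["F1", "F2"]

# Rows are resources, columns are functions.
matrix: Dict[str, List[float]] = {
    resource: [occupation_degree(demands[resource][f], capacities[resource])
               for f in functions]
    for resource in capacities
}
```

Here `matrix["comm_link"]` holds 0.25 for the first function and 0.0 for the second, matching the switch-like reading of 0 as "not called" while keeping the degree of use that the discretized representation lacks.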

A differential bicluster mining algorithm named TFCluster is proposed in this section. It identifies maximum differential-used and low-used biclusters without candidate maintenance and applies several pruning strategies to improve mining efficiency. The mining process of TFCluster is shown in Figure 3. First, the original function-resource matrices are scanned to construct a weighted function-function graph. Then, all maximum biclusters are identified using function expansion.

Through the TFCluster bicluster mining algorithm, differential biclusters can be identified from the function-resource matrices of different tasks. The identified biclusters reveal the relationships between multiple functions and resources, which facilitates the analysis of function-resource organization during the execution of different tasks. In this way, the task-function-resource safety relationship can be established and the safety analysis of SPO air-ground task collaboration can be performed. Specifically, by analyzing the identified differential-used and low-used biclusters, it is possible to avoid task execution conflicts caused by different functions calling the same resource simultaneously. The mining results can be used to allocate resources reasonably: on the one hand, this improves resource efficiency; on the other hand, it avoids safety problems caused by air-ground task collaboration.

3.2. Preliminaries and Definitions

To adapt the differential bicluster algorithm to air-ground task collaboration safety analysis, three user-defined parameters are introduced to identify the differential-used and low-used biclusters from the real-valued function-resource matrices: (1) a parameter that measures the relevance between functions, (2) a parameter that restricts low usage of a resource, and (3) a parameter that measures the differential usage of a resource.

Definition 1. Low-used function-resource bicluster is defined as a bicluster where all resources and functions meet the Equation (1) and Equation (2): where is the function-resource matrix of task , is the function-resource matrix of task , is a user-defined parameter used to measure the relevance between functions, is a user-defined parameter that restricts low usage of a resource, is any resource in the function-resource matrices and , is the set of functions in and , represents the maximum value of the resource under a set of functions, and represents the minimum value. Then, the safety index of low-used function-resource bicluster can be expressed as follows:

Definition 2. Differential-used function-resource bicluster is defined as a bicluster where there is at least one resource that meets the Equations (4)–(7) under certain two functions, and the resource meets the constraints in Definition 1 under other functions. where is the function-resource matrix of task , is the function-resource matrix of task , is a user-defined parameter used to measure the relevance between functions, is a user-defined parameter used to measure the differential usage of a resource, is a user-defined parameter that restricts low usage of a resource, is any resource in the function-resource matrices and , is the set of functions in and , represents the maximum value of the resource under a set of functions, and represents the second maximum value. Then, the safety index of differential-used function-resource bicluster can be expressed as follows:

Definition 3. In order to facilitate description of bicluster with differential usage rate and low usage rate, suppose the use values of resource under the functions and of task are and , and the use values of resource under the functions and of task are and . There are four representations for under and : (1) If , , , meet Definition 2. and max, then the contribution rate of to and meets differential-used requirement, expressed as “”. (2) If meet Definition 2 and max, then the contribution rate of to and meets differential-used requirement, expressed as “”. (3) If meet Definition 1, then the contribution rate of to and meets low-used requirement, expressed as “”. (4) If does not meet Definition 1 or Definition 2, then no record is given.

Therefore, each resource in the biclusters identified by TFCluster algorithm satisfies the first three cases in Definition 3 for all functions. In order to improve the mining efficiency, the mining process takes the form of function expansion without candidate maintenance.

3.3. Mining Process

The mining procedure of the TFCluster algorithm consists of two main steps. Firstly, the original function-resource matrices are scanned, and a weighted function-function relational graph is constructed based on Definition 3. Secondly, maximum differential-used and low-used function-resource biclusters are identified through function expansion. Moreover, some pruning strategies are designed to improve the mining efficiency.

3.3.1. Weighted Function-Function Relational Graph

Definition 4. Weighted function-function relational graph is defined by the set . Each node in the vertex set in the graph represents a function. If there is an edge between a pair of vertices, it means that the resource set under above vertices must satisfy Definition 3, and the set of edges is denoted as . The weights of each edge are the resource set satisfying Definition 3 under the two functions connected to this edge, and the set of the weights is denoted as .

Unlike the DFCluster algorithm [18], the weight set records not only the expression and symbol of each resource, but also the maximum value and the second maximum value of each resource under these two functions. Taking the data in Tables 1 and 2 as an example, the weighted function-function relational graph shown in Figure 4 can be constructed.

Because the information from the original function-resource matrices is stored in the weighted function-function relational graph, repeated access to the original data is avoided, which improves the efficiency of the algorithm.
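The graph construction of Definition 4 can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' implementation: the labeling rule is passed in as a callable, and the `low_usage_only` rule shown here is a simplified placeholder (it marks a resource as low-used when every value is at most an assumed threshold) rather than the constraints of Definitions 1 to 3. All names and data are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Matrix = Dict[str, Dict[str, float]]               # resource -> function -> usage
EdgeWeights = Dict[str, Tuple[str, float, float]]  # resource -> (label, max, 2nd max)
Labeler = Callable[[Sequence[float]], Optional[str]]


def build_relational_graph(task_a: Matrix, task_b: Matrix,
                           functions: List[str],
                           labeler: Labeler) -> Dict[Tuple[str, str], EdgeWeights]:
    """One pass over both matrices; an edge exists only if it carries resources."""
    graph: Dict[Tuple[str, str], EdgeWeights] = {}
    for f1, f2 in combinations(functions, 2):
        weights: EdgeWeights = {}
        for resource in task_a:
            values = [task_a[resource][f1], task_a[resource][f2],
                      task_b[resource][f1], task_b[resource][f2]]
            label = labeler(values)
            if label is None:
                continue  # resource satisfies no case, so it is not recorded
            top, second = sorted(values, reverse=True)[:2]
            weights[resource] = (label, top, second)
        if weights:
            graph[(f1, f2)] = weights
    return graph


def low_usage_only(values: Sequence[float], threshold: float = 0.3) -> Optional[str]:
    """Placeholder rule (assumption): low-used if every value is <= threshold."""
    return "low" if max(values) <= threshold else None


task_a = {"R1": {"F1": 0.1, "F2": 0.2, "F3": 0.9},
          "R2": {"F1": 0.8, "F2": 0.1, "F3": 0.1}}
task_b = {"R1": {"F1": 0.2, "F2": 0.1, "F3": 0.7},
          "R2": {"F1": 0.9, "F2": 0.2, "F3": 0.2}}
graph = build_relational_graph(task_a, task_b, ["F1", "F2", "F3"], low_usage_only)
```

Storing the maximum and second maximum on each edge mirrors the design choice described above: later expansion steps can work from the edge weights alone without rescanning the matrices.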

3.3.2. Maximum Biclusters Mining

Based on the above definitions, the extended bicluster satisfies anti-monotonicity. If the bicluster extended by does not satisfy the constraints, then its arbitrary superset also does not satisfy. Hence, the larger-scale bicluster can be obtained through intersecting the weights on each edge in the weighted function-function relational graph. Since the weighted graph of TFCluster also records the maximum value and the second maximum value of each resource, then the bicluster expansion will be different from that in the DFCluster algorithm [18]. In DFCluster, if is extended from , the weights of edges , and must be intersected to obtain the resource set under . But in the TFCluster algorithm, the resource set under can be obtained only by calculating the intersection of the weights under and , as well as satisfying the following constraints.

Assuming that is the current extended differential-used bicluster, is the maximum value of resource under function to be extended, is the minimum value. Based on Definition 3, the intersection of the resources under the current extended function and the candidate function can only be obtained in the following four cases: (1) The resource under the current extended function is “”, and the resource under the candidate function is also “”. (2) The resource under the current extended function is “”, and the resource under the candidate function is “”. (3) The resource under the current extended function is “”, and the resource under the candidate function is “”. (4) The resource under the current extended function is “”, and the resource under the candidate function is also “”.

Theorem 5. Assume that the maximum value of the resource under the function to be extended is and the minimum value is . The maximum value of resource under candidate function is , and the minimum value is . The form of the resource under function to be extended is “”, and the form of the resource under candidate function is also “”. Then, the following conditions must be met to make the resource exist in the intersection of function and function : It can be seen from the resource forms under current extended function and candidate function that the resource with a utilization rate greater than must be under . The resource utilization rate under function and function are both lower than , then the extended must be a differential-used bicluster, whose difference has been satisfied by the differential-used biclusters and . The resource will exist in the intersection of function and function as long as the minimum values of the resource under the and satisfy the correlation.

Proof of Theorem 5. If then then

Theorem 6. Assume that the maximum value of resource under function to be extended is and the minimum value is . The maximum value of resource under the candidate function is , and the minimum value is . The form of the resource under function to be extended is “”, and the form of the resource under candidate function is “”. Then, the following conditions must be met to make the resource exist in the intersection of function and function : It can be seen from the resource forms under current extended function and candidate function that the resource whose utilization rate is greater than must be under , and the utilization rates of the resource under and are both lower than . Then, the extended must be a differential-used bicluster. The resource will exist in the intersection of function and function , as long as the maximum value under , the minimum value under , and the maximum value under satisfy the difference, and the minimum values under and satisfy the correlation.

Proof of Theorem 6. If then then

Theorem 7. Assume that the maximum value of resource under function to be extended is and the minimum value is . The maximum value of the resource under candidate function is , and the minimum value is . The form of the resource under the function to be extended is “”, and the form of the resource under candidate function is “”. Then, the following conditions must be met to make the resource exist in the intersection of function and function : It can be seen from the resource forms under current extended function and candidate function that the resource whose utilization rate is greater than must be under , and the utilization rates of the resource under and are both lower than . Then, the extended must be a differential-used bicluster. The resource will exist in the intersection of function and function , as long as the maximum value under , the minimum value under , and the maximum value under satisfy the difference, and the minimum values under and satisfy the correlation.

Proof of Theorem 7. If then then

Even if the resource forms under current extended function and candidate function are both “”, “” or “” may still appear under the intermediate functions, because the definition of the resource symbol depends only on the first and last functions. Different cases can be distinguished by comparing the relationship between the maximum value of the resource and the parameter .

Theorem 8. Assume that the maximum value of resource under function to be extended is and the minimum value is . The maximum value of resource under the candidate function is , and the minimum value is . The form of the resource under function to be extended is “”, and the form of the resource under candidate function is also “”. Then, the following conditions must be met to make the resource exist in the intersection of function and function : Based on the resource forms under current extended function and candidate function , and the minimum value under PC is greater than , then the extended must be a differential-used bicluster. The resource will exist in the intersection of function and function , as long as the maximum value under and the larger between the minimum value under and the maximum value under satisfy the difference, and the larger between the maximum value under and the minimum value under , and the smaller between the minimum values under and satisfy the correlation.

Proof of Theorem 4. Proof: if then then

Theorem 9. Assume that the maximum value of resource under function to be extended is and the minimum value is . The maximum value of resource under candidate function is , and the minimum value is . The form of the resource under function to be extended is “”, and the form of the resource under candidate function is also “”. Then, the following conditions must be met to make the resource exist in the intersection of function and function : Based on the resource forms under current extended function and candidate function , and the minimum values under and are both lower than , then the extended must be a low-used bicluster. The resource will exist in the intersection of function and function , as long as the larger of the maximum values under and , and the smaller of the minimum values under and satisfy the correlation.

Proof of Theorem 5. Proof: if then then
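The range bookkeeping described in Theorem 9 can be illustrated with a minimal sketch. It only shows the merging step stated in the text: take the larger of the two maxima and the smaller of the two minima, then test the merged range against the correlation parameter. The form of the correlation test used here (a bound on the spread between maximum and minimum) is an assumption made for illustration, as are the names `merge_ranges` and `delta`; the exact inequality of the theorem is not reproduced.

```python
from typing import Optional, Tuple

Range = Tuple[float, float]  # (minimum value, maximum value) of one resource


def merge_ranges(extended: Range, candidate: Range, delta: float) -> Optional[Range]:
    """Merge the value ranges of one resource under two functions.

    Returns the merged (min, max) range if it passes the correlation test,
    otherwise None (the resource drops out of the intersection).
    ASSUMPTION: the correlation test is modeled as max - min <= delta.
    """
    low = min(extended[0], candidate[0])    # smaller of the two minima
    high = max(extended[1], candidate[1])   # larger of the two maxima
    if high - low > delta:
        return None
    return (low, high)


# Hypothetical values, chosen only to exercise both branches.
kept = merge_ranges((2, 5), (3, 9), delta=10)
dropped = merge_ranges((2, 5), (3, 9), delta=5)
```

With these hypothetical inputs the merged range is (2, 9), which is kept when `delta` is 10 and dropped when `delta` is 5.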

3.4. Pruning Strategy

Predecessor detection [23] is applied in the TFCluster algorithm to identify maximum biclusters without candidate maintenance. If the resource set under the candidate function has an inclusive relationship with that under the prior function, then the biclusters extended by the candidate function have been extended by the prior function and are a subset thereof. Hence, the candidate function can be pruned to reduce the search space and avoid unpromising processes. When describing the representation of the resources, the resource is represented as “”, “” and “”, respectively, which is to facilitate the design of the pruning strategy.

Pruning 1. Assume that is currently extended bicluster, is the candidate function set of , and is the prior function set of . For any resource under the candidate function , if its representation is “”, and there is a prior function and a resource “” under , then resource under function can be extended by the prior function , thus can be pruned.

Since under function is represented as “”, the resource whose utilization rate is higher than must exist under a certain function in , and the utilization rate of under function and function are both lower than . Hence, the current extended bicluster is a differential-used bicluster. Since there can only be one resource usage rate higher than under all functions in differential-used bicluster, then the bicluster obtained by can also be obtained by .

Proof of Pruning 1. Proof: if then then then

the bicluster obtained by can also be obtained by .

Hence, can be pruned.

Pruning 2. Assume that is currently extended bicluster, is the candidate function set of , and is the prior function set of . For any resource under the candidate function , if its representation is “”, and there is a prior function and a resource “” under , then resource under function can be extended by the prior function , thus can be pruned.

Since under function is represented as “”, the resource whose utilization rate is higher than must exist under the candidate function , and the utilization rate of under all functions in is lower than . The resource under prior function is recorded as “”, so the resource usage rate of under is lower than . Hence, the current extended bicluster is a differential-used bicluster. Since there can only be one resource whose usage rate is higher than under all functions in the differential-used bicluster, then the bicluster obtained by can also be obtained by .

Proof of Pruning 2. Proof: if then then then

the bicluster obtained by can also be obtained by .

Hence, can be pruned.

Pruning 3. Assume that is currently extended bicluster, is the candidate function set of , and is the prior function set of . For any resource under the candidate function , if its representation is “”, and there is a prior function and a resource “” under , then resource under function can be extended by the prior function , thus can be pruned.

Since under function is represented as “”, then the utilization rate of resource under candidate function is lower than , and the utilization rate of resource under all functions in may be lower than or higher than under a certain function. The resource under the prior function is also recorded as “”, so the resource usage rate of under is lower than . Hence, the differential-used bicluster or low-used bicluster can be extended by .

Proof of Pruning 3. Proof: if then if then then if then then

the bicluster obtained by can also be obtained by .

Hence, can be pruned.

The pseudocode of the TFCluster is shown in the following Algorithm 1.

Input: Two real-valued function-resource datasets: and , the minimum number of resources in bicluster: , the minimum number of functions in bicluster: , low-used parameter , differential-used parameter , weighted function-function relational graph: , the current extending differential function-resource bicluster: .
Output: the maximal differential-used and low-used function-resource bicluster set;
Initialization:; ∅;
Method: TFCluster()
if then
  construct ;
end if
scan L and find all the candidate set of ;
for each candidate in
  if the differential function-resource bicluster satisfies none of Pruning 1, Pruning 2, and Pruning 3, and the number of resources in is greater than , then
    ;
    ;
    ;
    TFCluster()
  else if satisfies Pruning 1 or Pruning 2 or Pruning 3 then
    delete
  end if
end for
if is greater than any candidate bicluster of and the number of functions in is greater than then
  output();
end if
Return
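The control flow of Algorithm 1 (depth-first extension, predecessor detection, and output of maximal results without storing candidates) can be sketched in a deliberately simplified setting. The sketch below is not TFCluster itself: it assumes a binary relation in which each function simply uses a set of resources, and it replaces Pruning 1 to 3 with a single set-inclusion test against prior functions. The names `mine_maximal`, `resources_of`, `min_functions`, and `min_resources` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Bicluster = Tuple[Tuple[str, ...], FrozenSet[str]]


def mine_maximal(resources_of: Dict[str, FrozenSet[str]],
                 min_functions: int, min_resources: int) -> List[Bicluster]:
    """Enumerate maximal (function set, shared resource set) pairs depth-first."""
    order = sorted(resources_of)          # fixed function order
    results: List[Bicluster] = []

    def extend(chosen: Tuple[str, ...], shared: FrozenSet[str], start: int) -> None:
        for i in range(start, len(order)):
            candidate = order[i]
            new_shared = shared & resources_of[candidate]
            if len(new_shared) < min_resources:
                continue
            # Predecessor detection: if a prior function outside the current
            # bicluster already covers these resources, this branch only
            # repeats work done under that prior function, so prune it.
            if any(order[j] not in chosen and new_shared <= resources_of[order[j]]
                   for j in range(i)):
                continue
            extend(chosen + (candidate,), new_shared, i + 1)
        # Output only if no remaining candidate can be added without losing resources.
        extendable = any(shared <= resources_of[order[j]]
                         for j in range(start, len(order)))
        if not extendable and len(chosen) >= min_functions:
            results.append((chosen, shared))

    universe = frozenset().union(*resources_of.values())
    extend((), universe, 0)
    return results


# Hypothetical toy input: four functions and the resources each one uses.
toy = {
    "f1": frozenset("abc"),
    "f2": frozenset("ab"),
    "f3": frozenset("bc"),
    "f4": frozenset("abc"),
}
found = mine_maximal(toy, min_functions=2, min_resources=2)
```

On the toy input, the branches starting at `f2`, `f3`, and `f4` are all pruned because `f1` covers their resources, and only three maximal pairs are reported. No candidate list is kept between recursive calls, which is the property the text attributes to predecessor detection.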

As a worked example, the TFCluster algorithm is applied to the matrices in Tables 1 and 2, with the correlation parameter set to 10, the low usage parameter set to 0.3, the differential usage parameter set to 10, and the minimum thresholds for the number of functions and the number of resources set to 3 and 1, respectively. The mining process is shown in Figure 5.

4. Experiment and Analysis

4.1. Efficiency Comparison

To evaluate the performance of TFCluster, it is compared with two bicluster mining algorithms, SDC [24] and DRCluster [25]. All algorithms were implemented in C and tested on a laptop with an Intel(R) Core(TM) CPU and 8 GB of memory. Differential bicluster mining algorithms have been widely used in bioinformatics to detect disease-causing or growth-related genes. In this study, we selected AGEMAP [26], a gene expression database, to test the efficiency and scalability of TFCluster. AGEMAP records gene expression data in mice as they age. The dataset includes samples from four groups of mice aged 1, 6, 16, and 24 months, respectively. Based on the quality of the original dataset, we chose the data from mice aged 6 and 16 months as the input for subsequent experiments.

SDC algorithm: The algorithm identifies all differential coexpression biclusters from two real-valued datasets based on the idea of the Apriori algorithm. The original SDC algorithm can only identify differential coexpression relative constant column biclusters. To identify differential coexpression relative constant row biclusters, the original SDC algorithm is modified as follows: first, a set of association samples is generated for each item in the two datasets based on the association threshold . Then, all differential coexpression relative constant row biclusters that satisfy the differential support are identified using item expansion based on the Apriori principle. Finally, the results are filtered so that only the maximum biclusters are output.

DRCluster algorithm: The algorithm identifies the maximum differential coexpression biclusters from two real-valued datasets. The mining process of DRCluster is divided into two main steps: first, a sample weighted graph of differential expressions is constructed. Then, the maximum differential expression biclusters are identified from the constructed weighted graph using sample expansion.

Multiple datasets of different sizes are generated based on AGEMAP. The parameters of each algorithm are set so that SDC and DRCluster obtain their highest mining efficiency. (1) In SDC, the differential support is set to 1, and the minimum thresholds for the number of items and the number of samples are both set to 3. (2) In DRCluster, the minimum thresholds for the number of items and the number of samples are both set to 3. (3) In TFCluster, the correlation parameter is set to 10, the low usage parameter is set to 0.3, the differential usage parameter is set to 10, and the minimum thresholds for the number of functions and the number of resources are set to 2 and 1, respectively.

4.1.1. Resource Scalability

All algorithms are benchmarked in terms of runtime by varying the number of resources: the number of functions is fixed at 10, and the number of resources is set to 100, 200, 300, …, 3000. Figure 6 shows the runtime (in seconds) each algorithm needs to identify the biclusters. TFCluster is the fastest on all datasets, and its advantage grows with the number of resources. At 3000 resources, TFCluster still completes mining within 10 s (6.623 s), more than 10 times faster than DRCluster (68.561 s). The SDC algorithm suffers from memory explosion when the number of resources exceeds 1500, whereas TFCluster does not, because it does not store candidate sets.

4.1.2. Pruning Strategy Analysis

TFCluster is compared with four variants (TFCluster with Pruning 1, TFCluster with Pruning 2, TFCluster with Pruning 3, and TFCluster without pruning) in terms of runtime on datasets of different sizes. It can be seen from Figure 6 that the pruning strategies have a significant impact on execution time. TFCluster without pruning performs a large amount of repeated mining, which results in low mining efficiency. TFCluster with Pruning 3 outperforms all variants except the full TFCluster, because Pruning 3 greatly simplifies the mining of low-used biclusters, as described in Section 3.4. When the number of functions reaches 30 and the number of resources reaches 600, TFCluster without pruning (89.875 s) takes nearly twice as long as TFCluster (48.236 s), and the advantage of TFCluster becomes more pronounced as the dataset grows, which shows that the designed pruning strategies greatly improve the mining efficiency of the algorithm.
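The runtime ratios quoted in Sections 4.1.1 and 4.1.2 can be checked directly from the reported timings. The short sketch below uses only the four runtimes stated in the text; the dictionary layout and the helper name `slowdown` are illustrative choices.

```python
# Runtimes in seconds, as reported in the text.
RUNTIMES_S = {
    "3000 resources, 10 functions": {
        "TFCluster": 6.623,
        "DRCluster": 68.561,
    },
    "600 resources, 30 functions": {
        "TFCluster": 48.236,
        "TFCluster without pruning": 89.875,
    },
}


def slowdown(baseline_s: float, other_s: float) -> float:
    """How many times longer `other_s` is than `baseline_s`."""
    return other_s / baseline_s


ratios = {}
for setting, times in RUNTIMES_S.items():
    baseline = times["TFCluster"]
    for name, seconds in times.items():
        if name != "TFCluster":
            ratios[(setting, name)] = slowdown(baseline, seconds)
            print(f"{setting}: {name} takes {ratios[(setting, name)]:.2f}x the TFCluster runtime")
```

The ratios come out at about 10.35 for DRCluster and about 1.86 for TFCluster without pruning, consistent with "more than 10 times" and "nearly twice".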

4.1.3. Parameter Setting Analysis

The TFCluster algorithm introduces three parameters: (1) parameter used to measure the relevance between functions, (2) parameter that restricts low usage of a resource, and (3) parameter used to measure the differential usage of a resource. Among them, parameter is used to constrain the identified biclusters to be constant row. As the data in database AGEMAP all satisfy the constant row constraint, parameter analysis experiments here focus on the low usage parameter and the differential usage parameter .

TFCluster is benchmarked in terms of runtime and number of results by varying the low usage parameter and the differential usage parameter . The number of functions is fixed at 10, the number of resources is fixed at 300, the differential usage parameter is fixed at 3, and the low usage parameter is set to 0.1, 0.15, 0.2, 0.25, …, 0.35. Figure 7 shows that as the low usage parameter increases, the constraint on the mining results loosens and more biclusters are identified, but the running time of the algorithm is largely unaffected by , remaining at around 0.5 s.

The number of functions is fixed at 10, the number of resources is fixed at 300, the low usage parameter is fixed at 0.3, and the differential usage parameter is set to 3, 4, 5, …, 8. Figure 7 shows that as the differential usage parameter increases, the constraint on the mining results tightens and fewer biclusters are identified, but the running time of the algorithm is largely unaffected by , remaining at around 0.5 s.
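The two one-at-a-time sweeps described above can be organized as a small harness. This is a sketch under assumptions: `mine` stands for any callable that takes a low usage parameter and a differential usage parameter and returns the biclusters found; it is a hypothetical stand-in, not the authors' C implementation. The grids and the fixed values (3 and 0.3) are taken from the text.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Parameter grids from the text: 0.1 to 0.35 in steps of 0.05, and 3 to 8.
LOW_GRID: List[float] = [round(0.10 + 0.05 * i, 2) for i in range(6)]
DIFF_GRID: List[int] = list(range(3, 9))

Row = Tuple[float, float, int, float]  # (low, diff, result count, seconds)


def sweep(mine: Callable[[float, float], Sequence]) -> List[Row]:
    """Vary one parameter at a time and record result count and runtime."""
    runs = [(low, 3) for low in LOW_GRID] + [(0.3, diff) for diff in DIFF_GRID]
    rows: List[Row] = []
    for low, diff in runs:
        started = time.perf_counter()
        found = mine(low, diff)
        rows.append((low, diff, len(found), time.perf_counter() - started))
    return rows


def dummy_miner(low: float, diff: float) -> Sequence:
    """Hypothetical placeholder so the harness can be run end to end."""
    return []


rows = sweep(dummy_miner)
```

Recording the result count next to the runtime makes it easy to see the pattern reported for Figure 7, where the number of biclusters changes with the parameters while the runtime stays roughly flat.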

Overall, the experimental results show that TFCluster outperforms all the algorithms tested. The resource scalability experiments show that TFCluster has excellent scalability even with large datasets. The pruning experiments show that pruning strategies can improve mining efficiency, particularly in large databases. The parameter experiments show that the low-used parameter and the differential-used parameter have little impact on mining efficiency, but have a significant impact on mining results.

4.2. Model-Based Analysis

Here, the midterm conflict in the SPO mode is taken as an example to carry out safety analysis. The model-based simulation framework is shown in Figure 8. Flight simulation is performed with the flight scenario simulation software Prepar3D, and the SPO system architecture is modeled in Magic System of System, including the scenario operating model, task organization model, function-resource allocation model, etc. Prepar3D passes flight state information (e.g., position, velocity, and attitude) to Magic System of System to trigger the execution of the internal system model, and Magic System of System feeds back key variables or flight instructions (e.g., velocity variable and ignition instruction) to drive the flight simulation. Then, the real-valued function-resource matrices of different tasks are extracted from the Magic System of System model. Next, potential hazard patterns are identified using the proposed TFCluster algorithm, and safety analysis is carried out on this basis. Finally, the SPO system model can be jointly verified and iteratively designed based on the mining results.

In the future SPO system, a virtual pilot assistance (VPA) system [14] is introduced to reduce the workload of the onboard pilot, enhance aircraft surveillance capability, and promote air-ground collaboration and information sharing, as shown in Figure 9. The communication system ensures information sharing between the aircraft and the ground station, ATC, and airlines to support air-ground collaborative decision-making. The surveillance system uses the Airborne Surveillance and Separation Assurance Processing (ASSAP) subsystem, which is integrated into the flight management system (FMS) to provide automated SA&CA capabilities. The FMS obtains flight environment information through cooperative or noncooperative sensors and receives real-time traffic and weather information from ATC and airlines through the data link to provide guidance, navigation, and control. It also supports 4DT route planning, optimization, negotiation, and verification with the pilot through the human-machine interface.

Then, taking the midterm conflict resolution during land-based cruise in SPO as an example, this paper conducts a safety analysis of SPO based on the combination of MBSA and hazard pattern mining. The specific flight scenario is designed as follows: during the flight, the commercial single-piloted aircraft (CSPA) receives airspace traffic information through ADS-B and Traffic Information Services-Broadcast (TIS-B), and weather information through weather radar. The flight management system (FMS) analyzes the surveillance information and finds that there will be a cross-route conflict with the B777 after 20 minutes, that is, their minimum separation does not meet the safe separation requirements. The FMS automatically issues a warning and provides several route modifications to resolve the potential conflict. After air-ground collaborative decision-making, the onboard pilot and the ground operator decide to give way to the B777. Since the two aircraft have self-separation capabilities, the ground operator of the CSPA requests a shortened flight interval from ATC. ATC calculates the flight interval requirements based on the conflict location, meteorological conditions, navigation accuracy, and airspace traffic, and sends the flight interval constraints to the CSPA and the B777. The midterm conflict resolution is carried out autonomously by the CSPA and the B777, and ATC only monitors the entire process. The above flight scenario was modeled and visualized in Prepar3D, as shown in Figure 10, and the activity diagram of the scenario is shown in Figure 11.

Midterm conflict does not require an immediate response from the pilot, but needs to be handled by the pilot on board, the ground operator and the ATC together. The system operating process model, organization structure model, function decomposition model and resource allocation model for the above scenario are constructed separately, as shown in Figure 12.

Taking the two flight tasks of collaborative surveillance () and collaborative handling () mentioned in Section 2.1 as an example, their respective real-valued function-resource matrices can be extracted, where denote airspace visual awareness, traffic situational awareness, flight conflict situational awareness, collision avoidance and air-ground collaborative interaction, respectively. denote computer resources: ADS-B decoding, TIS-B decoding, weather radar decoding, and hazard identification. denote execution resources: CDTI display, hazard alarm, TCAS enquiry. denote communication resources: air-ground data link and ground network. denote awareness resources: meteorological awareness and traffic awareness. The specific data is obtained through human assessment, which is not the focus of this paper.
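The layout of the two function-resource matrices can be made concrete with a small data-structure sketch. The function and resource names are those listed above; the grouping keys, the helper `empty_matrix`, and the choice of functions as rows are illustrative assumptions. Cell values are left as `None` because the assessed utilization values are not reproduced here.

```python
from typing import Dict, List, Optional

FUNCTIONS: List[str] = [
    "airspace visual awareness",
    "traffic situational awareness",
    "flight conflict situational awareness",
    "collision avoidance",
    "air-ground collaborative interaction",
]

# Resources grouped by the four categories named in the text.
RESOURCES: Dict[str, List[str]] = {
    "computer": ["ADS-B decoding", "TIS-B decoding",
                 "weather radar decoding", "hazard identification"],
    "execution": ["CDTI display", "hazard alarm", "TCAS enquiry"],
    "communication": ["air-ground data link", "ground network"],
    "awareness": ["meteorological awareness", "traffic awareness"],
}

Matrix = Dict[str, Dict[str, Optional[float]]]


def empty_matrix() -> Matrix:
    """One row per function, one column per resource, values not yet filled in."""
    columns = [name for group in RESOURCES.values() for name in group]
    return {function: {name: None for name in columns} for function in FUNCTIONS}


# One matrix per task: collaborative surveillance and collaborative handling.
surveillance = empty_matrix()
handling = empty_matrix()
```

Each task therefore yields a 5 by 11 real-valued matrix, and the pair of matrices is the input that the mining step compares.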

The TFCluster algorithm is used to mine the two matrices extracted above, with the correlation parameter set to 10, the low usage parameter set to 0.3, and the differential usage parameter set to 10. Mining yields 14 biclusters, as shown in Table 3, and takes 0.669 s.

The original data in the two matrices are then replicated to expand the dataset, and the algorithms are benchmarked in terms of runtime by varying the dataset size: the number of functions is set to 5, 10, 15, 20, 25, and 30, and the number of resources is set to 11, 220, 550, 1100, 2200, and 3300. Figure 13 shows the comparison. When the number of resources reaches 1100 and the number of functions reaches 20, the SDC algorithm cannot output results. The TFCluster algorithm thus shows high efficiency and resource scalability.
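One simple way to expand a small matrix by replication is plain tiling, sketched below. The text does not specify the replication scheme, so tiling whole copies along both axes is an assumption, as are the name `replicate`, the row-per-function layout, and the zero-filled stand-in for the assessed data.

```python
from typing import List

Matrix = List[List[float]]  # rows = functions, columns = resources (assumed layout)


def replicate(matrix: Matrix, function_copies: int, resource_copies: int) -> Matrix:
    """Tile a matrix: repeat its columns, then stack copies of its rows."""
    return [list(row) * resource_copies
            for _ in range(function_copies)
            for row in matrix]


# Placeholder 5 x 11 matrix standing in for one extracted task matrix.
base: Matrix = [[0.0] * 11 for _ in range(5)]

# Two copies of the functions and twenty copies of the resources: 10 x 220.
grown = replicate(base, function_copies=2, resource_copies=20)
```

Starting from 5 functions and 11 resources, this reaches sizes such as 10 by 220 while preserving the value patterns of the original matrix, so runtime differences reflect dataset size rather than new structure.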

From the perspective of hazard pattern analysis, the mining results indicate that the concurrent execution of collaborative surveillance and collaborative handling under SPO can lead to a shortage of air-ground communication resources. To address this issue, we recommend the implementation of dedicated links, such as 5G AeroMACS 2.0, 5G LDACS 2.0, 5G ATG, and 5G public networks [27], to relieve the pressure on air-ground communication. From the perspective of improving resource efficiency and effectiveness, ADS-B and TCAS can be considered for integrated surveillance to enhance surveillance capabilities in the face of midterm conflict. The ADS-B system sends the current traffic situation and predicted trajectory of the aircraft in the airspace to the cockpit display of traffic information (CDTI) in real time to establish flight environment situational awareness. The ADS-B system also determines the distance-based or time-based threat ranking and establishes the conflict avoidance situational awareness, which is sent to TCAS. TCAS organizes the conflict avoidance process based on this conflict avoidance situational awareness; that is, TCAS flight conflict situational awareness is based on ADS-B traffic situational awareness. The newly constructed surveillance process model is shown in Figure 14.

Then, the newly constructed surveillance process model is mined using the TFCluster algorithm, and the results are shown in Table 4, where improved safety indices are shown in bold. Taking the biclusters in Table 3 and in Table 4 as an example, the safety index of resource TCAS enquiry is improved when function airspace visual awareness and function air-ground collaborative interaction are executed simultaneously, and resource hazard identification changes from being unsafe to having a safety index of 1.5. It can be seen that after integrated surveillance by ADS-B and TCAS, the safety indices of some resources are improved, such as the computer resource hazard identification and the execution resource TCAS enquiry, which enhances surveillance capabilities in case of midterm conflicts.

5. Conclusions

Due to the higher degree of air-ground task collaboration and complexity in the SPO mode, the traditional safety analysis methods applied in two-pilot mode cannot effectively identify the potential hazard patterns in the system. Hence, air-ground task collaboration safety analysis based on real-valued differential bicluster mining is carried out in this paper.

(1) A new safety analysis method combining model-based safety analysis (MBSA) and hazard pattern mining is proposed, which combines the system design process with the safety analysis process, ensuring data consistency at the model level.

(2) A real-valued differential bicluster mining algorithm, TFCluster, is proposed to analyze whether hazards will occur in a specific function-resource allocation scheme under multiple concurrent tasks. Taking the midterm conflict handling process in SPO as an example, model-based safety verification of SPO air-ground task collaboration is carried out. The surveillance process model newly constructed from the mining results enhances surveillance capabilities in the face of midterm conflict.

(3) Further investigation of more accurate model construction and weighted sequence pattern mining is needed. First, multidimensional matrices can be introduced to study the more complex redundancy design, monitoring management, and fault-tolerant strategy in SPO. Second, in real-world applications, the invocation of functions and resources is based on time intervals; hence, adopting weighted sequence pattern mining would be more meaningful for analyzing SPO air-ground task collaboration.

Data Availability

Data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

Portions of this work were presented at the Digital Avionics Systems Conference (DASC) in 2021, TFCluster: An Efficient Algorithm to Mine Maximal Differential Function-resource Biclusters for Single-Pilot Operations Safety Analysis.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the Natural Science Foundation of Shanghai (20ZR1427800) and the New Young Teachers Launch Program of Shanghai Jiao Tong University (20X100040036).