Several unexpected behaviors may occur during the actual execution of clinical pathways, which harms their implementation and their future improvement. To improve the performance of current deviation detection algorithms, a method based on business alignment is presented. It can effectively detect anomalies in the implementation of clinical pathways, provide a basis for judging interventions during pathway implementation, and play a crucial role in improving clinical pathways. First, the noise in the diagnosis and treatment logs of clinical pathways is removed. Then, a synchronous composition model is constructed to capture the deviations between the actual process and the theoretical model. Finally, the A* algorithm is used to search for the optimal alignment. A clinical pathway for ST-Elevation Myocardial Infarction (STEMI) under COVID-19 is used as a case study, and the experimental results illustrate the superiority and effectiveness of the method in deviation detection.

1. Introduction

The data released by the seventh national census show that China has gradually become an aging society. Together with the impact of the COVID-19 epidemic in recent years, this places increasing pressure on the medical system. Hence, clinical pathways, which regulate medical behavior, improve medical quality, and control medical costs, have become an inevitable choice for medical reform in China [1, 2]. With strong national support, clinical pathways have entered the stage of large-scale promotion [3].

The essence of a clinical pathway is to adopt a standardized procedure for the diagnosis and treatment of a certain disease. However, many unstable factors may arise during the actual implementation of a clinical pathway; for instance, patients may not understand its implementation standard, or medical staff may not be committed to the process. These factors have greatly hindered the promotion of clinical pathways. In addition, imperfections of the clinical pathway itself and sudden changes in a patient's condition may lead to unexpected behaviors during diagnosis and treatment. All such behaviors are called deviations. Under these pressures, optimizing and improving clinical pathways is urgent. Process mining aims to discover and optimize business processes from the data recorded in event logs [4, 5]. Process mining methods can detect deviating behaviors, which helps to predict the course of a patient's diagnosis and treatment, find defects in the clinical pathway, and provide a basis for judging interventions during diagnosis and treatment, thereby improving the clinical pathway [6–8].

With the wide application of medical information systems, a large number of electronic logs are generated during diagnosis and treatment, and electronic medical records are widely used in hospitals. Accordingly, the mining of deviating behaviors has shifted from prospective methods that require manual recording to retrospective methods processed automatically by computers. This paper establishes the process model of a clinical pathway and uses business alignment to detect the deviations between the business process and the execution records.

Many formal methods for describing process models have been presented, e.g., BPMN [9], C-nets [10], EPCs [11], and Petri nets [12–15]. The Petri net is the most mature and most thoroughly studied process modeling language, and it can describe and analyze complex systems concisely and intuitively [16–27]. Hence, Petri nets are used to describe process models in this paper.

Business alignment is one of the most advanced compliance checking methods at present, and it can effectively detect deviations in clinical pathways [28–30]. Compliance checking of complex models has always been a challenging research topic [31]. Adriansyah et al. [32] presented a technique that combines the process model and the log model into a product model to obtain the optimal alignment. Cook et al. [33] obtained alignment results by comparing traces with the process model and quantifying their similarity. Song et al. [34] presented a heuristic trace replay method that aligns recorded traces with theoretical models while reducing the search space.

A study of the existing methods shows that the efficiency of business alignment can still be improved. Hence, a business alignment method based on the synchronous composition model of Petri nets is presented, which reduces the complexity of the model and thereby improves alignment efficiency. The sequence of events recorded in the medical log for one case is called a trace. A log may contain invalid traces, which are called noise. Before compliance checking, the diagnosis and treatment log must therefore be preprocessed to remove noise. Deviation detection then finds the optimal alignment for the filtered log and locates possible problems in the traces. Since the efficiency of the optimal alignment computation is particularly critical, this paper uses the A* algorithm.

The remainder of this paper is organized as follows. Section 2 introduces the background knowledge, including the concepts of trace, Petri net, reachable graph, and alignment. Section 3 presents a noise filter algorithm and an optimal alignment computation algorithm. Section 4 compares our method with the classical one experimentally. Finally, Section 5 draws conclusions and outlines follow-up research.

2. Basic Knowledge

Hospital information systems store a tremendous amount of data, from which many medical events can be extracted. The series of events of one case is organized as a trace, which is the basic element of a log.

Definition 1. (trace). Given a set A of activities, a trace σ ∈ A* is a process instance, namely, a finite sequence of activities, where A* denotes the set of all finite sequences over A. A non-empty multiset of traces L ∈ β(A*) is called a diagnosis and treatment log, where β(X) denotes the set of all multisets over a set X.
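
Since a log is a multiset, the same trace may occur several times. A minimal Java sketch of this data structure follows (Java being the language of the experiment code), assuming activity labels are plain strings; the class TreatmentLog and its methods are hypothetical names, not part of the paper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of Definition 1: a diagnosis and treatment log as a multiset of traces,
 * where each trace is a finite sequence of activity labels (plain strings here).
 */
final class TreatmentLog {
    // Maps each distinct trace (trace variant) to its multiplicity in the log.
    private final Map<List<String>, Integer> multiplicity = new HashMap<>();

    /** Adds one case whose recorded events form the given trace. */
    void add(List<String> trace) {
        multiplicity.merge(List.copyOf(trace), 1, Integer::sum);
    }

    /** Multiplicity of a trace in the log, 0 if it never occurs. */
    int count(List<String> trace) {
        return multiplicity.getOrDefault(trace, 0);
    }

    /** Size of the multiset, i.e., the number of cases. */
    int size() {
        return multiplicity.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Number of distinct traces. */
    int variants() {
        return multiplicity.size();
    }
}
```

Storing multiplicities instead of repeated lists means each distinct trace only has to be aligned once, and its result can be weighted by its count.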

Definition 2. (Petri net). Given a set A of activities, a Petri net over A is a tuple N = (P, T; F, α, mi, mf), where P is the set of places and T is the set of transitions, with T ∪ P ≠ ∅ and T ∩ P = ∅; F ⊆ (T × P) ∪ (P × T) is the set of directed arcs between places and transitions, called the flow relation or arc relation; α: T ⟶ Aτ maps each transition to a label, where τ denotes an invisible transition, that is, Aτ = A ∪ {τ}. The state of a Petri net, called a marking, is a multiset of places; mi ∈ β(P) is the initial marking and mf ∈ β(P) is the final marking.
Figure 1 shows a simple Petri net N1 for the clinical pathway of ischemic stroke, and Table 1 lists the relationship between the transitions, labels, and activities of N1.
For any m ∈ β(P) and t ∈ T, t is enabled (can fire) under marking m, denoted as m[t>, iff m(p) ≥ F(p, t) holds for all p ∈ P, where F(x, y) = 1 if (x, y) ∈ F and F(x, y) = 0 otherwise. After t fires, the system reaches the new marking m′ with m′(p) = m(p) − F(p, t) + F(t, p) for all p ∈ P, denoted as m[t>m′. The set R(m) contains all markings reachable from m.
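
The enabling condition and the firing rule translate directly into code. The following Java sketch assumes that places and transitions are numbered from 0 and that F is stored as two weight matrices (pre[t][p] = F(p, t), post[t][p] = F(t, p)); the class name SimplePetriNet is hypothetical.

```java
import java.util.Arrays;

/**
 * Minimal place/transition net following Definition 2: F is stored as two
 * arc-weight matrices and markings are vectors of token counts.
 */
final class SimplePetriNet {
    private final int[][] pre;   // pre[t][p]  = F(p, t)
    private final int[][] post;  // post[t][p] = F(t, p)

    SimplePetriNet(int[][] pre, int[][] post) {
        this.pre = pre;
        this.post = post;
    }

    /** m[t> holds iff m(p) >= F(p, t) for every place p. */
    boolean isEnabled(int[] m, int t) {
        for (int p = 0; p < m.length; p++) {
            if (m[p] < pre[t][p]) return false;
        }
        return true;
    }

    /** Returns m' with m'(p) = m(p) - F(p, t) + F(t, p); the input marking is left unchanged. */
    int[] fire(int[] m, int t) {
        if (!isEnabled(m, t)) {
            throw new IllegalStateException("transition " + t + " is not enabled under " + Arrays.toString(m));
        }
        int[] next = m.clone();
        for (int p = 0; p < m.length; p++) {
            next[p] = next[p] - pre[t][p] + post[t][p];
        }
        return next;
    }
}
```

Returning a fresh marking instead of mutating the input keeps markings usable as immutable states, which is convenient when they later become nodes of a reachable graph.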
Using the firing rule of Definition 2, the reachable graph of a Petri net is defined as follows.

Definition 3. (reachable graph). Given a Petri net N, the reachable graph of N is TS = (S, A′, T′), where S = R(mi), A′ = Aτ, and T′ = {(m, α(t), m′) | m, m′ ∈ S, t ∈ T, m[t>m′}.
Figure 2 shows the reachable graph TS0 of N1. Each node represents a reachable marking of N1, each arc represents a transition enabled under its source marking, and the arrow points to the marking reached after the transition fires.
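
Definition 3 suggests a breadth-first construction: start from mi, fire every enabled transition, and record each newly reached marking. The sketch below assumes a bounded net (otherwise R(mi) is infinite and the loop does not terminate) and the same matrix encoding of F as above; all names are illustrative, and records require Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Breadth-first construction of the reachable graph TS = (S, A', T'); assumes a bounded net. */
final class ReachableGraphBuilder {
    /** An arc (m, alpha(t), m') of T'. Markings are immutable lists of token counts. */
    record Arc(List<Integer> from, String label, List<Integer> to) {}

    /** S is the set of reachable markings, arcs is T'. */
    record Graph(Set<List<Integer>> states, List<Arc> arcs) {}

    /** pre[t][p] = F(p, t), post[t][p] = F(t, p), labels[t] = alpha(t). */
    static Graph build(int[][] pre, int[][] post, String[] labels, int[] initial) {
        Set<List<Integer>> states = new LinkedHashSet<>();
        List<Arc> arcs = new ArrayList<>();
        Deque<int[]> frontier = new ArrayDeque<>();
        states.add(toList(initial));
        frontier.add(initial);
        while (!frontier.isEmpty()) {
            int[] m = frontier.poll();
            for (int t = 0; t < pre.length; t++) {
                if (!enabled(pre[t], m)) continue;
                int[] next = m.clone();
                for (int p = 0; p < m.length; p++) {
                    next[p] += post[t][p] - pre[t][p];   // firing rule of Definition 2
                }
                List<Integer> key = toList(next);
                arcs.add(new Arc(toList(m), labels[t], key));
                if (states.add(key)) {
                    frontier.add(next);                 // new marking: explore it later
                }
            }
        }
        return new Graph(states, arcs);
    }

    private static boolean enabled(int[] preT, int[] m) {
        for (int p = 0; p < m.length; p++) {
            if (m[p] < preT[p]) return false;
        }
        return true;
    }

    private static List<Integer> toList(int[] m) {
        List<Integer> values = new ArrayList<>(m.length);
        for (int v : m) values.add(v);
        return List.copyOf(values);
    }
}
```

Markings are converted to immutable lists so that they can serve as set elements; two arcs with different labels may lead to the same marking, as in a choice between two activities.
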
When a trace from the diagnosis and treatment log is replayed on the model, the two may not fit completely. Such a misfit is called a deviation, and it can be detected by alignment.

Definition 4. (alignment). Given a set A of activities, a Petri net N, and a trace σ, an alignment between σ and N is a sequence of movements γ ∈ (A≫ × T≫)*, where ≫ denotes "no movement", A≫ = A ∪ {≫}, and T≫ = T ∪ {≫}, that satisfies the following rules:
(1) The projection of the first components of the movements onto A (ignoring ≫) is the trace σ.
(2) The projection of the second components of the movements onto T (ignoring ≫) is a complete firing sequence of N, i.e., a firing sequence that leads from mi to mf.
For every (a, t) ∈ γ, there are four possibilities:
(1) (a, t) is a log movement if (a ∈ A) ∧ (t = ≫).
(2) (a, t) is a model movement if (a = ≫) ∧ (t ∈ T).
(3) (a, t) is a synchronous movement if (a ∈ A) ∧ (t ∈ T) ∧ (α(t) = a).
(4) Otherwise, (a, t) is an illegal movement, e.g., (≫, ≫).
Activities are the elements of traces, and transitions are the elements of the model. According to Definition 4, an alignment is a sequence of movements, and each movement reflects the correspondence between activities and transitions. A log movement means that an activity in the trace cannot be executed in the model; a model movement means that a transition required by the model is not observed in the trace; a synchronous movement means that the activity corresponds to the transition; an illegal movement cannot occur in an actual business process, so it is ignored in this paper. Among them, log movements and model movements represent the misfit between the trace and the process model, i.e., the deviations in the alignment.
For a given trace and process model, multiple alignments may exist. To identify the best one, a cost function is introduced that assigns a cost to each movement; the alignment with the minimum total cost is the optimal alignment.
For every (a, t) ∈ γ, the cost function lc is defined as follows:
(1) lc((a, t)) = 1 if (a, t) is a log movement.
(2) lc((a, t)) = 1 if (a, t) is a model movement.
(3) lc((a, t)) = 0 if (a, t) is a synchronous movement.
(4) lc((a, t)) = +∞ if (a, t) is an illegal movement.
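
The movement classification of Definition 4 and the cost function lc can be sketched as follows. The symbol ≫ is encoded as the string ">>" (an assumption), and the sketch assumes that synchronous movements are only formed for transitions whose label equals the activity, so it does not re-check α(t) = a. Names are hypothetical; records and switch expressions require Java 16 or later.

```java
import java.util.List;

/** Movement classification (Definition 4) and the cost function lc. */
final class AlignmentCost {
    /** Encodes the "no movement" symbol. */
    static final String SKIP = ">>";

    /** A movement (a, t): a is an activity or SKIP, t is a transition id or SKIP. */
    record Move(String activity, String transition) {}

    enum Kind { LOG, MODEL, SYNCHRONOUS, ILLEGAL }

    /** Assumes synchronous moves were only built for matching labels. */
    static Kind kind(Move m) {
        boolean noLog = m.activity().equals(SKIP);
        boolean noModel = m.transition().equals(SKIP);
        if (!noLog && noModel) return Kind.LOG;
        if (noLog && !noModel) return Kind.MODEL;
        if (!noLog) return Kind.SYNCHRONOUS;
        return Kind.ILLEGAL;  // (>>, >>)
    }

    /** lc: 1 for log and model movements, 0 for synchronous ones, +infinity for illegal ones. */
    static double cost(Move m) {
        return switch (kind(m)) {
            case LOG, MODEL -> 1.0;
            case SYNCHRONOUS -> 0.0;
            case ILLEGAL -> Double.POSITIVE_INFINITY;
        };
    }

    /** Total cost of an alignment; the optimal alignment minimizes this value. */
    static double total(List<Move> alignment) {
        return alignment.stream().mapToDouble(AlignmentCost::cost).sum();
    }
}
```

For a hypothetical alignment with two synchronous movements, one log movement, and one model movement, the total cost is 2.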

3. Alignment Methods

3.1. Preparation for Alignment Computation
3.1.1. Removal of Noise

Before synchronous composition, the traces in the diagnosis and treatment log must be filtered to remove noise and retain the valid traces.

A trace with a sequential deviation in an event log generally has one of two explanations: ① the trace itself is noise, or ② activities that should be in a sequential or selective relationship were incorrectly executed in parallel. A clinical pathway must be carried out strictly according to its implementation standard, and its activities are mostly in sequential or selective relationships. Hence, this paper regards traces containing activities in inverted order as noise.

In a Petri net, if a transition can occur before another transition, there is a precedence relationship between the former and the latter, denoted by the symbol "⟶". If a transition cannot occur before another, there is no precedence relationship between them, denoted by the symbol "#". To handle possible sequential deviation traces in the diagnosis and treatment log, this paper introduces the precedence relationship matrix as follows.

Definition 5. (precedence relationship matrix). Given a set A of activities and a Petri net N with n = |T| transitions, the precedence relationship matrix is an n × n matrix whose rows and columns are labeled by activities, denoted as M: {α(tx) | tx ∈ T} × {α(ty) | ty ∈ T} ⟶ {"#", "⟶"}. For 1 ≤ x, y ≤ n, its elements are defined as follows:
(1) If tx can occur before ty in N, then M[α(tx)][α(ty)] = "⟶", denoted as α(tx) ⟶ α(ty).
(2) If tx cannot occur before ty in N, then M[α(tx)][α(ty)] = "#", denoted as α(tx) # α(ty).
Taking N1 as an example, its precedence relationship matrix Mpr is shown in formula (1), where ax = α(tx) and 1 ≤ x ≤ 5.
To determine whether a trace is noise, its activities are checked one by one, and the order between the current activity and its subsequent activity is compared against the precedence relationship matrix. If the order does not fit, the comparison terminates, and the trace is regarded as noise and removed from the log.
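
Definition 5 does not prescribe how M is obtained. One possible construction, assuming the model is acyclic and its complete firing sequences have already been enumerated as label sequences, sets M[x][y] = "⟶" whenever x occurs before y in at least one sequence and "#" otherwise. The sketch below is illustrative only; "->" stands for "⟶", and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * One possible way to fill the precedence relationship matrix, assuming the complete
 * firing sequences of an acyclic model are given as label sequences.
 */
final class PrecedenceMatrix {
    static final String PRECEDES = "->";  // stands for the precedence symbol
    static final String NONE = "#";

    /** Every label occurring in the sequences must be contained in labels. */
    static Map<String, Map<String, String>> build(Set<String> labels, List<List<String>> firingSequences) {
        Map<String, Map<String, String>> m = new HashMap<>();
        for (String x : labels) {
            Map<String, String> row = new HashMap<>();
            for (String y : labels) {
                row.put(y, NONE);                       // default: x cannot occur before y
            }
            m.put(x, row);
        }
        for (List<String> seq : firingSequences) {
            for (int i = 0; i < seq.size(); i++) {
                for (int j = i + 1; j < seq.size(); j++) {
                    m.get(seq.get(i)).put(seq.get(j), PRECEDES);  // x occurs before y at least once
                }
            }
        }
        return m;
    }
}
```

With the sequences <a, b, c, d> and <a, c, b, d>, the parallel activities b and c precede each other in different sequences, so both M[b][c] and M[c][b] become "⟶", whereas M[b][a] stays "#".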
To keep track of parallel activities, a parallel activity set S, called the "current activity set", is compared with the subsequent activity; before the traversal starts, the first activity of the trace is put into S. Let cur denote the current activity in the traversal and post the activity that follows it in the trace. The rules are as follows:
(1) If S contains one or more activities a that may occur earlier than post according to the precedence relationship matrix while the reverse does not hold (a ⟶ post and post # a), remove these activities from S and put post into S.
(2) If all activities in S are in a parallel relationship with post, put post into S without removing any element.
(3) If there is an inverted sequence deviation between any activity a in S and post (a # post), the trace is regarded as noise and is discarded directly.
Algorithm 1 shows the pseudocode.
To filter the log, Algorithm 1 traverses the diagnosis and treatment log L and checks the activities of each sequence σ in turn, so two nested loops are required and the time complexity is O(n²), where n depends on the size of L and the length of the traces σ. The space complexity is O(n).

Algorithm 1: Noise filter.
Input: diagnosis and treatment log L, precedence relationship matrix M of N1;
Output: filtered log L′.
L′ = L;
for all σ ∈ L do
  //the first activity of the trace initializes the current activity set
  S = {σ(1)};
  for i = 1 to |σ| − 1 do
    post = σ(i + 1);
    //some activity in S cannot occur earlier than post: inverted sequence, so σ is noise
    if ∃a ∈ S: M[α−1(a)][α−1(post)] == "#" then
      L′ = L′ − {σ};
      break;
    //activity a can occur earlier than activity post, but the reverse is not true
    else if ∃a ∈ S: M[α−1(a)][α−1(post)] == "⟶" ∧ M[α−1(post)][α−1(a)] == "#" then
      for all a ∈ S with M[α−1(a)][α−1(post)] == "⟶" ∧ M[α−1(post)][α−1(a)] == "#" do
        S = S − {a};
      end
      S = S ∪ {post};
    //post is parallel to every activity in S
    else
      S = S ∪ {post};
    end
  end
end
return L′;
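
Algorithm 1 can be made executable along the following lines. This Java sketch assumes M is stored as a nested map indexed directly by activity labels, so the α−1 lookups of the pseudocode are folded into the map keys, and it uses "->" for "⟶"; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of Algorithm 1; m.get(x).get(y) must exist for every pair of labels in the log. */
final class NoiseFilter {
    static final String PRECEDES = "->";
    static final String NONE = "#";

    /** Returns true if the trace contains an inverted-sequence deviation (rule 3). */
    static boolean isNoise(List<String> trace, Map<String, Map<String, String>> m) {
        if (trace.isEmpty()) return false;
        Set<String> current = new HashSet<>();     // the current activity set S
        current.add(trace.get(0));
        for (int i = 0; i + 1 < trace.size(); i++) {
            String post = trace.get(i + 1);
            // Rule 3: some a in S cannot occur before post, so the order is inverted.
            for (String a : current) {
                if (NONE.equals(m.get(a).get(post))) return true;
            }
            // Rule 1: drop every a that strictly precedes post (a -> post, but post # a).
            current.removeIf(a -> NONE.equals(m.get(post).get(a)));
            // Rules 1 and 2: post always joins S; activities parallel to post stay.
            current.add(post);
        }
        return false;
    }

    /** Returns the filtered log L' (noise traces removed), keeping the original order. */
    static List<List<String>> filter(List<List<String>> log, Map<String, Map<String, String>> m) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> trace : log) {
            if (!isNoise(trace, m)) kept.add(trace);
        }
        return kept;
    }
}
```

Because rule 3 is checked first, every activity that survives to the removal step already satisfies a ⟶ post, so the removal only has to test post # a.
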
3.1.2. Synchronous Composition

For a given trace and its corresponding process model, a new model is generated by the following operations: ① convert the trace into a log model described by a Petri net; ② merge each pair of transitions with the same activity label in the two models; and ③ merge the presets and the postsets of the merged transitions, respectively. The resulting model is the synchronous composition model of the trace and the process model.

A log model transition tx and a process model transition ty with the same label are merged into a synchronous transition (tx, ty), which represents a synchronous movement. The remaining transitions are named according to Definition 4, i.e., (tx, ≫) for log movements and (≫, ty) for model movements. Algorithm 2 shows the pseudocode.

Algorithm 2: Synchronous composition.
Input: process model Npm, log model Nlm;
Output: synchronous composition model Ncm.
//•x and x• denote the preset and the postset of transition x in its original model
Pcm = Ppm ∪ Plm;
Fcm = Ø;
Tcm = Ø;
mi,cm = mi,pm ∪ mi,lm;
mf,cm = mf,pm ∪ mf,lm;
for all tx ∈ Tlm do
  //place log transitions (tx, ≫) in Tcm according to Tlm
  Tcm = Tcm ∪ {(tx, ≫)};
  //set the related mapping function and arc relationships
  αcm((tx, ≫)) = αlm(tx);
  Fcm = Fcm ∪ {(p, (tx, ≫)) | p ∈ •tx} ∪ {((tx, ≫), p) | p ∈ tx•};
end
for all ty ∈ Tpm do
  //place model transitions (≫, ty) in Tcm according to Tpm
  Tcm = Tcm ∪ {(≫, ty)};
  //set the related mapping function and arc relationships
  αcm((≫, ty)) = αpm(ty);
  Fcm = Fcm ∪ {(p, (≫, ty)) | p ∈ •ty} ∪ {((≫, ty), p) | p ∈ ty•};
end
for all tx ∈ Tlm, ty ∈ Tpm with αlm(tx) == αpm(ty) do
  //place the synchronous transition (tx, ty) in Tcm
  Tcm = Tcm ∪ {(tx, ty)};
  αcm((tx, ty)) = αlm(tx);
  //remove the log transition and the model transition with the same label
  Tcm = Tcm − {(tx, ≫), (≫, ty)};
  //connect (tx, ty) to the merged presets and postsets; remove the arcs of the removed transitions
  Fcm = Fcm ∪ {(p, (tx, ty)) | p ∈ •tx ∪ •ty} ∪ {((tx, ty), p) | p ∈ tx• ∪ ty•};
  Fcm = Fcm − {(p, (tx, ≫)) | p ∈ •tx} − {((tx, ≫), p) | p ∈ tx•};
  Fcm = Fcm − {(p, (≫, ty)) | p ∈ •ty} − {((≫, ty), p) | p ∈ ty•};
end
return Ncm = (Pcm, Tcm; Fcm, αcm, mi,cm, mf,cm);

Algorithm 2 integrates the transitions, places, and arc relationships of the process model Npm and the log model Nlm into the synchronous composition model Ncm. It consists of three sequential loops; assuming that the matching transitions in the third loop are found by a lookup on their labels rather than by comparing all pairs, the time complexity is O(n), where n depends on the numbers of transitions, places, and arcs. The space complexity is also O(n).
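
A compact Java sketch of Algorithm 2 follows. It assumes that each transition is stored with its label, preset, and postset, that the place names of the two models are disjoint, and that markings are handled separately (Algorithm 2 simply unites them). Instead of first adding all log and model transitions and then removing the matched ones, it keeps only the unmatched ones, which yields the same transition set. All names are hypothetical; records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of Algorithm 2 on nets whose transitions carry a label, a preset and a postset. */
final class SyncComposition {
    static final String SKIP = ">>";

    record Transition(String id, String label, Set<String> pre, Set<String> post) {}

    record Net(Set<String> places, List<Transition> transitions) {
        /** Number of arcs = sum of preset and postset sizes over all transitions. */
        int arcCount() {
            return transitions.stream().mapToInt(t -> t.pre().size() + t.post().size()).sum();
        }
    }

    static Net compose(Net logModel, Net processModel) {
        Set<String> places = new LinkedHashSet<>(logModel.places());
        places.addAll(processModel.places());
        List<Transition> result = new ArrayList<>();
        Set<String> matchedLog = new HashSet<>();
        Set<String> matchedModel = new HashSet<>();
        // Synchronous transitions (tx, ty) for equal labels, with merged presets and postsets.
        for (Transition tx : logModel.transitions()) {
            for (Transition ty : processModel.transitions()) {
                if (!tx.label().equals(ty.label())) continue;
                result.add(new Transition("(" + tx.id() + "," + ty.id() + ")", tx.label(),
                        union(tx.pre(), ty.pre()), union(tx.post(), ty.post())));
                matchedLog.add(tx.id());
                matchedModel.add(ty.id());
            }
        }
        // Unmatched log transitions become (tx, >>), unmatched model transitions (>>, ty).
        for (Transition tx : logModel.transitions()) {
            if (!matchedLog.contains(tx.id())) {
                result.add(new Transition("(" + tx.id() + "," + SKIP + ")", tx.label(), tx.pre(), tx.post()));
            }
        }
        for (Transition ty : processModel.transitions()) {
            if (!matchedModel.contains(ty.id())) {
                result.add(new Transition("(" + SKIP + "," + ty.id() + ")", ty.label(), ty.pre(), ty.post()));
            }
        }
        return new Net(places, result);
    }

    private static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> u = new LinkedHashSet<>(a);
        u.addAll(b);
        return u;
    }
}
```

For a hypothetical log model of <a, f> composed with a hypothetical process model <a, b>, the sketch produces three transitions, (u1,t1), (u2,>>), and (>>,t2), over six places and eight arcs.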

Taking N1 as an example, trace σ1 = <a, f, b, e> is given. Figure 3 shows the log model N2 which is converted from σ1. The synchronous composition model N3 computed by Algorithm 2 from log model N2 and process model N1 is shown in Figure 4.
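
Step ① of the synchronous composition, converting a trace such as σ1 into a log model such as N2, can be sketched as a chain of places and transitions. This assumes the usual sequential construction: one place before each event, one transition per event labeled with its activity, and one final place. The place names q0, q1, … and transition names u1, u2, … are hypothetical; the initial marking puts one token in q0 and the final marking one token in the last place. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Converts a trace into a sequential log model q0 -> u1 -> q1 -> ... -> un -> qn. */
final class LogModelBuilder {
    record Transition(String id, String label, String pre, String post) {}

    record LogModel(List<String> places, List<Transition> transitions, String initialPlace, String finalPlace) {}

    static LogModel fromTrace(List<String> trace) {
        List<String> places = new ArrayList<>();
        List<Transition> transitions = new ArrayList<>();
        places.add("q0");
        for (int i = 0; i < trace.size(); i++) {
            String next = "q" + (i + 1);
            places.add(next);
            // The i-th event becomes transition u(i+1), labelled with its activity.
            transitions.add(new Transition("u" + (i + 1), trace.get(i), "q" + i, next));
        }
        return new LogModel(places, transitions, "q0", "q" + trace.size());
    }
}
```

For σ1 = <a, f, b, e>, this yields five places q0 to q4 and four transitions, the second of which is labeled f and consumes from q1.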

3.2. Computation of Alignment
3.2.1. Reachable Graph of Synchronous Composition Model

An alignment is a sequence of movements, and transitions fire as the state changes. The reachable graph expresses both the state changes and the fired transitions, and each of its arcs can be weighted with the cost of the corresponding movement. The computation of the optimal alignment can therefore be converted into a shortest path problem on a directed weighted graph, so the reachable graph of N3 is computed next. Figure 5 shows the reachable graph TS1 of N3.

By Definition 4, the arcs of Figure 5 are weighted as follows: the synchronous movement on t1 costs 0; the model movement (≫, t3) costs lc((≫, t3)) = 1; and the log movement costs 1. By analogy, the synchronous movements on t2 and t5 cost 0, and the model movement (≫, t4) costs lc((≫, t4)) = 1.

3.2.2. Search for Optimal Alignment

Owing to the particularities of clinical pathway models, sequential and selective structures should be used as far as possible and cyclic structures should be avoided. Hence, cyclic structures in the reachable graph are ignored, and the graph can easily be converted into the form of a relationship matrix. The optimal alignment computation is therefore not complicated, and efficiency becomes one of the key performance measures, so an appropriate search algorithm must be selected to maximize it.

As mentioned earlier, the optimal alignment problem is a shortest path problem. Among the solutions to the shortest path problem, basic search algorithms (e.g., depth-first search, breadth-first search, and Dijkstra's algorithm) are well known, simple in structure, and easy to implement. However, they explore paths one by one without any guidance toward the target, so their efficiency is poor. Intelligent algorithms (e.g., genetic algorithms, reinforcement learning, and ant colony optimization) are too complex and require large datasets for training and parameter tuning; they suit tricky problems in complex scenarios rather than this problem.

Since the weight of each arc is known, the A* algorithm, a well-known heuristic search algorithm, is suitable for this problem. When searching for the shortest path, A* estimates the distance between the current node and the target node with a heuristic function. The accuracy of the heuristic function, i.e., how close the estimated value is to the actual value, determines the efficiency of A*: the closer the estimate is to the actual remaining cost, without exceeding it (which preserves optimality), the fewer nodes are expanded. In addition, A* is simple to implement. Since the estimated values used in this paper are close to the actual values, the search expands few unnecessary nodes. For the reachable graph TS1 obtained in Section 3.2.1, Algorithm 3 shows the pseudocode.

Algorithm 3: Optimal alignment search with A*.
Input: reachable graph TS of synchronous composition model N3;
Output: optimal alignment γ.
//initialize a priority queue ordered ascendingly by (total cost from mi to the current node) + (estimated cost from the current node to the target node mf)
pqueue.create();
visitedNodesSet = {TS.initialMarking};
TS.initialMarking.setTotalCost(0);
pqueue.push(TS.initialMarking);
while pqueue.size() ≠ 0 do
  currentNode = pqueue.poll();
  if currentNode == targetNode then
    //recursively follow the predecessors of currentNode to obtain the optimal alignment
    γ = getOptAlignment(currentNode);
    return γ;
  else
    //visit all successorNodes of currentNode
    for all successorNode ∈ currentNode.getSuccessors() do
      //calculate the new total cost value of successorNode
      newcost = successorNode.calNewCost(currentNode);
      if successorNode ∈ visitedNodesSet then
        //for a visited node, update the total cost value only if the new one is smaller
        if successorNode.getTotalCost() > newcost then
          successorNode.setTotalCost(newcost);
          successorNode.setPredecessor(currentNode);
          pqueue.push(successorNode);
        end
      else
        visitedNodesSet = visitedNodesSet ∪ {successorNode};
        successorNode.setTotalCost(newcost);
        successorNode.setPredecessor(currentNode);
        pqueue.push(successorNode);
      end
    end
  end
end
//the target marking is unreachable
return Ø;
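
Algorithm 3 can be sketched in Java as follows. This is a minimal illustration over an explicit weighted graph, assuming the reachable graph is given as adjacency lists keyed by hypothetical node names, with each arc carrying its movement and its cost lc. Instead of updating entries inside the queue, the sketch pushes a new entry whenever a cheaper cost is found and skips stale entries when they are polled, a common variant of the same update step. The heuristic h is a parameter; it must not overestimate the remaining cost, and h = 0 reduces the search to Dijkstra's algorithm. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

/** Sketch of Algorithm 3: A* search for a cheapest path in a weighted reachable graph. */
final class AStarAlignment {
    /** An outgoing arc: target node, the movement it represents, and its cost lc. */
    record Arc(String to, String move, double cost) {}

    /** Queue entry ordered by f = (cost from the source) + (estimated cost to the target). */
    private record Entry(String node, double f) {}

    /** Returns the movements on a cheapest path, or an empty Optional if target is unreachable. */
    static Optional<List<String>> search(Map<String, List<Arc>> graph, String source,
                                         String target, ToDoubleFunction<String> h) {
        Map<String, Double> cost = new HashMap<>();        // best known cost from source
        Map<String, String> predNode = new HashMap<>();    // predecessor on that path
        Map<String, String> predMove = new HashMap<>();    // movement taken from the predecessor
        PriorityQueue<Entry> queue = new PriorityQueue<>(Comparator.comparingDouble(Entry::f));
        cost.put(source, 0.0);
        queue.add(new Entry(source, h.applyAsDouble(source)));
        while (!queue.isEmpty()) {
            Entry entry = queue.poll();
            String node = entry.node();
            double g = cost.get(node);
            if (entry.f() > g + h.applyAsDouble(node)) {
                continue;                                   // stale entry: a cheaper path was found later
            }
            if (node.equals(target)) {
                List<String> moves = new ArrayList<>();
                for (String n = target; predNode.containsKey(n); n = predNode.get(n)) {
                    moves.add(predMove.get(n));             // follow predecessors back to the source
                }
                Collections.reverse(moves);
                return Optional.of(moves);
            }
            for (Arc arc : graph.getOrDefault(node, List.of())) {
                double candidate = g + arc.cost();
                if (candidate < cost.getOrDefault(arc.to(), Double.POSITIVE_INFINITY)) {
                    cost.put(arc.to(), candidate);          // cheaper path: update and (re)queue
                    predNode.put(arc.to(), node);
                    predMove.put(arc.to(), arc.move());
                    queue.add(new Entry(arc.to(), candidate + h.applyAsDouble(arc.to())));
                }
            }
        }
        return Optional.empty();
    }
}
```

On a small hypothetical graph where the path (a,t1), (f,>>), (b,t2) costs 1 and the alternative (a,>>), (>>,t1), (f,>>), (b,t2) costs 3, the sketch returns the former.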

Algorithm 3 keeps its frontier in a priority queue. Assuming the queue is reordered by quick sort, each reordering costs O(n log₂ n), where n is the number of reachable markings. The number of iterations of Algorithm 3 also grows with the complexity of the reachable graph. Hence, the time complexity is O(n² log₂ n), and the space complexity is O(n).

Taking reachable graph TS1 as an example, formula (2) is the result of optimal alignment.

Algorithm 3 finds the optimal alignment by minimizing the total cost of the fired transition sequence. In an implementation, the reachable graph of the synchronous composition model need not be generated in advance: the synchronous composition model can be used directly as input, and the search space is generated on the fly as nodes are expanded, which simplifies the solving steps, saves processing time, and improves computational efficiency.

4. Experiment Analysis

4.1. Experiment Settings

This experiment compares the scale of the models and the efficiency of alignment. The scale of a model is measured by three main elements:
(1) Quantity of places: the number of places in the model. In general, the fewer the places, the lower the complexity.
(2) Quantity of transitions: the number of transitions in the model. In general, the fewer the transitions, the lower the complexity.
(3) Quantity of arcs: the number of arc relationships in the model. In general, the fewer the arcs, the lower the complexity.

Alignment efficiency is measured by the time consumed and the memory occupied by the optimal alignment computation. In this experiment, memory occupancy is measured by the quantity of reachable markings generated while the Petri net runs. Hence, the alignment algorithm is evaluated by two main factors:
(1) Quantity of reachable markings (quantity of queued nodes): the number of reachable markings generated while the model runs. During the optimal alignment computation, this number equals the number of nodes pushed into the priority queue, so it is also called the quantity of queued nodes.
(2) Computation time: the time consumed by the optimal alignment computation.

For a given process model, log, and cost function, the optimal alignment computation presented by Adriansyah et al. [32] consists of two steps: first, generate the search space; then, search it for the optimal alignment. The reachable graph of the newly generated model serves as the search space; its complexity can be measured by the scale of the new model, and it largely determines the cost of the entire optimal alignment computation. This experiment focuses on this part. The method in this paper likewise uses the A* algorithm for the optimal alignment computation. In theory, since this method removes unnecessary transitions and arcs, it produces fewer reachable markings and should compute the optimal alignment more efficiently than the classical method.

Next, the clinical pathway of ST-Elevation Myocardial Infarction (STEMI) under COVID-19 is taken as an example for experimental verification. The model is manually established according to the natural language description of the diagnosis and treatment process. The specific clinical pathway is shown in Figure 6, and the relationship between activities and transitions is shown in Table 2.

The main work of this experiment is to compare the scale of the search space and the efficiency of the alignment computation between the classical method and the method presented in this paper. All fitting traces in the logs are generated by running the process model; noise is then added so that the ratio of the number of deviations to the trace length ranges from 0% to 30%, in groups of 5%. Each trace is processed 10 times by Algorithm 3 and by the classical algorithm, respectively, and the mean computation time and model scale parameters are recorded, so that the two methods are compared fully under controlled variables.
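
The noise generation step can be sketched as follows. The text specifies only the ratio of deviations to trace length (0% to 30% in steps of 5%); the kinds of deviation used here (deleting a random event or inserting a random activity) and the rounding of the deviation count are assumptions for illustration, and the class NoiseInjector is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical noise injector: turns a fraction p of the trace length into deviations. */
final class NoiseInjector {
    /** Number of deviations for a trace of the given length at noise ratio p (rounded). */
    static int deviations(int traceLength, double p) {
        return (int) Math.round(p * traceLength);
    }

    /** Returns a noisy copy of the trace; deviation kinds are an assumption (delete or insert). */
    static List<String> inject(List<String> trace, double p, List<String> alphabet, Random rnd) {
        List<String> noisy = new ArrayList<>(trace);
        int k = deviations(trace.size(), p);
        for (int i = 0; i < k; i++) {
            if (!noisy.isEmpty() && rnd.nextBoolean()) {
                noisy.remove(rnd.nextInt(noisy.size()));             // a missing event
            } else {
                String extra = alphabet.get(rnd.nextInt(alphabet.size()));
                noisy.add(rnd.nextInt(noisy.size() + 1), extra);    // an unexpected event
            }
        }
        return noisy;
    }
}
```

Passing a seeded Random makes each noise group reproducible across the 10 repetitions.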

4.2. Experimental Environment

The experiment code adopts Java language, and Table 3 shows the hardware and software platform configuration.

4.3. Algorithm Superiority Verification

The results are shown in Figures 7–10. Figures 7 and 8 compare the scale parameters of the two generated models, Figure 9 compares the mean number of queued nodes, and Figure 10 compares the mean computation time of the two methods.

This paper does not compare the quantity of places, because for the same trace and process model the two models contain the same number of places: both the product model of the classical algorithm and the synchronous composition model of this paper contain exactly the union of the places of the log model and the process model.

Figure 7 compares the quantity of transitions of the two models: about 37 in the synchronous composition model and about 54 in the product model. By removing unnecessary transitions, the synchronous composition model considerably reduces the scale of the model.

Figure 8 compares the quantity of arcs of the two models: about 92 in the synchronous composition model and about 125 in the product model. Arcs are the flow relationships of Definition 2, which connect transitions and places. The synchronous composition model reduces not only transitions but also unnecessary flow relationships, which lowers the complexity of the search space.

For the optimal alignment computation, Figure 9 shows the quantity of reachable markings in the reachable graphs of the two models, i.e., the mean number of queued nodes during the computation: about 47 for the synchronous composition model and about 72 for the product model. As Algorithm 3 shows, the quantity of reachable markings affects the number of iterations and the sorting cost of the A* algorithm.

As Figures 7 and 8 show, the synchronous composition model has fewer unnecessary transitions and arcs than the product model, which brings its reachable graph closer to the real behavior and eliminates unnecessary reachable markings.

Figure 10 compares the mean computation time of the two methods for finding the optimal alignment: about 150 ms for Algorithm 3 and about 245 ms for the classical algorithm. Algorithm 3 thus saves about 40% of the time and computes the optimal alignment faster than the classical algorithm at every noise level.

It can be seen from the data that the synchronous composition model used in this paper has lower complexity and higher computational efficiency. Next, the reasons are analyzed and explained.

First, before the search space is generated, the method presented in this paper preprocesses the log to remove noise. This step is indispensable because the method is designed for clinical pathways, and filtering unreasonable noise prevents the algorithm from failing. The preprocessing has low time complexity; although its effect is not directly visible in the data, it contributes to alignment efficiency and algorithm stability. The classical method does not need this step, because the higher complexity of its product model lets it handle more situations at the cost of efficiency. Even if noise were filtered by such preprocessing, it would not affect the stability of the classical algorithm and would contribute little to its efficiency.

Second, when the search space is generated, the product model keeps both kinds of original transitions, whereas the synchronous composition model does not. Let t be the total number of transitions in the log model and the process model, f the total number of arcs, and s the number of synchronous activities, and let fi be the total number of arcs attached to the log transition and the model transition of the i-th synchronous activity. Then the product model contains t + s transitions and f + f1 + f2 + … + fs arcs, whereas the synchronous composition model contains t − s transitions and f arcs. That is, each synchronous activity gives the product model 2 more transitions and fi more arcs than the synchronous composition model, and these additional transitions and arcs have no practical meaning. This method removes them while generating the search space and keeps the model as small as possible, so its search space is much smaller than that of the classical method. In addition, the higher the noise level, the fewer the synchronous activities; Figures 7 and 8 show that the scales of the two models gradually converge as the noise level increases. However, the higher the noise ratio, the more the trace resembles pure noise, the less meaningful its analysis becomes, and the more reason there is to discard it.
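
The transition and arc counts above can be summarized as follows, assuming that each synchronous activity pairs exactly one log transition with one model transition (repeated labels would change the counts); "prod" denotes the product model and "sc" the synchronous composition model.

```latex
\begin{aligned}
|T_{\mathrm{prod}}| &= t + s, & |F_{\mathrm{prod}}| &= f + \sum_{i=1}^{s} f_i,\\
|T_{\mathrm{sc}}| &= t - s, & |F_{\mathrm{sc}}| &= f,\\
|T_{\mathrm{prod}}| - |T_{\mathrm{sc}}| &= 2s, & |F_{\mathrm{prod}}| - |F_{\mathrm{sc}}| &= \sum_{i=1}^{s} f_i.
\end{aligned}
```

For instance, with t = 4, s = 1, f = 8, and f1 = 4 (a hypothetical two-transition log model for <a, f> composed with a hypothetical two-transition process model <a, b>, each with four arcs), the product model has 5 transitions and 12 arcs, and the synchronous composition model has 3 transitions and 8 arcs.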

For the optimal alignment computation, both methods use the A* search algorithm, whose time complexity in Algorithm 3 is O(n² log₂ n), where n depends on the quantity of reachable markings. Hence, even a moderate difference in model scale produces a large efficiency gap. Figures 7–10 show that the quantities of transitions and arcs decrease by about 31% and 26%, respectively, while the mean number of queued nodes decreases by about 34% and the mean computation time of the optimal alignment by about 40%.
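
These relative reductions follow from the approximate mean values reported for Figures 7–10, so they inherit the rounding of those values:

```latex
\begin{aligned}
\text{transitions: } & \frac{54 - 37}{54} \approx 31.5\%, &
\text{arcs: } & \frac{125 - 92}{125} = 26.4\%,\\
\text{queued nodes: } & \frac{72 - 47}{72} \approx 34.7\%, &
\text{computation time: } & \frac{245 - 150}{245} \approx 38.8\%.
\end{aligned}
```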

According to the experimental data and analysis, since the cost of computing the optimal alignment grows faster than linearly with the model complexity, choosing the synchronous composition model can greatly improve alignment efficiency.

4.4. Algorithm Effectiveness Verification

Once the optimal alignment has been computed, results such as formula (2) are obtained. These results show clearly whether the activities in the patient's diagnosis and treatment process are normal and whether any activities are missing or redundant, so that medical staff can grasp the whole clinical pathway diagnosis and treatment process. An example from the diagnosis and treatment log is given below.

Trace σ2 = <…, exclusion of COVID-19 infection, …, CCU thrombolysis, transfer to isolation ward, recanalization of thrombolysis in emergency patients, …>, where some of the synchronous transitions are replaced by ellipses. The relationship between the activities and the transitions is shown in Table 4.

According to Table 4, trace σ2 is denoted as <…, e, …, c, t, r, …>, and its optimal alignment result is given in formula (3).

According to the optimal alignment result in formula (3), COVID-19 infection was ruled out for the patient at the beginning, yet the patient was later transferred to the isolation ward, which may have various causes. For example, as the epidemic intensifies, protective measures for high-risk patients need to be strengthened; the patient may have had contact with outside visitors at risk of infection during treatment; or the patient may have concealed travel information before admission. The specific situation must be analyzed by medical staff in light of the actual circumstances and cannot be judged from the logs alone.
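Reading deviations off an alignment, as done informally above, amounts to classifying each move. The Python sketch below assumes an alignment is a list of (log activity, model activity) pairs, with ">>" marking an absent move on one side; this representation and the activity labels in the usage note are hypothetical and do not reproduce formula (3).

```python
SKIP = ">>"  # marks "no move" on one side of an alignment step


def classify_moves(alignment):
    """Group the moves of an alignment by deviation type.

    alignment: list of (log_activity, model_activity) pairs.
    - both sides present -> synchronous move (activity performed as expected)
    - model side is SKIP -> recorded in the log but not matched by the model (redundant)
    - log side is SKIP   -> required by the model but absent from the log (missing)
    """
    report = {"synchronous": [], "redundant": [], "missing": []}
    for log_act, model_act in alignment:
        if log_act != SKIP and model_act != SKIP:
            report["synchronous"].append(log_act)
        elif model_act == SKIP:
            report["redundant"].append(log_act)
        else:
            report["missing"].append(model_act)
    return report
```

For the hypothetical alignment [(a, a), (x, >>), (>>, b), (c, c)], the report lists a and c as synchronous, x as redundant and b as missing; interpreting why x or b occurred still requires the clinical judgment described above.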

The classical algorithm has long been widely recognized and used, and its effectiveness is well established. Hence, the optimal alignments computed by the two algorithms are compared one by one. If all the results are identical, Algorithm 3 can be considered effective; if any result differs, Algorithm 3 may be considered invalid.

The optimal alignment result is a sequence of nodes of the form $(\alpha_2(t_x), (\alpha_1(t_y), t_y))$, where $t_x$ is a log model transition, $\alpha_2(t_x)$ is the activity corresponding to $t_x$, $t_y$ is a process model transition, and $\alpha_1(t_y)$ is the activity corresponding to $t_y$. Hence, after the optimal alignments are computed, the results of the two algorithms are compared node by node to check whether the log model activity, the process model activity, and the process model transition are exactly the same. The comparison shows that all the optimal alignments are identical, which indicates that the algorithm is effective in deviation detection.
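The node-by-node check described above can be sketched as follows. The Node type, its field names, and the use of ">>" for an absent move are hypothetical choices for illustration; the three compared fields mirror the log model activity, process model activity, and process model transition named in the text.

```python
from typing import NamedTuple

SKIP = ">>"  # hypothetical placeholder for an absent move


class Node(NamedTuple):
    log_activity: str      # activity of the log model transition (SKIP if none)
    model_activity: str    # activity of the process model transition (SKIP if none)
    model_transition: str  # process model transition identifier (SKIP if none)


def first_mismatch(a, b):
    """Index of the first node where two alignments differ, or None if identical."""
    for k, (x, y) in enumerate(zip(a, b)):
        if (x.log_activity, x.model_activity, x.model_transition) != (
            y.log_activity, y.model_activity, y.model_transition
        ):
            return k
    if len(a) != len(b):
        return min(len(a), len(b))  # one alignment is a strict prefix of the other
    return None


def mismatched_traces(classical, proposed):
    """Indices of traces whose alignments differ between the two algorithms.
    Assumes both lists cover the same traces in the same order."""
    return [
        k for k, (a, b) in enumerate(zip(classical, proposed))
        if first_mismatch(a, b) is not None
    ]
```

An empty result from `mismatched_traces` corresponds to the "all optimal alignments are identical" outcome. Note that a trace can have several optimal alignments of equal cost, so a strict identity test like this one may also flag such ties; comparing alignment costs alongside it would separate ties from genuine errors.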

5. Conclusions

Deviation detection for clinical pathways is an important technique for standardizing and studying the diagnosis and treatment process and for optimizing the clinical pathway. The classical algorithm presented by Adriansyah et al. combines the trace and the process model and has a wide range of applications. Compared with earlier forward-looking methods, it represented a qualitative leap. However, because its search space is highly complex, its efficiency in computing the optimal alignment remains unsatisfactory. Hence, this paper presents an optimized algorithm.

In this paper, the diagnosis and treatment log is preprocessed according to the particular characteristics of the diagnosis and treatment process. Because alignment essentially searches for a shortest path in the reachability graph, even a slight reduction in unnecessary transitions can still greatly improve alignment efficiency. Hence, this paper uses the synchronous composition model for alignment computation, which greatly reduces alignment time.

Finally, the clinical pathway of STEMI under COVID-19 is taken as an example for experimental analysis. The results show that the proposed deviation detection method is more efficient than the traditional algorithm and can greatly shorten the time consumed by deviation detection. In future work, this method can be applied to online deviation detection, which can detect anomalies in time and remind patients of deviations in the diagnosis and treatment process completed so far. Online detection can also predict possible subsequent deviations and provide a basis for medical staff to intervene manually and avoid adverse events. Because online deviation detection demands higher efficiency, it would better reflect the advantage of the new method.

Data Availability

The data used to support the findings of this study are openly available at https://drive.google.com/file/d/1gkPRQXDNBzcAkoIcLdFLGMOGKluPhVdf/view?usp=sharing.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported in part by the National Natural Science Foundation of China under grant nos. 61973180 and 72101137, in part by the Tai’shan Scholar Construction Project of Shandong Province, in part by the Education Ministry Humanities and Social Science Research Youth Fund Project of China under grant nos. 20YJCZH159 and 21YJCZH150, in part by the Natural Science Foundation of Shandong Province under grant nos. ZR2019MF033, ZR2020MF033, and ZR2021MF117, and in part by the Shandong Key R&D Program (Soft Science) Project under grant no. 2020RKB01177.