Abstract

To improve the efficiency of conformance checking in business process management, an alignment approach is presented based on transition systems between relation matrices and Petri nets. First, a log-based relation matrix of the events is obtained from the event log. Then, the events in the relation matrix are observed, the transitions in the model are fired, and the activities in the log and in the model are compared. Next, the states of the log and the model are recorded until no new state can be generated, which yields a transition system that includes the optimal alignments between the event log and the process model. Finally, two detailed algorithms are presented to obtain, based on the given cost function, one optimal alignment and all optimal alignments between a trace and the model, respectively. The availability and effectiveness of the proposed approach are proved theoretically.

1. Introduction

Business process management (BPM) aims to provide a unified modeling, running, and monitoring environment for business processes by combining information technology and management technology. As an important branch of BPM, process mining discovers, monitors, and enhances actual business processes by extracting valuable information from event logs [1]. Research on process mining is significant for the implementation, analysis, and improvement of business processes, so it is a hot topic in the related fields [2, 3]. Process mining mainly includes process discovery, conformance checking, and process enhancement [4–6]. Conformance checking compares the events in the event logs with the activities in the process models, and it finds the similarities and differences between the observed behaviors and the modeled behaviors [7–10].

Among the many conformance checking approaches, aligning observed and modeled behaviors has become an important means to measure the compliance of event logs with process models [11–13]. Alignment approaches aim at finding the deviations between process models and event logs. Usually, the alignments with the least deviation are considered the optimal alignments. Computing all the optimal alignments is NP-hard, so the time and space complexity of the search algorithms is very high. In the research field of process mining, alignment approaches have been deeply studied and widely applied [14–19]. A large body of literature introduces the ideas, motivations, and resolved problems of these approaches.

By analyzing and summarizing various alignment approaches [20–24], we find three main problems in the current ones: the search algorithms have high complexity, they may fail to find accurate optimal alignments, and they may fail to find all the optimal alignments. To solve these problems, we propose an alignment approach based on relation matrices and Petri nets. The proposed approach differs from the existing alignment approaches in that it does not deal with one trace at a time but with all the traces in the event log at once. In this approach, the relation matrix reflects all the partial order relations between events in the event log. By comparing the events in the relation matrix with the activities in the Petri net, we obtain a transition system that includes all the alignments between all the traces and the process model.

Our approach includes two steps: the first generates the search space; the second searches for the optimal alignments in that space. Compared with other alignment approaches, this approach has two advantages: it calculates accurate alignment results rather than approximate solutions, and it obtains all the optimal alignments based on the given cost function. Moreover, the approach embodies the alignment results between all traces in the event log and the process model in a single transition system; thus, it saves the time and space needed to calculate the search space.

The rest of this paper is organized as follows. Section 2 discusses the related work. Section 3 recalls some basic concepts. The generation algorithm of the alignment transition system is presented in Section 4. Section 5 presents the approaches of searching for an optimal alignment and all optimal alignments in the alignment transition system, respectively. Section 6 describes the scalability of our approach. Case studies are given to illustrate the superiority of our approach in Section 7. Section 8 draws the conclusion and the future work.

2. Related Work

The computation of the optimal alignments is NP-hard, so it has very high time and space complexity. At present, the existing alignment approaches can only deal with one trace at a time. To compute all the optimal alignments for the whole event log, the same work must be repeated for every trace. As related work, the following approaches are introduced in this paper.

Cook and Wolf present an approach that compares the traces with process models in order to quantitatively measure their similarity [19]. However, the approach is analyzed using state-space techniques, and it only partially supports invisible transitions and duplicate transitions. In addition, a heuristic estimation function is used to deal with the high computational complexity of this approach. The function simplifies the search space, but its application may lead to approximate rather than exact optimal solutions.

An approach to align observed and modeled behaviors is proposed by Adriansyah [13]. The approach is a classical one in the related work because it can obtain all the exact optimal alignments between the given trace and the process model. Its main idea is as follows: ① an event net is constructed based on the trace; ② the product of the event net and the process net is generated, which is also a Petri net; ③ part of the reachability graph of the product model, which also conforms to the definition of a transition system, is constructed while searching for the shortest path using the A* algorithm; ④ an optimal alignment is obtained when the final state is reached. The time and space complexity of obtaining the solutions is very high.

An approach of conformance checking based on partially ordered event data is proposed by Lu et al. [20, 21]. The main idea of the approach is as follows: ① the partially ordered traces are extracted from the existing logs; ② the partially ordered alignments are obtained through dealing with the partially ordered traces; ③ a quantitative-based quality metric is introduced to objectively compare the eventual results of conformance checking. Although the alignment procedure is simplified to some extent using the technology of partially ordered alignments, only the approximate solution of the optimal alignment can be obtained in some cases.

A workflow decomposition approach to align observed and modeled behaviors is proposed by Wang et al. [22]. The approach can divide the large process models and the relevant event logs into several separate parts that can be analyzed and aligned independently. However, the approach can only deal with the block workflow models which can be divided into several segments. Generally, it can only obtain some alignments, rather than all the optimal alignments.

An efficient alignment approach between event logs and process models is presented by Song et al. [23]. The approach leverages effective heuristics and trace replaying to significantly reduce the overall search space for seeking the optimal alignment. It improves the efficiency of computing the search space in both time and space. However, there are still some redundant nodes in the search space, which are not on the paths to the optimal alignments. In addition, only part of the optimal alignments can be obtained due to the limitation of the preprocessing; in some cases, only approximate optimal alignments can be obtained.

A reduced alignment approach between event logs and process models, named the OAT approach, is presented by Tian et al. [24]. An optimal alignment tree is generated through this approach. In the optimal alignment tree, the optimal alignments can be easily found by adding final marks to the leaf nodes that are related to the optimal alignments. The approach largely reduces the time complexity of searching for the optimal alignments in the search space. However, it has some serious problems: all the information is placed on the nodes, the duplicate nodes are not shared, and the invalid nodes are not pruned. Hence, the optimal alignment tree can become very large, and the search space may even explode.

The approach proposed in this paper can obtain the log-based relation matrix according to the given event log. The relation matrix can illustrate all the precursor and successor relations between the events in the log. Then, an alignment transition system can be acquired through comparing the events in the log with the activities in the model and predicting the next move. The alignment transition system includes all the optimal alignments between all traces and the model. Two algorithms are presented to calculate an optimal alignment and all the optimal alignments based on the cost function, respectively.

Compared with other approaches, the approach in this paper has two advantages. One is that it obtains accurate alignment results rather than approximate solutions. The other is that it obtains all optimal alignments between traces and process models based on the given cost function. In particular, the search space of our approach includes the alignment results for all traces in the log, not only for one trace.

3. Preliminaries

This section briefly reviews some basic concepts, including traces [4], event logs [4], Petri nets [25–31], transition systems [13], and alignments [13].

Definition 1. (trace and event log). Let A be a set of activities. A trace is a sequence of activities. An event log is a multiset of traces over A.

Definition 2. (labeled Petri net system). Let $A$ be a set of activities. A labeled Petri net system over $A$ is a tuple $N = (P, T; F, \alpha, m_i, m_f)$, where
(1) $P$ is the set of places
(2) $T$ is the set of transitions, and $P \cap T = \emptyset$
(3) $F \subseteq (P \times T) \cup (T \times P)$ is an arc set between transitions and places, i.e., a flow relation
(4) $\alpha: T \to A \cup \{\tau\}$ is a function that maps transitions to labels, and $\tau$ denotes the invisible transition
(5) $m_i$ and $m_f$ are the initial marking and final marking, respectively
For convenience, in the remainder of this paper, the labeled Petri net system is abbreviated as Petri net.

Definition 3. (preset and postset). Let $N = (P, T; F, \alpha, m_i, m_f)$ be a Petri net. For $x \in P \cup T$, ${}^{\bullet}x = \{y \mid (y, x) \in F\}$ and $x^{\bullet} = \{y \mid (x, y) \in F\}$, where ${}^{\bullet}x$ represents the preset of $x$ and $x^{\bullet}$ represents the postset of $x$.
We describe the transition firing rules by using multisets of places. For any reachable state $m \in R(m_i)$, the transition firing rules of Petri net $N = (P, T; F, \alpha, m_i, m_f)$ are as follows:
(1) For transition $t \in T$, if ${}^{\bullet}t \le m$, $t$ is enabled, denoted by $m[t>$
(2) If $m[t>$, the transition $t$ can occur under the marking $m$, and after $t$ is fired, a new marking $m'$ is generated, denoted by $m[t>m'$, where $m' = m - {}^{\bullet}t + t^{\bullet}$
The set of all reachable states from state $m$ is denoted as $R(m)$, and $m \in R(m)$. In Petri net $N = (P, T; F, \alpha, m_i, m_f)$, $m_i$ is the initial state of the system, and $R(m_i)$ represents the set of all reachable states in the running process of the system.

Definition 4. (transition system). Let $A$ be a set of activities. A transition system is a triplet $TS = (S, A, T)$, where $S$ is the set of states, and $T \subseteq S \times A \times S$ is the set of transitions. $S^{start} \subseteq S$ is the set of initial states, and $S^{end} \subseteq S$ is the set of final states.

Definition 5. (alignment). Let $A$ be a set of activities. $\sigma \in A^{*}$ is a trace over $A$ and $N = (P, T; F, \alpha, m_i, m_f)$ is a Petri net over $A$. An alignment between $\sigma$ and $N$ is a legal movement sequence $\gamma \in ((A \cup \{>>\}) \times (T \cup \{>>\}))^{*}$ such that
(1) $\pi_1(\gamma)\downarrow_{A} = \sigma$, i.e., its sequence of movements in the trace (ignoring >>) yields the trace
(2) $m_i[\pi_2(\gamma)\downarrow_{T} > m_f$, i.e., its sequence of movements in the model (ignoring >>) yields a complete firing sequence of $N$
For all tuples $(a, t)$ in an alignment, $(a, t)$ is one of the following movements:
(1) log move if $a \in A$ and $t = {>>}$
(2) model move if $a = {>>}$, $t \in T$, and $\alpha(t) \ne \tau$
(3) synchronous move if either $a \in A$, $t \in T$, and $\alpha(t) = a$, or $a = {>>}$ and $\alpha(t) = \tau$
(4) illegal move otherwise
We consider all of the log moves, model moves, and synchronous moves as legal ones. An alignment is legal if it only contains legal moves.
$\Gamma_{\sigma,N}$ is the set of all alignments between trace $\sigma$ and model $N$.
There may be several different alignments between the trace and the model. To get the most suitable alignments, a cost function c((a, t)) is used to assign a certain value to each move. According to the given cost function, the alignments with the least total cost are called optimal alignments.
The likelihood cost function c() directly determines the set of optimal alignments between the given trace and the model. In this paper, the standard likelihood cost function lc() is used to assign costs to the moves, i.e., the cost of a synchronous move, a log move, and a model move is 0, 1, and 1, respectively.
$\Gamma^{o}_{\sigma,N,lc}$ is the set of all optimal alignments between trace $\sigma$ and model $N$ based on the function lc().
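The standard likelihood cost function can be illustrated with a short sketch. The names `SKIP`, `TAU`, `lc`, and `alignment_cost` are illustrative and do not come from the paper; the labelling dictionary mirrors the example model used in Section 4, and the sample alignment is one possible alignment of the trace <b, d> with that model.

```python
SKIP = ">>"   # the "no move" symbol
TAU = "tau"   # assumed label for invisible transitions

def lc(move, label):
    """Standard likelihood cost of a single move (a, t)."""
    a, t = move
    if a != SKIP and t != SKIP and label[t] == a:
        return 0                      # synchronous move on a visible activity
    if a == SKIP and t != SKIP and label[t] == TAU:
        return 0                      # invisible transition, counted as synchronous
    if (a == SKIP) != (t == SKIP):
        return 1                      # log move or model move
    raise ValueError(f"illegal move: {move}")

def alignment_cost(alignment, label):
    """Total cost of an alignment: the sum of the costs of its moves."""
    return sum(lc(move, label) for move in alignment)

# Labelling of the example model (t1 -> a, t2 -> b, t3 -> c, t4 -> d).
label = {"t1": "a", "t2": "b", "t3": "c", "t4": "d"}

# One model move on t1, followed by two synchronous moves.
gamma = [(SKIP, "t1"), ("b", "t2"), ("d", "t4")]
cost = alignment_cost(gamma, label)
```

Under this function, an alignment is optimal for a trace when no other alignment of the same trace has a smaller total cost.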

4. Generation of Transition Systems

When measuring the fitness between the event log and the process model, the main work is to align the events in the trace with the activities in the model. In the current alignment step, assume that the observed event in the log is x, the fired transition in the model is ti, and the activity that ti maps to is y. In the next step, one of the following three scenarios may occur: ① activity x is observed in the log but cannot be modeled by firing a transition in the model, so a log move (x, >>) is generated; ② activity y is modeled by firing the transition ti in the model but cannot be observed in the log, so a move (>>, ti) is generated; if y is equal to τ, (>>, ti) is a synchronous move; otherwise, (>>, ti) is a model move; ③ if x is equal to y, i.e., the activity observed in the log is the same as the one modeled in the model, a synchronous move (x, ti) is generated.

The proposed approach is derived from the abovementioned idea. It aims to observe the event log, run the process model, and compare the event in the log with the activity in the model. We record the states of the log and model, and then we can get an optimal alignment graph. The graph includes all the optimal alignments.

Next, we take the given event log and process model as an example to introduce the basic principles of the proposed approach.

Let A = {a, b, c, d} be a set of activities. There is an event log of a simple process on A, as shown in Table 1. We denote the event log as L1 = [σ1, σ2, σ3, σ4], where σ1 = <a, b>, σ2 = <b, d>, σ3 = <a, a, b>, and σ4 = <a, b, d>.

The process model N1 = (P1, T1; F1, α1, mi,1, mf,1) is shown in Figure 1. Its place set is P1 = {p1, p2, p3, p4}; its transition set is T1 = {t1, t2, t3, t4}; its flow relation set is F1 = {(p1, t1), (t1, p2), (p2, t2), (p2, t3), (t2, p3), (t3, p3), (p3, t4), (t4, p4)}; the mapping relations between transitions and activities are α1(t1) = a, α1(t2) = b, α1(t3) = c, α1(t4) = d; the initial marking is mi,1 = [p1]; the final marking is mf,1 = [p4].
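As a minimal sketch of the firing rule applied to N1, the following code encodes the presets and postsets of the four transitions as multisets and enumerates the reachable markings. The function names (`enabled`, `fire`, `freeze`, `reachable`) are illustrative choices, and the exhaustive exploration only terminates for bounded nets such as N1.

```python
from collections import Counter

# Presets and postsets of the transitions of N1, read off the flow relation F1.
PRE = {"t1": Counter(p1=1), "t2": Counter(p2=1),
       "t3": Counter(p2=1), "t4": Counter(p3=1)}
POST = {"t1": Counter(p2=1), "t2": Counter(p3=1),
        "t3": Counter(p3=1), "t4": Counter(p4=1)}

def enabled(m, t):
    """t is enabled at marking m if m covers the preset of t."""
    return all(m[p] >= n for p, n in PRE[t].items())

def fire(m, t):
    """Return the marking m' = m - preset(t) + postset(t)."""
    if not enabled(m, t):
        raise ValueError(f"{t} is not enabled")
    m2 = Counter(m)
    m2.subtract(PRE[t])
    m2.update(POST[t])
    return +m2  # unary plus drops places with zero tokens

def freeze(m):
    """Hashable form of a marking: sorted tuple of its tokens."""
    return tuple(sorted(m.elements()))

def reachable(m0):
    """All markings reachable from m0 (assumes a bounded net)."""
    seen, stack = {freeze(m0)}, [m0]
    while stack:
        m = stack.pop()
        for t in PRE:
            if enabled(m, t):
                m2 = fire(m, t)
                if freeze(m2) not in seen:
                    seen.add(freeze(m2))
                    stack.append(m2)
    return seen

markings = reachable(Counter(p1=1))  # start from the initial marking [p1]
```

For N1 the exploration visits exactly the four markings [p1], [p2], [p3], and [p4].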

4.1. Log-Based Relation Matrices

Consider for instance L1 = [σ1, σ2, σ3, σ4] again. For this event log, the following log-based order relations can be found, as in (2)–(5):

Relation $>_{L_1}$ contains all pairs of activities in a directly-follows relation. $a >_{L_1} b$ because b directly follows a in trace σ1 = <a, b>. However, $b \not>_{L_1} a$ because a never directly follows b in any trace in the log. $\to_{L_1}$ contains all pairs of activities in a causal relation, e.g., $a \to_{L_1} b$ because sometimes b directly follows a and never the other way around ($a >_{L_1} b$ and $b \not>_{L_1} a$). $a \,\#_{L_1}\, d$ because $a \not>_{L_1} d$ and $d \not>_{L_1} a$. $a \parallel_{L_1} a$ because $a >_{L_1} a$, i.e., a follows itself.

We can derive the footprint of the log L1, as in (6). The subscripts of the relations have been removed in the footprint.
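The log-based order relations can be recomputed from the traces with a few set comprehensions. This is a sketch under two assumptions: the traces of L1 are taken as <a, b>, <b, d>, <a, a, b>, and <a, b, d>, consistent with the trace-based transition systems formalized later in this section, and the unrelated relation is restricted to activities that actually occur in the log. The function name `footprint` is illustrative.

```python
def footprint(log):
    """Return the directly-follows, causal, parallel, and unrelated relations."""
    acts = sorted({a for trace in log for a in trace})
    # x > y: y directly follows x in at least one trace.
    follows = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    # x -> y: x > y holds but y > x does not.
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}
    # x || y: both x > y and y > x hold.
    parallel = {(x, y) for (x, y) in follows if (y, x) in follows}
    # x # y: neither x > y nor y > x holds (observed activities only).
    unrelated = {(x, y) for x in acts for y in acts
                 if (x, y) not in follows and (y, x) not in follows}
    return follows, causal, parallel, unrelated

L1 = [("a", "b"), ("b", "d"), ("a", "a", "b"), ("a", "b", "d")]
follows, causal, parallel, unrelated = footprint(L1)
```

With these traces, the pair (a, a) ends up in the parallel relation because a directly follows itself in <a, a, b>, which matches the explanation given above.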

According to the footprint in (6), we can get all the relations between the activities in the whole event log, not only in a single trace. Although the footprint embodies the relations between the activities in the log, it cannot describe the current states of the log, especially the start and end of the traces. To capture all the information needed in the alignment process, we convert the traces in the log into transition systems, as shown in Figure 2.

Each trace in the log corresponds to a transition system, which is called a trace-based transition system. First, we create an initial state, denoted as s0, which marks the start of the trace. Then, each activity in the trace is mapped to a state, e.g., a and b in trace σ1 are mapped to sa and sb, respectively. The state mapped by the last activity is the final state, e.g., b is the last activity in trace σ1, so sb is the final state of transition system TS1. The state of the transition system changes whenever an activity is observed in the trace. Hence, we add an edge between each pair of adjacent states, labeled by the observed activity.

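Definition 6. (trace-based transition system). Let $A$ be a set of activities. $\sigma \in A^{*}$ is a trace of length $n$ over $A$. The trace-based transition system of $\sigma$ is a transition system $TS = (S, \partial_{set}(\sigma), T)$, where
(1) $S = \{s_0\} \cup \{s_{\sigma[i]} \mid 1 \le i \le n\}$
(2) $T = \{(s_0, \sigma[1], s_{\sigma[1]})\} \cup \{(s_{\sigma[i]}, \sigma[i+1], s_{\sigma[i+1]}) \mid 1 \le i < n\}$
(3) $S^{start} = \{s_0\}$, i.e., $s_0$ is the initial state
(4) $S^{end} = \{s_{\sigma[n]}\}$, i.e., $s_{\sigma[n]}$ is the final state
Here, $\sigma[i]$ refers to the ith element of trace $\sigma$, and $\partial_{set}$ converts a sequence into a set.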
In Figure 2, the state with an arrow line is the initial state, and the one with two circles is the final state.
Transition system TS1 depicted in Figure 2(a) can be formalized as follows: S1 = {s0, sa, sb}, S1start = {s0}, S1end = {sb}, ∂set(σ1) = {a, b}, and T1 = {(s0, a, sa), (sa, b, sb)}.
Transition system TS2 depicted in Figure 2(b) can be formalized as follows: S2 = {s0, sb, sd}, S2start = {s0}, S2end = {sd}, ∂set(σ2) = {b, d}, and T2 = {(s0, b, sb), (sb, d, sd)}.
Transition system TS3 depicted in Figure 2(c) can be formalized as follows: S3 = {s0, sa, sb}, S3start = {s0}, S3end = {sb}, ∂set(σ3) = {a, b}, and T3 = {(s0, a, sa), (sa, a, sa), (sa, b, sb)}.
Transition system TS4 depicted in Figure 2(d) can be formalized as follows: S4 = {s0, sa, sb, sd}, S4start = {s0}, S4end = {sd}, ∂set(σ4) = {a, b, d}, and T4 = {(s0, a, sa), (sa, b, sb), (sb, d, sd)}.
Through the analysis of the footprint in (6) and the transition systems in Figure 2, we can infer a relation matrix between states.
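The construction of a trace-based transition system can be sketched in a few lines. The function name `trace_ts` and the state naming scheme `s_<activity>` are illustrative; the sketch returns the state set, the transition set, the initial states, and the final states.

```python
def trace_ts(trace):
    """Build the trace-based transition system of a trace.

    Each activity x is mapped to the state s_x, so repeated activities
    share one state and may produce self-loops.
    """
    states = {"s0"} | {f"s_{a}" for a in trace}
    transitions = set()
    current = "s0"
    for a in trace:
        nxt = f"s_{a}"
        transitions.add((current, a, nxt))  # edge labeled by the observed activity
        current = nxt
    return states, transitions, {"s0"}, {current}

# Trace <a, a, b>: the repeated activity a yields the self-loop (s_a, a, s_a).
ts3 = trace_ts(("a", "a", "b"))
```

Because states are identified by activity, the transition system of <a, a, b> has only three states even though the trace has length three.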

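Definition 7. (log-based relation matrix). Let $A$ be a set of activities. $L = [\sigma_1, \sigma_2, \ldots, \sigma_n]$ is an event log over $A$. $TS_i = (S_i, \partial_{set}(\sigma_i), T_i)$ is the trace-based transition system of trace $\sigma_i$, where $1 \le i \le n$, and $S = \bigcup_{1 \le i \le n} S_i$. The log-based relation matrix is a matrix $LRM[S][S - \{s_0\}]$, where
(1) $LRM^{row}$, a sequence over the states in $S$, is the row mark of LRM
(2) $LRM^{col}$, a sequence over the states in $S - \{s_0\}$, is the column mark of LRM
(3) If there is a transition $t \in T_i$ for some $i$ with $\pi_1(t) = LRM^{row}[j]$ and $\pi_3(t) = LRM^{col}[k]$, then $LRM[j][k] = \pi_2(t)$; else, $LRM[j][k] = \#$
(4) $LRM^{start} = \{s_0\}$, i.e., $s_0$ is the initial state
(5) $LRM^{end} = \bigcup_{1 \le i \le n} S_i^{end}$, i.e., the final states of the trace-based transition systems are the final states
Here, the row and column marks are obtained by an inverse operation of $\partial_{set}(\sigma)$, which converts a set into a sequence. $\pi_i(x)$ refers to the ith element of $x$. The symbol # represents that the two states have no direct causal relation.
According to Definition 7, the log-based relation matrix LRM1 of log L1 can be constructed, as in (7). The states labeled by the star in (7) are the final states.
According to Definition 7, the log-based relation matrix can present sequences of activities. We read the states in the relation matrix, beginning at the initial state and ending at a final state. A complete sequence consists of all the activities between the visited states. We call the set of all such sequences the sequence set of LRM. If LRM is the relation matrix of event log L, then L is contained in the sequence set of LRM, i.e., every σi ∈ L (1 ≤ i ≤ |L|) belongs to the sequence set. In contrast, the sequence set of LRM is in general not contained in L.
For example, σ1, σ2, σ3, and σ4 all belong to the sequence set of LRM1, so L1 is contained in it. The converse does not hold, because the sequence <a, a, a, a, b> belongs to the sequence set of LRM1 but <a, a, a, a, b> ∉ L1.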
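The relation matrix and the sequences it presents can be sketched with a sparse dictionary, where a missing entry plays the role of #. The names `build_lrm` and `accepts` are illustrative, and the traces of L1 are assumed to be <a, b>, <b, d>, <a, a, b>, and <a, b, d>, as in the trace-based transition systems above.

```python
def build_lrm(log):
    """Merge the trace-based transition systems of all traces into one matrix.

    Returns a dict {(row_state, col_state): activity} and the set of final states.
    """
    lrm, final = {}, set()
    for trace in log:
        current = "s0"
        for a in trace:
            nxt = f"s_{a}"
            lrm[(current, nxt)] = a
            current = nxt
        final.add(current)  # the state of the last activity is a final state
    return lrm, final

def accepts(lrm, final, seq):
    """Check whether seq can be read from s0 to a final state in the matrix."""
    current = "s0"
    for a in seq:
        nxt = f"s_{a}"
        if lrm.get((current, nxt)) != a:
            return False
        current = nxt
    return current in final

L1 = [("a", "b"), ("b", "d"), ("a", "a", "b"), ("a", "b", "d")]
LRM1, LRM1_END = build_lrm(L1)
extra = ("a", "a", "a", "a", "b")  # readable from the matrix, but not a trace of L1
```

The self-loop on s_a is what lets the matrix present sequences with arbitrarily many repetitions of a, which is why its sequence set is strictly larger than the log.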

4.2. Alignment Transition Systems

Given the process model N = (P, T; F, α, mi, mf) and the event log L = [σ1, σ2, σ3, …, σn], the relation matrix LRM[S][S − {s0}] is first derived from the log. We can then measure the fitness between the log and the model based on the relation matrix and the Petri net.

From now on, we replace the event log with the relation matrix and use the matrix to embody the current state, observed activity, and next state of the log. By comparing the relation matrix with the Petri net, we can get an alignment transition system.

Definition 8. (alignment transition system). Let A be a set of activities. L = [σ1, σ2, …, σn] is an event log over A. LRM[S][S − {s0}] is the relation matrix of L. N = (P, T; F, α, mi, mf) is a Petri net over A. The alignment transition system between L and N is a transition system ATS = (S, M, T), where
(1) S is the set of states, and each state is a pair (m, s) of a reachable marking m ∈ R(mi) of N and a state s of LRM
(2) M is the set of legal moves between L and N, and T ⊆ S × M × S is the set of transitions
(3) Sstart = {(mi, s0)} is the initial state
(4) Send = {(mf, s) | s ∈ LRMend} are the final states
We observe the log and run the process model. Meanwhile, we record the current state of the log and the current reachable marking of the model. Then, according to this information, we determine which activities can be observed in the matrix and which transitions can be fired in the Petri net, and we compare them. On the basis of the comparison, we can predict the next state of the matrix and the next reachable marking of the net.
The structures of arbitrary Petri nets may be very complicated and diverse. Here, we discuss one special structure and its influence on the alignment transition system: the Petri net contains cycles in which the cost of every transition is 0. As a result, the alignment transition system may contain cycles with cost 0. In the context of this paper, the transitions with cost 0 are the invisible transitions of the Petri net. The invisible transitions represent activities that can never be observed, so the cycles containing only invisible transitions have little meaning for the alignment results. In order to reach the final node from the initial node in a limited number of steps, we delete this kind of cycle from the alignment transition system.
The abovementioned idea is translated into the concrete algorithm to realize the alignment transition system, as shown in Algorithm 1.
The computational complexity of generating the alignment transition system from Petri nets and relation matrices depends on the number of reachable states of the Petri net and the length of the traces, and the underlying problem is NP-hard. Its complexity is as high as that of constructing the reachable marking graph of the Petri net. In particular, when there are many concurrent transitions in the Petri net, the number of reachable states increases exponentially and may even cause the state space to explode.
According to Algorithm 1, taking relation matrix LRM1 in (7) and process model N1 in Figure 1 as an example, we can get an alignment transition system, as shown in Figure 3.
Given the event log and the process model, the alignment between them is translated into the calculation of a move sequence in the alignment transition system. The moves label the edges of the system, so we can calculate the move sequence when traversing a branch of the system. Hence, the problem of finding the optimal alignment between the log and the model is translated into that of finding the minimum-cost move sequence in the alignment transition system. Note, however, that the alignment transition system includes the information of the whole event log, so when calculating the optimal alignments between a given trace and the model, the projection of the move sequence onto the first column must be the given trace.
When visiting the states in alignment transition system ATS1 in Figure 3, we can get some shortest paths, as shown in Figure 4, and their corresponding optimal alignments, as shown in Figure 5.

Algorithm 1: Generation of the alignment transition system.
Input: Petri net model N1 = (P1, T1; F1, α1, mi,1, mf,1), and relation matrix LRM[S][S − {s0}].
Output: alignment transition system TS = (S, M, T).
Initialize: S ⟵ ∅, M ⟵ ∅, T ⟵ ∅, Sstart ⟵ {(mi,1, s0)}, Send ⟵ ∅.
(1) S ⟵ Sstart; n ⟵ 1;
(2) WHILE (n ≤ |S|) DO
(3)  (mj, sx) ⟵ S[n];
 //judge whether the current state is a final state;
(4)  IF (mj = mf,1 AND sx ∈ LRMend) THEN
(5)   Send ⟵ Send ∪ {(mj, sx)};
(6)  END IF
 //generate the log moves, the related new states, and the transitions;
(7)  FOR (all sy ∈ ∂set(LRMrow)) DO
(8)   IF (LRM[sx][sy] ≠ #) THEN
(9)    M ⟵ M ∪ {(LRM[sx][sy], >>)};
(10)    S ⟵ S ∪ {(mj, sy)};
(11)    T ⟵ T ∪ {((mj, sx), (LRM[sx][sy], >>), (mj, sy))};
(12)   END IF
(13)  END FOR
 //generate the model moves, the related new states, and the transitions;
(14)  IF (mj ≠ mf,1) THEN
(15)   FOR (all tk ∈ T1) DO
(16)    IF (tk is enabled at mj, i.e., mj[tk >) THEN
(17)     M ⟵ M ∪ {(>>, tk)};
(18)     mj[tk > my;
(19)     S ⟵ S ∪ {(my, sx)};
(20)     T ⟵ T ∪ {((mj, sx), (>>, tk), (my, sx))};
(21)    END IF
(22)   END FOR
(23)  END IF
 //generate the synchronous moves, the related new states, and the transitions;
(24)  FOR ((all sy ∈ ∂set(LRMrow)) AND (all tk ∈ T1)) DO
(25)   IF (tk is enabled at mj, i.e., mj[tk >) AND (LRM[sx][sy] = α1(tk)) THEN
(26)    M ⟵ M ∪ {(α1(tk), tk)};
(27)    mj[tk > my;
(28)    S ⟵ S ∪ {(my, sy)};
(29)    T ⟵ T ∪ {((mj, sx), (α1(tk), tk), (my, sy))};
(30)   END IF
(31)  END FOR
(32)  n ⟵ n + 1;
(33) END WHILE
 //delete the cycles with cost 0 in the transition system;
(34) FOR (all cycles with cost 0 in TS) DO
(35)  Delete all the edges with cost 0;
(36)  FOR (all nodes in the cycle) DO
(37)   FOR (all nodes that have no out edge) DO
(38)    Delete the node;
(39)    Set the parents of the deleted node to be the nodes to examine;
(40)   END FOR
(41)  END FOR
(42) END FOR
(43) RETURN TS;
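The main loop of Algorithm 1 (Steps 1–33) can be sketched in Python for the running example. This is an illustration under stated assumptions rather than the authors' implementation: N1 is treated as a safe net so that a marking is a frozenset of places, LRM1 is written as a sparse dictionary whose missing entries stand for #, and the cycle-deletion phase (Steps 34–42) is omitted because N1 has no invisible transitions. All identifiers are illustrative.

```python
SKIP = ">>"

# Example net N1 (assumed safe: a marking is the set of marked places).
PRE = {"t1": {"p1"}, "t2": {"p2"}, "t3": {"p2"}, "t4": {"p3"}}
POST = {"t1": {"p2"}, "t2": {"p3"}, "t3": {"p3"}, "t4": {"p4"}}
LABEL = {"t1": "a", "t2": "b", "t3": "c", "t4": "d"}
M_INIT, M_FINAL = frozenset({"p1"}), frozenset({"p4"})

# Relation matrix LRM1 as a sparse dict; a missing entry plays the role of '#'.
LRM = {("s0", "sa"): "a", ("s0", "sb"): "b", ("sa", "sa"): "a",
       ("sa", "sb"): "b", ("sb", "sd"): "d"}
LRM_END = {"sb", "sd"}

def build_ats():
    start = (M_INIT, "s0")
    states, order = {start}, [start]      # 'order' is the worklist S[1..]
    moves, transitions, finals = set(), set(), set()

    def add(src, move, dst):
        moves.add(move)
        transitions.add((src, move, dst))
        if dst not in states:
            states.add(dst)
            order.append(dst)

    n = 0
    while n < len(order):
        m, s = order[n]
        if m == M_FINAL and s in LRM_END:               # Steps 4-6
            finals.add((m, s))
        log_succ = [(sy, a) for (sx, sy), a in LRM.items() if sx == s]
        enabled = [t for t in PRE if PRE[t] <= m]
        for sy, a in log_succ:                          # Steps 7-13: log moves
            add((m, s), (a, SKIP), (m, sy))
        if m != M_FINAL:                                # Steps 14-23: model moves
            for t in enabled:
                add((m, s), (SKIP, t), ((m - PRE[t]) | POST[t], s))
        for sy, a in log_succ:                          # Steps 24-31: synchronous moves
            for t in enabled:
                if LABEL[t] == a:
                    add((m, s), (a, t), ((m - PRE[t]) | POST[t], sy))
        n += 1
    return states, moves, transitions, finals

states, moves, transitions, finals = build_ats()
```

For this example the worklist reaches every combination of the four markings of N1 with the four states of LRM1, and the only final states are ([p4], sb) and ([p4], sd).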

4.3. Properties of Alignment Transition Systems

In this section, we present the properties of the alignment transition system and theoretically demonstrate that it includes the alignments between all traces in the event log and the process model.

Theorem 1. Let N = (P, TN; F, α, mi, mf) be a Petri net, L an event log, LRM the log-based relation matrix of L, and ATS = (SA, MA, TA) the alignment transition system of LRM and N. Then SA = R(mi) × LRMrow, where R(mi) is the set of all reachable states of N, and LRMrow includes all the states in LRM.

Proof. $\forall s_A \in S_A$, $\pi_1(s_A) \in R(m_i) \wedge \pi_2(s_A) \in LRM^{row}$. So, $s_A \in R(m_i) \times LRM^{row}$. Then, $S_A \subseteq R(m_i) \times LRM^{row}$.
$\forall (m_j, s_k) \in R(m_i) \times LRM^{row}$, we examine whether a path can be established from $(m_i, s_0)$ to $(m_j, s_k)$ according to Algorithm 1. Since $m_j \in R(m_i)$, we suppose that there is a firing transition sequence $t_1, t_2, \ldots, t_j$ in $N$ such that $m_i[t_1 t_2 \ldots t_j > m_j$. Then, there is a transition sequence in ATS which makes $(m_j, s_0) \in S_A$. The sequence is $<((m_i, s_0), (>>, t_1), (m_1, s_0)), \ldots, ((m_{j-1}, s_0), (>>, t_j), (m_j, s_0))>$. Since $s_k \in LRM^{row}$, we suppose that there is an activity sequence $a_1, a_2, \ldots, a_k$ in LRM such that $LRM[s_0][s_{a_1}] = a_1$, $LRM[s_{a_l}][s_{a_{l+1}}] = a_{l+1}$ for $1 \le l < k$, and $s_{a_k} = s_k$. Then, there is a transition sequence in ATS which makes $(m_j, s_k) \in S_A$. The sequence is $<((m_j, s_0), (a_1, >>), (m_j, s_{a_1})), \ldots, ((m_j, s_{a_{k-1}}), (a_k, >>), (m_j, s_{a_k}))>$. Hence, $(m_j, s_k) \in S_A$. So, $R(m_i) \times LRM^{row} \subseteq S_A$.
In conclusion, $S_A = R(m_i) \times LRM^{row}$.
Theorem 1 shows that the state set of the alignment transition system is the Cartesian product of the reachable state set of the Petri net and the state set of the relation matrix. According to Theorem 1, every combination of a reachable marking and a state of the relation matrix is a state of the alignment transition system.
For example, R(mi,1) of N1 in Figure 1 is {[p1], [p2], [p3], [p4]} and LRM1row in (7) is {s0, sa, sb, sd}. SATS1 of ATS1 in Figure 3 is {([p1], s0), ([p2], s0), ([p3], s0), ([p4], s0), ([p1], sa), ([p2], sa), ([p3], sa), ([p4], sa), ([p1], sb), ([p2], sb), ([p3], sb), ([p4], sb), ([p1], sd), ([p2], sd), ([p3], sd), ([p4], sd)}, which is equal to R(mi,1) × LRM1row.

Theorem 2. Let N = (P, TN; F, α, mi, mf) be a Petri net, L an event log, LRM the log-based relation matrix of L, and ATS = (SA, MA, TA) the alignment transition system of LRM and N. $\Gamma_{L,N}$ is the set of all alignments between all traces in L and N, and $\Gamma_{ATS}$ includes all the sequences that are π2(<(sA,1, mA,1, sA,2), (sA,2, mA,2, sA,3), …, (sA,n−1, mA,n−1, sA,n)>), where sA,1 ∈ SAstart and sA,n ∈ SAend. Then $\Gamma_{L,N} \subseteq \Gamma_{ATS}$.

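Proof. $\forall \gamma \in \Gamma_{L,N}$, ignoring >>, $\sigma = \pi_1(\gamma) \in L$ and $\lambda = \pi_2(\gamma)$ is a complete firing sequence of $N$. $<\gamma[1]>$, $<\gamma[1], \gamma[2]>$, …, $<\gamma[1], \gamma[2], \ldots, \gamma[i]>$, …, and $<\gamma[1], \gamma[2], \ldots, \gamma[q]>$ are the prefixes of $\gamma$, and they are prefix alignments between $L$ and $N$, where $1 \le i \le q$.
We prove the theorem by induction on $|\gamma|$, where $|\gamma| = q$:
(1) When $i = 1$, ① if $\gamma[1]$ is a log move, $\gamma[1] = (a_k, >>)$. $a_k$ is the first activity of trace $\sigma$. So, $LRM[s_0][s_{a_k}] = a_k$ and $s_{a_k} \in LRM^{row}$. According to Steps 7–13 of Algorithm 1, $((m_i, s_0), (a_k, >>), (m_i, s_{a_k}))$ is a transition of ATS. ② If $\gamma[1]$ is a model move, $\gamma[1] = (>>, t_j)$. $t_j$ is the first transition of $\lambda$. $\exists m_j \in R(m_i)$, $m_i[t_j > m_j$. According to Steps 14–23 of Algorithm 1, $((m_i, s_0), (>>, t_j), (m_j, s_0))$ is a transition of ATS. ③ If $\gamma[1]$ is a synchronous move, $\gamma[1] = (a_k, t_j)$. According to Steps 24–31 of Algorithm 1, $((m_i, s_0), (a_k, t_j), (m_j, s_{a_k}))$ is a transition of ATS, where $m_i[t_j > m_j$. Hence, $<\gamma[1]>$ is the prefix of some sequences in $\Gamma_{ATS}$.
(2) When $i = q - 1$, it is supposed that $<\gamma[1], \gamma[2], \ldots, \gamma[q-1]>$ is also the prefix of some sequences in $\Gamma_{ATS}$. We suppose the last state reached in ATS is $(m_j, s_{a_k})$. Let $i = q$. ① If $\gamma[q]$ is a log move, $\gamma[q] = (a_{k+1}, >>)$. $a_{k+1}$ is the last activity of trace $\sigma$ and $m_j = m_f$. So, $s_{a_{k+1}} \in LRM^{end}$ and $LRM[s_{a_k}][s_{a_{k+1}}] = a_{k+1}$. According to Steps 7–13 of Algorithm 1, $((m_j, s_{a_k}), (a_{k+1}, >>), (m_j, s_{a_{k+1}}))$ is a transition of ATS. ② If $\gamma[q]$ is a model move, $\gamma[q] = (>>, t_{j+1})$. $t_{j+1}$ is the last transition of $\lambda$ and $s_{a_k} \in LRM^{end}$. $\exists m_{j+1} \in R(m_i) \wedge m_{j+1} = m_f$, $m_j[t_{j+1} > m_{j+1}$. According to Steps 14–23 of Algorithm 1, $((m_j, s_{a_k}), (>>, t_{j+1}), (m_{j+1}, s_{a_k}))$ is a transition of ATS. ③ If $\gamma[q]$ is a synchronous move, $\gamma[q] = (a_{k+1}, t_{j+1})$. According to Steps 24–31 of Algorithm 1, $((m_j, s_{a_k}), (a_{k+1}, t_{j+1}), (m_{j+1}, s_{a_{k+1}}))$ is a transition of ATS. Hence, $<\gamma[1], \gamma[2], \ldots, \gamma[q]>$ is a sequence in $\Gamma_{ATS}$.
In conclusion, $\gamma \in \Gamma_{ATS}$. Then, $\Gamma_{L,N} \subseteq \Gamma_{ATS}$.
Theorem 2 shows that all the alignments between the traces in the log and the model can be found in the alignment transition system.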

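Corollary 1. Let N = (P, TN; F, α, mi, mf) be a Petri net, L an event log, LRM the log-based relation matrix of L, and ATS = (SA, MA, TA) the alignment transition system of LRM and N. $\Gamma^{o}_{L,N,lc}$ is the set of all the optimal alignments between the traces in L and N based on the standard likelihood cost function lc(). Then $\Gamma^{o}_{L,N,lc} \subseteq \Gamma_{ATS}$, where $\Gamma_{ATS}$ is the set of move sequences of ATS from the initial state to a final state.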

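Proof. According to Definition 5, $\Gamma^{o}_{L,N,lc} \subseteq \Gamma_{L,N}$, where $\Gamma_{L,N}$ is the set of all alignments between the traces in L and N. According to Theorem 2, $\Gamma_{L,N} \subseteq \Gamma_{ATS}$.
Hence, $\Gamma^{o}_{L,N,lc} \subseteq \Gamma_{ATS}$.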
Corollary 1 shows that all the optimal alignments between the traces in the log and the model can be found in the alignment transition system. This corollary provides the theoretical foundation for the search of the optimal alignments in the alignment transition system.

5. Calculation of Optimal Alignments

We can get an alignment transition system between the event log and the process model by Algorithm 1. The optimal alignments between the trace and model based on the standard likelihood cost function can be calculated through finding the shortest path from the initial state to the final state in the system. This section presents two algorithms to realize the calculations of an optimal alignment and all optimal alignments, respectively.

Since most of the states in the alignment transition system have more than one parent and even some states have self-cycles, it is possible to reach the same state from different branches in the process of finding the optimal alignment. In the search process, we should record not only the current state and cost but also the prefix alignment. The unit that stores the related information is referred to as the search node.

5.1. Computing an Optimal Alignment

Because there is more than one trace in the event log, a path between the initial state and a final state in the alignment transition system may not be an alignment of the given trace, let alone its optimal alignment. In other words, the shortest path may not map to the optimal alignment for the specified trace. So, not only the length of the path but also the specified trace must be considered when computing the optimal alignment between the trace and the model.

To get the optimal alignment of the given trace, we must search for a path that is as short as possible while guaranteeing that the projection of the moves on the path onto the first column is a prefix of the trace. Algorithm 2 is presented to compute one optimal alignment between each trace in the event log and the process model based on the standard likelihood cost function.

Algorithm 2: Computing an optimal alignment for each trace.
Input: event log L1 = [σ1, σ2, σ3, …, σn], and alignment transition system TS2 = (S2, M2, T2) between L1 and N1 = (P1, T1; F1, α1, mi,1, mf,1).
Output: OA_one[n] (OA_one[i] stores an optimal alignment γi between the ith trace and the model, where 1 ≤ i ≤ n).
(1) FOR (all σi ∈ L1) DO
(2)  queue ⟵ ∅;
(3)  firststate ⟵ (mi,1, s0);
(4)  firstalign ⟵ <>;
(5)  firstcost ⟵ 0;
(6)  firstnode ⟵ (firststate, firstalign, firstcost);
 //add the initial node to the queue;
(7)  queue ⟵ queue ∪ {firstnode};
(8)  WHILE (queue ≠ ∅) DO
 //choose the node with the minimum cost and delete it from the queue;
(9)   FOR (all node ∈ queue) DO
(10)    choose the node with the minimum cost as curnode;
(11)    queue ⟵ queue − {curnode};
(12)   END FOR
(13)   FOR (all tj ∈ T2) DO
(14)    IF (π1(tj) = π1(curnode)) THEN
(15)     γi ⟵ π2(curnode) ⊕ <π2(tj)>;
(16)     IF (π1(γi), ignoring >>, is a prefix of σi) THEN
(17)      IF (π1(γi), ignoring >>, is equal to σi) AND (π3(tj) ∈ S2end) THEN
 //obtain the optimal alignment for the given trace and be ready to search for the next one;
(18)       OA_one[i] ⟵ γi;
(19)       JUMP TO Step 1;
(20)      ELSE
 //add the successor to the queue;
(21)       sucstate ⟵ π3(tj);
(22)       sucalign ⟵ γi;
(23)       succost ⟵ π3(curnode) + lc(π2(tj));
(24)       sucnode ⟵ (sucstate, sucalign, succost);
(25)       queue ⟵ queue ∪ {sucnode};
(26)      END IF
(27)     END IF
(28)    END IF
(29)   END FOR
(30)  END WHILE
(31) END FOR
(32) RETURN OA_one;

The complexity of Algorithm 2 depends on the size of the alignment transition system and on the given trace; computing an optimal alignment is an NP-hard problem. In Algorithm 2, each node stores its prefix alignment. An optimal alignment must meet the following two criteria: first, the last visited state of the alignment transition system must be a final state; second, the projection of the eventual prefix alignment onto the first column must be the given trace. Most of the nodes generated in the search process can be discarded, and only the current node needs to be stored, so the storage space of this algorithm is greatly reduced.
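
As an illustration of the search in Algorithm 2, the following is a minimal Python sketch, not the authors' implementation. It assumes a cost function in which synchronous moves cost 0 and all other moves cost 1 (unit_cost is a hypothetical stand-in for lc), it uses a binary heap for the queue, and it tests the two criteria when a node is removed from the queue rather than when a successor is generated, which is the usual way to keep a cheapest-first search optimal. The toy system TOY is invented for the example and is not ATS1.

```python
import heapq
from itertools import count

SKIP = ">>"  # "no move" symbol


def unit_cost(move):
    """Assumed cost function: 0 for a synchronous move, 1 otherwise."""
    return 0 if SKIP not in move else 1


def log_projection(alignment):
    """Project an alignment onto its first column, dropping ">>"."""
    return tuple(log for log, _ in alignment if log != SKIP)


def one_optimal_alignment(trace, transitions, initial, finals, lc=unit_cost):
    """Return (alignment, cost) for one trace, or (None, None) if none is found.

    transitions: iterable of (source_state, move, target_state) triples.
    The search may not terminate if no alignment exists and the system
    contains a cycle of model moves.
    """
    trace = tuple(trace)
    outgoing = {}
    for src, move, dst in transitions:
        outgoing.setdefault(src, []).append((move, dst))
    tie = count()  # tie-breaker so the heap never compares states
    queue = [(0, next(tie), initial, ())]
    while queue:
        cost, _, state, alignment = heapq.heappop(queue)
        seen = log_projection(alignment)
        # Criteria from the text: final state reached and whole trace replayed.
        if state in finals and seen == trace:
            return list(alignment), cost
        for move, dst in outgoing.get(state, []):
            new_seen = seen if move[0] == SKIP else seen + (move[0],)
            # Keep a successor only if its projection is a prefix of the trace.
            if new_seen == trace[:len(new_seen)]:
                heapq.heappush(
                    queue, (cost + lc(move), next(tie), dst, alignment + (move,))
                )
    return None, None


# Hypothetical toy system: one model transition t1 labelled a (m0 -> m1),
# log states s0, sa, sb.
TOY = [
    (("m0", "s0"), ("a", "t1"), ("m1", "sa")),
    (("m0", "s0"), ("a", SKIP), ("m0", "sa")),
    (("m0", "s0"), (SKIP, "t1"), ("m1", "s0")),
    (("m1", "s0"), ("a", SKIP), ("m1", "sa")),
    (("m0", "sa"), (SKIP, "t1"), ("m1", "sa")),
    (("m1", "sa"), ("b", SKIP), ("m1", "sb")),
    (("m0", "sa"), ("b", SKIP), ("m0", "sb")),
    (("m0", "sb"), (SKIP, "t1"), ("m1", "sb")),
]
FINALS = {("m1", "sa"), ("m1", "sb")}

print(one_optimal_alignment("ab", TOY, ("m0", "s0"), FINALS))
# → ([('a', 't1'), ('b', '>>')], 1)
```

Calling the function once per trace of the log corresponds to the outer loop in Step 1.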

Taking alignment transition system ATS1 in Figure 3 and event log L1 in Table 1 as an example, Algorithm 2 yields an optimal alignment for each trace in log L1. The key nodes generated when visiting the states in ATS1 are shown in Table 2. The prefix alignment of the last node for each trace is its optimal alignment. Each step of Algorithm 2 expands the prefix alignment with the least current cost and keeps it only if it is consistent with the given trace. Consequently, the eventual results are optimal alignments of the given traces.

5.2. Computing All Optimal Alignments

The alignment transition system includes all the alignments between all the traces and the model. When calculating the optimal alignments based on the system, we can get further alignments besides the optimal ones if we ignore the constraint of the least cost. In fact, when the first optimal alignment has been found, its cost is the minimum value between the given trace and the model based on the standard likelihood cost function. The cost of any other optimal alignment cannot be greater than this value.

In addition, all the nodes whose cost is less than or equal to the cost of the optimal alignment are checked, which determines whether we have obtained all optimal alignments. However, if the cost of a node is greater than the optimal cost, it can never arrive at a final node that stands for an optimal alignment.

The main idea for computing all optimal alignments is similar to that of Algorithm 2. The successor of the current node is entered into the queue when it meets two criteria: first, the current cost is not greater than the optimal cost; second, the projection of the current alignment onto the first column is a prefix of the given trace.

Algorithm 3 is presented to compute all optimal alignments between each trace in the event log and the process model based on the standard likelihood cost function.

Input: event log L1 = [σ1, σ2, σ3, …, σn] and alignment transition system TS2 = (S2, M2, T2) between L1 and N1 = (P1, T1; F1, α1, mi,1, mf,1).
Output: OA_all[n] (OA_all[i] stores all optimal alignments between the ith trace and the model, where 1 ≤ i ≤ n).
(1) FOR (all σi ∈ L1) DO
(2)  queue ⟵ ∅;
(3)  cost ⟵ +∞;
(4)  firststate ⟵ (mi,1, s0);
(5)  firstalign ⟵ <>;
(6)  firstcost ⟵ 0;
(7)  firstnode ⟵ (firststate, firstalign, firstcost);
(8)  queue ⟵ queue ∪ {firstnode}; //add the initial node to the queue;
(9)  WHILE (queue ≠ ∅) DO
(10)   FOR (all node ∈ queue) DO
(11)    choose the node with the minimum cost as curnode;
(12)   END FOR
(13)   queue ⟵ queue − {curnode}; //delete the current node from the queue;
(14)   FOR (all tj ∈ T2) DO
(15)    IF (π1(tj) = π1(curnode)) THEN
(16)     γi ⟵ π2(curnode) ⊕ <π2(tj)>;
(17)     ci ⟵ π3(curnode) + lc(π2(tj));
(18)     IF (ci > cost) THEN
(19)      CONTINUE;
(20)     ELSE
(21)      IF (the projection of γi onto the first column is a prefix of σi) THEN
(22)       IF (the projection of γi onto the first column equals σi AND π3(tj) ∈ S2end) THEN
(23)        cost ⟵ ci; //update the minimum cost;
(24)        OA_all[i] ⟵ OA_all[i] ∪ {γi}; //obtain an optimal alignment for the ith trace;
(25)       ELSE
(26)        sucstate ⟵ π3(tj);
(27)        sucalign ⟵ γi;
(28)        succost ⟵ ci;
(29)        sucnode ⟵ (sucstate, sucalign, succost);
(30)        queue ⟵ queue ∪ {sucnode}; //add the successor to the queue;
(31)       END IF
(32)      END IF
(33)     END IF
(34)    END IF
(35)   END FOR
(36)  END WHILE //find all the optimal alignments for trace σi;
(37) END FOR
(38) RETURN OA_all;
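
The idea of Algorithm 3 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: unit_cost is an assumed stand-in for lc (0 for synchronous moves, 1 otherwise), completed alignments are recognised when a node leaves the queue, and the toy system TOY2 is hypothetical. As in Algorithm 3, a node that already represents a complete alignment is not expanded further.

```python
import heapq
from itertools import count

SKIP = ">>"  # "no move" symbol


def unit_cost(move):
    """Assumed cost function: 0 for a synchronous move, 1 otherwise."""
    return 0 if SKIP not in move else 1


def log_projection(alignment):
    """Project an alignment onto its first column, dropping ">>"."""
    return tuple(log for log, _ in alignment if log != SKIP)


def all_optimal_alignments(trace, transitions, initial, finals, lc=unit_cost):
    """Return (list of optimal alignments, optimal cost) for one trace.

    May not terminate if no alignment exists and the system contains a
    cycle of model moves.
    """
    trace = tuple(trace)
    outgoing = {}
    for src, move, dst in transitions:
        outgoing.setdefault(src, []).append((move, dst))
    tie = count()
    queue = [(0, next(tie), initial, ())]
    best, found = None, []  # best plays the role of `cost` in Algorithm 3
    while queue:
        cost, _, state, alignment = heapq.heappop(queue)
        if best is not None and cost > best:
            break  # every remaining node is more expensive than the optimum
        seen = log_projection(alignment)
        if state in finals and seen == trace:
            best = cost
            found.append(list(alignment))
            continue
        for move, dst in outgoing.get(state, []):
            new_cost = cost + lc(move)
            if best is not None and new_cost > best:
                continue  # first criterion: cost must not exceed the optimum
            new_seen = seen if move[0] == SKIP else seen + (move[0],)
            if new_seen == trace[:len(new_seen)]:  # second criterion: prefix
                heapq.heappush(
                    queue, (new_cost, next(tie), dst, alignment + (move,))
                )
    return found, best


# Hypothetical toy system: model transition t1 labelled a (m0 -> m1),
# log states s0 and sa, where sa has a self-cycle on a.
TOY2 = [
    (("m0", "s0"), ("a", "t1"), ("m1", "sa")),
    (("m0", "s0"), ("a", SKIP), ("m0", "sa")),
    (("m0", "s0"), (SKIP, "t1"), ("m1", "s0")),
    (("m1", "s0"), ("a", SKIP), ("m1", "sa")),
    (("m0", "sa"), ("a", "t1"), ("m1", "sa")),
    (("m0", "sa"), ("a", SKIP), ("m0", "sa")),
    (("m0", "sa"), (SKIP, "t1"), ("m1", "sa")),
    (("m1", "sa"), ("a", SKIP), ("m1", "sa")),
]

found, best = all_optimal_alignments("aa", TOY2, ("m0", "s0"), {("m1", "sa")})
print(best, found)
```

For the toy trace <a, a>, the model can synchronise with either occurrence of a, so two alignments of cost 1 are returned.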

Algorithm 3 can be further optimized. For instance, if a successor is equal to an existing node, we can skip adding the successor and share the existing one. After this optimization, the number of nodes in the queue is reduced, so the efficiency of the algorithm is improved.
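
One possible reading of this optimization is a queue that refuses to store a node it has already seen. The sketch below is an assumption about how it could be done, not the paper's implementation; the class name DedupQueue is hypothetical, and two nodes are treated as equal when their state, prefix alignment, and cost all coincide.

```python
import heapq
from itertools import count


class DedupQueue:
    """Priority queue that stores each (state, alignment, cost) node once."""

    def __init__(self):
        self._heap = []
        self._known = set()   # every node ever enqueued
        self._tie = count()   # tie-breaker for nodes of equal cost

    def push(self, cost, state, alignment):
        """Add a node; return False if an equal node was already enqueued."""
        key = (state, tuple(alignment), cost)
        if key in self._known:
            return False
        self._known.add(key)
        heapq.heappush(self._heap, (cost, next(self._tie), state, tuple(alignment)))
        return True

    def pop(self):
        """Remove and return the cheapest node as (cost, state, alignment)."""
        cost, _, state, alignment = heapq.heappop(self._heap)
        return cost, state, alignment

    def __len__(self):
        return len(self._heap)


q = DedupQueue()
q.push(1, ("m0", "sa"), [("a", ">>")])
q.push(1, ("m0", "sa"), [("a", ">>")])   # duplicate, ignored
q.push(0, ("m1", "sa"), [("a", "t1")])
```

Nodes that reach the same state with different prefix alignments are kept apart on purpose, because each of them may lead to a distinct optimal alignment.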

The execution of Algorithm 3 involves traversing the alignment transition system, and some states in the system are visited several times. Hence, its time complexity is very high, and computing all optimal alignments remains an NP-hard problem.

Taking alignment transition system ATS1 in Figure 3 and event log L1 in Table 1 as an example, Algorithm 3 yields all optimal alignments for each trace in log L1. Traces σ1, σ2, and σ4 each have only one optimal alignment, whereas trace σ3 has two optimal alignments, as shown in Figure 6.

6. Scalability of Our Approach

In most cases, the existing approaches can only compute the alignment between one trace and the model at a time. When computing the optimal alignments between a new trace and the model, the alignment approach must be executed again from scratch, producing a different search space. In contrast, the approach proposed in this paper is scalable: we only need to adjust the existing alignment transition system, and then it can be used for the new trace.

Next, we introduce the possible changes for the alignment transition system when aligning a new trace with the model.

6.1. Remaining Unchanged

When the relations of the activities in the new trace conform to those of the original log-based relation matrix and the last activity is identical to that of some existing trace, the alignment transition system remains unchanged.

Taking process model N1 in Figure 1, event log L1 in Table 1, and alignment transition system ATS1 in Figure 3 as an example, the log-based relation matrix of L1 is LRM1 in (7). We suppose that the new trace is σ5 = <a, a, a, b>, which is different from any trace in L1. Its trace-based transition system is shown in Figure 7.

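Transition system TS5 depicted in Figure 7 can be formalized as follows: S5 = {s0, sa, sb}, S5start = {s0}, S5end = {sb}, ∂set(σ5) = {a, b}, and T5 = {(s0, a, sa), (sa, a, sa), (sa, b, sb)}. Compared with LRM1, the states, the start state, and the end state of TS5 are all contained in the corresponding sets of LRM1. Moreover, T5 can be depicted by LRM1 as follows: LRM1[s0][sa] = a, LRM1[sa][sa] = a, and LRM1[sa][sb] = b. Hence, we can align trace σ5 with model N1 by system ATS1.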
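
The construction of a trace-based transition system can be sketched as below. The sketch assumes the pattern visible in the formalization of TS6 in Section 6.2: one state s0 plus one state per distinct activity, and one transition per pair of consecutive events. The function name and the dictionary keys are hypothetical.

```python
def trace_transition_system(trace):
    """Build the trace-based transition system of a single trace.

    Assumed pattern: the start state is s0, activity x leads to state sx,
    and each event adds the transition (previous state, x, sx).
    """
    states = {"s0"}
    transitions = set()
    prev = "s0"
    for activity in trace:
        cur = f"s{activity}"
        states.add(cur)
        transitions.add((prev, activity, cur))
        prev = cur
    return {
        "states": states,
        "start": {"s0"},
        "end": {prev},               # state of the last activity
        "activities": set(trace),    # the set of activities of the trace
        "transitions": transitions,
    }


ts5 = trace_transition_system(["a", "a", "a", "b"])
print(sorted(ts5["transitions"]))
# → [('s0', 'a', 'sa'), ('sa', 'a', 'sa'), ('sa', 'b', 'sb')]
```

Repeated activities collapse onto the same state, which is why the trace <a, a, a, b> needs only three states.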

We can get an optimal alignment between trace σ5 and model N1 by Algorithm 2 from transition system ATS1 and all the optimal alignments by Algorithm 3. The optimal alignments correspond to the paths from the initial node to the final nodes in ATS1, as shown in Table 3.

6.2. Setting Another New Final State

When the relations of the activities in the new trace conform to those of the original log-based relation matrix, but the last activity is different from that of any existing trace, an additional final state must be set in the alignment transition system. Supposing that the last activity is x, we must set the state corresponding to x to be a new final state in the alignment transition system.

We suppose the new trace is σ6 = <a, a>, which is different from any trace in L1. Its trace-based transition system is shown in Figure 8.

Transition system TS6 depicted in Figure 8 can be formalized as follows: S6 = {s0, sa}, S6start = {s0}, S6end = {sa}, ∂set(σ6) = {a}, and T6 = {(s0, a, sa), (sa, a, sa)}. Compared with LRM1, the states and the start state of TS6 are contained in the corresponding sets of LRM1, and T6 can be depicted by LRM1 as follows: LRM1[s0][sa] = a and LRM1[sa][sa] = a. But S6end is not contained in the set of final states of LRM1. Hence, we must set sa to be an additional final state in LRM1.

Accordingly, we set the corresponding state to be an additional final state in the alignment transition system, and we can then align trace σ6 with model N1 by the modified system. We can get an optimal alignment between trace σ6 and model N1 by Algorithm 2 from the modified transition system and all the optimal alignments by Algorithm 3. The optimal alignments correspond to the paths from the initial node to the final nodes in the modified system, as shown in Table 4.

6.3. Adding New Transitions

When a relation between activities in the new trace cannot be found in the original log-based relation matrix, new transitions must be added to the alignment transition system. Suppose that the new relation is that activity y directly follows activity x, i.e., LRM[sx][sy] = y; then the following new transitions must be added to the alignment transition system: {((mk, sx), (y, >>), (mk, sy)) | mk ∈ R(mi) ∧ (mk, sx) ∈ ATS} ∪ {((mk, sx), (y, tj), (mk+1, sy)) | mk ∈ R(mi) ∧ (mk, sx) ∈ ATS ∧ mk[tj⟩mk+1 ∧ α(tj) = y}.
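
The two sets of new transitions can be generated mechanically. The following sketch assumes that the reachable markings and the firing relation of the model are already available as plain Python data; the function name, its parameters, and the toy model are hypothetical and only illustrate the set comprehension above.

```python
SKIP = ">>"  # "no move" symbol


def new_transitions(sx, y, ats_states, firing, label):
    """Transitions to add when activity y newly follows log state sx.

    ats_states: list of (marking, log_state) pairs already in the system
                (markings are assumed to be reachable from the initial one).
    firing:     list of (marking, transition, next_marking) triples.
    label:      dict mapping each model transition to its activity.
    """
    sy = f"s{y}"
    added = set()
    for marking, log_state in ats_states:
        if log_state != sx:
            continue
        # Log move: the event y is observed, the model stays in its marking.
        added.add(((marking, sx), (y, SKIP), (marking, sy)))
        # Synchronous moves: a transition labelled y is enabled in the marking.
        for src, t, dst in firing:
            if src == marking and label.get(t) == y:
                added.add(((marking, sx), (y, t), (dst, sy)))
    return added


# Hypothetical toy model: transition t1 labelled a fires from m0 to m1.
toy_states = [("m0", "sb"), ("m1", "sb"), ("m0", "s0")]
toy_firing = [("m0", "t1", "m1")]
toy_label = {"t1": "a"}

for tr in sorted(new_transitions("sb", "a", toy_states, toy_firing, toy_label)):
    print(tr)
```

In the toy model, a new relation "a follows b" yields one log move per marking paired with sb and one synchronous move where t1 is enabled.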

We suppose the new trace is σ7 = <b, a, b>, which is different from any trace in L1. Its trace-based transition system is shown in Figure 9.

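Transition system TS7 depicted in Figure 9 can be formalized as follows: S7 = {s0, sa, sb}, S7start = {s0}, S7end = {sb}, ∂set(σ7) = {a, b}, and T7 = {(s0, b, sb), (sb, a, sa), (sa, b, sb)}. Compared with LRM1, the states, the start state, and the end state of TS7 are all contained in the corresponding sets of LRM1. T7 can only partly be depicted by LRM1: two of its transitions are expressed by entries of LRM1, but the remaining transition cannot be expressed in LRM1. Firstly, we set the corresponding entry in LRM1.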

Then, we add the corresponding new transitions to ATS1. Supposing that the modified system is named ATS12, we can align trace σ7 with model N1 by system ATS12. The transitions added to ATS1 are shown in Figure 10.

We can get an optimal alignment between trace σ7 and model N1 by Algorithm 2 from transition system ATS12 and all the optimal alignments by Algorithm 3. The optimal alignments correspond to the paths from the initial node to the final nodes in ATS12, as shown in Table 5.

6.4. Adding New States and Transitions

When an activity in the new trace cannot be found in the event log, new states and transitions must be added to the alignment transition system. Supposing that the activity is x, the new states are {(mk, sx)|mk ∈ R(mi)}. The new transitions are added in the same way as in Section 6.3. We suppose the new trace is σ8 = <a, c, b>, which is different from any trace in L1. Its trace-based transition system is shown in Figure 11.

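Transition system TS8 depicted in Figure 11 can be formalized as follows: S8 = {s0, sa, sb, sc}, S8start = {s0}, S8end = {sb}, ∂set(σ8) = {a, b, c}, and T8 = {(s0, a, sa), (sa, c, sc), (sc, b, sb)}. Compared with LRM1, the start state and the end state of TS8 are contained in the corresponding sets of LRM1, but the state set S8 is not, because sc does not appear in LRM1. T8 can partly be depicted by LRM1 as follows: LRM1[s0][sa] = a. But transitions (sa, c, sc) and (sc, b, sb) in T8 cannot be expressed in LRM1. Firstly, we set LRM1[sa][sc] = c and LRM1[sc][sb] = b.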

Then, we add new states to ATS1, including {(mk, sc)|mk ∈ {[p1], [p2], [p3], [p4]}}. Next, we add the new transitions in the same way as in Section 6.3. We suppose that the modified system is named ATS13; the states and transitions added to ATS1 are shown in Figure 12.

We can align trace σ8 with model N1 by system ATS13. We can get an optimal alignment between trace σ8 and model N1 by Algorithm 2 from transition system ATS13 and all the optimal alignments by Algorithm 3. The optimal alignments correspond to the paths from the initial node to the final nodes in ATS13, as shown in Table 6.

No matter which activities the new trace contains and what the relations between them are, the trace falls into one of the four cases mentioned above. Then, we can adjust the alignment transition system according to the relation between the new trace and the log-based relation matrix. Eventually, the new system includes the alignments between the new trace and the model.
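
The choice among the four cases can be sketched as a simple check of the new trace against the log-based relation matrix. In the sketch below, the matrix is reduced to a set of (row state, column state) pairs; the function name, this representation, and the toy data are assumptions made for illustration and are not LRM1 from the paper. When several adjustments would be needed, the function reports the most invasive one.

```python
def classify_new_trace(trace, relations, final_states, activities):
    """Return which adjustment of Section 6 a new trace requires.

    relations:    set of (previous_state, next_state) pairs present in the
                  log-based relation matrix.
    final_states: set of log states that are already final.
    activities:   set of activities occurring in the original event log.
    """
    states = ["s0"] + [f"s{a}" for a in trace]
    if any(a not in activities for a in trace):
        return "6.4: add new states and transitions"
    if any(pair not in relations for pair in zip(states, states[1:])):
        return "6.3: add new transitions"
    if states[-1] not in final_states:
        return "6.2: set another new final state"
    return "6.1: remain unchanged"


# Hypothetical toy matrix over activities a and b.
toy_relations = {("s0", "sa"), ("sa", "sa"), ("sa", "sb")}
toy_finals = {"sb"}
toy_activities = {"a", "b"}

for trace in (["a", "a", "a", "b"], ["a", "a"], ["b", "a", "b"], ["a", "c", "b"]):
    print(trace, "->", classify_new_trace(trace, toy_relations, toy_finals, toy_activities))
```

With the toy data, the four example traces fall into the four cases in the order in which the cases are presented above.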

When a new trace not in the original event log is aligned with the process model, our approach does not need to generate a new search space completely but only needs to expand the existing search space appropriately. Hence, our approach has better scalability and applicability.

7. Case Studies

The greatest advantage of our approach is that it deals with all traces in the event log at once, which is more efficient than approaches that can only deal with one trace at a time. We integrate the following relations between events via the log-based relation matrix, so we can obtain all the alignments between a batch of traces and the model from a single alignment transition system. To demonstrate the advantage of our approach, we compare it with Alignment-One approach. Alignment-One approach treats the event log as containing a single trace and computes the alignments between that trace and the model via Algorithms 1-3. Taking a relatively complex process model as an example, this section provides some analysis results of our proposed approach and compares them with the results of Alignment-One approach.

Both Alignment-One approach and our approach can be divided into two steps: the generation of the search space and the search for the optimal alignments. The scale of the search space determines the complexity of the search to a certain extent, so the complexity of the search space is critical for the performance of the alignment approaches. This paper studies the number and the size of the search spaces generated during the execution of the alignment approaches, which indirectly illustrate their complexity. Hence, the case studies focus on the number and size of the search spaces.

In order to enhance the safety of the transportation of the coal mine, a distributed control system for the inclined shaft of the coal mine is introduced [32]. Petri nets are adopted to build the model for this system, as shown in Figure 13.

The model has a place named p1 that represents the start of the workflow and a place named p14 that represents the end. Every place and transition in the model is on a path from p1 to p14. Moreover, the model has properties such as safeness, option to complete, proper completion, and no dead transitions. Hence, the model is considered to be a sound Petri net.

The actual meanings of places and transitions in Figure 13 are shown in Tables 7 and 8, respectively. In addition, each transition is mapped to an activity, as shown in Table 8. The name of each activity is a character, which is suitable for the representation of the control flow, i.e., trace.

Each event log is composed of activity sequences generated randomly from the process model, into which we then inject artificial noise. The event logs are shown in Table 9.

The process model generates completely fitting traces with different lengths, each of which contains about 6 to 15 activities. Noise is created by randomly deleting activities from the traces or adding activities to them. However, when adding an activity, no activity beyond the given activity set appears. In this example, each activity in the traces is an element of the set A = {a, b, c, d, e, f, g, h, i, j, k}. Then, based on the standard likelihood cost function, the optimal alignments between all traces and the model are calculated. The number and the size of the search spaces are counted to compare Alignment-One approach with our approach.

This instance contains four event logs, and each one contains five different traces. The average lengths of five traces in the four event logs are 6, 9, 12, and 15, respectively. The comparison results between Alignment-One approach and our approach are shown in Table 10.

According to Table 10, when an event log has five different traces, Alignment-One approach needs to establish five alignment transition systems to get the optimal alignments between the traces in the log and the process model. In contrast, the approach proposed in this paper needs to establish only one alignment transition system.

In Alignment-One approach, the number of states of the alignment transition system depends not only on the average length of the traces but also on the number of reachable states of the model. There is only one model in this instance, so the number of reachable states is unchanged. When the average length of the traces in the log increases, the number of states of the alignment transition systems obtained by Alignment-One approach increases linearly, in proportion to the average length of the traces.

In our approach, the number of states of an alignment transition system is determined by the number of different activities in the event log and the number of reachable states of the process model. When the average length of the traces in the log is short, the number of states of the alignment transition system in our approach is greater than that of Alignment-One approach. However, when the average length of the traces is greater than |A| (A is the set of activities, and |A| is the cardinality of A), the number of states of the alignment transition system in our approach is a constant, and its value is much less than that of Alignment-One approach. As shown in Table 10, when the average lengths of the traces are 12 and 15, the number of states of the alignment transition system in our approach is fixed at 144. Even if the average length of the traces continues to increase, the number of states will remain 144 as long as each activity in the traces is a member of set A.

According to the abovementioned analysis, when there are m different traces in the event log, our approach only needs to compute one alignment transition system, while Alignment-One approach needs to compute m of them. Generally, our approach yields far fewer states than Alignment-One approach.

The comparison results show that our approach generates much smaller search spaces than Alignment-One approach. Hence, our approach outperforms Alignment-One approach.

8. Conclusions

As more and more event logs are recorded in enterprise organizations, conformance checking between event logs and process models plays an increasingly important role in process mining. At present, as a significant technique of conformance checking, alignment is widely used in process discovery, precision checking, and process enhancement. Alignment can accurately locate the deviations and measure the fitness between the observed and modeled behaviors. Most of the existing alignment approaches can only compute an optimal alignment between the trace and the model, or even just a suboptimal alignment.

In order to solve the problem that the state space only includes the alignments between one trace and the model, this paper proposes a business alignment approach based on the transition system between relation matrices and Petri nets. This approach generates an alignment transition system that includes the alignments between all traces in the log and the model. No matter how many traces are involved in the log, this approach only needs to generate one alignment transition system. Based on the search for prefix alignments in the system, two algorithms are proposed to find an optimal alignment and all optimal alignments, respectively, between each trace in the log and the model under the given cost function.

The proposed approach addresses problems such as low efficiency and high memory occupancy, and it improves the efficiency of calculating optimal alignments. The alignments between all traces in the log and the model are embodied in the alignment transition system, so our approach simplifies the search space that includes all the optimal alignments. In future work, we will further study the relations between activities in the log so that more reasonable relation matrices can be established. On this basis, more efficient alignment approaches can be proposed.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Grant 61973180, Taishan Scholar Construction Project of Shandong Province, Key Research and Development Program of Shandong Province under Grant 2018GGX101011, and Natural Science Foundation of Shandong Province under Grants ZR2018MF001 and ZR2019MF033.