Abstract

The continuous subgraph matching problem on dynamic graphs has become a popular research topic in graph analysis, with a wide range of applications including information retrieval and community detection. Specifically, given a query graph $Q$, an initial graph $G_0$, and a graph update stream $\Delta G$, the problem of continuous subgraph matching is to sequentially report all isomorphic subgraphs of $Q$ that cover each update $\Delta g_i$ on $G_i$ ($G_i = G_{i-1} \oplus \Delta g_i$). Since a knowledge graph is a directed labeled multigraph with multiple edges between a pair of vertices, it brings new challenges to this problem on dynamic knowledge graphs. One challenge is that the multigraph characteristic of knowledge graphs, which combines complex topological and attributed structures, increases the complexity of candidate calculation. Another challenge is that the isomorphic subgraphs covering a given region are computed over a huge search space of seed candidates, so much time is wasted on searching unpromising candidates. To address these challenges, a method of subgraph-indexed sequential subdivision is proposed to accelerate continuous subgraph matching on dynamic knowledge graphs. First, a flow graph index is proposed to arrange the search space of seed candidates in the topological knowledge graph, and an adjacent index is designed to accelerate the identification of candidate activation states in the attributed knowledge graph. Second, the sequential subdivision of the flow graph index and a state transition model are employed to incrementally conduct subgraph matching and to maintain the regional influence of changed candidates, respectively. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

1. Introduction

The problem of subgraph matching is a fundamental issue in graph search and is NP-complete [1]. Specifically, given a query graph $Q$ and a large data graph $G$, the problem of subgraph matching is to extract all subgraphs of $G$ that are isomorphic to $Q$. In real-world settings such as social networks, data usually arrive continuously and form a graph stream. Recently, continuous subgraph matching on dynamic graphs has become a popular research topic in graph analysis, with a wide range of applications including query answering [2], information retrieval [3, 4], and community detection [5, 6]. Specifically, given a query graph $Q$, an initial graph $G_0$, and a graph update stream $\Delta G$, the problem of continuous subgraph matching is to sequentially report all isomorphic subgraphs of $Q$ that cover each update $\Delta g_i$ on $G_i$ ($G_i = G_{i-1} \oplus \Delta g_i$). In this paper, we study continuous subgraph matching on a special graph structure, the knowledge graph (KG-CSM).

Despite the complex multigraph characteristic of knowledge graphs and the NP-completeness of subgraph matching [1], recent research has made significant advances in developing computational paradigms for KG-CSM.

One line of work stores and indexes RDF triple data using relational approaches. Weiss et al. [7] and Pérez et al. [8] employed index-based solutions that store triples directly in B+-tree indexes over multiple redundant permutations. Abadi et al. [9] vertically partitioned the RDF triples into a set of tables keyed by the labels of patterns and used an index structure on top of them to locate the required tables. Broekstra et al. [10] built on the idea of graph databases and abstracted RDF triples as concepts with multiple properties. The same pattern matching strategy was used to provide a pattern selectivity approach, which determines the search space over data tables. This strategy used a tree-pattern structure to filter RDF data into tables storing partially processed data units, which were then incrementally joined by searching the tree-pattern structure. However, relational approaches require extensive indexing and data preprocessing, because they are coupled with sophisticated statistics, deep joins, and query-optimization techniques.

Another line of work avoids recalculating matches by reusing intermediate results. Incremental solutions have been employed in a variety of applications [11–13]. They aim to generate results incrementally without incurring the expensive cost of recalculating over the data. However, most incremental methods are approximate algorithms based on relaxed graph simulations and only work for small graphs. Moreover, incremental solutions are hard to apply to KG-CSM because of the inherent complexity and large scale of knowledge multigraph structures.

1.1. Challenge 1: Multigraph Characteristic of Knowledge Graph Intensifies the Complexity of Candidate Calculation

A knowledge graph is a directed labeled multigraph with multiple edges between a pair of vertices: each vertex represents an entity with attributes, and each edge denotes an inter-entity relationship. As the knowledge multigraph model in Figure 1 shows, it is composed of attributed and topological structures. The attributed structure describes the attributes and types of entities, where an attribute is taken as an edge label coupled with a value and a type is taken as an entity label. The topological structure describes the relationships between pairs of entities, and some relationships coexist, e.g., a partnership and a spousal relationship between the same two persons. The multigraph characteristic makes the adjacency structure of a knowledge graph denser than that of a general graph, which brings a new challenge to the KG-CSM problem. Furthermore, the KG-CSM problem still inherits the traditional challenge of subgraph matching on general graphs.

1.2. Challenge 2: Subgraph Isomorphic Mappings Covering a Given Region Are Conducted on a Huge Search Space of Seed Candidates

The traditional challenge on general graphs is that the isomorphic subgraphs covering a given region are computed over a huge search space of seed candidates, so much time is wasted on searching unpromising candidates. Consider the query graph and data graph in Figure 2, and suppose an edge is inserted into the data graph. The isomorphic subgraph covering the inserted edge is represented as a subgraph isomorphic mapping over five data vertices. The basic strategy searches the global space of the data graph without pruning the unpromising vertices, five of which appear in this example.

1.3. Contributions

Three empirical studies motivate us to develop an efficient subgraph matching method on dynamic knowledge graphs. The first [5, 14] demonstrated that a tree-based index can reduce the noncandidates of a dynamic graph through influence analysis of anchored and followed relationships. The second [15] demonstrated that sequential techniques can effectively limit the search space of a graph update stream. The third [16], our prior research on subgraph indexing for static knowledge graphs, demonstrated that a subgraph index can effectively accelerate subgraph matching on static knowledge graphs. In this paper, we propose a method of subgraph index-based sequential subdivision to accelerate continuous subgraph matching on dynamic knowledge graphs. Our contributions are as follows:
(1) We develop a flow graph index to prune the noncandidates of query vertices on the topological knowledge graph. The flow graph index is defined as a flow graph (FG), a directed multigraph constructed from the initial data graph and guided by a matching order of the query graph. Each vertex of FG denotes a candidate of a query vertex, and that query vertex is taken as the label of the candidate. Each edge of FG corresponds to a relationship between nodes in the matching tree of the query graph. The flow graph index can effectively reduce the scale of the original data graph.
(2) We design an adjacent index to accelerate the identification of candidate activation states on the attributed knowledge graph. The adjacent index brings three benefits. First, it improves the time efficiency of checking the inclusion relationships of node pairs. Second, it can quickly verify the changed state of a seed candidate as the graph update stream is incrementally inserted. Third, it can quickly search the adjacent candidate region.
(3) We propose a sequential subdivision technique for the flow graph to limit the search expansion caused by the graph update stream. The sequential numbers of root candidates are assigned to the vertices of the subdivided flow graphs, which limits the search space originating from changed candidates of FG.
(4) We design a state transition model, consisting of three states and six transition rules, to describe the transition states of changed candidates. Based on this model, we analyze the influence of changed candidates on the adjacent region and design our incremental maintenance strategy.
(5) We design an incremental subgraph matching algorithm based on the sequentially subdivided flow graph. The consistency of subgraph matching is guaranteed by two verifications of selected candidates: relational verification, which verifies local isomorphism, and sequential verification, which verifies the equivalence of sequential numbers between a local subgraph mapping and the selected candidates. With the aid of these two verifications, the isomorphic subgraphs are computed incrementally and efficiently.

Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

The rest of this paper is organized as follows. Section 2 introduces the preliminaries, including problem definitions and related works. Section 3 presents the flow graph index of knowledge graphs, including its definition and construction. Section 4 presents incremental subgraph matching on the flow graph, including the sequential subdivision technique, incremental maintenance, and incremental subgraph matching over the graph update stream. Experimental results are reported in Section 5, and Section 6 concludes the paper.

2. Preliminaries

In this section, the definitions of knowledge graph and subgraph matching are first given. Then, the related research studies are introduced.

2.1. Problem Definition

A knowledge graph (KG) is a directed labeled multigraph with multiple edges between a pair of vertices. The labels of a KG are extracted from RDF information. The Resource Description Framework (RDF) [17] is a standard semantic model designed by the W3C, which represents data as a set of triples. Each triple consists of three components: a subject, a predicate, and an object. Formally, a triple is formed as $\langle s, p, o \rangle$, where $s$ and $p$ are IRIs (Internationalized Resource Identifiers) and $o$ is either an IRI or a literal. By extending the RDF triple model with timestamps, the model can represent an RDF stream, denoted as $(\langle s, p, o \rangle : t)$ [18], where $\langle s, p, o \rangle$ is an RDF triple and $t$ is a timestamp.

According to the resources and inter-resource relationships in RDF data, the labels of a KG are classified as instance-labels, relation-labels, attribute-labels, and type-labels. Consider an RDF triple $\langle s, p, o \rangle$: $o$ is called a type-label if and only if both $s$ and $o$ are IRIs and $p$ is a typed predicate, e.g., rdf:type and rdfs:subClassOf; $s$ and $o$ are called instance-labels and $p$ is called a relation-label if and only if both $s$ and $o$ are IRIs and $p$ is not a typed predicate; and $p$ is called an attribute-label if and only if $o$ is a literal.
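The classification rules above can be sketched as a small Python function. This is an illustration under our own assumptions, not the authors' code: IRIs are plain strings such as prefixed names, literals are wrapped in a hypothetical `Literal` class, and the typed predicates are limited to the two examples named in the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking an RDF literal value."""
    value: str


Term = Union[str, Literal]  # an IRI (plain string) or a literal

# Typed predicates taken from the examples in the text (assumed, not exhaustive).
TYPED_PREDICATES = {"rdf:type", "rdfs:subClassOf"}


def classify(s: str, p: str, o: Term) -> dict:
    """Report which KG label role the components of <s, p, o> play."""
    if isinstance(o, Literal):
        # o is a literal: p is an attribute-label of vertex s, coupled with o's value
        return {"vertex": s, "attribute": (p, o.value)}
    if p in TYPED_PREDICATES:
        # both ends are IRIs and p is typed: o is a type-label of vertex s
        return {"vertex": s, "type": o}
    # both ends are IRIs and p is untyped: s, o are instances and p a relation-label
    return {"instances": (s, o), "relation": p}
```

For example, `classify("ex:alice", "ex:age", Literal("30"))` yields an attribute-label `ex:age` with value `"30"` attached to the vertex `ex:alice`.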

Definition 1. (knowledge graph). A knowledge graph is a directed labeled multigraph, formed as $G = (V, E, L)$. Here, $V$ is a set of vertices, $E$ is a set of directed edges, and $L = (L_V, L_E)$ is a labeling function, where $L_V$ assigns type-labels and attribute-labels to vertices and $L_E$ assigns relation-labels to edges.
In a KG, each vertex can be assigned multiple labels, and so can each edge. For example, in Figure 3(b), one vertex is assigned two type- or attribute-labels, one edge is assigned two relation-labels, and another edge is assigned a single relation-label. A KG therefore has denser inter-vertex relationships than a general graph.
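A minimal Python sketch of Definition 1 follows. It assumes (our choice) that the parallel edges between one ordered vertex pair are stored as a single entry holding the set of relation-labels, so the multigraph is recovered by counting labels; the class and method names are hypothetical.

```python
class KnowledgeGraph:
    """Directed labeled multigraph G = (V, E, L) with L = (L_V, L_E) (sketch)."""

    def __init__(self):
        self.vertex_labels = {}  # L_V: vertex -> set of type/attribute labels
        self.edge_labels = {}    # L_E: (src, dst) -> set of relation labels

    def add_vertex(self, v, *labels):
        self.vertex_labels.setdefault(v, set()).update(labels)

    def add_edge(self, src, dst, relation):
        # make sure both endpoints exist, then add one more parallel edge
        self.add_vertex(src)
        self.add_vertex(dst)
        self.edge_labels.setdefault((src, dst), set()).add(relation)

    def parallel_edges(self, src, dst):
        """Number of distinct relations from src to dst."""
        return len(self.edge_labels.get((src, dst), ()))

    def num_relations(self):
        """Total number of labeled edges, counting every parallel edge."""
        return sum(len(labels) for labels in self.edge_labels.values())
```

Mirroring the example above, a vertex with two labels and a vertex pair connected by two coexisting relations (e.g., `partnerOf` and `spouseOf`) give `parallel_edges(...) == 2` for that pair.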

2.1.1. Subgraph Matching

The problem of subgraph matching is to search for all subgraphs of a data graph $G$ that are isomorphic to a query graph $Q$. Subgraph matching is formally defined as a problem of subgraph isomorphism, described in Definition 2.

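Definition 2. (subgraph isomorphism). Given a data graph $G = (V, E, L)$ and a query graph $Q = (V_Q, E_Q, L_Q)$, $Q$ is subgraph isomorphic to $G$ if and only if there exists an injective mapping $M: V_Q \to V$ such that $\forall u \in V_Q: L_Q(u) \subseteq L(M(u))$ and $\forall (u, u') \in E_Q: (M(u), M(u')) \in E \wedge L_Q(u, u') \subseteq L(M(u), M(u'))$.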
A query graph $Q$ is subgraph isomorphic to a data graph $G$ if there exists a subgraph isomorphic mapping (subgraph mapping for short) of $Q$ on $G$. For example, considering the labeled query graph and data graph in Figures 3(a) and 3(b), respectively, whose vertices carry sets of vertex-labels, the query graph is subgraph isomorphic to the data graph since two subgraph isomorphic mappings exist.
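To make Definition 2 concrete, the sketch below enumerates all subgraph mappings by plain backtracking in the style of Ullmann [28]. It is an illustration under our assumptions, not the paper's algorithm: label conditions use set inclusion, edges are directed, and the matching order is simply the insertion order of the query vertices.

```python
def subgraph_mappings(q_v, q_e, g_v, g_e):
    """Enumerate all injective mappings M: V_Q -> V allowed by Definition 2.

    q_v / g_v: vertex -> set of vertex labels
    q_e / g_e: (src, dst) -> set of edge labels (directed edges)
    """
    order = list(q_v)  # naive matching order: insertion order of query vertices
    results = []

    def consistent(u, v, mapping):
        if not q_v[u] <= g_v[v]:
            return False
        # every query edge between u and an already-mapped vertex must be preserved
        for (a, b), labels in q_e.items():
            if u not in (a, b):
                continue
            src = v if a == u else mapping.get(a)
            dst = v if b == u else mapping.get(b)
            if src is None or dst is None:
                continue  # the other endpoint is checked when it gets mapped
            if (src, dst) not in g_e or not labels <= g_e[(src, dst)]:
                return False
        return True

    def backtrack(i, mapping, used):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in g_v:
            if v not in used and consistent(u, v, mapping):
                mapping[u] = v
                used.add(v)
                backtrack(i + 1, mapping, used)
                del mapping[u]
                used.discard(v)

    backtrack(0, {}, set())
    return results
```

The `used` set enforces injectivity, and the edge check requires the data edge to exist before its label set is compared, so a query edge is never satisfied by a missing data edge.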
Similar to subgraph isomorphism on static graphs, subgraph isomorphism on dynamic graphs is extended with a graph update stream. The continuous subgraph matching problem is defined in Definition 3.

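Definition 3. (KG-CSM). Given a query multigraph $Q$, an initial data multigraph $G_0$, and a multigraph update stream $\Delta G$, the continuous subgraph matching problem identifies all positive/negative subgraph mappings for each update edge $\Delta g_i$ in $\Delta G$, i.e., the subgraph mappings that appear or disappear when $\Delta g_i$ is applied.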
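The semantics of positive mappings for an edge insertion can be pinned down with a deliberately naive reference: enumerate all mappings before and after the update and take the difference. This recomputation is exactly the cost that incremental methods try to avoid; the sketch below only fixes the meaning of Definition 3, assumes the label-inclusion reading of Definition 2, and uses hypothetical function names. Negative mappings for a deletion are the symmetric difference `before - after`.

```python
from itertools import permutations


def all_mappings(q_v, q_e, g_v, g_e):
    """Brute-force enumeration of the subgraph mappings of Definition 2."""
    qs = list(q_v)
    found = set()
    for image in permutations(g_v, len(qs)):  # injective by construction
        m = dict(zip(qs, image))
        if all(q_v[u] <= g_v[m[u]] for u in qs) and all(
            (m[a], m[b]) in g_e and labels <= g_e[(m[a], m[b])]
            for (a, b), labels in q_e.items()
        ):
            found.add(tuple(sorted(m.items())))
    return found


def positive_mappings(q_v, q_e, g_v, g_e, edge, label):
    """Mappings of G_i (G_{i-1} plus one inserted labeled edge) absent from G_{i-1}."""
    before = all_mappings(q_v, q_e, g_v, g_e)
    g_e_after = {k: set(v) for k, v in g_e.items()}  # copy, so G_{i-1} stays intact
    g_e_after.setdefault(edge, set()).add(label)
    after = all_mappings(q_v, q_e, g_v, g_e_after)
    return after - before
```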
In this paper, our research on the KG-CSM problem focuses on constructing an effective lightweight index to arrange the search space of seed candidates. A region-limited technique is then used to constrain the growth of the search space caused by the graph update stream.
In this paper, we focus on directed labeled graphs $G = (V, E, L)$, where $V$ is a set of vertices, $E$ is a set of edges, and $L$ (comprising $L_V$ and $L_E$) is a labeling function that assigns one or multiple labels to vertices and edges. Both $Q$ and $G$ are directed labeled graphs, and whether edges are directed or undirected does not affect the execution scheduling of subgraph matching. The detailed notations and their meanings are described in Table 1.

2.2. Related Works

In this section, we mainly review related works on indexes and subgraph matching algorithms for knowledge graphs and general graphs, and then outline their limitations.

2.2.1. Subgraph Matching of Knowledge Graph

The storage structure of RDF data should be introduced before discussing the index of knowledge graphs. Knowledge graphs are modeled with the Resource Description Framework (RDF), a standard semantic data model designed by the W3C. The storage structure for RDF data generally adopted in research is mainly the relational store, which is used by many systems, e.g., SW-store [9], Sesame [10], Jena [19], and RDF-3X [20]. Relational approaches can be classified into vertical and horizontal representations.

In the vertical representation approach, RDF data is conceptually stored in a single table over a relational schema. Due to the large size of RDF data and the potentially large number of self-joins required to answer queries, care must be taken to devise an efficient physical layout with suitable indexes to support query answering. However, such layouts involve a large number of overlapping copies of RDF triples. To avoid storing separate copies of the triples, many triple stores [21, 22] aggressively store the triple table in multiple sorted orders: a clustered B-tree is constructed, and the desired triple ordering is available in the leaves of the B-tree. However, B-tree-based approaches are more demanding in terms of storage space, because efficient query answering requires various sorted orderings to be available for fast merge joins.

In the horizontal representation approach, RDF data is stored in one or more wide tables by interpreting each predicate as a column name. To minimize the storage overhead caused by empty cells, property table approaches [23, 24] were proposed, which divide the wide table into multiple smaller tables containing related predicates. However, this approach creates many small tables, which harms query evaluation performance. In particular, the vertically partitioned approach takes this decomposition to its extreme by storing one two-column (subject, object) table per predicate, which solves both the empty-cell issue and the multivalued-object issue at the same time. Abadi et al. [9] and Sidirourgos et al. [25] noted that this approach performs best when the binary tables are sorted lexicographically to allow fast joins.
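To make the vertically partitioned layout concrete, the following Python sketch builds one lexicographically sorted two-column table per predicate and merge-joins two tables on the subject column. It illustrates the general idea only, not the implementation of [9] or [25]; the function names and toy triples are hypothetical.

```python
def vertically_partition(triples):
    """One two-column (subject, object) table per predicate, sorted lexicographically."""
    tables = {}
    for s, p, o in triples:
        tables.setdefault(p, []).append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join_on_subject(left, right):
    """Subject-subject merge join of two sorted binary tables."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # find the runs sharing this subject and emit their cross product
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rs:
                j_end += 1
            for _, lo in left[i:i_end]:
                for _, ro in right[j:j_end]:
                    out.append((ls, lo, ro))
            i, j = i_end, j_end
    return out
```

Because both tables are sorted by subject, the join is a single linear pass, which is why sorted binary tables favor fast joins.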

2.2.2. Subgraph Matching on RDF Stream

Most studies on indexing and pattern matching of streaming RDF data are designed to leverage existing solutions for nonstreaming RDF data. The standardization of streaming RDF data is still under debate, and the W3C RSP Community Group is an important initiative. These studies use various custom query languages extended from SPARQL to answer queries. Additionally, they still follow a relational approach to storing and indexing RDF data.

For instance, C-SPARQL [3] used the underlying Jena architecture to store and index triples within property tables, whereas CQELS employed index-based solutions that store triples directly in B+-trees over multiple redundant permutations and used Eddy operators for adaptive query optimization over the indexed triples. SparkWave [4] used a RETE network to determine the set of triggers to fire when a new triple arrives, and it materialized intermediate results to reduce the amount of work required for each update. These systems stored RDF data in relational tables and processed queries using relational operators, such as scan and join operators.

However, relational stores need many join operations to evaluate queries, especially queries with complex and large graph patterns. Meanwhile, index-based relational approaches represent progress towards more dynamic environments by allowing continuous monitoring and periodic re-evaluation of the index design. Most of these approaches employ re-evaluation strategies to optimize the execution plan, which requires an incremental indexing technique that maintains intermediate results automatically and incrementally.

The majority of RDF Stream Processing (RSP) systems are based on a recomputation model. Recomputing matches wastes computational resources whenever the data within a window are updated. For instance, the Eddy operators [26] employed by CQELS [18] incur expensive computation and continuous resource usage to explore all plans, thus requiring fully pipelined execution for RDF streams. Furthermore, caching the statistics of triples and choosing the correct order for every triple update causes considerable overhead.

A prominent contribution for RDF streams is SPECTRA [27], in which a set of vertically partitioned views is used to collect the summarized data from each event, and sibling lists are employed to incrementally index the joined triples between views. The matched results are shown in a set of final views, thus enabling incremental evaluation as new events arrive. Although combining RSP with incremental algorithms improves the execution efficiency for streaming RDF data, SPECTRA still relies on a relational approach with a high join depth and a strong focus on independent events, which motivates our study. We argue that incremental evaluation can greatly reduce computation and improve execution performance.

2.2.3. Subgraph Matching on General Graph

The problem of pattern matching on RDF graphs is similar to the problem of subgraph isomorphism on general graphs. Ullmann [28] proposed a backtracking algorithm that significantly reduces the size of the search space. VF2 [29] is a well-known state-of-the-art algorithm that proposed a state space representation for different exact graph matching problems: each state is a partial mapping between the two given graphs, and goal states are complete mappings consistent with the problem constraints. The search space is explored through a depth-first strategy with backtracking, driven by a set of feasibility rules that prune unfruitful search paths. SPath [30] implemented a path-at-a-time pattern during the search: it decomposed the query graph into several paths and found the embeddings of each path, which were joined later. TurboISO [31] and BoostISO [32] tried to find better matching orders to make subgraph matching more efficient, using graph compression to reduce the space complexity. To reduce duplicate adjacent candidates, CFL-Match [33] proposed a compact path index, a multipath index induced by a spanning tree of the query graph and composed of multiple clusters and intercluster relations, where each cluster collects the candidates of one query vertex. TurboFlux [13] proposed a data-centric path index, which further eliminates the storage of duplicate candidates by attaching query vertices as a label set to data vertices; TurboFlux employed this index to accelerate continuous subgraph matching on dynamic graphs. However, as shown in [34], subgraph isomorphism algorithms can hardly be migrated to pattern matching on RDF graphs, essentially because of the asymmetric structural characteristics of RDF graphs.

Indexing and machine learning techniques employ semantic and structural characteristics to enhance the semantic equivalence of KGs. R-tree [35] was a pivot-based hierarchical indexing structure that seamlessly integrates spatial and semantic information, transforming high-dimensional semantic vectors into a low-dimensional space. FS-ELM [36] proposed a predictive model of future stars, which evaluated rising stars by exploiting social topology characteristics and user behavior patterns in geo-social networks. MTLM [37] proposed a multitask learning model for traversal time estimation, which first recommended the appropriate transportation mode for users and then estimated the related traversal time of the path in a tree pattern. RQL [38] designed a reinforcement learning-based algorithm for the dynamic bipartite graph matching problem, which made near-optimal decisions on batch splitting with a constant competitive ratio. Gao et al. [39] proposed a framework to achieve privacy-preserving subgraph pattern matching in the cloud, which used a label-generated privacy model to protect and label potential private information in both data graphs and pattern graphs.

2.2.4. Subgraph Matching on Dynamic Graph

A dynamic graph is modeled as a graph whose edges are activated by sequences of time-dependent elements. Wang et al. [6] discussed the definition and topological structure of time-dependent graphs, as well as models of their relationship to dynamic systems. In addition, they reviewed some classic problems on time-dependent graphs and studied the weight-constrained route planning problem over a large time-dependent graph with continuous time and weight functions [40]. Choudhury et al. [41] provided a subgraph selectivity approach to determine subgraph search strategies and used a subgraph tree structure to decompose the query graph into smaller subgraphs, which are responsible for storing partial results. However, retaining and querying thousands of edges within a large window requires a considerable amount of space and computational resources. Moreover, this approach only supports simple path-based queries and is optimized for homogeneous graphs using an edge stream model. Fan et al. [12] presented algorithms for graph pattern matching over evolving graphs that employ a repeated search strategy to calculate matches until a fixed point is reached each time the graph is updated by insertions or removals. However, the repeated search strategy increases the time consumption of subgraph matching.

More closely related techniques employ semantic and structural characteristics to improve performance on dynamic problems. INC-GPM [42] built an index to incrementally record the shortest path length range between different label types and then identified the parts affected by the graph update stream. DCSGR [43] exploited the connections between group users in community detection and proposed an aggregation function that integrates the recommended media lists of all interest subgroups into the final group recommendation results.

3. Flow Graph Index of Knowledge Graph

In this section, a flow graph index (FG) of the knowledge graph is proposed to arrange the search space of seed candidates. Before introducing FG, we first give our solution to the KG-CSM problem (Algorithm 1) to clarify the core role of FG in our algorithm.

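Algorithm 1: incremental pattern matching (iPM)
Input: a query graph $Q$, an initial data graph $G_0$, and a graph update stream $\Delta G$
Output: the set $\mathcal{M}$ of all subgraph mappings of $Q$ in $G_i$
(1) $T_Q \leftarrow$ $T_Q$-Generation($Q$);
(2) If FG $= \emptyset$ then
(3)   construct FG from $T_Q$ on $G_0$;
(4) else
(5)   FG $\leftarrow$ incrementalMaintenance(FG, $\Delta g_i$);
(6) $\mathcal{M} \leftarrow$ IncrementalMatching(FG, $\Delta g_i$);
(7) Return $\mathcal{M}$;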

The pseudocode of continuous subgraph matching, named the incremental pattern matching algorithm (iPM), is described in Algorithm 1. A matching tree $T_Q$ orchestrates a matching order to iteratively conduct subgraph mappings (Line 1). In this paper, we adopt the matching order generated by a depth-first traversal rather than computing a near-optimal matching order, because our focus is the incremental computation over the graph update stream. The flow graph index FG of the knowledge graph is constructed from the sequential and mapping relationships of $Q$ on $G_0$ (Line 3 and Section 3.2) and incrementally maintained over the graph update stream $\Delta G$ (Line 5 and Section 4.1). Then, all subgraph mappings covering $\Delta g_i$ are directly conducted by an iterative traversal on FG (Line 6 and Section 4.2).

The core role of FG in our algorithm consists of three parts. The first part is the initial construction of FG, guided by a matching tree on the initial data graph $G_0$. The second part is the maintenance of FG to adapt to the graph update stream $\Delta G$. The third part is the incremental matching of each update in the adaptive matching order. FG is thus the core of our approach and is introduced in Section 3.1.

3.1. Data Index of Knowledge Graph

A knowledge graph is a directed labeled multigraph that combines complex topological and attributed structures. The data index of a knowledge graph is composed of the flow graph index and the adjacent index. The flow graph index is constructed from the topological structure of the knowledge graph and is used to arrange the search space of seed candidates, while the adjacent index is built on the attributed structure.

3.1.1. Flow Graph Index of Topological Knowledge Graph

The flow graph index is defined as a flow graph (FG), which is designed to arrange the binary relationships between pairs of data vertices in $G$. The binary relationships of FG follow the parent-child relationships of a spanning tree of $Q$. We divide the edges of $Q$ into tree edges and nontree edges according to the parent-child relationships of the spanning tree of $Q$. The spanning tree together with the nontree edges is called a matching tree, denoted as $T_Q$. Considering the query graph $Q$ in Figure 2(a), a matching tree is described in Figure 4(a), which is ordered by a depth-first traversal on $Q$. Solid lines denote tree edges and dotted lines indicate nontree edges. For a tree edge $(u, u')$, $u$ is the parent of $u'$. A flow graph is constructed under the guidance of the matching tree, as described in Definition 4.
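A depth-first matching tree as described above can be sketched as follows. This is a minimal illustration under our assumptions: the query graph is connected, edge direction is ignored (as allowed by the remark on directed edges in Section 2.1), and neighbors are visited in sorted order only to make the output deterministic.

```python
def matching_tree(adj, root):
    """DFS matching tree of a connected query graph (sketch).

    adj: query vertex -> neighbors, with edge direction ignored.
    Returns the DFS order, the parent map, the tree edges, and the nontree edges.
    """
    order, parent, tree_edges = [], {root: None}, []

    def dfs(u):
        order.append(u)
        for w in sorted(adj[u]):  # sorted only to make the order deterministic
            if w not in parent:
                parent[w] = u
                tree_edges.append((u, w))
                dfs(w)

    dfs(root)
    tree = {frozenset(e) for e in tree_edges}
    # every remaining query edge becomes a nontree edge, reported once
    nontree_edges = sorted(
        {tuple(sorted((u, w))) for u in adj for w in adj[u] if frozenset((u, w)) not in tree}
    )
    return order, parent, tree_edges, nontree_edges
```

For a query graph with edges u1-u2, u1-u3, u2-u3, and u2-u4 rooted at u1, the tree edges are (u1, u2), (u2, u3), (u2, u4) and the single nontree edge is (u1, u3).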

Definition 4. (flow graph). A flow graph is a directed labeled multigraph, formed as $FG = (V_F, E_F, L_F)$. Here, $V_F$ is a set of vertices, $E_F$ is a set of edges, and $L_F$ is a labeling function that assigns one or multiple labels to vertices.
Here, each vertex of FG refers to a query-data vertex pair (node pair for short) of $Q$ on $G$. For a node pair $\langle u, v \rangle$ with $u \in V_Q$ and $v \in V$, $v$ is a vertex labeled by $u$ in FG. Each edge of FG indicates a tree edge or a nontree edge, as in the matching tree. For node pairs $\langle u, v \rangle$ and $\langle u', v' \rangle$ such that $u$ is the parent of $u'$ in the matching tree and $v'$ is a neighbor of $v$, $\langle u, v \rangle$ is the parent of $\langle u', v' \rangle$. Considering the data graph in Figure 2(b) and the matching tree in Figure 4(a), the resulting flow graph is described in Figure 4(b), where each parent-child edge connects two vertices whose labels are a parent and a child in the matching tree and whose data vertices are adjacent in the data graph.
An unconstrained number of node pairs may lead to a huge number of vertices in FG: for a query graph with $|V_Q|$ vertices and a data graph with $|V|$ vertices, there are $|V_Q| \times |V|$ node pairs. Two strategies are used to bound the number of node pairs. One strategy is to employ a labeling function that assigns multiple labels to a vertex, which avoids storing the same data vertex repeatedly in FG. The other strategy is to design constraint rules for node pairs, which are given in the definition of candidate verification (Definition 5).
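The construction rule and the multi-label strategy can be sketched in Python as follows. This is a simplified illustration, not the paper's construction procedure: the hypothetical candidate filter below checks only vertex-label inclusion as a stand-in for the full candidate verification, each data vertex is stored once and labeled by the set of query vertices it is a candidate of, and each FG edge is labeled by the matching-tree edge it realizes.

```python
def build_flow_graph(q_labels, parent, g_labels, g_adj):
    """Sketch of FG construction guided by a matching tree.

    q_labels: query vertex -> label set; parent: query vertex -> parent (None for root)
    g_labels: data vertex -> label set; g_adj: data vertex -> set of neighbors
    """
    # Multi-label strategy: one FG vertex per data vertex, labeled by all query
    # vertices it is a candidate of (stand-in filter: label inclusion only).
    fg_vertex_labels = {}
    for u, ql in q_labels.items():
        for v, gl in g_labels.items():
            if ql <= gl:
                fg_vertex_labels.setdefault(v, set()).add(u)

    # Parent-child edges: <u, v> -> <u', v'> if u = parent(u') and v' is adjacent to v.
    fg_edges = {}  # (v, v') -> set of matching-tree edges (u, u') labeling it
    for u_child, u_par in parent.items():
        if u_par is None:
            continue
        for v, labels in fg_vertex_labels.items():
            if u_par not in labels:
                continue
            for w in g_adj.get(v, ()):
                if u_child in fg_vertex_labels.get(w, ()):
                    fg_edges.setdefault((v, w), set()).add((u_par, u_child))
    return fg_vertex_labels, fg_edges
```

A data vertex carrying both labels A and B is stored once with the two query-vertex labels rather than as two separate node pairs, which is the saving the multi-label strategy targets.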

Definition 5. (candidate verification). Given a node pair , data vertex is the candidate of query vertex if and only if it satisfies the following constraints: (1) , (2) , : , and (3) .
Here, $L_V$ and $L_E$ denote the labeling functions of vertices and edges, respectively. The constraints of candidate verification effectively reduce the number of node pairs: a node pair is deleted if it does not satisfy constraint (1). Furthermore, we divide node pairs into positive and negative node pairs according to the relaxed and strict constraints. For a node pair np: $\langle u, v \rangle$, np is a negative node pair if and only if it satisfies constraint (1) but fails constraint (2) or (3), and np is a positive node pair if and only if it satisfies constraints (1), (2), and (3). In the flow graph in Figure 4(b), solid circles denote positive candidates and dotted circles denote negative candidates.
A negative candidate may become positive when the graph update stream is inserted into FG. To express this transition explicitly, a node pair state State(np) is defined to denote the active and silent states of node pairs. If a node pair np satisfies either the relaxed constraint or the strict constraints, which correspond to two different values of State(np), np is encapsulated into a labeled vertex in FG; otherwise, it is pruned.
A node pair state is composed of the candidate state CS and the following state FS, denoted as $\text{State}(np) = CS(np) \wedge FS(np)$. The candidate state CS distinguishes negative and positive node pairs: a node pair np is positive if $CS(np) = 1$. The following state FS describes the candidate states of followers. We denote the descendants of a query vertex $u$ in the matching tree as $Des(u)$. Given a node pair np: $\langle u, v \rangle$, a candidate $v'$ of a query vertex $u' \in Des(u)$ is a follower of np if $v'$ is reachable from $v$, and $v$ is then called the dominator of $v'$. For a node pair np: $\langle u, v \rangle$, $FS(np) = 1$ if $\forall u' \in Des(u), \exists np' \in Des(np)$ labeled by $u'$ such that $CS(np') = 1$, where $Des(np)$ denotes the followers of np.
The node pair state, candidate, and following states of node pair np: can be abbreviated as State (), CS (), and FS (), which are denoted by the common query vertex .
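The state definitions above can be sketched in Python under our reading of them: followers are the node pairs reachable from np along FG parent-child edges, FS(np) holds when every descendant query vertex has at least one follower with a positive candidate state, and State(np) is the conjunction of CS and FS. The graph layout and function names are hypothetical, and this is not the paper's maintenance procedure.

```python
def followers(np, fg_children):
    """All node pairs reachable from np along FG parent-child edges."""
    seen, stack = set(), [np]
    while stack:
        for nxt in fg_children.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def node_pair_state(np, cs, fg_children, q_desc):
    """State(np) = CS(np) and FS(np), following our reading of the definitions.

    np: (query vertex, data vertex); cs: node pair -> bool candidate state;
    fg_children: node pair -> child node pairs in FG;
    q_desc: query vertex -> set of its descendants in the matching tree.
    """
    u = np[0]
    reach = followers(np, fg_children)
    # FS(np): every descendant query vertex has at least one positive follower
    fs = all(
        any(cs.get(f, False) for f in reach if f[0] == u_d)
        for u_d in q_desc.get(u, set())
    )
    return cs.get(np, False) and fs
```

A leaf node pair has no descendant query vertices, so its FS is vacuously true and its state reduces to its candidate state.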

3.1.2. Adjacent Index of Attributed Knowledge Graph

In this paper, we focus on continuous subgraph matching on a special graph, the knowledge graph. A knowledge graph (KG) is a directed labeled multigraph with multiple edges between a pair of vertices. The vertex labels of a knowledge graph can be classified as type labels and attribute labels, and each vertex can carry one or multiple labels.

To deal with the challenge of the multigraph characteristic, adjacent indexes of query and data vertices are proposed to accelerate candidate verification between the initial data graph and the graph update stream. Considering the query and data multigraphs in Figure 3, the adjacent indexes of a query vertex and a data vertex are described in Table 2. Here, AL, OEL, and IEL denote the labels of adjacent vertices, outgoing (outer) edges, and incoming (inner) edges of a query or data vertex, respectively.

The first benefit is that adjacent indexes speed up the verification of the inclusion relationship of a node pair. Considering the common adjacent label B of and , the stored counts allow the inclusion relationship of and for label B to be checked in O (1) time: the adjacent label B of is included by the candidate if the count of adjacent label B at is not less than the count at . Thus, verifying the inclusion relationship of a node pair takes O (n) time, where n is the number of distinct labels of adjacent vertices, incoming edges, and outgoing edges.
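A minimal sketch of this count-based inclusion test follows, assuming that each adjacent index is stored as three label-to-count maps (AL, OEL, IEL). The class and function names are hypothetical. Each distinct query label costs one O (1) lookup, so a node pair is checked in O (n) time.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AdjacentIndex:
    """Hypothetical adjacent index of one vertex: label -> count for adjacent
    vertices (AL), outgoing edges (OEL), and incoming edges (IEL)."""
    al: Counter = field(default_factory=Counter)
    oel: Counter = field(default_factory=Counter)
    iel: Counter = field(default_factory=Counter)


def label_included(query: Counter, data: Counter) -> bool:
    # One O(1) comparison per distinct query label; a missing label counts 0.
    return all(data[label] >= cnt for label, cnt in query.items())


def pair_included(q: AdjacentIndex, d: AdjacentIndex) -> bool:
    # O(n) overall, n = number of distinct labels in the query vertex's index.
    return (label_included(q.al, d.al)
            and label_included(q.oel, d.oel)
            and label_included(q.iel, d.iel))
```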

The second benefit is that adjacent indexes can quickly verify the changed state of a node pair when the graph update stream is inserted. The candidate state of a node pair is computed by intersecting the results of the three label indexes, formed as CS (np) = AL (np) ∧ OEL (np) ∧ IEL (np). When one label index of a node pair changes, its candidate state can be re-verified in O (1) time.
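The paper does not spell out how the O (1) re-verification is implemented. One possible way, sketched below as an assumption, is to keep a per-node-pair counter of query labels that the data vertex does not yet cover: an inserted or deleted label occurrence touches only one (index, label) entry, so the counter, and with it the conjunction AL ∧ OEL ∧ IEL, can be updated in constant time. The class name and key layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

Key = Tuple[str, str]  # (index name "AL" | "OEL" | "IEL", label); hypothetical


class PairInclusionTracker:
    """Tracks, for one node pair, how many (index, label) entries of the query
    vertex are not covered by the data vertex. The pair satisfies
    AL AND OEL AND IEL iff this deficit is zero."""

    def __init__(self, query: Dict[Key, int], data: Dict[Key, int]):
        self.query = dict(query)
        self.data = Counter(data)
        self.deficit = sum(1 for k, c in self.query.items() if self.data[k] < c)

    def _short(self, key: Key) -> bool:
        # True if this query label is required and not yet covered.
        return key in self.query and self.data[key] < self.query[key]

    def update(self, key: Key, delta: int) -> bool:
        """Apply an inserted (+1) or deleted (-1) label occurrence and return
        the new inclusion result; only `key` is re-examined (O(1))."""
        before = self._short(key)
        self.data[key] += delta
        after = self._short(key)
        self.deficit += int(after) - int(before)
        return self.deficit == 0

    @property
    def included(self) -> bool:
        return self.deficit == 0
```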

The third benefit is that adjacent indexes can quickly find the adjacent candidate region. By verifying the common adjacent label B of and , it can be found that and are negative candidates of , which satisfy constraint (1) in Definition 5. Furthermore, the final adjacent candidate region can be narrowed by intersecting the common neighbors of the different distinct labels.
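The intersection step can be sketched as follows, assuming a hypothetical map from each label to the set of data vertices that are adjacent to a vertex carrying that label. Starting from the smallest set keeps the intermediate result small.

```python
from functools import reduce
from typing import Dict, Iterable, Set


def adjacent_candidate_region(labels: Iterable[str],
                              neighbors_by_label: Dict[str, Set[str]]) -> Set[str]:
    """Intersect, over the distinct labels required by a query vertex, the
    data vertices adjacent to some vertex with that label (hypothetical
    structure: neighbors_by_label[label])."""
    sets = [neighbors_by_label.get(label, set()) for label in set(labels)]
    if not sets:
        return set()
    sets.sort(key=len)  # smallest first
    return reduce(set.intersection, sets[1:], set(sets[0]))
```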

3.1.3. Time and Space Complexity of Knowledge Graph Index

The data index of the knowledge graph is composed of the flow graph index and the adjacent index.

The time and space complexities of the flow graph index are as follows. The worst-case space complexity of FG is . First, the number of vertices in FG is at most , reached when each vertex is a candidate of one query vertex. Second, the number of edges in FG is at most , reached when each vertex pair is connected by two tree edges and two nontree edges. Third, the number of edge labels in FG is at most , reached when each tree edge is assigned to all query vertices. In practice, the tree edge labels can be encoded by a -bit string.
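The bit-string encoding of tree edge labels can be sketched as a bitmask. This assumes that the query vertices are numbered 0 to k − 1, so a label needs k bits and bit i is set when query vertex i is assigned to the edge; the numbering and function names are illustrative assumptions.

```python
from typing import Iterable, List


def encode_edge_label(query_vertex_ids: Iterable[int]) -> int:
    """Pack the query vertices assigned to a tree edge into a bit string:
    bit i is set iff query vertex i is assigned."""
    mask = 0
    for i in query_vertex_ids:
        mask |= 1 << i
    return mask


def decode_edge_label(mask: int) -> List[int]:
    # Recover the assigned query vertex ids in increasing order.
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```

Merging the labels of two assignments is then a bitwise OR, and the label of a single edge never needs more than k bits.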

The worst-case time complexity concerns the insertion or deletion of one vertex in FG. A vertex incurs the worst-case time complexity if it is a common candidate of all query vertices in , it is connected with all other vertices in , and .

The time and space complexities of the adjacent index are as follows. The space complexity equals that of a doubly linked list. The worst-case time complexity concerns the insertion or deletion of one vertex in FG when the vertex is connected with all other vertices in and .

In this paper, our research on the KG-CSM problem focuses on constructing an effective lightweight index over the search space of seed candidates. As the analysis of time and space complexities shows, our knowledge graph index has linear cost, which makes it suitable for indexing a single large-scale data graph.

3.2. Construction of Flow Graph

In this section, the construction algorithm of the flow graph is presented. The inputs are an initial data graph , a query graph , and its matching tree . The output is the subgraph index of flow graph . A matching tree orchestrates a matching order (, , , ) and the parent-child relationships of query vertices in . The construction algorithm of the flow graph contains three modules (Algorithm 2).

Input: a matching tree , a query graph and a data graph
Output: the flow graph
(1), ,
(2)FG;
(3)for and
(4)    if NegCandVerify ( ) then CS (), ,
(5)    if PosCandVerify ( ) then CS (), ,
(6)    if PosCandVerify ( ) and is leaf then State ()
(7)Reset as unvisited, as visited;
(8)for and from to 0 do
(9)    for and is visited do
(10)        for and do
(11)            if and then
(12)                .FS () ← .State () ∨ .FS ()
(13)                .State () ← .CS () ∧ .FS ()
(14)    Mark as visited;
(15)Set as unvisited, as visited;
(16)for and from to do
(17)    for and is visited do
(18)        for and do
(19)            if and then (, )
(20)            if and then (, ) ;
(21)        Mark as visited;
(22)return

The first module verifies the candidate state of each node pair with the aid of the adjacent indexes (Lines 3–6). The following and candidate states of a node pair are initialized as 0 and −1, respectively. For a node pair np: , is a negative candidate of if np satisfies constraint (1) in Definition 5, and the candidate state of np is then marked as 0 (Line 4). If np satisfies constraints (1), (2), and (3) in Definition 5, is a positive candidate of , and the candidate state of np is marked as 1 (Line 5). For a node pair np: , if is a leaf node in and is a positive candidate of , the following state of np is marked as 1 (Line 6), because a leaf node has no descendants to be matched. All negative and positive candidates are added into the candidate set and the vertex set (Lines 4–5), where denotes the set of candidates of query vertex .

The second module verifies the node pair state by computing the following state in bottom-up matching order (Lines 7–14). Consider a node pair np_p of a query vertex and a node pair np_c of one of its child query vertices in the matching tree, such that the two data vertices are adjacent and both are positive candidates. Then FS (np_p) ← State (np_c) ∨ FS (np_p) and State (np_p) ← CS (np_p) ∧ FS (np_p). The first assignment means that FS (np_p) becomes true as soon as the state of one descendant node pair is true. The second means that State (np_p) is true if and only if both CS (np_p) and FS (np_p) are true.

The third module inserts the edges into FG in top-down matching order (Lines 15–21). For two adjacent node pairs whose query vertices are connected by an edge of the matching tree, the edge between them is a tree edge and is inserted into FG; otherwise, it is a nontree edge and is inserted into FG. The insertion function in Lines 19 and 20 adds the edge to FG and assigns the minimum sequential number of the parent node pair to the child node pair, as described in detail in Section 4. Tree and nontree edges are used to distinguish the operations on node pairs in continuous subgraph matching.
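A compact Python sketch of the second and third modules of Algorithm 2 is given below. It is a simplification under stated assumptions, not the authors' implementation: the data graph is treated as an undirected adjacency map (edge labels and multi-edges are ignored), the first module's output is passed in as `cs`, the matching order lists every parent before its children, and a node pair is taken to have a true following state when every child query vertex has an adjacent candidate whose state is true (with a single child this is exactly the disjunction of Line 12). Sequential numbers are omitted, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (query vertex, data vertex)


def build_flow_graph(order: List[str],
                     parent: Dict[str, str],
                     query_edges: Set[Tuple[str, str]],
                     data_adj: Dict[str, Set[str]],
                     cs: Dict[Pair, int]):
    """order: matching order, root first; parent: matching-tree parent of each
    non-root query vertex; cs: module-1 result, 0 or 1 per kept node pair."""
    cand: Dict[str, Set[str]] = defaultdict(set)
    for u, v in cs:
        cand[u].add(v)
    children: Dict[str, List[str]] = defaultdict(list)
    for u in order[1:]:
        children[parent[u]].append(u)

    # Module 2 (bottom-up): children are processed before their parents;
    # leaves have no children, so all([]) gives FS = 1.
    state: Dict[Pair, int] = {}
    for u in reversed(order):
        for v in cand[u]:
            fs = all(any(state.get((c, w), 0) == 1 for w in data_adj.get(v, set()))
                     for c in children[u])
            state[(u, v)] = int(cs[(u, v)] == 1 and fs)

    # Module 3 (top-down): tree edges follow matching-tree edges.
    tree_edges: Set[Tuple[Pair, Pair]] = set()
    for u in order[1:]:
        p = parent[u]
        for v in cand[u]:
            for vp in cand[p] & data_adj.get(v, set()):
                tree_edges.add(((p, vp), (u, v)))

    # Query edges outside the matching tree yield nontree edges.
    nontree_edges: Set[Tuple[Pair, Pair]] = set()
    for a, b in query_edges:
        if parent.get(b) == a or parent.get(a) == b:
            continue  # already covered by a tree edge
        for va in cand[a]:
            for vb in cand[b] & data_adj.get(va, set()):
                nontree_edges.add(((a, va), (b, vb)))
    return state, tree_edges, nontree_edges
```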

3.2.1. Example for Algorithm 2

Considering the matching tree and flow graph in Figure 4, an example of the FG construction algorithm is given in Table 3. The matching order of is a sequence of query vertices , , , , and , and a query vertex is marked as 1 once it is visited. In the first module of Algorithm 2, the following and candidate states of node pairs are initialized as 0 and −1, respectively. Since is a negative node pair, CS () is marked as 0, and the candidate states of the other node pairs are marked as 1. In the second module, the node pair state is verified by computing the candidate and following states in bottom-up matching order. For node pair , satisfying , no candidate of adjacent to can be found, so the following state FS () is marked as 0. In the third module, edges are inserted into FG and the minimum sequential numbers of parent nodes are assigned to child nodes in top-down matching order. For node pair , satisfying and , the minimum numbers of , , and are passed to , so the sequential numbers of are 1, 2, and 3. Sequential numbers are described in detail in Section 4.

4. Incremental Subgraph Matching on Flow Graph Index

In this section, a sequential subdivision technique for the flow graph is first given to limit the search expansion caused by the graph update stream. Then, incremental subgraph matching and incremental maintenance strategies are proposed based on the divided flow graph.

4.1. Sequential Subdivision of Flow Graph Index

The sequential subdivision of the flow graph divides a flow graph into multiple flow subgraphs and sequentially encodes the vertices of the flow subgraphs. The flow graph is divided on the basis of the candidates of the root node in , described as FG () and . Here, is the originating node of the query vertices in the matching order, and is named a root candidate of the root node in . Considering the flow graph in Figure 4(b), the divided flow subgraphs are shown in Figure 5(a), described as FG (), FG (), and FG ().

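All subgraph mappings can be obtained by traversing the flow subgraphs, as stated in Theorem 1.

Theorem 1. All subgraph mappings of on must be included in one flow subgraph.

Proof. For Theorem 1, consider a subgraph mapping . The data vertex mapped to the root node of the matching tree is a root candidate, and every other node pair of the mapping is connected to it through tree edges of FG, so its data vertex is a follower of that root candidate. Since a flow subgraph is composed of a root candidate and its followers, the mapping must be found in the flow subgraph FG () of that root candidate.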
In the subdivision of the flow graph, the root candidates of FG are arranged sequentially, and a unique encoding technique identifies the relationships among flow subgraphs. The unique encoding of vertices avoids the redundant allocation of vertices shared by multiple flow subgraphs. Considering the flow graph in Figure 4(b), its sequential encoding is shown in Figure 5(a). The sequential number of a root node pair is passed and copied to all its negative and positive followers. If a node pair is a common follower of multiple root node pairs, only that node pair is marked with the numbers of all these root node pairs, and its followers are not marked again. Regarding the follower of vertex in Figure 5(b), is not marked again in FG ().
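The basic propagation of sequential numbers can be sketched as a breadth-first copy along tree edges. This simplified version, with hypothetical names, copies every root number to every reachable follower; it does not implement the optimization described above of leaving the followers of a shared node pair unmarked.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def assign_sequential_numbers(root_candidates: List[Hashable],
                              tree_edges: Iterable[Tuple[Hashable, Hashable]]
                              ) -> Dict[Hashable, Set[int]]:
    """Number the root candidates 1..k in order and copy each number along
    tree edges (parent -> child) to every follower. A node pair reachable
    from several root candidates collects several numbers."""
    succ: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for src, dst in tree_edges:
        succ[src].append(dst)
    sn: Dict[Hashable, Set[int]] = defaultdict(set)
    for number, root in enumerate(root_candidates, start=1):
        queue, seen = deque([root]), {root}
        while queue:  # breadth-first walk over the followers of this root
            node = queue.popleft()
            sn[node].add(number)
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return dict(sn)
```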
A property of sequential flow subgraphs can effectively limit the search expansion caused by the graph update stream, as stated in Lemma 1.

Lemma 1. Given an inserted node pair (u, v) of FG, if (u, v) can produce new subgraph mappings, it satisfies the condition ∀u′ ∈ Des (u), ∃v′ ∈ F (v): CS (u′, v′) = 1.

Proof. For Lemma 1, consider a subgraph mapping originating from a candidate v of query vertex u. If there exists a descendant u′ of u for which no positive candidate can be found among the followers of v, then no subgraph mapping can be produced by node pair (u, v).
The sequential subdivision of FG brings three benefits. First, it avoids the repeated encoding of vertices in the following region. Regarding the sequential flow subgraphs FG () and FG (), the follower does not need to be encoded in FG () because is a follower of that has already been encoded with a new number 3.
Second, it allows the common flow subgraphs of inserted edges to be verified in advance. Regarding an inserted edge (, ) in Figure 5(a), incremental subgraph matching of (, ) does not need to be executed because and are included in different flow subgraphs: is located in FG () and FG (), and is located in FG (). The verification of a common flow subgraph for an inserted edge is given in Lemma 2.

Lemma 2. Given vertices and of FG, if and are followers of a common dominator, then and are included in a common flow subgraph.

Proof. For Lemma 2, according to Theorem 1, every subgraph mapping of on is included in one of the flow subgraphs. Since a flow subgraph is composed of a root candidate and its followers, a flow subgraph contains at least one dominator, namely its root candidate. Thus, vertices and cannot be included in a common subgraph mapping if they do not have a common dominator.
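When sequential numbers are kept as sets of root numbers per vertex (a hypothetical representation), the pre-check for inserted edges reduces to a set-disjointness test: an edge whose endpoints share no root number has no common dominator, so incremental matching for it can be skipped. Function names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def share_flow_subgraph(sn: Dict[str, Set[int]], a: str, b: str) -> bool:
    # Endpoints with no common root number have no common dominator.
    return not sn.get(a, set()).isdisjoint(sn.get(b, set()))


def edges_to_match(sn: Dict[str, Set[int]],
                   inserted: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only inserted edges whose endpoints may lie in a common flow
    subgraph; incremental matching is skipped for the others."""
    return [(a, b) for a, b in inserted if share_flow_subgraph(sn, a, b)]
```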
Third, deleted edges can block the dominating relationships of flow subgraphs. Regarding a deleted vertex in Figure 4(a), no subgraph mapping consisting of can be found after vertex is deleted. The influence of a deleted vertex on the dominating relationship is given in Lemma 3.

Lemma 3. Given a deleted vertex of flow subgraph FG (), may block the dominating relationships of FG ().

Proof. For Lemma 3, we establish the influence of a deleted vertex on the dominating relationship by contradiction. If the dominating relationship could be retained, there would have to be a subgraph mapping composed of other vertices in place of the deleted vertex. However, subgraph mappings that contain a vertex which cannot be replaced by any other vertex are common in subgraph matching; when the deleted vertex is such a vertex, no alternative mapping exists, and the dominating relationship is blocked.
Based on Lemmas 1–3, the incremental maintenance and incremental subgraph matching algorithms on FG are designed in Sections 4.2 and 4.3, respectively.

4.2. Incremental Maintenance

The incremental maintenance of FG employs a state transition model to identify how updated candidate states affect the subgraph index of the flow graph, which in turn supports incremental subgraph matching on the flow graph.

The state transition model consists of three candidate states (−1, 0, 1) and six transition rules (Transitions 1–6), which describe how a changed candidate moving from one state to another affects its adjacent region. Vertices in states 0 and 1 are the negative and positive candidates, respectively, that are already included in the previous FG. A vertex in state −1 denotes an inserted negative or positive candidate that is not included in the previous FG. The six transition rules describe how candidate states change as vertices are inserted and deleted by the graph update stream. Figure 6(a) shows the state transition model, where solid lines denote the transition rules of deleted vertices (Transitions 1–3) and dashed lines denote the transition rules of inserted vertices (Transitions 4–6).

Furthermore, we analyze the influence of the six transition rules on the adjacent region of a changed candidate in FG. To reflect the impact of a changed candidate state on the structure of the flow graph, we divide the adjacent region of a changed candidate into three subregions: parent, child, and nontree subregions. A changed candidate may trigger state transitions of vertices in these three subregions through Transitions 2 and 5. The three subregions of a changed vertex are shown in Figure 6(b) and formed as , , and .nt. The parent, child, and nontree subregions of are filled with blue, yellow, and purple, respectively.

The influence of a changed candidate on the vertices in the parent subregion is as follows:
(1) For a vertex changed by Transitions 2, 3, 5, and 6, consider node pairs and satisfying ; then is a follower of . Thus, the state change of may reverse the following state FS () of . Since State () = CS () ∧ FS (), State () may also be reversed by the state change of .
(2) For a vertex changed by Transitions 1 and 3, consider node pairs and satisfying ; if is marked by a minimum sequential number min of a root candidate, min is copied to .

For a candidate activated by transition rule 2 in Figures 5(b) and 5(c), State () is true because its follower is a positive candidate. For a candidate activated by transition rule 1 in Figures 5(a) and 5(b), is marked by the minimum sequential number 2.

The influence of a changed candidate on the vertices in the child subregion is as follows:
(1) For a vertex changed by Transitions 4 and 6, consider node pairs and satisfying ; the sequential number of may then be deleted according to Lemma 3, because may block the dominating relationships of FG.

An incremental maintenance algorithm is described in Algorithm 3. A preprocessing step first merges the graph update stream into the flow graph FG of the initial graph . Then, the changed vertices of are analyzed by transition rules 1–6. The incremental maintenance algorithm examines all changed vertices affected by the state transition rules until no candidate transition remains in FG (Algorithm 3).

(1)repeat
(2)    Apply State Transition Rules 1–6
(3)until No Candidate Transition in FG;
(4)return FG

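
Algorithm 3 is a fixpoint loop. A worklist version is sketched below; `apply_rules` is a hypothetical callback that applies Transitions 1–6 at one vertex and returns the adjacent vertices whose state it changed. The loop terminates only if the rules cannot flip states indefinitely, which this sketch assumes.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List


def incremental_maintenance(changed: Iterable[Hashable],
                            apply_rules: Callable[[Hashable], List[Hashable]]) -> int:
    """Re-examine vertices until no candidate transition remains.
    Returns the number of rule applications performed."""
    queue = deque(dict.fromkeys(changed))  # de-duplicate, keep order
    queued = set(queue)
    steps = 0
    while queue:
        x = queue.popleft()
        queued.discard(x)
        steps += 1
        for y in apply_rules(x):  # neighbours whose state just changed
            if y not in queued:
                queued.add(y)
                queue.append(y)
    return steps
```
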
4.3. Incremental Subgraph Matching

In this section, we present the incremental subgraph matching that produces the subgraph mappings of on the initial graph and the graph update stream .

Two matching orders are designed to orchestrate the traversal sequence of node pairs in the initial graph and the graph update stream . The matching order of the initial graph orchestrates both the traversal order of flow graph construction and the matching sequence of subgraph mappings. This matching order is fixed during subgraph matching on the initial graph, and its originating nodes are the candidates of the root node in .

The matching order of the graph update stream orchestrates the matching sequence of subgraph mappings during the incremental maintenance of the graph update stream. This matching order changes with each active candidate during subgraph matching on the graph update stream. Given an active node pair , it first traces back to the root candidate of in backward order and then traverses the nodes of the other paths in forward order. Regarding an active node pair produced by transition rules 2 and 3, in Figure 4(a) and is the root of .
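One possible reading of this order is sketched below with hypothetical names: walk from the query vertex of the active node pair up to the root of the matching tree, then visit the remaining query vertices depth-first from the root. The tie-breaking among sibling paths (left to right) is an assumption.

```python
from typing import Dict, List


def update_matching_order(u: str, parent: Dict[str, str],
                          children: Dict[str, List[str]]) -> List[str]:
    """Backward part: u, parent(u), ..., root. Forward part: a preorder
    depth-first walk from the root that appends vertices not yet ordered."""
    order = [u]
    while order[-1] in parent:          # trace back to the root
        order.append(parent[order[-1]])
    in_order = set(order)
    stack = [order[-1]]                 # start the forward walk at the root
    while stack:
        x = stack.pop()
        if x not in in_order:
            in_order.add(x)
            order.append(x)
        # push children in reverse so the leftmost child is visited first
        stack.extend(reversed(children.get(x, [])))
    return order
```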

The consistency of subgraph matching is guaranteed by two verifications of inserted node pairs: relational verification and sequential verification. The relational verification checks the local isomorphism between the partial subgraph mapping and the selected node pair, as described in Verification 1.

Verification 1 (relational consistency). Given a partial subgraph mapping and a selected node pair , data vertex is relationally consistent with query vertex if and only if it satisfies the following constraints: (1) State () = 1, (2) , : , and (3) (or ).
Here, State (u, v) denotes the state of node pair (u, v). State (u, v) is true if and only if CS (u, v) is true and ∀u′ ∈ Des (u), ∃v′ ∈ F (v): CS (u′, v′) = 1, according to Definition 5 and Lemma 1; that is, State (u, v) = CS (u, v) ∧ FS (u, v). A subgraph mapping is composed of multiple node pairs, formed as , and the number of node pairs equals the number of query vertices, denoted as . Given a subgraph mapping , a partial subgraph mapping is a subset of sequential node pairs, defined as , and . Here, denotes a sequential vertex in matching tree .
The sequential verification checks that the sequential numbers of the partial subgraph mapping and the selected node pair are consistent, as described in Verification 2.

Verification 2 (sequential consistency). Given a partial subgraph mapping and a selected node pair , data vertex is sequentially consistent with query vertex if and only if it satisfies the following constraint: : SN () ∩ SN () ≠ ∅.
Here, SN () refers to the sequential numbers of , which are transitively assigned from the numbers of root candidates. Regarding the sequential number 2 of in Figure 5(b), there is a reachable path from to .
Given vertices and of FG, if and are followers of different dominators only, then and cannot produce subgraph mappings, according to Lemma 2. Regarding sequential number 1 of and sequential numbers 2 and 3 of in Figure 5(b), no subgraph mapping can be produced by and because they are located in different sequential flow subgraphs FG () and FG (, ) (Algorithm 4).
An incremental subgraph matching algorithm is described in Algorithm 4. The inputs are a merged flow graph and a matching tree . The outputs are the subgraph mappings of on . A subgraph mapping is output once its number of node pairs equals the number of query vertices (Lines 1–2).
One module iteratively produces all subgraph mappings of the initial graph (Lines 5–7). Another module iteratively produces all subgraph mappings of the graph update stream (Lines 11–14). SNValid (, ) and RCValid (, ) verify the sequential and relational consistencies of selected node pairs, respectively (Lines 6 and 12). The selected node pairs are inserted into the subgraph mapping if both verifications succeed (Lines 7 and 13). FG (.successor) and FG (.successor) retrieve the successor of in and to traverse the node pairs of query vertices in forward order. FG (.precursor) and FG (.precursor) backtrack to the precursor of in the backward order of and , respectively.

Input: a merged flow graph and a matching tree
Output: the subgraph mappings of on
(1)if then
(2)    Output ;
(3)else
(4)    if then
(5)        for each do
(6)            if SNValid and RCValid then
(7)                ;
(8)                IncMatching (FG (.successor));
(9)        IncMatching (FG (.precursor));
(10)    else
(11)        for each do
(12)            if SNValid and RCValid then
(13)                ;
(14)                IncMatching (FG (.successor));
(15)        IncMatching (FG (.precursor));
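
The forward part of Algorithm 4 can be sketched as plain backtracking with the two verifications. This is a simplified illustration with hypothetical names, not the authors' code: candidates are assumed to be pre-filtered to node pairs with State = 1 (constraint (1) of Verification 1), edges are undirected and unlabeled, and the precursor (backward) step used for the graph update stream is omitted.

```python
from typing import Dict, FrozenSet, List, Set


def match(order: List[str],
          cand: Dict[str, List[str]],
          query_edges: Set[FrozenSet[str]],
          data_edges: Set[FrozenSet[str]],
          sn: Dict[str, Set[int]]) -> List[Dict[str, str]]:
    results: List[Dict[str, str]] = []
    mapping: Dict[str, str] = {}

    def rc_valid(u: str, v: str) -> bool:
        # Relational consistency: injective, and every query edge to an
        # already-matched query vertex is matched by a data edge.
        if v in mapping.values():
            return False
        return all(frozenset((mapping[w], v)) in data_edges
                   for w in mapping if frozenset((w, u)) in query_edges)

    def sn_valid(v: str) -> bool:
        # Sequential consistency: share a root number with every matched vertex.
        return all(sn.get(v, set()) & sn.get(x, set()) for x in mapping.values())

    def extend(i: int) -> None:
        if i == len(order):              # |M| equals |V(q)|: output M
            results.append(dict(mapping))
            return
        u = order[i]
        for v in cand.get(u, []):
            if sn_valid(v) and rc_valid(u, v):
                mapping[u] = v
                extend(i + 1)            # successor in forward order
                del mapping[u]           # backtrack

    extend(0)
    return results
```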

5. Experimental Evaluation

We conduct extensive performance studies to evaluate our incremental subgraph matching (iPM). All experiments are performed on an Intel Xeon E7520 processor with 12 MB of L3 cache. The system is equipped with 32 GB of main memory and runs a 64-bit Linux 3.13.0 kernel.

5.1. Experimental Settings

The performance of the algorithm mainly depends on two aspects: the query graph and the sliding window. The influencing factors include the Query Factor (QF), the Query Shape Factor (QSF), and the Data Window Factor (DWF).

Since the flow graph index uses the structural features and semantic labels of the query graph to prune the noncandidates of the data graph, the query graph is closely related to the size of the flow graph index. The size of the query graph is denoted as the Query Factor QF. The Query Shape Factor QSF refers to the shape of the query graph, such as chain, star, cyclic, and chain-star shapes [44], as shown in Figure 7. QSF is closely related to the density of the adjacent structure. For chain and star queries, the adjacency structure of a star-shaped query is denser than that of a chain-shaped query. A star-shaped query can therefore prune more noncandidates than a chain-shaped query, because a query with a denser structure prunes more noncandidates from the original data graph than a simpler one.

The Data Window Factor DWF captures the influence of the sliding window on producing subgraph mappings. The sliding window relates the changing size of the graph update stream to the fixed window size over the initial data graph.

The impact factors of iPM are listed in Table 4. The initial data graph contains RDF data of size 1.0, which is encapsulated into the fixed-size window during the initial traversal. Query graphs of different shapes (star, chain, and cycle) and scales (1–22 triple patterns) are used to evaluate the influence of query factors on producing subgraph mappings. The analysis of the initial data graph and the subgraph mappings is illustrated in Figure 8.

5.1.1. Dataset

A real-world dataset and a synthetic one are used in this paper.
(1) The NY Taxi Dataset is a publicly available real-world dataset with a total of 1 billion taxi-related RDF stream data. The dataset contains 17 different measurement values, such as taxi fares, locations, trip distance, and trip time. A query graph can correspond to at most 24 triple patterns.
(2) The Social Network Benchmark (SNB) [22] is a synthetic dataset that contains social data distributed into streams of GPS, posts, comments, photos, and user profiles. The dataset collects information about persons, their friendship network, and the content of messages between persons, e.g., posts, comments, and likes. We generated a total of 50 million RDF triples containing data for 30,000 users. We extracted 10,000 data graphs randomly from the SNB dataset, and their sizes are illustrated in Figure 8(a).

5.1.2. Query Graphs

Query graphs of different shapes (star, chain, and cycle) and scales (1–22 triple patterns) are designed to evaluate the influence of the query on matching over the data graph.
(1) Query-Star. A star query is a graph containing one instance node with multiple attributes. Thus, the scale of a star query depends on the number of attributes, and the performance evaluation of query-star is presented in Figures 9(a) and 9(d).
(2) Query-Chain. A chain query is a graph containing multiple instance nodes linked in a line. Thus, the scale of a chain query depends on the number of instance nodes, and the performance evaluation of query-chain is presented in Figures 9(b) and 9(e).
(3) Query-Cycle. A cycle query is a graph containing multiple instance nodes linked in the form of a cycle. The smallest cycle contains three instance nodes and three undirected edges. The scale of a cycle query is increased by repeatedly embedding a query vertex and two edges into it, and the performance evaluation of query-cycle is presented in Figures 9(c) and 9(f).
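The three query shapes can be generated as lists of triple patterns, as in the sketch below. Variable and predicate names such as `?x` and `p0` are placeholders, and the cycle generator simply builds a cycle of n instance nodes rather than reproducing the procedure for growing cycle queries described above.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) triple pattern


def star_query(k: int) -> List[Triple]:
    # One instance node with k attribute triples.
    return [("?x", f"p{i}", f"?a{i}") for i in range(k)]


def chain_query(k: int) -> List[Triple]:
    # k triples linking k + 1 instance nodes in a line.
    return [(f"?n{i}", f"p{i}", f"?n{i + 1}") for i in range(k)]


def cycle_query(n: int) -> List[Triple]:
    # n instance nodes closed into a cycle; the smallest cycle has n = 3.
    if n < 3:
        raise ValueError("a cycle query needs at least three instance nodes")
    return [(f"?n{i}", f"p{i}", f"?n{(i + 1) % n}") for i in range(n)]
```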

5.2. Analysis of Algorithms

In this section, we mainly examine the total execution time and the traversal time of iPM.

In the experimental evaluation, we focus on the count-based sliding window, since it can be adapted to a time-based one through a simple transformation. The initial sliding window contains RDF data of size 10,000 and slides by one item at a time. The performance evaluation of iPM is executed on a dataset containing 0.5 million RDF data items (about 10,000 data graphs).
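A count-based window that slides by one item can be sketched with a deque. Each step yields the item that expires and the item that arrives, which an incremental matcher would treat as a deletion and an insertion. Names are hypothetical; in the setting above the window size would be 10,000.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def count_window(stream: Iterable[T], size: int) -> Iterator[Tuple[Optional[T], T]]:
    """Yield (expired, arrived) for a count-based window that advances one
    item at a time; `expired` is None while the window is still filling."""
    if size < 1:
        raise ValueError("window size must be positive")
    window: Deque[T] = deque()
    for item in stream:
        expired = window.popleft() if len(window) == size else None
        window.append(item)
        yield expired, item
```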

SPECTRA [27] is chosen as the competitor for comparison with iPM. In SPECTRA, a set of vertically partitioned views collects the summarized data from each event, and sibling lists are employed to incrementally index the joined triples between views. The matched results are maintained in a set of views, enabling incremental evaluation as new events arrive.

The ten thousand data graphs extracted from the SNB dataset are described in Figure 8(a). Most data graphs contain between 15 and 20 RDF triples.

The number of subgraph results is shown in Figure 8(b), evaluated within a sliding window with a sliding interval of 500 RDF triples. The number of results increases steadily at first, because RDF data keep filling the fixed window during the initial execution. Afterwards, the curve fluctuates, because subgraph results are incrementally produced as the graph stream is updated.

The total matching times of star, chain, and cycle queries are measured for different numbers of triple patterns and are presented in Figures 9(a)–9(c), respectively. iPM shows a more significant advantage over SPECTRA (SPE) for star and cycle queries. As the number of triple patterns increases, the matching time of iPM grows approximately linearly, while that of SPE is closer to exponential growth.

The traversal times of star, chain, and cycle queries are measured for different numbers of triple patterns and are presented in Figures 9(d)–9(f), respectively. As the scale of the query graph grows, the traversal time of iPM first increases and then decreases, whereas that of SPE is closer to a linear or exponential growth trend. This variation suggests that large numbers of RDF triples are filtered out by candidate verification; thus, the structure and labels of the query graph help reduce the noncandidates in the flow graph index.

The performance evaluations on the NY taxi dataset are shown in Figures 10(a) and 10(b). Figure 10(a) depicts how different sliding sizes affect the intermediate results; this trend can be used to find the most suitable sliding size for continuous subgraph matching. For the NY taxi dataset, the most suitable sliding size is 200.

Figure 10(b) depicts the matching time for different query scales under the most suitable sliding size. The matching time first increases and then decreases as the number of triple patterns grows, which suggests that a large amount of RDF data is filtered out after .

The experimental results show that our method can handle complex query graphs (i.e., star and cycle queries) and large datasets, and that it also performs better on chain queries.

6. Conclusions

In this paper, a flow graph index is first proposed to prune the noncandidates of query vertices. The flow graph FG is a directed multigraph constructed from the initial data graph and guided by a matching order of the query graph. Then, a sequential subdivision technique of the flow graph is employed to limit the search expansion of incremental subgraph matching. The sequential numbers of root candidates are assigned to the vertices of the divided flow graphs and limit the search space originating from changed candidates of FG. To incrementally produce the subgraph mappings, a state transition model consisting of three states and six transition rules is first used to describe the state transitions of changed candidates. Based on this model, we analyze the influence of changed candidates on their adjacent regions and design our incremental maintenance strategy. Then, an incremental subgraph matching algorithm is executed on the sequentially divided flow graph. The consistency of subgraph matching is guaranteed by two verifications of selected candidates: relational and sequential verification. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

Data Availability

The NY Taxi data used to support the findings of this study have been deposited in the repository http:/chriswhong.com/open-data. Previously reported Social Network Benchmark (SNB) data were used to support this study and are available at DOI: 10.1145/2723372.2742786. These prior studies are cited at relevant places within the text as references [22, 27].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 61976032.