Abstract

Some cloud services may become invalid because they are located in a dynamically changing network environment. Service substitution is necessary when a cloud service cannot be used. Existing work on service substitution mainly focuses on service function and quality. To select a more suitable substitutive service, process collaboration similarity also needs to be considered. This paper proposes a cluster and process collaboration-aware method to achieve service substitution. To compute the process collaboration similarity, we use logic Petri nets to model service processes. All the service processes are transformed into path strings. Service vectors for cloud services are generated by Word2Vec from these path strings. The process collaboration similarity of two cloud services is obtained by computing the cosine value of their service vectors. Meanwhile, similar cloud services are grouped into a service cluster. By calculating function similarity and quality matching, a candidate set for service substitution is generated. The service in the candidate set with the highest process collaboration similarity to the invalid one is chosen as the substitutive one. Simulation experiments show that the proposed method is less time-consuming than traditional methods in finding a substitutive service. Meanwhile, the substitutive service has a high cooccurrence rate with the neighboring services of the invalid cloud service. Thus, the proposed method is efficient and integrates process collaboration well in service substitution.

1. Introduction

With the promotion of cloud computing applications, a variety of cloud services with different functions are quickly registered in various cloud computing platforms [1]. Users can easily search for and lease their expected cloud services in these cloud computing platforms. For example, “casicloud.com” is a cloud manufacturing platform providing manufacturing services. There were nearly 950,000 cloud services on the website, and more than 8800T of industrial data had been handled by these services by the end of August 2019 [2]. A new business application can be easily built by invoking these cloud services. To effectively address complicated service requests, we can assemble a group of cloud services as a composed service process with a specific business flow [3].

Service invocation is the most popular way to integrate existing cloud services in network-based software systems [4]. It can greatly reduce the time cost of constructing a new business system. To integrate a cloud service, an appropriate service that can properly respond to the service request needs to be selected. Since there are a large number of cloud services in cloud platforms, service discovery is a time-consuming process. Current service discovery methods face a large searching space, and their seeking processes are tedious and inefficient [5–7].

In the complicated and precarious network environment, some cloud services may become invalid while they are being invoked [8]. A substitutive service needs to be found once any of the component services is unavailable in service-oriented business systems [9]. Many service replacement methods are inefficient. The main reason for the low efficiency is that service substitution faces a large searching space. The existing work on service substitution mainly focuses on service function and quality. These methods can find a service to replace the invalid one, and the substitutive service is equivalent in service function and quality. However, it may not be able to cooperate with other services as well as the invalid one did. The main reason is that process collaboration is not considered in service substitution [10].

To find a more reasonable substitutive service quickly, we propose a cluster and process collaboration-aware method to achieve service substitution. To improve the efficiency of service discovery and substitution, we cluster cloud services with the same or similar functions as a group, named a service cluster. The clustering mechanism reduces the service searching space and thereby improves the efficiency of service discovery and substitution. We also take process collaboration of the component services into consideration. The candidate service with the highest process collaboration similarity to the invalid one is recommended for service substitution. The main contributions of this paper are as follows:
(1) We introduce a clustering mechanism to reduce the service searching space. The efficiency of service discovery and substitution is prominently increased.
(2) A method to evaluate process collaboration similarity is proposed. Service processes are transformed into path strings. We train service vectors for cloud services by Word2Vec based on these path strings. Then, process collaboration similarity is obtained by computing the cosine value of service vectors.
(3) Service function, quality, and process collaboration are comprehensively considered to achieve service substitution. The proposed cluster and process collaboration-aware method is obviously superior to the existing methods in service substitution.

The rest of this paper is organized as follows: Section 2 introduces the related work on service substitution; Section 3 presents the similarity computation of process collaboration; Section 4 introduces the concept of service cluster and the service response schema based on service clusters and then proposes the cluster and process collaboration-aware method for substituting cloud services; Section 5 presents simulation experiments; and Section 6 concludes this work.

2. Related Work

Finding an appropriate service for the invalid one is a key task in service substitution. Thus, the existing service discovery methods can offer an important reference for the research of service substitution. Cheng presents a diversified keyword search approach on service connection graphs. This method can satisfy the various possible requirements underlying a given keyword query [11]. Zhang defines a service composition context model based on three types of parameter correlations between service input and output parameters. The similarity between any two services is measured using the PersonalRank and SimRank++ algorithms by the composition context model [12].

Chen proposes a new measure of semantic similarity integrating multiple conceptual relationships for web service discovery. The new measure enables more accurate service-request comparison by treating different conceptual relationships in ontologies such as is-a, has-a, and antonymy differently [13]. A comprehensive ontology has been developed to provide a standardized semantic specification of cloud services based on their functional features and nonfunctional features in [7]. The authors present an intelligent cloud service discovery framework based on these ontology concepts to identify cloud services. The average amount of error expected to identify a service by using the proposed framework is 11% compared to 31% by using the cloud service discovery solution. The hierarchical Dirichlet processes (HDP) model and the personalized PageRank algorithm are used to achieve a two-stage model for cloud service recommendation by integrating the information of service descriptive texts and service tags [14]. Nabli proposes a self-adaptive semantic focused crawler based on latent Dirichlet allocation (LDA) for efficient cloud service discovery [15]. A method to learn features from service descriptions by using variational autoencoders is proposed by Lizarralde. It achieves significant gains compared to both word embeddings and classic latent features modelling techniques [16]. The above methods are the latest service discovery methods from the last three years. We can see that service context, comprehensive ontologies about cloud services, and vector-based service similarity calculation receive the most attention in service discovery.

In the domain of service substitution, researchers have also presented some effective methods. For example, Gong employs a cloud model to compute the QoS uncertainty to determine dynamic substitute targets. By targeting substitutions, the reconfigured web service will better satisfy users’ requirements [17]. Three rules are provided to establish the compatibility and substitution of service operation interfaces [18]. Experiments show that service substitute identification based on the proposed framework achieves a best precision of 85%. By recording execution context data and mining the execution context conditions, an execution context-aware approach for web service substitution is proposed in [19].

Santhanam uses preference networks to represent and reason about preferences over nonfunctional properties in service substitution [20]. The proposed method is independent of the specific formalism used to represent functional requirements of a composite service as well as the specific algorithm used to assemble the composite service. By computing the similarity degree between interface data and analyzing critical paths, Gao presents a method to check the data consistency for the dynamic replacement of a service process [21]. This method provides fundamental theoretical guidance to enhance the credibility of the service process in the modern service industry. In recent research work, Sara et al. presented a similarity network for web service operation substitution [22]. The nodes represent the operations of the web services. A link joins two similar operations according to some relationships defined between them. According to the authors, the constituted network supports substitution better and much more easily than existing works.

The aforementioned methods must go through a large number of cloud services to find the substitutive one in service substitution. Most of the previously mentioned methods are time-consuming. In [23], Wu proposes deploying a web service cluster to perform service substitution. A service cluster contains a logic service and a set of concrete services, and these concrete services have functional equivalence or compatibility. Du converts a service cluster into a service cluster net unit, which is used to analyze whether the services in the cluster can satisfy some service requests [24]. However, service clusters in their methods are restricted to services with the same interfaces. They can reduce the searching space, but the flexibility of substitution still needs improvement.

The existing work on service substitution mainly focuses on service function and quality. Response time and substitution recall rate are the main evaluation indexes. Some researchers also give theoretical analysis to prove the feasibility and effectiveness of their approaches. Few studies have taken process collaboration into consideration. The substitutive service will cooperate better with other services once process collaboration relations are added to service substitution. In this paper, we introduce process collaboration similarity into service substitution and investigate the service cooccurrence rate to show the benefits of our method.

3. Similarity Computation of Process Collaboration

A service process is composed of several cloud services. These cloud services cooperate to accomplish the service requests from tenants. We can obtain the collaboration similarity from the existing service processes.

Two factors determine the collaboration similarity of cloud services: one is the cooccurrence rate, and the other is the process distance. Two cloud services have a higher collaboration similarity if they appear together more often than other services. Moreover, two services also have a higher collaboration similarity if the process distance between them is smaller.
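The two factors can be illustrated with a minimal sketch. It assumes that each service process is available as a flat sequence of service identifiers and that process distance is the smallest positional gap between two services in one process; both are simplifying assumptions, and the function names and sample processes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(processes):
    """Count in how many processes each unordered pair of services appears together."""
    counts = Counter()
    for proc in processes:
        for a, b in combinations(sorted(set(proc)), 2):
            counts[(a, b)] += 1
    return counts

def process_distance(proc, a, b):
    """Smallest positional gap between services a and b in one process, or None."""
    pos_a = [i for i, s in enumerate(proc) if s == a]
    pos_b = [i for i, s in enumerate(proc) if s == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

# Hypothetical processes, each a sequence of service identifiers.
processes = [
    ["query", "reserve", "defray", "delivery"],
    ["query", "reserve", "delivery", "defray"],
    ["query", "query_fail"],
]
counts = cooccurrence_counts(processes)
```

Word2Vec captures both effects implicitly through its context window, so explicit counts like these are mainly useful as a sanity check on the trained vectors.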

3.1. Path Strings of Service Processes

In this study, we first convert service processes into path strings. Then, we train service vectors for all the cloud services in these path strings by Word2Vec. Finally, we compute the collaboration similarity based on these service vectors. To obtain path strings, we use service nets to formally model cloud service processes. In service nets, a logic Petri net is employed to describe the business flow. We first give the definition of a logic Petri net.

Definition 1. (logic Petri net [25])
LPN = (P, T, F, I, O, M) is a logic Petri net, where
(1) P is a set of places
(2) T includes three subsets of transitions, i.e., T = TD ∪ TI ∪ TO, where TD denotes a set of traditional transitions, TI denotes a set of logic input transitions, and TO denotes a set of logic output transitions
(3) F is a flow relation, i.e., a set of directed arcs F ⊆ (P × T) ∪ (T × P)
(4) I and O are mapping functions from logic input transitions to logic input expressions and from logic output transitions to logic output expressions, respectively, i.e., ∀t ∈ TI, I (t) = fI (t); ∀t ∈ TO, O (t) = fO (t)
(5) M: P ⟶ {0, 1} is a marking function, and ∀p ∈ P, M (p) denotes the token count in p
(6) Transition firing rules
(a) If t ∈ TD, the firing rules of t are the same as in PNs
(b) If t ∈ TI, t is enabled only if fI (t)|M = T, where T denotes the logic value “true”. M [t > M′, where ∀p ∈ •t, M′ (p) = 0; ∀p ∈ t•, M (p) = 0 and M′ (p) = 1; and ∀p ∉ •t ∪ t•, M′ (p) = M (p)
(c) If t ∈ TO, t is enabled only if ∀p ∈ •t: M (p) = 1. M [t > M′, where ∀p ∈ •t: M′ (p) = M (p) − 1; ∀p ∉ •t ∪ t•: M′ (p) = M (p); and ∀p ∈ t• must satisfy fO (t)|M′ = T; i.e., t• must satisfy the logic output expression fO (t) at M′
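As an illustration of firing rule (b), the following sketch fires a logic input transition on a 0/1 marking. It assumes the reading of the rule in which all input places of t are cleared and every output place receives a token; the function name, the dictionary representation of the marking, and the sample places are hypothetical.

```python
def fire_logic_input(marking, pre, post, f_in):
    """Fire a logic input transition t, or return None if t is not enabled.

    marking: dict place -> 0/1; pre, post: the places in the preset and
    postset of t; f_in: callable evaluating the logic input expression.
    """
    if not f_in(marking):
        return None                      # the logic input expression is false at M
    if any(marking[p] for p in post):
        return None                      # output places must be empty before firing
    new_marking = dict(marking)
    for p in pre:
        new_marking[p] = 0               # all input places are cleared
    for p in post:
        new_marking[p] = 1               # every output place receives a token
    return new_marking

# Hypothetical transition labelled with p11 OR p12 and output place p13.
either = lambda m: bool(m["p11"] or m["p12"])
m0 = {"p11": 1, "p12": 0, "p13": 0}
m1 = fire_logic_input(m0, ["p11", "p12"], ["p13"], either)
```

The original marking is left untouched, so the same marking can be reused to test several transitions.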

Definition 2. (service net)
A service net SN = (LPN, i, o, L) is a labelled logic Petri net, where
(1) LPN is the process model of a service process, where TD denotes the component services. P = Pc ∪ Pd, where Pd is a set of data places interacting with the external services and Pc is a set of control places representing the states of the service process
(2) i is the initial place, and o is the terminal place of a service process, with •i = o• = ∅
(3) L: T ⟶ Θ is a mapping function, where Θ is a set of cloud service names

Definition 3. (preset/postset)
For a service net LPN and x ∈ P ∪ T, •x = {y | (y, x) ∈ F} is called the preset of x, and x• = {y | (x, y) ∈ F} is called the postset of x.
To get the preset and postset of x, we introduce two operations, π and τ, to compute •x and x•. In this study, π (x) = •x and τ (x) = x•.
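The operations π and τ can be sketched directly from the flow relation. The example assumes F is stored as a set of (source, target) pairs; the node names in the sample fragment are hypothetical.

```python
def preset(x, flow):
    """pi(x): all nodes y with an arc (y, x) in the flow relation."""
    return {y for (y, z) in flow if z == x}

def postset(x, flow):
    """tau(x): all nodes z with an arc (x, z) in the flow relation."""
    return {z for (y, z) in flow if y == x}

# Hypothetical fragment of a service net: i -> t1 -> p1 -> t2 -> o
flow = {("i", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "o")}
```

In this fragment the initial place i has an empty preset and the terminal place o has an empty postset, which matches condition (2) of Definition 2.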

Definition 4. (paradigm of logic expressions)
F = {f1, f2, …, fn} is a group of service nets, and to and ti are the logic transitions that link F, i.e., ∀fj: π (fj.i) = {to} ∧ τ (fj.o) = {ti}. The paradigm of logic expressions is defined as follows:
(1) O: O (to) = (f1.i ∧ ¬f2.i ∧ … ∧ ¬fn.i) ∨ (¬f1.i ∧ f2.i ∧ … ∧ ¬fn.i) ∨ … ∨ (¬f1.i ∧ ¬f2.i ∧ … ∧ fn.i)
(2) O: O (to) = f1.i ∧ f2.i ∧ … ∧ fn.i
(3) I: I (ti) = f1.o ∨ f2.o ∨ … ∨ fn.o
(4) I: I (ti) = f1.o ∧ f2.o ∧ … ∧ fn.o
A service net for online shopping is provided in Figure 1. The service process described by this service net is initialized by inquiring about some merchandise. If the query fails, a service is presented to show failure information. If the online seller can provide the merchandise, another service process that purchases the merchandise is triggered. Both payment before receipt and receipt before payment are supported in online trade, so two subprocesses are presented. One is “reserve-defray-delivery,” and the other is “reserve-delivery-defray.” However, the logic expression labelled on transition to′ is (p5 ∧ ¬p6) ∨ (¬p5 ∧ p6), which means that only one of the places p5 and p6 can be assigned a token. Thus, only one of the two subprocesses can be performed. Similarly, ti′ is labelled with the logic expression p11 ∨ p12, which means that p13 can get a token once p11 or p12 has obtained a token. How to construct service nets for service processes can be found in our previous work [26].
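The paradigms of Definition 4 can be checked with a few small predicates. The sketch below assumes that the token state of each linked place is given as a Boolean and reads paradigm (1) as "exactly one place holds a token"; the function names are hypothetical.

```python
def exclusive_choice(tokens):
    """Paradigm (1): true when exactly one of the listed places holds a token."""
    return sum(1 for t in tokens if t) == 1

def parallel(tokens):
    """Paradigms (2) and (4): true when every listed place holds a token."""
    return all(tokens)

def any_branch(tokens):
    """Paradigm (3): true when at least one listed place holds a token."""
    return any(tokens)

def shopping_choice(p5, p6):
    """The expression labelled on the inner output transition of the shopping example."""
    return (p5 and not p6) or (not p5 and p6)
```

For two places, `exclusive_choice([p5, p6])` and `shopping_choice(p5, p6)` agree on all four token assignments, which is why only one of the two purchasing subprocesses can run.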
To compute service vectors for the component services in the service processes, we convert the paths of service nets into strings, named path strings. Many path strings can be acquired from the existing service nets, and the symbols in these path strings form a corpus. Then, Word2Vec is employed to train service vectors for the component services on the path strings mapped from the service nets. Finally, we obtain the collaboration similarity of two cloud services by computing the cosine similarity of their service vectors. An introduction to Word2Vec can be found in [27]. The rest of this section presents how to generate path strings and gives an algorithm to compute the service vector for each cloud service.
There are four types of basic process structures in the service nets: sequence, choice, parallel, and loop. The concept “one-fold service process” is proposed in our previous work [25]. In a one-fold service process, logic expressions labelled on logic transitions in the structures of choice and parallel must strictly follow Definition 4. Meanwhile, nested structures must not appear in a one-fold structure. Four types of basic one-fold structures are illustrated in Figure 2. A merge-reduced method is introduced to generate a path string in this study. In the merging phase, all basic one-fold structures are mapped into path strings. The one-fold structures (a), (b), (c), and (d) can be merged as the path strings “t1t2t3…tn,” “Ot1⊗t2⊗t3⊗…⊗tn⊗I,” “Ot1||t2||t3||…||tn||I,” and “ti,” respectively.
In practice, there are very few one-fold structures in the service nets. If a transition ti is replaced by a service process spi, we first merge all the transitions in spi and then replace ti by the path string generated from spi. In the reducing phase, we link all the path strings obtained from the four types of process structures to generate a path string for a service net. The path string of the service net in Figure 1 is generated as “t1tot2||tot3t4t5||t3t5t4||ti′||ti” by this method. Here, service names have been mapped into symbols as {t1: query, t2: query fail, t3: reserve, t4: defray, t5: delivery}. Details about how to obtain the path string for a service net are presented in Algorithm 1.

Algorithm 1: Path string generation (PathString_Generate).
Input: service net SN;
Output: path string PS of SN;
(1) t1 = t (SN.i);
(2) PS = t1;
(3) tc = t1;
(4) tn = t (τ (t1));
(5) while (tn ≠ Null)
(6) { if (tc ∈ SN.Ts ∧ tn ∈ SN.Ts) PS = PS + tn;
(7)  if (tc ∈ SN.Ts ∧ tn ∈ SN.TO)
(8)  { m = |τ (tn)|;
(9)   For j = 1 to m
(10)    obtain a place p in τ (tn) and build service net SubSNj with SubSNj.i = p;
(11)    spj = PathString_Generate (SubSNj);
(12)    if (O (tn) is a parallel expression) PS = PS + spj + ||;
(13)    if (O (tn) is a choice expression) PS = PS + spj + ⊗;
(14)   End for
(15)   if (tc ∈ SN.Ts ∧ tn ∈ SN.TO ∧ j == m) PS = PS + tn; }
(16)  tc = tn; tn = t (τ (tc)); }
(17) return (PS);

3.2. Computation of Process Collaboration Similarity

Given a group of cloud services, process collaboration similarity is used to evaluate to what extent two cloud services can cooperate with the same partner services. Normally, a high process collaboration similarity between two cloud services means that they share more partner cloud services in service processes. Since service vectors for cloud services can be trained from the path strings, the process collaboration similarity of two cloud services can be obtained by computing the cosine similarity of their service vectors.

Assume there are two resource pools: the cloud service clusters pool (CSCP) and service net pool (SNP). All the service clusters and cloud services are organized in CSCP. Meanwhile, the existing cloud service processes have been transformed into service nets and stored in SNP. Algorithm 2 provides a method to generate word vectors and service vectors for cloud services and service clusters in CSCP.

Algorithm 2: Generation of word vectors and service vectors.
Input: services and service clusters in CSCP;
Output: vectors for the services and service clusters
(1) Construct two corpora CP1 and CP2;
(2) CP1 = CP2 = Φ;
(3) For each cloud service S in CSCP
(4)  Obtain the sentence Send in S.D.Ft and CP1 = CP1 ∪ {Send};
(5) End for
(6) For each service net SN in SNP
(7)  ps = PathString_Generate (SN);
(8)  Delete the symbols || and ⊗ from ps;
(9)  CP2 = CP2 ∪ {ps};
(10) End for
(11) For each cloud service S and service cluster SC in CSCP
(12)  Train word vectors S.WOp and S.WTh for the words in S.D.Op and S.D.Th by CP1;
(13)  Train word vectors SC.WOp and SC.WTh for the words in SC.D.Op and SC.D.Th by CP1;
(14)  Train service vector PS for S by CP2;
(15) End for
(16) Return (S.WOp, S.WTh, SC.WOp, SC.WTh and PS);

In Algorithm 2, we first construct two corpora, CP1 and CP2. CP1 consists of the description sentences of all the cloud services. All the path strings of service nets are gathered in CP2 (lines 1 to 10). For the cloud services and service clusters, CP1 is used to train word vectors for the words in the function description items D.Op and D.Th. These word vectors are used to compute the function similarity of cloud services and service clusters when finding the candidate service set (see lines 11 to 13). In line 14, CP2 is used to train service vectors for cloud services. Since CP2 consists of the path strings, the service vectors trained by CP2 can be adopted to calculate the collaboration similarity.
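The construction of CP2 (lines 6 to 10 of Algorithm 2) can be sketched as follows. The example assumes each path string has already been split into a list of tokens, one per service symbol or separator; the function names and the sample token lists are hypothetical.

```python
SEPARATORS = {"||", "⊗"}

def to_sentence(path_tokens):
    """Drop the structural separators so that only transition symbols remain."""
    return [tok for tok in path_tokens if tok not in SEPARATORS]

def build_cp2(path_strings):
    """Turn tokenised path strings into the sentences of corpus CP2."""
    return [to_sentence(ps) for ps in path_strings]

# Hypothetical tokenised path strings that use serial numbers as service symbols.
path_strings = [
    ["17", "O", "42", "||", "58", "||", "I", "63"],
    ["17", "42", "63"],
]
cp2 = build_cp2(path_strings)
```

Each resulting list plays the role of one sentence, so the corpus can be passed to any Word2Vec implementation that accepts tokenised sentences.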

Definition 5. (process collaboration similarity)
Assume S is a set of cloud services. Let PS be the set of path strings of all the services in S. For two services Si and Sj in S, Pi and Pj are the service vectors of Si and Sj trained on the corpus PS. The collaboration similarity of Si and Sj is defined as the cosine of their service vectors: CollSim (Si, Sj) = (Pi · Pj)/(‖Pi‖ ‖Pj‖).
Notice that we omit the semantics of the symbols in path strings, and only the positional adjacency of different symbols is considered to train the vectors. Thus, we use the serial numbers of cloud services to generate path strings in practice.
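A minimal sketch of the cosine computation behind CollSim is given here. It assumes the service vectors are plain lists of floats of equal length; returning 0.0 for a zero vector is a choice made for this sketch, not something the definition specifies.

```python
import math

def coll_sim(p_i, p_j):
    """Cosine similarity of two service vectors."""
    if len(p_i) != len(p_j):
        raise ValueError("service vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(p_i, p_j))
    norm_i = math.sqrt(sum(a * a for a in p_i))
    norm_j = math.sqrt(sum(b * b for b in p_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_i * norm_j)
```

Because the cosine ignores vector length, two services whose vectors point in the same direction are rated as fully similar regardless of how often each one occurs in the corpus.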

4. Service Substitution Based on Clustering and Process Collaboration-Aware Method

In this section, we first introduce the concept of service cluster, present the service response schema based on service clusters, and then propose the cluster and process collaboration-aware method to achieve service substitution.

4.1. Service Response Schema Based on Service Clusters

Several similar notions for describing a group of web services have been put forward in the existing research, such as service pool [28], service class [29], and service cluster [26, 30]. Cloud services grouped under these notions are required to have the same input and output parameters. Thus, they offer little flexibility in service substitution because they can only achieve service migration between services with the same interfaces.

In this paper, we do not require all the cloud services in a service cluster to have the same interfaces. Cloud service and service cluster are formally defined as follows.

Definition 6. (cloud service)
A cloud service is a 6-tuple Cls = (N, D, I, O, Q, L), where
(1) N is the serial number of the cloud service in the cloud service platform
(2) D is a function description of the cloud service
(3) I and O are the sets of input and output parameters, respectively
(4) Q is a set of quality parameters
(5) L is the URI of the cloud service
The function description of a cloud service is defined as D = <Op, Th, Ft>. Here, Op, Th, and Ft are the operation, theme, and function text of a cloud service, respectively. For example, a weather forecast service is set as D = <query, weather, “the service can provide the weather forecast, users present the city and date, and then, the service can return temperature, humidity, ultraviolet intensity, and wind speed.” >.
As is well known, service quality is an important factor in evaluating a cloud service. Cloud services share many common quality attributes, such as response time, cost, and reliability. Besides, there may be other quality attributes related to the practical application domain of cloud services. For example, the manufacturing cycle and the level of after-sales service are of particular concern to tenants in cloud manufacturing.
Here, all these attributes are defined as quality parameters. We formally define them as Q = {qi}, qi = (n, c, v, u), where n is the name of the quality parameter, c is a comparison operator, v is the value of the parameter, and u is the unit of the quality parameter. If a cloud manufacturing service is assigned q = (manufacturing cycle, ≤, 5, day), it means that the manufacturing cycle is no longer than five days.
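A quality parameter can be represented as a small tuple and checked against an observed value. The sketch below assumes the four components are name, comparison operator, value, and unit, and that operators are written as the strings shown in OPS; the helper name `satisfies` is hypothetical.

```python
import operator

# Assumed textual forms of the comparison operator c.
OPS = {
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
    "=": operator.eq,
}

def satisfies(q, observed):
    """Check whether an observed value meets quality parameter q = (n, c, value, u)."""
    name, comparator, value, unit = q
    return OPS[comparator](observed, value)

# The manufacturing-cycle example: no longer than five days.
cycle = ("manufacturing cycle", "<=", 5, "day")
```

A service with an observed manufacturing cycle of four or five days meets this parameter, while one that needs six days does not.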

Definition 7. (service cluster)
A service cluster is a 6-tuple Sec = (N, D, I, O, S, Q), where
(1) N is the serial number of cloud service in cloud service platform
(2) D is a function description of a service cluster
(3) I and O are the sets of input and output parameters
(4) S = {cls1, cls2, …, clsn} is the set of component services in a service cluster, where clsi is a cloud service with 1 ≤ i ≤ n
(5) Q = {qi}, where qi = (n, c, [vl, vu], u), and vu and vl represent the upper and lower bound values of qi, respectively
Figure 3 shows the architecture of the service response schema based on service clusters. Cloud services published by service providers are stored in the physical resource layer. A service cluster is a mapping collection of these services, and all the service clusters constitute the virtual resource layer [23].
The tenant request is modelled and submitted in the business model layer. It can be answered in two ways: by a single service or by a service composition. When a responding service becomes invalid, we can find another service to substitute it in its corresponding service cluster. In the majority of cases, the searching space is only the size of the corresponding service cluster; thus, the efficiency of service substitution can be greatly improved.

4.2. Service Substitution

A cluster and process collaboration-aware method is proposed to achieve service substitution in this paper. The method can be divided into two steps: (1) We find a candidate service set for substitution based on service clusters. All these candidate services can replace the invalid one in terms of service function and quality. (2) We compute the vector similarity between the candidate services and the invalid one so as to obtain the collaboration intensity. By comprehensive consideration of function, quality, and collaboration, the cloud service with the highest similarity to the invalid one is selected to perform service substitution.

Definition 8. (functional similarity)
S1 and S2 are two cloud services. Let WOpi and WThi be word vectors of Si.D.Op and Si.D.Th, respectively, where i = 1, 2. The functional similarity of S1 and S2 is defined as FuncSim (S1, S2). FuncSim (S1, S2) = .
Two cloud services S1 and S2 are called functionally equivalent if FuncSim (S1, S2) ≥ δ. Here, δ is a threshold value. Functional equivalence of two cloud services is denoted by S1 ↔ S2.

Definition 9. (parameter compatibility)
Px and Py are two parameters. Parameter compatibility of Px and Py is the replaceable degree of Px and Py, denoted as PC (Px, Py).
Parameter compatibility is used to evaluate whether two groups of parameters can replace each other. It is divided into three levels in this study. To differentiate the levels, we introduce three functions Num (P), Type (P), and Value (P) to represent the number, type, and value of parameter P, respectively. The partition rules of parameter compatibility are formally described as follows:
(1) PC (Px, Py) = L1 if Num (Px) ≤ Num (Py) and ∀mi ∈ Px, ∃nj ∈ Py: mi ⟷ nj ∧ Type (mi) = Type (nj)
(2) PC (Px, Py) = L2 if PC (Px, Py) = L1 ∧ PC (Py, Px) = L1
(3) PC (Px, Py) = L3 if PC (Px, Py) = L1 and ∀mi ∈ Py, ∃nj ∈ Px: Value (mi) ⊆ Value (nj)
PC (Px, Py) = L1 means that Px is a subset parameter of Py, and it is symbolically represented as Px ∝ Py. Similarly, PC (Px, Py) = L2 means that Px is an isomorphic parameter of Py, and it is symbolically represented as Px ⇔ Py. Meanwhile, PC (Px, Py) = L3 is denoted as Px ≥ Py.
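The three compatibility levels can be sketched as predicates over parameter groups. The example assumes each group is a dictionary from parameter name to a type and a set of admissible values, and that two parameters correspond (mi ⟷ nj) when they share a name; this matching rule and all names in the code are assumptions made for illustration.

```python
def is_l1(px, py):
    """Px is a subset parameter of Py: Px is no larger than Py and each
    parameter of Px has a same-named, same-typed counterpart in Py."""
    if len(px) > len(py):
        return False
    return all(name in py and py[name]["type"] == p["type"]
               for name, p in px.items())

def is_l2(px, py):
    """Px is an isomorphic parameter of Py: L1 holds in both directions."""
    return is_l1(px, py) and is_l1(py, px)

def is_l3(px, py):
    """L1 holds and every value set in Py is contained in some value set of Px."""
    return is_l1(px, py) and all(
        any(m["values"] <= n["values"] for n in px.values())
        for m in py.values()
    )

# Hypothetical parameter groups.
px = {"city": {"type": "string", "values": {"Beijing", "Qingdao"}}}
py = {"city": {"type": "string", "values": {"Beijing"}},
      "date": {"type": "date", "values": {"2019-08-31"}}}
py_small = {"city": {"type": "string", "values": {"Beijing"}}}
```

Here px is a subset parameter of py but not isomorphic to it, whereas px and py_small are isomorphic and px additionally covers the value range of py_small.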

Definition 10. (quality score)
S = {cls1, cls2, …, clsm} is the set of component cloud services in a service cluster. Assume that each service in S has n quality parameters, i.e., <qi1, qi2, …, qin> are the quality parameters of clsi. The quality score of clsi is defined as Qscore (clsi) in formula (1).
Quality parameters of a cloud service can be divided into two types: positive parameters and negative parameters. A positive parameter indicates higher quality when it is assigned a bigger value. On the contrary, a negative parameter indicates lower quality when it is assigned a bigger value. Formula (2) is adopted to scale positive parameters, while formula (3) is utilized to scale negative parameters. The quality score can be computed by formula (1) after all quality parameters have been normalized.
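The following sketch shows one common way to realize such a score. It assumes min-max scaling for both parameter types and an unweighted mean of the scaled values as the score; these choices stand in for formulas (1) to (3), which may differ in detail, and all names and sample values are hypothetical.

```python
def scale(values, positive=True):
    """Min-max scale one quality parameter across all services in a cluster."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # assumption: identical values all score 1
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

def quality_scores(matrix, positive_flags):
    """matrix[i][j] is the value of quality parameter j for service i."""
    columns = list(zip(*matrix))
    scaled = [scale(list(col), flag) for col, flag in zip(columns, positive_flags)]
    n = len(positive_flags)
    return [sum(col[i] for col in scaled) / n for i in range(len(matrix))]

# Hypothetical cluster: reliability (positive) and response time in ms (negative).
matrix = [[0.99, 100], [0.95, 200], [0.97, 150]]
scores = quality_scores(matrix, [True, False])
```

Scaling within the cluster keeps every parameter in [0, 1], so parameters with different units contribute comparably to the score.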
In Algorithm 3, CSCP is a cloud service cluster pool. All the cloud services and service clusters are stored in CSCP. In line 1, we initialize two empty sets, i.e., CSR and cs_r. All the possible cloud service clusters which can provide the similar function are enrolled into CSR. The candidate service set for substitution is represented as cs_r. By traversing CSCP, we can obtain every cloud service cluster in line 2. Functional similarity between each service cluster and the invalid service Se is computed, and the service cluster will be added to CSR if the function similarity is larger than a threshold δ.
For each component cloud service in CSR.S, we apply interface and quality matching in line 6 and line 7. At the interface level, cs1 can replace cs2 if the input parameters of cs1 are subset parameters of the input parameters of cs2, while the output parameters of cs2 are subset parameters of the output parameters of cs1. Meanwhile, the quality parameters of cs1 should provide a wider range of values than those of cs2. For cs ∈ CSR.S and Se, this can be formally described as cs.I ∝ Se.I ∧ Se.O ∝ cs.O ∧ cs.Q ≥ Se.Q. Finally, the candidate service set for the substitution of Se is obtained in line 8 as the set cs_r.
From line 10 to line 12, we give a comprehensive scoring method to rank the candidates by service quality and collaboration similarity. Here, the weights α and β can be set according to the tenants' preferences. Both α and β are set to 0.5 in this paper. The top-rated cloud service is returned to substitute the invalid service Se in lines 13 and 14.
Compared with traditional service discovery or substitution, our method needs to maintain additional service clusters. The number of service clusters directly affects resource consumption. To verify how the granularity of service clusters affects service lookup time, we have grouped the 5000 cloud services into 50, 100, 200, 400, 600, 800, and 1000 service clusters, respectively. We find that service discovery is highly efficient when the number of service clusters is about 20%–40% of the total number of services.
In previous work, we have discussed the impact of service cluster granularity on service discovery from three aspects: quantity, structure, and quality [31]. However, we cannot give a specific granularity value at which service discovery is most efficient, because we are unable to determine the number of services in each service cluster in advance. From experiments, we can conclude that the best granularity is normally reached when the number of service clusters is about 20%–40% of the total number of services. Thus, we estimate that resource consumption increases by 20%–40% when service clusters are introduced in our method. In addition, we need to add a 200-dimensional vector for each service and its functional description to calculate the functional similarity. Similar resource consumption also exists in other work on vector-based service similarity calculation [14–16].

Algorithm 3: Cluster and process collaboration-aware service substitution.
Input: the cloud service cluster pool CSCP; the invalid cloud service Se;
Output: the substitutive cloud service St for Se.
(1) CSR = Ø; cs_r = Ø;
(2) For each Sec ∈ CSCP
(3)  compute FuncSim (Sec, Se);
(4)  if (Sec ↔ Se) CSR = CSR ∪ {Sec};
(5) End for
(6) For each cs ∈ CSR.S
(7)  if (cs.I ∝ Se.I ∧ Se.O ∝ cs.O ∧ cs.Q ≥ Se.Q)
(8)   cs_r = cs_r ∪ {cs};
(9) End for
(10) For each cloud service S in cs_r
(11)  RecomGrade (S) = αQscore (S) + βCollSim (S, Se);
(12) End for
(13) St = {S | max (RecomGrade (S)) ∧ S ∈ cs_r};
(14) Return (St);
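The ranking step in lines 10 to 14 can be sketched as follows. The example assumes the quality score and the collaboration similarity of each candidate have already been computed; the dictionary layout, function names, and sample scores are hypothetical.

```python
def recom_grade(qscore, coll_sim, alpha=0.5, beta=0.5):
    """Weighted combination of quality score and collaboration similarity."""
    return alpha * qscore + beta * coll_sim

def select_substitute(candidates, alpha=0.5, beta=0.5):
    """candidates maps a service id to (Qscore, CollSim with the invalid service).

    Returns the id with the highest recommendation grade, or None when the
    candidate set is empty.
    """
    if not candidates:
        return None
    return max(candidates,
               key=lambda s: recom_grade(*candidates[s], alpha, beta))

# Hypothetical candidate set with precomputed scores.
candidates = {"cs1": (0.9, 0.4), "cs2": (0.6, 0.9), "cs3": (0.7, 0.7)}
best = select_substitute(candidates)
```

With equal weights the service that collaborates best wins here even though its quality score is the lowest, while setting α = 1 and β = 0 falls back to a purely quality-based choice.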

5. Simulation Experiments

Simulation experiments are conducted to show the efficiency of the proposed method. The hardware is as follows: the CPU is an i5-8500 at 3.0 GHz with six cores, the memory is 16 GB, and the graphics card is a GTX1060 with 6 GB. The simulation program is written in Java.

“Casicloud.com” is a well-known industrial Internet platform in China. A large number of cloud manufacturing services are registered on this platform. We crawled 3780 cloud services from “casicloud.com.” These cloud services belong to the same manufacturing domain. Four hundred cloud services are randomly selected, and we manually build two to five similar services for each selected cloud service. The total number of cloud services in the simulation experiments is 5000.

We first present an experiment to obtain a reasonable value for the threshold δ in Definition 8. The function texts of all the cloud services are collected to form a corpus, and Word2Vec is used to train the word vectors for the terms in the operation and theme. The value of δ is set to 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95 in turn. For each value, we randomly select a cloud service as the invalid service and, by computing function similarity and interface matching, find a substitutive service for it among the 5000 cloud services. The accuracy and recall rate of substitution for the different threshold values are shown in Figure 4. By analyzing the trend of the two curves, we select 0.85 as the value of δ. The following experiments are conducted with δ set to 0.85.
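
The threshold sweep described above can be sketched as follows. This is a hedged illustration rather than the procedure actually used: it assumes that "accuracy" means the fraction of retrieved services that are truly substitutable (precision), that a precomputed similarity score to the invalid service is available for every candidate, and that the set of truly substitutable services is known. The function names are hypothetical.

```python
def precision_recall(scores, relevant, delta):
    """Precision and recall of the services whose similarity reaches delta.

    scores: dict mapping a service name to its similarity to the invalid service.
    relevant: set of service names that are truly substitutable (ground truth).
    """
    retrieved = {name for name, sim in scores.items() if sim >= delta}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def sweep(scores, relevant, deltas=(0.7, 0.75, 0.8, 0.85, 0.9, 0.95)):
    """Evaluate every candidate threshold; returns {delta: (precision, recall)}."""
    return {d: precision_recall(scores, relevant, d) for d in deltas}
```

A low threshold retrieves many services and favors recall, while a high threshold favors precision, so a value is typically chosen where the two curves balance.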

Our method has two advantages. First, introducing service clusters and vector-based similarity calculation improves the efficiency of service discovery in service substitution. Second, the recommended substitutive service has a high collaboration similarity to the invalid one.

To verify the performance of the proposed method, we compare it with the methods of Santhanam et al. [20], Sara et al. [22], Wu et al. [23], and Du et al. [24]. Five rounds of experiments are performed in this study. Each round consists of ten tests, and the average of their results is taken as the final simulation result. The 5000 cloud services are manually divided into five parts according to their application areas, and different parts are selected to conduct the experiments in turn. The number of cloud services, service clusters, and service nets in each round is shown in Table 1.

We make a rule that the component services in a service net can only be selected from different service clusters. That is, we cannot choose another cloud service from the same service cluster to compose a service net once we have chosen one from a service cluster. The number of cloud services in a service net is restricted within an interval of 8 to 20.
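
The composition rule above can be checked with a short Python sketch. The function name and the cluster_of mapping (service name to cluster identifier) are hypothetical and introduced only for illustration.

```python
def is_valid_service_net(net, cluster_of, min_size=8, max_size=20):
    """Check the two constraints on a service net.

    net: list of service names that compose the service net.
    cluster_of: dict mapping each service name to its cluster identifier.
    """
    # Constraint 1: the net contains between 8 and 20 services.
    if not (min_size <= len(net) <= max_size):
        return False
    # Constraint 2: no two component services come from the same cluster.
    clusters = [cluster_of[s] for s in net]
    return len(set(clusters)) == len(clusters)
```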

Algorithm execution time and service cooccurrence rate are compared among the above methods. As shown in Figure 5, our proposed algorithm has the shortest execution time in all rounds of experiments. The advantage in execution efficiency becomes more obvious as the number of cloud services increases. The result also shows that the execution time of the clustering-based methods (our method, Du’s method, and Wu’s method) is lower than that of the nonclustering methods (Santhanam’s method and Sara’s method). Thus, we conclude that clustering improves the efficiency of service replacement.

To show that the substitutive services found by our method are more reasonable than those found by other methods in terms of process collaboration, we design another experiment to investigate service cooccurrence in service substitution. Let OccuNum (Si) be the number of times service Si appears in all the service nets, and let OccuNum (SiSj) be the number of times Si and Sj appear together. The service cooccurrence of Si and Sj is defined as ServiceCo_Occu (Si, Sj) = OccuNum (SiSj)/(OccuNum (Si) + OccuNum (Sj)).

Service cooccurrence can be used to judge whether two cloud services have a close collaborative relationship. If a substitutive service has a high service cooccurrence with the precursor and successor of the invalid service, we consider it a good collaboration-aware substitution. Assume Si, Se, and Sj are three cloud services, and let Si and Sj be the precursor and successor of Se, respectively. If Se is not working and St is the substitutive service of Se, the service cooccurrence in service substitution is defined as SubSerCo_Occu (St, Se) = (ServiceCo_Occu (Si, St) + ServiceCo_Occu (Sj, St))/2. Figure 6 shows that our method performs remarkably well in service cooccurrence: its service cooccurrence rate is about 2 to 4 times higher than that of the other methods. Thus, the proposed method integrates service collaboration well in service substitution.
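
The two cooccurrence measures can be computed directly from the set of service nets, as in the following minimal Python sketch. It assumes that each service net is represented as a set of service names, so a service is counted at most once per net; the function names are illustrative.

```python
def occu_num(nets, *services):
    """Number of service nets that contain all of the given services."""
    return sum(1 for net in nets if all(s in net for s in services))


def service_co_occu(nets, si, sj):
    """ServiceCo_Occu(Si, Sj) = OccuNum(SiSj) / (OccuNum(Si) + OccuNum(Sj))."""
    denom = occu_num(nets, si) + occu_num(nets, sj)
    return occu_num(nets, si, sj) / denom if denom else 0.0


def sub_ser_co_occu(nets, precursor, successor, substitute):
    """Mean cooccurrence of the substitute with the precursor and successor
    of the invalid service."""
    return (service_co_occu(nets, precursor, substitute)
            + service_co_occu(nets, successor, substitute)) / 2
```

With this definition, two services that always appear together reach the maximum value of 0.5, since each joint occurrence is counted once in the numerator and twice in the denominator.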

Experiments on the efficiency of service discovery are also conducted. We compare our method with three recently proposed methods (those of Cheng et al. [11], Zhang et al. [12], and Nabli et al. [15]). We investigate two factors: service discovery time and top-k accuracy. Service discovery time reflects the search efficiency, while top-k accuracy reflects the discovery accuracy. Figure 7 shows the service discovery time in the different rounds of experiments. Nabli’s method is the most efficient of all the methods. Our method ranks second, and its discovery time is close to that of Nabli’s method. Nabli’s method is a vector-based service discovery method in which all service vectors must be trained in advance. The existing service vectors are then used directly to compute similarity, which explains its high efficiency.

Although we also introduce service clusters and vector-based similarity calculation, the service search speed of our method is slightly lower than that of Nabli’s method. The main reason is that our method additionally performs interface matching during service discovery.

In the top-k accuracy experiment, we revise the data set to guarantee that it contains several groups of services that can be used to evaluate the discovery accuracy. All services in a group can respond to the same discovery requirement, and each group contains at least k services. We then measure the proportion of appropriate services among the first k services found by each method. As shown in Figure 8, our method has the highest accuracy in the top-k service discovery experiments, whereas Nabli’s method has the lowest. The reason is that Nabli’s method computes similarity based on the LDA topic model, whose accuracy fluctuates greatly with the service descriptive information.
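
As a minimal illustration of the top-k measure described above, the following Python sketch computes the proportion of appropriate services among the first k results. The function name and arguments are hypothetical; results shorter than k are treated as containing misses, which is an assumption.

```python
def top_k_accuracy(ranked, appropriate, k):
    """Proportion of appropriate services among the first k discovered ones.

    ranked: list of service names in the order returned by a discovery method.
    appropriate: set of services that satisfy the discovery requirement.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for s in ranked[:k] if s in appropriate)
    # Dividing by k (not by the number returned) penalizes short result lists.
    return hits / k
```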

6. Conclusions

To efficiently and reasonably find a substitutive cloud service for an invalid one, this work proposes a method to achieve service substitution. The search space for the substitutive service is greatly reduced by introducing service clusters. To get the substitutive service, we first obtain a candidate set by computing function similarity and matching service quality parameters. Service collaboration is mined from the existing service processes. By jointly considering function, quality, and process collaboration, we propose an algorithm to achieve service substitution.

We obtain the similarity of service function and of process collaboration by computing the cosine value of the corresponding word and service vectors. How to construct vectors that represent the features of function similarity and collaboration intensity is discussed in detail in Section 4. The results of the simulation experiments show that the proposed method significantly outperforms the state-of-the-art methods, especially for substitution among a large number of cloud services. In future work, we will focus on how to divide process collaboration into different dimensions. A more reasonable way to measure process collaboration will be presented so as to better realize service substitution.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (Grant 2018YFB1702902), Natural Science Foundation of China (Grant 61973180), and Natural Science Foundation of Shandong Province (Grant ZR2019MF033).