Clustering web services is an effective method for solving service computing problems. The key step is to extract vectors from the service description documents. However, the brevity of natural language service description documents complicates vector construction. To circumvent this difficulty, we propose a novel web service clustering method that vectorizes documents based on their semantic similarity, calculated via WordNet, and multidimensional scaling (WMS) analysis. We conduct extensive experiments on a dataset from ProgrammableWeb and achieve notable improvements in precision, recall, and F-measure.

1. Introduction

With the rapid development of Internet technology [1], clustering web services has become an effective method for solving service discovery [2–4], service composition [5, 6], and service recommendation [7] problems. First, an increasing number of enterprises and institutions encapsulate software functions or data into web services and publish them on the network. For instance, the number of services in ProgrammableWeb grew from fewer than 3,000 in 2011 to more than 20,000 by 2020, which greatly increases the difficulty of managing web services. Second, when users query for the web services they need, the service discovery system generally searches all related web services and returns them in ranked order, where the ranking is mainly based on relevance to the query. However, it is intractable to search the entire web service space and obtain accurate results. According to Zhang et al. [2], clustering web services can improve the performance of service discovery by reducing the search space. Third, service composition selects appropriate web services from a repository to build composite web services with the required functionality. However, the scale of the repository affects the efficiency of finding and ranking multiple web services with various functions. According to Xia et al. [5], clustering web services and matching the clusters to the requirements of developers can effectively alleviate this problem.

The premise of clustering web services is to extract vectors from the service description documents; these vectors are mainly constructed from keyword or semantic features. Unfortunately, despite the effectiveness of clustering for a variety of web service tasks, its applicability is limited by the difficulty of extracting vectors from natural language service description documents. The service description document is the main basis for clustering web services and is commonly written either as a Web Services Description Language (WSDL) document or in natural language. Although WSDL documents, written in Extensible Markup Language (XML), offer many convenient features, such as describing a service in combination with the Web Ontology Language (OWL), they are complex to construct. Therefore, some companies and institutions, such as ProgrammableWeb, describe web services in natural language, producing succinct service description documents in which most keywords appear only once. However, the brevity of these documents causes problems for extracting both types of target features. Specifically, extracting keyword features depends heavily on keyword frequency, which is almost uniformly low in such short documents. Similarly, the corpus of service description documents is too small to build a reliable probabilistic topic model, which makes it difficult to extract semantic features, that is, the probability distribution of a document over different topics.

This paper proposes a novel approach that constructs vectors from the differences between documents instead of from document features. We focus on clustering web services described in natural language, using natural language service description documents collected from ProgrammableWeb. The main contributions of this paper are (1) an algorithm to calculate the semantic similarity between documents, (2) a method to convert similarity data into distance data, which is an important prerequisite for multidimensional scaling analysis, and (3) the use of principal component analysis (PCA) on the vectors corresponding to the service documents to determine an appropriate clustering algorithm.

The rest of this paper is structured as follows. Section 2 reviews related work. Section 3 introduces the study materials. Section 4 explains the study methods in detail, and Section 5 describes the experiments. Finally, Section 6 summarizes the main conclusions and outlines future work.

2. Related Work

In this section, we separately introduce related work on clustering WSDL service description documents and on clustering natural language service description documents.

2.1. WSDL Service Description Document

In the early days, many web services were described using WSDL documents, so many scholars studied the clustering of such web services. Paik et al. and Kumara et al. used ontology models in service clustering [8, 9], which greatly improved the clustering effect. Some scholars used WordNet to calculate semantic similarity [8, 10], but their algorithms are not suitable for natural language documents; in this paper, we propose an algorithm for calculating the semantic similarity between natural language documents using WordNet. In addition, some scholars used context-aware methods to improve service clustering [9, 11]. Liang et al. used tag information in WSDL document clustering to improve the clustering effect [12]. Considering the sparse semantics of WSDL documents, Gu et al. used open data to enrich semantic information before clustering [13]. Because WSDL documents contain a great deal of useless information, Agarwal et al. used a probability model to filter it out before clustering [14]. Sun et al. added neural networks to service clustering [4]. In general, these methods share a common limitation: they are not suitable for processing service documents described in natural language. For example, in [8], a separate ontology is constructed for each type of “element” data, where the elements include <definitions>, <types>, <message>, and <portType>. Natural language documents have no such elements, so constructing ontologies for them is very difficult.

2.2. Natural Language Service Description Document

Because WSDL documents are too complex to construct, some companies and organizations now use natural language to describe web services, and some scholars focus on clustering natural language service description documents. Muth and Inkpen used term frequency-inverse document frequency (TF-IDF) to extract keywords and then clustered web services according to these keywords [15]. However, service description documents are too short for reliable keyword extraction. Some scholars used latent Dirichlet allocation (LDA) to build a probabilistic topic model, calculated the probability distribution of each service description document over the topics to vectorize the documents, and then clustered the vectors [2, 16]. LDA presupposes a unigram model; however, the corpus of service description documents is too small, so the constructed unigram model is weak. Some scholars trained Word2Vec on external corpora to expand service documents and improve LDA training [17, 18]; however, the size and type of the corpus strongly affect the degree of improvement. Lizarralde used deep variational autoencoders to address this problem [3]. Cao et al. trained a Doc2Vec model on the service document dataset, converted each document into a vector, and then clustered the vectors [19]. However, there is no principled basis for selecting the vector dimension, which increases the uncertainty of the results. Zou et al. first trained a WE-LDA model to obtain the topic probability distribution of each document, then trained a recurrent convolutional neural network (RCNN) to fit a model from each service document to its topic probability distribution, and finally clustered the document-feature vectors [20, 21]. However, the structure of the RCNN is very complicated, its training effect depends on parameter tuning, and the variability of LDA training results further increases the uncertainty of the RCNN. Therefore, it is very difficult to obtain a suitable model by adjusting parameters. In short, the problem with these methods stems from the uncertainty of mining features from service documents. We observe that although it is difficult to extract features from short documents, it is easier to compare the differences between them. We therefore use WordNet to quantify document differences, use multidimensional scaling analysis to construct vectors corresponding to the documents, and finally cluster the vectors. Our method requires tuning only a few parameters, and our parameter choices have both a theoretical and an experimental basis.

3. Study Materials

The experimental data in this paper come from the ProgrammableWeb website, and we use the WordNet database to calculate semantic similarity. Both are introduced in detail below.

3.1. ProgrammableWeb

ProgrammableWeb is an information and news source about the Web as a programmable platform. It is a subsidiary of MuleSoft and has offices in San Francisco, CA. The website publishes a repository of web APIs, mashups, and applications; as of October 2020, it had documented over 22,000 open web APIs and thousands of applications. It has been called the “journal of the API economy” by TechCrunch [22]. The data in ProgrammableWeb mainly include the category, description document, tags, and calling method of each service (see https://www.programmableweb.com/ and Figure 1). This paper uses description documents as the main basis for service clustering. “Tag” is auxiliary information given by web service developers, which helps us preprocess service documents. “Category” is the classification given by web service developers, and we use it as the ground truth for evaluating clustering.

3.2. WordNet

WordNet is a lexical database of English established and maintained by the Cognitive Science Laboratory of Princeton University [23]. Because it contains semantic information, it differs from a dictionary in the usual sense. WordNet groups entries according to their meanings; each group of entries with the same meaning is called a Synset. WordNet provides a short definition for each Synset and records the semantic relations between different Synsets. A word may have multiple meanings, which belong to different Synsets (see Table 1).

Synsets are connected by a variety of semantic relations, such as hypernymy/hyponymy (superordinate and subordinate relations), antonymy, and meronymy (part-whole relations) (see Figure 2). Based on these relations, the semantic similarity between Synsets can be calculated. Because a word may have multiple meanings and parts of speech, it can belong to several Synsets, so two words can have different semantic similarities depending on which of their Synsets are compared (see Table 2). One of these values must be chosen as the semantic similarity between the two words; some existing methods choose the maximum value [24, 25]. This word-level similarity is the basis for calculating the semantic similarity between documents.

4. Study Methods

This section consists of three parts. Section 4.1 introduces the method of calculating the semantic similarity between two service documents. Section 4.2 introduces the method of using semantic similarity to calculate the vector corresponding to the service document. Section 4.3 introduces how to select the appropriate algorithm to cluster the vectors.

Our method consists of four steps: (1) obtaining preprocessed documents (PDs) through tags and WordNet, (2) calculating the semantic similarity between PDs and then obtaining the semantic distance matrix, (3) applying multidimensional scaling analysis to the semantic distance matrix to obtain the semantic distance vector (SDV) corresponding to each web service, and (4) clustering the SDVs with the K-means algorithm to cluster the web services (see Figure 3). Multidimensional scaling translates “information about the pairwise ‘distances’ among a set of n objects or individuals” into a configuration of n points mapped into an abstract Cartesian space [26].

4.1. Calculate Semantic Similarity

The basis of calculating document semantic similarity is to calculate the semantic similarity between words. We enumerate all the semantic similarities of two words in different Synsets and select the largest as the semantic similarity of the words [24, 25].
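The word-level rule (take the maximum over all pairs of Synsets) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the Synset lookup and the Synset-level measure are passed in as functions, and the small sense inventory `TOY_SYNSETS` with its scores is hypothetical, standing in for WordNet so the example runs without downloading any corpus.

```python
from typing import Any, Callable, Iterable, Optional


def word_similarity(
    w1: str,
    w2: str,
    synsets_of: Callable[[str], Iterable[Any]],
    synset_sim: Callable[[Any, Any], Optional[float]],
) -> float:
    """Similarity of two words = maximum similarity over all pairs of their Synsets.

    Returns 0.0 when either word has no Synset or no pair yields a score.
    """
    best = 0.0
    for s1 in synsets_of(w1):
        for s2 in synsets_of(w2):
            score = synset_sim(s1, s2)
            if score is not None and score > best:
                best = score
    return best


# Hypothetical toy sense inventory standing in for WordNet (not real WordNet data).
TOY_SYNSETS = {
    "bank": ["bank.finance", "bank.river"],
    "money": ["money.n"],
    "shore": ["shore.n"],
}
TOY_SCORES = {
    frozenset({"bank.finance", "money.n"}): 0.8,
    frozenset({"bank.river", "shore.n"}): 0.7,
}


def toy_synsets(word: str) -> list:
    return TOY_SYNSETS.get(word, [])


def toy_synset_sim(a: str, b: str) -> float:
    # Identical senses score 1.0; unlisted pairs get a small default score.
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.1)


print(word_similarity("bank", "money", toy_synsets, toy_synset_sim))  # 0.8
print(word_similarity("bank", "shore", toy_synsets, toy_synset_sim))  # 0.7
```

With NLTK and its WordNet corpus installed, `lambda w: wn.synsets(w, pos=wn.NOUN)` (after `from nltk.corpus import wordnet as wn`) could serve as `synsets_of` and `lambda a, b: a.wup_similarity(b)` as `synset_sim`. The paper does not state which WordNet measure it uses, so Wu-Palmer similarity is only an example, chosen because its scores fall in the 0-to-1 range the text assumes.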

Before calculating the semantic similarity of documents, preprocessing is required. General preprocessing methods include removing punctuation and stop words. However, Web service description documents are special: besides stop words, they contain many words that have nothing to do with the document semantics. “Tag” is the auxiliary information given by web service developers. According to Jingli et al. [27], tags can be used to filter out words that are not related to the topic; according to Shi et al. [28], the more tags two web services share, the more likely they are to belong to the same category. Therefore, during document preprocessing, we keep the words that are semantically similar to the tags and thereby remove words unrelated to the subject of the document. We use D to denote the service description document, T the document tag collection, and PD the preprocessed document (see Algorithm 1).

Input: D, T, α
Output: PD (PD is initialized to empty)
FOR each tag in T do:
 flag ⟵ 0;
 FOR each word in D do:
  IF WordNet.similarity (tag, word) > α then:
   PD.add (word);
   flag ⟵ 1;
 IF flag = 0 then:
  PD.add (tag);

Regarding the threshold α: because WordNet semantic similarity values lie between 0 and 1, we adopt an intermediate-value strategy and set α = 0.5.
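Algorithm 1 translates directly into Python. The sketch below assumes the document has already been tokenized with punctuation and stop words removed, and takes the word-similarity function as a parameter (in practice a WordNet-based word similarity); the similarity table `TABLE` and the example document are hypothetical.

```python
from typing import Callable, List, Sequence


def preprocess(
    doc: Sequence[str],
    tags: Sequence[str],
    sim: Callable[[str, str], float],
    alpha: float = 0.5,
) -> List[str]:
    """Tag-guided filtering following Algorithm 1.

    A document word is kept for every tag it resembles (similarity > alpha);
    a tag that matches no word is added to the result itself.
    """
    pd: List[str] = []
    for tag in tags:
        matched = False  # plays the role of flag in Algorithm 1
        for word in doc:
            if sim(tag, word) > alpha:
                pd.append(word)
                matched = True
        if not matched:
            pd.append(tag)
    return pd


# Hypothetical symmetric similarity table standing in for WordNet scores.
TABLE = {
    frozenset({"climate", "weather"}): 0.8,
    frozenset({"climate", "forecast"}): 0.6,
}


def toy_sim(a: str, b: str) -> float:
    return 1.0 if a == b else TABLE.get(frozenset({a, b}), 0.0)


doc = ["free", "weather", "forecast", "api", "for", "developers"]
print(preprocess(doc, ["weather", "climate", "mapping"], toy_sim))
# ['weather', 'weather', 'forecast', 'mapping']
```

Like the pseudocode, this version adds a word once for every tag it matches, so `weather` appears twice; if PD is meant as a set, the result should be deduplicated.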

The semantic similarity between two PDs is determined by the words they contain (see Figure 4). For each word in one PD, we calculate its maximum semantic similarity to all the words in the other PD, and we do this in both directions. These maxima are accumulated, and the sum is divided by the total length of the two PDs; because both directions are included, the result is symmetric (see Algorithm 2).

Input: PD1 = (A1, …, Am) (the length is m), PD2 = (B1, …, Bn) (the length is n)
Output: sim (the semantic similarity between two PDs)
SUM ⟵ 0;
FOR i ⟵ 1 to m do:
 MAX ⟵ 0;
 FOR j ⟵ 1 to n do:
  IF WordNet.similarity (Ai, Bj) > MAX then:
   MAX ⟵ WordNet.similarity (Ai, Bj);
 SUM ⟵ SUM + MAX;
FOR j ⟵ 1 to n do:
 MAX ⟵ 0;
 FOR i ⟵ 1 to m do:
  IF WordNet.similarity (Ai, Bj) > MAX then:
   MAX ⟵ WordNet.similarity (Ai, Bj);
 SUM ⟵ SUM + MAX;
sim ⟵ SUM/(m + n);
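A Python rendering of Algorithm 2, with the running sum updated after each inner loop, might look as follows. The word-similarity function is a parameter, scores are assumed to lie in [0, 1] as stated in the text, and the toy table is hypothetical; returning 0 for an empty PD is an added assumption, since the pseudocode does not cover that case.

```python
from typing import Callable, Sequence


def document_similarity(
    pd1: Sequence[str],
    pd2: Sequence[str],
    sim: Callable[[str, str], float],
) -> float:
    """Symmetric PD similarity following Algorithm 2."""
    if not pd1 or not pd2:
        return 0.0  # assumption: an empty PD is dissimilar to everything
    total = 0.0
    for a in pd1:  # best match in PD2 for each word of PD1
        total += max(sim(a, b) for b in pd2)
    for b in pd2:  # best match in PD1 for each word of PD2
        total += max(sim(a, b) for a in pd1)
    return total / (len(pd1) + len(pd2))


# Hypothetical symmetric word-similarity table standing in for WordNet.
TABLE = {
    frozenset({"forecast", "weather"}): 0.6,
    frozenset({"map", "weather"}): 0.2,
    frozenset({"forecast", "map"}): 0.1,
}


def toy_sim(a: str, b: str) -> float:
    return 1.0 if a == b else TABLE.get(frozenset({a, b}), 0.0)


# (1.0 + 0.6) from PD1's side plus (1.0 + 0.2) from PD2's side, divided by 4
print(document_similarity(["weather", "forecast"], ["weather", "map"], toy_sim))
```

Because each word of both PDs contributes its best match, swapping the arguments gives the same value, and a PD compared with itself scores 1 whenever every word has self-similarity 1.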

The semantic similarity calculated by WordNet is between 0 and 1, and sim is the average of m + n such values, so we get

$$0 \le sim \le 1.$$

We assume that the number of web service description documents is n. Applying this algorithm to every pair of documents yields a semantic similarity matrix $SIM = (sim_{ij})_{n \times n}$. The matrix elements are between 0 and 1, and the larger the element value, the higher the semantic similarity; $sim_{ij}$ represents the semantic similarity between the i-th and j-th documents. Obviously, all diagonal elements are 1.
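Because Algorithm 2 is symmetric and every diagonal element is 1, only the n(n − 1)/2 pairs above the diagonal need to be evaluated: for the 6533 services used in Section 5, that is 21,336,778 calls instead of about 42.7 million. The NumPy sketch below assumes `doc_sim` is such a symmetric document similarity (Algorithm 2 in practice); `toy_doc_sim` is a hypothetical stand-in used only to exercise the function.

```python
from typing import Callable, Sequence

import numpy as np


def similarity_matrix(docs: Sequence, doc_sim: Callable) -> np.ndarray:
    """Build SIM = (sim_ij) from a symmetric document similarity."""
    n = len(docs)
    sim = np.eye(n)  # diagonal elements are 1, as stated in the text
    for i in range(n):
        for j in range(i + 1, n):  # doc_sim is symmetric, so fill both halves at once
            sim[i, j] = sim[j, i] = doc_sim(docs[i], docs[j])
    return sim


# Hypothetical document similarity used only to exercise the function.
def toy_doc_sim(a, b) -> float:
    return 1.0 if a == b else 0.2


SIM = similarity_matrix([["maps"], ["maps"], ["payments"]], toy_doc_sim)
print(SIM)
```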

4.2. Multidimensional Scaling Analysis

Multidimensional scaling solves the following problem: given the similarity (or distance) between each pair of n objects, determine a representation of these objects in a multidimensional space that preserves the original similarities (or distances) as closely as possible. In our setting, two semantically similar web services are represented by two points close to each other in the multidimensional space, which creates the conditions for clustering [29]. We first introduce the data concepts related to multidimensional scaling.

4.2.1. Similarity Data and Distance Data

Similarity Data. These data represent the similarity of two objects: the larger the value, the higher the similarity. The semantic similarity defined in Section 4.1 is similarity data.

Distance Data. In contrast to similarity data, the larger the value, the lower the similarity.

Only distance data can be used directly for multidimensional scaling analysis [29].

4.2.2. Distance Matrix

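Let $DIS = (dis_{ij})_{n \times n}$ be a matrix of order n × n, where $dis_{ij}$ is the distance between the i-th object and the j-th object. If the following condition is met:

$$dis_{ij} = dis_{ji} \ge 0, \quad dis_{ii} = 0, \quad i, j = 1, 2, \ldots, n,$$

then the matrix DIS is a distance matrix.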

If there exist a positive integer r and n points $X_1, X_2, \ldots, X_n$ in $\mathbb{R}^r$ such that

$$dis_{ij}^{2} = (X_i - X_j)^{T}(X_i - X_j), \quad i, j = 1, 2, \ldots, n,$$

then DIS is called a Euclidean distance matrix [30]. There is also a simpler way to determine whether a distance matrix is a Euclidean distance matrix, which we introduce later in this section.

4.2.3. Similarity Coefficient Matrix

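Let $C = (c_{ij})_{n \times n}$ be a matrix of order n × n, where $c_{ij}$ is the similarity coefficient between the i-th object and the j-th object. If the following condition is met:

$$c_{ij} = c_{ji}, \quad |c_{ij}| \le c_{ii}, \quad i, j = 1, 2, \ldots, n,$$

then the matrix C is a similarity coefficient matrix.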

If the data do not form a distance matrix, they must first be transformed into one by a suitable method before multidimensional scaling analysis can be carried out.

The semantic similarity matrix SIM is a similarity coefficient matrix rather than a distance matrix, so it is not suitable for multidimensional scaling analysis. We need to translate semantic similarity into “semantic distance” through inversion. We define the semantic distance between two service description documents as a value between 0 and 1; the smaller the semantic distance, the higher the semantic similarity. The semantic distance matrix is $DIS = (dis_{ij})_{n \times n}$. An appropriate function is needed to reverse the semantic similarity. A classic transformation function is [31–33]

$$dis_{ij} = \sqrt{c_{ii} + c_{jj} - 2c_{ij}}. \qquad (6)$$

However, in our experiments, the effect of this function is not satisfactory.
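For comparison, the classic transformation can be sketched with NumPy, assuming it takes the standard form $dis_{ij} = \sqrt{c_{ii} + c_{jj} - 2c_{ij}}$ for a similarity coefficient matrix C. With $sim_{ii} = 1$ it reduces to $\sqrt{2 - 2\,sim_{ij}}$, so its values range from 0 to $\sqrt{2}$ rather than from 0 to 1. The 3 × 3 matrix in the example is hypothetical.

```python
import numpy as np


def classic_transform(c) -> np.ndarray:
    """dis_ij = sqrt(c_ii + c_jj - 2 c_ij) for a similarity coefficient matrix C."""
    c = np.asarray(c, dtype=float)
    diag = np.diag(c)
    squared = diag[:, None] + diag[None, :] - 2.0 * c
    # |c_ij| <= c_ii and |c_ij| <= c_jj keep this nonnegative; clip only guards round-off
    return np.sqrt(np.clip(squared, 0.0, None))


sim = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]])
print(classic_transform(sim).round(3))
```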

The sigmoid function is a commonly used activation function in neural networks [34]. Its expression is shown in the following equation:

$$S(x) = \frac{1}{1 + e^{-x}}. \qquad (7)$$

The sigmoid function is increasing, and on the interval [0, 1], where our similarity data lie, it only ranges from 0.5 to about 0.73, so it barely separates these data. We therefore deform it and use the following function to reverse the data:

$$NS(x) = \frac{1}{1 + e^{\sigma (x - \mu)}}. \qquad (8)$$

μ and σ are adjustment coefficients. To ensure that NS is a decreasing function, μ and σ should be positive real numbers. After adjusting μ and σ many times in experiments, we determined that σ = 20 and μ = 0.3 give better results.

So, we can calculate DIS by the following equation:

$$dis_{ij} = NS(sim_{ij}), \quad i, j = 1, 2, \ldots, n.$$
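The deformed sigmoid and the resulting distance matrix can be sketched with NumPy, assuming equation (8) has the form $NS(x) = 1/(1 + e^{\sigma(x - \mu)})$, which is decreasing for σ > 0 and has its symmetry center at (μ, 1/2). Setting the diagonal to exactly 0 is an added assumption so that DIS meets the definition of a distance matrix; the example matrix is hypothetical.

```python
import numpy as np


def ns(x, mu: float = 0.3, sigma: float = 20.0):
    """Reversed, shifted sigmoid NS(x) = 1 / (1 + exp(sigma * (x - mu)))."""
    return 1.0 / (1.0 + np.exp(sigma * (np.asarray(x, dtype=float) - mu)))


def semantic_distance_matrix(sim, mu: float = 0.3, sigma: float = 20.0) -> np.ndarray:
    dis = ns(sim, mu, sigma)
    # Assumption: force dis_ii = 0 exactly; NS(1) is about 8e-7, not 0.
    np.fill_diagonal(dis, 0.0)
    return dis


sim = np.array([[1.0, 0.25, 0.6], [0.25, 1.0, 0.1], [0.6, 0.1, 1.0]])
print(semantic_distance_matrix(sim).round(3))
```

Similarities below μ = 0.3 map to distances above 0.5 and similarities above μ map below 0.5; σ controls how sharply the transition happens around μ.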

Let the n points in r-dimensional space be $X_1, X_2, \ldots, X_n$, written in matrix form as $X = (X_1, X_2, \ldots, X_n)^{T}$, an n × r matrix. If the point corresponding to the i-th web service description document is $X_i$, then the coordinates of $X_i$ are

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{ir})^{T}, \quad i = 1, 2, \ldots, n.$$

The purpose of multidimensional scaling analysis is to calculate X. We call X a fitted configuration of the semantic distance matrix DIS.

Let $B = XX^{T}$, where B is called the centered inner product matrix of X (with X centered so that each of its columns has zero mean); constructing B is the prerequisite of multidimensional scaling analysis [29]. We first construct the matrix $A = (a_{ij})_{n \times n}$ according to the following equation:

$$a_{ij} = -\frac{1}{2}\,dis_{ij}^{2}, \quad i, j = 1, 2, \ldots, n.$$

Next, the matrix H is constructed according to equation (12), where $I_n$ is the identity matrix of order n and J is the square matrix of order n whose elements are all 1:

$$H = I_n - \frac{1}{n}J. \qquad (12)$$

Finally, matrix B is constructed as follows:

$$B = HAH.$$
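The construction of B from DIS can be checked numerically. The NumPy sketch below builds A, H, and B as described and verifies, on randomly generated points (a hypothetical test configuration), the classical identity that B equals $X_c X_c^{T}$ when DIS holds genuine Euclidean distances and $X_c$ is the point set centered at the origin.

```python
import numpy as np


def centered_inner_product(dis) -> np.ndarray:
    """B = H A H with a_ij = -dis_ij**2 / 2 and H = I_n - J / n."""
    dis = np.asarray(dis, dtype=float)
    n = dis.shape[0]
    a = -0.5 * dis ** 2
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ a @ h


# Sanity check on a genuinely Euclidean configuration: B should equal Xc Xc^T,
# where Xc is the point set shifted so that its column means are zero.
rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))
dis = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
centered = pts - pts.mean(axis=0)
print(np.allclose(centered_inner_product(dis), centered @ centered.T))  # True
```

The rows and columns of B always sum to zero, because H annihilates the all-ones vector; this is what pins the fitted configuration to the origin.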

We calculate the n eigenvalues of B and arrange them in descending order:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n.$$

The corresponding unit eigenvectors are $e_1, e_2, \ldots, e_n$.

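The necessary and sufficient condition for the semantic distance matrix DIS to be a Euclidean distance matrix is $B \succeq 0$, that is, B is positive semidefinite [29]. We discuss the two cases below.

(1) When $B \succeq 0$, DIS is a Euclidean distance matrix, and all eigenvalues are nonnegative:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0.$$

The dimension of coordinate $X_i$ is r. We construct X from the eigenvectors corresponding to the r largest eigenvalues $\lambda_1, \ldots, \lambda_r$. r can be determined by accumulating the eigenvalues and calculating the proportion of the accumulated sum to the sum of all eigenvalues, taking the smallest r such that

$$\frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge \alpha,$$

where α is a threshold given in advance, generally 80% [29]. Then, $e_1, \ldots, e_r$ are selected to construct X.

(2) When B is not positive semidefinite, DIS is a non-Euclidean distance matrix, and some eigenvalues are negative. r can then be determined by accumulating the eigenvalues and calculating the proportion of the accumulated sum to the sum of the absolute values of all eigenvalues, taking the smallest r such that

$$\frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{n} |\lambda_i|} \ge \alpha, \qquad (19)$$

where α is again a threshold given in advance, generally 80%. Then, $e_1, \ldots, e_r$ are selected to construct X.

Next, X is calculated as follows:

$$X = \left(\sqrt{\lambda_1}\,e_1, \sqrt{\lambda_2}\,e_2, \ldots, \sqrt{\lambda_r}\,e_r\right).$$

Each row of X corresponds to a web service description document, and the i-th row is $X_i^{T}$. Next, we cluster the SDVs $X_1, X_2, \ldots, X_n$.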
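The eigendecomposition and the choice of r can be sketched as follows. This is an illustration, not the authors' code. It sorts the eigenvalues of B in descending order, counts clearly negative ones (the Euclidean test described above), picks the smallest r whose cumulative share of the sum of absolute eigenvalues reaches α, and scales the first r unit eigenvectors by the square roots of their eigenvalues. The relative tolerance `tol` used to ignore round-off is an assumption, and the rectangle of four points is a hypothetical test case.

```python
import numpy as np


def mds_coordinates(b, alpha: float = 0.8, tol: float = 1e-9):
    """Classical MDS coordinates from the centered inner product matrix B.

    Returns (X, r, n_negative): X is n x r, r is the smallest dimension whose
    cumulative eigenvalue share reaches alpha, and n_negative counts clearly
    negative eigenvalues (0 means DIS is, numerically, Euclidean).
    """
    b = np.asarray(b, dtype=float)
    vals, vecs = np.linalg.eigh((b + b.T) / 2)  # symmetric solver, ascending order
    order = np.argsort(vals)[::-1]  # reorder so that lambda_1 >= ... >= lambda_n
    vals, vecs = vals[order], vecs[:, order]
    n_negative = int(np.sum(vals < -tol * np.abs(vals).max()))
    # Dividing by the sum of |lambda_i| covers both cases: when every eigenvalue
    # is nonnegative it equals the plain sum.
    share = np.cumsum(vals) / np.abs(vals).sum()
    reached = share >= alpha
    if not reached.any():
        raise ValueError("no dimension reaches the requested eigenvalue share")
    r = int(np.argmax(reached)) + 1
    # eigh returns unit-length eigenvectors, so scaling by sqrt(lambda) gives X.
    x = vecs[:, :r] * np.sqrt(vals[:r])
    return x, r, n_negative


pts = np.array([[0, 0], [4, 0], [0, 3], [4, 3]], dtype=float)
dis = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
n = len(pts)
h = np.eye(n) - np.ones((n, n)) / n
b = h @ (-0.5 * dis ** 2) @ h
x, r, n_negative = mds_coordinates(b, alpha=0.8)
print(r, n_negative)  # 2 0
```

For this rectangle, B has eigenvalues 16 and 9 plus round-off zeros, so the first eigenvalue alone covers 64% and r = 2 is needed to reach 80%; the recovered coordinates reproduce the original distances up to rotation and reflection.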

4.3. Clustering Algorithm

Existing clustering methods are mainly divided into hierarchical, partitioning, density-based, model-based, grid-based, and soft computing methods [35]. We project the SDVs into a two-dimensional space through PCA [36]. After analyzing the distribution of the SDV projections, we concluded that partitioning clustering methods [37] are suitable for our data. We compared several partitioning clustering methods experimentally and chose the K-means algorithm, a classic unsupervised clustering method, for service clustering [38, 39].
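A minimal NumPy version of K-means (Lloyd's algorithm) is sketched below for clustering the SDVs. The k-means++ seeding, the iteration cap, and the convergence test are assumptions, since the paper does not specify how K-means was initialized; in practice a library implementation such as scikit-learn's `KMeans` would normally be used. The two-blob data set is hypothetical.

```python
import numpy as np


def kmeans(x, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, centers)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # k-means++: each new seed is drawn with probability proportional to its
    # squared distance from the nearest seed chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        probs = d2 / d2.sum() if d2.sum() > 0 else None  # None -> uniform draw
        centers.append(x[rng.choice(len(x), p=probs)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each SDV goes to its nearest center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


blob = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
pts = np.vstack([blob, blob + 100.0])
labels, centers = kmeans(pts, k=2)
print(labels)
```

For the SDVs in Section 5, `x` would be the 6533 × 50 matrix X and `k` would be 11.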

This completes the description of our method.

5. Experimental Results and Analyses

5.1. Experimental Data

Our experimental data are real data crawled from ProgrammableWeb. As of October 3, 2020, ProgrammableWeb listed 21,956 web services in 425 categories [40]. The number of web services varies greatly across categories: for example, the category Financial contains 1020 web services, whereas the category IDE contains only one. Such imbalance seriously affects the clustering effect, so for this experiment we select the categories that contain more than 400 web services. There are 11 such categories, containing 6533 web services in total (see Table 3). These categories are assigned by the developers who publish the web services and are generally considered accurate.

5.2. Evaluating Indicator

We use three indexes to evaluate the clustering effect: precision, recall, and F-measure. We cluster the 6533 web services into 11 clusters, denoted $C_1, C_2, \ldots, C_{11}$. The three indexes are defined as follows:
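One common way to compute these indexes, assumed here rather than taken from the paper, matches each cluster to the category that is most frequent among its members, then computes precision as the share of the cluster's services that belong to that category, recall as the share of the category's services that fall into the cluster, and F-measure as their harmonic mean. The category names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def cluster_scores(
    labels: Sequence[Hashable], categories: Sequence[Hashable]
) -> Dict[Hashable, Tuple[Hashable, float, float, float]]:
    """Per-cluster (matched category, precision, recall, F-measure).

    Assumption: each cluster is matched to the category that is most frequent
    among its members.
    """
    members: Dict[Hashable, list] = {}
    for label, category in zip(labels, categories):
        members.setdefault(label, []).append(category)
    category_sizes = Counter(categories)
    scores = {}
    for label, cats in members.items():
        category, hits = Counter(cats).most_common(1)[0]
        precision = hits / len(cats)
        recall = hits / category_sizes[category]
        f_measure = 2 * precision * recall / (precision + recall)  # hits >= 1, so no zero division
        scores[label] = (category, precision, recall, f_measure)
    return scores


print(cluster_scores([0, 0, 1], ["Mapping", "Mapping", "Payments"]))
# {0: ('Mapping', 1.0, 1.0, 1.0), 1: ('Payments', 1.0, 1.0, 1.0)}
```

Under this matching rule two clusters can be matched to the same category; averaging the per-cluster values gives summary figures of the kind reported in Table 4, although the paper's own averaging may differ.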

5.3. Comparison Method

Our method is compared with the following five methods; WMS, the proposed method, is listed last.
(1) TF-IDF+K [15]. Keywords are extracted by term frequency-inverse document frequency, and document-keyword vectors are constructed from the keywords. K-means is used to cluster the document-keyword vectors.
(2) LDA+K [16]. Latent Dirichlet allocation is used to model the documents, yielding the topic-word matrix and the document-topic vectors. K-means is used to cluster the document-topic vectors.
(3) Doc2Vec+K [19]. A Doc2Vec model is trained on the documents to convert them into vectors. K-means is used to cluster the document vectors.
(4) RCNN+LDA+K [20]. An LDA model is first trained to obtain the topic probability distribution of each document, and an RCNN is then trained to fit a model from each service document to its topic probability distribution; this process yields a feature vector for each service document. Finally, the document-feature vectors are clustered by K-means.
(5) CMD+CT+K. The classical transformation function (see equation (6)) is used to convert similarity data into distance data, multidimensional scaling analysis is then carried out, and K-means is used for clustering. This comparison is intended to demonstrate the effectiveness of our new transformation method.
(6) WMS. The method proposed in this paper.

To compare the performance of the methods more objectively, we use Algorithm 1 to preprocess the documents for all six methods.

5.4. Comparison of Experimental Results
(1) Algorithm implementation. Processing the experimental data with the method designed above yields a semantic distance matrix DIS of order 6533. To determine whether DIS is a Euclidean distance matrix, we calculated the eigenvalues and eigenvectors of the centered inner product matrix B constructed from DIS and obtained 6533 eigenvalues (see Figure 5). A total of 2032 eigenvalues are negative, so DIS is a non-Euclidean distance matrix. Taking α = 80%, equation (19) is satisfied when r = 50, so we set the vector dimension to 50.
(2) Precision comparison of the 6 methods on 5 categories (see Figure 6).
(3) Recall comparison of the 6 methods on 5 categories (see Figure 7).
(4) F-measure comparison of the 6 methods on 5 categories (see Figure 8).
(5) Average precision, recall, and F-measure of the 6 methods on 11 categories (see Table 4).

5.5. Result Analysis

The experimental results show that the TF-IDF + K method performs worst: although adding context information improves its clustering effect, the service description documents are too short for reliable keyword extraction. Note that the LDA + K, Doc2Vec + K, and RCNN + LDA + K methods contain many random processes, so their results differ from run to run. In contrast, the WMS method proposed in this paper not only produces stable results but also achieves the best clustering effect. The clustering effect of the Doc2Vec + K and RCNN + LDA + K methods is poor. We believe this is because both methods rely on the continuity of the document text, which our preprocessing (see Algorithm 1) destroys. The LDA + K and WMS methods place no requirement on document continuity, so their clustering effect is better. Finally, the comparison with CMD + CT + K shows that our improved transformation from similarity data to distance data in multidimensional scaling analysis is effective.

5.6. Selection of Clustering Algorithm

We project the SDVs into a two-dimensional space through PCA (see Figure 9).
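A two-dimensional projection of the kind shown in Figure 9 can be computed with an SVD-based PCA. The sketch below centers the vectors, projects them onto the top two principal directions, and reports the variance along each; the random 50-dimensional data are a hypothetical stand-in for the SDVs, not the paper's data.

```python
import numpy as np


def pca_project(x, n_components: int = 2):
    """Project rows of x onto their top principal components via SVD.

    Returns (projection, explained_variance) for the kept components.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[:n_components].T
    explained_variance = s[:n_components] ** 2 / (len(x) - 1)
    return projection, explained_variance


rng = np.random.default_rng(1)
sdv = rng.normal(size=(200, 50)) * np.linspace(5.0, 0.5, 50)  # stand-in for 50-dim SDVs
proj, var = pca_project(sdv)
print(proj.shape)  # (200, 2)
```

Plotting `proj[:, 0]` against `proj[:, 1]`, colored by category, gives a scatter plot in which roughly elliptical, partly overlapping groups would suggest a partitioning method such as K-means.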

Figure 9 shows that the SDV clusters are roughly distributed in elliptical shapes around their centers, and some clusters overlap. Partitioning clustering methods are suitable for processing such data [39]. Moreover, our data are numerical. Three typical partitioning clustering methods are suitable for numerical data: K-means, K-medoids [41], and Clustering LARge Applications (CLARA) [42]. We compared the average precision, average recall, and average F-measure of the three methods (see Table 5) and found that K-means gives the best clustering effect.

5.7. Supplementary Notes

Here, we show how μ = 0.3 and σ = 20 in equation (8) were determined. The symmetry center of the deformed sigmoid function NS is the point (μ, 1/2), so μ plays the role of a data segmentation point. We counted the distribution of the elements of the matrix SIM (see Figure 10).

The distribution shows that 0.3 is the segmentation point of the frequency distribution, so we set μ = 0.3. We then fixed μ = 0.3 and recorded how the average F-measure changes as σ in equation (8) is adjusted (see Figure 11).

As Figure 11 shows, the average F-measure achieves a good result when σ = 20.

6. Conclusions

In this paper, we propose a web service clustering method based on semantic similarity and multidimensional scaling analysis. We first use WordNet to calculate the semantic similarity between documents and obtain the semantic distance matrix. We then use multidimensional scaling analysis to obtain the SDVs and finally cluster the SDVs with the K-means algorithm. Whereas most existing methods vectorize documents by extracting document features, we vectorize documents by comparing the differences between them, and this improved vectorization leads to better clustering. Multidimensional scaling analysis is the core of our method. The experimental results show that our method outperforms existing methods in precision, recall, and F-measure, and that it is more deterministic than methods based on deep neural networks and LDA. Experiments also show that our improved transformation from similarity data to distance data in multidimensional scaling analysis is effective.

The main flaw of our method is that it relies on tags, which makes it less robust. In future work, we will improve Algorithm 1 to remove this dependence on tags. In addition, service clustering is not directly useful to users, so we will also use the service clustering method in this article as a basis for improving service composition, service discovery, and other web service tasks.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This research was supported by the National Natural Science Foundation of China (61572195) and the special fund of Shanghai Economic and Information Commission (sheitc160306).