Stability Analysis of Learning Algorithms for Ontology Similarity Computation

Gao, Wei; Xu, Tianwei

doi:https://doi.org/10.1155/2013/174802

Abstract and Applied Analysis


Special Issue

Learning Theory


Research Article | Open Access

Volume 2013 | Article ID 174802 | https://doi.org/10.1155/2013/174802


Wei Gao¹ and Tianwei Xu¹,²

Academic Editor: Ding-Xuan Zhou

Received 27 Feb 2013

Revised 09 May 2013

Accepted 09 May 2013

Published 04 Jun 2013

Abstract

Ontology is a useful tool that is widely applied in areas such as social science, computer science, and medical science. Ontology concept similarity calculation is a key part of the algorithms in these applications. A recent approach makes use of similarity between vertices on ontology graphs. Instead of pairwise computations, it is based on a function that maps the vertex set of an ontology graph to real numbers. To obtain such a function, the ranking learning problem plays an essential role, especially the k-partite ranking algorithm, which is suitable for solving some ontology problems. A ranking function maps the vertices of an ontology graph to real-valued scores and ranks the vertices by these scores. Such a function can be learned from a training sample, which consists of a subset of the vertices of the ontology graph. A good ranking function has small ranking error and good stability. For stable ranking algorithms, we study generalization bounds via some concepts of algorithmic stability. We also find that kernel-based ranking algorithms stated as regularization schemes in reproducing kernel Hilbert spaces satisfy stability conditions and have good generalization ability.

1. Introduction and Motivations

The study of ontology deals with questions concerning what entities exist and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. The developed tools have been widely applied in medicine, biology, and social science. In computer science, ontology is defined as a model for sharing formal concepts and has been applied in intelligent information integration, cooperative information systems, information retrieval, electronic commerce, and knowledge management. After a decade’s development, ontology technology has matured as an effective model of hierarchical structure and semantics for concepts, supported by systematic and comprehensive engineering theory, representation, and construction tools.

Ontology similarity computation is an essential part of practical applications. In information retrieval, it has been used to compute semantic similarity and search for concepts. We take a graph-theory approach and represent an ontology by a weighted graph $G = (V, E, w)$. In this setting, $V$ is the (finite) set of vertices corresponding to concepts or objects of the ontology, $E$ is a set of edges, and $w$ is a weight function. For two vertices $v_i$ and $v_j$ representing two concepts, the weight $w(v_i, v_j)$ measures their similarity in the ontology.

Example 1. In some applications of ontology similarity computation, the weight function takes values on $[0, 1]$. Then the case $w(v_i, v_j) = 1$ means that $v_i$ and $v_j$ represent the same concept, while $w(v_i, v_j) = 0$ means that these two concepts have no similarity. In information retrieval, with a threshold parameter $M$, when one tries to find related information of the concept $v$, all concepts $v'$ satisfying $w(v, v') \geq M$ are returned, which means that $v$ and $v'$ have a high similarity.
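The retrieval rule of Example 1 can be sketched in a few lines. This is only an illustration: the concept names, the toy weights, and the helper names `weight` and `related_concepts` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy similarity weights in [0, 1]; names are invented for illustration.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("heart", "artery"): 0.8,
    ("heart", "vein"): 0.7,
    ("heart", "femur"): 0.1,
}


def weight(u: str, v: str, weights: Dict[Tuple[str, str], float] = WEIGHTS) -> float:
    """Symmetric weight lookup; identical concepts get weight 1, unknown pairs 0."""
    if u == v:
        return 1.0
    return weights.get((u, v), weights.get((v, u), 0.0))


def related_concepts(query: str, concepts: List[str], threshold: float) -> List[str]:
    """Return every other concept whose weight to the query reaches the threshold."""
    return [c for c in concepts if c != query and weight(query, c) >= threshold]


concepts = ["heart", "artery", "vein", "femur"]
print(related_concepts("heart", concepts, threshold=0.5))
```

Raising the threshold narrows the returned set; a threshold of 1 would return only concepts treated as identical to the query.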

Traditional methods for ontology similarity computation are based on pairwise similarity calculation. Their computational complexity is high, and they require the selection of many parameters, which are not intuitive. In this paper, we use a learning theory approach. The idea is to learn a scoring function $f: V \to \mathbb{R}$ and then to determine the similarity between vertices (concepts) $v_i$ and $v_j$ by their value difference $|f(v_i) - f(v_j)|$: the smaller the difference, the higher the similarity. Formally, $w(v_i, v_j) \geq w(v_i, v_l)$ if and only if $|f(v_i) - f(v_j)| \leq |f(v_i) - f(v_l)|$. This approach was introduced from the viewpoint of ranking in [1], where a ranking algorithm is used to learn from samples a scoring function with small ranking error. The method was employed in the ontology setting in [2], which demonstrates its accuracy and efficiency. Another possible way to learn such a function is to form a graph Laplacian and take an eigenvector associated with its second smallest eigenvalue; see [3–6] for details. This method requires a positive definiteness condition for a similarity matrix, which is hard to check in our setting. Also, when the size of the graph is large, the computational complexity is high.
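To make the role of the scoring function concrete, the sketch below ranks concepts by the absolute difference of their scores to a query concept. The scores are made-up numbers and `most_similar` is a hypothetical helper; in the paper's setting the scores would come from a learned ranking function.

```python
from typing import Dict, List


def most_similar(query: str, scores: Dict[str, float], top: int = 3) -> List[str]:
    """Order the other vertices by |f(v) - f(query)|, smallest difference first."""
    others = [v for v in scores if v != query]
    return sorted(others, key=lambda v: abs(scores[v] - scores[query]))[:top]


# Hypothetical scores f(v) for four concepts.
scores = {"a": 0.1, "b": 0.15, "c": 0.9, "d": 0.5}
print(most_similar("a", scores, top=2))
```

Once the scores are available, answering a query needs only one pass over the vertices and a sort, rather than a separately tuned pairwise similarity computation.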

In this paper, we explore the learning theory approach for ontology similarity computation in a setting where the ontology graph is a tree, that is, a connected graph without cycles. Thus, there is a unique path between any two vertices. The tree structure gives restrictions on the similarity of vertices (concepts). For example, we assign a top vertex $v_{\mathrm{top}}$ and let it be the root, and denote by $k$ the degree (the number of edges that link to a vertex) of the top vertex. Let $N(v_{\mathrm{top}}) = \{v_1, \dots, v_k\}$ be the neighbor set of $v_{\mathrm{top}}$. If there is a path from a vertex $v$ to $v_{\mathrm{top}}$ through $v_i$, then $v$ belongs to branch $i$. Thus, we have $k$ branches in the tree, and any two vertices belonging to different branches have no edge between them. The concepts in the same branch of the tree should have higher similarity than concepts in different branches. This observation motivates us to apply the $k$-partite ranking algorithm [7], in which the $k$ parts correspond to the $k$ classes of vertices of $k$ rates. The rate values of all classes are decided by experts. Intuitively, a vertex of a higher rate $a$ is ranked higher than any vertex of rate $b$ if $a > b$. Thus, the $k$-partite ranking algorithm is a reasonable way to learn a similarity function for an ontology graph with a tree structure.
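The branch decomposition described above can be computed with a breadth-first search from each neighbor of the root. The following is a minimal sketch under the assumption that the tree is given as an adjacency dictionary; the function name `tree_branches` and the vertex names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def tree_branches(adj: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Map every non-root vertex to the index (1..k) of the branch it belongs to.

    Branch i contains the i-th neighbor of the root and every vertex whose
    unique path to the root passes through that neighbor.
    """
    branch: Dict[str, int] = {}
    for i, neighbor in enumerate(adj[root], start=1):
        branch[neighbor] = i
        queue = deque([neighbor])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w != root and w not in branch:
                    branch[w] = i
                    queue.append(w)
    return branch


# Hypothetical ontology tree: the root has degree 2, so there are two branches.
adj = {
    "top": ["a", "b"],
    "a": ["top", "a1", "a2"],
    "a1": ["a"],
    "a2": ["a"],
    "b": ["top", "b1"],
    "b1": ["b"],
}
print(tree_branches(adj, "top"))
```

The number of distinct branch indices equals the degree of the root, which is the number of parts used by the partite ranking formulation.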

The main contribution of this paper is to state some ontology computations as a $k$-partite ranking problem and to conduct a stability analysis of the algorithms under mild conditions, which leads to useful error bounds for ontology applications.

The rest of this paper is organized as follows. The setting and main results are given in the next section. Notions of stability are introduced in Section 3. The generalization bounds for learning algorithms are shown in Section 4. The stability and generalization bounds for the learning algorithms stated as regularization schemes in reproducing kernel Hilbert spaces are discussed in Section 5.

2. Formal Setting and Main Results

Now, we state our learning algorithm for ontology similarity computation.

Let $V$ be the finite set of vertices of an ontology graph. It is divided into $k$ disjoint subsets $V_1, \dots, V_k$ corresponding to $k$ rates. Let $\rho$ be a probability measure on $V$.

The performance of a ranking function can be measured by the following concept.

Definition 2. A ranking loss function is a function that assigns, for and , a nonnegative real number interpreted as the loss of in its relative ranking of and . The expected -partite ranking error on the ontology graph for a ranking function associated with the ranking loss function is defined as where is the conditional distribution of on .

Example 3. One commonly used ranking loss function is the hinge ranking loss defined as where . Another ranking loss function is the -ranking loss with a smoothing parameter defined as
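As an illustration of the two kinds of loss named in Example 3, the sketch below uses margin-based forms in the style of the label-based losses of [8], with the gap between rates playing the role of the label gap. These exact forms, the argument order, and the function names are assumptions made for illustration and may differ from the paper's displayed definitions.

```python
def hinge_ranking_loss(score_hi: float, score_lo: float, rate_gap: float = 1.0) -> float:
    """Hinge-type loss for a pair whose first vertex has the higher rate (assumed form)."""
    return max(0.0, rate_gap - (score_hi - score_lo))


def gamma_ranking_loss(score_hi: float, score_lo: float, gamma: float,
                       rate_gap: float = 1.0) -> float:
    """Smoothed loss with parameter gamma > 0 (assumed form).

    Full loss when the pair is misordered, zero once the score difference
    reaches gamma * rate_gap, and linear in between with slope -1/gamma.
    """
    diff = score_hi - score_lo
    if diff <= 0.0:
        return rate_gap
    if diff >= gamma * rate_gap:
        return 0.0
    return rate_gap - diff / gamma


print(hinge_ranking_loss(2.0, 0.5))                # correctly ordered with margin
print(gamma_ranking_loss(0.25, 0.0, gamma=0.5))    # inside the linear ramp
```

Under these assumed forms the smoothed loss is Lipschitz in the score difference with constant 1/gamma, which is the kind of property that admissibility conditions on a loss typically capture.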

Learning algorithms are implemented with a sample of size , called a preference graph, which is assumed here to be independently drawn according to . It can also be divided into parts , where consists of those sampling points of rate .

In [8], Agarwal and Niyogi have studied the algorithmic stability in a general setting, where the training examples take labels for some . A goal ranking function ranks future instances with larger labels higher than those with smaller labels. Here, our setting is more specific. The learner is given a preference graph consisting of disjoint parts corresponding to the classes of vertices. Every part has a rate value. The target ranking function ranks future instances in higher-rate parts higher than those in lower-rate parts.

A large class of learning algorithms is generated by regularization schemes. They penalize an empirical error, which is chosen here to be the empirical $k$-partite ranking error on the ontology graph defined for a function $f$ associated with the sample $S$ as
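A hedged sketch of such an empirical error: the loss is averaged over all sample pairs whose vertices carry different rates, with the higher-rate vertex passed first. The plain average over cross-rate pairs is an assumed normalization, and the names `empirical_ranking_error` and `misranking_loss` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def misranking_loss(score_hi: float, score_lo: float) -> float:
    """1 if the higher-rate vertex is not scored strictly higher, else 0."""
    return 1.0 if score_hi <= score_lo else 0.0


def empirical_ranking_error(parts: Dict[int, List[str]],
                            score: Callable[[str], float],
                            loss: Callable[[float, float], float]) -> float:
    """Average the loss over all pairs of sample vertices with different rates.

    `parts` maps each rate to the sample vertices of that rate.
    """
    total, count = 0.0, 0
    for lo_rate, hi_rate in combinations(sorted(parts), 2):
        for v_hi in parts[hi_rate]:
            for v_lo in parts[lo_rate]:
                total += loss(score(v_hi), score(v_lo))
                count += 1
    return total / count if count else 0.0


# Hypothetical sample with two rates and hand-picked scores.
parts = {1: ["a", "b"], 2: ["c"]}
scores = {"a": 0.0, "b": 2.0, "c": 1.0}
print(empirical_ranking_error(parts, scores.get, misranking_loss))
```

In this toy sample one of the two cross-rate pairs is misordered, so the error is one half.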

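In this paper, we study a learning algorithm generated by a regularization scheme in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$ associated with a Mercer kernel $K$. The regularization scheme is defined by
$$f_S = \arg\min_{f \in \mathcal{H}_K} \left\{ \hat{R}_{\ell}(f; S) + \lambda \|f\|_K^2 \right\}, \tag{5}$$
where $\hat{R}_{\ell}(f; S)$ denotes the empirical $k$-partite ranking error of $f$ on the sample $S$, $\|\cdot\|_K$ is the norm of $\mathcal{H}_K$, and $\lambda > 0$ is a regularization parameter. For the selection of the regularization parameter by cross-validation, readers are referred to [9, 10].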
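The following sketch shows one way a scheme of this type could be minimized numerically. It is not the authors' implementation: it assumes vertices are represented by one-dimensional features, uses a Gaussian kernel, a unit-margin hinge loss averaged over cross-rate pairs, and plain subgradient descent on the coefficients of a kernel expansion. All names and parameter values are hypothetical.

```python
import math
from typing import Callable, List


def gaussian_kernel(s: float, t: float, width: float = 1.0) -> float:
    return math.exp(-((s - t) ** 2) / (2.0 * width ** 2))


def fit_kernel_ranker(xs: List[float], rates: List[int], lam: float = 0.01,
                      steps: int = 200, lr: float = 0.1) -> Callable[[float], float]:
    """Subgradient descent on mean pairwise hinge loss + lam * ||f||_K^2.

    The ranking function is f(t) = sum_c alpha[c] * K(t, xs[c]).
    """
    n = len(xs)
    gram = [[gaussian_kernel(a, b) for b in xs] for a in xs]
    # Ordered pairs (i, j) in which sample i has the higher rate.
    pairs = [(i, j) for i in range(n) for j in range(n) if rates[i] > rates[j]]
    alpha = [0.0] * n
    for _ in range(steps):
        scores = [sum(gram[r][c] * alpha[c] for c in range(n)) for r in range(n)]
        # Gradient of lam * alpha^T K alpha is 2 * lam * K alpha = 2 * lam * scores.
        grad = [2.0 * lam * s for s in scores]
        for i, j in pairs:
            if scores[i] - scores[j] < 1.0:  # margin violated: hinge is active
                for r in range(n):
                    grad[r] -= (gram[r][i] - gram[r][j]) / len(pairs)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]

    def score(t: float) -> float:
        return sum(a * gaussian_kernel(t, x) for a, x in zip(alpha, xs))

    return score


# Hypothetical sample: two vertices of rate 1 and two of rate 2.
f = fit_kernel_ranker([0.0, 0.2, 2.0, 2.2], [1, 1, 2, 2])
print([round(f(t), 3) for t in (0.0, 0.2, 2.0, 2.2)])
```

A larger `lam` shrinks the coefficients and flattens the learned scores, which is the trade-off the regularization parameter controls.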

We emphasize that we abuse terminology for the sake of readability. If the ranking function is not associated with an RKHS (for instance, in Lemma 9 and Theorems 10 and 12), then the second term on the right-hand side of (5) vanishes.

Our error analysis provides a learning rate of algorithm (5) when the ranking loss is $\sigma$-admissible.

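Definition 4. Let $\ell$ be a ranking loss, $\sigma > 0$, and $\mathcal{F}$ a class of real-valued functions on $V$. We say that $\ell$ is $\sigma$-admissible with respect to $\mathcal{F}$ if for any $f_1, f_2 \in \mathcal{F}$, and $v, v' \in V$,
$$|\ell(f_1, v, v') - \ell(f_2, v, v')| \leq \sigma \left( |f_1(v) - f_2(v)| + |f_1(v') - f_2(v')| \right).$$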

Let us state the estimate of learning rates which will be proved in Section 5.

Theorem 5. Let be a RKHS such that for all . Let be a ranking loss, -admissible with respect to , and bounded by some , such that is convex with respect to . Let be a fixed function in satisfying for some . Then, for any , with confidence at least , one has

From Theorem 5, we see that if and (e.g., ), then converges with confidence to . The quantity is well understood in the literature related to learning theory (e.g., [11–14]).

3. Stability Analysis

An algorithm is stable if any change of a single point in a training set yields only a small change in the output. It is natural to consider a good ranking algorithm to be one with good stability; that is, a small change in the sample does not lead to a large change in the ranking function. Some analysis of the stability of ranking algorithms is given in [1, 8, 15].

Let and be the th element in for . Let be the sequence obtained by replacing in by a new sampling point of rate . We define some notions of stability for -partite ranking algorithms.

Definition 6 (uniform loss stability for a $k$-partite ranking algorithm on ontology graph). Let A be a $k$-partite ranking algorithm for ontology whose output on a preference graph is denoted by . Let be a ranking loss function and for . We say that has uniform loss stability with respect to if for all , , and , we have for all , but belong to different rate,

Definition 7 (uniform score stability for a $k$-partite ranking algorithm on ontology graph). Let A be a $k$-partite ranking algorithm for ontology whose output on a preference graph is denoted by . Let be a ranking loss function and for . We say that has uniform score stability with respect to if for all , , , , and , and for all ,

The main tool used here is McDiarmid’s inequality, which bounds the deviation of any function of a sample on which a single change in the sample has limited effect.

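Theorem 8 (see [16]). Let $X_1, \dots, X_N$ be independent random variables, each taking values in a set $A$. Let $\phi: A^N \to \mathbb{R}$ be such that for each $i \in \{1, \dots, N\}$, there exists a constant $c_i > 0$ such that
$$\sup_{x_1, \dots, x_N \in A,\; x_i' \in A} \left| \phi(x_1, \dots, x_N) - \phi(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_N) \right| \leq c_i.$$
Then, for any $\epsilon > 0$,
$$P\left( \phi(X_1, \dots, X_N) - E[\phi(X_1, \dots, X_N)] \geq \epsilon \right) \leq \exp\left( \frac{-2\epsilon^2}{\sum_{i=1}^{N} c_i^2} \right).$$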
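The proofs in later sections repeatedly set the right-hand side of McDiarmid's inequality equal to a confidence level and solve for the deviation. A small numerical sketch of that step, assuming the one-sided form of the inequality with bounded-difference constants `c`; the function names are hypothetical.

```python
import math
from typing import Sequence


def mcdiarmid_tail(eps: float, c: Sequence[float]) -> float:
    """Upper bound exp(-2 eps^2 / sum c_i^2) on P(phi - E[phi] >= eps)."""
    return math.exp(-2.0 * eps ** 2 / sum(ci * ci for ci in c))


def deviation_at_confidence(delta: float, c: Sequence[float]) -> float:
    """Solve exp(-2 eps^2 / sum c_i^2) = delta for eps."""
    return math.sqrt(sum(ci * ci for ci in c) * math.log(1.0 / delta) / 2.0)


# Example: an average of n quantities in [0, 1] changes by at most 1/n
# when a single one of them is replaced.
n = 100
c = [1.0 / n] * n
print(mcdiarmid_tail(0.1, c))
print(deviation_at_confidence(0.05, c))
```

With constants of order 1/n, the deviation at a fixed confidence level shrinks like the inverse square root of n, which is the source of the square-root terms in stability-based generalization bounds.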

In what follows, denotes a training sample set obtained by replacing in by by for , and . Also, and are simply denoted by and , respectively. We only consider the case of sample replacements with the same rate: for some ontology graphs, the graph structure is fixed; hence the members of vertices and edges in each branch are fixed.

4. Generalization Bounds for Stable $k$-Partite Ranking Algorithms on Ontology Graph

From this section on, our stability analysis of $k$-partite ranking algorithms is stated on an ontology graph, and our organization follows [8]. In this section, generalization bounds for ranking algorithms that exhibit good stability properties are derived. Our techniques are based on those of [17]. We start with the following technical lemma.

Lemma 9. Let A be a symmetric $k$-partite ranking algorithm for ontology whose output on a preference graph is denoted by , and let be a ranking loss function. Then, for all , , and , , one has

Proof. We have By symmetry, the term in the summation is the same for all . Therefore, we get Interchanging the roles of with and with , we get Since the results follow.

We are now ready to give our main result of this section, which bounds the expected $\ell$-error of a ranking function learned by a $k$-partite ranking algorithm with good uniform loss stability in terms of its empirical $\ell$-error on the training sample. The proof follows [18].

Theorem 10. Let A be a symmetric $k$-partite ranking algorithm for ontology whose output on a preference graph is denoted by , and let be a ranking loss function such that for all and . Let for such that A has uniform loss stability () with respect to . Let . Then, for any , with confidence at least , one has

Proof. Let be defined by We show that satisfies the condition for McDiarmid’s inequality. To this end, let . For each , we have These give Similarly, it can be shown that for any , , Thus, applying McDiarmid’s inequality to , we get for any , Now, by Lemma 9, we know that the expectation can be bounded as
Thus, for any , The result follows by setting the right-hand side equal to and solving it for .

For any , and any $k$-partite ranking algorithm with good uniform loss stability with respect to , Theorem 10 can be applied to bound the expected ranking error of a learned ranking function in terms of its empirical -error on the training sample. The following lemma shows that, for every , a ranking algorithm with good uniform score stability also has good uniform loss stability with respect to . Using the techniques of Lemma 2 in [17], and taking in Example 3 as , the following lemma can be obtained immediately.

Lemma 11. Let A be a $k$-partite ranking algorithm for ontology whose output on a preference graph is denoted by . Let for such that A has uniform score stability . Then, for every , A has uniform loss stability with respect to the ranking loss , where for all ,

Combining Theorem 10 and Lemma 11, we get the following result which bounds the expected ranking error of a learned ranking function in terms of its empirical -error for any ranking algorithm with good uniform score stability.

Theorem 12. Let A be a $k$-partite ranking algorithm for ontology whose output on a preference graph is denoted by . Let for such that A has uniform score stability , and . Denote . If is a ranking loss satisfying for all and , then for any , with probability of at least ,

Proof. One applies Theorem 10 to with the ranking loss (using Lemma 11), which satisfies . One finishes the proof thanks to the fact that

5. Stable Ranking Algorithms

In this section, we demonstrate the stability of some ranking algorithms in which a ranking function is selected by minimizing a regularized objective function. A general result for regularization-based $k$-partite ranking algorithms is derived in Section 5.1. In Section 5.2, this result is used to establish the stability of kernel-based $k$-partite ranking algorithms that perform regularization in a reproducing kernel Hilbert space. These stability results are also used to obtain a consistency theorem for kernel-based $k$-partite ranking algorithms in Section 5.3.

5.1. General Regularizers

Let be given a ranking loss function, a class of real-valued functions on , and a regularization functional. Consider the following regularized empirical $\ell$-error of a ranking function (with respect to a preference graph ) with regularization parameter , We consider $k$-partite ranking algorithms that minimize such a regularized objective function; that is, ranking algorithms that, given a preference graph , output a ranking function that satisfies for some fixed choice of ranking loss , function class , regularizer , and regularization parameter . We derive a general result below that will be useful for showing stability of such regularization-based algorithms.

Lemma 13. Let be a ranking loss such that is convex in . Let be a convex set of real-valued functions on , and let such that is $\sigma$-admissible with respect to . Let , and let be a functional defined on such that for preference graph , the regularized empirical $\ell$-error has a minimum (not necessarily unique) in . Let A be a $k$-partite ranking algorithm for ontology defined by (29). Let , , and . For brevity, denote and let Then for any and , one has

Proof. Recall that a convex function satisfies Since is convex in , is convex in . Therefore, for any , we have and also (interchanging the roles of and ), Adding the above two inequalities yields Now, since is convex, and . Since minimizes in and minimizes in , we have Adding these two inequalities and applying (36), we get Similarly, for , we have The results follow.

As we will see below, the above result can be used to establish stability of some regularization-based ranking algorithms.

5.2. Regularization in Reproducing Kernel Hilbert Spaces

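Let $\mathcal{H}_K$ be a reproducing kernel Hilbert space (RKHS) of real-valued functions on $V$ associated with a Mercer kernel $K: V \times V \to \mathbb{R}$. Here, $K_v$ is defined as $K_v = K(v, \cdot)$, and the reproducing property of $\mathcal{H}_K$ gives that for all $f \in \mathcal{H}_K$ and all $v \in V$,
$$f(v) = \langle f, K_v \rangle_K,$$
where $\langle \cdot, \cdot \rangle_K$ denotes the RKHS inner product in $\mathcal{H}_K$. By the Cauchy–Schwarz inequality, it is easy to show that for all $f \in \mathcal{H}_K$ and all $v \in V$,
$$|f(v)| \leq \|f\|_K \sqrt{K(v, v)}, \tag{41}$$
where $\|\cdot\|_K$ denotes the RKHS norm in $\mathcal{H}_K$. We consider ranking algorithms that perform regularization in the RKHS $\mathcal{H}_K$ using the squared norm in $\mathcal{H}_K$ as regularizer. Specifically, let $N$ be the regularizer defined by
$$N(f) = \|f\|_K^2. \tag{42}$$
It will be demonstrated below that if $K(v, v) \leq \kappa^2$ for some $\kappa > 0$ and any $v \in V$, then a ranking algorithm that minimizes an appropriate regularized error over $\mathcal{H}_K$, with regularizer defined as above, has good uniform score stability.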

Theorem 14. Let be an RKHS with kernel such that for all . Let be a ranking loss such that is convex in and is $\sigma$-admissible with respect to . Let , and let be given by (42). Let A be the $k$-partite ranking algorithm for ontology that, given a preference graph , outputs a ranking function defined by (29). Then, A has uniform score stability with

Proof. Let and .
Applying Lemma 13 with , we get (using the notation in the proof of Lemma 13) that Since is a vector space, , , and , so and are well defined. It is easy to check that Combined with (44), this gives Since (as noted above) , this together with (41) gives It follows that This together with (41) tells us that for any , Similarly, we can also obtain The conclusion follows.

Theorems 12 and 14 give the following generalization bound for kernel-based ranking algorithms.

Corollary 15. Under the conditions of Theorem 14, one has that for any , with probability of at least over the draw of , the expected ranking error of the ranking function learned by the regularized algorithm associated with the ranking loss is bounded by

The result of Corollary 15 shows that a larger regularization parameter leads to better stability and, therefore, a tighter confidence interval in the resulting generalization bound.
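The effect of the regularization parameter on stability can be probed numerically. The sketch below is a stand-in, not the algorithm analyzed in the paper: it uses kernel ridge regression on the rate values (which has a closed-form solution) with a Gaussian kernel on one-dimensional features, replaces one sample point by another of the same rate, and reports the largest change in scores over a few probe points. All names and data are hypothetical.

```python
import math
from typing import Callable, List


def kernel(s: float, t: float) -> float:
    return math.exp(-((s - t) ** 2) / 2.0)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(xs: List[float], rates: List[int], lam: float) -> Callable[[float], float]:
    """Kernel ridge regression on rate values: alpha = (K + lam * n * I)^{-1} y."""
    n = len(xs)
    a = [[kernel(xs[i], xs[j]) + (lam * n if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(a, [float(r) for r in rates])
    return lambda t: sum(al * kernel(t, x) for al, x in zip(alpha, xs))


def score_change(xs: List[float], rates: List[int], index: int, new_x: float,
                 lam: float, probes: List[float]) -> float:
    """Largest score change over the probes after replacing one sample point."""
    f_old = fit(xs, rates, lam)
    f_new = fit(xs[:index] + [new_x] + xs[index + 1:], rates, lam)
    return max(abs(f_old(t) - f_new(t)) for t in probes)


xs = [0.0, 0.5, 1.0, 2.0, 2.5, 10.0]
rates = [1, 1, 1, 2, 2, 2]
probes = [0.0, 1.0, 2.0, 3.0, 10.0]
for lam in (1e-3, 10.0):
    print(lam, score_change(xs, rates, 5, 3.0, lam, probes))
```

In this toy setup the replaced point is isolated, so with weak regularization the learned scores near it change substantially, while with strong regularization all scores are small and the change is correspondingly small.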

Under the conditions of the above results, a kernel-based ranking algorithm minimizing a regularized empirical $\ell$-error also has good uniform loss stability with respect to $\ell$; this follows from the following simple lemma.

Lemma 16. Let be a class of real-valued functions on , and let A be a $k$-partite ranking algorithm for ontology that, given a preference graph , outputs a ranking function . If A has uniform score stability and is a ranking loss that is $\sigma$-admissible with respect to , then A has uniform loss stability with respect to , where for all ,

The proof of this result follows that of Lemma 13 in [8]. Using Theorem 14 and Lemma 16, we immediately get the following corollary.

Corollary 17. Under the conditions of Theorem 14, A has uniform loss stability with respect to , where for all ,

5.3. Consistency

We can also use the above results to show consistency of kernel-based ranking algorithms. In particular, let denote the optimal expected $\ell$-error in an RKHS (for a given distribution): Then, for a bounded loss function , we can show that with an appropriate choice of the regularization parameter , the expected $\ell$-error of the ranking function learned by a kernel-based ranking algorithm that minimizes a regularized empirical $\ell$-error in converges (in probability) to this optimal value. We first show the following simple lemma.

Lemma 18. Let be a fixed ranking function, and let be a bounded ranking loss function such that for all and . Then, for any , with probability of at least ,

Proof. Define as Then, . We show that satisfies the condition of McDiarmid’s inequality. For each and , we have Therefore, applying McDiarmid’s inequality, we know that for any , The result follows by setting the right-hand side equal to and solving it for .

We are now in a position to prove our main result (Theorem 5).

Proof of Theorem 5. We use Corollary 17 and apply Theorem 10 with to get that with probability of at least , Clearly, Applying Lemma 18 to with , we, thus, get that with probability of at least : One finishes the proof by combining the inequalities in (59) and (61), each of which holds with probability at least , together with the condition in (7).

6. Conclusion

The main focus of this paper is the stability and generalization properties of $k$-partite ranking algorithms used for ontology computation. The approach has an intuitive interpretation: each vertex of the ontology graph is mapped to a point on the real line. The vertices of the ontology graph do not take real-valued labels, and the samples are given by a preference graph (pairs of vertices in different ranking rates). This setting is suitable for ontology. We have derived generalization bounds for $k$-partite ranking algorithms in this setting using the notion of algorithmic stability. It is also shown that $k$-partite ranking algorithms with good stability properties have good generalization properties. Our results are applied to obtain generalization bounds for kernel-based $k$-partite ranking algorithms that perform regularization in a reproducing kernel Hilbert space.

Acknowledgments

The authors thank the reviewers for their constructive comments and detailed suggestions for improving the quality of this paper. This work was supported in part by the Key Laboratory of Educational Informatization for Nationalities, Ministry of Education, the National Natural Science Foundation of China (60903131) and Key Science and Technology Research Project of Education Ministry (210210).

References

[1] S. Agarwal, “Ranking on graph data,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 25–32, June 2006.
[2] Y. Y. Wang, W. Gao, Y. G. Zhang, and Y. Gao, “Ontology similarity computation using ranking learning method,” in Proceedings of the IEEE International Conference Computational Intelligence and Industrial Application, pp. 20–23, 2010.
[3] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
[4] W. Gao and D. X. Zhou, “Convergence of spectral clustering with a general similarity function,” Science China Mathematics, vol. 42, no. 10, pp. 985–994, 2012.
[5] U. von Luxburg, M. Belkin, and O. Bousquet, “Consistency of spectral clustering,” The Annals of Statistics, vol. 36, no. 2, pp. 555–586, 2008.
[6] S. Smale and D.-X. Zhou, “Geometry on probability spaces,” Constructive Approximation, vol. 30, no. 3, pp. 311–323, 2009.
[7] S. Rajaram and S. Agarwal, “Generalization bounds for k-partite ranking,” in Proceedings of the NIPS Workshop on Learning to Rank, pp. 28–33, British Columbia, Canada, 2005.
[8] S. Agarwal and P. Niyogi, “Generalization bounds for ranking algorithms via algorithmic stability,” Journal of Machine Learning Research, vol. 10, pp. 441–474, 2009.
[9] A. Caponnetto and Y. Yao, “Cross-validation based adaptation for regularization operators in learning theory,” Analysis and Applications, vol. 8, no. 2, pp. 161–183, 2010.
[10] G. H. Golub, M. Heath, and G. Wahba, “Generalized cross-validation as a method for choosing a good ridge parameter,” Technometrics, vol. 21, no. 2, pp. 215–223, 1979.
[11] T. Hu, “Online regression with varying Gaussians and non-identical distributions,” Analysis and Applications, vol. 9, no. 4, pp. 395–408, 2011.
[12] S. Smale and D.-X. Zhou, “Estimating the approximation error in learning theory,” Analysis and Applications, vol. 1, no. 1, pp. 17–41, 2003.
[13] H.-Y. Wang, Q.-W. Xiao, and D.-X. Zhou, “An approximation theory approach to learning with $\ell^{1}$ regularization,” Journal of Approximation Theory, vol. 167, pp. 240–258, 2013.
[14] D. C. Yang and D. Y. Yang, “Real-variable characterizations of Hardy spaces associated with Bessel operators,” Analysis and Applications, vol. 9, no. 3, pp. 345–368, 2011.
[15] S. Agarwal, “Learning to rank on graphs,” Machine Learning, vol. 81, no. 3, pp. 333–357, 2010.
[16] C. McDiarmid, “On the method of bounded differences,” in Surveys in Combinatorics, vol. 141, pp. 148–188, Cambridge University Press, 1989.
[17] S. Agarwal and P. Niyogi, “Stability and generalization of bipartite ranking algorithms,” in Proceedings of the 18th Annual Conference on Learning Theory, vol. 3559 of Lecture Notes in Computer Science, pp. 32–47, 2005.
[18] O. Bousquet and A. Elisseeff, “Stability and generalization,” Journal of Machine Learning Research, vol. 2, no. 3, pp. 499–526, 2002.

Copyright

Copyright © 2013 Wei Gao and Tianwei Xu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
