Mathematical Approaches in Advanced Control Theories 2013 (special issue)
Sign Inference for Dynamic Signed Networks via Dictionary Learning
Mobile online social networks (mOSNs) are a burgeoning research area. However, most existing work on mOSNs deals with static network structures and simply encodes whether relationships among entities exist. In contrast, relationships in signed mOSNs can be positive or negative and may change with time and location. Applying certain global characteristics of social balance, in this paper we aim to infer the unknown relationships in dynamic signed mOSNs and formulate this sign inference problem as a low-rank matrix estimation problem. Specifically, motivated by the Singular Value Thresholding (SVT) algorithm, a compact dictionary is selected from the observed dataset. Based on this compact dictionary, the relationships in the dynamic signed mOSNs are estimated by solving the formulated problem. Furthermore, the estimation accuracy is improved by employing a dictionary self-updating mechanism.
1. Introduction
Over the past few years, a number of mobile applications that allow users to enjoy networking have emerged, and mobile online social networks (mOSNs) have proliferated accordingly. With the ubiquitous use of mobile devices and rapid shifts in technology, it is worthwhile to investigate mOSNs from a privacy or security standpoint [1, 2]. The related applications are also extensive, such as online authentication and recommendation. In this context, research on mobile online networks in which two opposite kinds of relationships can occur has become common; people form links not only to indicate friendship, support, or approval but also to signify disapproval or distrust of the opinions of others. It is natural to model such networks as signed networks, where the sign of a link weight can be either positive or negative, representing the status of a relationship. As in traditional social network analysis, the relationships in signed mOSNs can be represented as a graph, where nodes denote the objects (e.g., people or mobile terminals) and signed edges denote the relationships or links (e.g., a communication between two people). The link structure of the resulting graph can be exploited to detect underlying groups of objects, predict missing links, and handle many other tasks [3–17].
One of the most fundamental theories applicable to signed social networks is social structural balance [5, 6, 16]. Structural balance corresponds to the possibility of exactly dividing the signed graph into two adversary subcommunities such that all edges within each subcommunity have positive weights while all edges joining agents of different subcommunities have negative weights. Graphs with nonnegative weights are a special case of structural balance in which one of the two subcommunities is empty. Since the assumption that structural balance holds in a real signed network might be too extreme, a concept called weak structural balance generalizes structural balance by allowing a partition of the signed graph into multiple adversary subcommunities.
Structural balance and weak structural balance have been shown to be valid tools for analyzing signed networks. For instance, the sign inference problem, which aims to infer the unknown relationship between two objects, can be solved by mining the balance information of signed networks from local and global perspectives [8–10, 12–17]. With the inferred result, it is possible to predict relationships so that legitimate participants can eliminate networking security vulnerabilities. Nevertheless, most state-of-the-art methods for the sign inference problem take a static point of view, and dynamic scenarios are rarely taken into account. Therefore, it is necessary to establish a rational dynamic network model to infer the signs of relationships.
In practice, several inherent properties of mOSNs make it challenging to reliably sense the global state of relationships in large networks. First, in contrast to traditional social networks, the observations of relationships in mOSNs are closely associated with the geographical environment, as well as the relative locations and signal coverage of mobile terminals and network access points. Due to these spatial constraints, such observations, which can be viewed as linearly sampled from the global relationship data, are bound to miss a significant number of values. Consequently, they are generally not sufficient to unambiguously infer the true status using traditional solutions of linear inverse problems. Second, in mOSNs different relations between entities may appear at different times, so observations of the network vary over a sufficiently long time period. These dynamic interactions essentially introduce a time dimension to the problem of mining the potential relationship structures. Third, despite this dynamic behavior, the underlying relationships in reality always display some “redundancy” attributable to gradual or periodic variation, relative stability, and so forth. Owing to these characteristics, the mass of redundant data generated in varying scenarios poses a resource challenge. Hence, although many observers collect features for at least part of the network, there are still serious impediments to reliable large-scale or network-wide data processing. Given these aspects of mOSNs, it is reasonable to organize the entire relationship dataset as a tensor that matches its spatiotemporal structure. Efficient relationship inference approaches associated with the tensor model are then required to overcome the obstacles to this data processing.
The aim of this paper is to develop algorithms for sign inference in signed mOSNs from the perspectives of global structure and spatiotemporal evolution. In particular, we assume that the signed mOSN possesses an underlying dynamic weakly balanced complete network structure. Suppose that we are given an incomplete network observation tensor (a 3-dimensional array) consisting of the adjacency matrices that correspond to snapshots of the underlying dynamic weakly balanced complete network at successive times. The sign inference task is then to estimate the sign patterns of all possible links in the dynamic complete network at a given time slice. Utilizing the low-rank property of weak structural balance and the features extracted from the observation tensor, we treat inference from the incomplete relationship data as an underdetermined linear inverse problem and develop an approach based on low-rank matrix reconstruction to solve it. Moreover, we regard the observation tensor as the training dataset and choose a dictionary from it to improve the validity and efficiency of our inference approach. The dictionary selection method is designed by reducing the size of an overcomplete feature set extracted from the training dataset. A dictionary self-updating mechanism is also introduced to improve the accuracy of the inference.
Here are the key contributions we make in this paper.
(i) A dictionary selection approach based on group sparsity is designed to generate a feature set of minimal size and thereby increase computational efficiency. Specifically, the observation tensor serves as the raw material for feature extraction.
(ii) The sign inference problem for weakly balanced mOSNs is formulated as a low-rank matrix reconstruction from the selected dictionary. Under certain mild conditions, a low-rank matrix reconstruction algorithm is applied to solve the sign inference problem, and it turns out to be more accurate and efficient than the other inference methods we compare against. A dictionary self-updating mechanism is also introduced to track the dynamic characteristics of the network and improve the inference accuracy.
The rest of this paper is organized as follows. In Section 2, we build the model of the dynamic signed network and review some basics of balance theory for completeness. In Section 3, we first extract the initial candidate feature pool from the observation tensor and propose a dictionary selection approach. Then we propose our low-rank matrix reconstruction method to solve the sign inference problem. The implementation details of the dictionary self-updating procedure are also given. In Section 4, we conduct numerical experiments that demonstrate the validity of our network model for sign inference and evaluate the performance of our methods. Finally, we present our conclusions in Section 5.
2. Background and Preliminaries
2.1. Dynamic Signed Network Structure
Formally, a dynamic undirected signed network is represented as a dynamic graph , where is the vertex set of size and is the edge set varying over time. A network snapshot denoted by presents the connections of observed at time . Here, is the subset of and is the adjacency matrix of with the signed weights
Particularly, for each time , a zero entry in is treated as an unknown relationship based on the acknowledgement that some potential attitudes exist between any two entities, even if the relationship itself is not observed. From this viewpoint, we can assume that there exists an underlying dynamics complete signed network , in which only some partial relationships are observed at times , respectively. Correspondingly, we let denote the three-dimensional tensor that contains relationship information between all pairs of entities in . Thus, the observation tensor consisting of a series of network snapshots can be represented as where is the index set of the observed entries. Let be the orthogonal projection operator onto the span of tensors vanishing outside so that the th component of is equal to when and zero otherwise. Then we have (shown in Figure 1) and for each time slice , where and .
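As a minimal sketch of the sampling model above, the projection onto the observed index set can be written as an elementwise mask. The names `project_observed`, `full`, and `mask`, the array layout (time, node, node), and the sizes are illustrative assumptions, not notation from the paper. For brevity the stand-in tensor is random and not symmetrized.

```python
import numpy as np

def project_observed(full, mask):
    """Keep entries where mask is True; set the rest to 0 (unknown relationship)."""
    return np.where(mask, full, 0)

rng = np.random.default_rng(0)
T, n = 3, 5                                  # hypothetical number of snapshots and nodes
full = rng.choice([-1, 1], size=(T, n, n))   # stand-in for the complete signed tensor
mask = rng.random((T, n, n)) < 0.4           # True where a relationship was observed
observed = project_observed(full, mask)      # observation tensor with zeros as unknowns
```

Applying the projection twice gives the same result as applying it once, which is the defining property of a projection operator.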
The signed networks described above are homogeneous; that is, their relationships are between entities of the same kind. A signed network can also be heterogeneous. In a heterogeneous signed network, there can be more than one kind of entity, and relationships between entities of the same or different kinds can be positive or negative; YouTube, with its two kinds of entities (users and videos), is an example. Moreover, the three-dimensional network adjacency tensor can be extended with further dimensions (e.g., a spatial dimension) to adapt to a wider range of scenarios. In this paper, we focus on three-dimensional homogeneous signed networks.
2.2. Weak Structural Balance
Structural balance theory was first formulated by Heider in order to understand the structure of a network of individuals whose mutual relationships are characterized in terms of friendship and hostility. Formally, a triad is considered balanced if the product of the signs in the triad is positive; that is, it contains an even number of negative edges. This agrees with principles such as “a friend of my friend is more likely to be my friend” and “an enemy of my friend is more likely to be my enemy.” The configurations of balanced and unbalanced triads are shown in Figure 2. One possible weakness of this theory is that the balance relationships so defined might be too strict. Weak structural balance was therefore proposed as a relaxation that drops the assumption that “the enemy of my enemy is my friend”; equivalently, the case “the enemy of my enemy is my enemy” is permitted. The local structure of weak balance thus posits that only triads with exactly two positive edges are implausible and that all other kinds of triads are permissible (also illustrated in Figure 2).
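The two triad rules can be stated in a few lines of code. The sketch below assumes a triad is given as a tuple of three edge signs in {+1, −1}; the function names are illustrative.

```python
def is_balanced(triad):
    """Strong balance: the product of the three edge signs is positive."""
    a, b, c = triad
    return a * b * c > 0

def is_weakly_balanced(triad):
    """Weak balance: only triads with exactly two positive edges are ruled out."""
    return sum(1 for s in triad if s > 0) != 2

for triad in [(1, 1, 1), (1, -1, -1), (1, 1, -1), (-1, -1, -1)]:
    print(triad, is_balanced(triad), is_weakly_balanced(triad))
```

The all-negative triad is the only configuration on which the two notions disagree: it is unbalanced in the strong sense but permitted under weak balance.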
The formal definition of weakly balanced networks is as follows.
Definition 1 (weakly balanced networks). A (possibly incomplete) network is weakly balanced if and only if it is possible to obtain a weakly balanced complete network by filling in the missing edges of its adjacency matrix. Furthermore, in terms of global structure, a complete network is weakly balanced if and only if its vertex set can be divided into $k$ clusters such that all edges within clusters are positive and all edges between clusters are negative.
A body of literature discusses approaches to clustering and sign prediction in signed networks. Ideas derived from the local balance of signed networks can be used successfully to yield algorithms for sign inference [9, 10]. Meanwhile, several works analyze social interrelations from the global perspective of structural balance [8, 13–15, 17]. In particular, Hsieh et al. show that the adjacency matrix of a weakly balanced network has a “low-rank” structure and propose sign prediction methods based on low-rank modeling.
Theorem 2 (low-rank structure of signed networks). The adjacency matrix of a complete $k$-weakly balanced network has rank 1 if $k \le 2$ and has rank $k$ for all $k > 2$.
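The rank statement is easy to check numerically. The sketch below builds the adjacency matrix of a complete weakly balanced network from a list of cluster sizes, with +1 inside clusters (including the diagonal, which is an assumption about how self-pairs are treated) and −1 between clusters. The helper name `weakly_balanced_adjacency` is illustrative.

```python
import numpy as np

def weakly_balanced_adjacency(cluster_sizes):
    """+1 for pairs in the same cluster (diagonal included), -1 otherwise."""
    labels = np.repeat(np.arange(len(cluster_sizes)), cluster_sizes)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, -1.0)

ranks = {}
for sizes in ([6], [3, 4], [2, 3, 4], [2, 3, 4, 5]):
    A = weakly_balanced_adjacency(sizes)
    ranks[len(sizes)] = int(np.linalg.matrix_rank(A))
print(ranks)  # number of clusters -> rank of the adjacency matrix
```

With one or two clusters the matrix is an outer product of a sign vector with itself, hence rank 1. With more clusters the rank equals the number of clusters.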
Actually, since the global viewpoint of weak balance stated in Definition 1 obeys clustering characteristics presented in Theorem 2, for , there exists an invertible matrix such that where on the primary diagonal is an -order square matrix whose entries are all 1 () and the other entries of are all . The -order square matrix indicates the th cluster.
Notation. For , let the mixed norm ; the soft-thresholding operator is also defined obeying where and denote the th row of and , respectively . The invertible vectorization is denoted by .
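The symbols in the notation paragraph above did not survive extraction. For orientation, the standard definitions of a row-wise mixed norm and its associated row-wise soft-thresholding operator are given below. This is an assumption about what the paragraph defines: the symbols $X$, $\tau$, and $\mathcal{D}_\tau$ are chosen here for illustration and may differ from the paper's.

```latex
\|X\|_{2,1} = \sum_{i} \|X_{i\cdot}\|_{2},
\qquad
\bigl[\mathcal{D}_{\tau}(X)\bigr]_{i\cdot}
  = \max\!\left(0,\; 1 - \frac{\tau}{\|X_{i\cdot}\|_{2}}\right) X_{i\cdot},
\quad \tau > 0,
```

where $X_{i\cdot}$ denotes the $i$th row of $X$, and a zero row is mapped to a zero row.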
Let be the class of convex functions with Lipschitz gradient . A continuous differentiable function belongs to for some if for any we have both of the following:
3. Sign Inference via Dictionary Learning
In this section, we focus on solving the sign inference problem, that is, estimating connection statuses, via dictionary learning. As preparation, we propose a large-scale dictionary selection method to generate the dictionary used for inference. Assume that we are given a (usually incomplete) network observation tensor sampled from the adjacency tensor of an underlying dynamic weakly balanced complete network. As described in Section 1, it is reasonable to suppose that in practice most relationships between entities are stable over long periods of time and that the change in the size of each subcommunity is consequently limited. This implies strong dependence among the observed data. Combining these assumptions with the low-rank property of weakly balanced complete networks, we extract an initial feature pool from the observation tensor and propose a dictionary selection method to reduce the size of the feature pool in Section 3.1. The corresponding algorithm is presented in Section 3.2. With the trained dictionary, we propose our sign inference approach and dictionary updating mechanism in Section 3.3, which are also inspired by the low-rank property of weakly balanced complete networks.
The method we propose for dictionary selection is motivated by the Singular Value Thresholding (SVT) algorithm, a simple and efficient algorithm for nuclear norm minimization proposed by Cai et al. Our basic idea is to obtain the optimal solution of the trace norm minimization problem by solving its dual problem, whose objective function can be shown to be continuously differentiable with Lipschitz continuous gradient. Specifically, we prove that the optimal solution of the primal problem can be readily obtained from the optimal solution of the dual problem. We first provide a brief review of the standard SVT algorithm.
Considering the problem Cai et al.  give a theoretical analysis that, when , the optimal solution of problem (6) converges to that of the standard problem: Given that , the SVT algorithm operates as a linear Bregman iteration scheme. Furthermore, by defining the Lagrangian function of problem (6) as where is the Lagrangian dual variable, we can derive its dual function as Cai et al. show that SVT indeed optimizes the dual function via the gradient ascent method.
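The equations of this review were lost in extraction. As a hedged illustration, the sketch below follows the SVT iteration as it is commonly stated for matrix completion: minimize $\tau\|X\|_* + \tfrac{1}{2}\|X\|_F^2$ subject to agreement with the observed entries, by alternating a singular value shrinkage step with a gradient ascent step on the dual variable. Whether this matches problem (6) term by term is an assumption. The function names, the parameter values, and the toy rank-1 sign matrix are illustrative.

```python
import numpy as np

def sv_shrink(Y, tau):
    """Singular value shrinkage: soft-threshold the singular values of Y by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt_complete(M_obs, mask, tau, delta=1.0, n_iter=300):
    """Sketch of SVT: X is the primal iterate, Y the dual variable."""
    Y = np.zeros_like(M_obs, dtype=float)
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        X = sv_shrink(Y, tau)                # primal update from the dual variable
        Y += delta * mask * (M_obs - X)      # dual ascent on the observed entries only
    return X

rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=20)
M = np.outer(s, s)                           # toy two-cluster (rank-1) sign matrix
mask = rng.random(M.shape) < 0.6             # hypothetical 60% sampling rate
X_hat = svt_complete(mask * M, mask, tau=20.0)
```

The step size `delta=1.0` is a conservative choice for this sketch. Larger steps are often used in practice but are not needed to illustrate the structure of the iteration.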
3.1. Large-Scale Dictionary Selection
We address how to select the dictionary given an initial candidate feature pool in this subsection. To this end, we first extract an initial candidate feature pool from , which is sampled from . Since consists of the adjacency matrices (), the matrix in can retain the information of more or less. Thus, we reserve the group of with relatively higher sample rate to extract features. We use singular value decomposition (SVD) to express each as a series of orthogonal bases in Hilbert space; that is, where and are singular vectors of with eigenvalue , . Without loss of generality, we sort the singular values of in descending order, and set Then, due to the low-rank property of the weakly balanced complete adjacency matrix, we keep the group of corresponding to the largest as the features. By this procedure, we extract an initial candidate feature pool as , where each matrix denotes a feature. Equivalently, we can discuss and form the matrix for convenience, where , , , and .
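The feature extraction step can be sketched as follows, under stated assumptions: each retained snapshot is decomposed by SVD, the rank-one terms belonging to its `r` largest singular values are kept, and each term is vectorized into one column of the feature matrix. Whether the paper scales each term by its singular value is not recoverable from the text, so this sketch uses the unscaled outer products. The names `extract_features` and `r` are illustrative.

```python
import numpy as np

def extract_features(snapshots, r):
    """Return a matrix whose columns are vectorized rank-one SVD terms."""
    feats = []
    for A in snapshots:
        U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted in descending order
        for i in range(r):
            feats.append(np.outer(U[:, i], Vt[i]).ravel())  # vec(u_i v_i^T)
    return np.column_stack(feats)

rng = np.random.default_rng(1)
snapshots = [rng.choice([-1.0, 0.0, 1.0], size=(4, 4)) for _ in range(2)]
B = extract_features(snapshots, r=2)   # shape: (n*n, number_of_snapshots * r)
```

Because the singular vectors have unit length, every column of `B` has unit Euclidean norm in this unscaled variant.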
Due to massive data of the initial feature pool , we hope to find an optimal subset to form the dictionary such that the set can be well reconstructed by and the size of is as small as possible. To achieve this goal, we select such that the rest of the features in can be well reconstructed using it. Analogous to the optimization problem in , the basic problem is formulated as follows: where , , and . Apparently, enforces the group sparsity on the variable and the optimal solution usually contains zero rows. This means that not all features in are necessary to be selected to reconstruct any data sample.
Motivated by SVT, we have the equivalent problem of (12) as follows: The Lagrangian function of problem (13) is defined as and its dual function is We first examine the properties of the dual function and then show how to achieve the optimal solution of the problem (13) from its dual optimum directly. As the mixed norm is not differentiable, it is difficult to optimize the dual function directly. However, we can obtain a useful property of the dual function as follows.
Theorem 3. For all , the dual function is continuously differentiable with Lipschitz continuous gradient at most . Furthermore, the primal optimal of the problem (13) is given by when the dual optimal of the problem (13) is obtained.
The proof of Theorem 3 is based on the following results.
Lemma 4. For each and , one has
As a matter of fact, considering the following optimization problem: it is easy to show that the unique solution admits a closed form called the soft-thresholding operator, following a terminology introduced by Donoho and Johnstone ; it can be written that Thus, from a generalized view, one has Lemma 4.
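The closed form mentioned above can be checked numerically. The sketch assumes the optimization problem in question is the proximal problem of a row-wise mixed norm, $\min_X \tfrac{1}{2}\|X - Y\|_F^2 + \tau \sum_i \|X_{i\cdot}\|_2$, whose solution shrinks each row of $Y$ toward zero. The function names are illustrative.

```python
import numpy as np

def l21_norm(X):
    """Sum of the Euclidean norms of the rows of X."""
    return np.linalg.norm(X, axis=1).sum()

def row_soft_threshold(Y, tau):
    """Scale each row by max(0, 1 - tau/||row||); rows with norm <= tau become zero."""
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * Y

def prox_objective(X, Y, tau):
    return 0.5 * np.linalg.norm(X - Y) ** 2 + tau * l21_norm(X)

Y = np.array([[3.0, 4.0], [0.3, 0.4]])
X = row_soft_threshold(Y, tau=1.0)
# The first row (norm 5) is scaled by 0.8; the second row (norm 0.5) is set to zero.
```

Rows are either kept (shrunk) or removed as a whole, which is what makes this operator suitable for selecting entire features.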
Also, the following result can be deduced based on the properties of Moreau-Yosida regularization .
Lemma 5. For any , one has It follows that is globally Lipschitz continuous with modulus 1.
Proof of Theorem 3. Since and is the Moreau-Yosida regularization of the mixed norm , using the well-known properties of Moreau-Yosida regularization , we get the results that is a globally continuously differentiable convex function. Moreover, and is continuously differentiable with Lipschitz continuous gradient ; that is, for any , where . Then the gradient of can be obtained as follows: It follows that, for any , where the first inequality follows from (20) and . When the dual optimal is obtained, by using the result of (21), we can get This concludes the proof.
Since is the dual function of the objective function (13), is concave. Let which is convex. Thus, the following holds for any : It is also easy to show that belongs to the class and where is the identity matrix. Therefore, we can solve problem (13) by minimizing the objective function ; that is, Therefore, the dictionary is selected by the optimal solution ; that is, the th column of is chosen to be the atom of if . The optimization algorithm is presented in the next subsection.
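To make the selection rule concrete, the sketch below solves a group-sparse self-reconstruction problem of the form $\min_X \tfrac{1}{2}\|B - BX\|_F^2 + \lambda \sum_i \|X_{i\cdot}\|_2$ and keeps the columns of $B$ whose rows in $X$ are nonzero. Two assumptions should be noted. First, this objective is a guess at the form of problem (12), whose equation was lost. Second, the solver is a plain proximal gradient iteration, not the dual approach derived in this section. The names `select_dictionary`, `lam`, and `tol` are illustrative.

```python
import numpy as np

def selection_objective(B, X, lam):
    return 0.5 * np.linalg.norm(B - B @ X) ** 2 + lam * np.linalg.norm(X, axis=1).sum()

def select_dictionary(B, lam, n_iter=200, tol=1e-8):
    """Proximal gradient on the group-sparse objective; returns (D, kept indices, X)."""
    n = B.shape[1]
    G = B.T @ B
    step = 1.0 / np.linalg.norm(G, 2)        # 1 / Lipschitz constant of the smooth part
    X = np.zeros((n, n))
    for _ in range(n_iter):
        Z = X - step * (G @ X - G)           # gradient of the smooth part is B^T (B X - B)
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        X = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * Z
    keep = np.flatnonzero(np.linalg.norm(X, axis=1) > tol)
    return B[:, keep], keep, X

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 8))              # hypothetical feature pool, one feature per column
D, keep, X = select_dictionary(B, lam=0.1)
```

Larger values of `lam` drive more rows of `X` to zero and therefore yield a smaller dictionary, at the cost of a larger reconstruction error.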
3.2. Optimization Methods
In this subsection, we develop an efficient optimization algorithm to solve the dual problem (29). Because the objective function is continuously differentiable with Lipschitz continuous gradient, gradient-based optimization methods are a natural choice given their simplicity and low per-iteration complexity. However, classical gradient-based methods for functions with Lipschitz continuous gradient converge at a rate of $O(1/k)$, where $k$ is the number of iterations. This is too slow, especially for large-scale datasets. Nesterov showed that an accelerated gradient algorithm can be constructed that achieves $O(1/k^2)$, the lower bound on the convergence rate for gradient-based methods, when minimizing unconstrained smooth functions. With this in mind, in the following we propose an accelerated thresholding algorithm to solve these smooth convex optimization problems using Nesterov’s method with an adaptive line search scheme [19, 26].
We recall Nesterov’s method with an adaptive line search scheme as follows. Take the unconstrained smooth convex minimization problem , for instance, where belongs to , , and . Nesterov’s method for this problem utilizes two sequences: and , . The searching point satisfies where is a tuning parameter. The approximate solution can be computed as a gradient step of as where is the step size. Starting from an initial point , and can be computed recursively according to (30) and (31) and can arrive at the optimal solution . Although it has been shown that Nesterov’s method is a very powerful optimization technique for class , how to choose and in each iteration is a critical issue in Nesterov’s method. When they are set properly, the sequence can converge to the optimal at a certain convergence rate. As a well-known scheme for setting and , Nesterov’s constant scheme assumes and to be constant , while Nemirovski’s line search scheme requires to monotonically increase, and is independent of . Both of the settings result in slow convergence.
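The two-sequence structure described above (a search point formed by extrapolation, followed by a gradient step from that point) can be sketched as follows. This is a simplified variant with a fixed step size of 1/L and a standard momentum sequence. It is not the adaptive line search scheme used in Algorithm 1, and the function name and the quadratic test problem are illustrative.

```python
import numpy as np

def accelerated_gradient(grad, x0, L, n_iter=500):
    """Accelerated gradient descent with constant step 1/L."""
    x = np.asarray(x0, dtype=float)
    s = x.copy()                                   # search point
    t = 1.0
    for _ in range(n_iter):
        x_next = s - grad(s) / L                   # gradient step taken from the search point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        s = x_next + ((t - 1.0) / t_next) * (x_next - x)   # extrapolate along the last move
        x, t = x_next, t_next
    return x

# Toy problem: minimize 0.5 * x^T A x - b^T x, whose minimizer solves A x = b.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)
x_hat = accelerated_gradient(lambda x: A @ x - b, np.zeros(2), L=10.0)
```

An adaptive line search replaces the fixed constant `L` with an estimate that is adjusted at every iteration, which avoids overly small steps when the global Lipschitz constant is pessimistic.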
To overcome this drawback, an adaptive line search scheme for Nesterov’s method is proposed in . Under the assumption that , the low bound of , is known in advance, this scheme is built upon the estimate sequence  defined as follows.
Definition 6 (estimate sequence ). A pair of sequences and is called an estimate sequence of function if and , for all .
The estimate sequence defined in Definition 6 has the following important property.
Theorem 7 (see ). Let and be an estimate sequence. For any sequence , if , where is the optimal objective function value.
We further specify the estimation sequence in : where the sequences , , and satisfy
Note that Theorem 3 indicates that the objective function satisfies the conditions of using Nesterov’s method with an adaptive line search scheme. Therefore we directly extend Algorithm 2 in  to the high-dimensional scenarios to solve (29). The complete procedures are summarized in Algorithm 1.
In Algorithm 1, the while loop from Step 4 to Step 13 is designed to choose a proper step size to satisfy Step 8. As the Lipschitz gradient of is , is upper bounded by since Step 8 always holds when . In Step 14, we initialize , where and due to the condition in Step 8 . Apparently, when is large, can be adjusted to avoid the step size becoming too small, which may slow down the convergence rate.
3.3. Sign Inference and Dictionary Update Mechanism
This subsection details how to use the dictionary to solve the sign inference problem. Actually, this problem bears similarity to the sign prediction problem in the static signed networks or the unsigned networks varying periodically [3, 8, 11, 12]. In this paper, we intend to infer the unknown relationship between a pair of entities and based on partial relationship observations of the entire dynamic network at time . We expect to accomplish this task with the help of the dictionary constructed by the relationship data for times through . As aforementioned, there exists strong dependence between the connection status at time and the history relationship dataset in the dynamic network. We formulate the sign inference problem as follows: where is the dictionary and is the invertible vectorization of the matrix observed at time ; that is, . Because by using SVD and subsequently , we will estimate in the form of vector and transform the low-rank matrix reconstruction problem into a traditional -norm minimization problem in compressive sensing. We solve (35) by applying backtracking-based adaptive orthogonal matching pursuit (BAOMP) method, which incorporates a simple backtracking technique to detect the previously chosen atoms’ reliability and then deletes the unreliable atoms at each iteration . Then we force that , the element of the resulting matrix, is equal to 1 if or equal to −1 if , to ensure the elements coinciding with the value setting of relationships.
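The inference step can be illustrated with a simplified sparse coding routine. The sketch below uses plain orthogonal matching pursuit (OMP) rather than BAOMP, so it has no backtracking stage. It fits the coefficients on the observed entries only, reconstructs the full matrix from the dictionary, and then maps entries to ±1. Sending nonnegative values to +1 is an assumption, since the exact thresholding rule was lost in extraction. All names and the toy dictionary are illustrative.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Plain orthogonal matching pursuit: greedy atom selection plus least squares."""
    y = y.astype(float)
    residual, support, coef = y.copy(), [], np.zeros(0)
    norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual) / norms
        if support:
            corr[support] = -np.inf            # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def infer_signs(D, A_obs, mask, n_nonzero=1):
    obs = mask.ravel()
    alpha = omp(D[obs], A_obs.ravel()[obs], n_nonzero)   # fit on observed entries only
    estimate = (D @ alpha).reshape(A_obs.shape)
    return np.where(estimate >= 0, 1, -1)                # assumed sign rule

def cluster_matrix(labels):
    labels = np.asarray(labels)
    return np.where(labels[:, None] == labels[None, :], 1, -1)

atoms = [cluster_matrix(l) for l in ([0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 2, 2])]
D = np.column_stack([A.ravel() for A in atoms]).astype(float)
A_true = atoms[0]
mask = np.zeros((6, 6), dtype=bool)
mask[0, :] = True                                        # only the first row is observed
A_hat = infer_signs(D, A_true * mask, mask)
```

In this toy case one observed row suffices, because only one atom agrees with all observed entries. Real observations would typically need more than one atom.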
Furthermore, assume that we are given a sequence of input samples; the task of sign inference is then to reconstruct the complete adjacency matrices one by one. Since a newly observed matrix may contain features that are not included in the dictionary, it is necessary to add these features to the dictionary to increase the accuracy of the inference. However, the inferred matrix is not exactly the original matrix, so the unobserved relationships are not truly known. In contrast, the observed adjacency matrix retains all existing relationships. For this reason, we extract features only from the observed matrix rather than from the optimal solution of (35). We apply the feature extraction approach of Section 3.1 and add the complementary features to the dictionary. Note that this operation continuously enlarges the dictionary as samples keep arriving for inference; once the size of the dictionary exceeds a predetermined bound, the dictionary selection approach of Sections 3.1 and 3.2 is applied to compact it.
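The update loop can be sketched as below. Treating a feature as "complementary" when it is not (numerically) in the span of the current dictionary is an assumption made for this illustration; the paper's criterion is not spelled out in the surviving text. The names `update_dictionary`, `max_atoms`, and `compact` are illustrative, and `compact` stands in for whatever dictionary selection routine is used.

```python
import numpy as np

def update_dictionary(D, new_features, max_atoms, compact, tol=1e-6):
    """Append features the dictionary cannot represent; compact if it grows too large."""
    for f in new_features.T:                       # iterate over candidate feature columns
        coef, *_ = np.linalg.lstsq(D, f, rcond=None)
        if np.linalg.norm(f - D @ coef) > tol * max(np.linalg.norm(f), 1.0):
            D = np.column_stack([D, f])            # feature is new information: keep it
    if D.shape[1] > max_atoms:
        D = compact(D)                             # e.g. rerun dictionary selection
    return D

D0 = np.eye(3)[:, :2]
F = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])   # first column is already representable
D1 = update_dictionary(D0, F, max_atoms=3, compact=lambda D: D[:, :2])
D2 = update_dictionary(D0, F, max_atoms=2, compact=lambda D: D[:, :2])
```

Here `D1` gains one atom, while `D2` exceeds the bound of two atoms and is passed through the placeholder compaction.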
4. Numerical Experiment
In this section, we perform experiments on synthetic networks and show that our low-rank model and dictionary learning method outperform other methods on the task of sign inference for dynamic signed networks. To ensure that our results are reliable, we run every experiment 20 times and average the results over all trials.
To construct synthetic networks, we first consider a dynamic weakly balanced complete network and its adjacency tensor. Each time slice of this tensor is an adjacency matrix of the form (3). In addition, only a few distinct patterns of such matrices exist in the tensor. The observation tensor is formed by sampling some entries from the adjacency tensor. Concretely, we let the adjacency tensor consist of 50 matrices with complete 4-weakly balanced structure. For each network, four clusters are generated randomly; the size of each cluster is larger than 20 and the sizes sum to 250. We further assume that only part of the network relationships is observed, by uniform sampling with probability . It results in entries being randomly sampled from , where is the fraction of observed entries. We choose a set of matrices whose lost rates are from to and apply the approach proposed in Section 3.2 to select the dictionary .
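One snapshot of such a synthetic network could be generated as in the sketch below. The cluster count, total size, and minimum cluster size follow the description above. The sampling probability `p=0.1`, the symmetric sampling of node pairs, and all function names are assumptions made for illustration, since the corresponding values were lost in extraction.

```python
import numpy as np

def random_cluster_sizes(n, k, min_size, rng):
    """Split n nodes into k clusters, each with at least min_size members."""
    extra = rng.multinomial(n - k * min_size, np.ones(k) / k)
    return min_size + extra

def synthetic_snapshot(n=250, k=4, min_size=21, p=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    sizes = random_cluster_sizes(n, k, min_size, rng)
    labels = rng.permutation(np.repeat(np.arange(k), sizes))
    A = np.where(labels[:, None] == labels[None, :], 1, -1)   # complete weakly balanced matrix
    upper = np.triu(rng.random((n, n)) < p)                    # sample each unordered pair once
    mask = upper | upper.T                                     # mirror to keep observations symmetric
    return A, A * mask, mask

A, A_obs, mask = synthetic_snapshot(rng=np.random.default_rng(0))
```

Repeating this with a small number of distinct cluster assignments would give a tensor in which only a few patterns occur, as the experiment requires.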
With the dictionary and the observed matrix at a given time, the task of sign inference is achieved by solving (35). We use BAOMP to estimate the complete matrix and compare the performance of our approach with two state-of-the-art methods for the sign inference problem, alternating least squares (ALS) and singular value projection (SVP). Rather than defining accuracy by the relative error on the observed set, we use the similarity between the inferred matrix and the original one to indicate the accuracy of estimation. The definition of the similarity is . We vary the lost rate of the original matrix from 0.5 to 0.999 and plot the inference accuracy in Figure 3 (lost rates: 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99, 0.995, and 0.999). As Figure 3 shows, dictionary learning outperforms ALS and SVP. To present our result more clearly, we also use a visual representation in which white pixels represent 1 and black pixels represent −1. Figure 4 shows one example of sign inference, in which the relationships and the clusters are estimated almost exactly by our inference approach.
5. Conclusions
In this paper, we establish a low-rank tensor model for dynamic weakly balanced signed networks. With this model, we first extract a feature pool and propose an approach to select a compact dictionary from the pool. To improve the performance of the selection approach, we derive the corresponding dual problem and introduce an accelerated thresholding algorithm to solve it, so that the optimal solution of the primal problem can be readily obtained from the dual optimum. In addition, combined with the compact dictionary generation method, a sign inference approach is provided for estimating the missing relationships of dynamic weakly balanced signed networks at a given time slice. The approach also includes a dictionary updating mechanism to follow changes in relationship statuses.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was jointly supported by the National 973 Program of China (no. 2013CB329204), the National 863 Program of China (no. 2011AA01A104), the National Natural Science Foundation of China (no. 61100206), the Research Fund for the Doctoral Program of Higher Education of China (no. 20120005130001), and the Fund of the State Key Laboratory of Information Photonics and Optical Communications (Beijing University of Posts and Telecommunications), China.
References
N. Jabeur, S. Zeadally, and B. Sayed, “Mobile social networking applications,” Communications of the ACM, vol. 56, no. 3, pp. 71–79, 2013.
J. A. Davis, “Clustering and structural balance in graphs,” Human Relations, vol. 20, no. 2, pp. 181–187, 1967.
C. J. Hsieh, K. Y. Chiang, and I. S. Dhillon, “Low rank modeling of signed networks,” in Proceedings of the Knowledge Discovery and Data Mining Conference (KDD '12), 2012.
L. Lü and T. Zhou, “Link prediction in complex networks: a survey,” Physica A, vol. 390, no. 6, pp. 1150–1170, 2011.
J. Kunegis, A. Lommatzsch, and C. Bauckhage, “The slashdot zoo: mining a social network with negative edges,” in Proceedings of the 18th International World Wide Web Conference (WWW '09), pp. 741–750, ACM, 2009.
F. Heider, “Attitudes and cognitive organization,” The Journal of Psychology, vol. 21, no. 1, pp. 107–112, 1946.
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic, Boston, Mass, USA, 2003.
J. B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms, vol. 2, Springer, Heidelberg, Germany, 2nd edition, 1996.
Y. Nesterov, “A method of solving a convex programming problem with convergence rate O(1/k^2),” Soviet Mathematics Doklady, vol. 27, no. 2, pp. 372–376, 1983.
A. Nemirovsky and D. Yudin, Informational Complexity and Efficient Methods for Solution of Convex Extremal Problems, John Wiley & Sons, New York, NY, USA, 1983.
A. Nemirovski, “Efficient Methods in Convex Programming,” Lecture Notes, 1994.
Y. Koren, R. M. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” IEEE Computer, vol. 42, no. 8, pp. 30–37, 2009.
P. Jain, R. Meka, and I. Dhillon, “Guaranteed rank minimization via singular value projection,” in Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS '10), pp. 934–945, December 2010.