Abstract

In today’s Internet, a web user becomes members of multiple social networks due to different types of services provided by each of these networks. This creates an opportunity to make trust decisions that go beyond individual social networks, since these networks provide single perspective of trust. To make trust inference over multiple social networks, these networks need to be consolidated. It is nontrivial as these networks are of heterogeneous nature due to different naming conventions used in these networks. Furthermore, trust metrics extracted from these networks are also varied in nature due to different trust evaluation algorithms used in each of these networks. Heterogeneity of these social networks can be overcome by using semantic technologies as it allows us to represent knowledge using ontologies. Trust data can be consolidated by using such data fusion techniques which not only provide but also preserve trust data integrity from each of the individual social network profiles. The proposed semantic framework is evaluated using two sets of experiments. Through simulations in this work, we analysed various techniques for data fusion. For identifying suitable technique that preserves the integrity of trust consolidated from each of the individual networks, analysis revealed that Weighted Ordered Weighted Averaging parameter best aggregated trust data, and, unlike other techniques, it preserved the integrity of trust from each individual network for varying participant overlap and tie overlap (). Similarly, for experimental analysis, we used findings of the simulation study about the best trust aggregation technique and applied the proposed framework on real-life trust data between participants, which we extracted from pairs of professional social networks. Analysis partially proved our hypothesis about generating better trust values from consolidated multiple heterogeneous networks. We witnessed an improvement in overall results for all the participants who were part of multiple social networks (), while disproving the claim for those existing in nonoverlapping regions of the social networks.

1. Introduction

At present, online social networks (OSNs) replace real-world social networks, where people interact with each other remotely [1, 2]. Due to these social networks, several interactions and activities that required physical interaction are conveniently possible now while sitting at distant locations through World Wide Web. A survey conducted by Pew (https://www.pewresearch.org/internet/fact-sheet/social-media/) in February 2019 about the use of online social networks in the US disclosed that adult web users account for 72% of utilizing online social networks for online interaction [3]. The same survey also revealed statistics about the use of multiple social networks. It stated that 56% of online adults use more than one social network, wherein 95% of Twitter users and 92% of LinkedIn users also use Facebook. It indicates the increasing trend of using multiple social networks due to the different nature of services provided by these networks. We describe this situation as individuals belonging to multiple heterogeneous (MuHe) social networks; “multiple” because there is more than one network structure, and “heterogeneous” because the networks represent different types of relationships. MuHe networks usually are made up of networks owned and managed by various organizations, so, typically, users have different virtual identities in each system.

The researchers have developed a range of trust algorithms for individual social networks [4, 5]. These examine personal information and interaction history of the users to calculate trust metrics in their respective networks. Such mechanisms have become essential features for some successful social networks; for example, the eBay (http://www.ebay.com) network evaluates the reputation of sellers based on the ratings of buyers, which helps it to ensure high standards of online shopping. Similarly, expertise recommendation mechanism in the LinkedIn (http://www.linkedin.com) network uses recommendations provided by other professionals in the network, which are of great help for new users.

There are several definitions of trust in the literature, but we have taken the definition as given by [6] as it is both general and concise: “Trust of party X to a party Y for a service A is a measurable belief of X in that Y behaves dependably for a specified period (and within a specified context in relation to service A)” [3].

OSNs may be categorised based on the services provided by these networks such as friendship networks, professional networks, and e-commerce networks. When considered in context of trust, these networks represent multifaceted information which may be helpful if trust is to be modelled for multiple networks. However, existing trust mechanisms tend to be restricted to a single network, whereas users are typically involved in MuHe networks. Basing trust decisions on MuHe networks would have two distinct advantages: (1) it increases the chance of calculating a trust value, as individuals need not share the same network, and (2) it bases trust values upon diversified information that can reflect accurately a user’s behaviour on multiple social networking platforms. Our definition of trust notes that trust is calculated in a specific content, and therefore we expect that MuHe networks will be related to a particular trust domain (such as professional networks) rather than a general aggregation of all these networks in which individuals are present.

Unfortunately, the task of linking multiple social networks to generate a single big social network for performing trust calculations is not trivial, the reason being varying structures of the networks and weights on the links between professionals on these diverse networks [7, 8]. The aggregation mechanism should not inflate or dampen trust values artificially based on the availability of information from either some or all of the constituent networks. It should be able to particularly differentiate between absence of trust and distrust as unavailability of information from certain networks does not mean distrust.

This paper proposes a novel framework for performing trust calculations on MuHe networks. The framework is based on semantic web technology and uses data fusion techniques such as Weighted Ordered Weighted Averaging to aggregate individual networks without distorting trust values. We then present two evaluations of this framework. The first uses a simulation environment to look at the impact of consolidating two networks on their trust properties (strength of trust ties and length of trust path); based on this simulation, we then select the most appropriate data fusion technique. The second uses the proposed framework and this chosen technique to conduct a comparative evaluation of trust calculations based on existing single social networks against those based on MuHe networks. It uses real-world trust values elicited from participants as the gold standard.

The remainder of the paper is structured as follows: the related work about semantic web, data fusion, and online trust is described in Section 2. Section 3 presents the proposed semantic web framework and gives an overview of the data fusion techniques. It further describes the characteristics by which different data fusion techniques can be compared. Section 4 presents the simulation experiment and justifies the choice of data fusion technique. Section 5 presents the real-word experiment that compares trust values calculated using existing single networks to those calculated using MuHe networks. Finally Section 7 presents potential future extensions and concludes the paper.

There are various studies reported in the literature, which attempt to consolidate multiple social networks, but they merely focus on combining these networks rather than exploring their impact on trust-related measures. For example, the work in [9] attempts to merge trust and distrust relations from multiple trust networks but lacks two dimensions; first, it fails to differentiate between distrust and absence of trust, and, second, the impact of that consolidation on the accuracy of trust metrics is not examined [3].

In semantic web, the concept of coreference resolution resolves the problem of users having distinct identities in multiple social networks. There are many existing methods that address this issue such as [10], which discusses two methods of URI coreference resolution: (1) logical inference and (2) label comparison. Logical inference matches IFPs (Inverse Functional Properties) to evaluate whether a pair of URIs are coreferred, while label comparison compares data properties to classify URI pairs as coreferred or non-coreferred URIs. The URIs classified as coreferred may be linked using the predicate owl:sameAs provided as part of the OWL DL specification in the semantic web [11, 12]. Trust data can be annotated using either of the URIs defined in the existing network or by using afresh generated URI using the target namespace [13]. The resultant annotated information in the MuHe environment may be published as a separate named graph [14]. It helps those reusing the data to scope down their queries to target graph rather than writing long query patterns over existing graphs.

Multiple trust values emerge between coreferred users when MuHe networks are consolidated. These trust values represent subjective trust in the context of a particular network. While integrating these values, the data fusion algorithm should respect the trust integrity of these social networks. A number of data fusion techniques exist in the state-of-the-art literature such as [15], where the authors have proposed an aggregation operator based upon Ordered Weighted Averaging (OWA), which takes into account the multiple trust values based upon relative importance by ranking data values in descending order. Similarly, another technique, WOWA, proposed by [16] considers importance of both data and their sources. IOWA behaves similar to WOWA but allows ranking data points with respect to different trust sources [17, 18].

Furthermore, there are different implementations in the literatures that have used these techniques for aggregating trust. For example, the work in [19] collects trust scores between any two users from multiple paths in a single network and aggregates them using different data fusion techniques. This scenario is similar to consolidating trust metrics from multiple social networks. The results of the knowledge awarding-OWA (K-OWA) and knowledge awarding-averaging (KAAV) approaches showed better performance when compared with other techniques. Similarly, [20] presents aggregation techniques used for generating trust scores over transitive triads in a realistic fashion. Advanced deep learning techniques have been used in order to perform recommendation based upon trust score in social networks [21].

Consolidation of MuHe networks also helps to discover links among isolated users. This emanates the scenario of quantifying indirect trust and trust decay is one of the approaches discussed in the literature. Trust decay uses the principle of trust transitivity and was first discussed by a psychologist to analyse the existence of transitivity in real-world social networks [22]. The experiment was conducted to test the transitivity of positive interpersonal sentiments. This work was later extended by social psychologists to examine this in terms of social relations by running an experiment over a set of 917 sociograms [23]. A random group of people were asked about their sentiments towards other people in the group. They found that in 70% of the cases there was a strong inclination towards the transitivity.

A number of trust algorithms are developed using the concept of trust decay; for example, [24] proposed an approach for trust and distrust propagation, “Appleseed,” which uses the theory of spreading activation [25]. According to that, trust or distrust in a friend flows along all paths leading from the friend. They chose a realistic decay factor based on the empirical experiment and a normalised local edge weight, , is assigned to each link in the network. The work in [26] uses reinforcement learning for calculating local trust values in social networks. The trust from the source is propagated towards the destination using different strategies. Experimental results revealed that the hybrid approach of weighted mean aggregation and min-max over shortest paths turned out to be the best approach. Similarly, [27] presents an idea of incorporating multiple trust paths of varied lengths for enhancing accuracy of trust calculations. The decay of trust is considered and trust is calculated for both shortest and longest trust paths. Results showed that shorter paths generate more accurate trust measures than longer paths. The method proposed in [28] extracts domain-specific trust network from large-scale heterogeneous network, with the claim that trust propagation is a domain-dependent phenomenon and cannot work through heterogeneous relations [3]. Another study elaborates trust decay as a function of leakage in network flow and proposed network flow based trust evaluation scheme GFTrust [29]. The work carried out in [30] develops an algorithm for Trust Path Searching (TPS) to evaluate trust for indirectly connected users using the principle of trust transitivity. The research conducted in [31] summarizes different trust decay methods used by the literature and provides comparative analysis of these techniques.

3. Proposed Semantic Web Framework

The proposed framework uses semantic web technology to model constituent networks in uniform format and then allows different data fusion techniques to be used to integrate them. The key idea behind semantic web is that each resource should have a unique identifier (URI), but, in practice, different online networks have different namespaces for knowledge representation. Resultantly, individuals have multiple URIs across MuHe social networks. The proposed framework should resolve multiple URIs, which refer to the single user in the real life. Further, it should provide a mechanism to aggregate multiple trust values that originate between users due to their existence in MuHe networks.

The MuHe consolidation framework is shown in Figure 1, which presents our solution for building trust applications over multiple distributed social networks. The experimental setup to test the feasibility of framework for linking multiple networks resides on top of the Sesame triplestore (http://rdf4j.org/). A number of preprocessing modules are written in Python (https://www.python.org/), which gather data from multiple sources (including local RDF files, local and remote triplestores) and convert them into a single semantic representation for consolidation. They then apply different trust evaluation algorithms for calculating trust metrics for both directly and indirectly connected users.

The first module, named Data Acquisition Module, uses Python SPARQL wrapper classes (http://rdflib.github.io/sparqlwrapper/) for extracting RDF data between users from MuHe networks. It particularly targets information that can help in building linked networks with nodes of the network representing people and weights on the links show trust ranks between them.

The remaining three modules are more complex and are described in the following subsections.

3.1. Coreference Resolution Module

The coreference module locates different URIs from MuHe networks (defined in different ontologies) that represent the same person and allows us to generate afresh URI from the target namespace (defined in our own ontology). The resultant annotations may be added as a separate graph linked with existing graphs using owl:sameAs predicates. The concept of named graph [14, 32] is helpful when consolidating MuHe Networks. It allows us to represent consolidated networks as a separate layer built over existing graphs acting as an overlay network. The sample representation of such a network is shown in Figure 2. The individual networks layer represents two publication networks, where square shapes represent publications, while researchers are depicted using oval shape. This layer is linked with existing published information using owl:sameAs statements specifying which of the URIs in the consolidated network correspond to URIs in each of the individual networks. Once published, this eliminates the need to write long query patterns over individual networks for coreferencing or retrieving trust information between users.

The intuitive way to identify coreferred users from these networks is to compare metadata. In semantic web, this information can be extracted from owl:DP (data property) predicates. Equation (1) provides a rule to perform coreference resolution for two URIs: ?1 and ?a. Here, ?1 and ?a are URIs in individual networks and ?1a is the newly allocated URI in the aggregated single network. ?p represents the set of data properties selected for comparison. If both data properties hold the same values for both the URIs, ?1 and ?a, then they are resolved to represent the same person in multiple networks and the resultant URI may be ?1a in the consolidated version of these networks. Further, it can be linked with both ?1 and ?a in individual networks using owl:sameAs predicate, thereby stating that both URIs are the same. Note that the numeric value ?1 and a character ?a in individual networks and ?1a in consolidated version are shorthand of the original URIs.

An ontology is needed for making annotations in the consolidated graph. Figure 3 presents the classes and properties defined in the proposed ontology. It extends existing ontology given in [33] and adds object and data properties, which are particularly needed for annotating trust in the context of MuHe networks. The single new class TrustRelationship defines the trust relationship between instances of trustor and trustee classes using and properties. The trustor class holds the identity of trustor, while the trustee class represents the person being trusted by the trustor. Both the trustor and trustee are persons, so their URIs are generated using person class of the FOAF namespace.

The ontology allows us to model trust in three different formats: absolute, processed, and fuzzy. Absolute trust is the subjective trust value extracted from any particular network. For example, consider the example of a publication network, where trust between researchers is measured in terms of coauthorship frequency; in this case, absolute value is the count of the number of times that a pair of researchers have appeared in publications together. Processed value in this context is an absolute value normalised in the range between 0 and 1. It is a translated value that transforms trust from multiple contexts into a format that can be compared or combined. Fuzzy trust is the human understandable version of numerical trust values such as high trust and medium trust. There are various techniques of modelling fuzzy trust; however, trust modelling using fuzzy logic is out of the scope of this work.

Trust in MuHe social networks can be viewed as a set of direct experiences from multiple social networks or recommendations provided by other members of those networks. In our ontology, it is modelled using has_type data property, where the domain of the property is the URI of the relevant person and range is the string value of “direct” or “indirect.” The direct trust is extracted based on the direct interactions (which can be either of the explicit or implicit activities), whereas indirect trust is calculated between isolated users, typically belonging to multiple social networks. There are various methods of calculating indirect trust, so the has_process object property records the technique used and the has_pathLength object property stores the length of the trust path involved.

The trust definition used in this work takes it as a subjective value, so the has_scope data property specifies the area that any particular trust value belongs to. It is particularly important when trust data only related to certain area needs to be consolidated such as if the system has to aggregate trust information about trusted academics in the fields of semantic web, machine learning, and so forth.

The outcome of the coreferencing technique is a linked network having consolidated URIs for overlapped users and multiple trust values between them extracted from MuHe networks. The next step is to combine these multiple trust values into a single one, using the data fusion technique. The ontology must therefore also record this process, which is modelled using two properties: has_metricNature and has_aggregationTechnique. The former defines whether the target pair of users were overlapping or nonoverlapping, therefore having to establish whether there are multiple values to aggregate or only a single value to be reevaluated. The later one stores the name of the data fusion technique used to aggregate if multiple trust values are present or reevaluate if only singular value is available.

3.2. Trust Data Fusion Module

Data fusion module aggregates trust values between users belonging to multiple social networks and generates a single value that represents multifaceted trust. The task of consolidating multiple trust relationships is basically a data fusion problem and particularly it may be considered as a data aggregation case study. One of the simplest approaches is to perform summation (S) or an average (A) of the trust values. This may work for simple numerical quantities, but in the case of trust it could damage trust values by either inflating them unnecessarily (through summation) or deflating them drastically (as some of the trust values may be effectively zero due to missing information from some of the networks being consolidated). Therefore, more comprehensive techniques are needed to aggregate trust values because they represent trust in varied context and naive methods of aggregation can distort the integrity of trust.

The trust metrics between a pair of users can range from one to the n number of networks involved in consolidation. Suppose that represents a pair of users from multiple social networks and is the set of trust values between them from n social networks; i.e., . There are two parameters involved in trust aggregation: (1) importance of the trust values from individual social networks, denoted as , and (2) importance of the sources of that information, denoted as . The values of both parameters stay in the range of [0, 1], where zero value shows low importance, while a value of one represents higher importance. The function aggregates these values and generates a single value by considering and for each in the dataset .

There are various data fusion techniques discussed in the literature and Section 2 presents their detailed background. A richer description of these methods can be found in [34], which reported on a subset of the simulation experiment but the full experiment focused on the following, all of which have been implemented within the data fusion module:Weighted Average (WA) technique consolidates multiple trust values by only considering the importance of data sources. The individual trust data points in are aggregated by multiplying them with the weights of the corresponding sources. The importance of each source network is represented using a weight vector p, and its size is equal to the number of trust values.Ordered Weighted Averaging (OWA) works in a similar way to WA instead of the weights being associated with a given source, the sources are ordered (according to which is most trusted) and the weight is associated with the position within the ordering. The trust data points in are sorted in an order from high to low; then already permuted trust values are multiplied with the corresponding weights from the weight vector to generate a single aggregated value .Weighted Order Weighted Average (WOWA) associates importance parameters to both source and the order of the trust values. It takes two sets of weights other than the set of trust values : first is the weight vector that shows the importance of trust data points; and second is the weight vector p to show the relevancy of the sources of trust information. The vector can have both integer and fraction values, while and p vectors are continuous values in the range of [0, 1].

3.3. Trust Evaluation Module

The trust between isolated users is calculated using trust propagation algorithm. It works on the principal of transitivity and existing studies (mentioned in Section 2) empirically prove that trust decreases as the length of path between indirectly connected users increases. In short, people, in both real life and online social networks, have high degree of trust towards friends of their friend rather than strangers, but this trust decreases as the length of trust path increases. Drawing on this knowledge, we have used a transitive decay-based trust calculation within the module.

The first step is to calculate all possible trust paths between any two users; the trust value associated with a given path is the result of multiplying all the trust values (in between [0, 1]) between all the nodes in that path. There are then two possible options for choosing the trust path. The first is the strongest path where the algorithm returns the trust path having the maximum strength of the trust ties between users without considering its length, that is, the number of users involved in the path. The second is the shortest path, which chooses the path with the shortest length, regardless of its trust value. However, if multiple trust paths of the same shortest length exist, then the one with the highest trust value is chosen. In this work, we have used the shortest path approach for experimentation, while the strongest path approach is already published in [34].

4. Experiment 1: Simulation Experiment for Trust Inference over Consolidated MuHe Social Networks

The simulation experiment discussed in this paper is an extension of the works described in [3, 34] and it further extends these works to include the OWA data fusion method and shortest trust path algorithm. The simulation examines the impact of the consolidation of MuHe networks on trust properties used by the shortest trust path algorithm (described in Section 3.3 above). The simulation is based around a consolidation of pairs of networks, using four data fusion techniques starting from naive ones such as summation (S) and weighted average (WA) and then extending up to more complex techniques such as Weighted Ordered Weighted Averaging (WOWA) and Ordered Weighted Averaging (OWA). The technique that best satisfies trust properties qualifies to be used for making trust computations over real-world data. Both the simulation and real-world experiments are implemented using the NetworkX (http://networkx.github.io/documentation/latest/reference/introduction.html) library of the Python programming language. It includes the code for generating networks, consolidation them, measuring network properties, and applying trust inference algorithms.

4.1. Experiment Design

The pairs of social networks were generated, with randomly assigned trust values on the links between users, having a varying percentage of participant overlap (PO) and tie overlap (TO) in each pair of networks. It was to assess the values of trust properties when the networks with varying percentage of overlaps were consolidated. N1 and N2 represent the original networks generated by the simulation; MuHe was the final linked network, while CN1 and CN2 represent subnetworks in the MuHe mapped to N1 and N2. We can then see the impact of consolidation by comparing CN1 to N1 and CN2 to N2.

The impact of consolidation on trust properties is measured using the approximation of two metrics: average tie strength (TS) and average tie length (TL). TS is the average trust of the network using shortest trust paths, and TL is the average length of the shortest trust paths in the network. Ideally, TS metric from CN1 and CN2 and MuHe should be similar to that of N1 and N2, even if there are no significant PO and TO. This indicates that trust is not being inflated in the consolidation process. Furthermore, it is desirable that, due to the emergence of additional trust paths, TL metric be overall decreased in CN1 and CN2 and MuHe as compared to N1 and N2. If TS remains steady and TL reduces, it may be deduced that the consolidation has successfully enhanced trust calculations by opening up new trust paths without escalating trust values in the network. More formally, we can say that any data fusion technique chosen for trust should satisfy the following set of propositions (adapted from [1517]).

Proposition 1 (boundary conditions). The trust aggregation function should keep consolidated trust value within the maximum and minimum range.

Proposition 2 (idempotence). The resultant value of trust aggregation function should be equal to if all the trust values are the same; that is, .

Proposition 3 (monotonicity). The trust aggregation function should be monotonic, which means that it should generate high aggregated trust for high trust values as compared to low trust values.

The final property, known as Trust Absence, ensures the integrity of trust from individual networks. It refers towards the missing trust information from any of the constituent networks and recommends to consider it as the absence of trust and not distrust between individuals.

Proposition 4 (trust absence). The trust aggregation function should differentiate between absence of trust and a distrust. The numeric value of zero, in this study, represents absence of trust information from either of the individual networks, so their aggregate should generate trust value that is approximately similar to the one generated without that numeric zero.

This simulation generates networks for four different participant overlap (PO) percentages, that is, 40%, 60%, 80%, and 100%, and then for each value of PO (except 40% PO) TO is varied from 0 to PO in increments of 20%. In each of the simulations, trust information on the links is aggregated using different aggregated schemes named S, WA, WOWA, and OWA. Table 1 provides details about network and consolidation parameters used in this simulation.

4.2. Results and Analysis

In our analysis, we considered the impact of consolidation on both average strength of trust ties (TS) and average length of trust path (TL). These are discussed separately in the following sections.

4.2.1. Average Strength of Trust Ties

The results in Table 2 present the values of TS metric for varying consolidation parameters PO and TO using the shortest path algorithm. It shows that the TS metric using WOWA approach has more stable measurements for CN1, CN2, and MuHe than all the other approaches (S, WA, and OWA). Simple techniques S and WA severely distort TS metric for two extreme values, that is, at 100% PO and 100% TO, with a value of 0.93, and at 40% PO and 0% TO, it stood at 0.17. Similarly, OWA also performs poorly for low participant and tie overlap and its value for MuHe at 40% PO and 0% TO dropped to 0.22. The TS metric recorded by WOWA approach remained stable throughout varying values of PO and TO and it stood at 0.66 for 100% PO and 100% TO and 0.44 for 40% PO and 0% TO. Based on the results of all the data fusion techniques, WOWA appears to be the better technique for aggregating trust information.

The statistical significance of the apparent preservation of trust integrity by WOWA is assessed using a two-tailed paired t-test. It evaluates whether the results of WOWA approach are significantly better than those of the other two techniques. This test generates a value and a value of indicates significance. Table 3 shows values for two types of participant overlaps (PO); the first four rows show values for varying percentage of PO, while the last measurement, that is, overall, shows the collective performance of the system by including TS metrics for all percentages of PO. The analysis of the values reveals that the claim of WOWA being better than the other two techniques, WA and IOWA, is statistically significant and holds true for all values of PO.

4.2.2. Average Length of Trust Ties (TL)

Table 4 presents the results of the TL metric for varying percentages of PO and TO using the shortest path trust algorithm. It shows that the values of TL metric for CN1, CN2, and MuHe are the same for similar participant and tie overlaps across all the data fusion techniques. For subnetworks CN1 and CN2, it decreases with the increase in PO and TO due to emergence of new trust paths, but, for MuHe, it only decreases when PO  80 and then it starts increasing again with an increase in TO. The reason is that less value of participant overlap between the networks creates bottleneck due to more number of nonoverlapping nodes, which results in longer trust paths between users. This trend reduces when the value of PO increases. When compared with original networks N1 and N2, both have an average length of trust paths (TL) of 1.57; TL is higher (i.e., 1.69) than both the original networks when the participant overlap and tie overlap are 40% and 0%, respectively. It becomes even higher (i.e., 1.79) when the value of PO and TO reaches 60%. The number of new shortest paths becomes maximum at and as a result TL drops to 1.14, which was 1.57 and 1.57 in N1 and N2. However, at 100% participant and tie overlaps, it again becomes equal to N1 and N2 due to consolidated MuHe networks being exactly similar to original networks.

The results of the TL metric show that the aggregated version of the networks, that is, MuHe network, is basically dependent on the PO. When the percentage of PO is low, a path bottleneck exists in the consolidated network, which makes the TL metric worse as compared to original networks N1 and N2. This is due to the less number of fresh trust paths being generated as a result of low value of participant overlap. When the value of PO increases, it causes TL to fall due to increase in number of overlapped users, which reduces the bottleneck issue. Furthermore, TL is lower in each of the subnetworks CN1 and CN2 than in the corresponding original networks N1 and N2, respectively, irrespective of the value of PO. This shows the emergence of additional trust paths due to consolidation. The results from this simulation and the one discussed in [34] prove that the WOWA consolidation is the best approach along all the varying overlap values, while other techniques show poor performance for certain values of PO and TO. At low participant overlap, it respects the integrity of trust (as measured by stability in TS, average strength of ties) while creating new trust paths (as measured by decrease in TL, average length of trust paths).

5. Experiment 2: Network Trust versus Declared Proxy Trust

The simulation demonstrates that in principle the WOWA aggregation technique can result in a MuHe network with improved trust characteristics; however, without applying the technique to real-world networks, it is impossible to evaluate whether the trust values that emerge as a result are better than those calculated by single networks. To test this, we undertook an evaluation that compared trust values calculated from two real-world networks and their consolidation against trust values obtained from directly surveying members of that network.

Networks with pure trust values are rare, so a pair of professional social networks are extracted from publication and projects domain. These networks are managed by the University of Southampton and they represent proxy trust between users using the coauthorship and collaboration frequencies. The assumption is that individuals who had worked together (manifest as joint publications or participation in the same project) would exhibit higher trust. The coauthorship frequency was extracted from ePrints’ (http://www.eprints.soton.ac.uk) publication network, while collaboration frequency was taken from the public catalogue of research projects undertaken by the WAIS (Web and Internet Science) research group at Southampton (http://www.wais.ecs.soton.ac.uk/projects) and contained details about the projects and staff associated with those projects. The dataset included information about both active and past projects that are being completed under the WIAS research group.

Both networks are available online in RDF format, and these were transformed into the ontology described in Figure 3 and passed into the MuHe consolidation framework (described in Section 3). This then performs the WOWA aggregation and links the resultant consolidated graph with the existing graphs using owl:sameAs predicates.

Both of these networks are in the same environment, so there are significant PO (participant overlap) and TO (tie overlap), with the collaboration network nearly a subset of the coauthorship network, discounting users outside the university who work on projects. The PO and TO with respect to the WAIS were 51% and 78%, while for ePrints they were 2% and 1.4%, respectively. Table 5 shows description of the network parameters used in this experiment. Both these networks contain bidirectional symmetric trust, as coauthorship and collaboration represent the same trust values in both directions.

5.1. Experiment Design

A survey experiment is designed to test the accuracy of the proposed framework for real-world social networks. It collects proxy trust values between users in the professional context, which are then compared with the trust measurements from the original and consolidated pair of networks. This is a web application (developed using Django (https://www.djangoproject.com/) framework), which first examines the presence of user in one or both of the networks and then presents each user with a randomly selected set of related people from these networks, based on the presence of the user in these networks. A set of questions were asked, which represent proxy trust helping us to measure the level of trust between them.

5.1.1. Participants

The designed survey has two types of participants. The first set of participants is known as rating participants and they represent those taking part in the survey. The second group of people comprised those about whom rating participants expressed their trust by answering proxy trust questions. This set of people is known as rated participants. The rated participants were selected from the egocentric network created by taking rating participant as an ego-node and randomly selecting set of users. As the simulation experiment claims the existence of trust decay along paths in social networks, the accuracy of indirect trust was evaluated by selecting rated participants belonging to path lengths of one, two, and three. If the rating participant was present in both the selected networks, then four of the rated participants were selected from each of the networks; otherwise, all the eight rated participants were selected from one of the networks.

5.1.2. Questionnaire

The survey aimed to extract the trust that participants feel towards one another in a professional context. The substance of a trust survey can be ethically difficult, as participants can be unwilling to disclose genuine trust ratings for others; our solution was to ask less sensitive questions about the closeness of professional ties and to treat these as proxy trust values. There are two questions asked to each of the rating participants. First one is about the past work experience with some randomly chosen related person and second question is about the likelihood of them working together in future should there be the opportunity. The numerical data from this portion of the survey was in the range (0, 0.8) with the value of 0.8 corresponding to working very closely, while 0 means working hardly at all. The survey values are then compared with the data available from the system, which is already in the range (0, 1). The last question in the survey asked rating participant to briefly explain their relationship with each of the rated persons separately. This was so we could explore whether different categories were more accurately represented by the proxy networks or improved by the consolidated network.

5.1.3. System and Survey Trust Metrics

Figure 4 shows the number of nodes (people) and ties (relationships) in ePrints and project networks used for the experiment. Of these, a total of 26 individuals participated in this survey experiment; on average, each rating participant provided trust ratings on 3.15 out of 5.38 rated participants presented. The maximum number of trust ratings was provided for path length one, and, as might be expected, this number decreased as the length of path increased.

As discussed in Section 3.2, trust data from consolidated MuHe networks can be categorised into two types, complete and partial. With reference to Figure 4, complete data originated from the region EW, and user pairs belonging to this region are known as users, while partial trust information came from either regions E or W or across different regions, for instance, E  EW, and user pairs belonging to this category are known as users. Table 6 shows different trust parameters of the experiment along with their values. It can be categorised as system and survey readings.

System readings are the ones generated by evaluating trust over ePrints and WAIS networks, while survey readings are collected from the corresponding real-world users. For the two categories of users specified above, there are always two trust values available from the survey experiment, but there are a variable number of trust values available from system readings. The trust values extracted from the survey are represented as and for representing past trust and future trust, respectively. There are three system-generated trust values for overlapping pairs of participants, represented as : two from each of the individual networks ePrints () and WAIS () and one from the consolidated version, represented as . For pairs of participants, however, there were only two values available, one from either of the individual networks ePrints () or WAIS () based on the user presence and the other from their consolidated version, represented as MuHe () networks. Beside this, there was one additional parameter that describes the type of relationship (Rel) between each of the user pairs. This was to assess the accuracy of aggregated trust metric with respect to each of the relationship categories.

5.2. Results and Analysis

Results of this experiment were thoroughly analysed to test whether the trust metrics from MuHe networks () are closer to real-life trust metrics ( and ) than from individual networks ( and ). The test was conducted by first taking the mean of the absolute difference between system and survey readings for each category of users from individual and consolidated networks and then evaluating value between datasets using t-test to deduce whether the difference is statistically significant.

5.2.1. Overlapping Users’ Data

To analyse the overlapping users’ data for improvement, we calculated the mean value of the absolute differences between system and survey readings. Table 7 presents means for strongest and shortest path algorithms. It shows that, for both the algorithms, EP () and WP () are larger than MP (), which manifests that consolidated MuHe networks reduced the difference between system and survey metrics and brought trust metrics closer to real-life metrics than individual networks. Similar results are demonstrated by EF () and WF () when compared with the MF ().

To test whether the apparent improvement of over and is statistically significant, value was calculated by conducting one-tailed paired t-test over the set of absolute differences between system and survey readings. If , we judge that this improvement is statistically significant for this dataset. Table 8 presents the evaluated value. Results show that for all categories of users, which proves the claim of generating trust metrics from consolidated MuHe metrics being closer to real-life trust metrics for true for this dataset.

5.2.2. Nonoverlapping Users’ Data

As mentioned earlier, nonoverlapping set of user pairs has partial trust information, so there is only one system trust metric available between them other than the one obtained from the consolidated network.

Like for the users, means of the absolute difference between system and survey readings are calculated and Table 7 shows the results. Here, it shows opposite behaviour to the one existent in the case of users; that is, MP and MF are greater than EP and EF, respectively, for both the strongest and shortest path algorithms—see Table 9 for description of MP, WP, MF, and WF. This shows that the consolidated MuHe networks took the trust values farther from the survey values, which resulted in deterioration of trust values for users.

6. Discussion

If a trust-based system is developed for real-world social networks, the concept of implicit trust may be used to extract numeric values based on the activities of users in their individual social networks. This is due to unavailability of explicit trust metrics in all of the well-known social networks in use nowadays. The trust metrics in these networks may be based on the frequency of likes/favourites or retweets in the kind of social networks, for example, Facebook or Twitter. In likes of professional social networks such as Stack Overflow or Quora, the frequency of up votes/shares may become a metric of trust.

The task of fusing data from real-world MuHe networks is also nontrivial. In federated networks, it is relatively straightforward as multiple accounts of a single user are not allowed in these networks. The professional social networks selected for experimentation also belong to this category. The trust metrics from such networks may be extracted from each of the networks using implicit activities and can easily be aggregated using the techniques discussed in this article. However, for networks without any federated control, there may be fake or duplicate accounts, which can create data fusion problems. They may contribute wrong information, which is not reflective of the person in the real life. In such scenarios, the data should be rigorously preprocessed to eliminate any discrepancies before being fed to this framework for analysis. An absence of such preprocessing layer may generate distorted aggregation, which can mislead other users in making wrong predictions if they are totally relying on digital world for trust-related decision-making.

The future research directions that emerge out of this work are to implement the proposed model keeping in view the challenges that may arise due to federated and nonfederated types of real-world social networks. Another research direction may be to explore the value of MuHe networks created from multiple constituent networks and also to understand the impact of networks due to differing quality and size. Our hope is that this work will persuade developers of trust systems to go beyond individual social networks for trust calculations. This could improve existing trust systems by enabling them to make more intelligent trust decisions by incorporating information from a variety of sources on the web.

7. Conclusion

Since the inception of Web 2.0, the use of online social networks has been increasing and existence of users in multiple networks is a great opportunity to make trust metrics on the web more sophisticated by incorporating a variety of information. This paper describes this as calculating trust on multiple heterogeneous (MuHe) networks and makes three contributions.

Firstly, it presents a semantic web based framework for modelling and consolidating heterogeneous trust networks and for performing trust calculations on both the individual and consolidated MuHe networks.

Secondly, it demonstrates the efficacy of the framework via a simulation that creates MuHe networks from individual networks with varying node and tie overlap. The simulation allowed us to investigate the impact of different data fusion techniques on the trust metrics of the MuHe network, and we showed that the Weighted Order Weighted Average (WOWA) technique produces a MuHe network with new trust paths, which protects trust values (that could be overinflated or suppressed with other consolidation techniques).

Thirdly, we explored how the MuHe network approach could work with real-life networks and applied the framework to two professional networks (ePrints, a publication network, and WAIS, a project network). This showed that, for overlapped users, the MuHe network was closer to the assessments of the individuals within those networks (as gathered via a survey), but there was no substantial difference for nonoverlapped users (individuals who only appeared in one of the two networks).

Our simulation experiment clearly shows that, with appropriate data fusion techniques, MuHe networks can be constructed with better trust properties than their constituent networks. But we have also shown that the value of this in the real world is strongly effected by how close the trust represented in the constituent networks is to the trust required by the application and that in the real world the network sizes can be uneven, resulting in more complex interactions than those shown in our simulation. So, while MuHe networks appear to be a promising technique for improving trust calculations, they are not a panacea and still depend heavily on the quality of the constituent networks and how closely those networks reflect the type of trust required for a particular application.

Data Availability

The datasets generated and/or analysed in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Higher Education Commission, Pakistan, under the Startup Research Grant no. 21-1115.