Abstract

Vast amounts of multimedia data contain massive and multifarious social information, which is used to construct large-scale social networks. In a complex social network, a character should ideally be denoted by one and only one vertex. However, a character is often denoted by two or more vertices with different names and is therefore treated as multiple, different characters. This problem leads to incorrect results in network analysis and mining. The practical challenge is that character uniqueness is hard to confirm correctly because of many complicating factors, for example, name changes and anonymization, which lead to character duplication. The limited early research depended heavily on supplementary attribute information from databases. In this paper, we propose a novel method to merge character vertices which refer to the same entity but are denoted with different names. With this method, we first build the relationship network among characters based on records of participation in social activities, which are extracted from multimedia sources. Then we define temporal activity paths (TAPs) for each character over time. After that, we measure the similarity of the TAPs for any two characters. If the similarity is high enough, the two vertices are considered to be the same character, and we merge them. Our experiments show that this solution can accurately confirm character uniqueness in large-scale social networks.

1. Introduction

In the past decade, the mobile Internet and social multimedia applications have become an indispensable part of social life, and huge volumes of multimedia data are being produced and consumed [1]. For instance, Facebook reported 350 million photos uploaded daily as of November 2013; 100 hours of video were uploaded to YouTube every minute, resulting in more than 2 billion videos in total by the end of 2013 [2]. Social media networks allow people to communicate, share, comment on, and observe different types of multimedia content [3]. As social activities become more frequent, social networks have grown larger and much more complex. Generally, we extract information and construct social transaction databases from vast amounts of multimedia data, such as text, images [4, 5], videos, and audio [6], to construct large-scale social networks which are modelled by graphs [7] with a node-edge representation [8]. Multimedia data can generally be described in multiple views [9, 10], such as a color view and a textual view [11, 12]. In social networks, the relations between character vertices are tangled by time differences between transactions, incomplete personal information records, anonymity, and differences in information patterns and structures. It is therefore difficult to maintain a one-to-one mapping between characters in relation networks and people in real life. Moreover, a character may be marked up as different vertices under a former name and a present name. These vertices have the same personal information, structure, and relation attributes. For social networks, these vertices and relationships are redundant, and they severely perturb the results of social network analysis. Therefore, the ambiguity of character vertices has become a key problem in social network analysis.

In relational databases, we can use multidimensional personal information, such as name, gender, and date of birth, to confirm the uniqueness of characters. In a big data environment, however, multimedia data comes mainly from unstructured data storage. Its scale is vast and its types are multifarious [13, 14], such as text, images, videos, and audio. Moreover, the data is generally neither complete nor consistent, which makes the uniqueness of characters difficult to determine. Nevertheless, most large-scale multimedia data is stored externally and includes records of people’s participation in social activities, such as vocational and educational experience and participation in service clubs. Social relationships are generated from the multimedia data of these activities, and then social networks can be built. Confirming character uniqueness is a key problem in social network analysis and application. We propose to measure the similarity of characters by network characteristics and to correct characters by merging vertices, using the structure error of networks as the check. However, social networks generally have temporal attributes, and relations extracted from them also have obvious temporal characteristics. We regard temporal attributes as a key factor of relations and use them in computing the similarity of vertices, which improves the accuracy of uniqueness confirmation. We put forward the new notions of character activity path and transaction activity network with heterogeneous features and then use temporal activity path similarity evaluation to improve the reliability of character correction.

The remainder of this paper is organized as follows. We introduce related work in Section 2 and then describe an academic network building process in Section 3. In Section 4, we present character vertices merging principle and structure error algorithm based on network structure. In Section 5, we introduce transaction activity networks, activity paths algorithm, and vertices screening and then analyze the experiment results. The conclusion and future work are included in Section 6.

2. Related Work

2.1. Social Analysis via Multimedia

The advent of social networks and cloud computing has made sharing multimedia in social networks easier and more efficient [15]. With the rapid growth in the volume of multimedia data, social network analysis and mining via multimedia data have recently attracted the attention of a number of researchers. Zhuhadar et al. proposed combining social learning network analysis and social learning content analysis to study the impact of social multimedia systems on cyberlearners [16]. They presented evidence from the analysis that a social multimedia system affects the communication between faculty and students. To deal with the challenges of event detection from massive social media data in social networks, Zhao et al. [17] proposed a novel real-time event detection method named microblog clique, which explores the high correlations among different microblogs and is supported by social multimedia data. Sang and Xu [2] proposed analyzing the variety of big social multimedia from the perspective of its various sources. Laforest et al. [18] presented a new kind of social network named spontaneous and ephemeral social networks (SESNs), which allow people to collaborate spontaneously in the production of multimedia documents. To find overlapping communities in multimedia social networks, Huang et al. [19] proposed an efficient algorithm named LEPSO for overlapping community discovery, which is based on line graph theory, ensemble learning, and particle swarm optimization.

2.2. Name Ambiguity and De-Anonymity

Recently, name ambiguity and de-anonymity have been widely studied. Existing methods for identifying characters in social networks can be divided into three categories, according to whether they rely on internal relational databases, Internet webpages, or the topological structure of social networks.

Name ambiguity and de-anonymity share the same essential features. In the past, the identity of characters was determined by accurate attribute information in the internal databases of enterprises. In 2008, Narayanan and Shmatikov proposed a method for processing high-dimensional data [20], such as personal attributes, recommendations, and transaction information. With it, users can identify characters in an anonymous database from limited personal information. The method is robust even when the background information is inaccurate or perturbed. However, an internal database is localized and static; it cannot describe the features of characters thoroughly. Therefore, these methods are not suitable for the name ambiguity problem in big data, which is complex, dynamic, and cross-platform.

Name ambiguity is more prominent on the Internet. In 2008, Tang et al. proposed a standard probability framework to recognize the independence of observed objects [21]. However, when we search for a name on the Internet, numerous webpages containing the same name can be returned, and it is not certain whether these pages belong to the same person. Bekkerman and McCallum proposed two statistical methods to solve this problem in 2005 [22]. One is based on the link structure of webpages and the other on multiway distributional clustering. Both are unsupervised frameworks that need only a little prior knowledge, and their experiments show that the solution outperforms traditional clustering. However, the above methods are strongly affected by the uncertainty of web information.

At present, name disambiguation has become even more prominent in social network modeling and analysis. In 2008, Liu and Terzi analyzed character-centered social networks and pointed out that features of the relation structure can expose characters’ identities [23]. For identity hiding in social relation networks, they defined graph anonymization and proposed an algorithm based on the k-degree anonymous graph and the node degree sequence. Narayanan and Shmatikov defined privacy in social networks in 2009 [24] and designed a novel reidentification algorithm which de-anonymizes and identifies nodes by using the topological structure of the network. For the dynamic evolution of social networks, Ding et al. proposed the “threading” technique, which uses the connections between released data to de-anonymize it [25]. They also proposed combining structure information and node attributes to reidentify anonymous nodes. Korayem and Crandall worked on a de-anonymizing method specifically for cross-platform social networks [26], which can recognize that different accounts belong to one user by extracting time sequence data, text features, geographic location, and social relation characteristics. In 2012, Srivatsa and Hicks introduced the de-anonymization of users’ mobile trace information based on the graph structure of social networks [27]. As the contact graph between characters is built from vast quantities of mobile traces, they proposed a structural similarity of interuser correlation, which was used to map between the contact graph and the social network.

Since a large number of mobile trajectories can be used to build the contact graph of characters, structural similarity is employed to find the corresponding nodes in the contact graph and the social network. De-anonymization with mobile traces is implemented by mapping character nodes between the two networks. The methods mentioned above aim to solve the problem of anonymity in traditional networks, in which nodes and relations have only one category. However, a social network created from big data is more comprehensive: its relationships and nodes fall into diverse categories, and it is temporal. In addition, these methods need added attributes to supplement the topology information of social networks.

From enterprise databases to webpage databases and then to social network databases, there has been a development from local data to global data. Previous methods rely on the attribute details of local data; that is, they need much auxiliary information to identify characters, and their efficiency is low. However, big data is multisourced [28], time-variable, global, and macroscopic. Social networks are built on global data, and it is impossible to trace back to the distributed sources. In this type of network, the intrinsic relationship structure of vertices is a key factor in measuring the uniqueness of characters. Since different characters have different social relationships, we can identify characters by network structure features. To handle the heterogeneity and temporality of big data networks, we propose a uniqueness correction method and the notion of activity path similarity based on heterogeneous temporal networks [29, 30], to improve the efficiency and accuracy of character identification.

3. Social Network Modeling

A large-scale social network is based on diversified multimedia data which is multimodal [31]; for instance, an image can be described by a color modality or a shape modality. Such data contains information about multifarious and complex social activities. We can build this kind of network by extracting relations from transaction activity information, which is itself extracted from multimedia datasets. As the academic network is a typical case of a social network, we use it as an example to describe the process of social network mining.

In general, academic relations mainly include teacher-student relationship, classmate relationship, project partnership, and coauthor relationship. These relations are contained in education experiences, research and work experiences, cooperation and coauthor experiences, and academic activities and conferences experiences. The information of academic activities is contained in project proposals, project progress and concluding reports, degree certificates, award certificates, photographs or videos concerning conference, and other scientific information documents. Therefore, we extract academic activities information from the multimedia data and then construct academic transaction activity network. It is the base of mining and analysis of academic network between scholars.

Figure 1 shows a general view of the framework for academic relation network construction. First, we apply an academic activity transaction extraction method to multimedia sources, which contain texts, images, audio, and videos, to collect the individual resume information of scholars and the information of team members. Then we construct an academic activity transaction database. This database contains personal information, study and work experience information, and project and publication information. After that, we build an academic activity relation network containing heterogeneous vertices and relations. On this basis, we create academic transaction activity networks. In this kind of network, there are several types of transaction activities, such as study experiences (“graduated from Tsinghua University,” “studying in Cambridge University,” “was conferred doctor’s degree,” etc.), work experiences (“worked at Microsoft MSRA,” “teaches in Central South University,” etc.), publication information (“published (μ + λ) Evolutionary Strategy for 3D Modeling and Segmentation with Super quadrics,” etc.), and research experiences (“took over The Association Rules Mining of Time Series and Knowledge Discovery for Recognition of Expert Academic Activities Track project,” etc.).

These transaction networks are 2-mode networks which consist of two types of vertices, character vertices and entity vertices, together with their activity relations. These vertices represent scholars or researchers and academic entities, respectively. We can mine alumni relationships, workmate relationships, project cooperation, and coauthor relationships from them. The character vertices and academic relations constitute the academic relationship network, which is a homogeneous 1-mode network. We propose a vertices merging method based on the structure error of the network to implement uniqueness correction in this 1-mode network.
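The step from the 2-mode transaction network to the 1-mode relationship network can be sketched as a projection: two characters become related when they share an entity. The snippet below is a minimal illustration under that assumption only. The function name `project_to_one_mode` and the sample records are hypothetical and are not taken from the paper's dataset or implementation.

```python
from collections import defaultdict
from itertools import combinations


def project_to_one_mode(activities):
    """Project (character, entity) records onto a 1-mode character network.

    Returns a dict mapping each unordered character pair to the set of
    entities the two characters share.
    """
    members = defaultdict(set)  # entity -> characters linked to it
    for character, entity in activities:
        members[entity].add(character)

    edges = defaultdict(set)  # frozenset({a, b}) -> shared entities
    for entity, characters in members.items():
        for a, b in combinations(sorted(characters), 2):
            edges[frozenset((a, b))].add(entity)
    return dict(edges)


# Hypothetical activity records (illustrative only).
records = [
    ("scholar_a", "university_x"),
    ("scholar_b", "university_x"),
    ("scholar_b", "project_y"),
    ("scholar_c", "project_y"),
]
network = project_to_one_mode(records)
```

Here `scholar_a` and `scholar_b` are linked through `university_x`, and `scholar_b` and `scholar_c` through `project_y`, while `scholar_a` and `scholar_c` stay unconnected.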

4. Evaluating Uniqueness of Character Vertices Based on Structure Error

Redundant information about vertices and relations is generally carried by nonunique character vertices. Thus, correct structure merging is a key process for removing redundant information from social networks. In theory, the structure of a network does not change after redundant vertices and relations are merged. We evaluate the uniqueness of character vertices by a merging test and then screen out redundant vertices candidates.

4.1. Evaluating Uniqueness of Character Vertices

In a social network, we consider character vertices which have the same neighbors to be suspicious redundant vertices. Some of them contain redundant information and are nonunique, while others are merely highly similar and may not be redundant. Thus, we call suspicious redundant vertices redundant vertices candidates.

4.1.1. Uniqueness of Vertices

Let be a 1-mode network in which the vertices represent characters. Two nonempty finite sets and are character vertices set and relations set. We denote the mapping from relations set to vertices set as . The set which has vertices to be tested is denoted as and is set of neighbor vertices, and , , . The relation set which contains relations between character vertices and neighbor vertices is denoted as and . The mapping from character vertices set to relations set is denoted as , and the mapping from character vertices to its neighbor vertices is denoted as .

Property 1. In a 1-mode network G, if vertices in the vertices set have uniqueness, then and the values of structure error between are not zero.

4.1.2. Redundant Vertices Candidates

In theory, character vertices which are nonunique have selfsame or nearly identical relation structure. Redundant relations and vertices are generated by this situation and they should be merged so as to remove redundant information. We introduce the notion of structure error to describe the difference of network structure between vertices. The vertices with selfsame or highly similar structure are referred to as redundant vertices candidates. They contain redundant relation information.

Definition 2 (redundant vertices candidates). In a network , character vertices are redundant vertices candidates if the values of structure error between them are zero, and the redundant vertices candidate set is denoted as .

Definition 3 (redundant vertices). Let a vertices set be , , . If the vertices in are nonunique, the set is referred to as redundant vertices set. The number of all vertices in is denoted as .
If is the redundant vertices set, the vertices merging process is , and calculate , and , . Then calculate and to get the merged network . This new network does not contain any redundant information because vertices have been removed by vertices merging.
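The merging step can be illustrated with a small sketch. It assumes the 1-mode network is stored as an adjacency dictionary (vertex to set of neighbors) and that one vertex of the redundant set is kept as the representative. The function name `merge_vertices` and the vertex labels are hypothetical; this is one plausible reading of the merging process, not the authors' code.

```python
def merge_vertices(adjacency, redundant, keep):
    """Merge the vertices in `redundant` into the single vertex `keep`."""
    redundant = set(redundant)
    if keep not in redundant:
        raise ValueError("keep must be one of the redundant vertices")

    merged = {}
    for vertex, neighbours in adjacency.items():
        target = keep if vertex in redundant else vertex
        # Rewire neighbours that point at a removed vertex to `keep`.
        rewired = {keep if n in redundant else n for n in neighbours}
        merged.setdefault(target, set()).update(rewired)

    # Drop the self-loop that appears if redundant vertices were adjacent.
    merged[keep].discard(keep)
    return merged


# Hypothetical network: v1 and v2 have identical neighbours n1 and n2.
before = {
    "v1": {"n1", "n2"},
    "v2": {"n1", "n2"},
    "n1": {"v1", "v2"},
    "n2": {"v1", "v2"},
}
after = merge_vertices(before, {"v1", "v2"}, keep="v1")
```

After merging, the neighbor set of the kept vertex is unchanged, while each neighbor now has one relation instead of two, which matches the observation that merging removes duplicated relations without altering the neighborhood.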

In Figure 2, network contains six character vertices and nine relations, we denote them as and ; the neighbors of vertices , , and are denoted as . Before merging process, they connect with neighbor vertices , respectively. Therefore ; we regard , , and as redundant vertices candidates. In network , vertices set and relation set are and . The neighbor vertices set is . Both and have two relations but structure of them is different; namely, , and . Thus, they are not redundant.

4.1.3. Structure Error

After merging process, the number of relations between neighbor and character vertices has been changed, but the number of relations between character vertices and their neighbor vertices and the number of neighbors remain unchanged. According to this principle, we use these numbers to define structure error which is the validation criteria of structure merging.

Definition 4 (structure error). In network , the character vertices subset is denoted as , and the neighbor vertices subsets before and after merging are and , ; then

In (1), , , and the numbers of neighbor vertices before and after merging are denoted as and . and respectively represent the number of relations between and and their neighbors. represents the type label of social networks and is the set of type labels, . The structure error of and is denoted as .

Based on this notion, we can recognize redundant vertices candidates in social networks according to structure error. If , we can regard and as a vertex pair with uniqueness; otherwise, they are redundant vertices candidates.

4.2. Algorithm

We designed a screening method for redundant vertices candidates in social networks according to the above notion, which is shown in Algorithms 1 and 2. First, we arbitrarily select two character vertices and from the network and calculate the number of relations between these character vertices and their neighbors. We denote it as preRelations. Second, based on the merging principle, we calculate the number of relations between them after correct merging, which is denoted as postRelations. Last, we calculate the structure error of each vertex pair and put the vertices whose structure error is zero into the redundant vertices candidates set.

Input: social network
Output: candidate redundant vertices set H
Initialize list
for    to    do
    for    to    do
        if the name of ≠ the name of   then
            if    then
                add  ,   into H
            end if
        end if
    end for
end for
return H

Input: Two character vertices and
Output: structure error value
Initialize list and in
for    to    do
    relations of person node
    for    to    do
        preRelations ← preRelations + 1
    end for
end for
Initialize list relations in which shared by character vertices and
for    to    do
    postRelations ← postRelations + 1
    – (preRelations postRelations)
end for
return  
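The screening idea can be sketched in a few lines. Because the structure error formula itself is not reproduced in this text, the sketch substitutes an assumed proxy: the number of relations that are not shared by the two vertices, which is zero exactly when both vertices have the same neighbors. The names `structure_error` and `screen_candidates` and the sample network are hypothetical.

```python
from itertools import combinations


def structure_error(adjacency, a, b):
    """Assumed proxy for structure error: relations not shared by a and b."""
    na = adjacency[a] - {b}
    nb = adjacency[b] - {a}
    pre_relations = len(na) + len(nb)  # relations counted before merging
    post_relations = len(na & nb)      # relations shared by both vertices
    return pre_relations - 2 * post_relations


def screen_candidates(adjacency, characters):
    """Return differently named pairs whose proxy structure error is zero."""
    candidates = []
    for a, b in combinations(sorted(characters), 2):
        # Pairs without any common neighbour are never candidates.
        has_shared = bool((adjacency[a] & adjacency[b]) - {a, b})
        if has_shared and structure_error(adjacency, a, b) == 0:
            candidates.append((a, b))
    return candidates


# Hypothetical 1-mode network: p1 and p2 share all neighbours, p3 does not.
graph = {
    "p1": {"q1", "q2"},
    "p2": {"q1", "q2"},
    "p3": {"q1", "q3"},
    "q1": {"p1", "p2", "p3"},
    "q2": {"p1", "p2"},
    "q3": {"p3"},
}
H = screen_candidates(graph, ["p1", "p2", "p3"])
```

Only the pair `("p1", "p2")` ends up in `H`. As the text notes, a zero structure error only makes a pair a candidate; it does not yet prove that the two vertices are the same character.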

5. Character Uniqueness Measure Based on Activity Path Similarity

The temporal attributes of semantic relations are composed of the start time and end time of activities. We can use them to construct heterogeneous temporal social networks which consist of several different types of subnetworks. Each subnetwork contains only one type of relation. Vertex similarity is therefore decided by the activity relations between character vertices and entity vertices in the different subnetworks. As differences in temporal attributes cause differences in relation paths, we introduce the activity path to describe these network structures. Based on this notion, we quantitatively measure the similarity of character vertices by calculating the temporal weight of activity paths. After combining the results from all subnetworks, character uniqueness can be measured precisely.

5.1. Transaction Activity Network (TAN)

Let and be label sets of vertices types and relation types, . Nonempty finite sets and denote character vertices set and entity vertices set, respectively. Nonempty relation set is denoted by . Let be temporal attributes set of activities in . The mapping from relations to vertices and temporal attributes is denoted by , and its inverse mapping is . The mappings of vertices types and relation types are denoted by and , respectively.

Definition 5 (transaction activity network). A transaction activity network (TAN for short) contains activity information and temporal attributes. It is denoted by , .

Property 1 (heterogeneity). In a TAN denoted by , there is .

Property 2 (temporality). In a TAN denoted by , there is , denotes time attribute of , and and respectively denote the start time and end time of .

A large-scale TAN always contains several types of social activities. We can divide them into two or more subnetworks. Each of them contains one type of transaction activities. Let be subnetworks with different types of activities. If its relations have temporal attributes and the set is , we denote as , and the sets of vertices, relations, and types are denoted, respectively, by , , and . It indicates that consists of subnetworks ; thus we denote it as simply. Thus, it can be seen that a large-scale TAN contains multitype vertices and relations, and differences of vertex types lead to differences of relation types [32]. In real world, a TAN always contains several types of social activity; namely, there are different types of relations and vertices in a network.

Figure 3 shows a heterogeneous academic TAN. It contains two types of vertices: scholar vertices and entity vertices , , , which represent institution, conference, publication, and research project. Due to differences of academic activities, there are different relations between vertices, such as write relation between scholars and papers and participation relation between scholars and conferences. We use , , , , , and to denote six types of relations (organize, attend, included, work at, write, and undertake) and , , , , , and denote temporal attribute set.

5.2. Transaction Activity Path (TAP)

In a TAN, transaction activity paths (TAPs for short) depend on its topology. We regard character vertices as master vertices and entity vertices as their neighbor vertices, and then we can describe TAPs. A TAP is a path which goes through a pair of character vertices, one entity vertex, and the relations between them. From one master vertex to another, there are one or more TAPs through their common neighbors, and they contain the semantics and temporal attributes of the original transaction records.

Let character and neighbor vertices be and , respectively, and are two label sets of relations in relation sets and , and

Definition 3 (transaction activity path). In a TAN , let or be the start vertex; a path which begins at , goes through neighbor vertex and then ends at is called a transaction activity path. It is denoted by . The set of TAPs between and is denoted by .

Property 1. Let be the number of TAPs in set , .

Instance 1. Figure 4 shows that, in a TAN , the sets of character vertices and their neighbor vertices are and ; and are the sets of relations and temporal attributes. We can find that , , , . Evidently though, there are two activity paths from vertex to through neighbor vertex , and we denote them by and . Similarly, we denote the path through neighbor by .
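Enumerating TAPs between two characters can be sketched as pairing their relations through each common entity. The record layout (`Relation` with a label and start/end years) and the function name `transaction_activity_paths` are assumptions made for illustration; the sample relations are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    character: str
    entity: str
    label: str
    start: int  # start year of the activity
    end: int    # end year of the activity


def transaction_activity_paths(relations, a, b):
    """List the TAPs from a to b as (relation of a, relation of b) pairs."""
    by_entity = defaultdict(lambda: ([], []))
    for r in relations:
        if r.character == a:
            by_entity[r.entity][0].append(r)
        if r.character == b:
            by_entity[r.entity][1].append(r)
    # Every pairing of a's and b's relations on one entity is one path.
    return [
        (ra, rb)
        for from_a, to_b in by_entity.values()
        for ra in from_a
        for rb in to_b
    ]


# Hypothetical relations: scholar_a has two relations to institute_x.
relations = [
    Relation("scholar_a", "institute_x", "study at", 1996, 2000),
    Relation("scholar_a", "institute_x", "work at", 2000, 2004),
    Relation("scholar_b", "institute_x", "work at", 2000, 2004),
    Relation("scholar_b", "conference_y", "attend", 2005, 2005),
]
paths = transaction_activity_paths(relations, "scholar_a", "scholar_b")
```

As in Instance 1, two relations from one character to a shared neighbor yield two distinct paths through that neighbor, while an entity linked to only one of the characters contributes none.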

5.3. Character Uniqueness Measure

Owing to temporal attributes of relations, we can define and calculate temporal weight of relations and TAPs, which reflect temporal characteristics of transaction activity networks. Based on temporal weight we can calculate TAP similarity to measure similarity degree of character vertices pairs. The similarity threshold is a filter to screen out unique vertices so that we can get redundant vertices set.

5.3.1. Temporal Weight Calculation

In a transaction activity network, temporal weights of relations are decided by start time and end time, while temporal weights of TAPs are decided by the former. Based on time attribute , we can use the following equation to calculate temporal weight of :

Here, Now denotes the current date, denotes the label of relations, and is the label of neighbor vertex . The following equation is the temporal weight of TAPs:

The temporal weight of relations reflects the start time and end time, as well as the duration, of relations. The temporal weight of TAPs contains all of this information, since each TAP consists of two relations; it is determined by the temporal attributes of those relations.

5.3.2. Transaction Activity Path Similarity

In a transaction activity network , let character vertices and entity vertex be and which is the neighbor of and . The TAP sets are denoted by ,  , and . , , and represent three types of paths, respectively. They have three different structures.

Figure 5 shows the first type of TAP between and . These paths begin at , then go through relation , neighbor , and relation , and end at . In a network, all TAPs between two different vertices are of this type. The second and third types are shown in Figures 6 and 7. Both begin at one vertex ( or ) and end at the same vertex, passing through the same relation twice.

Definition 2 (SimTAP). SimTAP is the similarity between two vertices and . It is decided by structure and temporal weight of TAPs between and . The definition formula of SimTAP is as follows:

In this formula, , , and denote the temporal weight sums of these three types of TAPs. We use the following formulas to calculate these weights:

Here, and are weights of relations between and .
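To make the role of the three path types concrete, the following sketch computes a similarity from the weight sum of paths between the two vertices and the weight sums of each vertex's paths back to itself. Since the formulas are not reproduced in this text, everything numerical here is an assumption: the relation weight `(end - start) / (now - start)`, the path weight as a product of two relation weights, and the PathSim-style normalization `2 * W(a, b) / (W(a, a) + W(b, b))`. The sketch also assumes at most one relation per character and entity within a subnetwork. It should not be read as the paper's definition of SimTAP.

```python
def relation_weight(start, end, now):
    """Assumed temporal weight of one relation (not the paper's formula)."""
    if not start < end <= now:
        raise ValueError("expected start < end <= now")
    return (end - start) / (now - start)


def path_weight_sum(rel_a, rel_b, now):
    """Sum of path weights over the entities shared by two relation maps."""
    shared = rel_a.keys() & rel_b.keys()
    return sum(
        relation_weight(*rel_a[e], now) * relation_weight(*rel_b[e], now)
        for e in sorted(shared)
    )


def sim_tap(rel_a, rel_b, now):
    """Assumed PathSim-style similarity: 2*W(a,b) / (W(a,a) + W(b,b))."""
    denominator = path_weight_sum(rel_a, rel_a, now) + path_weight_sum(rel_b, rel_b, now)
    if denominator == 0:
        return 0.0
    return 2 * path_weight_sum(rel_a, rel_b, now) / denominator


NOW = 2015  # hypothetical reference year
# Hypothetical relation maps: entity -> (start year, end year).
same_period_a = {"university_x": (2000, 2004), "company_y": (2004, 2010)}
same_period_b = {"university_x": (2000, 2004), "company_y": (2004, 2010)}
other_period = {"university_x": (2005, 2009), "company_y": (2009, 2014)}

identical = sim_tap(same_period_a, same_period_b, NOW)
shifted = sim_tap(same_period_a, other_period, NOW)
```

Under these assumptions, two vertices with the same entities and the same periods score 1, while two vertices with the same entities but different periods score below 1, which is the distinction that structure error alone cannot make.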

Generally, a transaction activity network contains several subnetworks. In order to measure the similarity of all characters, we need to add all similarity values in each subnetwork and then calculate arithmetic mean. Let a TAN be , we calculate similarity of vertices pair and in ; then we get the TAPs similarity set . After that we calculate the arithmetic mean in . The formula is as follows:

In the formula, is the number of subnetworks in and is the TAP similarity of and .

5.3.3. Character Uniqueness Measure

can measure uniqueness of characters quantitatively in TANs. The larger its value is, the greater the similarity between character vertices is, and vice versa. According to this idea, we proposed uniqueness measurement of characters: after calculating, we set character uniqueness threshold based on features of networks and data-analytic requirements to screen out the results. If , we regard and as unique characters, while if , vertices and have high similarity, which indicates that we need to merge these vertices and their shared relations.

Instance 2. In a transaction activity network , the type sets of vertices and relations are, respectively, denoted by and . There are 10 character vertices in this network and the values of similarity of them are shown in Table 1.
In this table, columns and give the names of characters; , , , and indicate the similarity of vertices pairs in the four subnetworks; and denotes the similarity in . After setting the threshold θ, we find four similarity values larger than θ: , , , and . This indicates that these characters are remarkably similar and are not unique. In contrast, the similarity of characters Long Chen and Jay Liu is smaller than θ; namely, ; thus they are unique. This shows that we can screen out the unique character vertices by calculating similarity and setting the threshold θ.
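The averaging and thresholding step can be sketched as follows. The per-subnetwork similarity values and the threshold below are hypothetical placeholders, not values from Table 1, and treating a mean exactly equal to θ as unique is an assumption, since the text only describes the larger-than and smaller-than cases.

```python
def mean_similarity(per_subnetwork):
    """Arithmetic mean of the per-subnetwork TAP similarities."""
    if not per_subnetwork:
        raise ValueError("at least one subnetwork similarity is required")
    return sum(per_subnetwork) / len(per_subnetwork)


def split_by_threshold(pair_similarities, theta):
    """Separate vertex pairs into redundant (mean > theta) and unique."""
    redundant, unique = [], []
    for pair, sims in pair_similarities.items():
        # Assumption: a mean exactly equal to theta counts as unique.
        (redundant if mean_similarity(sims) > theta else unique).append(pair)
    return redundant, unique


# Hypothetical similarities in four subnetworks for two vertex pairs.
similarities = {
    ("x1", "x2"): [1.0, 1.0, 1.0, 1.0],
    ("x3", "x4"): [0.5, 0.25, 0.0, 0.25],
}
redundant, unique = split_by_threshold(similarities, theta=0.75)
```

With these placeholder values, the pair `("x1", "x2")` is flagged for merging and `("x3", "x4")` is kept as two unique characters.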

5.4. Algorithm Design

We designed the TAPs similarity algorithm based on the above theory, as shown in Algorithm 5. First, we get the relation lists of the vertices pair and from each subnetwork and then calculate the temporal weight of the transaction activity paths. Second, we calculate the transaction activity network similarity of and and then calculate the arithmetic mean of the similarity SimTAP in network , as shown in Algorithm 4. After traversing all vertices pairs in the candidate redundant vertices set H and getting their similarity, we set the threshold θ and compare it with each similarity. We regard the vertices whose similarity is larger than θ as redundant vertices and put them into the redundant vertices set , as shown in Algorithm 3. The vertices whose similarity is smaller than θ are regarded as unique vertices and must remain in the network.

Input: Candidate redundant vertices set H
Output: Redundant vertices set
Initialize list , by candidate redundant vertices set H
for    to    do
    for    to    do
        if the name of = the name of   then
            if    then
                if   or   then
                    Insert or into
                else
                    Insert , into
                end if
            end if
        end if
    end for
end for
return  

Input: Two character vertices and
Output: value of path similarity of and in
Initialize list by relation list of in
Initialize list by relation list of in
for    to    do
    for    to    do
        if   of = of   then

        end if
    end for
end for
for    to    do
    for    to    do
        if of [] = of   then

        end if
    end for
end for
for    to    do
    for    to    do
        if of = of   then

        end if
    end for
end for

return  

Input: Two character vertices and
Output: value of path similarity of and in G
Initialize
for    to    do

end for

return

5.5. Experiment and Analysis

The multimedia dataset used to build the academic transaction networks contains texts, images, and videos, covering proposals, papers, award certificates, and recordings of academic conferences. In this experiment, we extracted academic activity transaction data from 724 proposals of the Natural Science Foundation of China (NSFC) [33], which are texts in Chinese only, and then established a transaction database. After that, we imported these data into the graph database Neo4j and constructed transaction activity networks which contained 598 vertices. We mined the academic relationships between scholars and then built academic networks. On this basis, we calculated the structure error of character vertices and produced a visual presentation of the network [34]. Based on the results of the structure error calculation, we took vertices from the candidate redundant vertices set H and calculated the SimTAP of each vertices pair.

5.5.1. Evaluation for Structure Error

Our academic transaction activity networks contain the academic transaction information of 589 scholars. The first step was to extract academic activity data from the transaction database and import them into the graph database. We constructed four types of activity networks: the education experience network, the work experience network, the project cooperation network, and the coauthor network. We then built the academic network based on them. Figure 3 shows this network.

In this network, we calculated the structure error of each vertex pair and selected the pairs whose structure error is 0. Table 2 shows partial results.

In Table 2, two fields denote the two vertices of a pair and a further field denotes the value of the structure error of these vertices in the network. The structure error of the vertex pairs Faye Wu and Fei Wu, and Shaojia Zhu and Shaonan Zhu, equals zero. Therefore, these two vertex pairs are regarded as redundant vertex candidates. Their structure features are shown in Figure 8. The four highlighted character vertices are Faye Wu, Fei Wu, Shaojia Zhu, and Shaonan Zhu. The two highlighted subnetworks illustrate that the two vertices of each pair have the same neighbors.

To analyze our method in more depth, we extracted academic activity information from the database. Tables 3–6 show the academic activity information of Faye Wu and Fei Wu.

Faye Wu and Fei Wu studied in the same school over the same period. Likewise, they have the same work, project, and publication experience. That is, their academic experience is identical. Thus, Faye Wu and Fei Wu are not unique, and one of the two vertices is redundant information.

In Figure 8, the vertices Shaojia Zhu and Shaonan Zhu have the same neighbors. Similarly, we extracted their activity information.

From Tables 7–10 we can see that Shaojia Zhu and Shaonan Zhu studied in the same universities and were employed by the same employer, but over different periods. That means their education and work experience is different. The difference between Shaojia Zhu and Shaonan Zhu lies in the temporal attributes. Therefore, both of them are unique and they do not contain redundant information.

The results indicate that character vertices which have the same neighbors may not contain exactly the same social activity information. These vertices are redundant candidates, and some of them are in fact unique, but structure error alone cannot recognize them. Structure error can only exclude the vertices whose structure error is not zero, which are certainly unique. Therefore, a further step is needed to recognize character uniqueness.
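
The candidate screening step can be illustrated with a short Python sketch. It rests on a loudly stated assumption: the structure error formula is not reproduced here, so the sketch treats "zero structure error" as "identical neighbor sets", which is the property the two highlighted pairs exhibit. The function name `candidate_pairs` and the toy graph are hypothetical.

```python
from itertools import combinations


def candidate_pairs(adjacency):
    """Return vertex pairs with identical, non-empty neighbor sets.

    Assumption: identical neighbor sets stand in for zero structure error.
    Each vertex is excluded from the other's neighbor set before comparing.
    """
    pairs = []
    for u, v in combinations(sorted(adjacency), 2):
        neighbors_u = adjacency[u] - {v}
        neighbors_v = adjacency[v] - {u}
        if neighbors_u and neighbors_u == neighbors_v:
            pairs.append((u, v))
    return pairs


# Hypothetical undirected graph stored as neighbor sets.
adjacency = {
    "a1": {"n1", "n2"},
    "a2": {"n1", "n2"},
    "b": {"n1"},
    "n1": {"a1", "a2", "b"},
    "n2": {"a1", "a2"},
}
candidates = candidate_pairs(adjacency)
```

Such a screen only produces candidates. As the Shaojia Zhu and Shaonan Zhu pair shows, a pair that passes it may still be two distinct characters, so the temporal comparison is still required.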

5.5.2. TAPs Similarity Calculation

We first calculated the similarity of each vertex pair in an academic network containing 589 characters. After setting θ to 0.70, we selected the vertex pairs whose SimTAP is higher than θ. The results are shown in Table 11.

In this table, the similarity of the vertex pair Faye Wu and Fei Wu is 1.0000, the maximum value, which indicates that their academic activity information is identical. The similarities of Jia Gao and Di Feng, Xinhua Zou and Xingxing Zou, and Zhe Feng and Kang Du are 0.7450, 0.7463, 0.8546, and 0.7500.

5.5.3. Regression Analysis

We chose the vertex pairs whose similarity in some subnetworks is zero and extracted their transaction information from the database. It is shown in Tables 12, 13, and 14.

We can see from Table 12 that Jia Gao and Di Feng studied in three different universities. Likewise, Table 13 shows that Xinghua Zou and Xingxing Zou studied in different colleges. In Table 14, the publications of Zhe Feng and Kang Du are entirely different. These three pairs of characters therefore differ in education and publication activities, which leads to differences in their academic relationships. However, the high similarity of their other types of academic activities leads to a high SimTAP value, even higher than the threshold θ, so these characters cannot be screened out of the networks. This situation adversely affects character uniqueness identification. The problem can be solved by calculating TAPs similarity and screening out redundant character vertices from social networks.
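
The effect of averaging can be seen in a small numeric sketch. The per-subnetwork values below are made up for illustration and are not taken from Tables 11 to 14. They only show how an arithmetic mean over four subnetworks can exceed θ = 0.70 even when one subnetwork similarity is zero, and that a higher threshold such as 0.80 would reject the same pair.

```python
from statistics import mean

# Hypothetical per-subnetwork similarities of one vertex pair:
# zero in one subnetwork, identical activity in the other three.
subnetwork_sims = [0.0, 1.0, 1.0, 1.0]

# SimTAP as the arithmetic mean over the subnetworks.
score = mean(subnetwork_sims)

exceeds_070 = score > 0.70  # pair is reported as redundant
exceeds_080 = score > 0.80  # pair is kept as two unique vertices
```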

Based on the experiment results in Section 4, we calculated the similarity of the candidate redundant vertices in H. The results of the structure error calculation are shown in Table 15.

We obtained the redundant vertices set after the structure error calculation and then calculated the similarity of these four vertices. θ was set to 0.80; the results are shown in Table 16.

5.5.4. Redundant Vertices Merging

After vertex screening, we merged the two redundant vertices together with their relations and obtained an academic network without redundant information. In Figure 9, one vertex of the redundant pair and its relations were removed by the merging, while the other vertex is kept. Likewise, the roles can be swapped, so that the other vertex is kept and the first is removed. Compared with the network before merging, the relations between the kept vertex and its neighbors were not changed, which indicates that the vertex merging was correct.
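
The merging step can be sketched in Python as follows. This is an illustration under assumptions rather than the procedure used in the experiment: the network is modelled as an undirected graph stored as neighbor sets, without self-loops, and the function name `merge_vertices` and the vertex names are hypothetical.

```python
def merge_vertices(adjacency, keep, drop):
    """Return a new neighbor-set graph in which `drop` is merged into `keep`.

    Relations of `drop` are re-attached to `keep`; `drop` itself is removed.
    Assumes a symmetric (undirected) adjacency map without self-loops.
    """
    merged = {v: set(nbrs) for v, nbrs in adjacency.items() if v != drop}
    for nbr in adjacency[drop]:
        if nbr == keep:
            continue  # do not create a self-loop on the kept vertex
        merged[keep].add(nbr)
        merged[nbr].add(keep)
    for nbrs in merged.values():
        nbrs.discard(drop)  # remove every remaining reference to `drop`
    return merged


# Hypothetical redundant pair a1/a2 sharing the neighbors n1 and n2.
adjacency = {
    "a1": {"n1", "n2"},
    "a2": {"n1", "n2"},
    "n1": {"a1", "a2"},
    "n2": {"a1", "a2"},
}
merged = merge_vertices(adjacency, keep="a1", drop="a2")
```

Because the two vertices share all their neighbors, the relations of the kept vertex are the same before and after merging, which matches the check described above.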

The analysis above indicates that combining structure error calculation with transaction activity path similarity improves the accuracy of vertex uniqueness identification. Setting a similarity threshold implements vertex screening, which is the basis of redundant vertex merging. Therefore, our solution achieves character correction in social networks built from multimedia datasets.

6. Conclusion and Future Work

In this paper, we introduced a framework for social network modeling via multimedia data. We then presented the notion of structure error based on the structural features of networks and on vertex merging principles, calculated the structure error, and screened out redundant vertex candidates in social networks built from transaction information. In addition, we designed a vertex similarity algorithm that can precisely measure the uniqueness of character vertices and created the redundant vertices set. Finally, we removed redundant information from the network by merging the redundant vertices in the set. Our solution improved the accuracy of character uniqueness recognition and effectively solved character correction in a network. At present, we set the threshold empirically during experiments and have not yet implemented an intelligent adaptive adjustment of the threshold. In future work, we will compute the range of the threshold from a large amount of network data using statistical techniques and then design an adaptive adjustment algorithm.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61272150, 61379110, 61472450, 61402165, 61702560, S1651002, and M1450004), the Key Research Program of Hunan Province (2016JC2018), and Science and Technology Plan of Hunan Province Project (2018JJ2099 and 2018JJ3691).