Abstract

Social recommender systems, such as “Who to follow” on Twitter, utilize approaches that recommend friends of a friend or interest-wise similar people. Such algorithmic approaches have been criticized for resulting in filter bubbles and echo chambers, calling for diversity-enhancing recommendation strategies. Consequently, this article proposes a social diversification strategy for recommending potentially relevant people based on three structural positions in egocentric networks: dormant ties, mentions of mentions, and community membership. In addition to describing our analytical approach, we report an experiment with 39 Twitter users who together evaluated 72 recommendations from each proposed structural network position. The users were able to identify relevant connections from all recommendation groups. Yet, perceived familiarity had a strong effect on perceptions of relevance and willingness to follow up on the recommendations. The proposed strategy contributes to the design of a people recommender system, which exposes users to diverse recommendations and facilitates new social ties in online social networks. In addition, we advance user-centered evaluation methods by proposing measures for subjective perceptions of people recommendations.

1. Introduction

Social media and social networking services such as Twitter are widely used in professional cooperation within and across organizations, helping to gain new insights and share knowledge. The functionality of recommending new connections is essential for expanding the social network and introducing new professional ties. Such people recommender systems represent the areas of social computing and social matching [1], which are argued to require careful design of the algorithmic principles [2]. Thus, people recommenders aim at influencing followership by suggesting seemingly suitable others based on user modeling and predictive analytics.

The majority of existing approaches tend to support homophily bias [3], a tendency to prefer others whose characteristics resemble one’s own, by focusing on similarities in user-created content [4]. Another commonly used principle is the triadic closure [5] in the followership networks [6] that focuses on friend-of-a-friend connections. Furthermore, the “Who to follow” feature on Twitter has been found to favor already popular users and promote uni-directional network connections [7]. A recently much-discussed concern is that network-based algorithms on social media can lead to echo chambers and perpetuate social polarization [8] because they are efficient in reproducing existing connections but limited in developing new ones. Therefore, introducing new social ties is likely to be based on similarity or close social vicinity of the active user.

Consequently, an important goal has been set to increase diversity in the recommendations [9, 10], potentially decreasing human and algorithmic biases [11]. Our work highlights this goal toward diversification and heterogeneity, especially in the professional networking context where diversity is seen as a key driver for fruitful collaboration [2]. Diversifying people recommendations can enable unexpected yet valuable social encounters [12], which require alternative recommendation strategies to identify relevant people in the vast and complex Twitter network. Traditional recommender systems research seeks to optimize algorithmic accuracy and effectiveness [13, 14], creating algorithms that can reproduce actors’ current behavior as accurately as possible [15] rather than aiming at increasing diversity. In turn, focusing on accuracy results in a lack of user-centered research addressing the intricacies of recommendation strategies regarding the desirable degree and types of diversity exposure. To this end, understanding the users’ subjective perceptions of the relevance of given diversity-oriented people recommendations is crucial.

An ongoing merger of three nearby universities provided an opportune case study for exploring a new social matching strategy on Twitter. This merger raised a need to enable cross-sectoral collaboration between scholars and stakeholders within the new university community [16]. Prior research suggests that bridging polarized intellectual communities and increasing social awareness contributes to developing creativity and innovation capabilities [17]. The pool of Twitter users following one of the to-be-merged universities represents an implicit community of interest in research and innovation with various backgrounds, disciplines, and areas of life at a specific locality. To make this community explicit, we address the untapped potential for professional social matching by introducing new connections with a diversification strategy that subscribes to the principle of balancing between similarity and diversity [2, 18]. Specifically, we vary degrees of diversity in the social network structures while at the same time seeking shared interests and topics by measuring the similarity of the produced content.

To apply and evaluate the recommendation strategy in practice, we collected tweets and followership data on more than 12,000 actors who follow the Twitter account of at least one of the three universities. To remedy isolated social groups on Twitter, we suggest reshaping the social network structures rather than exposing the users to more diverse content. In contrast to prior research, which typically analyzes only followership ties [19], we also use mention-based social networks since mentions are stronger interaction indicators between actors. Such an approach allows for identifying three topology-based structural positions in the active user’s egocentric network [20]—Dormant ties, Mention-of-Mention, and Community membership. While previous research touched on three structural network positions [21], in this paper, we provide their extended definition and description of the analysis procedure and present empirical findings of an online user experiment on subjective perceptions of the produced recommendations.

To empirically study the proposed diversification strategy, we set the following research question: How do recommendations based on the proposed structural positions associate with users’ subjective perceptions of relevance and their willingness to follow up? Unaware of the different recommendation groups, 39 voluntary Twitter users in the target community evaluated a total of 288 recommendations (72 from each of the three proposed structural positions and 72 from the baseline group). The analysis shows that the proposed structural positions can help introduce diversity exposure in different ways: reminding users about forgotten ties, motivating them to connect with new people, and helping them enter latent communities. Thus, the paper contributes to interdisciplinary research on social and people recommender systems by proposing a nonconventional perspective for diversifying the pool of people recommendations and, prospectively, making online social networks more heterogeneous.

We first outline existing conceptualizations of diversity and similarity within the context of interpersonal relationships and social matching. Next, we outline the existing people recommendation approaches and diversity-enhancing mechanisms on Twitter. Finally, we review research on user-centered evaluation of recommender systems.

2.1. Optimizing for Diversity or Similarity

Concepts of similarity and diversity are two essential polarities in social participation. Driven by the natural tendency of humans to prefer similar others [22], homogeneity is preferable when establishing trustworthy and coherent relationships [23]. At the same time, diversity is vital for productive and innovative collaboration [24]. Prior research has studied the perceived diversity of social relationships [25] and explored how diversity dimensions (e.g., cognitive, physiological, and demographic differences) are addressed in Human-Computer Interaction research [26]. User-centric recommender systems research is interested in diversity as a design goal to overcome algorithmic biases [11, 27] and drawbacks of personalization in information filtering [28, 29]. The common conceptual aspect across prior literature is that diversity is seen as the opposite of similarity [30] and has been defined, for instance, as average dissimilarity [31], distributional inequality [32], and nonredundancy [33]. Therefore, diversity can be interpreted as a perceived difference or measurable distance between all recommendations presented to the user.

While both similarity and diversity can be substantial, optimizing for either of them has been criticized [10]. For instance, social recommendations built on the principle of similarity might strengthen existing communities but can also lead to social polarization and echo chambers [34], hampering information flow, innovation, and creativity [35]. Extreme diversity among community members can negatively affect, for example, knowledge sharing and decision-making [36], resulting in conflicts, especially in the case of surface-level social and cultural differences (e.g., demographic qualities). Thus, researchers have investigated how to overcome or decrease the impact of the abovementioned adverse effects. For instance, it has been found that actors should share common ground in terms of background qualities, values, or goals to establish fruitful relationships [37]. At the same time, professional roles, capabilities, and skills should vary [38]. Rajagopal et al. [39] suggest that matching people based on dissimilarities of attitude or opinion toward the topic of interest results in better learning experiences compared to similarity-maximizing recommendation approaches. Geared toward diversity-enhancing approaches in the social matching of scholars, researchers have proposed recommending not only very similar others but also somewhat similar and different people to extend the social circles [13].

In this study, we focus on the so-called diversity exposure [10] that refers to “the content that the audience actually selects, as opposed to all the content that is available.” Diversity exposure is associated with studies on detecting expertise and opinions online to extend personal autonomy (individuals’ choices) and overcome echo chambers [40] for more informed rather than polarized opinions. We contribute to the research on diversity exposure by introducing a strategy that exposes Twitter users to the diversity in their social networks. Driven by the idea that fruitful relationships benefit from both shared interest and diversity, we conduct an experiment controlling for similarity and focusing on the effects of social diversity. While the concepts of diversity and similarity are well studied regarding cognitive qualities, personality, and demographics, their manifestation on social networks remains understudied. Considering that social structure encapsulates various human biases, our approach aims to decrease their impact by enhancing diversity in the composition of individuals’ social networks.

2.2. Approaches to People Recommendations

Epistemologically, Twitter-based people recommendation approaches utilize user modeling based on data retrieved from basic features of the platform: “follow,” “tweet,” “mention,” and “retweet” [19]. Accordingly, content-based approaches focus on analyzing textual content, such as tweets and retweets, while network-based approaches examine followership and mentions relationships. Table 1 provides an overview of existing approaches for recommending people on Twitter.

The most conventional approach identifies similarities in users’ topics of interest content-wise and shared audience in social networks. In addition to similarity-based approaches [41], the number of followers and followees in a user profile can be used for producing recommendations based on the “popularity” dimension [42]. Recommendations can also be based on users’ activities such as tweeting, mentioning, and retweeting [43]. For example, depending on the social matching scenario, the most popular and active users might be prioritized or omitted from the list of recommendations.

Since traditional recommendation approaches have been criticized for fostering human and algorithmic biases [45], recommender systems research has recently explored different approaches for diversification. In the context of people recommendations, these approaches fall into two categories. The first relates to the diversity of features—the most conventional approach that focuses on deriving multiple user features, that is, explicit or implicit characteristics such as interests, social network, affiliation, and others. For example, Yuan et al. [46] proposed extracting contextual features such as mobility and activity for people recommendations on Twitter. Guimarães et al. [47] proposed an extension of users’ features by simultaneously utilizing content-based, collaboration-based, and user-based information, thus increasing user modeling accuracy and the effectiveness of people recommendations.

The second category relates to the diversity of analytical procedures—utilizing hybrid analysis techniques for filtering the recommendation pool. A representative example of this approach complements identifying the content similarity with sentiment analysis to create emotion-based recommendations for matching Twitter users with shared topics and similar [48] or different [49] emotional attitudes toward them. Jacovi et al. [50] argued for mining person and content interest relationships to complement existing approaches based on similarity and familiarity. According to the authors, merely being similar or familiar with a person does not imply directional interest, yet it is essential for establishing new ties. We also contribute to the diversity of analytical procedures by combining different analyses for social tie identification accompanied by retrieving the topic similarity from the content of users’ tweets. In addition, we consider contextuality through boundary specification for the recommendation pool: geographically bounded shared interests serve as a common ground across members with inherent internal diversity of expertise sectors.

From the perspective of social networks, people recommendation approaches are limited to the analysis of followership networks and typically utilize triadic closure principles [19]. Smith et al. [51] argued that the nature and topology of networks on Twitter are underutilized, and conventional filtering or recommendation algorithms are trapping users in homogeneous content and social connections. Following the call for diversifying social network structures, Sanz-Cruzado and Castells [52] proposed recommending weak ties derived from dynamic interactive networks (e.g., based on retweets or mentions) of Twitter users. However, their evaluation study is based purely on comparing generated recommendations for forming connections in real life. Importantly, they do not ask users about their perceptions of the recommendations. Although we firmly subscribe to the overall goal of Sanz-Cruzado and Castells’ work, we approach the network-based recommendation mechanisms and the evaluation procedure differently. We propose utilizing both followership networks to identify weak ties and mention-based networks to reveal interaction-driven weak and tacit connections. Aiming at user-centered evaluations beyond accuracy [15], we also measure subjective perceptions of recommendations from different structural network positions.

2.3. User-Centric Evaluation of Recommender Systems

Recommender systems traditionally utilize system-centric evaluation methods and rarely assess the quality of recommendations with user-centric experiments [15, 53]. System-centric methods algorithmically simulate the accuracy by comparing the estimated opinions regarding the value of recommendations with pre-built ground truth datasets [54]. User-centric evaluation collects opinions and observes behavior during the interaction with the recommender system [55]. Prior research suggests that system- and user-centric measures could lead to contradictory results [56]: recommendations that the system estimates to be relevant may not be perceived the same way by the user. Therefore, the need for operationalizing subjective measures to evaluate recommendations’ quality has been raised [13].

Nevertheless, the research on defining and applying subjective evaluation measures in practice remains scarce. Existing research primarily focuses on assessing the system’s objective aspects, such as interaction effort and efficacy [56]. The measures that aim to reveal users’ attitudes regarding recommendations are mainly driven by the idea of evaluating trust toward the system and its functional effectiveness. For instance, measures such as perceived accuracy [55] and familiarity [57] were proposed assuming that recommendations that best match the user’s interests and are perceived as familiar increase the trust toward the system and imply efficiency. The subjective measures of novelty and diversity are driven by the goal of revealing the users’ satisfaction [58, 59]: the novel and diverse recommendations can increase the subjectively perceived usefulness.

In summary, existing evaluation measures have been proven suitable for item recommenders (suggestions on products and content). However, as objects of recommendations, people represent more complex quality criteria that can affect decision-making regarding evaluating their value. Considering social matching scenarios for professional social networking, the subjective perceptions of recommendation relevance can be influenced by a particular need for partnering. This calls for context-specific operationalization of evaluation metrics [2]. Moreover, to the best of our knowledge, there are no established measures for subjective perceptions of people recommendations. This paper proposes evaluating the relevance of recommendations from two perspectives: the value of recommended people for professional activities and the usefulness of their topics. In addition, we operationalize measures for evaluating low- and high-cost follow-up activities.

3. Exploring and Defining Structural Network Positions

Our overall matching strategy is to introduce people who share similar interests based on the tweets’ content (e.g., shared scientific interests) but have only an indirect or inactive connection in the social network. Thus, by controlling content similarity, we can compare subjective perceptions of recommendations based on the different structural positions defined as follows:
(i) Dormant tie (Dorm): reintroducing an existing followee with whom the user has had no explicit interactions; as a recommendation mechanism, this could remind the user about possibly ignored or forgotten ties [60];
(ii) Mentions-of-Mentions (MoM): a friend-of-a-friend type of connection [61] in the mention network; in contrast to typical followership networks, the mention network is based on more explicit interactions;
(iii) Community membership (Com): a user identified as belonging to the same community cluster [62]; this could introduce new people in a computationally identified network cluster with no explicit followership or mention-based ties;
(iv) The rest of the population (Rest), as a baseline condition: all the other users who follow at least one of the institutional accounts, considered the most random source of recommendations.

In the following, we describe the procedure for exploring and defining the three structural positions as a potential recommendation strategy, including details on data collection, data processing, and analysis methods.

3.1. Data Source, Cleaning, and Preprocessing

We used the official Twitter API to collect followers of to-be-merged universities and their recent tweets (See Figure 1, step 1). The raw Twitter data were stored in a MongoDB database (https://www.mongodb.com/), whose flexible document model allows development without a predefined data schema. The system is implemented in Python, and we use the PyMongo package (https://pypi.org/project/pymongo/) to communicate with the MongoDB database. We collected tweets and followership data using a separate module for each task: “GetTweets” and “GetFollowers,” respectively. The preprocessing phase takes care of the tweet text cleaning task (See Figure 1, step 2) in three stages: (1) converting letters to lowercase, (2) removing the English stop words, and (3) removing the nonletter characters and URLs. Since we aimed to generate person-to-person recommendations, preprocessing also consisted of manually filtering out the organizational accounts (e.g., Twitter profiles of local companies). Next, we generated models of target users (See Figure 1, step 3) by collecting the statistical information regarding followers of universities’ accounts, including the total number of a user’s tweets, the number of languages in use, and the number of tweets in each language. An index is created for each user in the database. The text corpus from all the cleaned tweets of a user is collected as the corpus profile (“corp_profile”).
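The three cleaning stages can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list is a tiny hypothetical stand-in, the function names are ours, and URLs are removed before the other nonletter characters so that fragments of a link do not survive as words. Stop words are dropped last, after punctuation has been stripped, so that tokens such as "the," are still recognized.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "on", "our", "the", "to"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs and nonletter characters, drop stop words."""
    text = text.lower()                         # stage 1: lowercase
    text = URL_PATTERN.sub(" ", text)           # stage 3a: remove URLs first
    text = NON_LETTER_PATTERN.sub(" ", text)    # stage 3b: remove digits, punctuation, symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # stage 2: stop words
    return " ".join(tokens)


def build_corpus_profile(tweets):
    """Join a user's cleaned tweets into one text, analogous to "corp_profile"."""
    cleaned = (clean_tweet(t) for t in tweets)
    return " ".join(c for c in cleaned if c)
```

For example, `clean_tweet("Check OUR new paper at https://example.org/p1 #OpenScience!!")` yields `"check new paper openscience"`.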

We selected users who follow at least one of the selected four university-related Twitter accounts in Tampere, Finland. For each follower, we collected their 500 most recent tweets. The collected dataset consisted of 12,809 distinct followers and 3,523,397 tweets. The content analysis could only be done within one language corpus because of a lack of analysis procedures supporting the local language. Therefore, we excluded Twitter profiles that contain fewer than 50 English tweets (out of the most recent 500) from the analysis and the pool of potential recommendations. It is noteworthy that English is actively and proficiently used by most of the users in the dataset. Tweet language distribution is as follows: 58.70% are Finnish, 31.06% are English, and 10.24% are other languages. As a result, the final dataset comprises 4,474 users and 933,785 English tweets.

3.2. Social Network and Content Analyses

The data analysis (See Figure 1, step 4) comprises building mention-based and egocentric followership networks for detecting structural network positions and content analysis to obtain cosine distances for measuring the content similarity. To identify structural network positions (See Figure 1, step 5), we utilized the NetworkX Python package (https://networkx.github.io). NetworkX supports modeling, manipulating, and analyzing the structure of networks by specifying nodes and the edges between them. In our use case, the nodes represent distinct actors in the Twitter network, and the edges illustrate the relationships between them. We created a directed mention network from the collected dataset and used it to identify the MoM and Com groups. The Mentions-of-Mentions group follows the triadic closure principle: if user A directly mentions users B and C, then C would be recommended to B and vice versa. For the Community membership, we applied the Louvain Modularity algorithm [63] directly to the mention-based network to detect groups of strongly related users without an explicit connection. The followership network is populated directly from the Twitter API; an edge connecting two nodes in it represents an existing followership link between two users on the platform. We identify the Dorm structural position by utilizing both the mention and the followership networks.
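As a rough illustration of the two rule-based positions, the sketch below uses plain dictionaries of sets instead of NetworkX graphs to stay dependency-free. The function names and data layout are hypothetical. Two readings are assumptions on our part: MoM pairs are users mentioned by the same account, following the rule stated above, and a tie counts as dormant when no mention exists in either direction between a user and a followee.

```python
from collections import defaultdict
from itertools import combinations


def mentions_of_mentions(mentions):
    """Pair up users who are mentioned by the same account.

    `mentions` maps each user to the set of users they mention.
    Returns a dict: user -> set of MoM candidates.
    """
    candidates = defaultdict(set)
    for mentioner, mentioned in mentions.items():
        others = sorted(m for m in mentioned if m != mentioner)
        for b, c in combinations(others, 2):
            candidates[b].add(c)  # C is recommended to B ...
            candidates[c].add(b)  # ... and vice versa
    return dict(candidates)


def dormant_ties(followees, mentions):
    """Followees with whom a user shares no mention in either direction (assumed reading)."""
    dormant = {}
    for user, followed in followees.items():
        # Users this user mentions, plus users who mention this user.
        interacted = set(mentions.get(user, set()))
        interacted |= {other for other, targets in mentions.items() if user in targets}
        dormant[user] = set(followed) - interacted - {user}
    return dormant
```

Community detection is not reproduced here. Recent NetworkX releases ship a Louvain implementation (`louvain_communities`), which could play the role described in the text, although the exact call and parameters the authors used are not stated.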

For the experiment stage, we required users to have at least three potential recommendations for each structural network position and applied the filter accordingly (See Figure 1, step 6).

We run the content-based similarity analysis utilizing the unsupervised topic modeling technique Latent Dirichlet Allocation (LDA) [64]. Each topic in the LDA model is constructed with a multinomial probability distribution of words. Given a document, in our case, a user’s corpus profile, the LDA model can calculate the probabilities of being in each topic of the document. Thus, a vector of LDA topic representation for each document can be generated, where a given number of topics Z within a corpus comprises documents t. In our case, one document consists of a set of tweets per user. LDA defines each topic over a set of n-grams w. Accordingly, the process can be formulated as follows:
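In the standard LDA formulation, which is consistent with the symbols defined above (topics z in Z, a document t, and an n-gram w), the probability of an n-gram in a document is a mixture over topics, and the topic vector of a document collects its topic proportions. The symbol \theta_t for that vector is our notation, introduced here for illustration only.

```latex
P(w \mid t) = \sum_{z \in Z} P(w \mid z)\, P(z \mid t),
\qquad
\theta_t = \bigl( P(z_1 \mid t), \dots, P(z_{|Z|} \mid t) \bigr)
```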

Next, we use cosine distance to measure the similarity between two users, each of whom is represented by an LDA topic vector. Cosine distance is a dissimilarity measure between two nonzero vectors: the lower the value, the higher the similarity. Given two vectors u and v, the cosine distance between them is calculated as follows:
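The conventional definition of cosine distance, one minus the cosine similarity, matches the statement that lower values indicate higher similarity. For two n-dimensional topic vectors u and v, where the index i and dimension n are our notation, it reads:

```latex
d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
             = 1 - \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\, \sqrt{\sum_{i=1}^{n} v_i^2}}
```

Because LDA topic vectors have nonnegative entries, the distance between two users lies between 0 (identical topic proportions) and 1 (no shared topics).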

Therefore, content analysis (LDA + Cosine distance) allows sorting the recommendations within each group of structural positions from the most to least similar in relation to the target user (See Figure 1, step 7). Next, for each follower of the target university accounts, the top three content-wise similar users were identified within each structural position group (See Figure 1, step 8). Thus, all the given recommendations for each eligible participant have a maximum degree of similarity but belong to different structural positions. Finally, three sets of recommendations are generated for each identified eligible participant (See Figure 1, step 8). Each set consists of four recommendations: one from each of the studied structural network positions and the baseline group (Rest).
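The ranking and set-building steps can be sketched as follows. All names are hypothetical, and the topic vectors are assumed to be plain lists of floats keyed by user id. How the top three candidates of each group are distributed across the three sets is not stated in the text; the sketch assumes that set i takes the i-th most similar candidate from every group.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two nonzero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def top_similar(target, candidates, topic_vectors, k=3):
    """Return the k candidates closest to `target`, most similar first."""
    ranked = sorted(
        candidates,
        # Ties in distance are broken by candidate id to keep the order deterministic.
        key=lambda c: (cosine_distance(topic_vectors[target], topic_vectors[c]), c),
    )
    return ranked[:k]


def recommendation_sets(target, groups, topic_vectors, k=3):
    """Build up to k sets, each holding one candidate per group.

    `groups` maps a group label (e.g. "Dorm", "MoM", "Com", "Rest") to candidate ids.
    Assumption: set i contains the i-th most similar candidate from every group.
    """
    tops = {g: top_similar(target, cands, topic_vectors, k) for g, cands in groups.items()}
    n_sets = min((len(t) for t in tops.values()), default=0)
    return [{g: tops[g][i] for g in groups} for i in range(n_sets)]
```

A user with fewer than three candidates in some group receives fewer complete sets here, which mirrors the eligibility requirement of at least three candidates per structural position.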

4. Experiment Design

The objective comparison of the number of possible recommendations from each group of structural positions per participant demonstrated an insignificant overlap (see Table 2). The numbers vary between users, depending on the number of followees and activity on Twitter (see examples in Figure 2). We aimed to provide a minimum of four and a maximum of twelve recommendations for each participant (1–3 from each group), which introduced the requirement for eligible respondents to have at least three other Twitter users in each structural position. Of the 4,474 users, 574 met this requirement. Evaluating one set (4 recommendations) was mandatory for each participant, and the other two sets of four recommendations were voluntary. By providing up to three sets of recommendations for each participant, we wanted to achieve a higher number of evaluations for each structural network position. Some respondents evaluated all three sets of recommendations (12 in total), some only one or two sets (4 or 8 in total). This procedure yielded subjective evaluations of 72 recommendations from each structural network position, enabling statistical comparison.

4.1. Procedure

The evaluation of the proposed structural network positions was carried out with two online surveys deployed on Google Forms: (1) a background questionnaire querying about demographics and the participation consent; (2) a survey with a personalized list of four other Twitter users (See Figure 3) and a set of questions to evaluate the recommendation. We chose to use Google Forms because it allows scripting-based automation to generate personalized surveys with tailored lists of recommendations.

The respondents were given no information about the types of structural network positions or why these individuals were particularly recommended to them. The order of recommendations was randomized within each set. The evaluation survey measured several subjective constructs, including perceived familiarity and perceived relevance of the recommendations and one’s willingness to follow up on them. No existing subjective measurements for perceived relevance could be found in the literature, particularly for people recommendations. Therefore, the statements were operationalized based on the authors’ personal experiences and insights on academic collaboration and user experience evaluation. Initially, over 20 candidate items were iteratively assessed and refined within the project team and close colleagues, resulting in 13 items used in the survey. Perceived familiarity was measured using a 5-point Likert scale: Very unfamiliar (1)–Very familiar (5). All other items were measured using a 7-point Likert scale: Strongly Disagree (1)–Strongly Agree (7). In addition, open-ended questions inquired about the overall impressions about the recommended person and the respondent’s reasoning behind the evaluations.

4.2. Recruitment and Respondents

We subscribed to the importance of research integrity and followed the policies provided by the National Ethical Committee in Finland. Accordingly, a study does not require an ethical review if it includes informed consent and does not involve any of the following: underage subjects, exposure to strong stimuli, potential long-term mental distress, or intervention with the physical integrity of participants. The study was identified as low-risk and, hence, did not require an ethical review. The participants were provided with a consent form that included a link to a detailed ethics disclaimer explaining the integrity and data management principles.

We invited eligible participants over e-mail. The targeted participants’ contact information was publicly available on their Twitter profiles, and anyone could access it. The invitation consisted of a short description of the study, including links to the Background and Consent survey and a detailed ethics disclaimer. As an incentive for participation, we organized a raffle of Amazon vouchers. Out of the 574 eligible respondents, 68 signed up, and 39 participated in the experiment. They evaluated 288 recommendations in total, that is, 72 recommendations for each structural network position. The sample includes 19 male and 19 female Twitter users (one unspecified), all but one being Finnish and residents of Finland. Ages range from 24 to 63, with an average of 43.9 and a median of 45. The majority (N = 29) of respondents are full-time workers, mainly university researchers. However, many other knowledge work professions are included, such as lecturers, entrepreneurs, community managers, coordinators, and project managers. The number of followers per respondent varies between 150 and 65,300, with an average of 2,994 and a median of 762. The number of followees varies from 311 to 65,200, with an average of 3,289 and a median of 1,254. This indicates that the respondents, on average, are active Twitter users with an extensive number of connections.

Along with the background information, we queried the respondents’ typical behavior and attitudes to professional social networking with 7-point Likert statements (See Figure 4). On average, the respondents frequently network with other people, maintain their networks, and are typically careful in choosing with whom to network. From a professional perspective, their occupation typically requires intensive collaboration. Being successful in their work depends on established social ties, and they use Twitter to support their professional networks. This implies that social networking is an essential element in their professional lives. The majority also indicated that they mostly interact with like-minded people at work.

4.3. Survey Data Analysis

The collected responses were imported into SPSS for statistical analysis, addressing two objectives. First, we tested whether the proposed structural positions are perceived to be different from each other and could thus serve as alternative analytical mechanisms. The evaluation has a thrice-repeated within-subjects design with four categorical data points per respondent. As the collected data are ordinal, we utilized a nonparametric Friedman test with Bonferroni-corrected pairwise comparisons to measure statistical differences. The input data for the Friedman test were in a rank format, where rank represents the frequency of each Likert scale value per evaluation statement. The second objective was to identify correlations among experimental variables. We were particularly interested in revealing whether perceived familiarity or attitude toward social and professional networking correlates with perceived relevance and willingness to follow up on the recommendation. We utilized a nonparametric bivariate Spearman correlation test as the collected data are ordinal (Likert scale).
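The analysis itself was run in SPSS. Purely to illustrate what the Spearman test computes on Likert data, the following sketch ranks both variables, assigning tied values the mean of their rank positions, and then takes the Pearson correlation of the ranks. The function names are ours, and no significance test is included.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least two observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

Likert responses produce many ties, which is why the sketch uses average ranks rather than the shortcut formula based on squared rank differences, which is exact only when there are no ties.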

5. Findings

We first provide results on objective measurements of similarity and respondents’ perceived familiarity with recommended Twitter users, followed by subjective perceptions regarding the relevance of the recommendations. Next, we describe the respondents’ readiness to engage in follow-up activities with the recommended people. Finally, we report bivariate correlation tests on associations between various variables, which provide additional insights and future research directions. In all the subsections, we first report statistical results and continue to present related qualitative findings.

5.1. Objective Measures of Similarity

The respondents were unaware of the cosine distances (similarity measures) to avoid biased evaluations of recommendations based on the structural network positions. We aimed to pick recommendations with cosine distances as equal as possible (i.e., smallest possible variance in terms of content similarity). However, the measures are personal for each respondent and depend on the size of the recommendation pool. Since the Com and Rest recommendation pools are larger, there is a higher probability of having potential recommendations with a smaller cosine distance (See Figure 5(a)). The recommendation pools of Dorm and MoM are significantly smaller, and therefore, on average, the distance is higher.

The scatterplot of respondents’ scores given to recommendations over the cosine distance values demonstrates a somewhat random distribution, indicating no dependencies between them (See Figure 5(b)). The correlation test reported in Section 5.5 further supports this observation. Therefore, we argue that the slight variance in content similarity does not prevent comparing different structural position groups.
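For readers unfamiliar with the measure, the sketch below shows how a cosine distance between two users could be computed from term counts. It assumes, purely for illustration, that each user is represented by a list of hashtags; the function name and the convention of returning the maximum distance for an empty profile are assumptions, not details taken from the study.

```python
import math
from collections import Counter


def cosine_distance(tags_a, tags_b):
    """Return 1 - cosine similarity of two hashtag-count vectors."""
    a, b = Counter(tags_a), Counter(tags_b)
    # Only hashtags used by both users contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumed convention: an empty profile is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical profiles: the candidate with the smallest distance is the closest.
ego = ["hci", "cscw", "hci"]
pool = {"u1": ["hci", "cscw"], "u2": ["sports"]}
closest = min(pool, key=lambda user: cosine_distance(ego, pool[user]))
```

A distance of 0 means the two users have proportionally identical hashtag usage, and 1 means they share no hashtags. A larger candidate pool simply offers more chances for a small distance to occur, which is consistent with the pool-size effect described above.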

5.2. Differences in Perceived Familiarity across the Structural Positions

The descriptive statistics confirm that the most familiar recommendations belong to the Dorm group (See Figure 6). In total, 44% of the recommendations from the Dorm structural network position fall into the category of either familiar or very familiar, 18% are somewhat familiar, and the remaining 38% are either unfamiliar or very unfamiliar. The other three structural groups primarily consist of unfamiliar people, with few outliers. In the MoM group, 71% of the recommended Twitter users were regarded as unfamiliar, 14% were considered familiar, and 15% somewhat familiar. The recommendations from the Com and Rest groups have similar proportions of unfamiliar people: 94% and 91%, respectively. The Friedman test indicated a statistically significant variation in respondents’ ratings of perceived familiarity across different structural positions. The pairwise comparison identified substantial differences in the evaluations of the Dorm group versus other groups.

As expected, in the open-ended questions, many respondents stated that they already follow many of the recommended Twitter profiles that belong to the Dorm group. Although the respondents might be aware of the recommended person, the analysis of the social network structures revealed a lack of explicit interactions between them. The feedback in open-ended questions also supported the cases of users being unfamiliar with their followees. A relatively large number of unfamiliar followees in the Dorm group could imply that followership is indeed a weak indicator of actual social relationship and familiarity. As the act of following is typically a low-cost action, it might even be hard for users to keep track of and maintain their connections, especially when the number exceeds a thousand:

(Dorm) “This person has very versatile tweets and retweets. […] I already follow her, but I did not remember that.” (R25, Staff Scientist; 1,139 followers, 1,472 followees)

While the Twitter user interface allows seeing followership relationships, it is more challenging for users to discover connections based on mentions or especially mentions-of-mentions. Only a few respondents recognized that they have a bridging tie with the recommendations from the MoM group. For instance, one noticed that they have a shared professional connection with the recommended person:

(MoM) “The person and her tweets are really interesting for me. She is perhaps the only one of the groups I find likely to contact and discuss future research collaboration. […] Profile appears approachable, and she has been apparently already collaborating with some people I know.” (R1, Principal Research Scientist; 2,566 followers, 3,120 followees)

Being unaware of different structural network positions, the respondents were positively surprised by receiving many unfamiliar and diverse recommendations. In what follows, the findings demonstrate that being familiar with a person seems to increase the perception of relevance and willingness to follow-up on recommendations.

5.3. Differences in Perceived Relevance across the Structural Positions

The respondents’ subjective perceptions of recommendations provided additional confirmation of the distinct nature of the three proposed structural positions. There is an apparent prevalence of positive attitudes toward recommendations from the Dorm and MoM groups in the evaluations of both content relevance and professional relevance (See Figure 7). The Dorm group is perceived as the most favorable, while in the evaluations of content relevance, opinions regarding recommendations from the Com and Rest groups are split in half. Regarding the evaluation of professional relevance, the proportions of negative scores prevail.

“Sometimes, I am unsure whether I am answering as a professional me or a private me. For instance, when it comes to one user profile, where a private me starts to think that it is interesting due to tweets about kids, and I have kids myself.” (R5, Project Researcher; 1,206 followers, 944 followees)

The pairwise comparison further demonstrated statistically significant differences mainly between Dorm and Com, and Dorm and Rest. Interestingly, the difference between MoM and Com is strongly significant only in evaluating whether topics of the recommended person are of interest to the participant (Statement 1 in Figure 7).

As for the qualitative feedback, when rationalizing the relevance of recommendations, the respondents often address the importance of having similar topics of interest. There is a clear positive tone in the qualitative feedback regarding recommendations from the Dorm and MoM groups. As addressed earlier, familiarity plays a significant role in evaluating relevance, and respondents often start their rationalization by explicating an existing connection with the person, if there is any. In the following example, the followership relationship between the respondent and the recommended person started after a face-to-face encounter at a conference:

(Dorm) “Lively, energetic, knows a lot about a host of topics, loves traveling. She is somebody I met at a conference a couple of years ago, and we have been in touch on social media as well.” (R3, Senior Lecturer; 295 followers, 584 followees)

In the next example, the respondent highlights that the recommended person is unfamiliar yet addresses the relevance of topics and the benefit of making a professional connection with a person from another university:

(MoM) “Seems active, topics relevant to me. I did not know him probably because he is in a “distant” university; it is always good to know new people from other universities.” (R13, University Researcher; 953 followers, 1,010 followees)

When evaluating recommendations from Com and Rest groups, the respondents seem to consider a variety of dimensions. For instance, the activeness of users in publishing tweets and their self-representation also play an important role in choosing whom to follow or with whom to interact:

(Com) “Tweets very seldom; the topic is somewhat interesting, but not enough for me to follow.” (R39, HR Director; 721 followers, 1,001 followees)

(Rest) “Publisher! Seems interesting at first, but as I scrolled down, it seemed a bit too professional for my taste. The tweets in English somehow made me lose my interest.” (R12, Researcher; 448 followers, 1,260 followees)

None of the respondents recognized a social connection with recommendations from the Com group, yet one respondent noticed the size of the community they share with the recommended person:

(Com) “An interesting personality. I was able to see that we have something in common only after a closer look at the profile. Good Tweets and Retweets. […] Also, the fact that we have 88 shared followers creates trust […].” (R6, Project Manager; 550 followers, 763 followees)

To sum up, the data imply that the perceptions of relevance vary across different structural positions, thus supporting the objectively observed differences between the proposed network-based matching mechanisms. Yet, relevance seems to be a weak motivator for respondents to follow-up and interact with recommended Twitter users, as discussed next.

5.4. Willingness to Follow-Up on Recommendations across the Structural Positions

The assessments of follow-up activities (See Figure 8) illustrate the weakest difference between structural positions, meaning that respondents are less open toward social interactions regardless of the level of recommendation relevance. Passively exploring tweets (statement 7) of recommended people from the Dorm and MoM groups is positively perceived, while other activities brought up primarily negative attitudes. The Friedman test demonstrated statistically significant differences in the evaluation of recommendations from three structural positions within all variables. The pairwise comparisons reveal apparent differences between the Dorm and Com, Dorm and Rest, and MoM and Rest structural network positions regarding the intention to continue reading the recommended person’s tweets (statement 7). There is a statistically significant difference between the Dorm and Com, as well as the Dorm and Rest groups regarding the attitude toward mentioning recommended people. Other activities do not show strongly significant differences.

The respondents also addressed challenges in estimating the relevance of received recommendations and their intention for follow-up activities in the open-ended responses. For instance, one respondent (R5, quoted in Section 5.3) mentioned that decision-making on the interestingness of a recommendation might be affected by the overall sympathy toward a Twitter user, making it challenging to draw a line between personal and professional interests.

According to one respondent’s reasoning, as Twitter is designed mainly for distributing knowledge, it was challenging to envision social interactions with a recommended person beyond the features that the platform offers:

“Taking a look at a person’s Twitter account does not really tell me anything about what would happen if I would meet the person in real life. That is why I gave a neutral answer about the consequences of meeting with someone face-to-face. […] Real-time one-to-one conversation enables quick learning and multiple ways to dig up common interests. […] Twitter or other social media platforms do not provide the same opportunities; they give a very narrow view to a person, their expertise, views and what we could learn from each other.” (R39, HR Director; 721 followers, 1,001 followees)

Even though a few respondents were somewhat positive about recommendations from the baseline group (Rest), being unfamiliar with the recommended person seemed to play a role in determining the willingness to follow-up:

“Quite general Twitter profile. Professional but also other content as well, such as news. I liked how she tweeted about the academia/academic world, although we do not work in the same field of study. I would follow her if I knew her somehow other than through Twitter. She seems nice and relatively active.” (R12, Researcher; 448 followers, 1,260 followees)

In summary, while respondents expressed the readiness to engage in low-cost interactions, such as exploring the recommended profiles or starting to follow them, they were hesitant to consider initiating interaction beyond the Twitter platform so soon after seeing the recommendation. This is particularly the case if the actions require face-to-face interactions or direct contact. This is understandable given the short time frame for exploring the costs and benefits of potential social interaction and the limited view of the recommended person’s profile. Besides, as one of the respondents admitted, Twitter is perceived as a platform for passive social behavior to broadcast and consume content. The users are accustomed to Twitter not providing features to extend the interactions to other channels.

5.5. Correlations across the Evaluation Variables

In addition to looking at how the evaluations differ across predefined structural positions, we explored various statistical associations between the variables to identify future research questions. The Spearman test revealed several statistically significant positive correlations (see Table 3). In particular, perceived familiarity positively correlates with all the other variables on subjective evaluations, especially with the perception of professional interest (statement 4) and follow-up activities such as an intention to mention the recommended Twitter user (statement 10). The respondents’ background variables and social networking attitudes, such as frequency of socialization and activeness of maintaining existing ties, demonstrate a relatively strong correlation, particularly with the willingness to follow-up on recommendations (statements 8–12). In addition, the test indicates that the scores of high-cost follow-up activities (statements 10–12) increase with higher ratings of Twitter use for professional networking. The test results also imply no dependencies between objective measures of similarity and subjective perceptions of recommendations, which was the intended outcome. Overall, while correlation tests do not establish causal relations between the variables, the test hints at interesting statistical associations that should be investigated in more detail, for instance, to cover not only correlations between the recommendation evaluations but also personality and attitude-related aspects.
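As an illustration of the statistic behind these results, the following sketch computes Spearman’s rank correlation as the Pearson correlation of average ranks, which handles the ties that are common in Likert data. It uses only the Python standard library, the input values are hypothetical, and the function names are illustrative; it does not reproduce the SPSS output or the significance tests reported in Table 3.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 7-point ratings: perceived familiarity vs. intention to mention.
familiarity = [1, 2, 2, 6, 7]
mention_intent = [1, 1, 3, 5, 6]
rho = spearman_rho(familiarity, mention_intent)
```

A value of rho near +1 indicates that higher familiarity ratings tend to go together with higher follow-up ratings, which is the direction of association described above; as noted, such a coefficient says nothing about causality.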

6. Discussion

Diversity has been a central concept in the design of information systems and social technologies, particularly in CSCW and HCI research, which has explored its different forms (e.g., cognitive, physiological, demographic) for more inclusive and accessible technologies. This paper extended the discourse around diversity by focusing on the importance of structural diversity in social networks. We proposed identifying structural network positions in a multidimensional space of social networks, allowing the exposure of Twitter users to a variety of potential connections that they would otherwise likely miss.

To answer our research question, the findings illustrate that recommendations are perceived differently. Thus, both the objective measurements and subjective perceptions indicate the distinct nature of the proposed three structural positions. The respondents’ relatively positive evaluations of relevance suggest that the proposed recommendation strategy is a meaningful approach for diversifying people recommendations.

Furthermore, the fact that the respondents could identify relevant others from all groups implies various internal and external factors that might influence the subjective perceptions. It provides evidence that identifying recommendations within a latent community of interest (academic institutions at a specific locality in this case) is a promising approach to boundary specification.

6.1. Contributions and Reflections

Our findings contribute to the research on diversity exposure of people recommendations by defining structural positions in egocentric networks and analytical procedures for identifying related recommendations from Twitter data. The following pinpoints the possible ways in which our strategy and the different structural positions might contribute to the diversification agenda:

(i) A recommendation from the dormant ties group can remind users about their existing connections that possess expertise or perspectives that they need at the moment;

(ii) Mention-of-mention recommendations can motivate interaction with new people in a trustworthy manner as there are bridging actors in-between;

(iii) Recommendations based on the community membership structural position can motivate the user to enter a latent community in a different area of the overall network, albeit one with shared topics of interest.

Thus, the proposed structural positions can help introduce diversity exposure in different ways, prospectively suggesting connections that Twitter users would otherwise overlook.

At the same time, the findings indicate a significant role of familiarity in the subjective evaluation of recommendations. As expected, most recommended Twitter users were unfamiliar to the respondents. However, there were some familiar people, even in the “community membership” and the “rest” groups. This could be explained by the empirical context of the experiment, where boundary specification was based on the followership of the selected institutional Twitter accounts bound to a locality. We assume that this would have a strengthening effect on perceived familiarity. Research in social psychology has also revealed that homophily bias increases the perception of familiarity [65]: the stronger the perceived similarity, the more preferable and familiar the person would seem to be.

A relatively large number of unfamiliar followees in the Dorm group could imply that followership indeed is a weak indicator of actual social relationships. Thus, even though there is an established followership link, it is worthwhile to remind users about people belonging to the Dorm structural network position, which could result in more explicit social interaction. The presence of a relatively high number of outliers in the familiarity evaluation can also be explained by some respondents having a large number of followers and followees, all of whom they practically cannot remember. It has been shown that cognitive and temporal limitations prevent people from maintaining the number and quality of their relationships [66, 67]. Besides, interactions on social media might create a false sense of connection [68] and do not match with offline relationships or predict the degree of familiarity between the actors.

In addition, this study contributes to user-centric evaluation methods in the context of people recommender systems. Prior research on evaluating recommender systems is largely built on the assumption that the more accurate the algorithms, the better the user experience [15]. While this approach is useful in evaluating item recommendations (e.g., products or multimedia content), recommending people involves a different notion of recommendation quality. The operationalized measures of subjective perceptions presented in this work can be utilized in future research on evaluating the relevance and familiarity of social recommendations. However, measuring the potential follow-up activities beyond the intention to start following the recommended person worked poorly: the findings illustrate that respondents are generally not interested in speculating about high-cost follow-up actions (e.g., face-to-face meetings). Thus, it is questionable to utilize such a measure as an indicator of recommendation quality, at least in the context of controlled experiments. That said, we acknowledge an inherent challenge in measuring the relevance of people recommendations. As the benefits of more heterogeneous social networks only surface over time, measuring the immediate impression about the relevance of a recommendation will likely not reflect its long-term value as a connection.

6.2. Practical Implications

Existing recommendation mechanisms shape the choices people make, influencing not only the diversity of interests and opinions but also social structures [69]. Mindful of the threat that such a high agency could strengthen structural issues like polarization and echo chambers, we believe that the recommendation strategy proposed in this article is worth pursuing in practical applications. Notably, the proposed structural network positions could contribute to systems design that can lead to more diverse exposure in individuals’ social networks. The analysis procedures could be transferred to many other social media platforms; after all, the proposed content analysis is not limited to hashtags, and the notions of followership and mentions are common in other services as well.

When applying the proposed strategy and the structural network positions in a real-life people recommender system, the restrictions on eligibility criteria for the users, driven by the experiment setup, can naturally be disregarded. There might be scenarios when Twitter users do not have enough recommendations from the Dorm structural position. However, our findings demonstrated that the MoM and Com groups result in numerous options even for the users with a small number of followers and followees. The effectiveness of the proposed strategy can be strengthened by increasing transparency regarding the recommendation logic in the actual system and explicating the potential value of recommendations from each group. This, in turn, might also improve the willingness to follow-up on them.

6.3. Limitations and Future Work

Experimental Setup and Generalizability. The conducted experiment naturally comes with limitations that can affect the validity and reliability of the findings. First, although the presented diversity-enhancing recommendation strategy seems promising, we could not yet compare it with other strategies in this pioneering study. Thus, the assessment of the quality of the recommendation strategy remains quite preliminary based on this experiment. In future work, a comparison with conventional recommendation algorithms would show how the strategy performs in relation to the currently used standards.

In addition, we only tried three calibrations within the proposed strategy and with a sample size limited by practicality and data availability. Regarding generalizability, the respondents mostly represent the same geographical area and cultural background due to the selected focus of introducing users who feel some affinity to the to-be-merged universities. The sample of participants for the experiment might also be considered biased, as the number of respondents’ followees and followers is higher compared to an average Twitter user. Large-scale studies and comparisons against different baseline recommendations are required to prove the effectiveness of the overall matching strategy and the structural positions as recommendation mechanisms. Nevertheless, as the paper lays the groundwork for a new diversification strategy, it is essential to show that such an alternative strategy is sensible from the users’ viewpoint and technically feasible before comparing it with others. We call for follow-up research to also compare the effectiveness of current and other alternative algorithmic approaches.

Network Analysis. Our recommendation approach utilizes social network analysis, which has been critiqued due to several issues [70]. First, modeling networks is limited in terms of deriving personal roles and interpersonal experiences. It is an oversimplification to reduce multi-faceted and dynamic social relationships into network structures with simple node and edge features. Second, network-based analysis can hardly reveal a full and truthful picture of real-world relationships. Nevertheless, our study demonstrates that it is possible to analyze Twitter social networks in new ways that can advance the social matching of individuals.

Content Analysis. Due to the lack of accurate content analysis procedures for the local language, the content analysis was limited to English tweets. In addition, we did not distinguish between personally created tweets and retweeted tweets. These factors might have decreased the accuracy of representing an individual’s topics of interest, which, in turn, could affect the subjective perceptions of the recommendations. This study also does not consider the segmentation of the participants according to the types of Twitter use or personality traits [71]. However, the results of correlation tests imply that such factors can be relevant. This opens avenues for investigating various personality-related and other background variables that might affect the perceived relevance of recommendations and readiness to follow-up on them.

7. Conclusion

Despite the extensive prior research on people recommendations on Twitter, the social network structure perspective has been generally underutilized. To address this gap, we proposed a new recommendation strategy for a Twitter-based implicit community of interest, prospectively producing more heterogeneous people recommendations and positively diversifying the users’ social networks. The novelty of the proposed strategy lies in combining mention-based and followership-based networks to identify different types of structural network positions: dormant (followership with no explicit interactions), mention-of-mention (a friend-of-friend connection in the mention network), and community-based (users belong to a shared community with no explicit followership and interactions). The findings illustrate that the proposed structural positions are indeed distinct from each other and that the respondents could find relevant users from all groups. However, the willingness to follow-up on recommendations is relatively low and primarily driven by perceived familiarity. We call for more design-oriented research to identify solutions that could increase the probability of follow-up actions. We conclude that a more comprehensive analysis of social networks and more human-centric methods to evaluate recommendations are necessary to improve the benefits and effectiveness of people recommendations on Twitter and other social networking services.
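The three definitions summarized above can be sketched as a simple classification rule. The snippet below is only an illustration under several assumptions that are not taken from the study: mentions are modeled as directed sets per user, community labels are assumed to be precomputed by some community detection step, the order in which the rules are checked is a design choice, and all names and data are hypothetical.

```python
def classify_candidate(ego, candidate, followees, mentions, community):
    """Assign a candidate to Dorm, MoM, Com, or Rest relative to the ego user.

    followees: dict mapping a user to the set of accounts that user follows
    mentions:  dict mapping a user to the set of accounts that user has mentioned
    community: dict mapping a user to a precomputed community label
    Returns None when the candidate is the ego or is already mentioned by the ego.
    """
    ego_mentions = mentions.get(ego, set())
    if candidate == ego or candidate in ego_mentions:
        return None  # an explicit interaction exists, so there is nothing to recommend
    if candidate in followees.get(ego, set()):
        return "Dorm"  # followed by the ego, but never mentioned
    if any(candidate in mentions.get(friend, set()) for friend in ego_mentions):
        return "MoM"  # mentioned by someone the ego has mentioned
    ego_community = community.get(ego)
    if ego_community is not None and community.get(candidate) == ego_community:
        return "Com"  # same community, no followership or mention link
    return "Rest"  # baseline group


# Hypothetical toy network.
followees = {"ego": {"alice", "bob"}}
mentions = {"ego": {"bob"}, "bob": {"carol", "ego"}}
community = {"ego": 1, "dave": 1, "erin": 2}
labels = {
    user: classify_candidate("ego", user, followees, mentions, community)
    for user in ["alice", "bob", "carol", "dave", "erin"]
}
```

In this toy network, alice is a dormant tie, carol is reached through bob as a mention-of-mention, dave shares the ego's community, erin falls into the baseline group, and bob is excluded because the ego already interacts with him.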

Data Availability

The data generated or analyzed during this study are included in this article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was conducted under Business Finland Project Big Match (3166/31/2017 and 3074/31/2017).