#### Abstract

Grouping based on social relationships is a complex problem because the social relationships within a group usually form a complicated network. To solve the problem, a novel approach that combines sociometry and a genetic algorithm (CSGA) is presented. A new nonlinear relation model derived from sociometry is established to measure the social relationships, which then serve as the basis for a genetic algorithm (GA) program that optimizes the grouping. To evaluate the effectiveness of the proposed approach, three real datasets collected from a famous college in Taiwan were used. Experimental results show that CSGA optimizes the grouping effectively and efficiently, and that students are very satisfied with the grouping results, find the proposed approach interesting, and show a high intention to use it again. In addition, a paired sample *t*-test shows that overall satisfaction with the proposed CSGA approach is significantly higher than with the random method.

#### 1. Introduction

Grouping optimization based on social relationships has attracted increasing attention as social networks [1–11] have grown rapidly in recent years. A social network is a special structure made of individuals (or organizations). It includes the ways in which individuals are connected through various social familiarities [1]. Through analysis of the relational network and measurement of social relationships, a grouping can be optimized to achieve a group objective.

In dealing with the grouping optimization problem concerning social relationships, several important issues should be addressed. First, there is a deep discrepancy between the official and the secret behaviour of members [12]. A good approach to grouping optimization should be able to uncover the hidden information behind a group. Relations can be conceptualized at three levels of social complexity: individual, dyad, and group [13]. The approach should be able to disclose the hidden information at these levels. In addition, the approach should be cost-effective. Consequently, a simple method that can reveal what individuals really think is needed. Second, most social networks such as Facebook employ a linear model to measure the links between group members. They use the number of connections that a node has to evaluate degree, betweenness, closeness, and network centralization [6]. However, a nonlinear relation model should be used since real relationships tend to be nonlinear. For example, suppose that two nodes have the same number of connections. The connections of the first node are all its first choices, while those of the second node are, say, its second and third choices. A simple count of connections treats the two nodes alike, although their relationships differ in strength. Likewise, the disappointment in being grouped with a second-choice partner rather than a first is unlikely to be linearly related to that of being grouped with a fourth-choice partner rather than a third. A nonlinear model should be developed to suitably describe the relative importance of relationships. Third, the approach should be able to help organizations improve their performance. Last but not least, many grouping optimization problems subject to constraints are known to be NP-hard [14–16]. In addition, the structure of social relations usually forms a complex network, which makes the problem very difficult to deal with. The grouping optimization becomes even more complicated when constraints require each group to include members with different attributes, such as ability, specialty, gender, and position.

To tackle the above-mentioned problem, a new approach that combines sociometry [17–21] and a genetic algorithm (GA) [22–28] is employed. For convenience, we call this approach CSGA. To measure the social relationships between members and to explore the hidden information behind a group, a choice-making method [29] is employed. Group members are first asked to indicate their choices (or preferences) for other members using an index system in which one stands for the first choice, two for the second choice, and so on until an allowed maximum number of choices is reached. Based on the choices, a sociogram can be drawn by using a sociometry tool [17–21], as illustrated in Figure 1. Previous studies showed that sociometric choices tend to predict performance criteria such as productivity, combat effectiveness, training ability, and leadership [30]. Moreover, some indices of social status can be calculated. In this paper, a new nonlinear model for measuring social status is presented. From the sociogram and social status indices, socially vulnerable members and potential bullies can be identified. The grouping can then be optimized according to the social status of individuals and the objective of the organization. Results from this study show that the proposed approach is effective in dealing with the grouping problem described above. Furthermore, experiments based on real data show high satisfaction with the method and a high intention to use it again.

The remainder of this paper is structured as follows. The grouping problem is briefly introduced in Section 2. In Section 3, sociometry is briefly introduced and a new modified model for measuring relations between members is presented. The proposed approach is then presented in Section 4. Results and discussion are presented in Section 5. Finally, concluding remarks are drawn in Section 6.

#### 2. The Grouping Problem

The grouping problem belongs to a large family of problems that partition a set of items into a collection of mutually disjoint subsets of , such that and . In this study, the grouping is to optimally assign members (individuals) to subgroups with members in each subgroup. Let be a set of members, a set of subgroups, a set of partners of a member, a set of attributes (or positions), and a set of choices, where is the number of members in a group, is the total number of subgroups, is the number of members in each subgroup, and is the number of members with attribute . Members list the other members with whom they would like to work or beside whom they would like to sit, using an index system in which “1” stands for the first choice, “2” for the second choice, and so on, up to a preassigned maximum integer . Note that a member cannot select himself or herself as a partner. Slavin suggested that the best group size is from two to six [31]. For , , define as the preference given by member to being grouped with his/her partner and as the preference given by partner to being grouped with member . If partner is the first choice of member , then is equal to one; if the second choice, is equal to two; and so on. A lower preference coefficient means that a member has a stronger preference for his/her partner. If member does not include his/her partner in the list of preferences, then is assigned a relatively large penalty value . The mathematical formulation is hence expressed by the objective (2.1) subject to the constraints (2.2)–(2.4). The objective is to minimize the total scoring value, as shown in (2.1). Note that the scoring value is composed of two main parts: the priority weight of member and the scoring function . The priority weight can be assigned in two ways: directly by the manager of the organization or calculated from the social relationships. As for the scoring function , a popular choice is the squared function [29].
Equation (2.2) requires that each member is assigned to exactly one subgroup. Equation (2.3) ensures that each subgroup has the prescribed number of members. Equation (2.4) requires that each subgroup contains exactly the prescribed number of members with each attribute; this number depends on the attribute. For example, if a basketball team is composed of a center (attribute 1), two forwards (attribute 2), and two guards (attribute 3), then the required numbers for attributes 1, 2, and 3 are one, two, and two, respectively.
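The three constraints can be checked mechanically for any candidate grouping. The following is a minimal sketch under stated assumptions: the function name `is_feasible`, its arguments, and the ten-player roster are hypothetical and serve only to illustrate the basketball example; they are not taken from the GA program used in this study.

```python
from collections import Counter

def is_feasible(subgroups, members, size, quota, attribute_of):
    """Check a candidate grouping against the three constraints described above."""
    assigned = [m for group in subgroups for m in group]
    # Each member must appear in exactly one subgroup.
    if sorted(assigned) != sorted(members):
        return False
    for group in subgroups:
        # Each subgroup must have exactly `size` members.
        if len(group) != size:
            return False
        # Each subgroup must hold exactly quota[a] members with attribute a.
        counts = Counter(attribute_of[m] for m in group)
        if any(counts.get(a, 0) != k for a, k in quota.items()):
            return False
    return True

# Hypothetical roster: ten players forming two basketball teams.
attribute_of = {1: "center", 2: "forward", 3: "forward", 4: "guard", 5: "guard",
                6: "center", 7: "forward", 8: "forward", 9: "guard", 10: "guard"}
quota = {"center": 1, "forward": 2, "guard": 2}
players = list(range(1, 11))

good = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
bad = [[1, 6, 3, 4, 5], [2, 7, 8, 9, 10]]  # two centers on the first team
print(is_feasible(good, players, 5, quota, attribute_of))
print(is_feasible(bad, players, 5, quota, attribute_of))
```

A check of this kind is only needed when the search operators can produce infeasible groupings; a permutation encoding already guarantees the first two constraints.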

As the number of members in a group increases, the number of possible grouping outcomes, , grows exponentially. For example, for , when , when , and when . No polynomial-time algorithm is known for finding the optimal solution. In addition, the relationship structure becomes a complex network as increases. For optimization problems on a complex network, an approach like exhaustive search is impractical since it takes a huge amount of computation time. Instead, algorithms such as genetic algorithms [22–28] for finding approximate or near-optimal solutions, as well as strategies for improving efficiency, solution quality, and exactness, have been presented [32–35]. In this paper, a genetic algorithm [22–28], which has proven very effective in dealing with complex optimization problems, is employed to find feasible solutions.
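The growth of the search space can be made concrete with a short calculation. The sketch below assumes that subgroups are interchangeable (unlabeled), that all subgroups have the same size, and that attribute constraints are ignored; under these assumptions the count is the standard combinatorial result N!/((n!)^m m!), which is offered as an illustration and is not necessarily the expression used in this paper. The function name is hypothetical.

```python
from math import factorial

def count_groupings(total_members, subgroup_size):
    """Number of ways to split members into unlabeled subgroups of equal size."""
    if total_members % subgroup_size != 0:
        raise ValueError("total_members must be a multiple of subgroup_size")
    m = total_members // subgroup_size  # number of subgroups
    return factorial(total_members) // (factorial(subgroup_size) ** m * factorial(m))

# The count explodes quickly even for pairs (subgroup size two).
for total in (4, 6, 10, 20, 30):
    print(total, count_groupings(total, 2))
```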

#### 3. The Sociometric Analysis

Sociometry was developed by Moreno in 1934. Moreno defined sociometry as “the inquiry into the evolution and organization of groups and the position of individuals within them” [17]. It is a quantified method aiming at measuring and determining social relationships in groups [18]. Sociometric explorations disclose the hidden structures that give a group its form: the alliances, the subgroups, the hidden beliefs, the ideological agreements, the stars of the show, and so on. One of Moreno’s innovations in sociometry was the development of the sociogram, a systematic method for graphically representing individuals as points/nodes and the relationships between them as lines/arcs [19]. Given the choices or preferences within a group, a sociogram can be developed and the structure and patterns of group interactions can be drawn on the basis of many different criteria [19]: social relations, channels of influence, lines of communication, and so on. Some terms in sociometry are the following:

1. Stars: those who receive many choices.
2. Isolates: those who receive few or no choices.
3. Mutual choice (MC): individuals who choose each other.
4. One-way choice: individuals who choose someone but whose choice is not reciprocated.
5. Cliques: groups of three or more people within a larger group who all choose each other.
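These terms translate directly into simple operations on the collected choices. The following is a minimal sketch, assuming the choices are stored as a dictionary that maps each member to an ordered list of chosen members; the data and function names are hypothetical. What counts as "many" or "few" choices is left to the analyst, so the sketch only reports how often each member is chosen.

```python
def mutual_choices(choices):
    """Unordered pairs of members who choose each other."""
    pairs = set()
    for a, chosen in choices.items():
        for b in chosen:
            if a in choices.get(b, []):
                pairs.add(frozenset((a, b)))
    return pairs

def one_way_choices(choices):
    """Ordered pairs (a, b) where a chooses b but b does not choose a."""
    return {(a, b) for a, chosen in choices.items()
            for b in chosen if a not in choices.get(b, [])}

def times_chosen(choices):
    """How many times each member is chosen by others."""
    counts = {member: 0 for member in choices}
    for chosen in choices.values():
        for b in chosen:
            counts[b] = counts.get(b, 0) + 1
    return counts

# Hypothetical group of four members.
choices = {1: [2, 3], 2: [1, 3], 3: [1], 4: [1]}
print(mutual_choices(choices))   # pairs {1, 2} and {1, 3}
print(one_way_choices(choices))  # (2, 3) and (4, 1)
print(times_chosen(choices))     # member 1 is the most chosen, member 4 is never chosen
```

In this toy group, member 1 would be a candidate star and member 4 a candidate isolate.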

There are many indices that use values instead of categories to indicate the social status of individuals. Three of the most popular indices are introduced in the following paragraphs.

##### 3.1. Status Score Index (SSI)

One of the most critical events in the history of sociometric methods is the use of both positive and negative nominations [13]. The difference between the positive and negative nominations is called status score [20]. To easily figure out the social status of a member, a relative or a normalized score called was developed and defined as where denotes the total choice number by other members, denotes the total rejection number by other members, and is the total number of members in a group. Note that a member cannot choose himself or herself. Thus, rather than is used in (3.1). The value of is in the range []. A higher value of means that a member is more popular within a group. However, this index considers only one-way choice, and thus one cannot understand the mutual interaction between members from this index.
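The index can be computed directly from nomination counts. The sketch below assumes the normalized form described above, that is, choices received minus rejections received, divided by the number of other members in the group; the function and argument names are illustrative.

```python
def status_score_index(chosen_by, rejected_by, group_size):
    """Normalized status score: (choices - rejections) / (group size - 1)."""
    if group_size < 2:
        raise ValueError("a group needs at least two members")
    # A member cannot choose himself or herself, hence group_size - 1.
    return (chosen_by - rejected_by) / (group_size - 1)

# Hypothetical member in a group of nine: chosen five times, rejected once.
print(status_score_index(5, 1, 9))
```

Under this form, a member chosen by everyone else and rejected by no one scores 1, and a member rejected by everyone else scores -1.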

##### 3.2. Index of Sociometric Status Score (ISSS)

To consider the mutual choices between members, an index called [21] is defined as On the right part of the equation, and represent the number of mutual choice and the number of mutual rejection, respectively. is the allowed maximum number of choices that a member can make. The value of is also in the range []. A higher value of indicates that a member is more popular within a group.

##### 3.3. Modified Index of Sociometric Status Score (MISSS)

ISSS uses a linear function to express relationships and thus fails to consider the relative importance of different choices. For instance, satisfaction with being grouped with a first-, second-, or third-choice partner should differ, and the drop in satisfaction from the first to the second choice generally differs from that from the second to the third. Consequently, a modified model for measuring the status score is needed. In consideration of this relative importance of different choices, a new index called MISSS is developed. MISSS is composed of four parts: the contributions from the total choices, the total rejections, the mutual choices, and the mutual rejections, respectively. and , , , and if a member i is chosen by other members, rejected by other members, chooses mutually with other members, and rejects mutually with other members, respectively. , , , and are their priority weights, respectively. The value of MISSS is in the range []. A higher value of MISSS means that a member is more popular within a group. MISSS gives a more realistic description of the relative importance of relationships. An important issue about MISSS is how to express the relations in the equation. A scoring function , which is a function of choice (preference), is used to measure the relations. A simple way to determine the function is a questionnaire that asks members to provide disappointment scores for being grouped with a second-, third-, fourth-, or fifth-choice partner relative to receiving their first choice. To illustrate the nature of the scoring function, choices were collected from two classrooms in a college and the values were calculated. The results are shown in Figure 2. The scoring functions appear nonlinear rather than linear. As mentioned in Section 1, a linear scoring function is unreasonable. To measure the social relationships, a nonlinear model should be used.

#### 4. The CSGA Approach

##### 4.1. The Procedure

The proposed CSGA approach is illustrated in Figure 3.

To begin with, the choices of members are collected and aggregated. A questionnaire is developed to collect the choices of the members and the reasons for choosing their partners. The questions are “whom in the group do you want to have a lunch or dinner with?” as a selection of lunch/dinner partner and “whom in the group do you want to sit beside and discuss with?” as a selection of learning partner. Table 1 illustrates a part of the questionnaire.

To be fair to all members, each member should fill in the same number of choices. Chen et al. [29] showed that members who fill in fewer choices have a better chance of being assigned to their top choices. After the choices are aggregated, a sociometric tool is employed to draw the sociogram and some social indices are measured.

If the number of members is not divisible by the number of members per subgroup, dummy members are added before grouping is performed using GA. Otherwise, GA is directly employed to group the members. Adding dummy members does not influence the optimization since doing so only adds a constant value to the objective function. Note that the optimization can be done with different priority weights or with the same priority weight for all members. A higher priority weight means that a member has a higher priority to be assigned his/her first choice. With the same priority weight, all members have the same chance to be assigned to their top choices.
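The padding step can be sketched in a few lines. The function name and the `dummy` labels are hypothetical; the sketch only shows how many placeholders are needed to make the group size a multiple of the subgroup size.

```python
def pad_with_dummies(members, subgroup_size, dummy_prefix="dummy"):
    """Append placeholder members until the total is a multiple of subgroup_size."""
    shortfall = (-len(members)) % subgroup_size
    dummies = [f"{dummy_prefix}{k + 1}" for k in range(shortfall)]
    return list(members) + dummies

# A class of 55 students split into subgroups of four needs one dummy member.
students = list(range(1, 56))
padded = pad_with_dummies(students, 4)
print(len(padded), padded[-1])
```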

##### 4.2. The GA Structure

The GA flowchart is illustrated in Figure 4. The details are depicted in the following paragraphs.

###### 4.2.1. Encoding

The encoding of a chromosome is illustrated in Figure 5. The number of genes equals the number of members, and each gene is assigned a number that stands for a member. As illustrated in Figure 5, the values of the genes are in the order of , and 4. If a subgroup has two members, then member 5 and member 2 are in the same subgroup, member 6 and member 10 are in the same subgroup, and so on.
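Decoding such a chromosome amounts to cutting the permutation into consecutive blocks. In the sketch below, only the first four genes and the last gene follow the description above; the genes in between are made up for illustration, and the function name is hypothetical.

```python
def decode(chromosome, subgroup_size):
    """Cut a permutation of member numbers into consecutive subgroups."""
    return [chromosome[k:k + subgroup_size]
            for k in range(0, len(chromosome), subgroup_size)]

# Illustrative chromosome for ten members (middle genes are assumed).
chromosome = [5, 2, 6, 10, 1, 8, 3, 9, 7, 4]
print(decode(chromosome, 2))
```

Because the chromosome is a permutation, every decoded grouping automatically places each member in exactly one subgroup.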

###### 4.2.2. Initialization of Population

The random method was employed to generate the initial solutions. Before producing the initial solutions, a mutual choice table was established. Table 2 illustrates the mutual choice. A member (say the 1st member) is first randomly selected, and the program checks with whom they choose mutually. Then from the mutual choice table, possible partners are 2, 6, and 7. Of these three possible partners, one is randomly selected to be a partner of the 1st member.
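One possible reading of this initialization is sketched below. Two details are assumptions rather than statements of the method: when no mutual-choice partner is available, the sketch falls back to a random unassigned member, and for subgroups larger than two it keeps adding members who share a mutual choice with anyone already in the subgroup. All names are hypothetical.

```python
import random

def initial_chromosome(members, mutual, subgroup_size, rng=random):
    """Build one initial chromosome, preferring mutual-choice partners."""
    unassigned = list(members)
    chromosome = []
    while unassigned:
        seed = rng.choice(unassigned)  # randomly selected member
        unassigned.remove(seed)
        group = [seed]
        while len(group) < subgroup_size and unassigned:
            # Candidates share a mutual choice with someone already in the group.
            candidates = [m for m in unassigned
                          if any(m in mutual.get(g, ()) for g in group)]
            # Assumed fallback: pick any unassigned member if no candidate exists.
            partner = rng.choice(candidates or unassigned)
            unassigned.remove(partner)
            group.append(partner)
        chromosome.extend(group)
    return chromosome

# Hypothetical mutual choice table in which member 1 has partners 2, 6, and 7.
mutual = {1: [2, 6, 7], 2: [1], 6: [1], 7: [1]}
print(initial_chromosome(range(1, 9), mutual, 2, random.Random(0)))
```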

###### 4.2.3. Evaluation of Fitness Function

To evaluate the chromosomes, the fitness value of each chromosome in the population was computed based on the members’ preferences. A lower fitness value represents a better grouping, and the best fitness value found is kept. The fitness function contains , the scoring function and , where represents the priority weight of member , represents the preference indicated by member to partner , and so on. If a member is grouped with a partner who is not in his/her preference list, the GA program assigns a sufficiently large value . As for the scoring function , a commonly used function is the squared function [29]. For example, if a squared scoring function is employed, , and a subgroup is composed of three members whose partners are their 2nd and 3rd, 1st and 3rd, and 2nd and 2nd choices, respectively, then the total score is . After the fitness of each subgroup is computed, the fitness of the chromosome is then calculated.
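The evaluation can be sketched as follows. Several points are assumptions made for illustration: the chromosome fitness is taken as the sum of the subgroup scores, the penalty value replaces the missing preference before the scoring function is applied, priority weights default to one, and the preference data are invented. Function and argument names are hypothetical.

```python
def subgroup_score(group, preference, penalty, weight=None, score=lambda p: p ** 2):
    """Sum weight[i] * score(p_ij) over every ordered pair (i, j) in one subgroup."""
    total = 0
    for i in group:
        w = 1 if weight is None else weight[i]
        for j in group:
            if i == j:
                continue
            # Preference that member i gave to partner j; the penalty value
            # stands in when j is not on i's list (assumed handling).
            p_ij = preference.get(i, {}).get(j, penalty)
            total += w * score(p_ij)
    return total

def chromosome_fitness(subgroups, preference, penalty, weight=None, score=lambda p: p ** 2):
    """Assumed aggregation: add up the scores of all subgroups (lower is better)."""
    return sum(subgroup_score(g, preference, penalty, weight, score) for g in subgroups)

# Invented preferences: A ranks B 2nd and C 3rd, B ranks A 1st and C 3rd,
# C ranks A 1st and B 2nd. Squared scores: 4 + 9 + 1 + 9 + 1 + 4 = 28.
preference = {"A": {"B": 2, "C": 3}, "B": {"A": 1, "C": 3}, "C": {"A": 1, "B": 2}}
print(chromosome_fitness([["A", "B", "C"]], preference, penalty=10))
```

With a linear scoring function (`score=lambda p: p`) the same subgroup would score 12, which shows how the squared function penalizes low-ranked partners more heavily.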

###### 4.2.4. Selection

The Roulette Wheel Method [25] was employed to select fitter individuals. The fitness values of all the chromosomes in the population were first calculated and then sorted. A fitter chromosome, that is, one with a lower fitness value, has a higher probability of being selected. For example, if there are 50 chromosomes in the population, the chromosome with the lowest fitness value has a probability of to be selected, the second lowest has a probability of to be selected, and so on.
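Because the fitness values are sorted before selection, one way to read this description is as a rank-based roulette wheel, in which the selection weight depends on the rank rather than on the raw fitness value. The sketch below assumes rank-proportional weights (the best of 50 chromosomes gets weight 50, the next 49, and so on, out of a total of 1275); this weighting is an assumption, and the function names are hypothetical.

```python
import random

def rank_selection_probabilities(fitness_values):
    """Selection probability per chromosome; a lower fitness gets a larger share."""
    n = len(fitness_values)
    # Indices ordered from best (lowest fitness) to worst.
    order = sorted(range(n), key=lambda k: fitness_values[k])
    total = n * (n + 1) // 2
    probabilities = [0.0] * n
    for rank, index in enumerate(order):
        probabilities[index] = (n - rank) / total
    return probabilities

def roulette_select(population, fitness_values, rng=random):
    """Spin the wheel once and return the selected chromosome."""
    weights = rank_selection_probabilities(fitness_values)
    return rng.choices(population, weights=weights, k=1)[0]

# Three hypothetical chromosomes with fitness 30, 10, and 20.
print(rank_selection_probabilities([30, 10, 20]))
```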

###### 4.2.5. Crossover

Two chromosomes were randomly selected first. The fitness values of these two chromosomes were compared and from the better one some genes were randomly selected and placed at the beginning positions of the offspring chromosomes, as illustrated in Figure 6. The number of the selected genes is the same as the number of members per subgroup, . The rest of the genes in the offspring chromosome are filled in the order of fitter genes by the greedy method.

###### 4.2.6. Mutation

In this study, the swap method was employed for mutation, as illustrated in Figure 7. Two genes were randomly selected and their values were interchanged. Since each member is assigned to exactly one subgroup, the values in the genes must all be different; the swap method has the advantage of never duplicating a value.
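A minimal sketch of the swap operator follows; the function name and the sample chromosome are illustrative. The operator returns a new list so that the parent chromosome is left unchanged.

```python
import random

def swap_mutation(chromosome, rng=random):
    """Exchange the values of two randomly chosen genes."""
    mutated = list(chromosome)
    a, b = rng.sample(range(len(mutated)), 2)  # two distinct positions
    mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated

parent = [5, 2, 6, 10, 1, 8, 3, 9, 7, 4]
child = swap_mutation(parent, random.Random(0))
print(child)
```

The result is always a permutation of the same members, so no repair step is needed after mutation.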

###### 4.2.7. Elitism Strategy

In order to preserve the best chromosome in every generation, a simple elitism strategy [23] was employed. The best chromosome of each generation was copied unchanged into the next generation. This strategy ensures that the best chromosome of each generation is at least as good as the best chromosome of the previous generation. In addition, the elitism strategy did not lead the GA to converge prematurely since it was applied to only one chromosome of each generation.

###### 4.2.8. Termination Condition

The termination condition used in this study is a generation number defined by the user. The calculation is repeated until the number of generations reaches the preassigned value. Once the termination criterion is satisfied, the solution is displayed in a chart, from which the grouping result can be read directly.
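The interplay of elitism and the fixed generation count can be sketched with a generic loop. This is a skeleton under stated assumptions, not the program used in this study: `make_offspring` is a hypothetical callback standing for selection, crossover, and mutation combined, and the toy problem at the end (integers that should approach 7) only demonstrates the control flow.

```python
import random

def run_ga(initial_population, fitness, make_offspring, generations, rng=random):
    """Minimize `fitness` for a fixed number of generations with one-elite elitism."""
    population = list(initial_population)
    best = min(population, key=fitness)
    for _ in range(generations):  # termination: user-defined generation count
        offspring = [make_offspring(population, rng)
                     for _ in range(len(population) - 1)]
        population = [best] + offspring  # elitism: the best chromosome survives
        best = min(population, key=fitness)
    return best

# Toy problem: individuals are integers and the fitness is the distance to 7.
def toy_fitness(x):
    return abs(x - 7)

def toy_offspring(population, rng):
    return rng.choice(population) + rng.choice([-1, 1])

start = [0, 20, 15]
result = run_ga(start, toy_fitness, toy_offspring, 50, random.Random(1))
print(result, toy_fitness(result))
```

Because the elite is carried over unchanged, the best fitness value can never get worse from one generation to the next.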

#### 5. Results and Discussion

To numerically investigate the influences of grouping parameters such as the number of members per subgroup, the number of choices, and the number of members in a group on the grouping results, a GA program was developed with Microsoft .NET 4.0. The program was run on an ASUS K52Jc Series notebook with an Intel(R) Core(TM) i5 CPU M460 @ 2.53 GHz and 1.86 GB RAM, running Windows 7.

To evaluate the effectiveness of the proposed approach, three real datasets covering three classes of undergraduate students were tested. The numbers of students are 54, 55, and 32, respectively, as shown in Table 3. For dataset C, the students were first grouped randomly and then grouped by CSGA, so that the two grouping methods can be compared. For dataset A, the maximum allowed number of choices, , was set to 9 so that the complexity of the relationship network can be observed. Students were asked to indicate their preferences for dining partners in dataset A and for learning partners in datasets B and C.

For easy description, the following variables are defined. The population size is represented as , the generation number as , the crossover rate as , and the mutation rate as . In addition, we use as the minimal fitness value in each generation, as the best solution at each trial. In each case, the GA program performs 30 trials. We designate the best result of 30 trials as OPT, their average value as AVG, and the coefficient of variation as .
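The three summary statistics can be computed from the best fitness value of each trial. The sketch below assumes a minimization problem and uses the population standard deviation for the coefficient of variation; whether the sample or the population form was used in this study is not stated, so that choice is an assumption. The trial values are invented.

```python
from statistics import mean, pstdev

def summarize_trials(best_per_trial):
    """Return OPT (best), AVG (mean), and CV (std / mean) over repeated trials."""
    avg = mean(best_per_trial)
    return {
        "OPT": min(best_per_trial),          # lower fitness is better
        "AVG": avg,
        "CV": pstdev(best_per_trial) / avg,  # assumed: population standard deviation
    }

# Invented best-fitness values from a handful of trials.
print(summarize_trials([8, 12]))
```

A small CV indicates that the GA reaches similar solutions across trials, which is one way to judge its stability.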

##### 5.1. Validation of GA

To validate the GA program, some tests were made. The results from GA were compared with those from the Branch and Bound (BB) method, which obtains optimal solutions. The results are shown in Table 4. When the number of members is small, the GA program obtains optimal solutions that are the same as those from BB, indicating that the GA program is valid. As increases, the BB method requires a huge amount of computation time. For example, for and , the required time is about 1100 minutes or 18.3 hours. However, the average runtime for GA is about 1.64 minutes, much shorter than that for BB shown in Table 4.

##### 5.2. The Influences of Genetic Parameters

To evaluate the influences of genetic parameters on the results, a two-stage approach was used. The generation number and population size were decided at the first stage, while at the second stage crossover rate and mutation rate were decided. The variation of minimal fitness value at each generation with different population sizes, , is shown in Figure 8. As we can see from the figure , can obtain good results. At the second stage, the mutation rate as well as the crossover rate were changed and tested. Experimental results show that the best parameters are and . In the following experiments, therefore, and were used.

##### 5.3. The Influences of Choice Number

To make the solutions easy to observe, the sociograms (relationship structures) are drawn and shown in Figure 9. Note that only mutual-choice relationships are shown in the figure. A female member is represented by a circle, while a male member is represented by a square. From this figure we can see that the relationship network is quite complicated, especially for those with a large value of . An optimal grouping is hard to achieve by intuition or by exact search methods.

##### 5.4. The Influences of the Number of Members in a Subgroup

As the number of members in a subgroup increases, the computation time also increases. To observe the influence of the subgroup size on the results, the number of members per subgroup was varied from two to five. Note that in Table 5 the number of choices was fixed at 5, whereas in Table 6 the number of choices was set equal to .

##### 5.5. The Influences of the Number of Members in an Organization

As the number of members in a group increases, the complexity of grouping also increases. An apparent influence is the increase of the computation time, as illustrated in Table 7. Note that the computation time increases linearly with the number of members, showing the time efficiency of the GA program, whereas the number of possible groupings, and hence the time needed for an exhaustive search, grows exponentially. Consequently, GA is an efficient approach to solve grouping optimization problems considering social relationships.

##### 5.6. Homogeneous versus Heterogeneous Grouping

A good grouping approach should be able to meet the requirements of a group, such as a homogeneous or heterogeneous grouping. To investigate the effectiveness of the proposed approach, experiments were performed with different priority weights, which were based on . The result is illustrated in Figure 10. Members grouped into the same subgroup are circled. A solid line between two members means that members choose mutually. A dashed line with an arrow, on the other hand, represents a one-way choice, where the arrow-headed direction is the choice direction. The grouping can be done with different priority weights based on . For homogeneous grouping, can be used as a priority weight. For heterogeneous grouping, can be sorted and then divided into several parts, with the first part composed of the highest values, the second part composed of the second highest values, and the like. Using the proposed CSGA approach, homogeneous or heterogeneous grouping can be easily performed in fulfilling the requirements of a group.

##### 5.7. The Influences of Scoring Function and Penalty Value

The penalty value was changed to investigate its influences on the grouping results. The results are shown in Table 8. From the table we see that the effect of penalty value is not very apparent. As for the effect of the scoring function, a squared function converges sooner than a linear function, as illustrated in Figure 11.

##### 5.8. Overall Satisfaction and Repeat Intention

A survey was conducted to understand the overall satisfaction and repeat intention of using the proposed approach after the learning partner groups were decided. A typical five-level Likert item was employed. The question is “In overall, are you satisfied with the method?” and the format of the item is “very satisfied, satisfied, neither satisfied nor dissatisfied, dissatisfied and very dissatisfied.” Most of the students are satisfied or very satisfied with the new method. The average points are 4.39 and 4.28, respectively, in dataset B and dataset C, indicating that the students have high satisfaction with the proposed method (see Table 9). As for repeat intention, the question is “Do you agree to use the new method to select learning partners next time?” and anchored by “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and “strongly disagree.” The average points of this question are 4.52 and 4.09, respectively. The high values indicate that the repeat intention of using the proposed method is also very high. The high overall satisfaction and the high repeat intention, in part, reflect the effectiveness of the present approach.

A paired sample *t*-test was performed on dataset C to test whether satisfaction differs between the two methods.

*Hypothesis 1.* There is no difference in satisfaction between using the CSGA approach and using the random method.

The result shows that the overall satisfaction with the proposed CSGA approach is significantly higher than that with the random method, as shown in Tables 10–12.
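For readers who wish to reproduce this kind of comparison, the paired *t* statistic can be computed from the per-student differences. The sketch below uses only the standard library; the ratings are illustrative numbers, not the survey data of this study, and the function name is hypothetical. The resulting statistic is compared with the critical value of the *t* distribution for the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired ratings `second` versus `first`."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1

# Illustrative five-point ratings from four students under two grouping methods.
random_method = [3, 4, 2, 3]
csga_method = [4, 5, 4, 4]
t, df = paired_t_statistic(random_method, csga_method)
print(t, df)
```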

#### 6. Conclusions

In this paper, we have employed a novel approach to group members based on their social relationships. Members are first asked to indicate their preferences (choices) for partners with whom they would like to be together, using an index system in which one stands for the first choice, two for the second choice, and so on up to a preassigned number. Subsequently, the choices of members are aggregated and a combined sociometry and genetic algorithm (CSGA) scheme is employed to optimize the grouping. In addition, a nonlinear model is developed to calculate the social scores of individuals. To investigate the effectiveness of the proposed approach, three real datasets collected from a university were used. Experimental results show that CSGA can optimize the grouping effectively and efficiently, and that students are very satisfied with the grouping results and show a high intention to use the approach again. Moreover, a paired sample *t*-test shows that the overall satisfaction with the proposed CSGA approach is significantly higher than that with the random method.

For further studies, it is recommended to apply the combined sociometry and genetic algorithm approach to solve other kinds of grouping optimization problems. Methods of rating or paired comparisons as well as peer nominations can be used to measure social relationships.

#### Acknowledgments

The author wishes to express appreciation to the reviewers for their valuable comments and suggestions on this paper. Thanks are also extended to both Ms. Jui-Hui Chen, at the Department of Social and Public Affairs, Taipei Municipal University of Education, and Professor Tung-Shou Chen, at the Department of Computer Science and Information Engineering, National Taichung Institute of Technology, Taiwan, for their suggestions on the paper. Appreciation is also extended to Lindy Chen, Gordon Lee, Jyun-You Fan, Shu-Ping Suen, and Jyun-Yang Li for their help during the course of writing this paper. This work was supported by the National Science Council under Grant no. 100-2221-E-025-016.