#### Abstract

Traditional Web-based learning systems offer insufficient analysis of learning behaviors and little personalized study guidance, so several user clustering algorithms have been introduced to address this. When analyzing behaviors with these algorithms, researchers generally focus on continuous data and tend to neglect discrete data, although both are generated by online learning actions. Moreover, the implicit coupled interactions among the data are frequently ignored by these algorithms. As a result, a large amount of significant information that could improve clustering accuracy is lost. To solve these issues, we propose a coupled user clustering algorithm for Web-based learning systems that takes into account both discrete and continuous data, as well as the intracoupled and intercoupled interactions of the data. The experimental results in this paper demonstrate the superior performance of the proposed algorithm.

#### 1. Introduction

Information technology and data mining have brought great changes to the field of education. Web-based learning is a significant and advanced form of education that uses computer network technology, multimedia digital technology, database technology, and other modern information technologies to support learning in a digital environment.

At present, many educational institutions and researchers are studying Web-based learning systems. They mainly study the systems’ composition, the construction of learning modes, the design and development of hardware, relevant supportive policies and services, and so forth. Meanwhile, a growing number of Web-based learning systems are developing rapidly, for instance, online study communities and virtual schools [1]. MOOCs (Massive Open Online Courses) are open online study platforms that provide free courses to students. They were initiated by top American universities in 2012 and attracted more than 6 million students from around 220 countries within a year [2]. In these systems, all learners receive the same learning resources, with no customized or personalized learning services. The systems lack analysis of learners’ behaviors and individual features, so scientific guidance and help are needed. In addition, the systems contain a mass of learning resources, which raises a major challenge: how to pick out the most wanted and most suitable resource.

User clustering can uncover hidden information in a large amount of data. By clustering users in different ways, Web-based learning systems can provide learners with personalized learning guides and learning resource recommendations, which can greatly improve learning efficiency in these systems.

Recently, user clustering algorithms have been applied in several Web-based learning systems. Clustering has been used to choose a suitable learning method [3]. Lin et al. proposed kernel intuitionistic fuzzy c-means clustering (KIFCM) and applied it to e-learning customer analysis [4]. Köck and Paramythis put forward another clustering approach that detects learners’ behavioral patterns to support individual and group-based collaborative learning [5].

All the above methods apply traditional clustering algorithms in Web-based learning systems: learners’ attribute information is extracted by analyzing their learning behaviors and is then used for user clustering. Most of these attributes are continuous data. Learners’ behavioral information contains a great deal of continuous data, such as “total time length of learning resources” and “comprehensive test result.” In contrast, attributes with categorical features, like “chosen lecturer” and “chosen learning resource type,” are easily neglected. Although this kind of data makes up a smaller part of learning behavior information, it also plays a significant role in learner clustering.

In addition, the mixed discrete and continuous data extracted from learning behaviors in Web-based learning systems are interrelated, and there are implicit coupling relationships among them. These relationships are often ignored by traditional clustering algorithms, which leads to a massive loss of significant information during similarity computation and user clustering. Consequently, the quality of the services provided, like learning guides and learning resource recommendations, is not satisfactory. For example, it is common sense that “total time length of learning resources” has a positive impact on “comprehensive test result.” Generally, the longer the “total time length of learning resources,” the better the “comprehensive test result.” However, there are also some special groups of students who behave differently. They either get a better “comprehensive test result” with a shorter “total time length of learning resources” or a worse “comprehensive test result” with a longer “total time length of learning resources.” This special correlation between attributes, which is often ignored, is considered in the user clustering of our approach. Ignoring it affects user clustering accuracy and makes it hard to guarantee that all users receive high-quality personalized services. An effective mechanism is needed to compensate for the loss of the ignored information.

To solve the above issues, this paper proposes a coupled user clustering algorithm based on mixed data, namely, CUCA-MD. The algorithm builds on the fact that both discrete and continuous data exist in learning behavior information, and it analyzes each kind according to its own features. In the analysis, CUCA-MD takes full account of intracoupling and intercoupling relationships and builds separate user similarity matrices for discrete attributes and continuous attributes. We then obtain the integrated similarity matrix by weighted summation and cluster users with a spectral clustering algorithm. In this way, we take full advantage of the mixed data generated from learning actions in Web-based learning systems. Meanwhile, the algorithm considers the correlation and coupling relationships of attributes, which enables us to find interactions between users, especially users of the special groups mentioned above. Consequently, it can provide suitable and efficient learning guidance and help for users.

The contributions of this algorithm can be summarized in three aspects. Firstly, it takes into account the coupling relationships of attributes in Web-based learning systems, which have frequently been neglected, and improves clustering accuracy. Secondly, it fully considers the different features of discrete and continuous data and builds the user similarity matrix based on mixed data. Thirdly, it captures and analyzes individuals’ learning behaviors and provides customized and personalized learning services to different groups of learners.

The rest of the paper is organized as follows. The next section introduces related works. The clustering algorithm model is proposed in Section 3. Section 4 introduces detailed utilization of the clustering algorithm. Discrete and continuous data analysis are also studied in this section. In Section 5, experiments and results analysis are demonstrated. Section 6 concludes this paper.

#### 2. Related Works

User clustering with mixed data has been achieved in some fields but rarely in the Web-based learning area. Ahmad and Dey proposed a clustering algorithm based on an updated k-means paradigm that overcomes the numeric-data-only limitation and works well for data with mixed numeric and categorical features [6]. The k-*prototypes* algorithm defines a combined dissimilarity measure and integrates the k-means algorithm, which deals with numeric data, with the k-modes algorithm, which uses a simple matching dissimilarity measure to deal with categorical objects, so that objects described by mixed numeric and categorical attributes can be clustered [7]. Another automated technique, called *SpectralCAT*, was proposed for unsupervised clustering of high-dimensional data that contains numerical attributes, nominal attributes, or a mix of both; it automatically transforms the high-dimensional input data into categorical values [8].

Recently, an increasing number of researchers have paid special attention to interactions among object attributes and have become aware that the independence assumption on attributes often leads to a mass of information loss. In addition to the basic Pearson’s correlation [9], Wang et al. addressed intracoupled and intercoupled interactions of continuous attributes [10], while Li et al. proposed an innovative coupled group-based matrix factorization model for discrete attributes in recommender systems [11]. An algorithm to detect interactions between attributes was also proposed, but, according to its experimental results, it is only applicable to supervised learning [12]. Calders et al. proposed the use of rank-based measures to score the similarity of sets of numerical attributes [13]. Bollegala et al. proposed a method comprising two stages: learning a lower-dimensional projection between different relations and learning a relational classifier for the target relation type with instance sampling [14]. In the literature we reviewed, we found hardly any work that takes into account the coupling relationships of user attributes for user clustering in Web-based learning systems.

#### 3. Clustering Model

The user clustering model plays a significant role in the user evaluation framework [15]. The coupled user clustering model based on mixed data is illustrated in Figure 1. The model takes into account both the discrete data and the continuous data generated in learning behaviors and incorporates intracoupled and intercoupled relationships of attributes into user clustering. Compared with traditional algorithms, it captures hidden user interaction information by fully analyzing mixed data, which improves clustering accuracy.

The model is built on the discrete and continuous data extracted from learning behaviors. According to their different features, we extract the corresponding attributes in parallel by analyzing the behaviors. Intracoupled and intercoupled relationships are then introduced into the user similarity computation, which yields separate user similarity matrices for discrete attributes and continuous attributes. Finally, we use weighted summation to integrate the two matrices and apply the Ng-Jordan-Weiss (NJW) spectral clustering algorithm [16] to cluster users. With the clustering result applied in Web-based learning systems, various personalized services can be provided accordingly, such as learning strategy customization, learning tutoring, and learning resource recommendation.

#### 4. Clustering Algorithm

This paper proposes a coupled user clustering algorithm based on mixed data that is suitable for the field of education. It fits not only user clustering analysis in Web-based learning systems but also corporate training and performance review, as well as other Web-based activities that involve user participation and behavior recording. This section introduces the implementation of CUCA-MD in Web-based learning systems.

##### 4.1. Discrete Data Analysis

Among the data generated from users’ learning behaviors, discrete data plays a significant role in user behavior analysis and user clustering. In the following section, the procedure of how to compute user similarity using discrete data in Web-based learning systems is demonstrated, during which intracoupled similarity within an attribute (i.e., value frequency distribution) and intercoupled similarity between attributes (i.e., feature dependency aggregation) are also considered.

###### 4.1.1. User Learning Behavior Analysis

In Web-based learning systems (e.g., https://www.khanacademy.org/ and https://www.coursera.org/), various discrete data are usually generated during the learning process, such as chosen lecturer, chosen learning resource type, chosen examination form, evaluation of the lecturer and learning content, main learning time period, and uploading and downloading of learning resources. To make this more explicit, we choose “chosen lecturer,” “chosen learning resource type,” and “chosen examination form” as the attributes for later analysis, respectively, denoted by , , and ; we also choose 5 students as the study objects, denoted by , , , , and . The objects and their attribute values are shown in Table 1. We discuss the similarity of categorical values by considering data characteristics. Two attribute values are similar if they present analogous frequency distributions for one attribute [17]; this reflects the intracoupled similarity within a feature. In Table 1, for example, “Wang,” “Liu,” and “Zhao” are similar because each of them appears once. In reality, however, “Wang” and “Liu” are more similar than “Wang” and “Zhao,” because the “chosen learning resource type” and “chosen examination form” of lecturers “Wang” and “Liu” are identical. If we need to recommend a lecturer to students who like Wang’s lectures, we will prefer Liu over Zhao because Liu’s lectures are more easily accepted by these students. This indicates that the similarity between “chosen lecturer” values should also cater for the dependencies on other features such as “chosen learning resource type” and “chosen examination form” over all objects, namely, the intercoupled similarity between attributes.

###### 4.1.2. Intracoupled and Intercoupled Representation

Data objects with features can be organized by the information table , where is composed of a nonempty finite number of users, is a finite set of discrete attributes, is a set of all attribute values, is a set of attribute values of the th attribute, namely, , and , is an information function which assigns a particular value of each feature to every user. We take Table 2 as an example to explicitly represent intracoupled and intercoupled similarity of discrete attributes.

To analyze intracoupled and intercoupled correlation of user attributes, we define a few basic concepts as follows.

*Definition 1. *Given an information table , 3 set information functions (SIFs) are defined as follows:where is the mapping function of user set to attribute values, is mapping function of attribute values to user, and is mapping function of attribute value set to user. , , and denotes the th attribute.

These SIFs describe the relationships between objects and attribute values from different levels. For example, and for value , while if given that .

*Definition 2. *Given an information table , an Interinformation Function (IIF) is defined asThis IIF is the composition of and . It obtains the th attribute value subset for the corresponding objects, which are derived from th attribute value . For example, .

*Definition 3. *Given an information table , the th attribute value subset , and the th attribute value , the Information Conditional Probability (ICP) of with respect to is defined asWhen given all the objects with the th attribute value , ICP means the percentage of users whose th attributes fall in subset and th attribute value is as well. For example, .
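Definitions 1 to 3 can be made concrete with a small sketch. The table below is a hypothetical toy example (it is not the paper's Table 1 or Table 2), and the function names `f`, `g`, `g_star`, `iif`, and `icp` are illustrative choices that follow the verbal descriptions above rather than the paper's exact notation.

```python
# Hypothetical toy information table: user -> (lecturer, resource type, exam form).
# The values are invented for illustration only.
TABLE = {
    "u1": ("Wang", "video", "online"),
    "u2": ("Liu", "video", "online"),
    "u3": ("Zhao", "text", "paper"),
    "u4": ("Chen", "text", "paper"),
    "u5": ("Chen", "video", "online"),
}

def f(users, j):
    """SIF 1: values of attribute j taken by the given set of users."""
    return {TABLE[u][j] for u in users}

def g(value, j):
    """SIF 2: users whose attribute j equals `value`."""
    return {u for u, row in TABLE.items() if row[j] == value}

def g_star(values, j):
    """SIF 3: users whose attribute j falls in the value set `values`."""
    return {u for u, row in TABLE.items() if row[j] in values}

def iif(value, j, k):
    """IIF: values of attribute k observed among users with attribute j == value."""
    return f(g(value, j), k)

def icp(values_k, value_j, j, k):
    """ICP: share of users with attribute j == value_j whose attribute k lies in values_k."""
    base = g(value_j, j)
    return len(g_star(values_k, k) & base) / len(base)
```

For instance, `icp({"video"}, "Chen", 0, 1)` asks what fraction of the students who chose lecturer "Chen" also chose the "video" resource type in this toy table.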

Intracoupled and intercoupled similarity of attributes are, respectively, introduced as follows.

*Intracoupled Interaction*. Based on [9], intracoupled similarity is decided by attribute value occurrence times in terms of frequency distribution. When we calculate an attribute’s intracoupled similarity, we consider the relationship between attribute value frequencies on one feature, demonstrated as follows.

*Definition 4. *Given an information table , the Intracoupled Attribute Value Similarity (IaAVS) between attribute values and of features is defined asGreater similarity is assigned to the attribute value pair which owns approximately equal frequencies. The higher these frequencies are, the closer such two values are. For example, .
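As a rough illustration of Definition 4, the sketch below assumes the frequency-based form commonly used in the coupled nominal similarity literature, namely the product of the two value frequencies divided by the sum of the frequencies plus their product. This form is an assumption, since the formula itself is not reproduced here, and the function name `ia_avs` and the sample column are hypothetical.

```python
from collections import Counter

def ia_avs(column, x, y):
    """Intracoupled similarity of values x and y from their occurrence counts.

    Assumed form: fx * fy / (fx + fy + fx * fy), where fx and fy are the
    numbers of users taking values x and y on this attribute.
    """
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    return fx * fy / (fx + fy + fx * fy)

# Hypothetical "chosen lecturer" column for five students.
lecturers = ["Wang", "Liu", "Zhao", "Chen", "Chen"]
sim_wang_liu = ia_avs(lecturers, "Wang", "Liu")    # both appear once
sim_wang_chen = ia_avs(lecturers, "Wang", "Chen")  # counts 1 and 2
```

Under this assumed form the value always lies strictly between 0 and 1 and grows as the two frequencies grow.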

*Intercoupled Interaction*. We have considered the intracoupled similarity, that is, the interaction of attribute values within one feature . However, this does not cover interaction between different attributes, namely, and .

Cost and Salzberg [18] came up with a method which is for measuring the overall similarities of classification of all objects on each possible value of each feature. If attributes values occur with the same relative frequency for all classifications, they are identified as being similar. This interaction between features in terms of cooccurrence is taken as intercoupled similarity.

*Definition 5. *Given an information table , the intercoupled relative similarity based on Intersection Set (IRSI) between different values and of feature regarding another feature is formalized aswhere denote , respectively.

The value subset is replaced with , which is considered to simplify computation.

With (5), for example, the calculation of is much simplified since only ; then we can easily get . Thus, this method is quite efficient in reducing the complexity of the intercoupled relative similarity.
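The intersection-based idea in Definition 5 can be sketched as follows, under the assumption that the similarity sums, over the values of the other feature that co-occur with both attribute values, the smaller of the two conditional probabilities. The function name `irsi` and the sample columns are hypothetical.

```python
from collections import Counter

def irsi(col_j, col_k, x, y):
    """Intercoupled relative similarity of values x and y of feature j w.r.t. feature k.

    Assumed form: sum over the feature-k values seen with both x and y of
    min(P(w | x), P(w | y)).
    """
    def conditional(v):
        # Distribution of feature-k values among users whose feature j equals v.
        ks = [b for a, b in zip(col_j, col_k) if a == v]
        return {w: c / len(ks) for w, c in Counter(ks).items()}

    px, py = conditional(x), conditional(y)
    shared = px.keys() & py.keys()  # only the intersection contributes
    return sum(min(px[w], py[w]) for w in shared)

# Hypothetical columns for five students.
lecturers = ["Wang", "Liu", "Zhao", "Chen", "Chen"]
resources = ["video", "video", "text", "text", "video"]
```

Restricting the sum to the shared values is what keeps the computation small: values of the other feature that occur with only one of the two attribute values contribute nothing.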

*Definition 6. *Given an information table , the Intercoupled Attribute Value Similarity (IeAVS) between attribute values and of feature is defined aswhere is the weight parameter for feature , , , and is one of the intercoupled relative similarity candidates.

In Table 2, for example, if is taken with equal weight.

###### 4.1.3. Integrated Coupling Representation

Coupled Attribute Value Similarity (CAVS) is proposed in terms of both intracoupled and intercoupled value similarities. For example, the coupled interaction between and covers both the intracoupled relationship specified by the occurrence times of values and , 2 and 2, and the intercoupled interaction triggered by the other two features, and .

*Definition 7. *Given an information table , the Coupled Attribute Value Similarity (CAVS) between attribute values and of feature is defined aswhere and are IaAVS and IeAVS, respectively.

In Table 2, for instance, CAVS is obtained as .

With the specification of IaAVS and IeAVS, a coupled similarity between objects is built based on CAVS. Then we sum all CAVSs analogous to the construction of Manhattan dissimilarity [9]. Formally, we have the following definition.

*Definition 8. *Given an information table , the Coupled User Similarity (CUS) between users and is defined aswhere is the weight parameter of attribute , , , and are the attribute values of feature for and , respectively, and and .

In Table 2, for example, = = if is taken with equal weight.
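A compact sketch of the whole discrete pipeline (Definitions 4 to 8) is given below. It rests on several assumptions that should be read as such: the toy table is hypothetical, the intracoupled and intercoupled forms are the frequency-based and intersection-based ones common in the coupled similarity literature, CAVS is taken as the product of IaAVS and IeAVS, and all weights are equal.

```python
from collections import Counter

# Hypothetical toy table: user -> (lecturer, resource type, exam form).
TABLE = {
    "u1": ("Wang", "video", "online"),
    "u2": ("Liu", "video", "online"),
    "u3": ("Zhao", "text", "paper"),
    "u4": ("Chen", "text", "paper"),
    "u5": ("Chen", "video", "online"),
}
USERS = list(TABLE)
N_ATTR = 3

def ia_avs(j, x, y):
    """Intracoupled similarity from value frequencies (assumed form)."""
    counts = Counter(row[j] for row in TABLE.values())
    fx, fy = counts[x], counts[y]
    return fx * fy / (fx + fy + fx * fy)

def cond_dist(j, k, x):
    """Distribution of feature-k values among users whose feature j equals x."""
    ks = [row[k] for row in TABLE.values() if row[j] == x]
    return {w: c / len(ks) for w, c in Counter(ks).items()}

def irsi(j, k, x, y):
    """Intersection-based intercoupled relative similarity (assumed form)."""
    px, py = cond_dist(j, k, x), cond_dist(j, k, y)
    return sum(min(px[w], py[w]) for w in px.keys() & py.keys())

def ie_avs(j, x, y):
    """Equal-weight aggregation of IRSI over all other features."""
    others = [k for k in range(N_ATTR) if k != j]
    return sum(irsi(j, k, x, y) for k in others) / len(others)

def cavs(j, x, y):
    """Coupled value similarity, assumed here to be IaAVS times IeAVS."""
    return ia_avs(j, x, y) * ie_avs(j, x, y)

def cus(u, v):
    """Coupled user similarity: equal-weight sum of CAVS over all attributes."""
    return sum(cavs(j, TABLE[u][j], TABLE[v][j]) for j in range(N_ATTR)) / N_ATTR

# Discrete user similarity matrix for the toy table.
matrix = [[cus(u, v) for v in USERS] for u in USERS]
```

In this toy table, `u1` and `u2` choose different lecturers but identical resource types and examination forms, so their coupled similarity comes out much higher than that of `u1` and `u3`, which mirrors the "Wang" versus "Liu" and "Zhao" argument in Section 4.1.1.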

In this way, a user similarity matrix of entries regarding discrete data can be built aswhere , .

For instance, we get a user similarity matrix of entries regarding discrete data based on Table 2, as

##### 4.2. Continuous Data Analysis

Continuous data has different features from discrete data. In the following section, user similarity computation is demonstrated using a Taylor-like expansion, with the involvement of intracoupled interaction within an attribute (i.e., the correlations between attributes and their own powers) and intercoupled interaction among different attributes (i.e., the correlations between attributes and the powers of others).

###### 4.2.1. User Learning Behavior Analysis

After students log on to a Web-based learning system, the system records their activity information, such as times of doing homework and number of learning resources. This paper refers to a Web-based personalized user evaluation model [19] and uses its evaluation index system to extract students’ continuous attribute information. This index system, shown in Table 3, is based on American K-12 (kindergarten through twelfth grade) evaluation standards [20] and the Delphi method [21]; it is a hierarchical structure built from the mass of information and data generated during general e-learning activities. It is defined with 20 indicators and can comprehensively represent students’ attributes. Because the extracted attributes are measured in different units, such as times, time length, amount, and percentage, we first normalize them; the result is shown in Table 4.

###### 4.2.2. Intracoupled and Intercoupled Representation

In this section, the intracoupled and intercoupled relationships of the continuous attributes extracted above are represented. We use an example to make this more explicit. We single out 6 continuous attributes of the same 5 students mentioned in Section 4.1.1, namely, “average correct rate of homework,” “times of doing homework,” “number of learning resources,” “total time length of learning resources,” “daily average quiz result,” and “comprehensive test result,” denoted by , , , , , and in the following representations and shown in Table 4.

Here we use an information table to represent user attributes information. means a finite set of users; refers to a finite set of continuous attributes; represents all attributes value sets; is the value set of the th attribute; , is the function for calculating a certain attribute value. For example, the information in Table 4 contains 5 users and 6 attributes ; the first attribute value of is .

The usual way to calculate the interactions between 2 attributes is Pearson’s correlation coefficient [9]. For instance, the Pearson’s correlation coefficient between and is formalized aswhere and are, respectively, mean values of and .

However, Pearson’s correlation coefficient only suits linear relationships and is far from sufficient to fully capture pairwise attribute interactions. Therefore, we use more dimensions to expand the numerical space spanned by and then expose the attribute coupling relationships by exploring the updated attribute interactions [22].

Firstly, we use a few additional attributes to expand interaction space in the original information table. Hence, there are attributes for each original attribute , including itself, namely, . Each attribute value is the power of the attribute; for instance, is the third power of attribute and is the th power of . In Table 4, the denotation and are equivalent; the value of is the square of value. For simplicity, we set in Table 5.
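The power expansion described above can be sketched in a few lines. The function name `expand_powers` and the sample values are hypothetical; the column is assumed to be already normalized.

```python
def expand_powers(column, max_power):
    """Return [column^1, column^2, ..., column^max_power] as lists of values."""
    return [[v ** p for v in column] for p in range(1, max_power + 1)]

# Hypothetical normalized attribute values for five students.
homework_rate = [0.5, 1.0, 0.25, 0.75, 0.0]
expanded = expand_powers(homework_rate, 2)
```

Each original attribute thus contributes `max_power` columns to the expanded information table, with the first column being the attribute itself.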

Secondly, the correlation between pairwise attributes is calculated. It captures both local and global coupling relations. We take into account the p values for testing the hypothesis of no correlation between attributes. The p value here is the probability of observing, by random chance, a correlation at least as large as the one actually observed when the true correlation is zero. If the p value is smaller than 0.05, the correlation is significant. The updated correlation coefficient is as follows:

Here we do not consider all relationships but only take the significant coupling relationships into account, because involving all relationships may cause overfitting when modeling the coupling relationships, which would go against the inherent interaction mechanism of the attributes. Based on the updated correlation, the intracoupled and intercoupled interactions of attributes are proposed. Intracoupled interaction is the relationship between and all its powers; intercoupled interaction is the relationship between and powers of the rest of the attributes .

*Intracoupled Interaction*. The intracoupled interaction within an attribute is represented as a matrix. For attribute , it is matrix . In the matrix, is the correlation between and . Considerwhere is the Pearson’s correlation coefficient between and .

For attribute in Table 5, we can get the intracoupled interaction within it as , which means that the correlation coefficient between attribute “average correct rate of homework” and its second power is as high as 0.986. There is close relationship between them.

*Intercoupled Interaction*. The intercoupled interaction between attribute and other attributes is quantified as matrix as

Here refers to all the attributes except for , and is the correlation coefficient between and .

For attribute in Table 5, the intercoupled interaction between and others is calculated as

The values between and others are calculated as

Based on the result, we can find hidden correlations between user attributes. For instance, all the p values between attributes and are larger than 0.05, so the correlation coefficient is 0 based on (12), indicating there is no significant correlation between “average correct rate of homework” and “times of doing homework.” Meanwhile, the correlation coefficients between and and between and are quite close to 1, which indicates that “daily average quiz result” and “comprehensive test result” each have a close relationship with “average correct rate of homework,” consistent with our practical experience. In conclusion, comprehensively taking into account the intracoupled and intercoupled correlations of attributes can efficiently help capture the coupling relationships between user attributes.
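The significance-filtered correlation and the two interaction matrices can be sketched as below. This is an illustration under stated assumptions: it uses `scipy.stats.pearsonr` for the coefficient and its p value, keeps a coefficient only when the p value is below 0.05, and uses hypothetical function names and sample data.

```python
import numpy as np
from scipy.stats import pearsonr

def significant_corr(x, y, alpha=0.05):
    """Pearson correlation, set to 0 when the p value is not below alpha."""
    r, p = pearsonr(x, y)
    return float(r) if p < alpha else 0.0

def powers(column, max_power):
    """The attribute and its powers up to max_power, as NumPy arrays."""
    col = np.asarray(column, dtype=float)
    return [col ** p for p in range(1, max_power + 1)]

def intra_matrix(column, max_power):
    """Correlations between an attribute's own powers (L x L)."""
    P = powers(column, max_power)
    return np.array([[significant_corr(a, b) for b in P] for a in P])

def inter_matrix(column, other_columns, max_power):
    """Correlations between an attribute's powers and the powers of the others."""
    P = powers(column, max_power)
    Q = [q for other in other_columns for q in powers(other, max_power)]
    return np.array([[significant_corr(a, b) for b in Q] for a in P])

# Hypothetical normalized attributes for six students.
a = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0]
b = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1]
intra = intra_matrix(a, 2)
inter = inter_matrix(a, [b], 2)
```

With these sample values, attribute `a` correlates strongly with its own square, while none of its correlations with `b` or the square of `b` pass the significance filter, so the intercoupled matrix is all zeros.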

###### 4.2.3. Integrated Coupling Representation

Intracoupled and intercoupled interactions are integrated in this section as a coupled representation scheme.

In Table 5, each user is signified by updated variables . With the updated function , the corresponding value of attribute is assigned to user . Attribute and all its powers are signified as , while the rest of the attributes and all powers are presented in another vector . For instance, in Table 5, , = .

*Definition 9. *The coupled representation of attribute is formalized as vector , where component corresponds to the updated attribute . One haswhere is a constant vector, is a vector concatenated by constant vectors . denotes the Hadamard product, and represents the matrix multiplication.

Taking an example in Table 6, the coupled representation for attribute is presented as . The reason why we choose such a representation method is explained below. If (17) is expanded, for example, we get the element which corresponds to of the vector as below, which resembles Taylor-like expansion of functions [23]

Finally we obtained the global coupled representation of all the original attributes as a concatenated vector:

Incorporated with the couplings of attributes, each user is represented as vector. When all the users follow the steps above, we then obtain coupled information table. For example, based on Table 4, the coupled information table shown in Table 6 is the new representation.

With the new user attributes information of the coupled information table, we utilize the formula below [16] to compute user similarity and build a matrix of entrieswhere , , and denotes scaling parameter. Here we take . Detailed parameter estimation procedure is introduced in experiment 5.2
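A minimal sketch of this similarity computation is shown below, assuming the Gaussian affinity used in NJW spectral clustering [16], that is, the exponential of the negative squared Euclidean distance divided by twice the squared scaling parameter. The function name, the sample vectors, and the value of `sigma` are hypothetical.

```python
import math

def gaussian_similarity(rows, sigma):
    """Pairwise similarity exp(-||u_i - u_j||^2 / (2 * sigma^2)) for user vectors."""
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            sim[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return sim

# Hypothetical coupled representations of three users.
users = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8]]
S = gaussian_similarity(users, sigma=0.5)
```

A smaller `sigma` makes the similarity fall off faster with distance, which is why the scaling parameter has to be tuned against clustering accuracy.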

For instance, we get a user similarity matrix of entries regarding continuous data based on Table 4, as

##### 4.3. User Clustering

In Sections 4.1 and 4.2, we get separate user similarity matrix regarding discrete attributes and regarding continuous attributes. With weighted summation, an integrated matrix of entries based on mixed data can be obtained aswhere and are the respective weights of discrete attributes and continuous attributes. , , and denotes the number of former attributes while denotes that of the latter ones.
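The weighted summation can be illustrated with the sketch below. It assumes that the two weights sum to 1 and are proportional to the numbers of discrete and continuous attributes; this weighting rule is an assumption for illustration, and the function name and sample matrices are hypothetical.

```python
def integrate(sim_discrete, sim_continuous, n_discrete, n_continuous):
    """Weighted sum of two user similarity matrices.

    Assumption: weights are proportional to the attribute counts and sum to 1.
    """
    total = n_discrete + n_continuous
    w_d, w_c = n_discrete / total, n_continuous / total
    return [
        [w_d * d + w_c * c for d, c in zip(row_d, row_c)]
        for row_d, row_c in zip(sim_discrete, sim_continuous)
    ]

# Hypothetical 2 x 2 similarity matrices for two users.
S_d = [[1.0, 0.0], [0.0, 1.0]]
S_c = [[1.0, 0.6], [0.6, 1.0]]
S_mixed = integrate(S_d, S_c, n_discrete=3, n_continuous=6)
```

With 3 discrete and 6 continuous attributes, this rule gives the discrete matrix a weight of 1/3 and the continuous matrix a weight of 2/3.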

For example, in the former examples we listed 3 discrete attributes and 6 continuous attributes; then and . For users of , , , , and , the user similarity matrix based on the mixed data is obtained as follows:

With consideration of intracoupled and intercoupled correlation of user attributes, we get the user similarity matrix based on mixed learning behavior data. Next, with NJW spectral clustering, user clustering procedure is described. Detailed clustering result is demonstrated in experiments part.
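The NJW procedure applied to a user similarity matrix can be sketched as follows. The steps follow the published NJW algorithm [16] (zeroed diagonal, symmetric normalization, top-k eigenvectors, row normalization, k-means), but the small k-means with farthest-point initialization is a simplification chosen here for determinism, and the sample matrix is hypothetical.

```python
import numpy as np

def njw_spectral_clustering(similarity, k, n_iter=100):
    """Cluster users from a symmetric similarity matrix with NJW-style steps."""
    A = np.array(similarity, dtype=float)
    np.fill_diagonal(A, 0.0)                      # NJW sets A_ii = 0
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2
    _, eigvecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    X = eigvecs[:, -k:]                           # k largest eigenvectors
    Y = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

    # Simplified k-means on the rows of Y with farthest-point initialization.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical mixed-data similarity matrix with two visible groups of users.
S = [
    [1.00, 0.90, 0.80, 0.10, 0.10],
    [0.90, 1.00, 0.85, 0.10, 0.05],
    [0.80, 0.85, 1.00, 0.05, 0.10],
    [0.10, 0.10, 0.05, 1.00, 0.90],
    [0.10, 0.05, 0.10, 0.90, 1.00],
]
labels = njw_spectral_clustering(S, k=2)
```

The cluster indices themselves are arbitrary; only the grouping of users matters.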

#### 5. Experiments and Evaluation

We conducted experiments and user studies using the coupled user clustering algorithm proposed in this paper. The data for the experiments were collected from a Web-based learning system of China Educational Television (CETV), named “New Media Learning Resource Platform for National Education” (http://www.guoshi.com/). As a basic platform for national lifelong education, it was the earliest of its kind in China, has the largest group of users, and provides the most extensive learning resources; it meets the personalized and diverse needs of different users by integrating a variety of converged networks, terminals, and resources. So far, the number of registered users has exceeded two million. Experiments are carried out to verify the algorithm’s validity and accuracy. The experiment is composed of four parts: user study, parameter estimation, user clustering, and result analysis.

##### 5.1. User Study

In the experiment, we asked 180 users (indicated by ) to learn Data Structures online. The whole learning process, including the recording and analysis of learning activities information, was accomplished in CETV mentioned above.

At present, public data sets on learners’ behaviors in online learning systems are scarce, and most of them do not contain labeled user clustering information. Meanwhile, because learners’ behaviors always carry a certain subjectivity, labels assigned to learners based only on behaviors, without knowing the information behind them, are not fully accurate. Therefore, we conduct user studies that collect relevant user similarity data directly from both students and teachers as the basis for verifying the accuracy of learner clustering in Web-based learning systems.

By analyzing the continuous attributes extracted according to the user evaluation index system in Table 3, we find that they fall mainly into two kinds: attributes reflecting learners’ learning attitude, like “times of doing homework,” “number of learning resources,” and “total time length of learning resources,” and attributes reflecting learners’ learning effect, like “average correct rate of homework,” “daily average quiz result,” and “comprehensive test result.” We also analyze attributes with categorical features, for example, “chosen lecturer,” “chosen learning resource type,” and “chosen examination form,” which all reflect learners’ learning preferences. Therefore, we ask students and teachers together to estimate students’ similarity from three perspectives: learning attitude, learning effect, and learning preference. We ask each of the 180 students to choose the 5 other students who are most like himself or herself and the 5 who are least like himself or herself, taking the three perspectives into account; each lecturer teaching the Data Structures course makes the same choices for every student in his or her class. For instance, student chooses the lesson of lecturer *Liu*, and the choices made by each of them are shown in Table 7.

##### 5.2. Parameter Estimation

As indicated in (20), the proposed coupled representation for numerical objects is strongly dependent on the maximal power . Here, we conduct several experiments to study the performance of with regard to the clustering accuracy of CUCA-MD. The maximal power is set to range from to since becomes extremely large when grows, which means is probably large enough to obtain most of the information in (20). The experiment verifies that, with the increasing value of , the clustering accuracy goes higher. When , it reaches a stable point for accuracy change; when , compared with the former, there is only very tiny improvement of accuracy. Therefore, with the precondition for experiment accuracy, we take , reducing the algorithm complexity as much as possible.

We adjust the number of clusters over a large number of experiments and finally take the number as , considering the user features in online learning systems. In addition, (22) is needed when computing user similarity from continuous data. The scaling parameter of the equation must be set manually, so we test different values, compare the resulting clustering accuracy, and pick the optimal one. Figure 2 illustrates the relation between values and clustering accuracy. When , the clustering accuracy is the best; when , the clustering results stay in a comparatively stable range with little difference between them. Thus, we take .

##### 5.3. User Clustering

In the following experiments, we use the 20 continuous user attributes in Table 3 and the 8 discrete attributes (chosen lecturer, chosen learning resource type, chosen examination form, learning time, evaluation of lecturer, evaluation of learning resources, upload, and download) for user clustering. Because recording and analyzing users’ learning behaviors is an ongoing process, we divide the learning process into six phases, namely, 5 h, 10 h, 15 h, 20 h, 25 h, and 30 h. We then perform user clustering with each algorithm on the data from each phase.

To verify the effectiveness of CUCA-MD for user clustering in Web-based learning systems, we compare it with three other algorithms that are also based on mixed data, namely, *k-prototype* [7], *mADD* [6], and *spectralCAT* [8]. In addition, to demonstrate the significance of learning behavior with both categorical and continuous features, as well as the different effect of clustering with and without coupling, we perform clustering with six different methods. The first uses Simple Matching Similarity (*SMS*, which uses only 0s and 1s to distinguish between distinct and identical categorical values) [9] to analyze users’ discrete attributes and compute user similarity, and then performs user clustering with NJW. This method is named NJW-DD. The second, described in Section 4.1, analyzes users’ discrete attributes considering intracoupled and intercoupled relationships, then computes user similarity and performs user clustering with the NJW algorithm. This method is called CUCA-DD. The third, NJW-CD, obtains the clustering result by analyzing continuous attributes and applying NJW. The fourth, by contrast, uses users’ continuous attributes together with their intracoupled and intercoupled correlation to obtain user similarity and then obtains the clustering result with NJW. It is introduced in Section 4.2 and named CUCA-CD. The fifth, NJW-MD, applies NJW to both discrete and continuous attributes without considering their intracoupled and intercoupled correlation. The sixth is the method proposed in this paper, CUCA-MD.
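As an illustration of the Simple Matching Similarity step used by NJW-DD, the sketch below scores two users by the fraction of discrete attributes on which their values are identical (1 for a match, 0 otherwise). Averaging over attributes, the function names, and the sample attribute values are assumptions made for illustration; the spectral clustering step (NJW) is not shown.

```python
def sms_similarity(u, v):
    """Simple matching: each attribute contributes 1 if equal, else 0.

    Dividing by the number of attributes (an assumption) keeps the
    score in [0, 1].
    """
    if len(u) != len(v):
        raise ValueError("users must have the same number of attributes")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def sms_matrix(users):
    """Symmetric user-by-user similarity matrix built from sms_similarity."""
    return [[sms_similarity(u, v) for v in users] for u in users]

# Hypothetical users: (chosen lecturer, learning resource type, examination form).
users = [
    ("Liu", "video", "online"),
    ("Liu", "video", "written"),
    ("Wang", "text", "written"),
]
similarity = sms_matrix(users)
```

Because every mismatch counts the same, this measure cannot tell "slightly different" values from "very different" ones, which is the limitation the coupled similarity of Section 4.1 is meant to address.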

We analyze the clustering results statistically. Taking student in Table 7 as an example, 4 of the 5 “top most similar” students chosen by fall in the same cluster as himself, which gives a clustering accuracy of 80%; we call this the “similarity accuracy.” In contrast, none of the “top least similar” students chosen by falls in the same cluster as him, which gives a clustering accuracy of 100%; we call this the “dissimilarity accuracy.” In the same way, we obtain two accuracy values from the options made by ’s lecturer *Mr. Liu*, 80% and 80%. Weighting ’s options at 55% and *Mr. Liu*’s at 45%, weighted summation gives comprehensive accuracy values of 80% (0.55 × 80% + 0.45 × 80%) and 91% (0.55 × 100% + 0.45 × 80%). In this way, we obtain a pair of accuracy values for each of the 180 students, which we use to evaluate the different clustering methods.
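The accuracy calculation in this example can be sketched as follows. The function names, the student identifiers, and the cluster assignment are hypothetical and chosen only so that the ratios match the example (4 of 5, 5 of 5, 4 of 5, and 4 of 5); the 55% and 45% weights are those stated in the text.

```python
def similarity_accuracy(subject, chosen_similar, cluster_of):
    """Share of 'most similar' picks that land in the subject's cluster."""
    hits = sum(1 for s in chosen_similar if cluster_of[s] == cluster_of[subject])
    return hits / len(chosen_similar)

def dissimilarity_accuracy(subject, chosen_dissimilar, cluster_of):
    """Share of 'least similar' picks that land outside the subject's cluster."""
    hits = sum(1 for s in chosen_dissimilar if cluster_of[s] != cluster_of[subject])
    return hits / len(chosen_dissimilar)

def comprehensive_accuracy(student_acc, lecturer_acc, w_student=0.55, w_lecturer=0.45):
    """Weighted summation of the student's and the lecturer's accuracy."""
    return w_student * student_acc + w_lecturer * lecturer_acc

# Hypothetical cluster assignment; "s1" plays the role of the example student.
cluster_of = {"s1": 0, "s2": 0, "s3": 0, "s4": 0, "s5": 0, "s6": 1,
              "s7": 1, "s8": 1, "s9": 2, "s10": 2, "s11": 2}

# Picks made by the student (hypothetical).
student_sim = similarity_accuracy("s1", ["s2", "s3", "s4", "s5", "s6"], cluster_of)
student_dis = dissimilarity_accuracy("s1", ["s7", "s8", "s9", "s10", "s11"], cluster_of)
# Picks made by the lecturer for the same student (hypothetical).
lecturer_sim = similarity_accuracy("s1", ["s2", "s3", "s4", "s5", "s7"], cluster_of)
lecturer_dis = dissimilarity_accuracy("s1", ["s2", "s8", "s9", "s10", "s11"], cluster_of)

sim = comprehensive_accuracy(student_sim, lecturer_sim)
dis = comprehensive_accuracy(student_dis, lecturer_dis)
```

Repeating this for every student and averaging would give one similarity accuracy and one dissimilarity accuracy per clustering method.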

##### 5.4. Result Analysis

We compare the clustering results using user similarity accuracy and user dissimilarity accuracy. Figure 3 illustrates the clustering accuracy of CUCA-MD and the other algorithms when the average learning length of the 180 students is 30 h. Figure 3(a) shows that, on mixed data, the accuracy of CUCA-MD is higher than that of the other algorithms in terms of both similarity and dissimilarity. Figure 3(b) compares the clustering results of the six methods NJW-DD, CUCA-DD, NJW-CD, CUCA-CD, NJW-MD, and CUCA-MD. The similarity accuracy and dissimilarity accuracy of NJW-DD are the lowest, 26.2% and 32.3%, respectively, while those of CUCA-MD are the highest. Moreover, the algorithms that consider coupling relationships achieve higher accuracy than the NJW variants that do not, whether on discrete, continuous, or mixed data. These results verify the advantage of algorithms that consider the coupling relations of attributes: they capture hidden information in the behavior data and greatly improve clustering accuracy.

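Figure 3: Clustering accuracy (similarity and dissimilarity) at an average learning length of 30 h. (a) CUCA-MD and the other mixed-data algorithms; (b) NJW-DD, CUCA-DD, NJW-CD, CUCA-CD, NJW-MD, and CUCA-MD.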

Because the collection and analysis of learning behaviors are ongoing, we also illustrate the relationship between average learning length and user clustering accuracy. Figures 4(a) and 4(b) show that, as learning time increases, the clustering accuracies of the mixed-data algorithms rise, with CUCA-MD growing the fastest, especially after 20 h. Figures 4(c) and 4(d) show that the accuracies of all the methods grow except for NJW-DD. At the same time, the CUCA variants, which consider coupling relationships, grow faster than the NJW variants, which do not. From these results we conclude that, because NJW-DD relies on only a few attributes and little extracted information, its clustering accuracy on discrete data improves little even with more time and behavior data, whereas the clustering accuracy of CUCA-MD on mixed data, which considers the coupling relationships of attributes, improves distinctly as behavior data increase.

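Figure 4: Clustering accuracy versus average learning length. (a), (b) Algorithms based on mixed data; (c), (d) the NJW-based and CUCA-based methods.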

In addition, we can verify clustering accuracy by analyzing the structure of the user clustering results. A clustering algorithm performs best when it attains the smallest distance within clusters and the largest distance between clusters; we therefore use the evaluation criteria Relative Distance (the ratio of average intercluster distance to average intracluster distance) and Sum Distance (the sum of object distances within all the clusters). The larger the Relative Distance and the smaller the Sum Distance, the better the clustering results. Figure 5 shows that the Relative Distance of CUCA-MD is larger than that of the other algorithms, while its Sum Distance is smaller. This indicates that CUCA-MD on mixed data, which also considers coupling relationships, outperforms the rest in terms of clustering structure.
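The two criteria can be sketched as follows. This is one possible reading rather than the exact computation used in the experiments: it assumes Euclidean distance between objects, takes "object distances within clusters" to mean pairwise distances between objects sharing a cluster, and averages intercluster distance over all pairs in different clusters. The function names are hypothetical.

```python
from itertools import combinations
from math import dist

def _pairwise(points, labels):
    """Split all pairwise distances into intracluster and intercluster lists."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return intra, inter

def relative_distance(points, labels):
    """Average intercluster distance divided by average intracluster distance.

    Assumes at least two clusters and at least one cluster with two objects.
    """
    intra, inter = _pairwise(points, labels)
    return (sum(inter) / len(inter)) / (sum(intra) / len(intra))

def sum_distance(points, labels):
    """Sum of pairwise distances between objects that share a cluster."""
    intra, _ = _pairwise(points, labels)
    return sum(intra)

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]   # groups nearby points together
bad_labels = [0, 1, 0, 1]    # groups distant points together
```

On the toy data, the labeling that groups nearby points yields a larger Relative Distance and a smaller Sum Distance than the labeling that groups distant points, which matches the direction of the criteria described above.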

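Figure 5: Relative Distance and Sum Distance of the user clustering results for CUCA-MD and the compared algorithms (panels (a) to (d)).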

#### 6. Conclusion

In this paper, we proposed a coupled user clustering algorithm based on mixed data for Web-based learning systems (CUCA-MD), which incorporates the intracoupled and intercoupled correlation of user attributes with different features. The algorithm is based on the fact that both discrete and continuous data exist in learning behavior information, and it analyzes each according to its features. In the analysis, CUCA-MD takes into account intracoupling and intercoupling relationships and builds separate user similarity matrices for discrete attributes and continuous attributes. We then obtain the integrated similarity matrix by weighted summation and perform user clustering with a spectral clustering algorithm. In the experiments, we verify that the proposed CUCA-MD outperforms the compared methods for user clustering in Web-based learning systems, through user study, parameter estimation, user clustering, and result analysis.

In this paper, we analyze the discrete data and continuous data generated in online learning systems with different methods and build separate user similarity matrices for attributes with discrete and continuous features, which makes the algorithm more complicated. In future work, we hope to process continuous data and discrete data simultaneously while still taking into account the coupling correlation of user attributes, which we expect to further improve the efficiency of the algorithm.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by the National Natural Science Foundation of China (Project no. 61370137), the National 973 Project of China (no. 2012CB720702), and Major Science and Technology Project of Press and Publication (no. GAPP_ZDKJ_BQ/01).