With the development of mobile platforms such as smartphones and tablets, the E-Learning model has developed rapidly. However, because completion rates on E-Learning platforms are low, it is necessary to analyze the behavior characteristics of online learners in order to adjust online education strategies intelligently and enhance the quality of learning. In this paper, we analyze the relevant indicators of E-Learning to build the student profile and propose countermeasures. Using similarity computation and the Jaccard coefficient algorithm, we design a system model that cleans and mines the educational data, together with the students’ learning attitude and the duration of their learning behavior, to establish the student profile. Based on the E-Learning resources and learner behaviors, we also present an intelligent guide model that helps both the E-Learning platform and learners improve learning. The study of the student profile can help the E-Learning platform understand and guide students’ learning behavior in depth, provide a personalized learning situation, and promote the optimization of E-Learning.

1. Introduction

As an effective means of education, E-Learning offers more knowledge and skills than traditional education and, thanks to new information and communication technologies, is free of the restrictions of time and space [1]. The MOOC (massive open online course) is a representative form of online education platform. Coursera, an online learning platform established by top universities in the USA, is the largest MOOC platform in the world. At present, it hosts about 1563 courses and more than 17 million registered students. The largest MOOC platform in China (http://www.xuetangx.com) was developed by Tsinghua University on the basis of edX. It has about 3 million members from more than 200 countries and regions [2].

E-Learning has developed rapidly. Figure 1 shows that the number of Chinese E-Learning users reached 90.992 million in 2015, an annual growth rate of 56%, and is expected to grow to 120 million by 2017.

However, although more and more people are paying attention to E-Learning platforms, only 7%–9% of learners complete a MOOC course according to Coursera statistics [3]. Therefore, it is necessary to improve the quality of learning and optimize the teaching mechanism so that courses can be recommended accurately. The student profile is a novel method for analyzing the basic information and learning behavior of online learners. Establishing a student profile makes it possible to construct a personalized learning situation and to guide the learning process, which plays a positive role in promoting online learning.

The student profile is a portrait analysis based on big data and labeling. We collect, process, and analyze the data generated by learners’ behavior to produce an information description of individual students or groups. According to the theory of behavioral psychology, using the student profile to analyze data on student behavior can reveal students’ behavior characteristics and psychodynamics. For example, the Education Big Data Research Institute of UESTC (University of Electronic Science and Technology of China) cooperated with other departments to develop the Student Profile System, which can give early warning of exam failure [4]. Similarly, Southwest Jiaotong University collected and analyzed big data to draw a model of students’ on-campus “behavior track” and predict their future development [5]. Abroad, researchers have shown that using big data to analyze students’ learning behavior, such as reading course information online, submitting homework, and exchanging information, can detect warning signs of poor learning performance. Based on these warnings, teachers made recommendations for improvement and gave guidance to ensure that students learn effectively. The “student profile” has thus already been applied in the education field, and combining big data profile technology with education has very important practical significance.

For E-Learning data, we use big data technology to analyze the characteristics of E-Learning. The main research contents of this paper are as follows: (1) analysis of the factors affecting the student profile: the students are classified according to age, and then we define the relations between student behavior and duration; (2) building a student profile model: we collect and preprocess data and mine the connections among the attributes of students’ learning behavior with the Jaccard algorithm to form the student profile; (3) analysis of the student profile, which helps the E-Learning platform better understand the learning behavior of students.

2. The Definition of the Student Profile

The student profile describes learning characteristics from multiple dimensions and angles. Its definition covers the analysis indicators and influencing factors, such as student behavior, as well as data collection, data cleaning, and the building and analysis of the student profile [6].

2.1. Student Definition

The main research objects of the student profile are the students in a school or on an E-Learning platform. Assume the student set is as follows:

$$S = \{\, s_i^{a} \mid i = 1, 2, \ldots, n;\ a = 1, 2, \ldots, 5 \,\},$$

where $S$ indicates the students classified by age; $i$ means the individual; and the superscript $a$ denotes the student age level (for convenience, the superscript is usually omitted, giving $s_i$). Students are classified by age into 5 types: less than 17 years old, 18–24 years old, 25–34 years old, 35–54 years old, and over 55 years old, as shown in Table 1.

From age, we can predict learner profile information and further mine the characteristics of students’ learning.

2.2. Definition of Learning Behavior

Online learning behavior comprises the kinds of learning behavior that take place in a network environment. We focus on mining the characteristics of learners from their online learning behavior in order to understand student performance. The core of learning behavior is the set of operations performed during online learning [7, 8]. For example, a learner clicks on a course, browses the page, plays the video, and downloads the relevant courseware; “click” and “download” are two such operations. The behavior set (behavior) in the student profile is defined as

$$B = \{b_1, b_2, \ldots, b_{12}\},$$

where $b_j$ indicates a variety of behavior and $B$ includes 12 kinds of learning behaviors, such as learning goal, text learning, online practice, and making notes, as shown in Table 2.

Since online learning is a process that extends over a period of time, the duration of online learning behavior is an important parameter for evaluating the quality of online learning. In particular, it reflects the degree of focus on learning. The duration set (timeslot) in the student profile is defined as follows:

$$T = \{t_1, t_2, \ldots, t_q\},$$

where $t_l$ indicates a duration; durations are divided into periods of 1–10 minutes, 10–20 minutes, 20–30 minutes, 30–40 minutes, 40–50 minutes, 50–60 minutes, and so forth.
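
As an illustration, a session length can be mapped to one of these periods as sketched below. This is only a sketch: the function name `timeslot`, the `width` parameter, and the choice to treat each period as half-open (so that exactly 10 minutes falls in the 10–20 period) are assumptions, since the text does not say how boundary values are assigned.

```python
def timeslot(minutes: float, width: int = 10) -> str:
    """Return the duration period (e.g. "10-20") that a session length falls in.

    Assumption: periods are half-open, [lower, upper), so 10 minutes
    belongs to "10-20" rather than "1-10".
    """
    if minutes < 0:
        raise ValueError("duration must be non-negative")
    lower = int(minutes // width) * width
    # The first period is written "1-10" in the text, so label it that way.
    shown_lower = 1 if lower == 0 else lower
    return f"{shown_lower}-{lower + width}"


for m in (3, 10, 25.5, 59, 75):
    print(m, "->", timeslot(m))
```

Sessions longer than 60 minutes simply continue the same pattern ("60-70", "70-80", ...), which matches the "and so forth" in the definition.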

According to the above definitions, $b_j(s_i)$ means the behavior $b_j$ of the student $s_i$. We suppose a behavioral difference $\Delta b_j(s_i, s_k)$; it represents the difference between students $s_i$ and $s_k$ in a certain behavior $b_j$. For example, suppose a college learner $s_1$, a job learner $s_2$, and online training $b_j$. We adopt $\Delta b_j(s_1, s_2)$ to analyze the differences between the college learner and the job learner in the behavior of online training.

3. Student Profile Model

The student profile has a complete model to guide the analysis of students’ online learning process. The student profile model (Figure 2) includes data collection, data cleaning, and portrait analysis. First, data collection obtains the original data from the E-Learning platform or from questionnaire surveys. Second, we use attribute reduction to clean the original data and then employ the Jaccard coefficient algorithm for data analysis and data mining. Finally, we label the students according to the results of the analysis to form the student profile. At the same time, we build a knowledge base (KB) to store knowledge sheets about the student profile [9]. In the student profile model, the KB runs parallel to the data mining level and interacts with it: we take part of the student profile set from the KB, mine and analyze it, and store the results back in the KB. The KB of the student profile is therefore capable of self-growth and self-perfection.

3.1. Data Acquisition

Data acquisition covers four categories: student user registration data, web log data, learning behavior data, and learning content preference data. The student user registration data is mainly used to analyze the characteristics of the learners and includes user name, sex, date of birth, geography, occupation, and hobbies. The web log data reflects the operation of the E-Learning platform and includes the number of active users, page views, access time, activation rate, and learning path. The learning behavior data supports statistical analysis of online learning performance and includes learning time, learning activities, learning resources, and examination results. The learning content preference data can be used to analyze preferences for courses or teachers and includes browsed/collected content, reviewed content, and interactive content; it helps in recommending courses accurately.

3.2. Data Cleaning

Data cleaning preprocesses the original data: it removes redundant data, retains the data useful for the analysis, and organizes the data into a standard format. Because interference from abnormal values often distorts data mining results [10–12], data cleaning improves the accuracy of data analysis and ensures the reliability of data mining.
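
The cleaning steps named above (removing redundant records, keeping usable ones, and filtering abnormal values) can be sketched as follows. This is a minimal illustration, not the procedure used in this paper: the record layout, the `minutes` field, the function name `clean_records`, and the 1.5 × IQR rule for abnormal values are all assumptions.

```python
from statistics import quantiles


def clean_records(records, field="minutes"):
    """Drop duplicates, incomplete rows, and abnormal values of `field`."""
    # 1. Remove exact duplicates while keeping the original order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 2. Keep only rows where the analysed field is present.
    complete = [r for r in unique if r.get(field) is not None]
    if len(complete) < 4:
        return complete  # too few rows to estimate quartiles

    # 3. Filter abnormal values with the 1.5 * IQR rule (an assumed choice).
    q1, _, q3 = quantiles([r[field] for r in complete], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [r for r in complete if low <= r[field] <= high]
```

Any other outlier rule could replace step 3; the point is only that abnormal values are removed before the similarity analysis.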

Attribute induction is the most important step in preprocessing the collected data source. Suppose that the original data field is $D = \{d_1, d_2, \ldots, d_n\}$, where $n$ is the dimension of the original data field. Set the vector $A = (a_1, a_2, \ldots, a_m)$, where $m \le n$ and $a_k$ means a desired property. Data preprocessing reduces the properties so as to give all the desired properties; the attribute induction method is defined as sig, and cleaning the data gives the following result:

$$A = \mathrm{sig}(D),$$

in which sig measures the importance of a property for the $n$-dimensional data field. In our solution, we calculate the importance of each property and select the desired attributes that are related to the same behavior analysis. Our solution does not deal with the concrete implementation of the attribute induction sig.

4. Student Profile Analysis

In this section, we calculate the similarity between the behavior sets of different students. Using the Jaccard coefficient similarity algorithm, we compare the online behavior characteristics and durations of learners, grouping similar properties into the same class and different properties into different classes.

4.1. Student Behavior Feature Similarity Calculation

The behavioral characteristics of different students are nonnumeric objects, so we adopt the Jaccard coefficient to calculate their similarity [13, 14]. The similarity formula is as follows:

$$J(B_i, B_j) = \frac{|B_i \cap B_j|}{|B_i \cup B_j|},$$

where $B_i$ and $B_j$ represent the behavior of students $s_i$ and $s_j$. Suppose that student $s_j$ belongs to the KB; we compare $s_i$ to $s_j$. If the similarity difference between $s_i$ and $s_j$ is too large, $s_i$ is added to the KB as a new class.
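
For concreteness, the Jaccard coefficient of two behavior sets is the size of their intersection divided by the size of their union. The sketch below computes it for two hypothetical learners; the function name `jaccard`, the sample behavior labels, and the convention that two empty sets have similarity 1 are assumptions made for illustration.

```python
def jaccard(a, b) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections of labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty behavior sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical behavior sets for two learners (labels loosely follow Table 2).
student_i = {"text learning", "online practice", "making notes"}
student_j = {"text learning", "making notes", "download courseware"}

print(jaccard(student_i, student_j))  # 2 shared / 4 distinct = 0.5
```

A value near 1 means the two learners perform almost the same behaviors, and a value near 0 means their behaviors barely overlap.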

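User similarity is defined as

$$\mathrm{sim}(s_i, s_j) = \frac{1}{m} \sum_{k=1}^{m} \mathrm{sim}_k(s_i, s_j),$$

in which $\mathrm{sim}(s_i, s_j)$ indicates the similarity between students $s_i$ and $s_j$; $m$ indicates the number of behavior dimension attributes of the student set $S$; $\mathrm{sim}_k(s_i, s_j)$ represents the similarity on property $k$ between students $s_i$ and $s_j$, and $0 \le \mathrm{sim}_k(s_i, s_j) \le 1$.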

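According to the similarity calculation, we obtain the similarity matrix as follows:

$$\mathrm{Sim} = \begin{pmatrix} \mathrm{sim}_{11} & \mathrm{sim}_{12} & \cdots & \mathrm{sim}_{1n} \\ & \mathrm{sim}_{22} & \cdots & \mathrm{sim}_{2n} \\ & & \ddots & \vdots \\ & & & \mathrm{sim}_{nn} \end{pmatrix}$$

It is an upper triangular matrix, where $\mathrm{sim}_{ij}$, in row $i$ and column $j$, is the similarity of behavior characteristics between student $s_i$ and student $s_j$.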

The Jaccard coefficient algorithm is described in Algorithm 1.

Algorithm 1: Jaccard coefficient algorithm.
Input:  S, a set of students
Output: Sim, the similarity of behavior characteristics
        between students s_i and s_j
Dim Sim As Float          // similarity matrix
Dim i, j, k As Int
Dim sum As Float
Begin
  n = S.length            // number of students in the set
  For i = 1 To n
    For j = i To n
      sum = 0
      For k = 1 To m      // m is the number of behavior types
        sum = sum + sim_k(s_i, s_j)
      Sim[i][j] = sum / m
  Return Sim
End
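
The pairwise computation that Algorithm 1 describes can be sketched in Python as below. This is an illustration under stated assumptions, not the authors’ implementation: each student is represented as a dictionary mapping an attribute name to a set of observed values, the overall similarity of two students is taken to be the mean of the per-attribute Jaccard coefficients, and the names `similarity_matrix`, `behavior`, and `timeslot` are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient of two sets (1.0 if both are empty, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(students: list[dict[str, set]]) -> list[list[float]]:
    """Upper-triangular matrix of pairwise similarities.

    Assumption: the overall similarity of two students is the mean of the
    Jaccard coefficients of their attributes (e.g. behaviors, durations).
    """
    n = len(students)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # every student is identical to themselves
    for i, j in combinations(range(n), 2):  # only pairs with i < j
        attrs = students[i].keys() & students[j].keys()
        scores = [jaccard(students[i][k], students[j][k]) for k in attrs]
        sim[i][j] = sum(scores) / len(scores) if scores else 0.0
    return sim


# Hypothetical profiles: behaviors performed and duration periods observed.
students = [
    {"behavior": {"text", "notes", "practice"}, "timeslot": {"10-20", "20-30"}},
    {"behavior": {"text", "notes"}, "timeslot": {"10-20"}},
    {"behavior": {"video"}, "timeslot": {"50-60"}},
]
for row in similarity_matrix(students):
    print([round(x, 2) for x in row])
```

Only entries on or above the diagonal are filled, since the similarity of a pair does not depend on the order of the two students.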

Based on the result of the Jaccard coefficient algorithm, we can label the learners. Suppose that the calculation covers two dimensions, learning behavior and duration. A student could be labeled “depth learning type” if the learning behavior lasts more than 60 minutes. Similarly, a student could be labeled “tasted type” if the learning behavior lasts less than 10 minutes. Additionally, based on the frequency of online questions and online training, we use the labels “inquisitive type,” “application type,” and “perseverance type.” The specific labeling method is not detailed here; we only present the Jaccard coefficient algorithm and the idea of labeling.
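
The two duration thresholds mentioned above translate directly into a rule. The sketch below encodes only those two thresholds; the function name `duration_label` is hypothetical, and durations between 10 and 60 minutes are left unlabeled because the text assigns them no label.

```python
from typing import Optional


def duration_label(minutes: float) -> Optional[str]:
    """Label a learner from the length of a learning session, in minutes."""
    if minutes > 60:
        return "depth learning type"
    if minutes < 10:
        return "tasted type"
    return None  # the text gives no duration label for 10-60 minutes


for m in (5, 30, 75):
    print(m, "->", duration_label(m))
```

The frequency-based labels (“inquisitive type,” “application type,” and “perseverance type”) are not sketched because their thresholds are not given.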

4.2. Learning Attitude Analysis

Students’ learning attitude affects the learning outcome. We collected data from the 18–24-year-old student group and analyzed it statistically, covering, for example, whether learners have a clear learning goal and whether they have a learning plan. As shown in Figure 3, the results reflect the learners’ subjective initiative and their recognition of E-Learning courses, which helps in analyzing the factors that interfere with E-Learning.

In the 18–24-year-old group, about 94.3% of learners believe that E-Learning courses are helpful to them. Only 18.47% of learners have clear learning objectives, and 58.6% have clear learning objectives only occasionally, which suggests that most students take E-Learning courses without a clear aim. In addition, 45.9% of learners have no learning plan, and 55.73% learn online while doing other things, such as chatting on QQ and listening to music. From Figure 3, we conclude that an E-Learning course requires a definite objective, inner motivation, synchronous feedback, and independence on the part of the learners. Since online learners largely recognize the value of online courses, E-Learning platforms have broad prospects for development.

4.3. Duration Analysis of Online Learning Behavior

The learning behavior of online learners is diverse. To a certain extent, the frequency of a learning behavior reflects the attention that learners pay to the learning resources. Based on the frequency statistics [15], we analyzed which course resources are more readily accepted by learners, as shown in Figure 4.

In Figure 4, behaviors 1–7 are independent learning behaviors, behaviors 8–11 are interactive learning behaviors, and behaviors 11-12 have nothing to do with learning. Most learners browse the text and make notes frequently; text is therefore the most popular type of resource. About 90% of learners first browse multimedia resources. Only 50% of learners participate in online exercises, and then only 1–5 times. About 60% of learners view the learning objectives before studying the course, generally fewer than 5 times. During the learning process, 80% of learners rest, listen to music, or chat on QQ 1–5 times. This shows that most learners have some sense of learning strategy. They are interested in multimedia resources but are more accustomed to learning by reading text resources. The level of online interactive learning behavior is low [16, 17]. Learning over the network easily causes fatigue and is affected by chatting and other factors.

4.4. Intelligent Learning Guide
4.4.1. Achievement Predicting

The MOOC is a popular form of E-Learning platform, and the pass rate of a course is an important measure of its effectiveness. In view of the low pass rate on MOOCs [15, 18], we aim to predict, for a given course, whether a learner will eventually obtain the certificate from the characteristics of the learner’s behavior. We also use this prediction to verify the preceding analysis and conclusions.

We suppose that the behavioral data were collected from the registered learners of the Data Structures and Algorithm Analysis (DSAA) course during the first 5–7 weeks. After the behavior of unregistered learners is filtered out of the data records, the sample statistics are as shown in Table 3.

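Suppose that each course $c$ has $n$ learners and that each learner $i$ has $m$ characteristic values:

$$x_i^{c} = (x_{i1}, x_{i2}, \ldots, x_{im}), \quad i = 1, 2, \ldots, n.$$

The predictive value is

$$y_i^{c} = f(x_i^{c}) \in \{0, 1\},$$

in which $c$ denotes a course; $y_i^{c} = 0$ means that the learner is unlikely to get a certificate, and $y_i^{c} = 1$ means that the learner might get a certificate; $f$ is a predictive function.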

We chose characteristic values of the course that influence the learner’s study results. From Table 2, they are the number of text learning behaviors ($x_1$), the number of multimedia learning behaviors ($x_2$), the number of online practice behaviors ($x_3$), the number of courseware download behaviors ($x_4$), and the number of online question behaviors ($x_5$).

For this course, the data set is randomly divided into a training set, a validation set, and a test set in the ratio 3 : 1 : 1. In each experiment, we fit the model parameters on the training set, select the optimal parameters on the validation set, and then compute the indicators on the test set. We used three classification models: linear discriminant analysis (LDA), logistic regression (LR), and linear support vector machine (LSVM). They are used to predict the outcome for the course, and the experimental results are shown in Table 4.
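
A random 3 : 1 : 1 split of this kind can be sketched as follows. The function name `split_3_1_1`, the fixed `seed`, and the decision to put any remainder into the test set are assumptions; the classifiers themselves are not shown.

```python
import random


def split_3_1_1(samples, seed=0):
    """Shuffle samples and split them into train/validation/test at 3:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 3 // 5
    n_val = len(items) // 5
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # takes any remainder
    return train, val, test


train, val, test = split_3_1_1(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Fixing the seed makes the split reproducible, so that the three classifiers are compared on the same partitions.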

The experimental results show that the three classifiers perform consistently and that their accuracy is high. Figure 5 shows how the predicted F-score for the DSAA course changes over time. From the learners’ behaviors in the first half of the semester, we could accurately predict the final study result, that is, whether the learner obtains the certificate [2]. In fact, a learner who performs well in the first half of the semester shows determination and ability and is more likely than others to obtain the certificate in the end.
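
For reference, the F-score combines precision and recall into a single number. The sketch below computes it for binary predictions, assuming that label 1 means the learner obtains the certificate and label 0 means the learner does not; the function name `f_score` and the sample labels are hypothetical and are not data from this study.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels, where 1 means "gets the certificate"."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels for eight learners (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(round(f_score(y_true, y_pred), 3))  # 0.75
```

With `beta=1.0` this is the usual F1-score, the harmonic mean of precision and recall.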

4.4.2. Emotional Guide Analysis

Achievement prediction can help the E-Learning platform discover abnormal situations so that it can intervene and guide students in time. Because online learners are mainly independent learners, they study in isolation and lack emotional communication; as a result, they lack emotional support and have difficulty maintaining long-term enthusiasm for learning [19]. An effective way to address this emotional deficiency is to construct an intelligent learning guidance mechanism in the student profile that provides emotional help and support services to learners during the learning process [20].

Based on the E-Learning resources and learners’ behaviors, we present an evaluation model that uses the duration, frequency of access, concentration, and other parameters to evaluate learners’ emotion, as shown in Figure 6.

5. Conclusion

In this paper, we study online learning behavior in depth and build the student profile with big data processing technology. First, we analyze the characteristics of learners and the factors that influence learning behavior and use attribute reduction to clean the data. Then, we calculate the similarity of students’ behavior and use the Jaccard coefficient algorithm to classify the students. Finally, we establish the student profile and analyze it visually. We confirm that an E-Learning course requires a definite objective, inner motivation, synchronous feedback, and independence on the part of the learners. The student profile helps students understand their learning situation, find their own problems, and improve the completion rate of online courses. As education data continue to accumulate and the field develops further, the student profile is bound to promote the healthy development of E-Learning. In the future, we will conduct an in-depth study of the online aggregation of fragmented knowledge.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the Tianjin University of Science and Technology Youth Innovation Foundation (no. 2016LG28).