Abstract

Internet addiction refers to excessive internet use that interferes with daily life. Because it harms college students’ study and life, it is necessary to discover students’ internet addiction tendencies and give them timely guidance. At present, however, the research methods used to analyze students’ internet addiction are mainly questionnaires and statistical analysis, which rely heavily on domain experts. Fortunately, with the development of the smart campus, students’ behavior data on campus, such as consumption and trajectory information, are now stored. With this information, we can analyze students’ internet addiction levels quantitatively. In this paper, we provide an approach to estimate college students’ internet addiction levels using their on-campus behavior data. Specifically, we treat a student’s addiction to the internet as a hidden variable that, together with other behavior, affects the student’s daily time online. By predicting students’ daily time online, we can infer their internet addiction levels. Along this line, we develop a linear internet addiction (LIA) model, a neural network internet addiction (NIA) model, and a clustering-based internet addiction (CIA) model to calculate students’ internet addiction levels. These three models take the regularity of students’ behavior and the similarity among students’ behavior into consideration. Finally, extensive experiments are conducted on a real-world dataset. The experimental results show the effectiveness of our method and are consistent with some psychological findings.

1. Introduction

Internet addiction disorder refers to excessive internet use that interferes with daily life [1]. Research shows that addiction to the internet has a negative impact on college students, such as falling behind in their studies and harm to their health and social relationships [1–3]. Therefore, it is necessary to discover students’ addiction tendencies towards the internet and to guide them appropriately.

At present, related work on internet addiction is concentrated in the psychological field. Such work focuses on the causes and influence of internet addiction, the internal mechanisms leading to it, and methods to eliminate it. Few works calculate internet addiction levels quantitatively. Moreover, the methods used for analysis are mainly questionnaires and statistical analysis, which are cumbersome and rely heavily on domain experts. Therefore, it is necessary to develop an approach that explores students’ internet addiction levels quantitatively and automatically.

Fortunately, with the development of the smart campus, students’ behavior data, such as internet access data and consumption data, are collected. With these data, it is possible to analyze students’ internet addiction levels quantitatively.

To this end, in this paper, we propose an approach to estimate students’ internet addiction levels using their behavior data. Currently, there is no method to evaluate students’ addiction levels precisely, so we cannot study them directly with supervised methods. Instead, we calculate students’ internet addiction levels through another task. Specifically, based on the definition of internet addiction, we consider a student’s internet addiction level to be a hidden variable that affects the student’s daily time online. Besides, students’ behavior data, such as consumption data and the internet access gap, reflect their daily activities, which may also influence the time they spend online. We can therefore predict students’ online time from their behavior data and internet addiction level, and through this task the internet addiction value can be inferred. Along this line, we propose a linear internet addiction (LIA) model, a neural network internet addiction (NIA) model, and a clustering-based internet addiction (CIA) model to capture the relationship between students’ behavior data, internet addiction, and the time they spend online every day.

Furthermore, students follow a fixed course schedule every week, which leads to regularity in the time they spend online each week. The LIA and NIA models take this regularity of students’ behavior into consideration, and the CIA model mainly uses the relationship among students’ behavior to learn their internet addiction levels. Finally, we conduct extensive experiments on a real-world dataset from a Chinese college, including internet addiction calculation, internet addiction verification, and internet addiction analysis experiments. In particular, to verify that the internet addiction values we calculate are credible, we compare our results with those evaluated from a psychological scale. The experimental results demonstrate the correctness and effectiveness of the proposed models and are consistent with some psychological findings.

2. Related Work

The main related work of this paper can be divided into two parts: internet addiction analysis and campus data mining.

2.1. Internet Addiction Analysis

Internet addiction analysis is a research direction in the psychological field. Some works focus on the causes of internet addiction. Researchers found that interpersonal difficulties, psychological factors, social skills, and other factors all contribute to internet addiction [1, 4, 5]. Other works aim to identify the influence of internet addiction. Upadhayay et al. claimed that excessive use of the internet leads students to fall behind in their studies [2]. He et al. explored the influence of internet addiction on sensitivity towards punishment and reward [6]. Their results show that people with serious internet addiction are more sensitive to risk. There are also some works on the inner mechanism by which internet addiction forms. Zhang et al. focused on why family function has a negative influence on internet addiction. They revealed that the stability and development of the family may affect users’ mental states, such as dignity and loneliness, and that these mental states in turn influence internet addiction [7]. Zhao et al. noticed that stressful life events make users feel depressed, which leads them to become addicted to the internet [8].

2.2. Campus Data Mining

Data are produced everywhere in our daily activities, for example, consumption records, chat records, and web browsing records. Such data enable interesting applications, such as tag recommendation, which suggests a list of tags when a user wants to annotate an item. Wang et al. proposed the TAPITF model to combine both time awareness and personalization in the tag recommendation task [9]. Campus data mining refers to solving problems on campus with data mining methods. Some works mainly analyze students’ daily behavior. Guan et al. predicted students’ financial hardship from their smart card usage, internet usage, and trajectories on campus (the Dis-HARD model) so that the school can offer those students stipend portfolios [10]. Based on this work, Ye et al. proposed a model [11] that predicts stipend portfolios with multimodal data. Their work has higher accuracy than the Dis-HARD model and protects students’ privacy. The Bayesian method is widely used in many fields. Wang et al. proposed a Bayesian probabilistic multitopic matrix factorization model for rating prediction [12]. Similarly, Zhu et al. proposed an unsupervised method under the framework of empirical Bayes to calculate students’ procrastination values from their library borrowing records [13]. Peng et al. proposed a deep topical correlation analysis approach that uses multimodal data to track students’ thoughts and serve the development of the smart campus [14]. Other works aim to analyze students’ studying process and improve their performance in class, which is called educational data mining (EDM). For example, Burlak et al. identified whether a student is cheating in an exam by analyzing the student’s interaction data with online course systems, such as start time, end time, IP address, and access frequency [15]. Abdi et al. predicted students’ grades based on their answers to regular assignments and the time they spent on each question [16].

In summary, to the best of our knowledge, there is no existing work that analyzes internet addiction using students’ daily behavior, and we are the first to analyze internet addiction from students’ behavior data with data mining methods.

3. Preliminaries

Internet addiction is an abstract concept in the psychological field, so it is hard to give a measurable definition of internet addiction. To solve this problem, we first make a reasonable assumption about internet addiction. Then, based on this assumption, we calculate the internet addiction value using students’ behavior data.

3.1. Internet Addiction Assumption

Psychological research shows that most college students are addicted to the internet [17]. As mentioned above, internet addiction refers to excessive use of the internet that interferes with daily life. Therefore, students with different internet addiction levels are very likely to spend different amounts of time online. Besides, different behaviors reflect different activities at school, which in turn lead to different online time. Students of different genders or departments will also differ somewhat in their internet use.

Based on these observations, we assume that internet addiction is a hidden factor that, together with students’ behavior and profile information, influences their daily time online. We therefore learn this factor by modelling how students’ internet addiction and behavior influence daily online time. To simplify the problem, we also assume that a student’s internet addiction level does not change within a semester.

3.2. Problem Formulation

Since we do not have any label about internet addiction level, we cannot use supervised methods to study students’ internet addiction value. Thus, we need to estimate it through some known data. Based on our assumption that the internet addiction value is a hidden variable, which may affect the time students spend online, the value can be learned by predicting students’ daily online time.

Formally, each student u has an internet addiction level. The daily time online of student u during a period T forms one sequence, and the daily behavior of u during the same period forms a second sequence. Each student u also has personal profile information. Our task is to model the relationship by which students’ behavior, profile information, and internet addiction level determine their daily time online. The internet addiction level can then be calculated from this model. Note that every time step t above is in the set T.
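One compact way to write this formulation is sketched below. The symbols $a_u$, $y_{u,t}$, $\mathbf{b}_{u,t}$, $\mathbf{p}_u$, and $f$ are notation assumed here purely for illustration and may differ from the symbols used elsewhere in this paper.

```latex
% Assumed notation: a_u is the hidden addiction level of student u,
% y_{u,t} the online time on day t, b_{u,t} the behavior vector on day t,
% p_u the profile vector, and f the relationship to be learned.
\[
  Y_u = \{\, y_{u,t} : t \in T \,\}, \qquad
  B_u = \{\, \mathbf{b}_{u,t} : t \in T \,\}, \qquad
  f : (\mathbf{b}_{u,t},\ \mathbf{p}_u,\ a_u) \longmapsto y_{u,t}, \quad t \in T .
\]
```

Under this reading, $a_u$ carries no time index, which matches the assumption that a student’s internet addiction level does not change within a semester.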

4. Internet Addiction Calculation Model

To calculate students’ internet addiction levels, we propose three internet addiction calculation models: the linear internet addiction (LIA) model, the neural network internet addiction (NIA) model, and the clustering-based internet addiction (CIA) model. For the LIA model, we mainly consider the linear relation between students’ behavior, internet addiction level, and daily online time. Furthermore, since neural networks are good at capturing higher-order relations among features, we explore the NIA model to capture the nonlinear relation between students’ behavior, internet addiction level, and daily online time.

As for the CIA model, instead of directly studying the relation between students’ behavior, internet addiction level, and their daily online time, we think that students who spend more time online than the normal online time are more likely to be addicted to the internet. So we devise a clustering-based method to find the normal online time and then regard the difference between students’ actual online time and the normal online time as their internet addiction level.

In this section, we first describe these three models in detail and then discuss the advantages and disadvantages of each model.

4.1. Linear Internet Addiction (LIA) Model

In this section, we first introduce how we use a linear model to capture the relationship defined in Section 3.2. Then, to strengthen the model, we take the regularity of students’ behaviors into consideration.

4.1.1. Naive LIA

Based on the internet addiction assumption, behavior is a factor that influences students’ online time. However, different kinds of behavior may have different effects. Therefore, a weight vector is necessary to represent the effect of each kind of behavior. We assume that the impact of behavior on online time does not differ across individuals, so all students share this weight vector. We deal with the different kinds of personal attributes in the same way. Besides, even if two students have the same behavior and personal attributes, they may still spend different amounts of time online because of the difference in their addiction levels towards the internet. We suppose that a different internet addiction level is the only reason for different online time given the same behavior and personal attributes. This gives our naive linear internet addiction model: the duration student u spends online at time t is modeled as the weighted sum of the combined behavior and personal attribute vector of student u at time t, using the shared weight vector, plus the internet addiction level of student u. Our task is to find the weight vector and the internet addiction levels that minimize the loss function.

The regularization term in the loss is used to prevent the model from overfitting, and the weighting coefficient can be used to adjust the balance between the behavior and internet addiction.
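The naive LIA idea, a shared linear term plus one scalar per student, can be sketched as follows. This is a minimal illustration and not the exact training procedure of this paper: the function name `fit_naive_lia`, the penalty names `lam_w` and `lam_a`, the mean squared error with L2 penalties, and the use of plain gradient descent are all assumptions.

```python
def fit_naive_lia(records, n_features, lam_w=0.0, lam_a=0.0, lr=0.05, epochs=2000):
    """Fit y ~ w.x + a_u by full-batch gradient descent.

    records: list of (student_id, x, y), where x is a list of n_features floats
             (behavior and profile features) and y is the observed online time.
    lam_w, lam_a: assumed L2 penalties on the shared weights and on the
                  per-student addiction levels (hypothetical names).
    Returns (w, a): the shared weight list and a dict student_id -> level.
    """
    w = [0.0] * n_features
    a = {u: 0.0 for u, _, _ in records}
    n = len(records)
    for _ in range(epochs):
        # Start each gradient from the penalty terms.
        grad_w = [2.0 * lam_w * wj for wj in w]
        grad_a = {u: 2.0 * lam_a * au for u, au in a.items()}
        for u, x, y in records:
            # Prediction error of the shared term plus the student's own level.
            err = sum(wj * xj for wj, xj in zip(w, x)) + a[u] - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_a[u] += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        a = {u: au - lr * grad_a[u] for u, au in a.items()}
    return w, a
```

In this sketch, two students with identical feature vectors receive different predictions only through their entries in `a`, which mirrors the assumption that the addiction level is the only source of that difference.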

4.1.2. LIA with Regular Behavior

College students usually have a fixed curriculum. Therefore, their behavior has some regularity every week, which also leads to regularity in the time they spend online. Take student u as an example: if the courses on Monday are less engaging, the student may spend a lot of time surfing the internet, whereas if the courses on Tuesday are hard, the student must pay attention in class and may not surf the internet. Based on such observations, it is necessary to take the regular online time into consideration.

So, we modify our linear internet addiction model by adding a term that represents the regular online time of student u at time t. Because of the weekly rhythm of college study, students show similar online habits every week. This term therefore depends only on the day of the week to which time t belongs, and each student has one regular online time for each day of the week. The new model is the naive model plus this day-of-week term.

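For convenience of calculation, we define an 8-dimensional vector whose first entry is the constant one, standing for the internet addiction, and whose remaining seven entries are a one-hot representation of the day of the week. The formula above is then equivalent to a linear model in which each student has an 8-dimensional private weight vector that is applied to this vector.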

Our task is to find the shared weight vector and the per-student weight vectors that minimize the loss function; the first entry of each per-student vector is the internet addiction level of student u.

Similarly, we add a regularization term to prevent the model from overfitting, and we use weighting coefficients to adjust the balance among behavior, personal attributes, internet addiction level, and regular habits.
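The 8-dimensional vector described above can be illustrated with a short sketch. The helper names `private_input` and `predict_online_time`, the parameter name `theta_u`, and the Monday-first ordering of the one-hot entries are assumptions made for this example.

```python
from datetime import date


def private_input(day_of_week):
    """Build the 8-dimensional vector: a leading 1 for internet addiction,
    followed by a one-hot encoding of the day of the week.

    day_of_week: 0 (Monday) .. 6 (Sunday); this ordering is an assumption.
    """
    if not 0 <= day_of_week <= 6:
        raise ValueError("day_of_week must be in 0..6")
    one_hot = [0.0] * 7
    one_hot[day_of_week] = 1.0
    return [1.0] + one_hot


def predict_online_time(w, x, theta_u, day):
    """Shared linear term plus the student's private term.

    w: shared weights, x: behavior and profile features,
    theta_u: hypothetical 8-dimensional private weights of one student, whose
             first entry plays the role of the internet addiction level,
    day: a datetime.date used to look up the day of the week.
    """
    z = private_input(day.weekday())
    shared = sum(wj * xj for wj, xj in zip(w, x))
    private = sum(tj * zj for tj, zj in zip(theta_u, z))
    return shared + private
```

Because exactly one of the seven one-hot entries is active on any given day, the private term reduces to the addiction level plus the regular online time of that weekday.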

4.2. Neural Network Internet Addiction (NIA) Model

Neural networks are able to model high-level relationships among features and are powerful in a variety of application scenarios [18–20]. For example, in the tag recommendation task, Yuan et al. utilized the multilayer perceptron to model the nonlinearities of interactions among users, items, and tags [21]. In this section, we develop a neural network internet addiction (NIA) model to represent the nonlinear influence of students’ behaviors, personal attributes, internet addiction, and regular behavior on their daily online time.

4.2.1. Network Structure

The neural network consists of two parts: the public part and the private part. We use the public part to represent the assumption that the effect of behavior and personal attributes on daily online time does not differ across individuals. The input of the public part is therefore the combination of the behavior vector of student u at time t and the student’s personal attribute vector. The weight matrix and the threshold vector of this part are updated at every iteration.

Because the internet addiction level and regular behavior differ across individuals, we use a private part to depict these characteristics. Every student has a separate weight matrix and threshold vector, and these parameters are updated only when the corresponding student’s data are used as the input. The private input of student u at time t is the same as vector (5). To ignore the influence of regular behavior, we can also keep only the first item of vector (5).

The target output of the model is the actual online time of student u at time t.

The structure of the network is shown in Figure 1.

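Using the symbols mentioned above, the output of the public hidden layer is obtained by applying the public activation function to the public input, weighted by the public weight matrix and offset by the public threshold vector. The output of the private hidden layer is computed in the same way from the private input, using the weight matrix and threshold vector of student u and the private activation function. The output of the network is obtained by applying the output activation function to the outputs of the two hidden layers, offset by the threshold of the output layer. The network is updated for every input, and the loss function we use is the mean square error between the actual online time of student u at time t and the output of the whole model.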
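A forward pass through a two-branch network of this kind can be sketched as below. Several details are assumptions made only for illustration: the sigmoid hidden activation, the identity output activation, subtracting the thresholds, concatenating the two hidden outputs before a single output unit, and all function and parameter names.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer(weights, inputs, thresholds, act):
    """One fully connected layer: act(W x - threshold), one row per unit."""
    return [
        act(sum(w * x for w, x in zip(row, inputs)) - th)
        for row, th in zip(weights, thresholds)
    ]


def nia_forward(public_in, private_in, W_pub, th_pub, W_priv_u, th_priv_u,
                w_out, th_out):
    """Predict online time from the shared (public) and per-student (private) branches.

    W_pub, th_pub: parameters shared by all students.
    W_priv_u, th_priv_u: parameters owned by one student (hypothetical names).
    w_out, th_out: output unit weights and threshold (assumed to be shared).
    """
    h_pub = layer(W_pub, public_in, th_pub, sigmoid)
    h_priv = layer(W_priv_u, private_in, th_priv_u, sigmoid)
    hidden = h_pub + h_priv  # list concatenation of both hidden outputs
    return sum(w * h for w, h in zip(w_out, hidden)) - th_out


def squared_error(y, y_hat):
    """Per-sample squared error; averaging it over samples gives the MSE."""
    return (y - y_hat) ** 2
```

During training, only `W_priv_u` and `th_priv_u` of the student whose record is being processed would receive updates in the private branch, while the public parameters are updated on every record.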

4.2.2. Internet Addiction Calculation

After the neural network training is completed, the sum of the contributions that internet addiction makes to the private hidden units is taken as the student’s internet addiction level. Specifically, the internet addiction value of student u is the sum, over all private hidden layer units i, of the entry in the i-th row and the j-th column of the student’s private weight matrix, which connects the input layer and the hidden layer of the private part. Here j is the index of internet addiction in the private input vector, which is one.
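The read-out described above amounts to summing one column of a student’s private input-to-hidden weight matrix. A minimal sketch follows, assuming the matrix is stored as one row per private hidden unit and using a zero-based column index, so that the first input position, which the text calls index one, becomes `j=0`.

```python
def addiction_from_private_weights(W_priv_u, j=0):
    """Sum the weights that connect input position j to every private hidden unit.

    W_priv_u: private weight matrix of one student, one row per hidden unit
              (this storage layout is an assumption).
    j: zero-based position of the internet addiction entry in the private input.
    """
    return sum(row[j] for row in W_priv_u)
```

With two private hidden units, as in the experimental setting reported later, the result is simply the sum of two weights per student.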

4.3. Clustering-Based Internet Addiction (CIA) Model

In this section, we develop a clustering-based method to calculate students’ internet addiction value, which takes the similarity among students’ behavior into consideration.

4.3.1. Internet Addiction Calculation

As the smartphone has become an indispensable part of students’ daily life, even students who are not addicted to the internet spend some time online, maybe for fun or just to kill time. However, those who are heavily addicted to the internet spend much more time online than those who are not. So, we believe that there is a normal online time corresponding to students’ behavior, and those who spend more time online than this tend to be internet addicts. The more time a student spends online compared with the normal online time, the heavier the internet addiction is. Therefore, in our online time prediction formula, the duration student u spends online at time t is predicted from the normal online time for student u at time t and the internet addiction level of student u. Our task is to find the internet addiction levels that minimize the loss function.

The weighting term in the loss is used to adjust the balance between the normal online time and internet addiction.
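If the prediction is taken to be additive (normal online time plus addiction level) and the loss is a squared error with an L2 penalty on the addiction level, the minimizer has a closed form. Both the additive form and this penalty are assumptions made for the sketch below, and `cia_addiction` and `lam` are hypothetical names.

```python
def cia_addiction(actual, normal, lam=0.0):
    """Closed-form minimizer of sum_t (y_t - n_t - a)^2 + lam * a^2 over a.

    actual: observed daily online times of one student.
    normal: normal online times for the same days.
    lam: assumed penalty weight; lam = 0 gives the mean excess online time.
    Setting the derivative to zero yields a = sum_t (y_t - n_t) / (T + lam).
    """
    if len(actual) != len(normal) or not actual:
        raise ValueError("actual and normal must be non-empty and equally long")
    excess = sum(y - n for y, n in zip(actual, normal))
    return excess / (len(actual) + lam)
```

A larger `lam` shrinks the estimate towards zero, which is one way such a weight could trade off the normal online time against the addiction term.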

4.3.2. Normal Online Time

Because students have different activities every day, the normal online time differs from behavior to behavior. To find the normal online time of student u at time t, we first need to find those who behave similarly to student u at time t. The average online time of those who behave similarly to student u at time t is approximately equal to the normal online time. That is, the normal online time is the average of the actual online times associated with the behavior vectors in the similar behavior set S of the behavior vector of student u at time t, where each vector in S is the behavior vector of some student at some time that is similar to that of student u at time t. Students from different departments may behave differently because of the characteristics of their disciplines, which leads to a slight difference in normal online time. For example, students from the software engineering department may tend to spend more time online than other students. So, we also take the profile information into consideration, and the vector compared here is the same combined behavior and personal attribute vector as in Section 4.1. The function sim (a, b) gives the similarity value of vector a and vector b.

To limit the amount of computation, we do not compare every behavior of all students at all times. Instead, we first aggregate students’ behavior into k categories. When we need to find the similar behavior set S of a behavior vector, we first find the category that the behavior vector belongs to, say category c, and then calculate the similarity between it and all the other behavior vectors in category c. Finally, we keep in the set S those behavior vectors whose similarity is greater than a threshold, and from this set we obtain the normal online time.
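The lookup described above can be sketched as follows, given cluster labels that have already been computed. Cosine similarity, the exclusion of the query record from its own neighbor set, and the function names are assumptions, since the text does not fix the similarity measure or the clustering algorithm; the default threshold of 0.7 is the value reported in the experiments.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def normal_online_time(query, vectors, online_times, labels, threshold=0.7):
    """Average online time of records that are similar to record `query`.

    vectors: one behavior (and profile) vector per student-day record.
    online_times: actual online time of each record.
    labels: precomputed cluster label of each record.
    Only records in the same cluster as the query are compared.
    Returns None when no record passes the threshold.
    """
    c = labels[query]
    similar = [
        i for i in range(len(vectors))
        if i != query and labels[i] == c
        and cosine_sim(vectors[query], vectors[i]) > threshold
    ]
    if not similar:
        return None
    return sum(online_times[i] for i in similar) / len(similar)
```

Restricting the comparison to one cluster reduces the number of similarity computations per query from the total number of records to the size of that cluster, at the cost of missing similar records that were assigned to a neighboring cluster.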

4.4. Model Comparison

The idea of LIA and NIA is direct, and the target of these two models is to find how students’ behavior and internet addiction level influence their daily online time. The LIA model is easier to train because it has fewer parameters than the NIA model. Though the NIA model is much more powerful, it is hard to train the network as there are so many parameters.

The idea of the CIA model conforms to the intuition that those who spend more time online than the normal time are more likely to be addicted to the internet. However, it is hard to find the normal online time. In this paper, we calculate the normal online time of a student u on a specific day t by averaging the online time of those students who behave similarly to u on t. The correctness of the internet addiction calculation may therefore be influenced by the precision of the clustering results.

5. Experiments

5.1. Data Description

Our data come from a Chinese college and include students’ consumption records in the school restaurant and their internet access records. They also include students’ personal attributes, such as department, gender, and age.

Each consumption record consists of the student’s profile and the time, place, and amount of one consumption. Students have various consumption behaviors, such as normal dining, snacks, showers, and deposits. Here we treat a deposit, which is adding money to the school card, as a special behavior. The behavior category can be identified from the place where the consumption takes place. For example, consumption in the school restaurant must be normal dining, and consumption in the bathhouse must be a shower. Therefore, we first divide the places into different categories and then extract from the consumption records the hourly amounts spent on dining, snacks, showers, and deposits, together with the total hourly amount. We also count students’ daily consumption frequency.

Besides, students can access the internet through the campus Wi-Fi only after they are authenticated. Based on the authentication records, we extract the time a student is connected to the campus Wi-Fi per hour, which approximates the time the student spends on campus. Similarly, each time a student visits a website, a connection record is generated, and when the visit is completed, a disconnection record is generated. Based on these records, we can extract the student’s actual online time and the average gap between two consecutive internet accesses per day. After feature extraction, by combining the daily consumption behavior and online behavior (with the actual online time excluded), the behavior of a student in a day can be represented as a vector. We also represent every student with a one-hot encoding of their profile information.
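The two online features mentioned above can be derived from paired connection and disconnection records roughly as follows. The record layout (one `(connect, disconnect)` pair of timestamps in seconds per visit), the assumption that visits do not overlap, and the definition of a gap as the time from one disconnection to the next connection are choices made for this sketch, and `daily_online_stats` is a hypothetical name.

```python
def daily_online_stats(sessions):
    """Compute (total online time, average access gap) for one student on one day.

    sessions: list of (connect_ts, disconnect_ts) pairs in seconds,
              assumed not to overlap.
    The gap between two accesses is measured from the end of one session
    to the start of the next; with fewer than two sessions it is 0.0.
    """
    ordered = sorted(sessions)
    online = sum(end - start for start, end in ordered)
    gaps = [
        nxt_start - prev_end
        for (_, prev_end), (nxt_start, _) in zip(ordered, ordered[1:])
    ]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return online, avg_gap
```

The total online time would serve as the prediction target, while the average gap would enter the daily behavior vector as a feature.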

Internet access records from the dormitory and library are not available to us. We consider that students’ activities are mainly concentrated around classrooms, canteens, and some student activity centers. In class, students need to listen to the teacher most of the time, while at the restaurant they often use their phones to kill time. Therefore, the actual online time we extract mainly reflects entertainment. Intuitively, entertainment time is suitable for calculating the internet addiction level.

We choose the records of undergraduate students enrolled in 2015 and 2016, covering the period from September 1, 2018 to November 11, 2018. After dropping students with fewer than 35 days of records, 3767 students remain. The first 50 records are used for training, and the remaining records are used for testing. Students’ profile representation and daily behavior vector are shown in Table 1.
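The filtering and chronological split can be sketched as below. Applying the split per student, sorting each student’s records by date, and the function name are assumptions, because the text does not say whether the first 50 records are counted per student or overall.

```python
def split_student_records(records_by_student, min_days=35, n_train=50):
    """Drop students with too few daily records, then split each student's
    records chronologically into a training part and a test part.

    records_by_student: dict student_id -> list of (day, features, online_time),
                        where `day` is any sortable date value.
    Returns (train, test), two dicts keyed by student_id.
    """
    train, test = {}, {}
    for student, records in records_by_student.items():
        if len(records) < min_days:
            continue  # fewer than min_days of records: student is dropped
        ordered = sorted(records, key=lambda r: r[0])
        train[student] = ordered[:n_train]
        test[student] = ordered[n_train:]
    return train, test
```

A chronological split of this kind keeps every test day later than every training day of the same student, so the evaluation does not use future records to predict the past.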

5.2. Internet Addiction Calculation

LIA, NIA, and CIA models can be used to study the internet addiction level by predicting students’ online time every day. To show the correctness of our models, we conduct several experiments.

For the LIA and NIA models, we conduct three experiments. The first experiment removes the internet addiction and regular behavior parts of the LIA and NIA models and predicts students’ daily online time using only their behavior data and profile information, which serves as a baseline. The second experiment takes only internet addiction into consideration. For LIA, this means using the naive LIA model, and for NIA, it means that the input of the private part has only one item. The last experiment takes both internet addiction and regular behavior into consideration. For LIA, this means using the LIA with regular behavior model, and for NIA, it means that the input of the private part has 8 items. For the CIA model, we conduct two experiments. The first experiment uses the average online time in the similar behavior set S as the prediction, which serves as a baseline. The other experiment first calculates the internet addiction value of each student using equation (14) and then predicts students’ online time from their neighbors’ actual online time and the internet addiction value using equation (13).

For the linear model, the value of is set to 0.6, and is set to 0.4. For the neural network model, the activation function of the hidden layer is , and the activation function of the output layer is . In addition, the number of public hidden layer units is 10, and the number of private hidden layer units is 2. The learning rate is set to 0.01, and the number of the epoch is 40. Note that for the third experiment of the NIA model, we set the learning rate to 0.05 which will get the best prediction accuracy. For the clustering-based model, the threshold is set to 0.7 and the cluster number is set to 50. The MSE performance of each method is shown in Table 2.

From the results in Table 2, we see that for every model the prediction accuracy increases under our internet addiction assumption. These results support the correctness of the assumption. However, for the LIA and NIA models, adding the assumption of regular behavior does not improve the accuracy compared with the results without it. One possible reason is that there is some volatility in students’ behavior that LIA and NIA are not able to model. In general, the results of the neural network model and the clustering-based model are worse than those of the linear model. This may be because the linear model is strong enough to represent the relationship between students’ behavior, internet addiction, and online time, whereas the neural network model has too many parameters and is not easy to train. Although clustering students’ behavior into several categories before calculating the similarity reduces the computational complexity, the prediction results then depend on the clustering results, which may introduce some error. The bias of the clustering results may be one reason for the worst prediction accuracy of the CIA model.

5.3. Internet Addiction Verification

In this section, we conduct experiments to verify the correctness of the methods we propose. First, we show the consistency between the internet addiction values calculated by our models and the values evaluated through a psychological scale. Then, we devise regression and classification tasks to verify the critical role that the calculated internet addiction value plays in the daily online time prediction task.

5.3.1. Comparison with the Psychological Scale

In psychology, researchers usually use an internet addiction scale to measure whether people are addicted to the internet. Therefore, we use a questionnaire to test whether a student is an internet addict and compare the results obtained from the questionnaire with those obtained by our method.

Considering the national conditions of China, we choose the internet addiction scale devised by Professor Fan [22], which is widely used in Chinese psychological research. As the situation today is not exactly the same as that of several years ago, we remove some questions from that scale and keep only five necessary questions. We use a 4-point Likert scale to measure the degree of each question. See Table S1 in the Supplementary Material section for the details of the scale we used.

After distributing the questionnaire to students, we collected 128 completed questionnaires, which are enough to analyze students’ internet addiction levels in the psychological field. The respondents consist of 78 males and 50 females, with 81 students in grade 3 and 47 students in grade 4, which shows that the sample is reasonably balanced.

To show the effectiveness of the new scale, we calculate its reliability and validity, which are the two dimensions used in psychology to test whether a scale is credible. The reliability and validity of our scale are 0.789 and 0.731, respectively. Higher values indicate a better scale, and values above 0.7 mean that our scale is credible enough to test internet addiction.

On the principle of voluntariness, we did not force students to write down their student ID or name. Since only 39 students volunteered their student ID, we compare the results judged by the psychological scale with those of our methods for these students. There are five questions on our scale. Because we use a 4-point Likert scale, the maximum total score is 20. The higher the score a student gets, the more likely this student is addicted to the internet. We define students whose score is less than ten as not addicted to the internet and the others as internet addicts. For the results calculated by the LIA model, we consider students whose value is greater than or equal to 0.45 to be addicted to the internet. The thresholds of the NIA and CIA models are set to 0.5 and 0.35, respectively. The values 0.45, 0.5, and 0.35 are approximately the average value of the corresponding method. We use the F1 score to evaluate the consistency between the results of the LIA model, the NIA model, the CIA model, and the psychological scale. The results are shown in Table 3.
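The comparison procedure above can be sketched as follows: scale scores and model values are each turned into binary labels, and the F1 score measures their agreement. The function names are hypothetical, treating the scale labels as the reference labels is an assumption, and the cutoffs are the ones stated in the text.

```python
def scale_label(item_scores, cutoff=10):
    """True (addicted) when the total of the five item scores reaches the cutoff."""
    return sum(item_scores) >= cutoff


def model_label(value, threshold):
    """True (addicted) when the model's addiction value reaches the threshold,
    e.g. 0.45 for LIA, 0.5 for NIA, and 0.35 for CIA."""
    return value >= threshold


def f1_score(reference, predicted):
    """F1 of the positive (addicted) class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)
```

For example, `f1_score([scale_label(s) for s in answers], [model_label(v, 0.45) for v in lia_values])` would compare the scale with the LIA model, where `answers` and `lia_values` are placeholder names for the matched students’ questionnaire answers and model outputs.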

From Table 3, we see that the internet addiction values calculated by all three models are consistent with the results evaluated from the psychological scale. In particular, although the CIA model performs poorly in the internet addiction calculation task, its internet addiction values are more consistent with the psychological scale results than those of the NIA model. These results support the correctness of our methods and suggest that the relationship among behaviors is an important factor when calculating the internet addiction value.

5.3.2. Online Time Prediction

Based on our assumption, internet addiction is a hidden variable, which will influence students’ daily time online. Therefore the learned internet addiction value should be a useful feature to predict students’ online time. We devise two tasks to verify the correctness of our learned internet addiction value.

The aim of the regression task is to predict students’ daily online time. The baseline experiment takes the daily behavior vector and the profile information as the input. The contrast experiment predicts the daily online time using students’ internet addiction value, daily behavior vector, and profile information. The classification task is similar to the regression task. First, the records are divided into two parts: one part with online time greater than or equal to the average online time and the other part with online time less than the average online time. The aim of the classification task is to predict which part a record’s online time belongs to. The experiment settings are the same as in the regression task. The methods used in the regression and classification tasks are the decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), bagging, and extremely randomized trees (ET).

Mean squared error (MSE) is used as the evaluation metric for the regression task, and the F1 score for the classification task. The results are shown in Tables 4 and 5.
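As a minimal sketch of the experimental setup, the snippet below builds the baseline and contrast inputs, derives the binary labels from the average online time, and computes the mean squared error. All function names and numbers are hypothetical illustrations; the actual feature vectors and learners used in the experiments are not reproduced here.

```python
def build_inputs(behavior, profile, addiction=None):
    """Concatenate behavior and profile features per record.

    When `addiction` is given, its value is appended as an extra feature,
    which corresponds to the contrast setting; otherwise this is the baseline.
    """
    rows = []
    for i in range(len(behavior)):
        row = list(behavior[i]) + list(profile[i])
        if addiction is not None:
            row.append(addiction[i])
        rows.append(row)
    return rows


def binarize_by_mean(online_times):
    """Label a record 1 if its online time is >= the average, else 0."""
    mean_time = sum(online_times) / len(online_times)
    return [1 if t >= mean_time else 0 for t in online_times]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records, not from the paper.
behavior = [[0.1, 0.2], [0.3, 0.4]]
profile = [[1], [0]]
addiction = [0.5, 0.4]

baseline_rows = build_inputs(behavior, profile)
contrast_rows = build_inputs(behavior, profile, addiction)
labels = binarize_by_mean([1.0, 3.0, 2.0, 6.0])
```

Any of the listed learners could then be trained once on `baseline_rows` and once on `contrast_rows`, so that the only difference between the two runs is the added internet addiction feature.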

From Table 4, we observe that, for the regression task, the SVM model yields a very large mean squared error. One possible reason is that SVM is not suitable for this task, so we ignore the SVM results in the discussion below. After adding the internet addiction value calculated by the LIA or CIA model, the prediction accuracy of every method improves. After adding the internet addiction value calculated by the NIA model, the improvement is not as remarkable as with the values calculated by the LIA or CIA model, but most of the methods still improve.

For the classification task, no matter which internet addiction value is added to the behavior vector, the performance of every method improves evidently, except for SVM, whose performance does not change.

Generally speaking, after adding the internet addiction value calculated by LIA, NIA, or CIA, the results of both the regression and classification tasks improve remarkably, which shows the effectiveness of the internet addiction value learned by the models we propose.

5.4. Internet Addiction Analysis

To show the internet addiction situation in college, we analyze the distribution of internet addiction and the differences in internet addiction level among groups, such as genders and departments. Because the naive LIA model has the best prediction accuracy when learning students’ internet addiction value, and the value learned through the naive LIA model is the most consistent with the psychological results, the following analysis is based on the value calculated by the naive LIA model.

5.4.1. Internet Addiction Distribution

Figure 2(a) illustrates the number of students with respect to the calculated internet addiction value. The greater the internet addiction value, the more serious the student’s addiction to the internet. We observe that the distribution of internet addiction values is similar to a normal distribution. To show the distribution more clearly, we remove the values greater than 0.7 or less than 0.2; the result is shown in Figure 2(b). If we regard internet addiction values less than 0.45 as normal, Figure 2(b) shows that most of the students are addicted to the internet to varying degrees.
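The trimming and counting steps behind such a distribution plot can be sketched as below. The helper names, the bin count, and the sample values are hypothetical; only the range limits (0.2 and 0.7) and the 0.45 cut-off are taken from the text.

```python
def trim(values, low=0.2, high=0.7):
    """Keep only values inside [low, high], as done for the zoomed-in view."""
    return [v for v in values if low <= v <= high]


def histogram(values, low, high, bins):
    """Count values in `bins` equal-width intervals covering [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Values equal to `high` fall into the last bin.
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts


def share_at_or_above(values, threshold=0.45):
    """Fraction of values at or above the addiction threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


# Hypothetical internet addiction values, not from the paper.
values = [0.1, 0.25, 0.45, 0.52, 0.65, 0.9]

trimmed = trim(values)
counts = histogram(trimmed, 0.2, 0.7, bins=5)
share = share_at_or_above(trimmed)
```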

5.4.2. Internet Addiction Differences among Groups

To reveal the differences in internet addiction between genders, we compute the average internet addiction value and the average online time of each gender. Figure 3 shows that girls spend more time on the internet than boys. However, boys are more addicted to the internet than girls. This result is consistent with findings in the psychological field. Wei et al. investigated the internet addiction situation of college students at Hubei Polytechnic University using questionnaires.

They point out that boys are usually not good at communication, and therefore communication in real life is not enough to meet their actual communication needs. Communication mediated by the network is easier to control; that is, boys can improve the quality and quantity of their communication in this way, which meets their communication needs. Besides, girls are better than boys at time management and manage their time online more reasonably. Therefore, boys are more addicted to the internet than girls [23]. The consistency with these psychological findings further supports the correctness of the internet addiction value we learned.

Figure 4(a) illustrates the average internet addiction level of different departments. In general, except for a few departments whose internet addiction level is extremely high, the level fluctuates around 0.43. Furthermore, we statistically analyze the differences in internet addiction levels among students in different disciplines. In Figure 4(b), we observe no significant difference in internet addiction levels among students in different disciplines. This result is also consistent with the psychological finding in [23]. Experiments conducted by Wei et al. demonstrate that, although there are some differences in interpersonal health and time management ability among students in different disciplines, these differences are not significant, and neither is the difference in internet addiction. This consistency with psychological findings is further evidence of the effectiveness of the internet addiction value we learned.
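The group-level comparisons in this subsection amount to averaging a field within each group. A minimal sketch, assuming records are stored as dictionaries with hypothetical keys `gender`, `addiction`, and `online_time` (the values are made-up toy data, not the study's results):

```python
from collections import defaultdict


def group_means(records, key, field):
    """Average `field` over the records of each group defined by `key`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r[key]] += r[field]
        counts[r[key]] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical toy records, not from the paper.
records = [
    {"gender": "female", "addiction": 0.40, "online_time": 3.0},
    {"gender": "female", "addiction": 0.44, "online_time": 4.0},
    {"gender": "male", "addiction": 0.50, "online_time": 2.0},
    {"gender": "male", "addiction": 0.40, "online_time": 3.0},
]

addiction_by_gender = group_means(records, "gender", "addiction")
time_by_gender = group_means(records, "gender", "online_time")
```

Replacing the `gender` key with a department or discipline key would give the per-department and per-discipline averages in the same way.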

5.4.3. Effect of Internet Addiction on Online Time

The decision tree is a classical machine learning model. It is good at classification and regression tasks, and it is interpretable. Therefore, the decision tree model has plenty of applications in various fields [24–26]. To show the role internet addiction plays when predicting students’ online time, we extract students’ daily Wi-Fi access time, consuming amount, consuming frequency, average internet access gap, and actual online time. Then we conduct two binary classification experiments using the classification and regression tree (CART) method: one predicts the online time interval from daily Wi-Fi access time, consuming amount, consuming frequency, and average internet access gap, and the other predicts the online time interval from the same four features plus the internet addiction value. Because the whole tree is too big to be shown here, we select two representative branches. Note that all the values are normalized. The average values of the internet addiction value, consuming amount, consuming frequency, Wi-Fi access time, internet access gap, and online time are 0.45, 0.009, 0.044, 0.062, 0.004, and 0.015, respectively.

From Figure 5(a), we see that Wi-Fi access time and average internet access gap are important features when predicting the online time. This is consistent with the intuition that less Wi-Fi access time and a longer internet access gap lead to less online time. Figure 5(b) illustrates that, once the internet addiction value is added, it becomes critical for predicting daily online time. In particular, in this branch, a relatively high internet addiction value is one reason for long online time.
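To illustrate why an added feature can take over a branch, the sketch below searches for the single best Gini-impurity split, which is the criterion a CART-style classifier typically uses at each node. It is not the tree from Figure 5: the function names and the four toy records are hypothetical, and only one split is evaluated rather than a full tree.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, None, parent_gini) if no split lowers the impurity.
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, thr, score)
    return best


# Hypothetical normalized toy data: column 0 is Wi-Fi access time,
# column 1 is the internet addiction value; label 1 means long online time.
rows_full = [[0.02, 0.30], [0.05, 0.60], [0.06, 0.35], [0.09, 0.55]]
rows_base = [[r[0]] for r in rows_full]  # same records without the addiction value
labels = [0, 1, 0, 1]

base_feature, base_thr, base_gini = best_split(rows_base, labels)
full_feature, full_thr, full_gini = best_split(rows_full, labels)
```

In this toy example the split on Wi-Fi access time alone leaves some impurity, whereas the split on the internet addiction value separates the two classes completely, which mirrors the qualitative pattern discussed above.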

5.4.4. Effect of Internet Addiction on Grade

Psychological research shows that internet addiction damages students’ studies [1]. To show the negative influence of internet addiction and to verify the correctness of the internet addiction value we calculated, we compare the grades of students who are addicted to the internet with those of students who are not.

As mentioned before, only 39 students volunteered their student ID, and one of them does not have any grade records, so the analysis in this part is based on the grades of the remaining 38 students.

First, we define students whose internet addiction values are greater than or equal to 0.45 as internet addicts, and the others as non-addicts, and we divide the students into two groups accordingly. Then we calculate each student’s average grade point for the second semester of 2018. Finally, we compute the average grade point of each group and count the students in each group who failed at least one course. The average grade point of each student is calculated with the formula below:

$$\bar{g}_u = \frac{\sum_{c \in C_u} w_c \, g_{u,c}}{\sum_{c \in C_u} w_c},$$

where $\bar{g}_u$ refers to the average grade point of student $u$ for the second semester of 2018, $C_u$ stands for all the courses student $u$ takes in this semester, $w_c$ is the credit of course $c$, and $g_{u,c}$ is the grade point student $u$ gets in course $c$.
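The grouping and the credit-weighted average described above can be sketched as follows. The data layout is an assumption made for illustration: each course is a hypothetical `(credit, grade_point, passed)` tuple, the pass/fail flag is assumed to be available in the grade records, and the three students are made-up examples.

```python
def weighted_gpa(courses):
    """Credit-weighted average grade point over (credit, grade_point, passed) tuples."""
    total_credit = sum(credit for credit, _, _ in courses)
    return sum(credit * gp for credit, gp, _ in courses) / total_credit


def group_summary(students, threshold=0.45):
    """Split students by addiction threshold and summarize each group."""
    groups = {"addicted": [], "not_addicted": []}
    for s in students:
        name = "addicted" if s["addiction"] >= threshold else "not_addicted"
        groups[name].append(s)

    summary = {}
    for name, members in groups.items():
        gpas = [weighted_gpa(m["courses"]) for m in members]
        # A student counts as failed if at least one course was not passed.
        failed = sum(
            1 for m in members
            if any(not passed for _, _, passed in m["courses"])
        )
        summary[name] = {
            "students": len(members),
            "mean_gpa": sum(gpas) / len(gpas) if gpas else None,
            "failed": failed,
        }
    return summary


# Hypothetical toy data, not from the paper.
students = [
    {"addiction": 0.50, "courses": [(3, 2.0, True), (2, 0.0, False)]},
    {"addiction": 0.40, "courses": [(3, 4.0, True), (1, 3.0, True)]},
    {"addiction": 0.45, "courses": [(2, 3.0, True), (2, 2.0, True)]},
]

summary = group_summary(students)
```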

The analysis results are shown in Table 6.

From this table, we see that almost half of the students are addicted to the internet. The average grade point of students who are addicted to the internet is significantly lower than that of the other students. More students failed an exam in the internet addict group than in the other group. These statistics conform to the psychological finding that internet addiction has a negative influence on students’ studies. Such results further support the correctness of the internet addiction value we calculated.

6. Conclusions

In this paper, we estimate college students’ internet addiction levels quantitatively using their behavior data on the campus. Specifically, we define the internet addiction value as a hidden variable which will affect students’ online time and formulate the problem as a regression problem.

Along this line, we first propose a linear internet addiction (LIA) model, which depicts the linear relationship among students’ internet addiction level, behavior data, and time they spent online. To model the nonlinear relationship, we also provide a neural network internet addiction (NIA) model. Besides, we also develop a clustering-based internet addiction (CIA) model, which calculates the internet addiction based on the differences between students’ actual online time and normal online time. These three models also take students’ regular behavior and the similarity among students’ behavior into consideration.

Finally, we conduct extensive experiments on a real-world dataset from a Chinese college, and the experimental results demonstrate the effectiveness of our models. The analysis results are consistent with some psychological findings, which also supports the correctness of the models we propose.

Data Availability

The behavior data used to support the findings of this study have not been made available due to privacy concerns.

Disclosure

This paper is an extension of the paper “Using Behavior Data to Predict the Internet Addiction of College Students” [27], which was published at the International Conference on Web Information Systems and Applications (WISA) in 2019.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (no. 2017YFC0803700) and NSFC grants (nos. 61532021 and 61972155).

Supplementary Materials

Table S1. Internet usage survey of college students. (Supplementary Materials)