Abstract

To address the problem of matching supply and demand in online reading, this paper proposes a method for analyzing children’s online reading behavior oriented to family education. A data-driven classification method is constructed to classify the sample population statistically, using K-medoids clustering and logistic regression analysis alongside the traditional index classification. The two classifications are compared to assess how well the population groupings match. R and Mplus are used to analyze the data and to classify the sample data set objectively. Based on the response behavior of children’s online reading users, a differential item functioning (DIF) test of socioeconomic status is carried out. The population is also divided by traditional economic classification indicators, and a DIF test is carried out to explore differences in reading ability between the resulting groups. By comparing the results of the two grouping methods, the main family socioeconomic status factors affecting reading performance are identified and targeted countermeasures are proposed. The experimental results show that, when analyzing children’s online reading behavior, machine learning algorithms such as cluster analysis and logistic regression analysis yield consistent groupings, and a subsequent DIF test on the responses of the resulting groups can effectively distinguish group differences.

1. Introduction

In the information society, the reach of computer networks and digital technology keeps widening, and they have become a necessity for people’s daily communication and reading [1–7]. For the younger generation growing up with the Internet, known as the “net generation,” online reading has become an indispensable way of reading [8, 9].

“Demand” refers to the various needs for objective things that people (individuals, groups, strata, and society as a whole) develop in order to maintain their own growth and continuity. The whole process of users purchasing and using products is a process of meeting their needs, during which old and new needs may appear alternately. User needs generally have the following characteristics. Explicit and implicit needs coexist: explicit needs are those that users themselves clearly know and can express, whereas implicit needs are those that users cannot express or even perceive. Because every user is different, needs also show both individuality and commonality. In addition, needs are hierarchical and fuzzy. What users demand from online reading is high-quality reading content, and different kinds of users have different needs. As reading belongs to the humanities, it is especially susceptible to the influence of multidimensional index systems such as regional culture and economic level. Therefore, appropriate methods are needed to carry out family education oriented analysis and to study the impact of regional and economic differences on children’s online reading behavior.

Differential item functioning (DIF) [10–12] has received increasing attention. From the initial fairness research to the consideration of the validity and reliability of the test itself, DIF research has always played an important role. DIF occurs when subjects of the same ability from different groups have different response probabilities on the same item, which indicates that the item is biased. With continued exploration, DIF methods have become numerous and have developed rapidly, moving in a more comprehensive and scientific direction. DIF analysis is also increasingly used in psychometrics, language testing, intelligence testing, educational evaluation, and other fields to detect item-level bias.

Online reading users come from all over the world, with different economic and cultural backgrounds, and their reading needs vary from person to person. Based on an online reading comprehension test for grade 2 of primary school developed by an internet education enterprise, this paper surveys grade 2 primary school students in 19 provinces, cities, and autonomous regions and collects 1309 valid responses.

The technical approach of this study rests on the following idea: when examining differences in reading ability between groups, machine learning algorithms such as cluster analysis, latent category analysis, the k-nearest neighbor algorithm, discriminant analysis, and logistic regression analysis yield more consistent groupings, and a subsequent DIF test on the responses of the resulting groups can effectively distinguish group differences.

2.1. Gender Differences in Reading

Breland and Lee [13] examined the reading, writing, and listening scores of male and female candidates in the English Language Ability (ELA) test and found significant gender differences in the writing section, which favored boys. In the PISA 2009 reading literacy assessment, Chen and Jiao [14] took gender as the traditional explicit grouping variable and found five moderate DIF items, which were clearly biased towards boys. In another gender study on reading literacy, Aryadoust [15] used recursive partitioning Rasch trees to investigate the sources of DIF in a reading comprehension test, with gender as one of the grouping variables. In that test, candidates with high grammar scores were affected by gender differences, and girls were at a disadvantage. It can be seen that studies of gender DIF in reading tests show a consistent bias, which may be related to the objective and fixed nature of gender grouping.

2.2. Socioeconomic Status Differences in Reading

Chen and Jiao [14] explored DIF items in the PISA 2009 reading literacy assessment with economic, social, and cultural status (ESCS) as a nontraditional explicit grouping variable and found three DIF items that favored subjects with high ESCS. Cadime et al. [16] took urban and rural areas as the dividing standard of economic level, tested the DIF items of a reading test for Portuguese students in high and low economic level groups, and found five moderate DIF items that favored students in the high economic level group from cities. Little et al. [17] showed that living in neighborhoods with poor economic conditions also predicts lower reading test scores. Leu et al. [18] analyzed students in economically developed and economically less developed school districts separately, and the results showed that the level of economic development significantly predicted students’ online reading ability. Morrow [19] argues that differences in the regional economic situation of schools are the main reason for differences in the strategies middle school students apply in online reading. However, some studies have pointed out that family economic status has no significant impact on children’s reading performance [20].

In conclusion, gender is the most basic and important traditional explicit grouping variable in DIF analysis. Because the grouping boundary is clear, the DIF results are usually consistent, and in some areas of reading comprehension tests the items tend to favor boys. However, when the grouping variables are nontraditional explicit grouping variables such as socioeconomic status and cultural region, the results often differ or even contradict one another. This may be related to the uncertainty of nontraditional explicit grouping variables.

An analysis of the concept and composition of socioeconomic status shows that, with the progress of measurement technology and the accumulation of theoretical results, the conceptual boundary of socioeconomic status as a multi-index system has gradually blurred and its scope has gradually expanded. Researchers cannot be fully consistent in choosing SES classification indices, and because the classification directly affects the DIF test, they obtain a variety of DIF results.

Summarizing the above research, when researchers explore differences in the online reading ability of different types of users, they usually need to classify the users first. When a multi-index system (such as SES) is selected as the basis for classification, the classification is usually empirical and subjective, which reduces the reliability of the results. When objective criteria (such as DIF) are then used for analysis, more consistent conclusions can be obtained, but the multiple indicators (family income, parents’ education level, and parents’ occupational status) cannot be separated. The main question of this study is whether cluster analysis and latent category analysis can be used to classify online reading users objectively under multiple indicators, with DIF then used to explore the responses of the resulting groups. This study takes the online reading test of grade 2 of primary school as an example to investigate the influence of socioeconomic status on students’ online reading test results and thereby to analyze children’s online reading behavior oriented to family education.

3. Overall Research Framework

Reading is not only an important way for people to obtain information but also a basic way to improve their literacy. Reading ability determines a person’s store of knowledge to a certain extent. With the rapid spread of online reading in particular, users’ demand for high-quality content is increasing. The key to developing high-quality content is to meet the reading needs of different people and to develop content and test items suited to different ages and levels. Analyzing children’s online reading behavior for family education is therefore particularly important, and DIF analysis is the most appropriate method for checking whether content and items are suitable for different groups.

The population of online reading users is diverse and complex, unlike a well-defined population organized by school class. Moreover, the socioeconomic status (SES) discussed in this study is usually not a single grouping variable but a compound or multidimensional one, that is, a multi-index evaluation system. There is no unique standard for the population categories, so the population needs to be classified objectively with statistical classification methods based on the data itself.

The above literature analysis shows that the biggest advantage of cluster analysis is that, when the population has no clear classification, it can classify subjects according to the real characteristics of the data itself. In addition, latent category analysis is a common method for classifying latent variables. Therefore, this study first classifies the subjects using K-medoids [21–23], a more robust clustering method, and then verifies the result with latent category analysis to confirm that the statistical two-group classification is reasonable. The factors influencing reading ability have long been a topic of continuing research. Based on the K-medoids grouping and taking the grade 2 online reading test as an example, this paper carries out a DIF test to investigate the impact of socioeconomic status on students’ responses to the reading test, and thereby analyzes children’s online reading behavior oriented to family education.

The main significance of this study is as follows:
(1) This study explores and empirically applies quantitative research methods to user analysis, offers a new approach in a field that focuses on qualitative research, makes research on user demand clearer from a multidimensional perspective, and obtains an objective and scientific division of the population on the basis of statistical classification methods.
(2) This study explores the important family socioeconomic status factors affecting users’ reading ability and can provide corresponding countermeasures and suggestions for vulnerable reading users. It can therefore help enterprises improve user satisfaction with the product experience and increase user stickiness and retention.

The content of this study is divided into the following two parts:
(1) To analyze the online reading needs of users with different socioeconomic statuses, the population must first be classified. Because traditional SES classification indicators are so varied, statistical methods are needed to preserve the true characteristics of the response data and achieve objective grouping. R is used to produce the cluster plot under the K-medoids method, and latent category analysis (LCA) is then used to verify the reliability and stability of the clustering results.
(2) Based on the K-medoids grouping, a DIF test is carried out to study how children at different economic levels differ on the reading test, and the influence of family socioeconomic status on students’ responses to the reading test is examined. Reasonable suggestions are put forward for enterprises according to the DIF results. When the reading needs of the two groups cannot be met at the same time, the focus is on matching the reading materials offered to each group’s economic situation.
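The clustering step in part (1) can be illustrated with a minimal K-medoids sketch. This is not the authors’ R code: the two-indicator family profiles, the Manhattan distance, the first-k initialization, and the names `k_medoids`, `assign`, and `families` are all assumptions made for illustration, and the update rule is the simple alternating variant rather than the full PAM swap search.

```python
def manhattan(p, q):
    """City-block distance between two indicator vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, medoids, dist):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, dist=manhattan, max_iter=100):
    """Alternating K-medoids: assign points, then re-pick each medoid."""
    medoids = list(range(k))  # assumption: start from the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to all other members of its cluster.
            updated.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members),
            ))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids, dist)


# Hypothetical SES profiles: (income level, parental education level).
families = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, labels = k_medoids(families, k=2)
print(medoids, labels)
```

Because every medoid is an observed family rather than an averaged centroid, the grouping is less sensitive to extreme values, which is the robustness property the text attributes to K-medoids.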

The overall research framework is shown in Figure 1.

4. Research Methods

The main purpose of this study is to analyze the reading ability level of the two groups of subjects with high and low economic level by using the DIF test. By clarifying the differences, the influencing mechanism of family socioeconomic status is explored and the corresponding improvement measures are put forward.

4.1. DIF
4.1.1. MH Method

The reading test in this study consists of objective items with dichotomous (0/1) scoring. Therefore, the Mantel–Haenszel (MH) method [24], one of the most widely used methods for DIF detection, is adopted. The method starts by grouping the subjects according to their ability level: they are divided into five groups from lowest to highest according to test score or ability. This process is carried out automatically in R. The MH method calculates the statistic $\alpha_{MH}$ by comparing the frequencies of correct and wrong answers of the target and reference groups on each item. The value of $\alpha_{MH}$ lies in $(0, +\infty)$. $\alpha_{MH} = 1$ indicates no DIF in the item, $\alpha_{MH} < 1$ is beneficial to the target group, and $\alpha_{MH} > 1$ is beneficial to the reference group. So that the absence of DIF is represented by 0, $\alpha_{MH}$ is logarithmically converted by the following formula: $\Delta_{MH} = -2.35 \ln(\alpha_{MH})$. When $\Delta_{MH}$ is positive, the item is beneficial to the target group. When $\Delta_{MH}$ is negative, the item is beneficial to the reference group.

Educational Testing Service (ETS) classifies DIF items into three grades based on the MH method. Grade A DIF is negligible. Grade B items should be revised. Grade C items show very serious DIF and should be removed.
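The MH computation described above can be sketched as follows. This is a hedged illustration, not the R routine used in the study: it assumes the common ETS convention in which the common odds ratio puts the reference group in the numerator and the delta scale is −2.35 times its natural logarithm. The function name `mantel_haenszel` and the counts in `strata` are hypothetical.

```python
import math


def mantel_haenszel(strata):
    """Return (alpha_MH, delta_MH) for a single item.

    Each stratum is one matched ability level, given as
    (ref_right, ref_wrong, target_right, target_wrong).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, tgt_right, tgt_wrong in strata:
        n = ref_right + ref_wrong + tgt_right + tgt_wrong
        if n == 0:
            continue  # skip ability levels with no examinees
        numerator += ref_right * tgt_wrong / n
        denominator += ref_wrong * tgt_right / n
    alpha = numerator / denominator
    # Assumed ETS delta scale: 0 means no DIF, negative values favor
    # the reference group, positive values favor the target group.
    delta = -2.35 * math.log(alpha)
    return alpha, delta


# Hypothetical counts for one item across five ability levels.
strata = [
    (8, 16, 4, 16),
    (12, 12, 5, 10),
    (20, 10, 10, 10),
    (30, 10, 15, 10),
    (16, 4, 8, 4),
]
alpha, delta = mantel_haenszel(strata)
print(round(alpha, 3), round(delta, 3))
```

In this made-up example every ability level has an odds ratio of 2 in favor of the reference group, so the pooled statistic is also 2 and the delta value is negative.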

4.1.2. LRDIF Method

Different methods have different statistical power and their own advantages, so using several methods together lets each contribute its strengths and makes the detection results more scientific and effective [25]. In this study, the LRDIF method was therefore also used. LRDIF is a DIF test method proposed by Swaminathan and Rogers [26] that is suitable for both dichotomous (0/1) and polytomous scoring. Like the MH method, it can take test scores as the matching variable. Its biggest advantage is that it can detect both consistent (uniform) and inconsistent (nonuniform) DIF. The LRDIF method uses model comparison to test the significance of each parameter in the following formula:

$$P(Y = 1) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2)}$$

The logarithm is taken to obtain

$$\ln \frac{P(Y = 1)}{1 - P(Y = 1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

Y is the dependent variable and can be 0 or 1. x1 is the test score, x2 is the grouping variable, and x1x2 is the interaction term. The regression parameters β0, β1, β2, and β3 were estimated by the maximum likelihood method (MLE) or the least square method (LSM). Different test results have different implications for DIF detection. If only β0 and β1 are significant in the equation, there is no DIF in the item. If β0, β1, and β2 are significant, the item has consistent DIF. If the interaction parameter β3 is also significant, the item has inconsistent DIF.
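The model comparison behind LRDIF can be sketched by fitting three nested logistic models (score only, score plus group, score plus group plus interaction) and comparing their log-likelihoods. The sketch below is an illustration under stated assumptions, not the study’s R procedure: the Newton-Raphson fitter `fit_logit`, the helper `solve`, and the response counts are hypothetical, and 3.841 is the 0.05 critical value of a chi-square with one degree of freedom.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logit(xs, ys, iterations=25):
    """Maximum likelihood fit by Newton-Raphson; returns (beta, loglik)."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, y in zip(xs, ys):
            z = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for row, y in zip(xs, ys):
        z = sum(b * v for b, v in zip(beta, row))
        loglik -= math.log1p(math.exp(-z if y else z))
    return beta, loglik


# Hypothetical item: 10 examinees per score level (0..4) and group;
# the lists give how many answered correctly (0 = reference, 1 = target).
correct_by_group = {0: [2, 4, 5, 7, 8], 1: [1, 2, 3, 5, 6]}
scores, groups, answers = [], [], []
for group, correct in correct_by_group.items():
    for score, n_correct in enumerate(correct):
        for person in range(10):
            scores.append(float(score))
            groups.append(float(group))
            answers.append(1 if person < n_correct else 0)

_, ll1 = fit_logit([[1.0, s] for s in scores], answers)
_, ll2 = fit_logit([[1.0, s, g] for s, g in zip(scores, groups)], answers)
beta3, ll3 = fit_logit(
    [[1.0, s, g, s * g] for s, g in zip(scores, groups)], answers
)

g_uniform = 2.0 * (ll2 - ll1)      # tests the group term (beta2)
g_nonuniform = 2.0 * (ll3 - ll2)   # tests the interaction term (beta3)
CRITICAL = 3.841                   # chi-square, 1 df, alpha = 0.05
print("consistent DIF flagged:", g_uniform > CRITICAL)
print("inconsistent DIF flagged:", g_nonuniform > CRITICAL)
```

Comparing nested models one term at a time keeps the two kinds of DIF separate: the first statistic responds to a constant shift between groups, the second only to a shift that changes with the test score.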

4.2. Reading Achievement Difference Test

The results of the reading achievement difference test are shown in Table 1.

According to Table 1, students in the high socioeconomic status groups scored significantly higher than those in the low socioeconomic status groups. Specifically, the results were as follows: average wage grouping (t = −7.322, Cohen’s d = 0.411), per capita disposable income grouping (t = −0.951, Cohen’s d = 0.208), regional GDP grouping (t = 8.762, Cohen’s d = 0.487), and East-West geographical and economic division grouping (t = −11.134, Cohen’s d = 0.452). According to Cohen’s standard, the per capita disposable income grouping shows a small effect size (0.2), while the other three groupings reach the medium effect size standard (0.4) [27].
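For readers who want to see how statistics of this kind are computed, the following is a minimal sketch of an independent-samples t statistic and Cohen’s d. It assumes the pooled-standard-deviation form of both statistics, which the text does not specify, and the two score lists are made-up values rather than study data.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def cohens_d(a, b):
    """Standardized mean difference (assumed pooled-SD form)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Equal-variance two-sample t statistic."""
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b)))


# Hypothetical reading scores for a high and a low group.
high_group = [2, 4, 6]
low_group = [1, 3, 5]
print(cohens_d(high_group, low_group), student_t(high_group, low_group))
```

The sign of t depends only on which group is entered first, so a negative t with a positive d, as in the values reported above, is consistent with the groups being ordered differently in the two calculations.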

4.3. DIF Test

To determine whether the difference in reading scores reflects a real difference between subjects or item bias, the items need to be examined further for DIF.

4.3.1. Unidimensional Test

The DIF test requires the unidimensionality assumption to hold, so a unidimensionality test is conducted first. Factor analysis is the method commonly used to demonstrate the unidimensionality of a test. The suitability test shows KMO = 0.944 and a Bartlett sphericity test of χ²(2016) = 16468.933 for the study sample, so the sample data are suitable for factor analysis. If the ratio of the eigenvalue of the first factor to that of the second factor is greater than 3, the test can be considered unidimensional [28]. In this study, the eigenvalue of the first factor was 11.767 and that of the second factor was 1.721. Their ratio (11.767/1.721 ≈ 6.84) is well above 3, so the criterion is met.

It can be seen from Figure 2 that the eigenvalue curve in the scree plot flattens out after the first factor. Combined with the result that the ratio of the first eigenvalue to the second is greater than 3, one factor is retained. Therefore, the test is considered to conform to the unidimensionality assumption.

4.3.2. DIF Results of Traditional Grouping

The MH method and the LRDIF method were used to test the two groups of high and low economic levels under the four traditional economic index groups. The calculation results of the MH method are divided into three levels according to the calculation standard of ETS in the United States, that is, based on the absolute value of ∆MH. If its absolute value is less than 1, it is marked as grade A DIF. If its absolute value is between 1 and 1.5, it is marked as grade B DIF. If it is greater than 1.5, it is a serious DIF item and will be marked as grade C DIF.

The effect size of the LRDIF method is Nagelkerke’s R². According to the Zumbo and Thomas criteria, values in (0, 0.13) are classified as grade A (slight DIF), values in (0.13, 0.26) as grade B (moderate DIF), and values in (0.26, 1) as grade C (severe DIF).
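The two grading rules just described can be written as a short sketch. It uses only the magnitude thresholds stated in the text; how values that fall exactly on a boundary are graded is an assumption, and the function names `ets_grade` and `zumbo_thomas_grade` are illustrative.

```python
def ets_grade(delta_mh):
    """Grade an item from the absolute value of its MH delta."""
    size = abs(delta_mh)
    if size < 1.0:
        return "A"   # negligible DIF
    if size <= 1.5:  # assumption: the boundary 1.5 is graded B
        return "B"   # item should be revised
    return "C"       # serious DIF


def zumbo_thomas_grade(r_squared):
    """Grade an item from its Nagelkerke R-squared effect size."""
    if r_squared < 0.13:
        return "A"   # slight DIF
    if r_squared < 0.26:  # assumption: the boundary 0.26 is graded C
        return "B"   # moderate DIF
    return "C"       # severe DIF


# Hypothetical effect sizes for three items.
for delta, r2 in [(0.4, 0.05), (-1.2, 0.20), (1.8, 0.30)]:
    print(ets_grade(delta), zumbo_thomas_grade(r2))
```

Grading on the absolute value means the direction of the bias is reported separately from its severity, which is why the sign of the delta value still has to be read to see which group an item favors.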

The DIF inspection results of the average wage grouping are shown in Figure 3.

The DIF inspection results of per capita disposable income grouping are shown in Figure 4.

The DIF inspection results of East-West geographical and economic division grouping are shown in Figure 5.

The DIF inspection results of the regional GDP grouping are shown in Figure 6.

DIF analysis was performed on the 64 items of the grade 2 online reading test using the MH and LRDIF methods under the four groupings: average wage, per capita disposable income, East-West geographical and economic division, and regional GDP. The regional GDP grouping has the largest number of DIF items. Most of them are grade A DIF, and only a small number are grade B or C DIF.

The reading ability module results reflected by DIF items are further analyzed, as shown in Figure 7. In Figure 7, groupings 1, 2, 3, and 4 represent the groupings of the average wage, per capita disposable income, East-West geographical economic division, and regional GDP, respectively.

The online reading test includes six ability modules: language foundation, information extraction, understanding and inference, transfer and application, overall perception, and appreciation evaluation. As can be seen from Figure 7, the DIF items detected under the four traditional groupings cover all six ability modules, and they are spread almost evenly across the modules. It is therefore difficult to tell which component of reading ability the DIF items concentrate on.

4.3.3. DIF Results of Cluster Grouping

Because the four traditional groupings show low consistency both in the initial grouping and in the final DIF results, the statistical grouping obtained by cluster analysis is used to run the DIF test again on the responses of online reading users. The results, compared with those of the traditional groupings, are shown in Tables 2 and 3 and Figure 8.

As can be seen from Tables 2 and 3, the number of detected DIF items drops sharply after grouping by cluster analysis. The results of the two DIF methods are consistent, and the distribution across modules shows a clear pattern. In detail, first, each of the two methods detected only 5 DIF items, 3 of which overlapped. Second, as shown in Figure 8, 60% of the five items detected by the two methods fall in the language foundation module and 20% or more fall in the understanding and inference module. The MH method detected three language foundation items, 33, 49, and 57, as positive, which favors the high economic level group.

5. Discussion

Previous studies have pointed out that socioeconomic status is one of the main factors influencing reading. Using the existing population categories and the DIF test, this study can clarify the mechanism by which family socioeconomic status affects children who read online. First, the difference test of reading performance shows that reading scores differ significantly under all four traditional index groupings. Second, the subsequent DIF test indicates that the difference in reading performance comes from differences in the item functioning of the test. Under the four traditional groupings, a large number of DIF items are detected and there is no clear pattern. The DIF items are spread across the language foundation, understanding and inference, information extraction, transfer and application, overall perception, and appreciation evaluation modules, which would mean that subjects in the low socioeconomic status group are at a disadvantage in overall reading ability. As a result, targeted suggestions cannot be made for users, continued reading practice becomes harder to direct, and its effect on reading level cannot be estimated. This situation may be related to inconsistent grouping.

When the two types of subjects grouped by clustering and LCA are used as the target group and the reference group for the DIF test, the number of DIF items is significantly reduced and a clear pattern emerges: the DIF items concentrate in the language foundation and the understanding and inference modules, where students with low socioeconomic status are at a disadvantage. These are also the parts of reading ability that students encounter most and are mainly trained in during daily learning, so they are the modules where a gap is most likely to open up, which is consistent with the research expectation.

Reading comprehension is the ability to extract and construct meaning from text. Vocabulary and world knowledge are the two main predictors of reading comprehension test performance. The DIF items concentrate in the “language foundation” and “understanding and inference” modules of the reading test, and subjects in the high economic level group answer these items better. It can be inferred that better family conditions provide children with a better family language environment and a variety of ways to help them understand text. The family language environment includes both software and hardware. Hardware refers to tangible resources such as physical language learning materials, books, and CDs, while software refers to the quality of the language input children receive from their families. Studies have demonstrated that language environment and family input are closely related to children’s language development. Interaction between parents and children in high SES families features richer supplementary language, more elaboration, and less punitive and imperative language. Differences in the language models provided by parents in families of different economic status determine differences in the development of children’s language foundation.

Therefore, for children with low family socioeconomic status, online reading enterprises should provide more reading exercises focused on improving basic language ability, and could even launch picture books with voice functions so that children can listen and read together and fundamentally improve their basic language ability. To cultivate understanding and inference ability, enterprises should emphasize thinking exercise books among their recommended books so that children can think freely, strengthen their learning motivation, and gradually improve their thinking, reasoning, and understanding. The supporting test items can use VR technology to detect, as fully as possible, changes in students’ mouth shapes during pronunciation and reading aloud and to give feedback, so as to ensure the accuracy and quality of children’s practice. On the other hand, for children from high SES families with a good foundation, enterprises should develop reading texts and exercises that cultivate high-level abilities such as overall perception, transfer and application, and appreciation and evaluation, so that children gradually learn to think and solve problems independently, generalize from one example to other reading materials of the same kind, and lay a foundation for learning complex reading texts in the senior grades.

6. Conclusions

This study explores the important family socioeconomic status factors affecting children’s reading ability and can provide corresponding countermeasures and suggestions for vulnerable reading users, so as to improve children’s satisfaction with the product experience and increase user stickiness and retention from the enterprise’s perspective. It also provides guidance and suggestions for future product iteration and updates, combining practical research with theory to make products that users are more likely to accept. This study explores and demonstrates quantitative research methods suitable for user demand analysis, supplementing research that relies on qualitative methods to analyze user demand and user experience. Exploring from multiple perspectives brings research on user needs closer to users’ real needs, and the population is classified objectively and scientifically on the basis of statistical classification methods.

This study has several limitations. Online reading in other grades remains to be examined. There are many methods of data classification, and other methods will be introduced in future research. The design of the item content may not be balanced enough. In addition, the study was constrained by the practical difficulty of sampling. Family socioeconomic status is represented by the regional economic level, which introduces a certain deviation. Future research will focus on improving these aspects. Finally, in analyzing children’s online reading behavior oriented to family education, only gender and socioeconomic status differences in reading are considered, which is not comprehensive enough. This is the biggest limitation of this study. Children’s growth environment, personality, and other factors will be taken into account in future research.

Data Availability

The experimental data used to support the findings of this study are available from the author upon request.

Conflicts of Interest

The author declares no conflicts of interest.