Abstract

Measurement invariance refers to the equivalence of a measurement instrument across different groups. Social science research often involves comparing groups, for example, asking whether the relationship between two variables is the same for men and women. Measurement invariance is a prerequisite for such studies: if the measurement instruments are not equivalent, observed differences cannot be attributed to true group differences rather than to artifacts of measurement. The causal model proposed by Michael Siegrist is one of the baseline models for studying public acceptance of genetically modified food, but only a few studies have tested the invariance of this model. It is therefore difficult for researchers to judge the reliability of conclusions based on group comparisons, such as whether the risk perception of men is lower than that of women. In this study, we use sample data from China (N = 1091) to test the invariance of the causal model across groups defined by gender and knowledge level. The results show that the model has full invariance across gender; across knowledge levels, it has factor loading (metric) invariance but not intercept or measurement error invariance. These results indicate that previous conclusions about gender comparisons are credible, but the reliability of measurement across knowledge level groups needs to improve before meaningful comparisons can be made.

1. Introduction

Measurement invariance (equivalence) means that measurement instruments, such as scales, function identically across the relevant groups in a study [1]. In social science research, study participants often come from different subpopulations, such as males and females. Determining measurement invariance is therefore a logical prerequisite to evaluating substantive hypotheses about group differences, regardless of whether the comparison is as simple as a between-group mean difference test or as complex as testing whether a theoretical structural model is invariant across groups [2]. Without it, it is impossible to determine whether a relationship between variables observed in a study reflects the hypothesized relationship or an artificial relationship caused by differences in the measurement instruments [3].

If the variables used in a study are directly observable, such as income and education, then measurement invariance is easy to judge. However, if a variable is not directly observable, such as perceived benefit, it is measured indirectly through manifest (i.e., observed) variables, and the measurement instruments are mostly scales. In this case, measurement invariance must be tested using statistical techniques. Testing the equivalence of measurement instruments is called invariance analysis [4].

Genetically modified food (GMF) is an emerging food technology with multiple social and environmental benefits [5]. Like any other new food technology in history, public acceptance of GMF was low in the early stage of its development [6–9]. What factors influence public acceptance of GMF? This question has an important impact on the decision-making of food industry stakeholders, such as policymakers, farmers, and agrobiotechnology enterprises [10–12]. Many studies have examined this question [13–17]. Some have shown that trust, perceived risk, and perceived benefit are the three most important factors affecting public acceptance [10, 13]. Siegrist examined these three factors and proposed a causal model to explain public acceptance of GMF. He found that perceived risk has a direct negative effect on public acceptance; perceived benefit has a direct positive effect on public acceptance; and trust indirectly affects public acceptance through perceived risk and benefit. In addition, perceived benefit has a direct negative effect on perceived risk [18, 19]. Owing to its explanatory power and simplicity, this causal model has been widely used to explain public acceptance in a variety of technological research, such as gene technology [15, 20, 21], financial technology [22, 23], nanotechnology [24–27], renewable energy [28–30], unmanned aircraft [31], and automated driving technology [32, 33]. The samples used in these studies comprise individuals with different demographic characteristics, such as sex, education, and income. Therefore, for this research to be credible, it is essential to conduct invariance analysis of the measurement instruments. Unfortunately, only a few studies have explored the invariance of the causal model. In the existing literature, only Siegrist has tested the invariance of this model across gender groups [18].

Because it is impossible to test invariance across all possible individual characteristics, the characteristics that most strongly affect the core variables of the model are tested [34]. Perceived risk is the core explanatory variable in the causal model [35]. Previous studies have found that gender and relevant knowledge level are important factors affecting an individual's perceived risk [36–41]. Therefore, this study analyzes the invariance of the model across the gender and knowledge level variables.

This study complements previous studies in three aspects. First, the discussion above shows that invariance analysis is very important, yet only a few studies have tested it, so this study supplements the current research. Second, although Siegrist explored this issue, the data used in his study are from the United Kingdom, whereas the data in this study are from China. Since China differs considerably from most western countries in terms of culture, politics, and economic system, this study supplements Siegrist's model. Finally, Siegrist tested only the measurement invariance of the causal model across gender, but this study considers both gender and relevant knowledge level, so it is also an extension of Siegrist's study.

The remainder of the study is organized as follows. The next section introduces the basic concepts, testing principles, and testing methods of the invariance analysis. The statistical hypothesis section briefly introduces the causal model of public acceptance of GMF and the statistical hypothesis (null hypothesis) used in the invariance test. The research method section introduces the measurement scales, samples, and data analysis methods used in this research. The results section shows the analysis results of the invariance test. The final section discusses the results, mainly the theoretical and policy implications, as well as the limitations.

2. Invariance Analysis

Measurement invariance (equivalence) was first proposed by Fritz Drasgow: “Measurement equivalence holds when individuals with equal standing on the trait measured by the test but sampled from different subpopulations have equal expected observed test scores.” In particular, “individuals with equal standings on the latent trait, say verbal aptitude, but sampled from different subpopulations, say male and female, should have the same expected observed score” ([42], p. 134). Measurement invariance analysis helps to ensure that any comparisons made represent true differences in the constructs being studied [43].

There are, essentially, four levels of measurement invariance: configural, metric, scalar, and strict invariance (for more details, the reader is referred to [2, 4]). The first two levels, configural and metric invariance, are collectively referred to as conceptual invariance, whereas the last two, scalar and strict invariance, are called psychometric invariance [44]. These levels are hierarchical: higher levels impose more restrictions on the measurement parameters while allowing a higher degree of comparability (see Figure 1) [45–48]. The following paragraphs explain the four levels of measurement invariance in detail.

Configural invariance (the invariance of configuration) is also commonly referred to as pattern invariance. At this level, the focus is solely on testing whether the same items measure the given construct across multiple groups. To test this, the factor models of all groups are estimated simultaneously. Since this is the baseline model, it is only necessary to assess the overall model fit to determine whether configural invariance holds.

Metric invariance is also commonly referred to as weak invariance. It builds upon configural invariance; it requires that, in addition to the constructs being measured by the same items, the factor loadings of those items be equivalent across groups. A factor loading reflects the degree to which differences in participants' responses to an item are due to differences in their levels of the underlying construct that the item assesses. Thus, invariance of the factor loadings suggests that the construct has the same meaning for participants across groups: if a construct means the same thing in every group, the relationships between the construct and participants' responses to its items are identical.

Scalar invariance builds upon metric invariance; it requires that the item intercepts should also be equivalent across multiple groups. Item intercepts are the origin or starting value of the scale that a given factor is based on. Thus, participants who have the same value of the latent construct should have equal values for the items that the construct is based on.

The final level of invariance is called strict factorial invariance. Strict invariance refers to the invariance of the error terms of an individual indicator variable, representing the unique error of that indicator variable. Thus, when testing strict invariance, what is essentially being tested is whether the residual error is equivalent across groups.

Several methods have been proposed for testing measurement invariance. Van de Vijver and Harsveld proposed that the factor parameters of the unconstrained model should be examined and those with the largest between-group differences classified as noninvariant [49]. Marsh and Hocevar suggested examining the modification indices in the fully constrained model and interpreting large modification indices on the associated items as indicators of noninvariance [50]. However, among all the potential methods, the method proposed by Byrne et al. and Byrne (2004) is the most widely used because of its rationality and rigor. In this method, multigroup confirmatory factor analysis (MGCFA) is used to estimate the unconstrained measurement model and a series of constrained measurement models; the fit indices of these models are then obtained and compared to test the invariance of the scale [51, 52]. Since the constraints of the four levels of measurement invariance are progressively strengthened, the order of testing generally follows the order of the four levels; that is, testing proceeds from configural invariance through metric and scalar invariance to strict invariance.

Based on the method of Byrne et al. and Byrne, measurement invariance analysis starts with configural invariance. Configural invariance is an unconstrained model and is used as the baseline model for subsequent tests. The null hypothesis of the test is “the same factor structure among groups.” If the unconstrained model fits well, it indicates that the measurement model has configural invariance, and a series of subsequent constrained model estimates can be conducted. If the fitting index of the unconstrained model is not up to the corresponding critical value, it implies that there is no configural invariance and subsequent invariance tests are not conducted [2].

The second model analyzed is the metric invariance model. Based on configural invariance, an intergroup equality restriction is applied to the factor loadings to test whether the relationship between each measurement item and its underlying factor is invariant across groups. Metric invariance is the foundation of measurement invariance. On the one hand, from the perspective of the moderate replication strategy, if metric invariance is met, the instrument is considered to have measurement invariance [52]. On the other hand, higher-order invariance tests cannot proceed until metric invariance is met. Finally, structural invariance analysis is based on metric invariance [43].

The third model analyzed is the scalar invariance model. Based on the metric invariance, the intergroup equivalence restriction is applied to the regression intercept between each measurement item and its representative factor (latent factor) to test whether there is invariance in the intercept of the measurement item across groups. If a mild test strategy is adopted at this stage, the measurement model is said to be invariant [53].

The fourth model analyzed is the strict invariance model. Based on the scalar invariance, an intergroup equivalent restriction is applied to the variances of each measurement error to test whether there is cross-group invariance in the variance of the measurement error.

Figure 2 summarizes the sequence of these invariance tests.
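The sequential logic summarized in Figure 2 can be sketched in code. The following is a minimal illustration, not part of any SEM package; the fit statistics in `models` are invented for demonstration, not from this study's data. Each nested model is retained only while the chi-square difference test against the previous model remains non-significant.

```python
from scipy.stats import chi2

# Hypothetical fit results for the nested MGCFA sequence
# (M0 = configural, M1 = metric, M2 = scalar, M3 = strict).
# Columns: (name, chi-square, degrees of freedom, CFI).
models = [
    ("configural", 180.0, 100, 0.960),
    ("metric",     195.0, 111, 0.958),
    ("scalar",     210.0, 126, 0.955),
    ("strict",     260.0, 141, 0.938),
]

def highest_invariance_level(models, alpha=0.05):
    """Walk the nested models in order; each step is retained only if
    the chi-square difference against the previous model is not
    significant. Returns the highest supported level of invariance."""
    level = models[0][0]  # configural model must fit first
    for (_, c0, d0, _), (name, c1, d1, _) in zip(models, models[1:]):
        p = chi2.sf(c1 - c0, d1 - d0)
        if p <= alpha:    # constraints significantly worsen fit: stop
            break
        level = name
    return level
```

With the illustrative numbers above, the strict model's constraints significantly worsen fit, so the procedure stops at scalar invariance.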

3. Statistical Hypothesis

Our statistical test model of invariance is shown in Figure 3. It is based on Siegrist’s causal model. In our model, perceived risk has a negative and direct impact on public acceptance; perceived benefit has a positive and direct impact on public acceptance of GMF, and perceived benefit has a negative and direct impact on perceived risk. Trust indirectly affects public acceptance of GMF through perceived benefits and perceived risks; trust has a positive impact on perceived benefits and a negative impact on perceived risks [18]. Since the objective of this study is to examine the measurement equivalence of the causal model, the focus of the model is to determine whether gender and individual knowledge level affect the causal model.

Based on Section 2, we propose the following statistical hypotheses for the invariance analysis of the model; they are shown in Table 1.

4. Research Method

4.1. Measurement

Knowledge was measured using eight true/false statements (see Table 2). These statements covered areas of knowledge about gene technology. Content validity was established by having three independent experts in the field of biology and genetics review the questions. The goal was to assess the respondents’ knowledge of biology objectively. The response options were “true” and “false.”

The measurement scale used in this study comprises four constructs and 15 items, based on several scales used in relevant studies (see Table 3 for specific studies) that demonstrated high reliability and validity. The idiosyncrasies of the Chinese language and culture were considered throughout the translation process, and minor modifications were made in the wording to suit these idiosyncrasies. The subjects were asked to indicate their agreement or disagreement with the statements provided, using a seven-point Likert scale (1 = strongly disagree and 7 = strongly agree). Table 3 shows the detailed scale items of the construct variables.

To assess perceived risks of GMF, we asked the respondents to indicate their agreement with the four items developed by Sjöberg, Chen, Ghoochani et al., and Sjöberg et al. [54–57]. Two items reflect the possible harmful effect of GMF on human health, and the other two reflect its possible harmful effect on the environment. Examples of the items are “Eating genetically modified food will lead to infertility” and “The production of genetically modified food will destroy the diversity of animals and plants.”

Regarding social trust, although previous studies have shown that it comprises multiple components, Lang and Hallman showed that these components are highly correlated and converge on a common factor [58]. Therefore, the public's social trust in different objects can be measured holistically. Based on this argument, social trust was measured in this study as the public's trust in various institutions [18, 59]. Specifically, the participants were asked, “How much trust do you have in the following institutions: (1) regulatory agencies, (2) agricultural corporations, and (3) public research institutions in the GMF domain?” Participants indicated their level of trust on a 7-point scale, ranging from “no trust at all” (1) to “a very high level of trust” (7).

4.2. Sample

The data were collected through self-reported, structured questionnaires. The questionnaire was developed in Chinese and submitted to a panel of five experts at one of the key universities in Central China to evaluate its content validity. Two of these experts work in the Department of Biology, and the rest work in the School of Public Management at this university. The panel approved both the initial item list and the question format and suggested revisions to clarify certain questions so that the general public could fully understand and respond to them. Prior to the formal survey, a pilot test was conducted, in which respondents were asked whether they could clearly understand the questions and felt comfortable answering them. Based on their feedback, changes were made to the wording, expressions, and grammar to improve the questionnaire's clarity, accuracy, flow, and validity. In the pilot test, 50 participants (20 undergraduates and 30 members of the general public), randomly selected to represent the public, were interviewed individually.

The questionnaire comprised four parts. The first section contained a screening question, “Have you heard of genetically modified food?”; respondents who answered “no” did not need to continue. The second section requested sociodemographic information, including gender, age, educational background, income, and knowledge about gene technology. The third section focused on the public's acceptance of GMF. The final section inquired about perceived risks and benefits of GMF and social trust in different objects.

The survey used stratified sampling. First, to account for geographical differences and maximize representativeness, eight provinces from the east (Zhejiang), south (Guangdong), west (Sichuan, Xizang, and Xinjiang), north (Hebei), northeast (Jilin), and central (Hubei) regions of China were selected. Two high-income and two lower-income counties were randomly selected from each province, resulting in 32 counties. Then, four to six city communities or villages were randomly selected from each county, resulting in 150 city communities or villages. Finally, seven to ten households were randomly approached in each of these city communities or villages, resulting in a total sample of 1200 observations. In June 2019, through public recruitment, 100 university students were recruited as interviewers (3-4 interviewers per county) from Central China Normal University. The students were selected according to their home addresses, which had to be located in the 32 selected counties. The student interviewers then conducted face-to-face interviews from July to September 2019.

A total of 1200 paper questionnaires were distributed; 1168 were recovered, and 1091 remained valid after eliminating those with clerical errors or contradictions. The effective recovery rate was 93.41%.

4.3. Data Analysis

The data were analyzed using Statistical Package for the Social Sciences (SPSS) 24 and AMOS 24. The analysis comprised three steps: measurement model analysis for the full sample (full sample analysis), measurement model analysis for subgroups (subgroup analysis), and measurement invariance analysis. The aim of the full sample and subgroup analyses is to test whether the data support the measurement model. If the data do not support the model, it would be meaningless to conduct a subsequent measurement invariance analysis.

The full sample and subgroup analyses follow the same logic. The method used is confirmatory factor analysis (CFA). Several fit indices, such as the normed chi-square (i.e., the ratio of χ² to its degrees of freedom), the comparative fit index (CFI), the nonnormed fit index (NNFI), and the root mean square error of approximation (RMSEA), were used to evaluate model fit. When the CFI and NNFI values are greater than 0.90 and the RMSEA value is less than or equal to 0.08, the model fit is considered adequate [61]. The normed χ² is used to identify two types of inappropriate models: first, values less than 1.0 indicate an “overfitted” model [62]; second, values of more than 2.0, or a more liberal limit of 5.0, indicate that the model does not fit the observed data and has to be improved [63]. CFA was also used to evaluate the standardized factor loadings, internal consistency, convergent validity, and discriminant validity. The evaluation criteria are as follows: the completely standardized item-factor loadings should be at least 0.60, and the internal consistency of the constructs, measured using composite reliability, should satisfy CR ≥ 0.70 [64]. Convergent validity was evaluated with the average variance extracted (AVE ≥ 0.50) [64], whereas discriminant validity is established when the AVE for each construct exceeds the squared correlations between that construct and every other construct [65].
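For reference, the reliability and validity indices named above can be computed directly from the standardized loadings and the model chi-square. The sketch below uses the standard textbook formulas (CR and AVE in the Fornell–Larcker form; RMSEA as the usual point estimate from χ², df, and sample size n); note that SEM software may differ slightly in conventions such as N versus N − 1.

```python
import math

def composite_reliability(loadings):
    """Composite reliability (CR) from completely standardized
    loadings, assuming error variances of 1 - loading**2."""
    s = sum(loadings)
    e = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + e)

def average_variance_extracted(loadings):
    """Average variance extracted (AVE): mean squared loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def normed_chi_square(chi2_value, df):
    """Normed chi-square: chi-square divided by degrees of freedom."""
    return chi2_value / df

def rmsea(chi2_value, df, n):
    """Point estimate of RMSEA from chi-square, df, and sample size."""
    return math.sqrt(max(chi2_value - df, 0.0) / (df * (n - 1)))
```

For example, four items loading at 0.80 give CR ≈ 0.88 and AVE = 0.64, both above the 0.70 and 0.50 thresholds, and the full-sample values reported later (χ² = 340.189, df = 84, n = 1091) give a normed χ² of about 4.05 and an RMSEA of about 0.053.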

A multigroup CFA (MGCFA) was conducted to test measurement invariance [53]. Based on the principles of the measurement invariance test discussed in Section 2, the MGCFA was used to estimate the series of fit indices of each model (M0 to M3 in Table 1), and the fit indices of each model were compared with those of the reference model. Comparative analysis of three indicators, Δχ², ΔCFI, and RMSEA, was conducted to determine whether invariance exists. According to the suggestion of Cheung and Rensvold (2002), invariance is established when Δχ² is not significant or ΔCFI < 0.01 [44]. In addition, according to the suggestion of Hu and Bentler (1999), measurement invariance can be assumed when the point estimates of RMSEA are very close and the confidence intervals of RMSEA overlap substantially [66]. Since χ² and Δχ² are equally influenced by sample size and distribution pattern, Cheung and Rensvold (2002) suggested that, when the Δχ² and ΔCFI tests are inconsistent, the ΔCFI result should be used as the basis for judging whether measurement invariance holds [44].
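Both criteria are straightforward to compute once the fit statistics are in hand. A minimal sketch (scipy assumed; the function names are ours):

```python
from scipy.stats import chi2

def chi_square_difference_p(delta_chi2, delta_df):
    """p value of the chi-square difference (likelihood ratio) test
    between two nested MGCFA models."""
    return chi2.sf(delta_chi2, delta_df)

def delta_cfi_invariant(cfi_unconstrained, cfi_constrained, cutoff=0.01):
    """Cheung and Rensvold (2002): invariance holds when adding the
    constraints lowers CFI by less than the cutoff."""
    return (cfi_unconstrained - cfi_constrained) < cutoff
```

For example, a chi-square difference of 19.112 with 15 degrees of freedom (as reported later for the gender groups) is not significant at the 0.05 level, whereas 26.337 with 15 degrees of freedom is.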

5. Results

5.1. Respondent Profiles

Table 4 lists the descriptive statistics of the data. The sample comprises 1091 individuals, with a mean age (standard deviation) of 32.93 (14.31) years. The knowledge scores of these 1091 respondents about gene technology range from 1 to 8, with a mean of 5.74 and a standard deviation of 1.61. The 241 (22.1%) respondents whose knowledge score is less than 5 are categorized as the “low knowledge level group” (LK), and the 429 (39.3%) respondents whose score is higher than 6 are categorized as the “high knowledge level group” (HK). The distribution of gender is roughly balanced, with 608 (55.7%) of the respondents being female.

The sample does not originate from strict random sampling, so its representativeness was evaluated. A test was conducted to determine whether the sample in this study is representative of the entire population. Table 4 presents the characteristics of the sample and the results of the test, which indicate that the sample roughly represents the Chinese population.

5.2. Full Sample Analysis

Before conducting the invariance analysis, we examined the model fit of the data and parameter estimates for the entire sample (n = 1091).

The hypothesized measurement model had a chi-square of 340.189 with 84 degrees of freedom. The fit indices indicate an adequate model-data fit, with values of 0.052, 0.953, and 0.962 for RMSEA, NNFI, and CFI, respectively. These results indicate that the model is appropriate, a proper solution was obtained, and the solution fit the entire sample adequately.

As shown in Table 5, the results of the CFA show that the standardized factor loadings range from 0.663 to 0.877, all exceeding the cut-off point of 0.60, and are statistically significant. All CR values range from 0.863 to 0.885, indicating acceptable levels of reliability of the constructs, since they are greater than the recommended 0.70 threshold. Moreover, all AVE values, which range from 0.613 to 0.687, are greater than the 0.50 standard for convergent validity, indicating acceptable levels of convergent validity of the constructs.

Table 6 lists additional descriptive statistics (i.e., mean and standard deviations) and the correlation matrix; the correlations among the constructs and the square root of the AVE are on the diagonal. The four diagonal elements of the latent variables are larger than their corresponding correlation coefficients, indicating that the metrics have appropriate discriminant validity.
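The diagonal comparison just described is the Fornell–Larcker criterion, which can be sketched as a simple check. This is an illustrative helper with hypothetical numbers, not the study's data: discriminant validity holds when the square root of each construct's AVE exceeds its correlations with all other constructs.

```python
import math

def fornell_larcker_ok(ave, corr):
    """Return True when sqrt(AVE) of every construct exceeds its
    absolute correlation with each other construct (Fornell-Larcker
    criterion for discriminant validity)."""
    k = len(ave)
    return all(
        math.sqrt(ave[i]) > abs(corr[i][j])
        for i in range(k) for j in range(k) if i != j
    )
```

For instance, with AVEs of 0.62 and 0.65 and an inter-construct correlation of 0.45, the criterion is satisfied (sqrt(0.62) ≈ 0.79 > 0.45); a correlation of 0.85 would violate it.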

5.3. Subgroup Analysis

Since the model fit the data adequately in the overall sample, we analyzed the model-data fit for each group.

The goodness-of-fit statistics for the single-sample CFA models are shown in Table 7. In all cases, the hypothesized models approached or exceeded the more stringent cut-off values for a well-fitting model, suggesting that the hypothesized model adequately accounts for the covariance matrices of the four samples: male, female, LK, and HK.

The results of the CFA for each group (see Table 8) show that the standardized factor loadings exceed the recommended minimum threshold of 0.60, ranging from 0.696 to 0.875 for males, from 0.640 to 0.899 for females, from 0.715 to 0.861 for LK, and from 0.615 to 0.883 for HK, and are statistically significant.

As shown in Table 8, the CR values exceed the recommended threshold of 0.70, ranging from 0.873 to 0.882 for males, from 0.854 to 0.887 for females, from 0.836 to 0.877 for LK, and from 0.853 to 0.899 for HK, indicating acceptable levels of reliability of the constructs. Moreover, all the AVE values are greater than the 0.50 standard for convergent validity, ranging from 0.639 to 0.696 for males, from 0.595 to 0.681 for females, from 0.604 to 0.640 for LK, and from 0.595 to 0.700 for HK, indicating acceptable levels of convergent validity of the constructs.

In addition, the discriminant validity of the measures is accepted, since the AVE of each construct is greater than the squared correlation of the construct and other constructs in the model. Table 9 lists additional descriptive statistics (i.e., mean and standard deviations) and the correlation matrix, with the correlations among constructs and the square root of the AVE on the diagonal. The four diagonal elements of the latent variables of each group are larger than their corresponding correlation coefficients, indicating that the metrics have appropriate discriminant validity.

Based on the above CFA analysis, it can be concluded that the data support the measurement model of each subsample. The measurement models were replicable in each sample.

5.4. Invariance Analyses

Since the measurement models were replicable in each sample, we conducted a series of multisample structural equation models to identify any noninvariance in the measurement parameters across the gender and knowledge variables, respectively. We followed the invariance test process presented in Sections 2 and 3.

5.4.1. Male versus Female

The initial step was to test a model across the male and female groups simultaneously without imposing any equality constraints. The purpose of this step is to establish a baseline model to subsequently test the increasingly restrictive nested models. As shown in Table 10, the baseline model (M0) produced a good fit with the data. The result suggests that configural invariance is present in the gender groups; that is, males and females used the same pattern in measuring the items.

We then estimated a nested model that constrains the factor loadings to be invariant across the two samples. The invariance of the factor loadings is considered the minimum acceptable criterion for measurement invariance [67]. The analysis shows that the model exhibits a good fit with the data (Table 10, M1). According to the results in Table 11 (M0 versus M1), the change in chi-square (Δχ²) is not significant, and the fit statistics of the two models are quite comparable, justifying the presence of metric invariance.

After the validation of metric invariance, a scalar invariance test was conducted to determine whether the regression intercepts between each measurement item and its representative factor are invariant. The chi-square difference test between M1 and M2 (Δχ² = 19.112, Δdf = 15, p > 0.05) is not significant, justifying scalar invariance across the gender groups.

Based on the scalar invariance, a strict invariance test was conducted to determine whether the error terms of the two subgroups are invariant. The chi-square difference test between M2 and M3 (Δχ² = 26.337, Δdf = 15, p < 0.05) is significant, indicating that the restricted model failed the chi-square test of strict invariance. However, the difference in the CFI between M2 and M3 is only 0.002 (see Table 11), indicating invariance. The point estimates of the RMSEA and its confidence intervals are almost the same in M2 (RMSEA = 0.038; 90% CI = 0.034–0.042) and M3 (RMSEA = 0.037; 90% CI = 0.034–0.041) (see Table 10), also indicating invariance. Therefore, given the small differences in these goodness-of-fit indices, strict invariance between the unconstrained and constrained models is assumed.

5.4.2. LK versus HK

The initial step is to test a model across the LK and HK groups simultaneously without imposing any equality constraints. As shown in Table 12, the baseline model (M0) produced a good fit with the data. The result suggests that configural invariance exists in both knowledge level groups. LK and HK respondents used the same pattern in measuring the items.

The chi-square difference between M0 and M1 is significant (Δχ² = 42.704, Δdf = 11, p < 0.05) (see Table 13), whereas the difference in the CFI between M0 and M1 is only 0.008 (see Table 13), indicating invariance. The point estimates of the RMSEA and its confidence intervals are almost the same in M0 (RMSEA = 0.045; 90% CI = 0.039–0.050) and M1 (RMSEA = 0.046; 90% CI = 0.041–0.052). Therefore, since the differences in the goodness-of-fit indices are small, it can be assumed that metric invariance exists.

The next step (model M2) is to assess scalar invariance. The chi-square difference between M1 and M2 is significant (Δχ² = 139.700, Δdf = 15, p < 0.05), and the difference in the CFI between M1 and M2 is 0.03, which is greater than 0.01. These results reveal a substantial decrease in fit relative to M1, meaning that the regression intercepts between each measurement item and its representative factor are not invariant across the two groups.

According to the general invariance analysis procedure, since there is no invariance in M2, it is not necessary to conduct the next test of strict invariance; that is, it is not necessary to conduct the M3 estimation.

Overall, the causal model only shows metric invariance but not scalar and strict invariance between the HK and LK groups.

6. Discussion

The quality of the measurement used in research determines the credibility of its conclusions. Measurement invariance is a logical prerequisite in evaluating substantive hypotheses about differences in a group, whether the comparison is as simple as a between-group mean differences test or as complex as testing whether some theoretical structural model is invariant across groups. Although the importance of measurement invariance is self-evident, only a few studies have focused on it in the field of public acceptance of GMF.

In this study, we use data from China to analyze the invariance of this widely used causal model. Based on a series of invariance analyses, we conclude that the model exhibits configural, metric, scalar, and strict invariance across gender. However, across knowledge levels, it exhibits only configural and metric invariance. These findings suggest that, in general, the male and female groups and the LK and HK groups conceptualize the causal model constructs (SOT, PEB, PER, and ACC) in the same way. This is consistent with the conclusion of Siegrist.

In the model, scalar invariance is not present between the HK and LK groups, which shows that the two groups have different starting points in the test score. Because the intercept does not affect the relative comparison results of the test score, this noninvariance has little effect on the evaluation of substantive hypotheses about the differences in the groups.

In addition, regarding strict invariance between the HK and LK groups, we find that the model construct is noninvariant. Because the measurement error directly affects the reliability of the measurement, this noninvariance is probably due to the different understanding of the words and questions used in the scales between the two groups. There are some key implications of this finding of noninvariance for researchers. They need to revisit the questionable items and evaluate the wording, semantics, and structure of each question to ensure improvements. However, researchers must also be aware of the fact that developing a questionnaire free of misconceptions for all different sample subgroups is almost impossible. Therefore, researchers should consider and validate measurement invariance across a sample population when designing their survey instrument.

There are also some limitations to this study. First, intertemporal invariance is an important property of measurement tools; owing to the limited cross-sectional data, it was not examined. Second, cross-cultural invariance is often an important topic in invariance analysis, but, due to limited data, this study did not explore it. Future research can deepen these two aspects to make the measurement tools more reliable.

Data Availability

Data are available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by Central China Normal University research funding (grant no. 2017980039).