Comparative Analysis of Some Structural Equation Model Estimation Methods with Application to Coronary Heart Disease Risk
This study compared a ridge maximum likelihood estimator to Yuan and Chan (2008) ridge maximum likelihood, maximum likelihood, unweighted least squares, generalized least squares, and asymptotic distribution-free estimators in fitting six models that show relationships in some noncommunicable diseases. Uncontrolled hypertension has been shown to be a leading cause of coronary heart disease, kidney dysfunction, and other negative health outcomes. It poses equal danger when asymptomatic and undetected. Research has also shown that it tends to coexist with diabetes mellitus (DM), with the presence of DM doubling the risk of hypertension. The study assessed the effect of obesity, type II diabetes, and hypertension on coronary risk and also the existence of converse relationship with structural equation modelling (SEM). The results showed that the two ridge estimators did better than other estimators. Nonconvergence occurred for most of the models for asymptotic distribution-free estimator and unweighted least squares estimator whilst generalized least squares estimator had one nonconvergence of results. Other estimators provided competing outputs, but unweighted least squares estimator reported unreliable parameter estimates such as large chi-square test statistic and root mean square error of approximation for Model 3. The maximum likelihood family of estimators did better than others like asymptotic distribution-free estimator in terms of overall model fit and parameter estimation. Also, the study found that increase in obesity could result in a significant increase in both hypertension and coronary risk. Diastolic blood pressure and diabetes have significant converse effects on each other. This implies those who are hypertensive can develop diabetes and vice versa.
Structural equation modelling (SEM) reduces several manifest variables to a few related latent factors by explaining the covariance structure in the observed manifest variables using a combination of confirmatory factor analysis and path modelling in which the manifest relationships are hypothesized. Comparative analysis of SEM estimation methods in applications is not commonplace because methods such as the traditional maximum likelihood estimator (MLE) and generalized least squares estimator (GLSE) have proven robust and high performing. However, they are constrained by the normality assumption, unlike the asymptotic distribution-free estimator (ADFE) and unweighted least squares estimator (ULSE), which do not require it; the ADFE is also considered robust. The MLE was shown to perform better than the ADFE, GLSE, and ULSE, although the ADFE was developed as a robust estimator [4, 5]. In preventing the matrix singularity that leads to nonconvergence with small sample sizes in SEM, ridge maximum likelihood estimators have been shown to perform better than the MLE. Yuan and Chan developed a ridge estimator which adds the ratio of the number of manifest variables to the sample size to the diagonals of the covariance matrix. This method did better than the traditional MLE but uses little information from the data. A follow-up to this estimator is a ridge maximum likelihood estimator proposed in a companion paper, which draws more information from the data by adding a constant to the diagonals of the covariance matrix. These estimators are compared based on fit indices from models of real-life data on the relationship between hypertension, diabetes, obesity, and coronary risk.
The study also assessed the effect of obesity, type II diabetes, and hypertension on coronary risk and the converse effect of diabetes on hypertension. Hypertension is a silent killer because it can cause heart failure, stroke, kidney dysfunction, and other conditions without showing symptoms. Studies of hypertension and diabetes showed that the two frequently coexist, that people with diabetes have twice the risk of hypertension of those without it, and that both conditions have similar causes [8, 9]. Most studies showed that hypertension results in diabetes, but few considered the converse effect. Obesity results when a person accumulates extra weight greater than or equal to 20% of total body fat, which could be harmful to the person’s well-being. Obesity or body fat measures such as body mass index (BMI), waist circumference (WC), and waist-to-hip ratio (WHR) are proven risk factors for type II diabetes and cardiovascular diseases [11–13]. BMI is usually used as the major measure of obesity and overweight. However, according to Jacobsen and Aars, WC, as a measure of abdominal obesity, could be used to measure obesity because it gives more information about diseases that result from excess weight. Dagan et al. stated that although BMI is commonly used, it does not reflect body shape, and that even though both measures were endorsed by the American Heart Association, BMI is still the measure mostly used for adiposity. According to Kurniawan et al., visceral fat (VF), body fat percentage, WC, BMI, and body weight have all been used to measure obesity. Using these measures together could help to treat obesity as a theoretical variable and to better assess obesity in humans.
The SEM approach, involving model representation and methods for estimating the model parameters, is presented in Section 2. Section 3 discusses the theoretical background to the relationships in the dataset, Section 4 presents the results of the SEM application to the data together with their discussion, and Section 5 concludes the study with a recommendation for policy implementation.
2.1. Structural Equation Model
The SEM has a structural part represented in equation (1) as follows:
$$\eta = B\eta + \Gamma\xi + \zeta. \tag{1}$$
The manifest variables are used to measure the exogenous and endogenous theoretical variables in equation (1), which are represented, respectively, in measurement models (2) and (3) as follows:
$$x = \Lambda_x \xi + \delta, \tag{2}$$
$$y = \Lambda_y \eta + \varepsilon, \tag{3}$$
where $\eta$ contains the endogenous latent variables, $\xi$ contains the exogenous latent variables, $B$ contains the coefficients of the $\eta$ variables, $\zeta$ contains the random disturbances or errors associated with the structural model, $\Gamma$ is the matrix of the coefficients of the exogenous latent variables, $\Lambda_x$ and $\Lambda_y$ are the matrices of factor loadings, $\varepsilon$ and $\delta$ are the random errors associated with the measurement models for determining, respectively, the endogenous and exogenous latent variables, and $x$ and $y$ are the independent and dependent manifest variables.
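As an illustration of how the structural part determines the endogenous latent variables, the relation can be solved for them whenever the matrix (I - B) is invertible. The sketch below assumes the standard LISREL-style notation (B, Gamma, xi, zeta); the numerical values are hypothetical and are not estimates from this study.

```python
import numpy as np

def solve_structural(B, Gamma, xi, zeta):
    """Return eta that satisfies eta = B @ eta + Gamma @ xi + zeta.

    Rearranging gives (I - B) eta = Gamma xi + zeta, which is solved directly.
    """
    identity = np.eye(B.shape[0])
    return np.linalg.solve(identity - B, Gamma @ xi + zeta)

# Hypothetical two-equation system: eta_2 depends on eta_1 (coefficient 0.5).
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])
Gamma = np.array([[0.2],
                  [0.3]])
xi = np.array([1.0])   # one exogenous latent variable
zeta = np.zeros(2)     # disturbances set to zero for clarity

eta = solve_structural(B, Gamma, xi, zeta)
```

Here the second endogenous variable receives both a direct contribution from the exogenous variable and a contribution passed through the first endogenous variable.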
2.2. Estimation of Model Parameters
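In estimating the model parameters in equations (1)–(3), we seek to minimize the discrepancy between the sample covariance matrix $S$ and the model-implied covariance matrix $\Sigma(\theta)$ using a fitting function $F(S, \Sigma(\theta))$ [3, 18]. Using the MLE by Jöreskog, we have
$$F_{ML} = \log|\Sigma(\theta)| + \mathrm{tr}\left(S\Sigma(\theta)^{-1}\right) - \log|S| - p, \tag{4}$$
where $p$ is the number of manifest variables in the structural equation model, and $S$ and $\Sigma(\theta)$ are the sample and implied covariance matrices, respectively. Equation (4) is nonlinear, so iterative processes are employed in the minimization. The MLE performs well when the data follow a normal distribution but breaks down with varying degrees of outliers. Nonconvergence does occur with the MLE when the sample size is small. Yuan and Chan proposed a ridge maximum likelihood estimator which models $S_a = S + aI$ instead of $S$, where $a$ is a constant derived as $a = p/N$, with $N$ the sample size. Since $a$ does not take much information from the data, another constant was suggested in the companion paper. Therefore, instead of modelling $S_a$ in $F_{ML}$, the sample covariance matrix with this second constant added to its diagonals was modelled in $F_{ML}$, where the constant is defined in terms of the eigenvalues and their mean.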
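The ML discrepancy and the Yuan and Chan ridge adjustment can be sketched numerically as follows. This is a minimal illustration, assuming the usual form of the ML fitting function and the ridge constant a = p/N described above; the function names and the 3 x 3 covariance matrix are hypothetical, and the second ridge constant from the companion paper is not reproduced here.

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

def ridge_adjust(S, n_obs):
    """Add the ridge constant a = p / N to the diagonal of S."""
    p = S.shape[0]
    return S + (p / n_obs) * np.eye(p)

# Hypothetical sample covariance matrix (illustrative values only).
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

S_a = ridge_adjust(S, n_obs=113)     # 113 is the sample size used in this study
fit_exact = f_ml(S, S)               # implied matrix reproduces S exactly
fit_independence = f_ml(S, np.eye(3))  # implied matrix ignores all covariances
```

The discrepancy is zero when the implied matrix reproduces the sample matrix and grows as the two diverge. The ridge adjustment changes only the diagonal, which moves a near-singular matrix away from singularity.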
The other estimators, namely, ULSE, GLSE, and ADFE, are computed by equations (5)–(7). The ULSE is not based on the normality assumption but requires similar scales of measurement for all manifest variables. The ULSE is computed by the following equation:
$$F_{ULS} = \frac{1}{2}\mathrm{tr}\left[\left(S - \Sigma(\theta)\right)^2\right], \tag{5}$$
whilst the GLSE is given by
$$F_{GLS} = \frac{1}{2}\mathrm{tr}\left[\left(\left(S - \Sigma(\theta)\right)S^{-1}\right)^2\right], \tag{6}$$
where $\mathrm{tr}$ is the trace of the matrix. The GLSE computes the discrepancy function by minimizing the weighted difference between the sample covariance matrix $S$ and the model-implied covariance matrix $\Sigma(\theta)$, using the same assumptions underlying the MLE.
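The two least squares discrepancies can be illustrated in a few lines. The sketch assumes the standard trace forms of the ULS and GLS fitting functions, with the inverse sample covariance matrix as the GLS weight; the function names and matrices are hypothetical.

```python
import numpy as np

def f_uls(S, Sigma):
    """Unweighted least squares: half the trace of the squared residual matrix."""
    resid = S - Sigma
    return 0.5 * np.trace(resid @ resid)

def f_gls(S, Sigma):
    """Generalized least squares: residuals weighted by the inverse of S."""
    weighted = (S - Sigma) @ np.linalg.inv(S)
    return 0.5 * np.trace(weighted @ weighted)

# Hypothetical sample covariance matrix and an independence-type implied matrix.
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
Sigma = np.eye(3)

uls_value = f_uls(S, Sigma)
gls_value = f_gls(S, Sigma)
```

Because the ULS value is a plain sum of squared residuals, it depends on the measurement scale of the manifest variables, which is why similar scales are required for that estimator.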
The ADFE, which is in the family of weighted least squares (WLS) estimators, was proposed by Browne to resist the effect of nonnormality in data for covariance structure models. It is computed by the following equation:
$$F_{ADF} = \left(s - \sigma(\theta)\right)^{\top} W^{-1} \left(s - \sigma(\theta)\right), \tag{7}$$
where $s$ and $\sigma(\theta)$ are the column vectors of the nonduplicate elements of the sample and implied covariance matrices, respectively, and $W$ is a positive definite matrix of size $p^* \times p^*$, with $p^* = p(p+1)/2$. The lavaan package in R was used to produce the results of the model fit and path coefficients, whilst the DiagrammeR package was used to draw the path diagrams.
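The quadratic form behind the ADFE can be sketched as below. This is only an illustration of the structure of the discrepancy: the weight matrix is set to the identity for simplicity, whereas in practice it is estimated from fourth-order moments of the data, and the helper names vech and f_adf are hypothetical.

```python
import numpy as np

def vech(M):
    """Stack the nonduplicate (lower-triangular) elements of a symmetric matrix."""
    rows, cols = np.tril_indices(M.shape[0])
    return M[rows, cols]

def f_adf(S, Sigma, W):
    """Weighted quadratic form (s - sigma)' W^-1 (s - sigma)."""
    diff = vech(S) - vech(Sigma)
    return float(diff @ np.linalg.solve(W, diff))

# Hypothetical sample covariance matrix with p = 3, so p* = 3 * 4 / 2 = 6.
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
W = np.eye(6)  # assumption: identity weight, for illustration only

adf_value = f_adf(S, np.eye(3), W)
```

The size of the weight matrix grows quickly with the number of manifest variables, which is one reason the estimator tends to need large samples.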
2.3. Model Adequacy Test
The SEM, unlike the linear models, adopts fit indices. In this study, we considered the absolute, relative, and parsimonious fit indices.
The absolute fit indices are used in the omnibus test, which is usually undertaken in SEM to test whether or not $\Sigma = \Sigma(\theta)$, where $\Sigma$ is the covariance matrix for the population, which is estimated using the sample covariance matrix $S$. This test is distributed as $\chi^2$, and nonsignificance of this test implies that the discrepancy between these two covariance matrices is not significant. The chi-square test tests the hypothesis $H_0: \Sigma = \Sigma(\theta)$, that all the residuals are zero, versus $H_1: \Sigma \neq \Sigma(\theta)$, with a test statistic
$$\chi^2 = (N - 1)F_{\min},$$
where $F_{\min}$ is the minimized value of the fitting function and $N$ is the sample size. The statistic follows the chi-square distribution with degrees of freedom $df = p(p+1)/2 - q$, where $q$ is the number of parameters to be estimated. With a large sample size the chi-square statistic tends to reject the null hypothesis, and with a small sample size it lacks power; as a result, the relative chi-square, $\chi^2/df$, was developed by Satorra and Bentler.
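A small worked example shows how the chi-square statistic, its degrees of freedom, and the relative chi-square follow from one another. The sketch assumes the common (N - 1) multiplier (some software uses N instead); the minimized fitting function value, the number of manifest variables, and the number of free parameters are hypothetical.

```python
def chi_square_summary(f_min, n_obs, n_manifest, n_free_params):
    """Return (chi-square, degrees of freedom, relative chi-square)."""
    chi_sq = (n_obs - 1) * f_min
    df = n_manifest * (n_manifest + 1) // 2 - n_free_params
    return chi_sq, df, chi_sq / df

# Hypothetical model: 8 manifest variables, 20 free parameters, N = 113.
chi_sq, df, relative_chi_sq = chi_square_summary(
    f_min=0.25, n_obs=113, n_manifest=8, n_free_params=20
)
```

With these inputs there are 36 distinct variances and covariances, leaving 16 degrees of freedom, and the relative chi-square is 28 / 16 = 1.75.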
The goodness-of-fit index (GFI) assesses the amount of variance and covariance in the sample covariance matrix $S$ that is predicted by the model-implied covariance matrix $\Sigma(\theta)$, and it is affected by sample size. The GFI usually falls between 0 and 1, and a value of at least 0.95 is desirable.
The adjusted goodness-of-fit index (AGFI) by Jöreskog and Sörbom adjusts the GFI for model complexity using the degrees of freedom. Like the GFI, the AGFI falls between 0 and 1 and is also sensitive to the sample size. It is calculated by the following equation:
$$\mathrm{AGFI} = 1 - \frac{p(p+1)}{2\,df}\left(1 - \mathrm{GFI}\right).$$
The root mean square residual (RMR) by Jöreskog and Sörbom is the square root of the average squared residual between the elements of the sample covariance matrix $S$ and the predicted covariance matrix $\Sigma(\theta)$. The RMR is computed by the following equation:
$$\mathrm{RMR} = \sqrt{\frac{2\sum_{i=1}^{p}\sum_{j=1}^{i}\left(s_{ij} - \sigma_{ij}\right)^2}{p(p+1)}},$$
where $p$ is the total number of exogenous and endogenous manifest variables. Generally, the RMR assumes values from 0 to 1, with values closer to 0 preferred. When the observed variables are measured on different scales, the RMR is difficult to interpret; hence, the standardized root mean square residual (SRMR) was developed for easier and more meaningful interpretation [28, 29]. The SRMR is computed by the following equation:
$$\mathrm{SRMR} = \sqrt{\frac{2\sum_{i=1}^{p}\sum_{j=1}^{i}\left[\left(s_{ij} - \sigma_{ij}\right)/\sqrt{s_{ii}s_{jj}}\right]^2}{p(p+1)}},$$
where $s_{ij}$ and $\sigma_{ij}$ are the elements of the covariance matrix of the sample data and the implied covariance matrix, respectively. The SRMR takes values from 0 to 1, and the lower the value of SRMR, the better.
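The effect of standardization can be seen by computing both residual indices on the same pair of matrices. The sketch assumes the usual definitions over the nonduplicate elements, with residuals scaled by the sample standard deviations for the SRMR; the 2 x 2 matrices are hypothetical and chosen so that the two variables have different scales.

```python
import numpy as np

def rmr(S, Sigma):
    """Root mean square residual over the lower-triangular elements."""
    p = S.shape[0]
    rows, cols = np.tril_indices(p)
    resid = (S - Sigma)[rows, cols]
    return np.sqrt(2.0 * np.sum(resid ** 2) / (p * (p + 1)))

def srmr(S, Sigma):
    """Standardized RMR: each residual is divided by sqrt(s_ii * s_jj)."""
    p = S.shape[0]
    scale = np.sqrt(np.outer(np.diag(S), np.diag(S)))
    rows, cols = np.tril_indices(p)
    resid = ((S - Sigma) / scale)[rows, cols]
    return np.sqrt(2.0 * np.sum(resid ** 2) / (p * (p + 1)))

# Hypothetical matrices: variances 4 and 1, covariance residual of 0.4.
S = np.array([[4.0, 1.0],
              [1.0, 1.0]])
Sigma = np.array([[4.0, 0.6],
                  [0.6, 1.0]])

rmr_value = rmr(S, Sigma)
srmr_value = srmr(S, Sigma)
```

The covariance residual of 0.4 is halved once it is divided by the product of the standard deviations (2 and 1), so the SRMR no longer depends on the units of the first variable.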
The root mean square error of approximation (RMSEA) by Steiger and Lind is among the fit indices used to assess model-data fit and is classified as a badness-of-fit index. The RMSEA is computed by the following equation:
$$\mathrm{RMSEA} = \sqrt{\frac{\max\left(\chi^2 - df,\ 0\right)}{df\,(N - 1)}}.$$
Several studies grade a model, in order of increasing RMSEA, as a close fit, an average fit, neither a good nor a bad fit, or a poor fit.
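The RMSEA can be computed directly from the chi-square statistic, its degrees of freedom, and the sample size. The sketch below assumes the common sample form with an (N - 1) denominator and truncation at zero; the chi-square values are hypothetical.

```python
import math

def rmsea(chi_sq, df, n_obs):
    """Sample RMSEA, truncated at zero when chi-square is below its df."""
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n_obs - 1)))

# Hypothetical model with chi-square 28 on 16 degrees of freedom, N = 113.
rmsea_value = rmsea(chi_sq=28.0, df=16, n_obs=113)

# A chi-square below its degrees of freedom gives an RMSEA of exactly zero.
rmsea_zero = rmsea(chi_sq=10.0, df=16, n_obs=113)
```

Because the excess of the chi-square over its degrees of freedom is divided by both df and the sample size, the index penalizes misfit per degree of freedom rather than in total.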
The relative fit indices usually compare the chi-square statistic for the hypothesized model with that of the baseline model. The values for “normed” or scaled fit indices should fall between 0 and 1 inclusive. However, the nonnormed fit indices sometimes assume values less than 0 or more than 1. A recently agreed cutoff point for a good model fit based on relative fit indices is a value greater than or equal to 0.95.
The comparative fit index (CFI) by Bentler is used when comparing hypothesized and baseline models and is computed by the following equation:
$$\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_h - df_h,\ 0\right)}{\max\left(\chi^2_h - df_h,\ \chi^2_b - df_b,\ 0\right)},$$
where $\chi^2_b$ and $\chi^2_h$ are the chi-square test statistics of the baseline and the hypothesized models, respectively, with corresponding degrees of freedom $df_b$ and $df_h$. The CFI lies between 0 and 1, is less affected by sample size, and has an acceptable value of greater than or equal to 0.95.
To rescale the chi-square to lie between 0 (no fit) and 1 (exact fit), the normed fit index (NFI) by Bentler and Bonett is used and computed by the following equation:
$$\mathrm{NFI} = \frac{F_b - F_h}{F_b},$$
where the fitting function value of the baseline model is given as $F_b$ and that of the hypothesized model is $F_h$. This fit index responds to sample size, and an acceptable value should be greater than or equal to 0.95.
The Tucker and Lewis fit index (TLI), also known as the nonnormed fit index (NNFI), was developed to reduce the effect of sample size, but it sometimes reports values outside the range 0 to 1. The TLI is computed by the following equation:
$$\mathrm{TLI} = \frac{\chi^2_b/df_b - \chi^2_h/df_h}{\chi^2_b/df_b - 1}.$$
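McDonald and Marsh and Bentler proposed the relative noncentrality index (RNI) for assessing model fit, which is less affected by sample size but is not bounded by 0 and 1. The RNI is computed by the following equation:
$$\mathrm{RNI} = \frac{\left(\chi^2_b - df_b\right) - \left(\chi^2_h - df_h\right)}{\chi^2_b - df_b}.$$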
The Bollen incremental fit index (IFI) is one of the fit indices which are not affected by the sample size. Many studies showed that some of the fit indices are influenced by sample size in such a way that larger sample sizes appear to lead to larger values of the fit indices [28, 31]. The IFI is computed by the following equation:
$$\mathrm{IFI} = \frac{\chi^2_b - \chi^2_h}{\chi^2_b - df_h}.$$
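The relative fit indices above all derive from the same four quantities, which the following sketch makes explicit. It assumes the standard chi-square forms of the CFI, NFI, TLI, RNI, and IFI (the NFI is written with chi-squares, which is equivalent to the fitting-function form when both models use the same sample size multiplier); the chi-square values are hypothetical, and the function assumes the baseline chi-square exceeds its degrees of freedom.

```python
def incremental_fit(chi_b, df_b, chi_h, df_h):
    """Relative fit indices from baseline (b) and hypothesized (h) chi-squares."""
    d_b = chi_b - df_b  # noncentrality estimate, baseline model
    d_h = chi_h - df_h  # noncentrality estimate, hypothesized model
    return {
        "CFI": 1.0 - max(d_h, 0.0) / max(d_h, d_b, 0.0),
        "NFI": (chi_b - chi_h) / chi_b,
        "TLI": (chi_b / df_b - chi_h / df_h) / (chi_b / df_b - 1.0),
        "RNI": (d_b - d_h) / d_b,
        "IFI": (chi_b - chi_h) / (chi_b - df_h),
    }

# Hypothetical values: baseline chi-square 400 on 28 df, model 28 on 16 df.
indices = incremental_fit(chi_b=400.0, df_b=28, chi_h=28.0, df_h=16)
```

In this example the CFI and RNI coincide, which happens whenever the hypothesized model's chi-square is at least its degrees of freedom and no larger in noncentrality than the baseline; the truncation in the CFI only matters outside that range.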
To penalize for model complexity, the normed and goodness-of-fit indices were adjusted for the loss of degrees of freedom incurred by estimating more parameters. As a result, Mulaik et al. and James et al. developed the parsimonious normed fit index and the parsimonious goodness-of-fit index, respectively. These fit indices usually assume values close to 0.5. The parsimonious normed fit index (PNFI) is given by the following equation:
$$\mathrm{PNFI} = \frac{df_h}{df_b}\,\mathrm{NFI},$$
whilst the parsimonious goodness-of-fit index (PGFI) is computed by the following equation:
$$\mathrm{PGFI} = \frac{df_h}{p(p+1)/2}\,\mathrm{GFI}.$$
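The parsimony adjustment amounts to multiplying an index by a ratio of degrees of freedom, as the short sketch below shows. It assumes the standard parsimony ratios (hypothesized over baseline degrees of freedom for the PNFI, hypothesized over the number of distinct covariance elements for the PGFI); the input values are hypothetical.

```python
def parsimony_indices(nfi, gfi, df_h, df_b, n_manifest):
    """Return (PNFI, PGFI) using the standard parsimony ratios."""
    pnfi = (df_h / df_b) * nfi
    pgfi = (df_h / (n_manifest * (n_manifest + 1) / 2)) * gfi
    return pnfi, pgfi

# Hypothetical model: NFI 0.93, GFI 0.95, 16 df against a 28-df baseline,
# with 8 manifest variables (36 distinct variances and covariances).
pnfi, pgfi = parsimony_indices(nfi=0.93, gfi=0.95, df_h=16, df_b=28, n_manifest=8)
```

Even with NFI and GFI above 0.9, the parsimony ratios pull both adjusted indices down to around 0.4 to 0.5, which is why values near 0.5 are typical for these indices.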
3.1. Study Data
In order to compare the estimators, we used the diastolic and systolic blood pressure to measure a theoretical variable called hypertension. Obesity and age are risk factors for hypertension and diabetes, which are themselves cardiovascular disease risk factors [10, 40]. These conditions contribute the most to illness, disability, and mortality. Mortality due to cardiovascular diseases is very common. Coronary risk is an index used to assess the risk of heart disease. The direct effects of age, obesity, hypertension, fasting blood sugar, and postprandial glucose on coronary risk were assessed (see the conceptual models in Figure 1). Because diastolic and systolic blood pressures contribute differently to hypertension, four additional models were fitted in which hypertension was measured directly by its manifest variables, with a focus on the converse relationship between hypertension and diabetes.
We used a real-life dataset collected and used by Lokpo et al. from the Ghana Prison Service, Ho, in the Volta Region of Ghana, after the Research Ethics Committee (REC) of the University of Health and Allied Sciences gave clearance (“ERC/UHAS-REC A.4  18-19”). Other ethical considerations, including informed consent, were followed. Three variables in Table 1 (fasting blood sugar, postprandial glucose, and coronary risk) had a few missing values, which were imputed using their respective medians. Sampling adequacy was assessed using the Kaiser–Meyer–Olkin test, which reported a value of 0.7786, implying that the dataset had good sampling adequacy and could be used for factor analysis. Also, Bartlett’s test showed that the variables are correlated. The dataset deviated slightly from multivariate normality; however, most of the estimators in this study can handle nonnormality to some level.
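The two data preparation steps described here, median imputation and the Kaiser–Meyer–Olkin measure, can be sketched as follows. This is an illustration on simulated data, not the study data; the helper names are hypothetical, and the overall KMO is computed in its usual form as the ratio of squared correlations to squared correlations plus squared partial correlations.

```python
import numpy as np

def impute_median(X):
    """Replace missing values (NaN) in each column by that column's median."""
    X = np.array(X, dtype=float)
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(R)
    # Partial correlations are obtained from the inverse correlation matrix.
    partial = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)

# Small example of imputation: the missing entry becomes the median of 4 and 6.
filled = impute_median([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])

# Simulated data: four variables driven by one common factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=113)
loadings = np.array([0.8, 0.7, 0.6, 0.5])
data = factor[:, None] * loadings + rng.normal(scale=0.5, size=(113, 4))
kmo_value = kmo(data)
```

Values of the measure closer to 1 indicate that the correlations are largely shared among the variables, which is the situation in which factor analysis is appropriate.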
4. Results and Discussion
In order to compare the ridge maximum likelihood estimator with other estimators using the above real-life data, the SEM as presented in the previous section was applied to the required manifest variables to establish the effect of obesity, diabetes, and hypertension on coronary disease using a sample data of size 113. The models also assessed the effects of obesity and diabetes on hypertension as well as obesity and hypertension on diabetes. Table 2 shows the correlation matrix of the data, and the model data fit indicators are reported in Table 3. The path coefficients and types of effects are reported in Tables 4–9 while path model results are presented in Figures 2–7. In all, six models were fitted. Model 1 contains latent hypertension as a risk factor for type II diabetes and Model 2 models the effect of type II diabetes on latent hypertension. Model 3 models the effect of manifest diastolic blood pressure on type II diabetes and Model 4 looks at the effects of manifest systolic blood pressure on type II diabetes. Lastly, Model 5 models the effect of type II diabetes on manifest diastolic blood pressure whilst Model 6 accounts for the effect of type II diabetes on manifest systolic blood pressure. These relationships are hypothesized based on the literature [12, 42, 43].
The results presented in Table 3 indicate that the MLE and the ridge methods converged successfully for all models, whereas the GLSE did not converge for Model 4. The ULSE converged only for the third model, with a high chi-square test statistic value and unreliable results, whilst the ADFE reported a converged hypothesized model for Model 6. The two ridge maximum likelihood estimators reported the best, and similar, fit indices for all models. The SEM results reported a nonsignificant chi-square test statistic for all methods of estimation except the ULSE, which reported the highest chi-square value with an unknown p value. The relative chi-square also falls within the accepted interval. Nonsignificant values were likewise reported for all methods that converged successfully, implying a good model. All the estimation methods reported good and acceptable values except the ADFE, which reported a poor value of 0.095. The remaining absolute, relative, and parsimonious fit indices also showed that the hypothesized models were fitted appropriately. Generally, the ridge maximum likelihood estimators performed better than the other estimators. The two ridge estimators reported the least SRMR and relative chi-square values of 0.036 and 1.192, respectively, followed by the MLE and then the GLSE, whilst the results of the ULSE are unreliable (see Model 3 of Table 3). Moreover, on the other measures, for which higher values indicate good models, the two ridge estimators again did better generally. They reported the highest CFI, TLI, RNI, GFI, NFI, BFI, and AGFI values of 0.993, 0.985, 0.993, 0.965, 0.961, 0.994, and 0.902, respectively, for Model 3. These results show that the two ridge estimators reported the best fit indices used to assess model-data fit in the SEM approach. The RNI and CFI have been shown to coincide in some cases because of algebraic conditions they share. In this study, they were the same for all estimators.
The ADFE and ULSE are not good estimators for the hypothesized models in this study, whilst the GLSE reported fit indices which do not fall within the acceptable ranges, especially CFI, TLI, RNI, and NFI, for all models except Model 3. The obesity latent exogenous variable was determined using the manifest variables BMI, WC, hip circumference (HC), and VF. All the manifest variables for measuring obesity fitted correctly and were significantly positively related to the theoretical exogenous variable (obesity). The latent hypertension was also significantly determined using diastolic and systolic blood pressure. The measurement of the latent hypertension agrees with the results of Yousefi et al., where systolic blood pressure contributed more than diastolic blood pressure, but disagrees with that of Broström et al.
Some of the hypothesized paths were significant. The theoretical hypertension in this study had no effect on diabetes (Figure 2 and Table 4). The exogenous latent obesity has a main effect of 0.215 on the endogenous latent hypertension. The exogenous latent obesity also showed a positive relationship with coronary risk, with a significant direct effect of 0.218 and a nonsignificant indirect effect of 0.022. The direct effect of exogenous latent obesity on postprandial glucose is 0.08 (Figure 2). The relationship between latent blood pressure and latent obesity in this study (both Model 1: Table 4 and Figure 2, and Model 2: Table 5 and Figure 3) is consistent with the results of Yousefi et al. They reported that obesity had a strong positive effect on latent hypertension, which implies that reducing weight will reduce hypertension.
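The decomposition into direct, indirect, and total effects can be illustrated with a short calculation. In path analysis, an indirect effect along a path is the product of the coefficients on that path, and the total effect is the direct effect plus the sum of the indirect effects. The sketch below assumes a single mediated path from obesity to coronary risk through hypertension; the coefficient from hypertension to coronary risk (0.10) is hypothetical, while 0.215 and 0.218 are the values reported above.

```python
from math import prod

def indirect_effect(path_coefficients):
    """Indirect effect along one path: product of the coefficients on it."""
    return prod(path_coefficients)

def total_effect(direct, indirect_paths):
    """Direct effect plus the sum of the indirect effects over all paths."""
    return direct + sum(indirect_effect(path) for path in indirect_paths)

obesity_to_hypertension = 0.215    # reported in the text
hypertension_to_coronary = 0.10    # hypothetical value, for illustration only
direct_obesity_to_coronary = 0.218  # reported in the text

mediated_path = [obesity_to_hypertension, hypertension_to_coronary]
indirect = indirect_effect(mediated_path)
total = total_effect(direct_obesity_to_coronary, [mediated_path])
```

Because the coefficients on a mediated path are usually well below 1 in absolute value, their product is small, which is consistent with the indirect effect being much smaller than the direct effect.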
From the first model (Figure 2), blood pressure measured as a theoretical variable had no effect on type II diabetes. Like the first model, the second model shows that type II diabetes does not have a significant direct or indirect effect on the endogenous latent hypertension variable in this study (Figure 3 and Table 5). However, when blood pressure was measured with manifest diastolic blood pressure in Model 3 (Figure 4 and Table 6), blood pressure had a significant effect on type II diabetes. The manifest systolic blood pressure has no significant effect on type II diabetes (Figure 5 and Table 7). Moreover, the models revealed that obesity affects systolic and diastolic blood pressure, which means that reducing obesity could reduce both hypertension and type II diabetes. Also, in Model 5 (Figure 6 and Table 8), type II diabetes has a significant effect on diastolic blood pressure. This implies that an increase in type II diabetes could lead to high diastolic blood pressure. Type II diabetes does not affect systolic blood pressure, whilst obesity shows a significant effect on systolic blood pressure, as in Model 6 (Figure 7 and Table 9). Obesity has a positive effect on coronary risk in all models. Generally, age has positive total effects on obesity, hypertension, and coronary risk.
This study compared a ridge estimator with others in modelling the effects of obesity, type II diabetes, and hypertension on coronary risk, controlling for age, as well as the converse effects of type II diabetes on hypertension. The two ridge maximum likelihood estimators did better than the other estimators, namely the maximum likelihood, generalized least squares, unweighted least squares, and asymptotic distribution-free estimators. The ULSE and ADFE reported nonconverged and unreliable model coefficients. Aside from the results of the ULSE and ADFE, the other estimators reported very similar model coefficients. The obesity latent exogenous variable was significantly measured using waist circumference, hip circumference, body mass index, and visceral fat. All obesity manifest measures are positively related to the endogenous variable. Also, the diastolic and systolic blood pressure are significant determinants of blood pressure. The study found that an increase in obesity could result in a significant increase in both hypertension and coronary risk. Type II diabetes has a converse effect on blood pressure. Latent blood pressure and obesity do not have significant relationships with diabetes. However, type II diabetes has a significant positive effect on manifest diastolic blood pressure. The manifest diastolic blood pressure also has a significant effect on type II diabetes. Therefore, this calls for holistic public healthcare policies to reduce both conditions under noncommunicable diseases, as reducing one at a time may not be effective.
In order to make an inference concerning the relationships between obesity, diabetes, hypertension, and coronary risk to the population of Ghana, a large dataset covering the whole country is needed. However, the dataset used for the empirical analysis in this study covered one of the regions of the country. Hence, the study is unable to generalize these findings to the whole country. Also, although in assessing the direct and indirect effects of obesity on diabetes, hypertension, and coronary risk we control for age, there may be other confounding variables which were not measured for this study, and hence, further study is required.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
All authors contributed to this manuscript and approved it for submission. DA, AOA, and SKA conceived the problem and the research design as well as data collection and analysis. All the authors assisted in drafting and reviewing the final manuscript.
The authors are grateful to Mr. Daniel Mensah, a lecturer in the Department of Dietetics of the University of Health and Allied Sciences, for making the data available for this study.
J. J. Hox and T. Bechger, “An introduction to structural equation modeling,” Family Science Review, vol. 11, pp. 354–373, 1998.
K. A. Bollen, Structural Equations with Latent Variables, John Wiley & Sons, Inc., Hoboken, NJ, USA, 1989.
U. H. Olsson, T. Foss, S. V. Troye, and R. D. Howell, “The performance of ML, GLS, and WLS estimation in structural equation modeling under conditions of misspecification and nonnormality,” Structural Equation Modeling: A Multidisciplinary Journal, vol. 7, no. 4, pp. 557–595, 2000.
WHO, “A global brief on hypertension: silent killer, global public health crisis,” Tech. Rep., WHO, Geneva, Switzerland, 2013.
A. Bener, M. T. Yousafzai, S. Darwish, A. O. Al-Hamaq, E. A. Nasralla, and M. Abdul-Ghani, “Obesity index that better predict metabolic syndrome: body mass index, waist circumference, waist hip ratio, or waist height ratio,” Journal of Obesity, vol. 2013, Article ID 269038, 9 pages, 2013.
R. Yousefi, M. Ghayour Mobarhan, H. Esmaily, A. Saki, G. A. Anthony Ferns, and M. Tayefi, “Identifying factors associated with hypertension using structural equation modeling: a population-based study,” Iranian Rehabilitation Journal, vol. 16, no. 3, pp. 307–316, 2018.
A. Roman-Urrestarazu, F. M. H. Ali, H. Reka, M. J. Renwick, G. D. Roman, and E. Mossialos, “Structural equation model for estimating risk factors in type 2 diabetes mellitus in a middle eastern setting: evidence from the steps Qatar,” BMJ Open Diabetes Research and Care, vol. 4, no. 1, Article ID e000231, 2016.
L. Kurniawan, U. Bahrun, M. Hatta, and M. Arif, “Body mass, total body fat percentage, and visceral fat level predict insulin resistance better than waist circumference and body mass index in healthy young male adults in Indonesia,” Journal of Clinical Medicine, vol. 7, no. 5, p. 96, 2018.
M. Cui, “Effect-size index for evaluation of model-data fit in structural equation modeling,” Florida State University, Tallahassee, FL, USA, 2012, Master’s thesis.
D. Suhr, “Step your way through path analysis,” in Proceedings of the Western Users of SAS Software Conference, Universal City, CA, USA, November 2008.
A. Satorra and P. M. Bentler, Corrections to Test Statistics and Standard Errors in Covariance Structure Analysis, Sage Publications, Inc., Thousand Oaks, CA, USA, 1994.
K. G. Jöreskog and D. Sörbom, LISREL 5: Analysis of Linear Structural Relationships by Maximum Likelihood and Least Squares Methods, University of Uppsala, Uppsala, Sweden, 1981.
D. Hooper, J. Coughlan, and M. Mullen, “Structural equation modelling: guidelines for determining model fit,” Electronic Journal of Business Research Methods, vol. 6, pp. 53–60, 2008.
J. H. Steiger and J. M. Lind, “Statistically based tests for the number of common factors,” in Proceedings of the Annual Meeting of the Psychometric Society, Iowa City, IA, USA, 1980.
L. James, S. Mulaik, and J. Brett, Causal Analysis: Assumptions, Models, and Data, Sage Publications, Thousand Oaks, CA, USA, 1982.
S. Y. Lokpo, J. Osei-Yeboah, W. K. Owiredu et al., “Evaluation of dietary patterns and haematological profile of apparently healthy officers of the central prisons in the Ho municipality. A cross sectional study,” Scientific African, vol. 7, Article ID e00284, 2020.
A. Broström, O. Sunnergren, P. Johansson et al., “Symptom profile of undiagnosed obstructive sleep apnoea in hypertensive outpatients in primary care: a structural equation model analysis,” Quality in Primary Care, vol. 20, no. 4, pp. 287–298, 2012.