Journal of Oral Diseases
Volume 2014, Article ID 823530, 7 pages
Research Article

Variable Selection Method in Prediction Models: Application in Periodontology

¹Dental Public Health, Faculty of Odontology, University of Montpellier 1, 545 avenue du Pr. JL Viala, 34193 Montpellier Cedex 5, France
²Restorative Dentistry, Faculty of Odontology, University of Montpellier 1, 545 avenue du Pr. JL Viala, 34193 Montpellier Cedex 5, France
³Periodontology, Faculty of Odontology, University of Montpellier 1, 545 avenue du Pr. JL Viala, 34193 Montpellier Cedex 5, France
⁴Biostatistics, Faculty of Medicine, University of Montpellier 1, 545 avenue du Pr. JL Viala, 34193 Montpellier Cedex 5, France

Received 27 September 2013; Revised 16 December 2013; Accepted 23 December 2013; Published 4 February 2014

Academic Editor: Hideki Ohyama

Copyright © 2014 Paul Tramini et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The aim of this study, applied in the field of periodontal diseases, was first to analyze the fatty acid levels in two groups of patients and then to propose a method for selecting the most relevant predictors. Two groups of patients, 29 with moderate or severe periodontitis and 27 who served as controls, were clinically examined, and their serum fatty acids were measured by gas chromatography. The levels of 12 fatty acids were the variables of the analysis. Logistic regression, together with the area under the receiver operating characteristic (ROC) curve, was used to derive a composite score, which led to a subset of the most relevant covariables. The fatty acid levels differed significantly between the 2 groups in multivariate analysis, and the best logistic model was obtained with only 3 predictive variables: arachidonic acid, linoleic acid, and DHA. Serum fatty acid levels thus differed significantly according to the presence of moderate or severe periodontitis. By taking into account the comparison of ROC curves, our approach could optimize the choice of variables in multivariate analyses and could be better suited to the diagnosis and prognosis of oral diseases in dental research.

1. Introduction

Periodontal disease is one of the most common causes of tooth loss among adults [1, 2]. It is characterized by chronic inflammation, periodontal pocket formation, and destruction of the alveolar bone and gingiva. It is now accepted that bacteria alone are insufficient to cause such diseases and that host characteristics are also determining factors: heredity, tobacco, systemic diseases, and nutrition [3, 4]. The aetiological role of systemic factors is therefore difficult to separate from that of local factors [5].

It has been demonstrated that the balance between polyunsaturated and saturated serum fatty acids (FA) plays an important role in the bone remodelling of the skeleton [6, 7], so alveolar bone and periodontal tissues may be affected in the same way. Because some polyunsaturated FA are the precursors of prostaglandins, which are mediators of inflammation, higher levels are associated with inflamed tissues [8, 9]. Thus, fatty acid measurements could be correlated with periodontal conditions such as periodontitis and alveolar bone loss. In this study, we therefore investigated the FA concentrations in the serum of two groups of patients: with periodontitis and without (or with mild) periodontitis. The aim of this study, applied in the field of periodontal diseases, was first to analyze the FA levels in two groups of patients and then to propose a method for selecting the most relevant predictors among a set of quantitative variables in a prediction model. This new approach was developed in order to simplify and optimize the choice of predictors among the numerous potential predictive symptoms and biological analyses already described in periodontal research.

2. Materials and Methods

2.1. Patients’ Recruitment

The subjects were recruited in the dental hospital of Montpellier (France) from September 2010 to June 2011. The inclusion criteria were an age of 35 to 55 years and the presence of at least 20 natural teeth. The patients who agreed to take part in the study were interviewed and their general health was ascertained. Exclusion criteria were smoking, chronic systemic disorders, and prescription of systemic antibiotics, anti-inflammatory drugs, or other systemic drugs, because these may influence the relationship between the FA levels and the dependent variable.

2.2. Clinical Examinations

Clinical examinations were performed by one calibrated dentist (P.G.) in the Department of Periodontology of the Dental Hospital of Montpellier. Probing pocket depth (PPD) and clinical attachment loss (CAL) were assessed at six sites per tooth with a probe PCPUNC157 (Hu-Friedy, Chicago, USA). PPD was measured as the distance from the gingival margin to the base of the gingival sulcus or periodontal pocket, and CAL was the distance from the cementoenamel junction to the base of the sulcus or periodontal pocket. The periodontal status was determined as proposed by Page and Eke [10]:
(i) severe periodontitis: two or more interproximal sites with CAL ≥ 6 mm, not on the same tooth, and one or more interproximal sites with PPD ≥ 5 mm;
(ii) moderate periodontitis: two or more interproximal sites with CAL ≥ 4 mm, not on the same tooth, or two or more interproximal sites with PPD ≥ 5 mm;
(iii) no or mild periodontitis: neither moderate nor severe periodontitis.
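The case definitions above can be sketched as a small decision rule. This is only an illustration of the criteria as worded in this section, not the authors' software: the `(tooth_id, cal_mm, ppd_mm)` record layout and the function names are assumptions, and "not on the same tooth" is read as "on at least two distinct teeth".

```python
from typing import Iterable, Tuple

# Hypothetical record for one interproximal site: (tooth_id, CAL in mm, PPD in mm).
Site = Tuple[int, float, float]


def teeth_at_or_above(sites, index, threshold):
    """Set of teeth with at least one site whose measurement reaches the threshold."""
    return {site[0] for site in sites if site[index] >= threshold}


def sites_at_or_above(sites, index, threshold):
    """Number of sites whose measurement reaches the threshold."""
    return sum(1 for site in sites if site[index] >= threshold)


def classify(sites: Iterable[Site]) -> str:
    """Apply the severe / moderate / no-or-mild rules in that order."""
    sites = list(sites)
    # Severe: CAL >= 6 mm on two or more distinct teeth AND at least one PPD >= 5 mm.
    if len(teeth_at_or_above(sites, 1, 6.0)) >= 2 and sites_at_or_above(sites, 2, 5.0) >= 1:
        return "severe"
    # Moderate: CAL >= 4 mm on two or more distinct teeth OR two or more sites with PPD >= 5 mm.
    if len(teeth_at_or_above(sites, 1, 4.0)) >= 2 or sites_at_or_above(sites, 2, 5.0) >= 2:
        return "moderate"
    return "no or mild"


# Two teeth with CAL >= 6 mm and one deep pocket.
print(classify([(11, 6.5, 5.0), (12, 6.0, 3.0)]))
# Same measurements on a single tooth do not meet either CAL criterion.
print(classify([(11, 6.5, 5.0), (11, 6.0, 3.0)]))
```

Testing the severe rule before the moderate one matters, because every severe case also satisfies the moderate criteria.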

After clinical examinations, two groups of patients were determined. The first group (group 1) was composed of 27 subjects who were diagnosed with mild or no periodontitis. The mean age was years (5 men and 22 women). There were 4 men and 18 women without periodontitis and one man and 4 women with mild periodontitis. The percentages of affected sites per subject were, respectively, 6.8% (2–3 mm), 0% (4–6 mm), and 0% (≥7 mm). The second group (group 2) comprised 29 subjects (6 men and 23 women) suffering from moderate or severe periodontitis, mean age: years. Moderate periodontitis was present in 4 men and 10 women, while severe periodontitis was present in 2 men and 13 women. The distribution of PPD was 36.2% (2–3 mm), 21.8% (4–6 mm), and 6.2% (≥7 mm). A randomly chosen sample of five subjects in each group (17% of the whole sample) was reexamined by the same dentist. The intraindividual reproducibility of CAL and PPD measurements was very good, with Cohen's kappa equal to 0.87 and 0.90, respectively.

After written consent from all the subjects (groups 1 and 2) to use their blood samples and undergo clinical oral examinations, their FA were measured in serum by gas chromatography and expressed in grams per litre. This noninterventional study did not require approval by an ethics committee, since the blood samples (5 mL) were drawn from a database of patients attending check-ups before oral surgery under general anaesthesia or regular check-ups in the department of preventive medicine. Blood samples were drawn after an overnight fast from the antecubital vein in nonheparinised test tubes and centrifuged at 1500 × g for 30 min at 4°C. Triglycerides, cholesterol esters, and phospholipids were isolated by thin-layer chromatography. Hydrogen was the carrier gas. Results are given as percentage of total moles of FA. Seven unsaturated fatty acids and five saturated acids were measured in this study, giving a total of 12 variables for the statistical analyses.

2.3. Univariate Analysis

Descriptive data were summarized as mean ± standard deviation (SD) and coefficient of variation (CV). Depending on the normality of the distribution, assessed with the Shapiro-Wilk test, fatty acid levels were compared between the two groups by univariate analyses (Student's t-test or Mann-Whitney test). Relative differences between group 1 and group 2, expressed as a percentage of the mean value among diseased patients, were also calculated in order to summarize the variation of FA concentration.
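The relative difference can be illustrated with a short calculation. The sketch below assumes the difference is taken as group 2 minus group 1 and divided by the group 2 (diseased) mean, which matches the sign convention used in the Results (positive when the level is higher in group 2). The function name and the sample values are made up for illustration.

```python
from statistics import mean


def relative_difference(group1, group2):
    """Percentage difference (group 2 minus group 1) relative to the group 2 mean."""
    m1, m2 = mean(group1), mean(group2)
    return 100.0 * (m2 - m1) / m2


# Illustrative values only: a level that is higher among diseased subjects (group 2).
controls = [8.0, 10.0]   # mean 9.0
diseased = [9.0, 11.0]   # mean 10.0
print(relative_difference(controls, diseased))  # 10.0
```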

2.4. Multivariate Analysis

The relationship between the FA levels and the response variable (with/without periodontitis) was modelled using logistic regression, the most commonly used method to assess the relationship between one or more independent variables (here, the FA levels) and a binary response variable. Model calibration was checked with the Hosmer-Lemeshow goodness-of-fit test, and multicollinearity was assessed by estimating the variance inflation factor (VIF). Every variable associated with a P value below 0.20 in the univariate analysis was entered in the logistic model, and a forward procedure was used to select the final multivariate model. FA effects are expressed as odds ratios with 95% confidence intervals.
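As an illustration of the collinearity check, the VIF of each predictor can be read from the diagonal of the inverse of the predictors' correlation matrix, which is equivalent to 1/(1 − R²) from regressing that predictor on the others. The following standard-library sketch is not the R code used in the study; the function names and the toy data are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(k)]


# Toy data: two predictors with correlation 0.8, so each VIF is 1 / (1 - 0.64).
print(vif([[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]]))
```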

The logistic model obtained was used to define a quantitative composite score. This score summarizes the effects of the FA levels and can be treated as a new predictor. Because the new predictor is a linear combination of the fatty acid levels, it is a continuous variable. The standard approach to summarizing the performance of such a predictor is to examine all possible cutpoints, each of which yields an estimated sensitivity and specificity. Sensitivity and specificity are, respectively, the probability that a truly diseased subject and a truly healthy subject will be identified as such by the predictor. A good predictor ought to have high values for both [11]. The receiver operating characteristic (ROC) curve was then constructed, and the area under the ROC curve (AUC) was used as a global summary statistic of predictive accuracy. The optimal cutpoint was determined from the ROC curve.
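A minimal sketch of these steps follows. The coefficients and scores are invented, the AUC is computed as the proportion of diseased/healthy pairs ranked correctly (ties count one half), and the cutpoint is chosen by maximizing Youden's index (sensitivity + specificity − 1). The text does not state which criterion the authors used to pick the optimal cutpoint, so the Youden rule is an assumption.

```python
def composite_score(intercept, coefficients, levels):
    """Linear predictor of a logistic model: intercept + sum of coefficient * level."""
    return intercept + sum(b * x for b, x in zip(coefficients, levels))


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutpoint(scores, labels):
    """Scan every observed score as a cutpoint (positive when score >= cutpoint).

    Returns (cutpoint, sensitivity, specificity) maximizing Youden's index;
    ties keep the lowest cutpoint.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Hypothetical coefficients for three fatty acid levels (not the fitted values).
print(composite_score(-1.0, [2.0, -1.0, 0.5], [1.0, 2.0, 4.0]))  # 1.0

# Hypothetical scores for 2 controls (label 0) and 2 diseased subjects (label 1).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))            # 0.75
print(best_cutpoint(scores, labels))  # (0.35, 1.0, 0.5)
```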

A cross-validation technique was used to assess how the results would generalize to an independent data set. One round of cross-validation involves partitioning the sample into complementary subsets, performing the analysis on one subset (the training set), and validating the analysis on the other (the validation or testing set). To reduce variability, multiple rounds are performed using different partitions, and the validation results are averaged over the rounds. In this study, leave-one-out cross-validation was performed: a single observation from the original sample is used as the validation data and the remaining observations as the training data, and this is repeated so that each observation is used once as the validation data. The performance (sensitivity and specificity) of the proposed prediction model was then compared with that of ordinary logistic models based on other combinations of FA, in particular the full logistic model including all 12 covariables. A P value of less than 0.05 was considered significant. Statistical analysis was performed with R 2.10.1 software.
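The leave-one-out loop itself is simple and can be sketched independently of the model being validated. In the example below, the `fit` and `predict` callables are hypothetical, and a deliberately simple stand-in classifier (a threshold halfway between the two class means of a single score) replaces the logistic model so that the sketch stays self-contained. The data are invented.

```python
def leave_one_out(xs, ys, fit, predict):
    """Return one out-of-sample prediction per observation."""
    predictions = []
    for i in range(len(xs)):
        # Train on everything except observation i, then predict observation i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, xs[i]))
    return predictions


def fit_midpoint(xs, ys):
    """Stand-in model (not logistic regression): midpoint between class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_midpoint(threshold, x):
    """Classify as diseased (1) when the score reaches the threshold."""
    return 1 if x >= threshold else 0


# Invented scores: three controls (0) followed by three diseased subjects (1).
xs = [1.0, 2.0, 5.0, 4.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
preds = leave_one_out(xs, ys, fit_midpoint, predict_midpoint)
print(preds)  # [0, 0, 1, 0, 1, 1]
```

Because the model is refitted without the held-out subject each time, the two overlapping subjects (scores 5.0 and 4.0) are misclassified here, which is the kind of penalty that an assessment on the training data alone would hide.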

3. Results

According to the Shapiro-Wilk test, all 12 FA followed a normal distribution: myristic, palmitic, palmitoleic, stearic, oleic, linoleic, gammalinolenic, DHGL, arachidonic (AA), alphalinolenic, eicosapentaenoic (EPA), and docosahexaenoic (DHA) acids. Simple statistics in univariate analysis are shown in Table 1. Four FA had significantly different levels between the two groups: stearic acid, AA, EPA, and DHA.

Table 1: Simple statistics for the fatty acid levels in the two groups.

Relative differences between group 1 and group 2 are displayed in Figure 1. Positive differences were associated with higher values in group 2, while negative differences were associated with higher values in group 1. The highest positive relative differences were for stearic (11.6%), gammalinolenic (11.4%), and arachidonic (11.2%) acids. The largest negative relative differences were for EPA (−33.9%), DHA (−25.0%), and linoleic acid (−3.8%). Eight FA out of 12 were higher in patients with periodontitis, while 4 FA were higher in the control group.

Figure 1: Relative differences between diseased and control patients.

The Hosmer-Lemeshow test showed that the relationship between the FA levels and the response variable was well fitted by the logistic regression model: chi-square = 10.43, with a P value of 0.24. The mean VIF between predictors was equal to 2.19, with a maximum value of 3.12, which means that the collinearity between the FA is low. ROC curves were then constructed for each FA and are displayed in Figure 2. The AUC calculation showed very low predictive performance for each FA taken separately. The construction of a composite score allowed choosing the best logistic model (Table 2), which was obtained with these 3 FA: arachidonic, linoleic, and DHA. These FA, which were finally retained by the model, are highlighted in Figure 1 (hatched bars). A further ROC curve was constructed from this composite score (Figure 3), and its AUC was 0.821. Depending on the threshold value, the best results after cross-validation were 26 true positives, 18 true negatives, 8 false positives, and 4 false negatives. So 44 subjects were correctly classified and 12 were misclassified. Sensitivity, specificity, positive predictive value, and negative predictive value were 86.7%, 69.2%, 76.5%, and 81.8%, respectively (Table 3). The quality of prediction of the different models is compared in Table 3. Sensitivity and specificity did not significantly differ between the composite-score model and the full model including 12 covariables: 86.7% versus 80.0% and 69.2% versus 73.1%, respectively.
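The diagnostic values quoted above follow directly from the four cross-validated counts. The short check below recomputes them from the standard definitions; the function name is illustrative, and the counts are those reported in the text.

```python
def diagnostic_values(tp, tn, fp, fn):
    """Standard diagnostic indices from a 2 x 2 classification table."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased subjects correctly identified
        "specificity": tn / (tn + fp),   # controls correctly identified
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "correctly_classified": tp + tn,
    }


values = diagnostic_values(tp=26, tn=18, fp=8, fn=4)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(name, round(100 * values[name], 1))
# sensitivity 86.7
# specificity 69.2
# ppv 76.5
# npv 81.8
print(values["correctly_classified"])  # 44
```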

Table 2: Choice of the best logistic model by a composite score.
Table 3: Diagnostic values for each logistic model.
Figure 2: ROC curves for the 12 fatty acids.
Figure 3: ROC curve constructed with 3 fatty acids (arachidonic, linoleic, and DHA).

4. Discussion

4.1. Fatty Acids Levels and Periodontal Disease

The FA levels in serum differed significantly depending on the presence of periodontitis (moderate or severe according to Page and Eke [10]). The present study aimed to check whether inflammation clinically detected in periodontal tissues could be linked with biological measurements such as the FA levels. Periodontitis may take different forms, but discrimination between those forms was not investigated. The three FA selected by the model play a significant role in bone metabolism [12–14]. First, AA was found to be more abundant among diseased patients (Table 1 and Figure 1), which confirms the results of other studies [8, 15] and its involvement in periodontal inflammation. This was also well documented by Eberhard et al. [16] in experimental gingivitis and by Figueredo et al. [17], who found a significant relation between the serum level of FA and the severity of periodontitis. However, the role of linoleic acid (18:2n-6), an essential polyunsaturated FA that appeared to be one of the most discriminant variables, needs further investigation. Unlike AA, it was found to be less abundant among diseased patients. Johnson and Fritsche [18] found that linoleic acid reduced the risk of some diseases, but at a higher level it might contribute to excess chronic inflammation. The roles of EPA and DHA, which are ω-3 fatty acids, were also found to be prominent. Their role is also described in the literature: patients with a low DHA or EPA intake are more likely to have periodontal disease [19, 20]. They also regulate hepatic lipid and glucose metabolism [21].

Since this study follows a cross-sectional design, it is difficult to interpret the variations of fatty acids in blood, which are known to change over time. Small variations in concentration, and hence inconsistent results regarding the significance of some FA, are therefore conceivable. However, the results of the present study are in agreement with longitudinal studies [22]. It is now well known that n-6 and n-3 unsaturated FA compete in prostaglandin formation [19], which mainly involves AA, EPA, and DHA. Another hypothesis is that variations of the FA concentration in blood may precede the clinical evidence and the oral symptoms.

4.2. Variable Selection Method

Usually, reducing the number of predictors leads to a reduction of the discrimination power [23, 24]. However, the present method reduced the number of variables from 12 to 3 with no loss of power of discrimination between the two groups of patients and no reduction in the number of patients correctly classified when the cross-validation procedure was applied. One main finding is that the ordinary logistic model including the covariables that were significant in univariate analysis yielded lower values of sensitivity and specificity than the logistic model with the composite score (Table 3).

It is worth underlining that the FA retained by this selection method were not systematically the most prominent in univariate analyses when discriminating the 2 groups of patients (Table 1). In univariate analyses, stearic acid, AA, EPA, and DHA showed significantly different levels between cases and controls, but the variable selection method proposed in this paper showed that the most relevant subset of variables consisted of AA, linoleic acid, and DHA (Figure 1). The outcomes of the multivariate model do not systematically match those of the univariate models, because the multivariate approach looks for the best combination among the 12 variables to explain the difference between groups. The cross-validation procedure may partly explain those differences, because the performance of our model was less penalized by it. However, cross-validation is a more objective approach, since it tests the model on a different subset of subjects: its goal is to gauge the generalizability of the prediction model. Another problem may arise from collinearity or correlation between explanatory variables in multiple regression, which can lead to unexpected results [24]. With the model proposed in the present study, the mean VIF between predictors was 2.19, with a maximum value of 3.12, which is much lower than 10 [25]. So, absence of multicollinearity could be assumed.

FA levels in the serum of patients differed significantly according to the presence of periodontitis. Of course, FA are not the only components in these complex biological processes, but their importance has been demonstrated. In such a multifactorial disease, it is unreliable to assign a patient to one of the two groups simply by relying on the presence of a single risk factor [26]. Nevertheless, diagnosis by FA measurements is expected to be made earlier: blood measurement data, which result from biochemical reactions during complex pathological processes, are indeed expected to precede the clinical symptoms and could therefore help to anticipate them. This study aimed to help the practitioner in making clinical decisions, such as diagnosis, treatment plan, and prognosis, but its results should not be considered a perfect rule that replaces the clinician's experience.

More generally, by taking into account the comparison of ROC curves, our approach could optimize the choice of variables in multivariate analyses and could better fit in with prognosis of oral diseases in medical research. Very few tools are available in epidemiological research to best choose a multivariate model when many explanatory variables have been measured and are potentially relevant. So, in the future, it will be interesting to conduct a follow-up study in order to understand whether the biochemical transformations, followed by the FA measurements, could precede the clinical and pathological manifestations. This could lead to an earlier diagnosis and a more efficient prophylactic intervention, because taking into account the biochemical measurements on an asymptomatic subject is an act of prevention [27].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


  1. W. T. McFall Jr., “Tooth loss in 100 treated patients with periodontal disease. A long-term study,” Journal of Periodontology, vol. 53, no. 9, pp. 539–549, 1982.
  2. L. J. Brown and H. Löe, “Prevalence, extent, severity and progression of periodontal disease,” Periodontology 2000, vol. 2, pp. 57–71, 1993.
  3. T. C. Hart and K. S. Kornman, “Genetic factors in the pathogenesis of periodontitis,” Periodontology 2000, vol. 1997, no. 14, pp. 202–215, 1997.
  4. R. C. Page, S. Offenbacher, H. E. Schroeder, G. J. Seymour, and K. S. Kornman, “Advances in the pathogenesis of periodontitis: summary of developments, clinical implications and future directions,” Periodontology 2000, vol. 1997, no. 14, pp. 216–248, 1997.
  5. G. C. Armitage, “Classifying periodontal diseases—a long-standing dilemma,” Periodontology 2000, vol. 30, no. 1, pp. 9–23, 2002.
  6. L. Vermelin, B. Baroukh, A. Llorens, and J. L. Saffar, “Effects of essential fatty acid deficiency on periodontal tissue adaptation to spontaneous tooth migration,” Calcified Tissue International, vol. 77, no. 1, pp. 30–36, 2005.
  7. K. Hamazaki, M. Itomura, S. Sawazaki, and T. Hamazaki, “Fish oil reduces tooth loss mainly through its anti-inflammatory effects?” Medical Hypotheses, vol. 67, no. 4, pp. 868–870, 2006.
  8. Y. Çiçek, I. Özmen, V. Çanakçi, A. Dilsiz, and F. Şahin, “Content and composition of fatty acids in normal and inflamed gingival tissues,” Prostaglandins Leukotrienes and Essential Fatty Acids, vol. 72, no. 3, pp. 147–151, 2005.
  9. B. Siegel, E. Weihe, M. Bette, R. M. Nüsing, L. Flores-de-Jacoby, and R. Mengel, “The effect of age on prostaglandin-synthesizing enzymes in the development of gingivitis,” Journal of Periodontal Research, vol. 42, no. 3, pp. 259–266, 2007.
  10. R. C. Page and P. I. Eke, “Case definitions for use in population-based surveillance of periodontitis,” Journal of Periodontology, vol. 78, no. 7, pp. 1387–1399, 2007.
  11. C. Ahn, “Statistical methods for the estimation of sensitivity and specificity of site-specific diagnostic tests,” Journal of Periodontal Research, vol. 32, no. 4, pp. 351–354, 1997.
  12. B. A. Watkins, Fatty Acids Modulate Bone Formation and Cartilage Function, vol. 4, Edited by A. A. Spector, International Society for the Study of Fatty Acids and Lipids, Washington, DC, USA, 1997.
  13. M. M. Rahman, A. Bhattacharya, and G. Fernandes, “Conjugated linoleic acid inhibits osteoclast differentiation of RAW264.7 cells by modulating RANKL signaling,” Journal of Lipid Research, vol. 47, no. 8, pp. 1739–1748, 2006.
  14. L.-S. Kremmyda, E. Tvrzicka, B. Stankova, and A. Zak, “Fatty acids as biocompounds: their role in human metabolism, health and disease—a review—part 2: fatty acid physiological roles and applications in human health and disease,” Biomedical Papers, vol. 155, no. 3, pp. 195–218, 2011.
  15. P. Requirand, P. Gibert, P. Tramini, J. P. Cristol, and B. Descomps, “Serum fatty acid imbalance in bone loss: example with periodontal disease,” Clinical Nutrition, vol. 19, no. 4, pp. 271–276, 2000.
  16. J. Eberhard, F. Heilmann, Y. Açil, H. K. Albers, and S. Jepsen, “Local application of n-3 or n-6 polyunsaturated fatty acids in the treatment of human experimental gingivitis,” Journal of Clinical Periodontology, vol. 29, no. 4, pp. 364–369, 2002.
  17. C. M. Figueredo, G. L. Martinez, J. C. Koury, R. G. Fischer, and A. Gustafsson, “Serum levels of long-chain polyunsaturated fatty acids in patients with periodontal disease,” Journal of Periodontology, vol. 84, no. 5, pp. 675–682, 2012.
  18. G. H. Johnson and K. Fritsche, “Effect of dietary linoleic acid on markers of inflammation in healthy persons: a systematic review of randomized controlled trials,” Journal of the Academy of Nutrition and Dietetics, vol. 112, no. 7, pp. 1029–1041, 2012.
  19. P. Albertazzi and K. Coupland, “Polyunsaturated fatty acids. Is there a role in postmenopausal osteoporosis prevention,” Maturitas, vol. 42, no. 1, pp. 13–22, 2002.
  20. L. Kesavalu, B. Vasudevan, B. Raghu et al., “Omega-3 fatty acid effect on alveolar bone loss in rats,” Journal of Dental Research, vol. 85, no. 7, pp. 648–652, 2006.
  21. E. M. Novak and S. M. Innis, “Dietary long chain n-3 fatty acids are more closely associated with protein than energy intakes from fat,” Prostaglandins Leukotrienes and Essential Fatty Acids, vol. 86, no. 3, pp. 107–112, 2012.
  22. M. Iwasaki, A. Yoshihara, P. Moynihan, R. Watanabe, G. W. Taylor, and H. Miyazaki, “Longitudinal relationship between dietary ω-3 fatty acids and periodontal disease,” Nutrition, vol. 26, no. 11-12, pp. 1105–1109, 2010.
  23. R. Azen, D. V. Budescu, and B. Reiser, “Criticality of predictors in multiple regression,” British Journal of Mathematical and Statistical Psychology, vol. 54, no. 2, pp. 201–225, 2001.
  24. E. Vittinghoff, D. V. Glidden, S. C. Shiboski, and C. E. McCulloch, Regression Methods in Biostatistics, Springer, 2005.
  25. Y.-K. Tu, M. Kellett, V. Clerehugh, and M. S. Gilthorpe, “Problems of correlations between explanatory variables in multiple regression analyses in the dental literature,” British Dental Journal, vol. 199, no. 7, pp. 457–461, 2005.
  26. P. Meisel, T. Kocher, V. Baelum, and R. Lopez, “Risk factors in periodontitis and classifying the disease,” European Journal of Oral Sciences, vol. 111, no. 3, pp. 280–283, 2003.
  27. P. Requirand, La Parodontologie Prédictive par la Protomique Sérique, CEIA, Brussels, Belgium, 2007.