Digital Models as an Alternative to Plaster Casts in Assessment of Orthodontic Treatment Outcomes
Objective. This study aimed to compare the use of digital models and plaster casts in assessing the improvement in occlusion following orthodontic treatment. Materials and Methods. Digital models and plaster casts of 39 consecutive patients at pre- and posttreatment stages were obtained and assessed using the Peer Assessment Rating (PAR) index and the Index of Complexity and Treatment Need (ICON). PAR and ICON scores were compared at individual and group levels. Categorization of improvement level was compared using Kappa (κ) statistics. Results. There was no significant difference in either PAR scores (p > 0.05) or ICON scores (p > 0.05) between digital and plaster cast assessments. The Intraclass Correlation Coefficient (ICC) values for changes in PAR and ICON scores were excellent (ICC > 0.80). Agreement of ratings of occlusal improvement level between digital and plaster model assessments was κ = 0.83 for PAR and κ = 0.59 for ICON. Conclusion. The study supported the use of digital models as an alternative to plaster casts when assessing changes in occlusion at the ‘individual patient’ level using ICON or PAR. However, it could not fully support digital models as an alternative to plaster casts at ‘the group level’ (as in the case of clinical audit/research).
1. Introduction
There has been considerable development in digital models in orthodontics, which have obvious advantages in terms of storage space and ease of transfer and overcome the physical damage and loss problems associated with conventional plaster models. On the other hand, plaster casts are also essential for use in orthodontic diagnosis and treatment planning, allowing us to evaluate treatment need and treatment outcome.
Despite the array of commercial companies providing digital models, studies have observed good agreement between digital and plaster models in the assessment of arch length, tooth size, overbite and overjet, and various intra-arch and interarch relationships [2–4]. Comparisons of Bolton ratios between digital and plaster models have demonstrated clinically insignificant differences between the two [5–7].
Potentially, digital models may also be useful in assessing orthodontic treatment need and treatment outcomes. The Peer Assessment Rating (PAR) index scores the various occlusal traits that make up a malocclusion. The individual scores are summed to obtain an overall total representing the degree to which a case deviates from normal alignment and occlusion: a score of zero indicates good alignment, and higher scores indicate increasing levels of irregularity. Assessments comparing pretreatment PAR scores between digital and plaster models have shown substantial agreement [5, 9].
The Index of Complexity and Treatment Need (ICON), which comprises five weighted measurements, is based on the subjective judgments of 97 orthodontists from nine countries on 240 initial and 98 treated models. Previous studies have reported no statistical difference in overall ICON scores between plaster and digital models in pre- and posttreatment assessments.
However, further study is required to support or refute such claims and to consider the level of agreement in a more comprehensive manner, i.e., considering not only the magnitude of change in scores but also agreement in the level of improvement obtained. This study aims to compare the use of digital models and conventional plaster models in assessing occlusal improvements by both PAR and ICON.
2. Materials and Methods
2.1. Study Sample
Prior to the study, a sample size calculation was conducted based on the null hypothesis of good or excellent agreement of 0.7; with α at 0.05 and β at 0.2 (80% power), a sample size of 35 was proposed. Allowing for potential artefact anomalies, an additional 10% recruitment was planned. Eventually, 39 consecutive subjects were recruited from 2008 to 2012 in the Faculty of Dentistry of the University of Hong Kong. Pre- and posttreatment plaster casts and digital models (O3DM®) were obtained (78 sets of both) for these subjects.
2.2. Data Collection
Pretreatment and posttreatment models were assessed on both plaster casts and digital models. This was undertaken by an examiner trained and calibrated in the use of the PAR and ICON indices. A digital calliper (Shanghai Taihai Congliang Ju Co., Ltd., Shanghai, China) was used to obtain the measurements from the plaster casts, to the nearest 0.01 mm. For measurements of overjet and overbite, a standard ruler was used, and measurements were obtained to the nearest 0.1 mm. Digital models were assessed using O3DM’s designated software (www.o3dm.com/moderndentallab) to measure the individual components of PAR and ICON. All components were measured to the nearest 0.01 mm.
Assessments of PAR were made according to the methods described by Richmond et al. The five components of PAR include assessment of upper and lower anterior segment crowding, buccal occlusion, overjet, overbite, and centreline. Pretreatment and posttreatment plaster casts were measured according to the PAR index conventions and guidelines. The corresponding digital models were assessed using O3DM’s own computer software. Each of the individual components was weighted according to the prescribed rubric, and the overall weighted scores were used for comparison.
The ICON assessment was conducted according to the methods and criteria described by Daniels and Richmond. The five components of ICON are aesthetic assessment; upper arch crowding or spacing; crossbite; incisor openbite or overbite; and buccal segment anteroposterior relationship. Plaster casts and digital models were assessed and weighted, with the overall weighted scores being used for comparison.
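The weighting step described for both indices can be sketched as follows. This is a minimal illustration only: the weights shown are the commonly cited PAR and ICON weightings, which are an assumption here because the text does not list them, and the component names and example scores are hypothetical.

```python
# Assumed weightings (not stated in this paper): commonly cited PAR and ICON rubrics.
PAR_WEIGHTS = {
    "anterior_segments": 1,  # upper + lower anterior segment scores, summed
    "buccal_occlusion": 1,
    "overjet": 6,
    "overbite": 2,
    "centreline": 4,
}
ICON_WEIGHTS = {
    "aesthetic": 7,
    "upper_arch_crowding_spacing": 5,
    "crossbite": 5,
    "incisor_openbite_overbite": 4,
    "buccal_anteroposterior": 3,
}


def weighted_total(component_scores, weights):
    """Multiply each component score by its weight and sum the results."""
    missing = set(weights) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weights[name] * component_scores[name] for name in weights)


# Hypothetical component scores for one model (illustrative values only).
par_case = {
    "anterior_segments": 8,
    "buccal_occlusion": 2,
    "overjet": 2,
    "overbite": 1,
    "centreline": 1,
}
icon_case = {
    "aesthetic": 6,
    "upper_arch_crowding_spacing": 2,
    "crossbite": 1,
    "incisor_openbite_overbite": 1,
    "buccal_anteroposterior": 2,
}

par_total = weighted_total(par_case, PAR_WEIGHTS)     # 8 + 2 + 12 + 2 + 4
icon_total = weighted_total(icon_case, ICON_WEIGHTS)  # 42 + 10 + 5 + 4 + 6
```

The same function would be applied to the plaster and the digital measurement of each model, giving the paired overall scores that are compared in the analyses.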
2.3. Data Analyses
PAR and ICON scores of pre- and posttreatment digital and plaster models were derived. In addition, improvement in PAR and ICON scores was categorized according to outcome assessment criteria [10, 12]. Intraexaminer reliability (method error assessment) was calculated using Dahlberg’s formula, by reassessing a random sample of 10% of both the plaster and digital models blind to the original assessments.
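For readers unfamiliar with it, Dahlberg’s formula takes the method error as the square root of the sum of squared differences between repeated measurements divided by twice the number of pairs. A minimal sketch follows; the function name and the example readings are hypothetical and are not data from this study.

```python
import math


def dahlberg_error(first, second):
    """Dahlberg's method error: sqrt(sum(d_i^2) / (2 * n)) over n repeated pairs."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally sized, non-empty lists of readings")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


# Hypothetical original and repeat scores for four re-assessed models.
first_reading = [30, 22, 41, 18]
repeat_reading = [28, 25, 40, 20]

# Differences are 2, -3, 1, -2, so the sum of squares is 18 and 18 / 8 = 2.25.
method_error = dahlberg_error(first_reading, repeat_reading)
```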
Agreement between digital and plaster models was examined using several analytical strategies. First, the mean directional difference between digital and plaster model PAR and ICON scores was calculated (plaster minus digital score). Then a test was performed to evaluate whether the mean of the directional difference was significantly different from zero. A mean directional difference significantly different from zero provides evidence of systematic bias between plaster and digital models. To examine systematic bias, effect sizes were calculated by dividing the mean difference score by the standard deviation of the difference score. Second, the mean absolute difference was calculated using the Wilcoxon signed-rank test. In contrast to the directional difference, the absolute difference ignores the positive and negative signs of the difference between plaster and digital models. Third, the Intraclass Correlation Coefficient (ICC) values between the digital and plaster casts were computed. The ICC is an appropriate measure of agreement as it corrects correlation for systematic differences and provides an unbiased estimate of agreement. Finally, agreement between categorization of improvement in occlusion was determined using Kappa (κ) statistics.
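The agreement measures listed above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the text does not say which ICC form or which Kappa variant was used, so a two-way, single-measure, absolute-agreement ICC and unweighted Cohen’s Kappa are assumed here, and all function names and data are hypothetical.

```python
import math
from collections import Counter


def standardized_difference(plaster, digital):
    """Mean of (plaster - digital) divided by the SD of those differences."""
    diffs = [p - d for p, d in zip(plaster, digital)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / (n - 1))
    return mean / sd


def mean_absolute_difference(plaster, digital):
    """Average size of the difference, ignoring its sign."""
    return sum(abs(p - d) for p, d in zip(plaster, digital)) / len(plaster)


def icc_absolute_agreement(plaster, digital):
    """Assumed form: two-way, single-measure, absolute-agreement ICC (k = 2)."""
    n, k = len(plaster), 2
    rows = list(zip(plaster, digital))
    grand = sum(p + d for p, d in rows) / (n * k)
    row_means = [(p + d) / k for p, d in rows]
    col_means = [sum(plaster) / n, sum(digital) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's Kappa for two categorical ratings of the same cases."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical paired scores and improvement categories (not study data).
plaster_scores = [30, 22, 41, 18]
digital_scores = [28, 25, 40, 20]
plaster_grades = ["greatly", "greatly", "improved", "improved"]
digital_grades = ["greatly", "improved", "improved", "improved"]

bias = standardized_difference(plaster_scores, digital_scores)
mad = mean_absolute_difference(plaster_scores, digital_scores)
icc = icc_absolute_agreement(plaster_scores, digital_scores)
kappa = cohen_kappa(plaster_grades, digital_grades)
```

The sketch does not include the significance tests; in practice these, and confidence intervals for the ICC, would come from a statistics package.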
3. Results
The method error for plaster casts for PAR and ICON was 3.15 and 3.4, respectively (p > 0.05); for digital models this was 2.6 and 4.4, respectively (p > 0.05).
PAR agreement results are presented in Table 1. There was no significant difference between plaster and digital model assessments at either the pretreatment or the posttreatment stage based on the Wilcoxon signed-rank test (p > 0.05). The standardized difference was < 0.30 for the pretreatment models and < 0.10 for the posttreatment models. Furthermore, there was no significant difference in the magnitude of change in PAR assessments obtained from plaster compared to digital models (p > 0.05; standardized difference < 0.30). The mean absolute difference between plaster and digital models was < 2.0 for pretreatment assessments, posttreatment assessments, and changes in PAR scores. The ICC values were 0.989 (95% CI, 0.980 to 0.994) for pretreatment models, 0.982 (95% CI, 0.966 to 0.991) for posttreatment models, and 0.988 (95% CI, 0.978 to 0.994) for changes in PAR scores. Agreement of the categorization of improvement in occlusion with respect to PAR scores was κ = 0.83 (Table 2).
ICON agreement results are presented in Table 3. There was no significant difference between plaster and digital model assessments at either the pretreatment or the posttreatment stage (p > 0.05). The standardized difference of pre- and posttreatment models was < 0.2. Furthermore, there was no significant difference in the magnitude of change in ICON assessments obtained from plaster compared to digital models (p > 0.05; standardized difference < 0.10). The mean absolute difference between plaster and digital models was < 10.0 for pretreatment assessments, < 5.0 for posttreatment assessments, and < 10.0 for changes in ICON scores. The ICC values were 0.903 (95% CI, 0.817 to 0.949) for pretreatment models, 0.859 (95% CI, 0.733 to 0.926) for posttreatment models, and 0.922 (95% CI, 0.852 to 0.959) for changes in ICON scores. Agreement of the categorization of improvement in occlusion with respect to ICON scores was κ = 0.59 (Table 4).
4. Discussion
A number of analytical strategies were employed to provide a comprehensive assessment of agreement of orthodontic treatment outcomes between digital and plaster models using two of the most commonly used indices, PAR and ICON [10, 12]. The present study found that there was no significant difference in the directional difference between the digital and plaster models with respect to pretreatment, posttreatment, and change in scores. This concurs with the findings of others [5, 9, 11]. The present study also found that the standardized differences, as an indicator of systematic bias, were generally negligible (< 0.20); with respect to agreement in changes in PAR scores, the standardized difference was somewhat larger, although it can still be considered a small ‘bias’. These findings suggest that, in assessing agreement between digital and plaster models, it is important to consider not simply agreement of pre- and/or posttreatment overall scores but also agreement of ‘the change’ in scores, which is additional information that this study provides and which has clinical relevance. Absolute difference values provide an insight into agreement irrespective of direction (ignoring the positive and negative change); the findings indicated that the absolute differences were generally small except for changes in ICON scores (mean 9.13). Given that ICON scores can range from 7 to 140, a difference of 9 constitutes an approximately 7% difference, which in itself may have no clinical significance.
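The percentage quoted above can be checked with a short calculation. The denominator is an assumption: the text does not state whether the difference was related to the width of the ICON scale (140 − 7 = 133) or to its maximum (140). Using the scale width and the reported mean of 9.13:

```latex
\frac{9.13}{140 - 7} = \frac{9.13}{133} \approx 0.069 \approx 7\%
```

Dividing by the maximum score of 140 instead gives roughly 6.5%, so the conclusion is the same under either reading.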
There was no significant difference in categorization of improvement level in PAR or ICON scores obtained from digital and plaster models; Kappa values could be interpreted as ‘excellent’ for PAR at 0.83, but only ‘moderate’ for ICON at 0.59. This slight difference may result from ICON applying a more stringent standard than PAR in the ‘greatly improved’ category. Therefore, at the group level, while there was generally good agreement for both PAR and ICON scores, in some situations the level of agreement was less than ideal, i.e., systematic bias (ICON). This may have implications for clinical audit as well as for research that relies on analyses of groups of patients.
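To illustrate how the two indices can grade the same case differently, the sketch below applies the commonly cited improvement thresholds for PAR and ICON. These thresholds are an assumption, since the text only refers to the outcome assessment criteria without stating them, and the function names and example scores are hypothetical.

```python
def par_improvement(pre, post):
    """Assumed PAR convention: a reduction of at least 22 points is 'greatly
    improved'; a reduction of at least 30% is 'improved'."""
    reduction = pre - post
    if reduction >= 22:
        return "greatly improved"
    if pre > 0 and reduction / pre >= 0.30:
        return "improved"
    return "worse or no different"


def icon_improvement(pre, post):
    """Assumed ICON convention: grade on (pretreatment - 4 * posttreatment)."""
    value = pre - 4 * post
    if value > -1:
        return "greatly improved"
    if value >= -25:
        return "substantially improved"
    if value >= -53:
        return "moderately improved"
    if value >= -85:
        return "minimally improved"
    return "not improved or worse"


# Hypothetical scores: a small shift in the posttreatment ICON score
# (16 versus 14) moves the case across the 'greatly improved' boundary.
par_grade = par_improvement(30, 5)
icon_grade_a = icon_improvement(60, 16)
icon_grade_b = icon_improvement(60, 14)
```

Because the assumed ICON grade multiplies the posttreatment score by four, small measurement differences between plaster and digital models at the posttreatment stage can shift a case between categories, which is consistent with the lower Kappa value reported for ICON.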
In conclusion, findings from this study demonstrated that there are acceptable levels of agreement between digital and plaster models in the assessment of treatment outcomes. This study supports the notion that, when assessing at the ‘individual patient’ level, digital models can be used as an alternative to plaster models, which has obvious implications for clinical practice. However, it could not fully support digital models as an alternative to plaster casts at ‘the group level’ (as in the case of clinical audit/research).
The data used to support the findings of this study are available from the corresponding author upon request.
A previous version of the abstract of this manuscript was presented at the FDI Annual World Dental Congress, 2012, and published in International Dental Journal, 2012, vol. 62, suppl. 1, pp. 28-29, abstract P201.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
D. R. Stevens, C. Flores-Mir, B. Nebbe, D. W. Raboud, G. Heo, and P. W. Major, “Validity, reliability, and reproducibility of plaster vs digital study models: comparison of peer assessment rating and Bolton analysis and their constituent measurements,” American Journal of Orthodontics and Dentofacial Orthopedics, vol. 129, no. 6, pp. 794–803, 2006.
D. Naidu, J. Scott, D. Ong, and C. T. C. Ho, “Validity, reliability and reproducibility of three methods used to measure tooth widths for Bolton analyses,” Australian Orthodontic Journal, vol. 25, no. 2, pp. 97–103, 2009.
G. Dahlberg, Statistical Methods for Medical and Biological Students, Interscience, NY, USA, 1940.
J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, 1988.
T. Colton, Statistics in Medicine, Little Brown and Company, 1974.