Engineering Solutions for Craniomaxillofacial Rehabilitation and Orodental Healthcare
Research Article, Open Access
Automatic Analysis of Lateral Cephalograms Based on Multiresolution Decision Tree Regression Voting
Abstract
Cephalometric analysis is a standard tool for assessment and prediction of craniofacial growth, orthodontic diagnosis, and oral-maxillofacial treatment planning. The aim of this study is to develop a fully automatic system of cephalometric analysis, including cephalometric landmark detection and cephalometric measurement in lateral cephalograms, for malformation classification and assessment of dental growth and soft tissue profile. First, a novel method of multiscale decision tree regression voting using SIFT-based patch features is proposed for automatic landmark detection in lateral cephalometric radiographs. Then, clinical measurements are calculated from the detected landmark positions. Two databases are used for evaluation: the benchmark database of 300 lateral cephalograms from the 2015 ISBI Challenge, and our own database of 165 lateral cephalograms. Experimental results show that the proposed method achieves accurate landmark detection and measurement analysis in lateral cephalograms, with mean radial errors of 1.69 mm and 1.71 mm on the two databases.
1. Introduction
Cephalometric analysis is a scientific approach used in clinical practice for assessment and prediction of craniofacial growth, orthodontic diagnosis, and oral-maxillofacial treatment planning for patients with malocclusion [1]. We focus on 2D lateral cephalometric analysis, which is performed on cephalometric radiographs in lateral view. Cephalometric analysis has undergone three stages of development: a manual stage, a computer-aided stage, and a computer-automated stage. Cephalometric analysis based on radiographs was first introduced by Broadbent [2] and Hofrath [3] in 1931. In the first (manual) stage, cephalometric analysis consists of five steps: (a) placing a sheet of acetate over the cephalometric radiograph; (b) manually tracing craniofacial anatomical structures; (c) manually marking cephalometric landmarks; (d) measuring angular and linear parameters using the landmark locations; and (e) analyzing/classifying craniomaxillofacial hard and soft tissue [4]. This process is tedious, time-consuming, and subjective. In the second stage, the first step is no longer needed because the cephalometric radiograph is digitized. Furthermore, the next two steps are performed on a computer, and the measurements are calculated automatically by software. However, this computer-aided analysis is still time-consuming, and the results are not reproducible because of large inter- and intra-observer variability in landmark annotation. In the third stage, the most crucial step, i.e., identifying landmarks, is automated by image processing algorithms [5]. Automatic analysis is highly reliable and repeatable, and it saves orthodontists considerable time. However, fully automatic cephalometric analysis is challenging because of overlapping structures and inhomogeneous intensity in cephalometric radiographs, as well as anatomical differences among subjects.
Cephalometric landmarks include corners, line intersections, center points, and other salient features of anatomical structures. They generally have stable geometrical locations within anatomical structures in cephalograms. In this study, 45 landmarks are used in lateral cephalograms (refer to Table 1). The cephalometric landmarks play a major role in defining the cephalometric planes, which are the lines between two landmarks in 2D cephalometric radiographs, as shown in Table 2. Different measurements are calculated from different cephalometric landmarks and planes, and they are used to analyze different anatomical structures [4]. Measurements can be angles or distances and are used to analyze skeletal and dental anatomical structures as well as the soft tissue profile. To support clinical diagnosis, many analytic approaches have been developed, including Downs analysis, Wylie analysis, Riedel analysis, Steiner analysis, Tweed analysis, Sassouni analysis, Bjork analysis, and others [6].


Methods of automatic cephalometric landmark detection fall mainly into three categories: (1) bottom-up methods; (2) deformation model-based methods; and (3) classifier/regressor-based methods. The first category comprises bottom-up methods, while the other two categories are learning-based methods.
For bottom-up methods, two techniques were usually employed: edge detection and template matching. Edge-based methods extracted the anatomical contours in cephalograms, and the relevant landmarks were then identified on the contours using prior knowledge. Levy-Mandel et al. [7] first proposed the edge tracking method for identifying craniofacial landmarks. First, input images were smoothed by a median filter, and edges were extracted by the Mero-Vassy operator. Second, contours were obtained by the edge tracking technique based on constraints on locations, endings, and breaking segments, together with edge linking conditions. Finally, the landmarks were detected on the contours according to their definitions in cephalometry. This method was tested on only two high-quality cephalometric radiographs, and 13 landmarks not located on contours were not detected. A two-stage landmark detection method was proposed by Grau et al. [8], which first extracted the major high-contrast line features, such as the contours of the jaws and nose, with a line-detection module, and then detected landmarks by pattern recognition based on mathematical morphology. Edge-based methods could only detect landmarks on contours and were not robust to noise, low contrast, and occlusion. Template-matching-based methods searched for the region with the least distance to the template of a specific landmark, and the center of that region was taken as the estimated position of the landmark. Therefore, landmarks both on and off contours can be detected. Ashish et al. [9] proposed a template-matching method using a coarse-to-fine strategy for cephalometric landmark detection. Mondal et al. [10] proposed an improved method based on Canny edge extraction to detect the craniofacial anatomical structures. Kaur and Singh [11] proposed automatic cephalometric landmark detection using Zernike moments and template matching.
Template-matching-based methods had difficulty in choosing a representative template and were not robust to anatomical variability among individuals. All these methods depended strongly on image quality.
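To make the template-matching idea concrete, here is a minimal sketch (not the method of any cited work) that scores every window of a grayscale image by its sum of squared differences (SSD) to a landmark template and returns the center of the best window. The function name `match_template_ssd` is hypothetical, and SSD is only one possible distance.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_template_ssd(image, template):
    """Return the (row, col) centre of the window closest to `template` in SSD."""
    h, w = template.shape
    # All h x w windows of the image: shape (H - h + 1, W - w + 1, h, w).
    windows = sliding_window_view(image, (h, w))
    ssd = ((windows - template) ** 2).sum(axis=(2, 3))
    r, c = np.unravel_index(np.argmin(ssd), ssd.shape)
    # Convert the top-left corner of the best window to its centre.
    return int(r) + h // 2, int(c) + w // 2


rng = np.random.default_rng(0)
image = rng.random((40, 50))
template = image[10:15, 20:25].copy()       # 5 x 5 patch cut out around (12, 22)
print(match_template_ssd(image, template))  # (12, 22)
```

In practice the template would be built from training images, which is exactly where the difficulty of choosing a representative template, noted above, arises.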
Subsequently, deformation model-based methods were proposed to address these limitations by using shape constraints. Forsyth and Davis [12] reviewed and evaluated the approaches to automatic cephalometric analysis reported from 1986 to 1996, and they highlighted a cephalometric analysis system presented by Davis and Taylor, which introduced an appearance model into landmark detection. Improved active shape/appearance model algorithms were then combined with template matching to refine cephalometric landmark detection, and higher accuracy was obtained [13–17]. Here, shape or appearance models were learned from training data to regularize the search over all landmarks in the test data. However, it was difficult to initialize the landmark positions for the search, because initialization usually relied on traditional methods.
Recently, significant progress has been made in automatic landmark detection in cephalograms by using supervised machine-learning approaches. These approaches can be further separated into two classes: classification and regression models. A support vector machine (SVM) classifier was used to predict the locations of landmarks in cephalometric radiographs, with the projected principal-edge distribution proposed to describe edges as the feature vector [18]. El-Feghi et al. [19] proposed a coarse-to-fine landmark detection algorithm, which first used a fuzzy neural network to predict the locations of landmarks and then refined them by template matching. Leonardi et al. [20] employed cellular neural networks for automatic cephalometric landmark detection on softcopies of direct digital cephalometric X-rays; the approach was tested on 41 cephalograms and detected 10 landmarks. Favaedi et al. [21] proposed a probability relaxation method based on shape features for cephalometric landmark detection. Farshbaf and Pouyan [22] proposed a coarse-to-fine SVM classifier to predict the locations of landmarks in cephalograms, which used histograms of oriented gradients for coarse detection and histograms of gray profiles for refinement. Classifier-based methods were successfully used to identify landmarks in whole cephalograms, but positive and negative samples were difficult to balance in the training data, and pixel-based searching increased the computational complexity.
Many algorithms have been reported for automatic cephalometric landmark detection, but their results were difficult to compare because they used different databases and landmarks. The situation improved after two challenges were held at the 2014 and 2015 IEEE International Symposium on Biomedical Imaging (ISBI). During the challenges, regressor-based methods were first introduced to automatic landmark detection in lateral cephalograms. The challenges aimed at automatic cephalometric landmark detection and at using landmark positions to measure cephalometric linear and angular parameters for automatic assessment of anatomical abnormalities to assist clinical diagnosis. The best results at the challenges were obtained with random forest (RF) [23] regression voting based on shape model matching. In particular, Lindner et al. [24, 25] presented random forest regression voting (RFRV) in the constrained local model (CLM) framework for automatic landmark detection, which obtained a mean error of 1.67 mm and a success detection rate of 73.68% within the 2 mm precision range. RFRV-CLM won the challenges, and an RF-based classification algorithm combining Haar-like appearance features and game theory [26] ranked second. Comprehensive performance analyses of the challenge algorithms were reported in [27, 28]. Later, Vandaele et al. [29] proposed an ensemble tree-based method using multiresolution features in bioimages, which was tested on three databases, including the cephalogram database of the 2015 ISBI challenge. Public databases and evaluation protocols are now available for improving automatic systems for cephalometric landmark detection and parameter measurement, and many research outcomes have been achieved. However, automatic cephalometric landmark detection and parameter measurement for clinical practice remain challenging.
Recently, efforts have been made to develop automatic cephalometric analysis systems for clinical use. In this paper, we present a new automatic cephalometric analysis system, including landmark detection and parameter measurement in lateral cephalograms; the block diagram of the system is shown in Figure 1. The core of this system is automatic landmark detection, which is realized by a new method based on multiresolution decision tree regression voting. Our main contributions can be summarized in four aspects: (1) we propose a new landmark detection framework of multiscale decision tree regression voting; (2) SIFT-based patch features are employed for the first time to extract the local features of cephalometric landmarks, and the proposed approach extends flexibly to the detection of more landmarks because neither feature selection nor shape constraints are used; (3) two clinical databases are used to evaluate the extensibility of the proposed method, and experimental results show that our method achieves robust detection when extending from 19 landmarks to 45 landmarks; and (4) automatic measurement of clinical parameters based on the detected landmarks is implemented in our system, which can facilitate clinical diagnosis and research.
The rest of this paper is organized as follows. Section 2 describes the proposed method of automatic landmark detection based on multiscale decision tree regression voting, together with parameter measurement. Experimental results on the 2015 ISBI Challenge cephalometric benchmark database and on our own database are presented in Section 3. The proposed system is discussed in Section 4. Finally, Section 5 concludes this study and outlines future work.
2. Method
2.1. Landmark Detection
Landmark detection is the most important step in cephalometric analysis. In this paper, we propose a new framework for automatic cephalometric landmark detection based on multiresolution decision tree regression voting (MDTRV) using patch features.
2.1.1. SIFT-Based Patch Feature Extraction
(1) SIFT Feature. The scale-invariant feature transform (SIFT) was proposed by Lowe [30] for key point detection. The SIFT feature extraction algorithm consists of four steps: (1) detection of local extrema in scale space by constructing difference-of-Gaussian image pyramids; (2) accurate key point localization in location and scale; (3) orientation assignment for invariance to image rotation; and (4) computation of the key point descriptor as the local image feature. SIFT features represent key points in a way that is invariant to image scale and rotation and robust to illumination changes and moderate affine distortion. This feature extraction method has been commonly used in image matching and registration.
(2) SIFT Descriptor for Image Patch. Key points with discretized descriptors can be used as visual words in image retrieval, and histograms of visual words can then be used by a classifier to map images to abstract visual classes. Many successful image representations are based on invariant features derived from image patches. Features are first extracted on patches sampled from the image, either on a multiresolution grid, in a randomized manner, or with interest point detectors. Each patch is then described by a feature vector, e.g., SIFT [30]. In this paper, we use SIFT feature vectors [31] to represent image patches, and a regressor maps these patch features to the displacements from the patch centers to each landmark.
The SIFT descriptor of an image patch is illustrated in Figure 2. An image patch P centered at location (x, y) is described by a square window of side length 2W + 1, where W is a parameter. The gradients $G_x$ and $G_y$ of P are computed by finite differences. At every pixel of P, the gradient magnitude is computed by
$$M = \sqrt{G_x^2 + G_y^2},$$
and the gradient angle (measured in radians, clockwise, starting from the X axis) is calculated by
$$\theta = \operatorname{atan2}(G_y, G_x) \bmod 2\pi.$$
Each square window is divided into 4 × 4 adjacent cells, and an 8-bin histogram of gradient orientations (bins spanning 0 to 2π, weighted by gradient magnitude) is extracted in each cell. For each image patch, the 4 × 4 histograms are concatenated into a feature vector of dimension 4 × 4 × 8 = 128, which is the SIFT descriptor of patch P. An example of SIFT-based patch feature extraction in a cephalogram is shown in Figure 3.
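The descriptor computation described above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the implementation of [31]: it weights the orientation histograms by gradient magnitude but omits the Gaussian weighting, bin interpolation, and normalization that full SIFT implementations usually apply. The function name `patch_descriptor` is hypothetical; W = 48 matches the experimental settings reported later.

```python
import numpy as np


def patch_descriptor(image, x, y, W):
    """128-D SIFT-like descriptor of the (2W+1) x (2W+1) patch centred at (x, y).

    Simplified sketch: magnitude-weighted 8-bin orientation histograms on a
    4 x 4 grid of cells, without Gaussian weighting, interpolation or normalisation.
    """
    patch = image[y - W:y + W + 1, x - W:x + W + 1].astype(float)
    gy, gx = np.gradient(patch)                    # finite differences along y (rows) and x (cols)
    magnitude = np.hypot(gx, gy)
    # Angle in [0, 2*pi); with the y axis pointing down, angles increase clockwise.
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = np.minimum((angle / (2 * np.pi) * 8).astype(int), 7)   # guard against angle == 2*pi
    edges = np.linspace(0, 2 * W + 1, 5).astype(int)              # 4 x 4 cell boundaries
    histograms = []
    for i in range(4):
        for j in range(4):
            cell = (slice(edges[i], edges[i + 1]), slice(edges[j], edges[j + 1]))
            histograms.append(np.bincount(bins[cell].ravel(),
                                          weights=magnitude[cell].ravel(), minlength=8))
    return np.concatenate(histograms)              # 4 * 4 * 8 = 128 values


rng = np.random.default_rng(1)
image = rng.random((120, 120))
print(patch_descriptor(image, x=60, y=60, W=48).shape)   # (128,)
```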
2.1.2. Decision Tree Regression
Decision trees are classical and efficient statistical learning algorithms that have been widely used to solve classification and regression problems [32, 33]. To predict a response, a new sample follows the decisions from the root node down to a leaf node, which contains the response. Classification trees give nominal responses, while regression trees give numeric responses. In this paper, CART (classification and regression trees) [34], which builds binary trees, is used as the regressor to learn the mapping between the SIFT-based patch feature vectors f and the displacement vectors d from the patch centers to the position of each landmark in the training images. The regressors are then used to predict the displacements from the patch feature vectors of the test image. Finally, the predicted displacements are used to obtain the optimal location of each landmark via voting. One advantage of the regression approach is that it avoids balancing positive and negative examples, as required when training classifiers. Another advantage of regression over classification is that good results can be obtained by evaluating randomly sampled pixels in the region of interest rather than every pixel.
(1) Training. In the training process, a regression tree $R_l$ is constructed for each landmark l by recursive splitting. An optimization criterion and stopping rules determine how the tree grows, and the tree can be improved by pruning or by selecting appropriate parameters.
The optimization criterion is to choose the split that minimizes the mean-squared error (MSE) of the predictions with respect to the ground truths in the training data. Splitting is the main process of creating decision trees. In general, four steps are performed to split node t. First, for each feature $x_i$ (i.e., one dimension of the extracted feature vectors in the training data), the weighted MSE of the responses (displacements $\mathbf{d}_j$) in node t is computed by
$$\varepsilon_t = \frac{1}{|T|}\sum_{j \in T} \lVert \mathbf{d}_j - \bar{\mathbf{d}}_t \rVert^2,$$
where T is the set of all observation indices in node t, |T| is its size, and $\bar{\mathbf{d}}_t$ is the mean displacement in node t. Second, the probability of an observation being in node t is calculated by
$$P(T) = \sum_{j \in T} w_j,$$
where $w_j$ is the weight of observation j. In this paper, $w_j = 1/N$, where N is the sample size. Third, the values of $x_i$ are sorted in ascending order, and every value is regarded as a splitting candidate; $T_U$ is the unsplit set of all observation indices corresponding to missing values. Finally, the best way to split node t using $x_i$ is determined by maximizing the reduction in MSE over all splitting candidates. For every splitting candidate in $x_i$, the following steps are performed:
(1) Split the observations in node t into left and right child nodes ($t_L$ and $t_R$, respectively).
(2) Compute the reduction in MSE, $\Delta I$. For a particular splitting candidate, let $T_L$ and $T_R$ denote the observation indices in $t_L$ and $t_R$, respectively. If $x_i$ does not contain any missing values, the reduction in MSE for the current splitting candidate is
$$\Delta I = P(T)\,\varepsilon_t - P(T_L)\,\varepsilon_{t_L} - P(T_R)\,\varepsilon_{t_R}.$$
If $x_i$ contains missing values, the reduction in MSE is
$$\Delta I_U = P(T - T_U)\,\varepsilon_t - P(T_L)\,\varepsilon_{t_L} - P(T_R)\,\varepsilon_{t_R},$$
where $T - T_U$ is the set of all observation indices in node t that are not missing.
(3) Choose the splitting candidate that yields the largest MSE reduction. In this way, the observations in node t are split at the candidate that maximizes the MSE reduction.
To stop splitting the nodes of the decision tree, two rules can be followed: (1) the node is pure, i.e., the MSE of the observed responses in this node is less than the MSE of the observed responses in the entire data multiplied by the tolerance on quadratic error per node; and (2) the decision tree reaches the preset limit on its depth, for example, the maximum number of splits max_splits.
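A minimal sketch of the split search for one feature, following the criterion above with uniform weights w_j = 1/N, so that the probability of a node is its sample count divided by N. As assumptions of this sketch, the node error is the mean squared Euclidean distance of the 2D displacements to their node mean, candidate thresholds are placed midway between consecutive sorted values, and missing values are not handled. The function names are hypothetical.

```python
import numpy as np


def node_mse(d):
    """Mean squared Euclidean distance of the displacements d (n_t x 2) to their mean."""
    return float(np.mean(np.sum((d - d.mean(axis=0)) ** 2, axis=1)))


def best_split(x, d, n_total):
    """Best cut point on one feature x for displacements d, with weights w_j = 1/N.

    Maximises dI = P(T) eps_t - P(T_L) eps_tL - P(T_R) eps_tR with P(.) = |.| / N.
    Returns (threshold, dI), or (None, 0.0) if no cut reduces the error.
    """
    x, d = np.asarray(x, float), np.asarray(d, float)
    order = np.argsort(x, kind="stable")
    x, d = x[order], d[order]
    n = len(x)
    parent = n / n_total * node_mse(d)
    best_thr, best_gain = None, 0.0
    for k in range(1, n):                    # left child = first k sorted samples
        if x[k] == x[k - 1]:
            continue                         # equal values cannot be separated
        gain = (parent
                - k / n_total * node_mse(d[:k])
                - (n - k) / n_total * node_mse(d[k:]))
        if gain > best_gain:
            best_thr, best_gain = float(0.5 * (x[k - 1] + x[k])), gain
    return best_thr, best_gain


x = np.array([3.0, 1.0, 9.0, 2.0, 8.0, 7.0])
d = np.array([[0, 0], [0, 0], [5, 5], [0, 0], [5, 5], [5, 5]])
print(best_split(x, d, n_total=len(x)))   # (5.0, 12.5)
```

Running this scan over every feature and keeping the overall best (feature, threshold) pair gives one node split; repeating it on the children grows the tree.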
Both the simplicity and the performance of a decision tree should be considered when improving it. A deep tree usually achieves high accuracy on the training data, but not necessarily on test data: a deep tree tends to overfit, i.e., its test accuracy is much lower than its training accuracy. In contrast, a shallow tree does not achieve high training accuracy but can be more robust, i.e., its training accuracy is similar to its test accuracy. Moreover, a shallow tree is easy to interpret and fast at prediction. In addition, tree accuracy can be estimated by cross-validation when there are not enough data for separate training and testing.
There are two ways to improve the performance of decision trees by minimizing the cross-validated loss. One is to select the optimal value of a parameter that controls the depth of the trees; the other is post-pruning after the trees are created. In this paper, we use the parameter max_splits to control the depth of the resulting decision trees. A large value of max_splits leads to a deep tree, while a small value yields a shallow tree with larger leaves. To select an appropriate value for max_splits, the following steps are performed: (1) choose a set of spaced candidate values from 1 to the total sample size; (2) create cross-validated regression trees for the data with each candidate value of max_splits and calculate the cross-validated errors; and (3) take the value of max_splits that minimizes the cross-validated error.
(2) Prediction. In the prediction process, responses for new data are easily predicted once the regression tree $R_l$ has been created. Suppose f_new is the new data (i.e., a feature vector extracted from a new patch P_new in the test cephalogram). Following the rules of the regression tree, each node tests a specific attribute of the new observation f_new, and the observation moves step by step down to a leaf, which stores the mean displacement d_new. In this way, the displacement from the center of patch P_new to landmark l is predicted from the SIFT feature vector by the regressor $R_l$.
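The max_splits selection and the root-to-leaf prediction described above can be sketched with a small self-contained regression tree. This is an illustrative assumption, not the CART implementation used in the paper: the tree is grown breadth-first until max_splits splits have been made (one plausible growth order), each leaf stores the mean displacement of its training samples, and the cross-validated error is the mean squared displacement error over 5 folds. All function names and the toy data are hypothetical.

```python
import numpy as np


def best_split(X, Y):
    """(feature, threshold) with the largest drop in summed squared error, or (None, None)."""
    parent = np.sum((Y - Y.mean(axis=0)) ** 2)
    best, best_gain = (None, None), 0.0
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys = X[order, j], Y[order]
        for k in range(1, len(xs)):
            if xs[k] == xs[k - 1]:
                continue
            left, right = ys[:k], ys[k:]
            sse = (np.sum((left - left.mean(axis=0)) ** 2)
                   + np.sum((right - right.mean(axis=0)) ** 2))
            if parent - sse > best_gain:
                best, best_gain = (j, 0.5 * (xs[k - 1] + xs[k])), parent - sse
    return best


def fit_tree(X, Y, max_splits):
    """Grow a tree breadth-first, stopping after max_splits splits (or when no split helps)."""
    def make_node(idx):
        return {"idx": idx, "value": Y[idx].mean(axis=0)}   # leaf response: mean displacement
    root = make_node(np.arange(len(Y)))
    queue, n_splits = [root], 0
    while queue and n_splits < max_splits:
        node = queue.pop(0)
        j, thr = best_split(X[node["idx"]], Y[node["idx"]])
        if j is None:
            continue                                           # no split reduces the error: keep as leaf
        go_left = X[node["idx"], j] <= thr
        node.update(feature=j, threshold=thr,
                    left=make_node(node["idx"][go_left]),
                    right=make_node(node["idx"][~go_left]))
        queue += [node["left"], node["right"]]
        n_splits += 1
    return root


def predict(tree, f_new):
    """Follow the decisions from the root to a leaf and return its stored displacement."""
    node = tree
    while "feature" in node:
        node = node["left"] if f_new[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def cv_error(X, Y, max_splits, n_folds=5, seed=0):
    """Mean squared displacement error of n_folds-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(Y)), n_folds)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(Y)), test)
        tree = fit_tree(X[train], Y[train], max_splits)
        errors += [np.sum((predict(tree, X[i]) - Y[i]) ** 2) for i in test]
    return float(np.mean(errors))


def select_max_splits(X, Y, candidates):
    """Return the candidate with the smallest cross-validated error, plus all errors."""
    errors = [cv_error(X, Y, m) for m in candidates]
    return candidates[int(np.argmin(errors))], errors


# Toy data: 2-D "feature vectors" whose 2-D displacement depends on two thresholds.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
Y = np.column_stack([10.0 * (X[:, 0] > 0.5), 5.0 * (X[:, 1] > 0.5)])
best, errors = select_max_splits(X, Y, [1, 2, 3, 8])
tree = fit_tree(X, Y, best)
print(best, predict(tree, np.array([0.9, 0.9])))
```

On this toy data three splits are needed to separate all four displacement values, so larger candidates do not lower the cross-validated error further and the smallest sufficient value is kept.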
2.1.3. MDTRV Using SIFT-Based Patch Features
(1) Decision Tree Regression Voting (DTRV). As illustrated in Figure 4, automatic cephalometric landmark detection using decision tree regression voting (DTRV) at a single scale is a supervised learning algorithm consisting of training and testing processes. The training process begins with feature extraction for patches sampled from the training images. Then, a regression tree $R_l$ is constructed for each landmark l from the pairs (f, d) of feature vectors and displacements, where f represents the observations and d represents the targets of this regression problem. The testing process begins with the same feature extraction step. Then, the resulting f_new is used to predict the displacement d_new by the regressor $R_l$. Finally, the optimal landmark position is obtained via voting. Two voting styles can be used: single unit voting and single weighted voting. As mentioned in [35], these two styles perform equally well for voting optimal landmark positions in medical images. In this paper, we use single unit voting.
(2) MDTRV Using SIFT-Based Patch Features. Finally, a new MDTRV framework using SIFT-based patch features is proposed, which uses a simple and efficient strategy for accurate landmark detection in lateral cephalograms. The proposed MDTRV algorithm has four stages, performed successively at scales of 0.125, 0.25, 0.5, and 1. At scale 0.125, K training patches and their K displacements are randomly sampled from the whole cephalogram for a specific landmark l, and the decision tree regressor $R_l$ is created from these training samples. For prediction, SIFT features are extracted from testing patches sampled over the whole cephalogram, and displacements are predicted by the regressor from the extracted features. The optimal position of landmark l is obtained by voting over all predicted displacements. At scales 0.25, 0.5, and 1, only the patch sampling rule differs from the procedure at scale 0.125: the training and testing patches are randomly sampled in the (2S + 1) × (2S + 1) neighborhood of the true landmark position and of the initial landmark position, respectively. The estimated landmark position is used as the initial landmark position at the next scale of the testing process. At scale 1, the position of landmark l is refined by using a Hough forest [36], which gives the final location. This multiresolution coarse-to-fine strategy substantially improves the accuracy of cephalometric landmark detection.
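The single-scale voting step and the coarse-to-fine schedule above can be sketched as follows, under explicit assumptions: `predict_fn(scale, centers)` is a hypothetical stand-in for SIFT feature extraction plus the trained regression tree at each scale, patch centers are expressed in scaled-image coordinates, only the testing side is shown (training patches would be sampled in the same way around the true position), and the final Hough forest refinement [36] is omitted. Single unit voting gives each patch exactly one vote at its center plus its predicted displacement. The defaults S = 40 and 400 testing patches follow the experimental settings reported later; the function names are hypothetical.

```python
import numpy as np


def vote(points, height, width):
    """Single unit voting: one vote per predicted point; return the (x, y) pixel with most votes."""
    votes = np.zeros((height, width), dtype=int)
    xy = np.rint(points).astype(int)
    inside = (xy[:, 0] >= 0) & (xy[:, 0] < width) & (xy[:, 1] >= 0) & (xy[:, 1] < height)
    np.add.at(votes, (xy[inside, 1], xy[inside, 0]), 1)      # votes[y, x] += 1
    y, x = np.unravel_index(np.argmax(votes), votes.shape)
    return np.array([x, y], dtype=float)


def detect_coarse_to_fine(height, width, predict_fn, S=40, n_test=400,
                          scales=(0.125, 0.25, 0.5, 1.0), seed=0):
    """Return the landmark estimate (x, y) in full-resolution pixel coordinates."""
    rng = np.random.default_rng(seed)
    estimate = None
    for s in scales:
        h, w = int(round(height * s)), int(round(width * s))
        if estimate is None:
            # Coarsest scale: patch centres sampled over the whole (scaled) image.
            centers = np.column_stack([rng.uniform(0, w, n_test), rng.uniform(0, h, n_test)])
        else:
            # Finer scales: centres in the (2S+1) x (2S+1) window around the previous estimate.
            centers = estimate * s + rng.integers(-S, S + 1, size=(n_test, 2))
        displacements = predict_fn(s, centers)
        estimate = vote(centers + displacements, h, w) / s     # back to full resolution
    return estimate


# Hypothetical predictor: the true displacement plus small noise (stands in for the trained tree).
true_xy = np.array([700.0, 1100.0])
noise = np.random.default_rng(1)
oracle = lambda s, c: true_xy * s - c + noise.normal(0.0, 0.3, size=c.shape)
print(detect_coarse_to_fine(2400, 1935, oracle))
```

The coarse scale only has to land within the (2S + 1)-pixel window of the next scale, which is what lets the finer stages search a small neighborhood instead of the whole image.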
2.2. Parameter Measurement
Measurements are either angular or linear parameters calculated by using cephalometric landmarks and planes (refer to Tables 1 and 2). According to geometrical structure, measurements can be classified into five classes: the angle of three points, the angle of two planes, the distance between two points, the distance from a point to a plane, and the distance between two points projected to a plane. All measurements can be calculated automatically in our system as described in the following.
2.2.1. The Angle of Three Points
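Assume that point B is the vertex of the three points A, B, and C. The angle of these three points is calculated by
$$\angle ABC = \arccos\frac{|AB|^2 + |BC|^2 - |AC|^2}{2\,|AB|\,|BC|}$$
and
$$|AB| = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2},$$
where |AB| represents the Euclidean distance between A and B (|BC| and |AC| are defined analogously).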
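A minimal sketch of the three-point angle, assuming the law of cosines applied to the Euclidean distances |AB|, |BC|, and |AC| (pixel coordinates suffice, since the scale factor cancels for angles). The helper name is hypothetical.

```python
import math


def angle_three_points(A, B, C):
    """Angle ABC in degrees with vertex B, from the three pairwise Euclidean distances."""
    ab, bc, ac = math.dist(A, B), math.dist(B, C), math.dist(A, C)
    cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2 * ab * bc)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))   # clamp rounding error


print(round(angle_three_points((1, 0), (0, 0), (0, 1)), 6))   # 90.0
```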
2.2.2. The Angle between Two Planes
As the cephalometric radiographs in this study are 2D images, the planes are projected as straight lines, each determined by two points. The angle between plane AB and plane CD is calculated by
$$\theta = \arccos\frac{(x_B - x_A)(x_D - x_C) + (y_B - y_A)(y_D - y_C)}{|AB|\,|CD|}.$$
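A sketch of the angle between two planes, assuming each plane is represented by its two defining landmarks and the angle is taken between the direction vectors A→B and C→D. Reversing one direction gives the supplementary angle (180° − θ), so the landmark order must follow the definition of each measurement. The helper name is hypothetical.

```python
import math


def angle_between_planes(A, B, C, D):
    """Angle in degrees between plane AB (direction A->B) and plane CD (direction C->D)."""
    ux, uy = B[0] - A[0], B[1] - A[1]
    vx, vy = D[0] - C[0], D[1] - C[1]
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))   # clamp rounding error


print(round(angle_between_planes((0, 0), (1, 0), (0, 0), (1, 1)), 6))   # 45.0
```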
2.2.3. The Distance between Two Points
The distance between two points A and B is calculated using the following equation:
$$d(A, B) = s\sqrt{(x_A - x_B)^2 + (y_A - y_B)^2},$$
where s is a scale indicator, in mm/pixel, that can be calculated from the calibrated gauge in the lateral cephalograms.
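A worked sketch of the point-to-point distance, including one assumption not stated in the text: if an image is rescaled before detection (as done for database2, whose images are resized to a width of 1960 pixels), the scale indicator s must be multiplied by original_width / new_width so that distances stay in millimeters. The helper names and the 0.1 mm/pixel value are illustrative.

```python
import math


def distance_mm(A, B, s):
    """Distance between landmarks A and B in mm, given the scale s in mm/pixel."""
    return s * math.dist(A, B)


def rescaled_scale(s, original_width, new_width=1960):
    """mm/pixel after resizing an image from original_width to new_width pixels."""
    return s * original_width / new_width


print(distance_mm((0, 0), (300, 400), s=0.1))   # 50.0 mm for a 500-pixel distance
```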
2.2.4. The Distance from a Point to a Plane in the Horizontal Direction
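The plane through points A and B is defined as
$$(y_B - y_A)\,x - (x_B - x_A)\,y + x_B y_A - x_A y_B = 0.$$
Thus, the distance from point C to the plane is calculated by
$$d_C = s\,\frac{\left|(y_B - y_A)\,x_C - (x_B - x_A)\,y_C + x_B y_A - x_A y_B\right|}{\sqrt{(y_B - y_A)^2 + (x_B - x_A)^2}}.$$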
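A sketch of the point-to-plane distance, assuming the perpendicular distance from C to the line through the two landmarks A and B that define the plane. The sketch keeps the sign (which side of the plane C lies on), which is useful for projected distances; taking abs() gives the unsigned distance. The helper name is hypothetical.

```python
import math


def signed_distance_mm(C, A, B, s):
    """Signed perpendicular distance (mm) from C to the plane through A and B.

    The sign tells on which side of the plane C lies; abs() gives the distance d_C.
    """
    a = B[1] - A[1]                     # line coefficients of a*x + b*y + c = 0 through A and B
    b = -(B[0] - A[0])
    c = B[0] * A[1] - A[0] * B[1]
    return s * (a * C[0] + b * C[1] + c) / math.hypot(a, b)


print(signed_distance_mm((3, 5), (0, 0), (10, 0), s=1.0))   # -5.0
```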
2.2.5. The Distance between Two Points Projected to a Plane
The calculation of the distance between two points projected onto a plane is illustrated in Figure 5. First, the plane is determined as in Section 2.2.4. Second, the distances $d_C$ and $d_D$ from the two points C and D to the plane are calculated as in Section 2.2.4. Third, the distance d(C, D) between C and D is calculated as in Section 2.2.3. Finally, the distance between C and D projected onto the plane, denoted $d_{CD}$, is calculated by
$$d_{CD} = \sqrt{d(C, D)^2 - (d_C - d_D)^2},$$
which holds when C and D lie on the same side of the plane; if they lie on opposite sides, $d_C - d_D$ is replaced by $d_C + d_D$.
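A sketch of the projected distance. It uses signed perpendicular distances, so a single formula covers points on either side of the plane, and the result can be cross-checked against the length of D − C projected onto the direction of AB. The helper name is hypothetical.

```python
import math


def projected_distance_mm(C, D, A, B, s):
    """Length (mm) of segment CD projected onto the plane through A and B."""
    def signed_dist(P):
        # Signed perpendicular distance from P to line AB (2-D cross product / |AB|), in mm.
        return s * ((B[0] - A[0]) * (P[1] - A[1]) - (B[1] - A[1]) * (P[0] - A[0])) / math.dist(A, B)

    d_c, d_d = signed_dist(C), signed_dist(D)
    cd = s * math.dist(C, D)
    # With signed distances, (d_c - d_d) is the offset along the normal on either side.
    return math.sqrt(max(0.0, cd ** 2 - (d_c - d_d) ** 2))


print(projected_distance_mm((1, 2), (4, 6), (0, 0), (10, 0), s=1.0))   # 3.0
```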
3. Experimental Evaluation
3.1. Data Description
3.1.1. Database of 2015 ISBI Challenge
The benchmark database (database1) included 300 cephalometric radiographs (150 for TrainingData, 150 for Test1Data), which is described in Wang et al. [28]. All of the cephalograms were collected from 300 patients aged from 6 to 60 years old. The image resolution was 1935 × 2400 pixels. For evaluation, 19 landmarks were manually annotated by two experienced doctors in each cephalogram (see illustrations in Figure 6); the ground truth was the average of the annotations by both doctors, and eight clinical measurements were used for classification of anatomical types.
3.1.2. Database of Peking University School and Hospital of Stomatology
The database2 included 165 cephalometric radiographs, as illustrated in Table 3 (the IRB approval number is PKUSSIRB201415063). There were 55 cephalograms collected from 55 Chinese adult subjects (29 females, 26 males) who were diagnosed as skeletal class I (0° < ANB < 4°) with minor dental crowding and a harmonious facial profile. The other 110 cephalograms were collected from 55 skeletal class III patients (ANB < 0°) (32 females, 23 males), who underwent combined surgical-orthodontic treatment at the Peking University School and Hospital of Stomatology from 2010 to 2013. The image resolutions ranged from 758 × 925 to 2690 × 3630 pixels. For evaluation, 45 landmarks were manually annotated by two experienced orthodontists in each cephalogram, as shown in Figure 7; the ground truth was the average of the annotations by both doctors, and 27 clinical measurements were used for further cephalometric analysis.

3.1.3. Experimental Settings
At the initial scale of landmark detection, K = 50 patches are sampled for training and 400 patches for prediction. At the other scales, the additional parameters W and S are set to 48 and 40, respectively. Because the image resolutions in database2 differ, all images in database2 are rescaled to a fixed width of 1960 pixels as preprocessing. For database2, we evaluate our algorithm by 5-fold cross-validation.
3.2. Landmark Detection
3.2.1. Evaluation Criterion
The first evaluation criterion is the mean radial error with the associated standard deviation. The radial error R, i.e., the distance between the predicted and true positions of each landmark, is defined as
$$R_i = \lVert A_i - B_i \rVert = \sqrt{(x_{A_i} - x_{B_i})^2 + (y_{A_i} - y_{B_i})^2}, \quad 1 \le i \le M,$$
where A and B represent the estimated and true positions of each landmark for all cephalograms in the dataset; $A_i$ and $B_i$ are the corresponding positions in sets A and B, respectively; and M is the total number of cephalograms in the dataset. The mean radial error (MRE) and the associated standard deviation (SD) for each landmark are defined as
$$\text{MRE} = \frac{1}{M}\sum_{i=1}^{M} R_i, \qquad \text{SD} = \sqrt{\frac{1}{M - 1}\sum_{i=1}^{M}\left(R_i - \text{MRE}\right)^2}.$$
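A minimal sketch of MRE and SD for one landmark, assuming the sample standard deviation with 1/(M − 1) and positions given in pixels together with a pixel-to-mm scale s. The function name is hypothetical.

```python
import numpy as np


def mre_sd(pred, true, s=1.0):
    """MRE and SD (with 1/(M-1)) of one landmark over M cephalograms.

    pred, true: arrays of shape (M, 2) in pixels; s converts pixels to mm.
    """
    R = s * np.linalg.norm(np.asarray(pred, float) - np.asarray(true, float), axis=1)  # radial errors R_i
    return R.mean(), R.std(ddof=1)


mre, sd = mre_sd([[0, 0], [3, 4], [0, 2]], np.zeros((3, 2)))
print(round(mre, 4), round(sd, 4))
```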
The second evaluation criterion is the success detection rate with respect to the 2.0 mm, 2.5 mm, 3.0 mm, and 4.0 mm precision ranges. If $R_i$ is less than a precision range, the detection of the landmark is considered successful within that precision range; otherwise, it is considered a failed detection. The success detection rate (SDR) with precision less than z is defined as
$$\text{SDR}_z = \frac{\left|\{\, i : R_i < z \,\}\right|}{M} \times 100\%,$$
where z denotes one of the four precision ranges used in the evaluation: 2.0 mm, 2.5 mm, 3.0 mm, and 4.0 mm.
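A short sketch of the SDR for the four precision ranges, using the strict "less than" comparison stated above; the radial errors are illustrative values and the function name is hypothetical.

```python
import numpy as np


def sdr(radial_errors_mm, z):
    """Percentage of cephalograms whose radial error is strictly below the precision range z (mm)."""
    return 100.0 * np.mean(np.asarray(radial_errors_mm) < z)


R = [0.5, 1.9, 2.0, 2.4, 3.5, 5.0]
for z in (2.0, 2.5, 3.0, 4.0):
    print(z, round(sdr(R, z), 2))   # 33.33, 66.67, 66.67, 83.33
```

Note that an error of exactly 2.0 mm counts as a failure in the 2.0 mm range.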
3.2.2. Experimental Results
(1) Results of database1. Experimental results of cephalometric landmark detection on database1 are shown in Table 4. The MREs of landmarks L10, L4, L19, L5, and L16 exceed 2 mm. The other 14 landmarks all have MREs within 2 mm, and 3 of them have MREs within 1 mm. The average MRE and SD of the 19 landmarks are 1.69 mm and 1.43 mm, respectively, indicating that the proposed method detects cephalometric landmarks accurately. The SDRs are 73.37%, 79.65%, 84.46%, and 90.67% within the precision ranges of 2.0, 2.5, 3.0, and 4.0 mm, respectively. In the 2 mm precision range, the SDR exceeds 90% for four landmarks (L9, L12, L13, L14); it is between 80% and 90% for six landmarks (L1, L7, L8, L11, L15, L17) and between 70% and 80% for two landmarks (L6 and L18); the SDRs of the other seven landmarks are below 70%, with landmark L10 having the lowest SDR. Table 4 thus shows that landmark L10 is the most difficult to detect accurately.

A comparison of our method with three state-of-the-art methods on Test1Data is shown in Table 5. The MRE of our method differs from that of RFRV-CLM [24] by less than 1 pixel, which means that their performance is comparable within the image resolution.
Furthermore, we compare our method with the top two methods of the 2015 ISBI Challenge on cephalometric landmark detection in terms of MRE, SD, and SDR, as illustrated in Table 6. Our method achieves an SDR of 73.37% within the 2.0 mm precision range, which is similar to the SDR of 73.68% of the best method. The MRE of our method is 1.69 mm, which is comparable to that of RFRV-CLM, while the SD of our method is smaller than that of RFRV-CLM. The results show that our method is accurate for landmark detection in lateral cephalograms, and the smaller SD suggests that it is more robust.
(2) Results of database2. Two examples of landmark detection in database2 are shown in Figure 8. The predicted locations of the 45 landmarks are close to the ground-truth locations, which illustrates the effectiveness of the proposed algorithm for landmark detection.
For quantitative assessment of the proposed algorithm, the statistical results are shown in Table 7. The MRE and SD of the proposed method for the 45 landmarks are 1.71 mm and 1.39 mm, respectively. The average SDRs of the 45 landmarks within the precision ranges of 2.0 mm, 2.5 mm, 3.0 mm, and 4.0 mm are 72.08%, 80.63%, 86.46%, and 93.07%, respectively. The performance on database2 is comparable to that on database1, which indicates that the proposed method can be successfully extended to the detection of more clinical landmarks.

3.3. Measurement Analysis
3.3.1. Evaluation Criterion
For the classification of anatomical types, eight cephalometric measurements are commonly used. These eight measurements and the corresponding classification methods are described in Tables 8 and 9, respectively.


One evaluation criterion for measurement analysis is the success classification rate (SCR) of these eight commonly used analyses, which is calculated from a confusion matrix. In the confusion matrix, each column represents the instances of an estimated type, and each row represents the instances of a ground-truth type. The SCR is defined as the average of the diagonal of the confusion matrix.
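A minimal sketch of the SCR computation follows. It assumes that the anatomical types of a measurement are encoded as integers 0 to n_types − 1 and that each row of the confusion matrix is normalized by its total before the diagonal is averaged, so that each diagonal entry is the fraction of ground-truth cases of that type classified correctly. The text states only that the SCR is the averaged diagonal, so this normalization, like the function names, is an assumption.

```python
import numpy as np


def confusion_matrix(truth, estimated, n_types):
    """Rows: ground-truth type; columns: estimated type (as described in the text)."""
    cm = np.zeros((n_types, n_types), dtype=int)
    for t, e in zip(truth, estimated):
        cm[t, e] += 1
    return cm


def success_classification_rate(cm):
    """Average of the diagonal of the row-normalized confusion matrix, in percent."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    valid = row_totals > 0  # skip types that never occur in the ground truth
    per_type = np.diag(cm)[valid] / row_totals[valid]
    return 100.0 * float(per_type.mean())
```

Computing one SCR per measurement (for example, one for FHI and one for APDI) and then averaging over the eight measurements gives an overall figure like the average SCR reported below.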
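To evaluate the performance of the proposed system on the additional measurements in database2, a second criterion, the mean absolute error (MAE), is calculated by

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{V}_i - V_i\right|,$$

where $\hat{V}_i$ is the value of an angular or linear measurement estimated by our system on the $i$th image, $V_i$ is the corresponding ground-truth measurement obtained from human-annotated landmarks, and $N$ is the number of test images.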
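The MAE translates directly into code. In this sketch (the function name is hypothetical), rows index the $N$ images and columns index measurements, so one call returns the MAE of every measurement at once; each column keeps its own unit, degrees for angular and millimetres for linear measurements.

```python
import numpy as np


def mean_absolute_error(estimated, ground_truth):
    """Per-measurement MAE = (1/N) * sum_i |V_hat_i - V_i| over the N images (axis 0).

    estimated, ground_truth: arrays of shape (N,) or (N, n_measurements).
    """
    diff = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    return np.mean(np.abs(diff), axis=0)
```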
3.3.2. Experimental Results
(1) Results of database1. Using the 19 detected landmarks, 8 cephalometric measurements are calculated to classify anatomical types in database1. Table 10 compares our method with the best two methods; our method achieves the best classification results for two measurements, APDI and FHI. The average SCR of our method is 75.03%, which is much better than that of the method in [26] and comparable to that of RFRV-CLM [24].
(2) Results of database2. Based on the 45 detected landmarks, our system automatically calculates 27 measurements on database2, comprising 17 angular and 10 linear measurements, which are listed in Table 11.
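Once landmarks are detected, an angular measurement reduces to the angle at a vertex landmark, and a linear measurement to the distance between two landmarks. The sketch below illustrates both, using the conventional SNA and SNB angles (vertex at nasion N) and ANB computed as SNA − SNB. The function names and the `mm_per_pixel` spacing argument are hypothetical, and the 27 measurements of Table 11 are not reproduced here.

```python
import numpy as np


def angle_at(vertex, p1, p2):
    """Angle in degrees (0 to 180) between the rays vertex->p1 and vertex->p2."""
    u = np.asarray(p1, dtype=float) - np.asarray(vertex, dtype=float)
    w = np.asarray(p2, dtype=float) - np.asarray(vertex, dtype=float)
    cross = u[0] * w[1] - u[1] * w[0]
    # atan2(|cross|, dot) is numerically stabler than arccos of the cosine
    return float(np.degrees(np.arctan2(abs(cross), np.dot(u, w))))


def distance_mm(p1, p2, mm_per_pixel):
    """Linear measurement: Euclidean distance between two landmarks, in mm."""
    d = np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)
    return float(np.hypot(d[0], d[1]) * mm_per_pixel)


def anb(s, n, a, b):
    """ANB as SNA - SNB (both angles at nasion N); negative when SNB exceeds SNA."""
    return angle_at(n, s, a) - angle_at(n, s, b)
```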

The MAE and the interobserver variability of the 27 measurements are listed in Table 12. Here, the MAE represents the measurement performance of our automatic system, and the interobserver variability is calculated between the ground-truth values obtained by the two experts. For angular measurements, the difference between the MAE and the interobserver variability is within 0.5° for 9/17 measurements, within 1° for 12/17 measurements, and within 2° for 16/17 measurements; in particular, the MAE of the Z Angle is smaller than its interobserver variability. The remaining angular measurement has a difference of 2.24°. For linear measurements, the difference is within 0.5 mm for 9/10 measurements, and the remaining one has a difference of 1.16 mm. These results show that our automatic system is accurate for measurement analysis.
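The comparison against interobserver variability can be sketched as follows. Two points are assumptions rather than details from the text: the interobserver variability of a measurement is taken as the MAE between the two experts' values, and "within" a tolerance is read as the signed difference (system MAE minus interobserver variability) being at most that tolerance, so a measurement such as the Z Angle, where the system's MAE is smaller, counts as within every tolerance. The function names are hypothetical.

```python
import numpy as np


def interobserver_variability(expert1, expert2):
    """Per-measurement MAE between two experts' values (rows: images, columns: measurements)."""
    diff = np.asarray(expert1, dtype=float) - np.asarray(expert2, dtype=float)
    return np.mean(np.abs(diff), axis=0)


def count_within(mae_system, mae_interobserver, tolerances=(0.5, 1.0, 2.0)):
    """Number of measurements whose MAE exceeds the interobserver variability by <= each tolerance."""
    # Signed difference: negative when the system agrees with the ground truth better
    # than the two experts agree with each other.
    diff = np.asarray(mae_system, dtype=float) - np.asarray(mae_interobserver, dtype=float)
    return {t: int(np.sum(diff <= t)) for t in tolerances}
```

Applied separately to the 17 angular measurements (degrees) and the 10 linear measurements (mm), `count_within` returns counts of the kind quoted above.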
 
4. Discussion
We also analyze the interobserver variability of human annotation. As shown in Table 13, the MRE and SD between the annotations of two orthodontists are 1.38 mm and 1.55 mm, respectively, for database1 [27], and 1.26 mm and 1.27 mm, respectively, for database2. The proposed algorithm of automatic cephalometric landmark detection achieves an MRE below 2 mm and an SD close to that of manual marking. Therefore, the performance of automatic landmark detection by the proposed algorithm is comparable to manual marking, given the interobserver variability between two clinical experts. Furthermore, the detected landmarks are used to calculate angular and linear measurements in lateral cephalograms, and the accurate landmark detection leads to satisfactory measurement results in the experiments. This shows that the proposed algorithm has the potential to be applied in clinical cephalometric analysis for orthodontic diagnosis and treatment planning.

The detection accuracy of some landmarks is lower than average, for three main reasons: (i) some landmarks lie on overlapping anatomical structures, such as Go, UL5, L6E, UL6, and U6E; (ii) some landmarks show large variability in manual marking owing to large anatomical variability among subjects, especially in abnormal cases, such as A, ANS, Pos, Ar, Ba, and Bolton; and (iii) for some landmarks, such as P, Co, L6A, and U6A, the structural information is weak because the image intensity varies little in their neighborhood.
The proposed system follows the clinical cephalometric analysis procedure, and its accuracy can be evaluated against the accuracy of manual marking. This study has several limitations. First, apart from the algorithm itself, the data affect the performance of the system in three respects: (i) the quality of the training data; (ii) the size of the training dataset; and (iii) the shape and appearance variation exhibited in the training data. Second, as with any supervised learning-based method, the performance of the system depends on the consistency between the training data and the testing data.
5. Conclusion
In conclusion, we have designed a new framework for landmark detection in lateral cephalograms that proceeds from low to high resolution. At each image resolution, decision tree regression voting is employed for landmark detection. The proposed algorithm takes full advantage of the image information at different resolutions. At lower resolutions, the primary local structure information, rather than local detail, can be extracted to predict the positions of the anatomical structures containing the specific landmarks. At higher resolutions, the local structure information contains more detail, which is useful for predicting landmark positions within the neighborhood. As the experimental results demonstrate, the proposed algorithm performs well. Compared with state-of-the-art methods on the benchmark database, our algorithm achieves comparable landmark detection accuracy in terms of MRE, SD, and SDR. On our own clinical database, it achieves an average successful detection rate of 72% within the 2.0 mm precision range. In particular, 45 landmarks are detected in our database, more than twice the number of landmarks in the benchmark database, which confirms the extensibility of the proposed algorithm on this clinical dataset. In addition, the automatic measurement of clinical parameters achieves satisfactory results. In the future, we will work to improve the performance of automatic analysis in lateral cephalograms so that the system can be used in clinical practice to obtain objective measurements. We will also work to reduce the computational complexity of the algorithm.
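The low-to-high-resolution voting summarized above can be illustrated with the toy sketch below, written in Python with NumPy and scikit-learn's `DecisionTreeRegressor`. It is not the authors' implementation: raw 5×5 intensity patches stand in for the SIFT-based patch features, and the synthetic image (a broad blob for coarse structure plus a sharp peak for fine detail), the two resolution levels, the search radii, and all function names are hypothetical choices made for illustration. At each level, one regression tree maps a patch to its displacement from the landmark, every patch in the search window votes at its predicted landmark position, and the vote peak centres the search window of the next, finer level.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

HALF = 2  # patch half-width: 5x5 raw-intensity patches stand in for SIFT features
LEVELS = [(4, 16), (1, 6)]  # (downsampling factor, search radius in level pixels), low -> high


def synthetic_image(landmark, size=64, noise=0.01, rng=None):
    """Toy image: broad blob (coarse structure) plus sharp peak (fine detail) at the landmark."""
    rng = np.random.default_rng(0) if rng is None else rng
    rows, cols = np.mgrid[0:size, 0:size]
    d2 = (rows - landmark[0]) ** 2 + (cols - landmark[1]) ** 2
    img = np.exp(-d2 / (2 * 16.0**2)) + np.exp(-d2 / (2 * 2.0**2))
    return img + noise * rng.standard_normal(img.shape)


def downsample(img, f):
    """Block-average by an integer factor f (image sides assumed divisible by f)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))


def to_level(p, f):
    """Full-resolution (row, col) -> coordinates in the image downsampled by f."""
    return (np.asarray(p, dtype=float) - (f - 1) / 2) / f


def patches(img, points):
    """Flattened intensity patches centred on integer (row, col) points."""
    k = 2 * HALF + 1
    padded = np.pad(img, HALF, mode="edge")
    return np.array([padded[r:r + k, c:c + k].ravel() for r, c in points])


def window(center, radius, shape):
    """Integer (row, col) points of a square search window, clipped to the image."""
    r, c = np.round(center).astype(int)
    rr, cc = np.mgrid[max(r - radius, 0):min(r + radius, shape[0] - 1) + 1,
                      max(c - radius, 0):min(c + radius, shape[1] - 1) + 1]
    return np.stack([rr.ravel(), cc.ravel()], axis=1)


def train_level(images, landmarks, f, radius):
    """Fit one regression tree: patch features -> displacement from patch centre to landmark."""
    X, y = [], []
    for img, lm in zip(images, landmarks):
        small = downsample(img, f)
        target = to_level(lm, f)
        pts = window(target, radius, small.shape)
        X.append(patches(small, pts))
        y.append(target - pts)
    tree = DecisionTreeRegressor(min_samples_leaf=3, random_state=0)
    return tree.fit(np.vstack(X), np.vstack(y))


def detect_level(tree, img, f, radius, center_full):
    """Regression voting: each window point votes at its predicted landmark position."""
    small = downsample(img, f)
    pts = window(to_level(center_full, f), radius, small.shape)
    votes = np.round(pts + tree.predict(patches(small, pts))).astype(int)
    votes = np.clip(votes, 0, np.array(small.shape) - 1)
    acc = np.zeros(small.shape)
    np.add.at(acc, (votes[:, 0], votes[:, 1]), 1)  # accumulate votes
    peak = np.array(np.unravel_index(np.argmax(acc), acc.shape), dtype=float)
    return peak * f + (f - 1) / 2  # back to full-resolution coordinates


def train(images, landmarks):
    return [train_level(images, landmarks, f, r) for f, r in LEVELS]


def detect(trees, img):
    estimate = (np.array(img.shape, dtype=float) - 1) / 2  # start at the image centre
    for tree, (f, radius) in zip(trees, LEVELS):
        estimate = detect_level(tree, img, f, radius, estimate)  # coarse-to-fine refinement
    return estimate
```

The coarse level searches the whole downsampled image, so only the fine level needs a small window around the coarse estimate; this mirrors the idea that low resolutions locate the anatomical structure and high resolutions refine the landmark position within its neighborhood.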
Data Availability
The database1 of the 2015 ISBI Challenge is available at http://www.ntust.edu.tw/∼cweiwang/ISBI2015/challenge1/index.html. The database2 of Peking University School and Hospital of Stomatology cannot be made publicly available due to patient privacy concerns.
Disclosure
This work is based on research conducted at Beijing Institute of Technology.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors would like to express their gratitude to the Department of Orthodontics, Peking University School and Hospital of Stomatology, China, for supplying image database2.
References
[1] J. Yang, X. Ling, Y. Lu et al., "Cephalometric image analysis and measurement for orthognathic surgery," Medical & Biological Engineering & Computing, vol. 39, no. 3, pp. 279–284, 2001.
[2] B. H. Broadbent, "A new X-ray technique and its application to orthodontia," Angle Orthodontist, vol. 1, pp. 45–46, 1931.
[3] H. Hofrath, "Die Bedeutung des Röntgenfern- und Abstandsaufnahme für die Diagnostik der Kieferanomalien," Fortschritte der Kieferorthopädie, vol. 2, pp. 232–258, 1931.
[4] R. Leonardi, D. Giordano, F. Maiorana et al., "Automatic cephalometric analysis: a systematic review," Angle Orthodontist, vol. 78, no. 1, pp. 145–151, 2008.
[5] T. S. Douglas, "Image processing for craniofacial landmark identification and measurement: a review of photogrammetry and cephalometry," Computerized Medical Imaging and Graphics, vol. 28, no. 7, pp. 401–409, 2004.
[6] D. S. Gill and F. B. Naini, "Chapter 9: Cephalometric analysis," in Orthodontics: Principles and Practice, pp. 78–87, Blackwell Science Publishing, Oxford, UK, 2011.
[7] A. D. Levy-Mandel, A. N. Venetsanopoulos, and J. K. Tsotsos, "Knowledge-based landmarking of cephalograms," Computers and Biomedical Research, vol. 19, no. 3, pp. 282–309, 1986.
[8] V. Grau, M. Alcaniz, M. C. Juan et al., "Automatic localization of cephalometric landmarks," Journal of Biomedical Informatics, vol. 34, no. 3, pp. 146–156, 2001.
[9] J. Ashish, M. Tanmoy, and H. K. Sardana, "A novel strategy for automatic localization of cephalometric landmarks," in Proceedings of IEEE International Conference on Computer Engineering and Technology, pp. V3-284–V3-288, Tianjin, China, 2010.
[10] T. Mondal, A. Jain, and H. K. Sardana, "Automatic craniofacial structure detection on cephalometric images," IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2606–2614, 2011.
[11] A. Kaur and C. Singh, "Automatic cephalometric landmark detection using Zernike moments and template matching," Signal, Image and Video Processing, vol. 9, no. 1, pp. 117–132, 2015.
[12] D. B. Forsyth and D. N. Davis, "Assessment of an automated cephalometric analysis system," European Journal of Orthodontics, vol. 18, no. 5, pp. 471–478, 1996.
[13] S. Rueda and M. Alcaniz, "An approach for the automatic cephalometric landmark detection using mathematical morphology and active appearance models," in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2006), R. Larsen, Ed., Lecture Notes in Computer Science, pp. 159–166, Copenhagen, Denmark, October 2006.
[14] W. Yue, D. Yin, C. Li et al., "Automated 2D cephalometric analysis on X-ray images by a model-based approach," IEEE Transactions on Biomedical Engineering, vol. 53, no. 8, pp. 1615–1623, 2006.
[15] R. Kafieh, A. Mehri, S. Sadri et al., "Automatic landmark detection in cephalometry using a modified Active Shape Model with sub image matching," in Proceedings of IEEE International Conference on Machine Vision, pp. 73–78, Islamabad, Pakistan, 2008.
[16] J. Keustermans, W. Mollemans, D. Vandermeulen, and P. Suetens, "Automated cephalometric landmark identification using shape and local appearance models," in Proceedings of 20th International Conference on Pattern Recognition, pp. 2464–2467, Istanbul, Turkey, August 2010.
[17] P. Vucinic, Z. Trpovski, and I. Scepan, "Automatic landmarking of cephalograms using active appearance models," European Journal of Orthodontics, vol. 32, no. 3, pp. 233–241, 2010.
[18] S. Chakrabartty, M. Yagi, T. Shibata et al., "Robust cephalometric landmark identification using support vector machines," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 825–828, Vancouver, Canada, May 2003.
[19] I. El-Feghi, M. A. Sid-Ahmed, and M. Ahmadi, "Automatic localization of craniofacial landmarks for assisted cephalometry," Pattern Recognition, vol. 37, no. 3, pp. 609–621, 2004.
[20] R. Leonardi, D. Giordano, and F. Maiorana, "An evaluation of cellular neural networks for the automatic identification of cephalometric landmarks on digital images," Journal of Biomedicine and Biotechnology, vol. 2009, Article ID 717102, 12 pages, 2009.
[21] L. Favaedi and M. Petrou, "Cephalometric landmarks identification using probabilistic relaxation," in Proceedings of 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4391–4394, IEEE Engineering in Medicine and Biology Society Conference Proceedings, Buenos Aires, Argentina, August 2010.
[22] M. Farshbaf and A. A. Pouyan, "Landmark detection on cephalometric radiology images through combining classifiers," in Proceedings of 2010 17th Iranian Conference of Biomedical Engineering (ICBME), pp. 1–4, Isfahan, Iran, November 2010.
[23] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[24] C. Lindner and T. F. Cootes, "Fully automatic cephalometric evaluation using random forest regression-voting," in Proceedings of International Symposium on Biomedical Imaging (ISBI), Grand Challenges in Dental X-ray Image Analysis: Automated Detection and Analysis for Diagnosis in Cephalometric X-ray Image, Brooklyn, NY, USA, May 2015.
[25] C. Lindner, C.-W. Wang, C.-T. Huang et al., "Fully automatic system for accurate localisation and analysis of cephalometric landmarks in lateral cephalograms," Scientific Reports, vol. 6, no. 1, 2016.
[26] B. Ibragimov et al., "Computerized cephalometry by game theory with shape- and appearance-based landmark refinement," in Proceedings of International Symposium on Biomedical Imaging (ISBI), Grand Challenges in Dental X-ray Image Analysis: Automated Detection and Analysis for Diagnosis in Cephalometric X-ray Image, Brooklyn, NY, USA, May 2015.
[27] C.-W. Wang, C.-T. Huang, M.-C. Hsieh et al., "Evaluation and comparison of anatomical landmark detection methods for cephalometric X-ray images: a grand challenge," IEEE Transactions on Medical Imaging, vol. 34, no. 9, pp. 1890–1900, 2015.
[28] C.-W. Wang, C.-T. Huang, J. H. Lee et al., "A benchmark for comparison of dental radiography analysis algorithms," Medical Image Analysis, vol. 31, pp. 63–76, 2016.
[29] R. Vandaele, J. Aceto, M. Muller et al., "Landmark detection in 2D bioimages for geometric morphometrics: a multi-resolution tree-based approach," Scientific Reports, vol. 8, no. 1, 2018.
[30] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[31] A. Vedaldi and B. Fulkerson, "VLFeat: an open and portable library of computer vision algorithms," in Proceedings of Conference on ACM Multimedia, pp. 1469–1472, Firenze, Italy, October 2010, http://www.vlfeat.org/.
[32] C. Qiu, L. Jiang, and C. Li, "Randomly selected decision tree for test-cost sensitive learning," Applied Soft Computing, vol. 53, pp. 27–33, 2017.
[33] K. Kim and J. S. Hong, "A hybrid decision tree algorithm for mixed numeric and categorical data in regression analysis," Pattern Recognition Letters, vol. 98, pp. 39–45, 2017.
[34] L. Breiman and J. H. Friedman, Classification and Regression Trees, CRC Press, Boca Raton, FL, USA, 1984.
[35] C. Lindner, P. A. Bromiley, M. C. Ionita et al., "Robust and accurate shape model matching using random forest regression-voting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1862–1874, 2015.
[36] J. Gall and V. S. Lempitsky, "Class-specific Hough forests for object detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 143–157, Miami, FL, USA, June 2009.
Copyright
Copyright © 2018 Shumeng Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.