#### Abstract

Cephalometric analysis is a standard tool for assessment and prediction of craniofacial growth, orthodontic diagnosis, and oral-maxillofacial treatment planning. The aim of this study is to develop a fully automatic system of cephalometric analysis, including cephalometric landmark detection and cephalometric measurement in lateral cephalograms for malformation classification and assessment of dental growth and soft tissue profile. First, a novel method of multiscale decision tree regression voting using SIFT-based patch features is proposed for automatic landmark detection in lateral cephalometric radiographs. Then, some clinical measurements are calculated by using the detected landmark positions. Finally, two databases are tested in this study: one is the benchmark database of 300 lateral cephalograms from 2015 ISBI Challenge, and the other is our own database of 165 lateral cephalograms. Experimental results show that the performance of our proposed method is satisfactory for landmark detection and measurement analysis in lateral cephalograms.

#### 1. Introduction

Cephalometric analysis is a scientific research approach for assessment and prediction of craniofacial growth, orthodontic diagnosis, and oral-maxillofacial treatment planning for patients with malocclusion in clinical practice [1]. We focus on 2D lateral cephalometric analysis, which is performed on cephalometric radiographs in lateral view. Cephalometric analysis has undergone three stages of development: manual stage, computer-aided stage, and computer-automated stage. Cephalometric analysis based on radiographs was introduced by Broadbent [2] and Hofrath [3] for the first time in 1931. In the first stage, it consists of five steps to obtain the cephalometric analysis: (a) placing a sheet of acetate over the cephalometric radiograph; (b) manual tracing of craniofacial anatomical structures; (c) manual marking of cephalometric landmarks; (d) measuring angular and linear parameters using the landmark locations; and (e) analysis/classification of craniomaxillofacial hard tissue and soft tissue [4]. This process is tedious, time consuming, and subjective. In the second stage, the first step in traditional cephalometric analysis has been skipped since the cephalometric radiograph is digitized. Furthermore, the next two steps can be operated by computer, and the measurement can be automatically calculated by software. However, this computer-aided analysis is still time consuming and the results are not reproducible due to large inter- and intravariability error in landmark annotation. In the third stage, the most crucial step, i.e., identifying landmarks, can be automatized by image processing algorithms [5]. The automatic analysis has high reliability and repeatability, and it can save a lot of time for the orthodontists. However, fully automatic cephalometric analysis is challenging due to overlaying structures and inhomogeneous intensity in cephalometric radiographs as well as anatomical differences among subjects.

Cephalometric landmarks include corners, line intersections, center points and other salient features of anatomical structures. They always have stable geometrical locations in anatomical structures in cephalograms. In this study, 45 landmarks are used in lateral cephalograms (refer to Table 1). The cephalometric landmarks play a major role in calculating the cephalometric planes, which are the lines between two landmarks in 2D cephalometric radiographs as shown in Table 2. Furthermore, different measurements are calculated using different cephalometric landmarks and planes, and they are used to analyze different anatomical structures [4]. Measurements can be angles or distance, which are used to analyze skeletal and dental anatomical structures, as well as soft tissue profile. To conduct a clinical diagnosis, many analytic approaches have been developed, including Downs analysis, Wylie analysis, Riedel analysis, Steiner analysis, Tweed analysis, Sassouni analysis, Bjork analysis, and so on [6].

Methods of automatic cephalometric landmark detection are mainly separated into three categories: (1) bottom-up methods; (2) deformation model-based methods; and (3) classifier/regressor-based methods. The first category is bottom-up methods, while the other two categories are learning-based methods.

For bottom-up methods, two techniques were usually employed, including edge detection and template matching. Edge-based methods were to extract the anatomical contours in cephalograms, and then the relative landmarks were identified on the contours using prior knowledge. Levy-Mandel et al. [7] first proposed the edge tracking method for identifying craniofacial landmarks. First, input images were smoothed by median filter and then edges were extracted by Mero-Vassy operator. Second, contours are obtained by the edge tracking technique based on the constraints of locations, endings, and breaking segments and edge linking conditions. Finally, the landmarks are detected on contours according to their definition in cephalometry. This method was only tested by two high quality cephalometric radiographs. Furthermore, 13 landmarks on noncontours were not detected. A two-stage landmark detection method was proposed by Grau et al. [8], which first extracted the major line features in high contrast by line-detection module, such as contours of jaws and nose, and then detected landmarks by pattern recognition based on mathematical morphology. Edge-based methods could only detect the landmarks on contours and were not robust to noise, low contrast, and occlusion. Template-matching-based methods were to find a most likely region with the least distance to the template of the specific landmark, and the center of the region was considered as the estimated position of the specific landmark. Therefore, not only the landmarks on contours, but also the landmarks on noncontours can be detected. Ashish et al. [9] proposed a template-matching method using a coarse-to-fine strategy for cephalometric landmark detection. Mondal et al. [10] proposed an improved method from Canny edge extraction to detect the craniofacial anatomical structures. Kaur and Singh [11] proposed an automatic cephalometric landmark detection using Zernike moments and template matching. Template matching based methods had difficulty in choosing the representative template and were not robust to anatomical variability in individual. All these methods strongly depended on the quality of images.

Subsequently, deformation model-based methods were proposed for these limitations by using shape constraints. Forsyth and Davis [12] reviewed and evaluated the approaches reported from 1986 to 1996 on automatic cephalometric analysis systems, and they highlighted a cephalometric analysis system presented by Davis and Taylor, which introduced an appearance model into landmark detection. The improved algorithms of active shape/appearance models were then used to refine cephalometric landmark detection combining with template matching, and higher accuracy was obtained [13–17]. Here, shape or appearance models were learned from training data to regularize the searching through all landmarks in testing data. However, it was difficult to initialize landmark positions for searching, because the initialization was always achieved by traditional methods.

Recently, significant progress has been made for automatic landmark detection in cephalograms by using supervised machine-learning approaches. These machine-learning approaches can be further separated into two classes: classification and regression models. A support vector machine (SVM) classifier was used to predict the locations of landmarks in cephalometric radiographs, while the projected principal-edge distribution was proposed to describe edges as the feature vector [18]. El-Feghi et al. [19] proposed a coarse-to-fine landmark detection algorithm, which first used fuzzy neural network to predict the locations of landmarks, and then refined the locations of landmarks by template matching. Leonardi et al. [20] employed the cellular neural networks approach for automatic cephalometric landmark detection on softcopy of direct digital cephalometric X-rays, which was tested by 41 cephalograms and detected 10 landmarks. Favaedi et al. [21] proposed a probability relaxation method based on shape features for cephalometric landmark detection. Farshbaf and Pouyan et al. [22] proposed a coarse-to-fine SVM classifier to predict the locations of landmarks in cephalograms, which used histograms of oriented gradients for coarse detection and histograms of gray profile for refinement. Classifier-based methods were successfully used to identify the landmark from the whole cephalograms, but positive and negative samples were difficult to be balanced in training data, and computational complexity is increased due to pixel-based searching.

Many algorithms have been reported for automatic cephalometric landmark detection, but results were difficult to compare due to the different databases and landmarks. The situation has been better since two challenges were held at 2014 and 2015 IEEE International Symposium on Biomedical Imaging (ISBI). During the challenges, regressor-based methods were first introduced to automatic landmark detection in lateral cephalograms. The challenges aimed at automatic cephalometric landmark detection and using landmark positions to measure the cephalometric linear and angular parameters for automatic assessment of anatomical abnormalities to assist clinical diagnosis. Benchmarks have been achieved by using random forest (RF) [23] regression voting based on shape model matching at the challenges. In particular, Lindner et al. [24, 25] presented the algorithm of random forest regression voting (RFRV) in the constrained local model (CLM) framework for automatic landmark detection, which obtained the mean error of 1.67 mm, and the successful detection rate of 73.68% in precision range of 2 mm. The algorithm of RFRV-CLM won the challenges, and the RF-based classification algorithm combined with Haar-like appearance features and game theory [26] was in rank 2. The comprehensive performance analysis among those algorithms for the challenges were reported in [27, 28]. Later, Vandaele et al. [29] proposed an ensemble tree-based method using multiresolution features in bioimages, and the method was tested by three databases including a cephalogram database of 2015 ISBI challenge. Now, public database and evaluation are available to improve the performance of automatic analysis system for cephalometric landmark detection and parameter measurement, and many research outcomes have been achieved. However, automatic cephalometric landmark detection and parameter measurement for clinical practice is still challenging.

In recent days, efforts have been made to develop automatic cephalometric analysis systems for clinical usages. In this paper, we present a new automatic cephalometric analysis system, including landmark detection and parameter measurement in lateral cephalograms, and the block diagram of our system is shown in Figure 1. The core of this system is automatic landmark detection, which is realized by a new method based on multiresolution decision tree regression voting. Our main contributions can be concluded in four aspects: (1) we propose a new landmark detection framework of multiscale decision tree regression voting; (2) SIFT-based patch feature is first employed to extract the local feature of cephalometric landmarks, and the proposed approach is flexible when extending to detection of more landmarks because feature selection and shape constraints are not used; (3) two clinical databases are used to evaluate the extension of the proposed method. Experimental results show that our method can achieve robust detection when extending from 19 landmarks to 45 landmarks; (4) automatic measurement of clinical parameters is implemented in our system based on the detected landmarks, which can facilitate clinical diagnosis and research.

The rest of this paper is organized as follows. Section 2 describes our proposed method of automatic landmark detection based on multiscale decision tree regression voting and parameter measurement. Experimental results for 2015 ISBI Challenge cephalometric benchmark database and our own database are presented in Section 3. The discussion of the proposed system is given in Section 4. Finally, we conclude this study with expectation of future work in Section 5.

#### 2. Method

##### 2.1. Landmark Detection

It is well known that cephalometric landmark detection is the most important step in cephalometric analysis. In this paper, we propose a new framework of automatic cephalometric landmark detection, which is based on multiresolution decision tree regression voting (MDTRV) using patch features.

###### 2.1.1. SIFT-Based Patch Feature Extraction

*(1). SIFT Feature*. Scale invariant feature transform (SIFT) is first proposed for key point detection by Lowe [30]. The basic idea of SIFT feature extraction algorithm consists of four steps: (1) local extrema point detection in scale-space by constructing the difference-of-Gaussian image pyramids; (2) accurate key point localization including location, scale, and orientation; (3) orientation assignment for invariance to image rotation; and (4) key point descriptor as the local image feature. The advantage of SIFT features to represent the key points is affine invariant and robust to illumination change for images. This method of feature extraction has been commonly used in the fields of image matching and registration.

*(2) SIFT Descriptor for Image Patch*. Key points with discretized descriptors can be used as visual words in the field of image retrieval. Histogram of visual words can then be used by a classifier to map images to abstract visual classes. Most successful image representations are based on affine invariant features derived from image patches. First, features are extracted on sampled patches of the image, either using a multiresolution grid in a randomized manner, or using interest point detectors. Each patch is then described using a feature vector, e.g., SIFT [30]. In this paper, we use the SIFT feature vectors [31] to represent image patches, which can be used by a regressor to map patches to the displacements from each landmark.

The diagram of SIFT feature descriptor of an image patch is illustrated in Figure 2. An image patch centered at location (*x*, *y*) will be described by a square window of length 2*W* + 1, where *W* is a parameter. For each square image patch *P* centered at position (*x*, *y*) of length 2*W* + 1, the gradient and of image patch *P* is computed by using finite differences. The gradient magnitude is computed byand the gradient angle (measured in radians, clockwise, starting from the *X* axis) is calculated by

Each square window is separated into 4 × 4 adjacent small windows, and the 8-bin histogram of gradients (direction angles started from 0 to (2) is extracted in each small window. For each image patch, 4 × 4 histograms of gradients are concatenated to the resulting feature vector (dimension = 128). Finally, the feature of image patch *P* can be described as SIFT descriptor. The example of SIFT-based patch feature extraction in a cephalogram is shown in Figure 3.

**(a)**

**(b)**

**(c)**

###### 2.1.2. Decision Tree Regression

Decision tree is a classical and efficient statistical learning algorithm and has been widely used to solve classification and regression problems [32, 33]. Decision trees predict responses to data. To predict a response, query the new data by the decisions from the root node to a leaf node in the tree. The leaf node contains the response. Classification trees give responses that are nominal, while regression trees give numeric responses. In this paper, CART (classification and regression trees) [34], a binary tree, is used as the regressors to learn the mapping relationship between the SIFT-based patch feature vectors *f* and the displacement vectors from the centers of the patches to the position of each landmark in the training images. The regressors are then used to predict the displacements using patch feature vectors of the test image. Finally, the predicted displacements are used to obtain the optimal location of each landmark via voting. One advantage of the regression approach is to avoid balancing positive and negative examples for the training of classifiers. Another advantage of using regression, rather than classification, is that good results can be obtained by evaluating the region of interest on randomly sampling pixels rather than at every pixel.

*(1) Training*. In training process, a regression tree can be constructed for each landmark *l* via splitting. Here, optimization criterion and stopping rules are used to determine how a decision tree is to grow. The decision tree can be improved by pruning or selecting the appropriate parameters.

The optimization criterion is to choose a split to minimize the mean-squared error (MSE) of predictions compared to the ground truths in the training data. Splitting is the main process of creating decision trees. In general, four steps are performed to split node . First, for each observation (i.e., the extracted feature in the training data), the weighted MSE of the responses (displacements ) in node is computed by usingwhere is the set of all observation indices in node and is the sample size. Second, the probability of an observation in node is calculated bywhere is the weight of observation . In this paper, set = 1/*N*. Third, all elements of the observation are sorted in ascending order. Every element is regraded as a splitting candidate. is the unsplit set of all observation indices corresponding to missing values. Finally, the best way to split node using is determined by maximizing the reduction in MSE among all splitting candidates. For all splitting candidates in the observation , the following steps are performed:(1)Split the observations in node *t* into left and right child nodes ( and , respectively).(2)Compute the reduction in MSE . For a particular splitting candidate, and represent the observation indices in the sets and , respectively. If does not contain any missing values, then the reduction in MSE for the current splitting candidate is

*t*that are not missing.(3)Choose the splitting candidate that yields the largest MSE reduction. In this way, the observations in node are split at the candidate that maximize the MSE reduction.

To stop splitting nodes of the decision tree, two rules can be followed: (1) it is pure of the node that the MSE for the observed response in this node is less than the MSE for the observed response in the entire data multiplied by the tolerance on quadratic error per node and (2) the decision tree reaches to the setting values for depth of the regression decision tree, for example, the maximum number of splitting nodes *max_splits*.

The simplicity and performance of a decision tree should be considered to improve the performance of the decision tree at the same time. A deep tree usually achieves high accuracy on the training data. However, the tree is not to obtain high accuracy on a test data as well. It means that a deep tree tends to overfit, i.e., its test accuracy is much less than its training accuracy. On the contrary, a shallow tree does not achieve high training accuracy, but can be more robust, i.e., its training accuracy could be similar to that of a test data. Moreover, a shallow tree is easy to interpret and saves time for prediction. In addition, tree accuracy can be obtained by cross validation, when there are not enough data for training and testing.

There are two ways to improve the performance of decision trees by minimizing cross-validated loss. One is to select the optimal parameter value to control depth of decision trees. The other is postpruning after creating decision trees. In this paper, we use the parameter *max_splits* to control the depth of resulting decision trees. Setting a large value for *max_splits* lends to growing a deep tree, while setting a small value for *max_splits* yields a shallow tree with larger leaves. To select the appropriate value for *max_splits*, the following steps are performed: (1) set a spaced set of values from 1 to the total sample size for *max_splits* per tree; (2) create cross-validated regression trees for the data using the setting values for *max_splits* and calculate the cross-validated errors; and (3) the appropriate value of *max_splits* can be obtained by minimizing the cross-validated errors.

*(2) Prediction*. In prediction process, you can easily predict responses for new data after creating a regression tree . Suppose **f_new** is the new data (i.e., a feature vector extracted from a new patch *P_new* in the test cephalogram). According to the rules of the regression tree, the nodes select the specific attributes from the new observation **f_new** and reach the leaf step by step, which stores the mean displacement **d_new**. Here, we predict the displacement from the center of patch *P_new* to the landmark *l* using SIFT feature vector by the regressor .

###### 2.1.3. MDTRV Using SIFT-Based Patch Features

*(1) Decision Tree Regression Voting (DTRV)*. As illustrated in Figure 4, the algorithm of automatic cephalometric landmark detection using decision tree regression voting (DTRV) in single scale consists of training and testing processes as a supervised learning algorithm. The training process begins by feature extraction for patches sampled from the training images. Then, a regression tree is constructed for each landmark with inputting feature vectors and displacements (**f**, **d**). Here, **f** represents the observations and **d** represents the targets for this regression problem. The testing process begins by the same step of feature extraction. Then, the resulting **f_new** is used to predict the displacement **d_new** by regressor . In the end, the optimal landmark position is obtained via voting. The voting style includes single unit voting and single weighted voting. As mentioned in [35], these two styles perform equally well in application of voting optimal landmark positions for medical images. In this paper, we use the single unit voting.

*(2) MDTRV Using SIFT-Based Patch Features*. Finally, a new framework of MDTRV using SIFT-based patch features is proposed by using a simple and efficient strategy for accurate landmark detection in lateral cephalograms. There are four stages in the proposed algorithm of MDTRV, which are iteratively performed in the scales of 0.125, 0.25, 0.5, and 1. When the scale is 0.125, the training *K* patches with *K* displacements are randomly sampled in a whole cephalogram for a specific landmark . The decision tree regressor is created by using these training samples. For prediction, SIFT features are extracted for testing patches sampled in the whole cephalogram, and displacements are predicted by regressor using extracted features. The optimal position of the specific landmark *l* is obtained via voting through all predicted displacements. When the scale is 0.25, 0.5, or 1, only the patch sampling rule is different from the procedure in scale of 0.125. That is, the training and testing patches are randomly sampled in the (2*S* + 1) × (2*S* + 1) neighborhood of true and initial landmark positions. The estimated landmark position is used as the initial landmark position in the next scale of testing process. The optimal position of the specific landmark *l* in scale of 1 is refined by using Hough forest [36], which is regarded as the final resulting location. This multiresolution coarse-to-fine strategy has greatly improved the accuracy of cephalometric landmark detection.

##### 2.2. Parameter Measurement

Measurements are either angular or linear parameters calculated by using cephalometric landmarks and planes (refer to Tables 1 and 2). According to geometrical structure, measurements can be classified into five classes: the angle of three points, the angle of two planes, the distance between two points, the distance from a point to a plane, and the distance between two points projected to a plane. All measurements can be calculated automatically in our system as described in the following.

###### 2.2.1. The Angle of Three Points

Assume point B is the vertex among three points A, B, and C, then the angle of these three points is calculated byandwhere represents the Euclidean distance of *A* and *B*.

###### 2.2.2. The Angle between Two Planes

As the cephalometric radiographs are 2D images in this study, the planes are projected as straight lines. Thus, the planes are determined by two points. The angle between two planes and is calculated by

###### 2.2.3. The Distance between Two Points

The distance between two points is calculated using the following equation: is a scale indicator that can be calculated by the calibrated gauge in the lateral cephalograms and its unit is mm/pixel.

###### 2.2.4. The Distance from a Point to a Plane in the Horizontal Direction

The plane is defined as

Thus, the distance of point C to the plane is calculated by

###### 2.2.5. The Distance between Two Points Projected to a Plane

The calculation of the distance between two points projected to a plane is illustrated in Figure 5. In order to calculate the distance between two points projected to a plane, first we use Equations (4)–(12) to determine the plane . Then, we calculate the distance from two points *C* and *D* to the plane as and by Equations (4)–(13). Third, the distance between two points C and D is calculated by Equations (4)–(11). Finally, the distance between two points *C* and *D* projected to the plane is represented as and is calculated by

#### 3. Experimental Evaluation

##### 3.1. Data Description

###### 3.1.1. Database of 2015 ISBI Challenge

The benchmark database (database1) included 300 cephalometric radiographs (150 for TrainingData, 150 for Test1Data), which is described in Wang et al. [28]. All of the cephalograms were collected from 300 patients aged from 6 to 60 years old. The image resolution was 1935 × 2400 pixels. For evaluation, 19 landmarks were manually annotated by two experienced doctors in each cephalogram (see illustrations in Figure 6); the ground truth was the average of the annotations by both doctors, and eight clinical measurements were used for classification of anatomical types.

**(a)**

**(b)**

###### 3.1.2. Database of Peking University School and Hospital of Stomatology

The database2 included 165 cephalometric radiographs as illustrated in Table 3 (the IRB approval number is PKUSSIRB-201415063). There were 55 cephalograms collected from 55 Chinese adult subjects (29 females, 26 males) who were diagnosed as skeletal class I (0 < ANB < 4) with minor dental crowding and a harmonious. The other 110 cephalograms were collected from the 55 skeletal class III patients (ANB < 0) (32 females, 23 males), who underwent combined surgical orthodontic treatment at the Peking University School and Hospital of Stomatology from 2010 to 2013. The image resolutions were different in the range from 758 × 925 to 2690 × 3630 pixels. For evaluation, 45 landmarks were manually annotated by two experienced orthodontists in each cephalogram as shown in Figure 7; the ground truth was the average of annotations by both doctors, and 27 clinical measurements were used for future cephalometric analysis.

**(a)**

**(b)**

###### 3.1.3. Experimental Settings

In the initial scale of landmark detection, for training, *K* = 50; for prediction, = 400. In the other scales, the additional parameters *W* and *S* are set to 48 and 40, separately. Because the image resolutions are different, preprocessing is required to rescale all the images to the fixed width of 1960 pixels in database2. For database2, we test our algorithm by using 5-fold cross validation.

##### 3.2. Landmark Detection

###### 3.2.1. Evaluation Criterion

The first evaluation criterion is the mean radial error with the associated standard deviation. The radial error , i.e., the distance between the predicted position and the true position of each landmark, is defined aswhere , represent the estimated and true positions of each landmark for all cephalograms in the dataset; , are the corresponding positions in set and , respectively; and is the total number of cephalograms in the dataset. The mean radial error (MRE) and the associated standard deviation (SD) for each landmark are defined as

The second evaluation criterion is the success detection rate with respect to the 2.5 mm, 3 mm, 5 mm, and 10 mm precision ranges. If is less than a precision range, the detection of the landmark is considered as a successful detection in the precision range; otherwise, it is considered as a failed detection. The success detection rate (SDR) with precision less than is defined aswhere denotes four precision ranges used in the evaluation, including 2.0 mm, 2.5 mm, 3 mm, and 4 mm.

###### 3.2.2. Experimental Results

*(1) Results of database1*. Experimental results of cephalometric landmark detection using database1 is shown in Table 4. The MREs of landmarks L10, L4, L19, L5, and L16 are more than 2 mm. The other 14 landmarks are all within the MRE of 2 mm, in which 3 landmarks are within the MRE of 1 mm. The average MRE and SD of 19 landmarks are 1.69 mm and 1.43 mm, respectively. It shows that the detection of cephalometric landmarks is accurate by our proposed method. The SDRs are 73.37%, 79.65%, 84.46%, and 90.67% within the precision ranges of 2.0, 2.5, 3.0, and 4.0 mm, respectively. In 2 mm precision range, the SDR is more than 90% for four landmarks (L9, L12, L13, L14); the SDR is between 80% and 90% for six landmarks (L1, L7, L8, L11, L15, L17); the SDR is between 70% and 80% for two landmarks (L6 and L18); the SDRs for the other seven landmarks are less than 70%, where the SDR of landmark L10 is the lowest. It can be seen from Table 4 that the landmark L10 is the most difficult to detect accurately.

Comparison of our method with three state-of-the-art methods using Test1data is shown in Table 5. The difference between MRE of our proposed method and RFRV-CLM in [24] is less than 1 pixel, which means that their performance is comparable within the resolution of image.

Furthermore, we conduct the comparison with the top two methods in 2015 ISBI Challenge of cephalometric landmark detection in terms of MRE, SD, and SDR as illustrated in Table 6. Our method achieves the SDR of 73.37% within the precision range of 2.0 mm, which is similar to the SDR of 73.68% of the best method. The MRE of our proposed method is 1.69 mm, which is comparable to that of method of RFRV-CLM, but the SD of our method is less than that of method of RFRV-CLM. The results show that our method is accurate for landmark detection in lateral cephalograms and is more robust.

*(2) Results of database2*. Two examples of landmark detection in database2 are shown in Figure 8. It can be observed from the figure that the predicted locations of 45 landmarks in cephalograms are near to the ground truth locations, which shows the success of our proposed algorithm for landmark detection.

**(a)**

**(b)**

For the quantitative assessment of the proposed algorithm, the statistical result is shown in Table 7. The MRE and SD of the proposed method to detect 45 landmarks are 1.71 mm and 1.39 mm, respectively. The average SDRs of 45 landmarks within the precision range 2.0 mm, 2.5 mm, 3 mm, and 4 mm are 72.08%, 80.63%, 86.46%, and 93.07%, respectively. The experimental results in database2 show comparable performance to the results of database1. It indicates that the proposed method can be successfully applied to the detection of more clinical landmarks.

##### 3.3. Measurement Analysis

###### 3.3.1. Evaluation Criterion

For the classification of anatomical types, eight cephalometric measurements were usually used. The description of these eight measurements and the methods for classification are explained in Tables 8 and 9, respectively.

One evaluation criterion for measurement analysis is the success classification rate (SCR) for these 8 popular methods of analysis, which is calculated using confusion matrix. In the confusion matrix, each column represents the instances of an estimated type, while each row represents the instances of the ground truth type. The SCR is defined as the averaged diagonal of the confusion matrix.

For evaluation, the performance of the proposed system for more measurement analysis in database2, another criterion, the mean absolute error (MAE), is calculated bywhere is the value of angular or linear measurement estimated by our system and is the ground truth measurement obtained using human annotated landmarks.

###### 3.3.2. Experimental Results

*(1) Results of database1*. Using the detected 19 landmarks, 8 cephalometric measurements are calculated for classification of anatomical types in database1. The comparison of our method to the best two methods is given in Table 10, where we have achieved the best classification results for two measurements of APDI and FHI. The average SCR obtained by our method is 75.03%, which is much better than the method in [26] and is comparable to the method of RFRV-CLM [24].

*(2) Results of database2*. According to the detected 45 landmarks, 27 measurements are automatically calculated by our system using database2, including 17 angular and 10 linear measurements, which are illustrated in Table 11.

The MAE and of 27 measurements are illustrated in Table 12. Here, the MAE represents the performance of measurement analysis of our automatic system. The represents interobserver variability calculated between the ground truth values obtained by the two experts. For angular measurements, the difference between MAE and is within 0.5° for 9/17 measurements; the difference between MAE and is within 1° for 12/17 measurements, and the difference between MAE and is within 2° for 16/17 measurements. In particular, the MAE of *Z* Angle is less than . The other one measurement has the difference of 2.24°. For linear measurements, the difference between MAE and is within 0.5 mm for 9/10 measurements. The other one measurement has the difference of 1.16 mm. The results show that our automatic system is efficient and accurate for measurement analysis.

#### 4. Discussion

The interobserver variability of human annotation is analyzed. Table 13 shows that the MRE and SD between annotations by two orthodontists are 1.38 mm and 1.55 mm for database1, respectively [27]; for database2, the MRE and SD between annotations of two orthodontists are 1.26 mm and 1.27 mm, respectively. Experimental results show that the proposed algorithm of automatic cephalometric landmark detection can achieve the MRE of less than 2 mm and the SD almost equal to that of manual marking. Therefore, the performance of automatic landmark detection by the proposed algorithm is comparable to manual marking in term of the interobserver variability between two clinical experts. Furthermore, the detected landmarks are used to calculate the angular and linear measurements in lateral cephalograms. The satisfactory results of measurement analysis are presented in experiments based on the accurate landmark detection. It shows that the proposed algorithm has the potential to be applied in clinical practice of cephalometric analysis for orthodontic diagnosis and treatment planning.

The detection accuracy of some landmarks is lower than the average value, and there are mainly three main reasons: (i) some landmarks are located at the overlaying anatomical structures, such as landmarks Go, UL5, L6E, UL6, and U6E; (ii) some landmarks have large variability of manual marking due to large anatomical variability among subjects especially in abnormality, such as landmarks A, ANS, Pos, Ar, Ba, and Bolton; and (iii) structural information is not obvious due to little intensity variability in the neighborhood of some landmarks in images, such as landmarks P, Co, L6A, and U6A.

The proposed system follows clinical cephalometric analysis procedure, and the accuracy of the system can be evaluated by the manual marking accuracy. There are several limitations in this study. On one hand, except for the algorithm, the data effects to the performance of the system include three aspects: (i) the quality of the training data; (ii) the size of the training dataset; and (iii) the shape and appearance variation exhibited in the training data. On the other hand, performance of the system depends on consistency between the training data and the testing data, similarly as any supervised learning-based methods.

#### 5. Conclusion

In conclusion, we design a new framework of landmark detection in lateral cephalograms with low-to-high resolutions. In each image resolution, decision tree regression voting is employed in landmark detection. The proposed algorithm takes full advantage of image information in different resolutions. In lower resolution, the primary local structure information rather than local detail information can be extracted to predict the positions of anatomical structure involving the specific landmarks. In higher resolution, the local structure information involves more detail information, and it is useful for prediction positions of landmarks in the neighborhood. As demonstrated in experimental results, the proposed algorithm has achieved good performance. Compared with state-of-the-art methods using the benchmark database, our algorithm has obtained the comparable accuracy of landmark detection in terms of MRE, SD, and SDR. Tested by our own clinical database, our algorithm has also obtained average 72% successful detection rate within precision range of 2.0 mm. In particular, 45 landmarks have been detected in our database, which is over two times of the number of landmarks in the benchmark database. Therefore, the extensibility of the proposed algorithm is confirmed using this clinical dataset. In addition, automatic measurement of clinical parameters has also achieved satisfactory results. In the future, we will put more efforts to improve the performance of automatic analysis in lateral cephalograms so that the automatic system can be utilized in clinical practice to obtain objective measurement. More research will be conducted to reduce the computational complexity of the algorithm as well.

#### Data Availability

The database1 of 2015 ISBI Challenge is available at http://www.ntust.edu.tw/∼cweiwang/ISBI2015/challenge1/index.html. The database2 of Peking University School and Hospital of Stomatology cannot be made public available due to privacy concerns for the patients.

#### Disclosure

This work is based on research conducted at Beijing Institute of Technology.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors would like to express their gratitude to Department of Orthodontics, Peking University School and Hospital of Stomatology, China, for the supply of image database2.