Abstract

The increasing number of imaging studies and the prevailing application of positron emission tomography (PET) in clinical oncology have led to a real need for efficient PET volume handling and the development of new volume analysis approaches to aid the clinicians in the clinical diagnosis, planning of treatment, and assessment of response to therapy. A novel automated system for oncological PET volume analysis is proposed in this work. The proposed intelligent system deploys two types of artificial neural networks (ANNs) for classifying PET volumes. The first methodology is a competitive neural network (CNN), whereas the second one is based on learning vector quantisation neural network (LVQNN). Furthermore, Bayesian information criterion (BIC) is used in this system to assess the optimal number of classes for each PET data set and assist the ANN blocks to achieve accurate analysis by providing the best number of classes. The system evaluation was carried out using experimental phantom studies (NEMA IEC image quality body phantom), simulated PET studies using the Zubal phantom, and clinical studies representative of nonsmall cell lung cancer and pharyngolaryngeal squamous cell carcinoma. The proposed analysis methodology of clinical oncological PET data has shown promising results and can successfully classify and quantify malignant lesions.

1. Introduction

Positron emission tomography (PET) volume analysis is vital for various clinical applications including artefact reduction and removal, tumour quantification in staging, a process which analyses the development of tumours over time, and to aid in radiotherapy treatment planning [1, 2]. PET has been progressively incorporated into the management of patients. Results of clinical studies using fluorodeoxyglucose (FDG)-PET have demonstrated its added value in the diagnosis, staging, and evaluation of response to therapy [35]. The utilisation of advanced high performance analysis approaches will be useful in aiding clinicians in diagnosis and radiotherapy planning. Although the task of medical volume analysis appears simple, the reality is that an indepth knowledge of the anatomy and physiology is required to perform such task on clinical medical images. Essentially, the expert observes a particular slice, determines borders between regions, and classifies each region. This is commonly completed slice by slice for a 3D volume and requires a reslicing of data into the transaxial, sagittal, and coronal planes. In addition to this, identifying smaller slice features and contrast modifications are often required. Although, for a typical 3D data set, the entire expert manual analysis can take several hours to complete, this approach is perhaps the most reliable and accurate method of medical volume analysis. This is due to the immense complexity of the human visual system, a system well suited to this task [69].

The main challenges associated with PET are the statistical noise and the low resolution which results in a significant partial volume effect. This effect should be reduced to the minimum level so that the required information can be precisely extracted from the analysed volume. Analysing and extracting the proper information from PET volumes can be performed by utilising analysis and classification approaches which provide rich information compared to what can be extracted from visual interpretation of the PET volumes alone. The need for accurate and fast analysis approaches of imaging data motivated the exploitation of artificial intelligence (AI) technologies. Artificial neural network (ANN) is one of the powerful AI techniques that has the capability to learn from a set of data and construct weight matrices to represent the learning patterns. The ANN is a mathematical model which emulates the activity of biological neural networks in the human brain.

ANNs had great success in many applications including pattern classification, decision making, forecasting, and adaptive control [10]. Competitive neural networks with wavelet invariant moments have been used in [11] to detect the arbitrary pose of the face and verify the candidate face regions. In this work, the user selects some wavelet invariant moments and feed them into the neural network. There is just one active output out of three outputs of the network which correspond to frontal face, nonface, and profile face. Therefore there is no full classification for the whole image since it is just restricted to some selected features. Supervised competitive learning algorithm has been used to train competitive learning neural network with the extracted features set for handwritten Chinese character recognition [12]. A number of research studies has been carried out in the medical field utilising ANN for image segmentation and classification using various medical imaging modalities. Multilayer perceptron (MLP) neural network have been used in [13] to identify breast nodule malignancy using sonographic images. A multiple classifier system using five types of ANNs and five sets of texture features extraction for the characterisation of hepatic tissue from CT images was presented in [14]. Kohonen self-organising neural network for segmentation and a multilayer backpropagation neural network for classifying multispectral MR images have been used in [15]. Kohonen ANN was also used for image segmentation in [16]. Computer-aided diagnostic (CAD) scheme to detect nodules in lung using multiresolution massive training artificial neural network (MTANN) is presented in [17].

Many other approaches were used for medical image segmentation. A fuzzy locally adaptive Bayesian (FLAB) segmentation for automatic lesion volume delineation has been proposed in [18]. The FLAB approach was compared with a threshold approach as well as fuzzy hidden Markov chains (FHMCs) and the fuzzy C-Means (FCM) algorithms. In this comparison, phantom data sets were used to validate the performance of the proposed algorithm. A new fuzzy segmentation technique adapted to typical PET data was also recently proposed [19]. First, PET images smoothed using a nonlinear anisotropic diffusion filter are added as a second input to the FCM algorithm to incorporate spatial information. Thereafter, a methodology was developed to integrate the a trous wavelet transform in the standard FCM algorithm to allow handling of heterogeneous lesions’ uptake. An unsupervised MRI segmentation method based on self-organising feature map has been reported in [20]. An extra spatial information about a pixel region was obtained using a Markov random field (MRF) model. The utilisation of MRF term improves the segmentation results without extra data samples in the training set. The cooperation of MRF into SOFM has shown its great potentials as MRF term models the smoothness of the segmented regions. It verifies that the neighboring pixels should have similar segmentation assignment unless they are on the boundary of two distinct regions. However, it is not clear how the proposed approach is able to differentiate between two regions.

This paper aims to develop a robust PET volume analysis system using ANN combined with Bayesian information criterion (BIC). The initial investigation of this system was published in [21]. Two methodologies have been investigated using the competitive neural network (CNN) and the learning vector quantisation neural network (LVQNN) for classifying PET volumes. BIC has been used in this system to select the optimal number of classes for each PET data set and feed it into the ANN blocks. The CNN and LVQNN outputs have been evaluated using two PET phantom data sets, a clinical PET volume of nonsmall cell lung cancer patient, and PET volumes from seven patients with laryngeal tumours.

This paper is organised as follows. Section 2 presents theoretical background for the main system components including BIC, CNN, and LVQNN. The proposed medical volume analysis system is described in Section 3. Results and analysis are illustrated in Section 4, and finally conclusions are presented in Section 5.

2. Theoretical Background

2.1. Bayesian Information Criterion

Bayesian information criterion (BIC) is employed to approximate the Bayes factor which is consequently used to compare a series of rival theories. BIC is one hypothesis testing approach which uses Bayesian inference. BIC has gained notoriety as a significant approach for model selection and has been used in contexts varying from image processing and analysis [22], to biological and sociological research [23, 24]. Although the BIC does not allow for spatial correlations between voxels to be considered, it does provide a useful strategy for the comparison of contesting models. The model which applied to the data set is denoted by and is defined from the Gaussian distribution utilised and its associated parameters , where is a prior number of classes. To compare two competing hypotheses and , the posterior probability of each model with voxel labelling is computed as follows [23]:

is the prior probability of model , and in this case the number of classes considered is taken to be equally likely a priori, therefore,

The ratio of posteriors, , is commonly referred to as the Bayes factor for model versus model . This factor equates to the posterior odds of one hypothesis when the prior probabilities of the two hypotheses are likely to be equal. This also provides a measure of evidential weight provided for the data or against the null hypothesis. represents the integrated likelihood of model rather than the maximised likelihood and is given by:

is the usual likelihood, and is the prior assumed to be equally probable for all . Evaluating this integral is combinatorially difficult; however, a good approximation to the integrated likelihood is given by the BIC:

where is the maximum likelihood estimator of obtained from Gaussian mixture fitting:

where is the dimensionality of the data vectors, and is the cardinality of the parameter set employed. The Bayes factor shown in (6) can be approximated by computing the difference of BIC terms, which are the results of model fitting for different numbers of classes, and [23]:

Although the absolute value of the BIC is not individually informative due to comparison with the null hypothesis, the disparity between BIC values for competing models provides evidence specifying the use of one model against another.

Expectation maximisation (EM) algorithm is used to find the maximum likelihood estimation for each class in the processed PET volume. The maximum likelihood estimation of (MLE) based on the incomplete observed data can be defined as follows: where is the log likelihood of given . The maximum likelihood estimation is calculated first based on Gaussian distribution, which is a continuous probability distribution that is often used as a first approximation to describe real-valued random variables that tend to classify around a single mean value:

where is the mean for each class, and is the standard deviation.

The maximum likelihood estimation for each segment is finally obtained utilising this probability beside the histogram of each level in the processed slice. The mean and standard deviation, for each class are also calculated based on histogram calculation, and according to these statistical details mthe signal-to-noise ratio (SNR) for each class is obtained as well to evaluate the level of the signal in each segment [25]. SNR can be calculated according to the following equation:

2.2. Competitive Neural Network

Competitive neural networks can learn to detect regularities and correlations in their input and adapt their future responses to that input accordingly. The neurons of competitive networks learn to recognise groups of similar input vectors. Self-organising maps learn to recognise groups of similar input vectors in such a way that neurons physically near each other in the neuron layer respond to similar input vectors.

CNN consists of a single layer, the neurons in this competitive layer distribute themselves to recognise frequently presented input vector. The weights are applied to the input vector using negative Euclidean distance approach. The layer’s net input is calculated by combining its weighted inputs and biases. CNN is trained using two approaches: the first one is sequential order incremental training which trains the network with weight and bias learning rules with sequential updates. The other approach is random order incremental training which trains a network with weight and bias learning rules with incremental updates after each presentation of an input. Where in this type of neural network inputs are presented in random order [26].

2.2.1. Competitive Learning

The learning rule used for CNN is based on Kohonen rule [27, 28], the neuron, whose weight vector was closest to the input vector , is updated to be even closer. The result is that the winning neuron is more likely to win the competition the next time a similar vector is presented, and less likely to win when a very different input vector is presented. As more and more inputs are presented, each neuron in the layer closest to a group of input vectors soon adjusts its weight vector toward those input vectors. Eventually, if there are enough neurons, every class of similar input vectors will have a neuron that outputs 1 when a vector in the class is presented, while outputting a 0 at all other times. Thus, the competitive network learns to categorise the input vectors it sees each time. The weight of a neuron at iteration is adjusted as follows:

where the learning rate is set in the proposed PET application to 0.6. This value has been chosen by experiment, as the learning rate value near zero results in slow learning, however, values near one result in faster learning, but the weight vector stays unstable. To solve the problem of the number of classes in the CNN, BIC has been utilised to determine the optimal number of classes for each type of the processed data set.

2.3. Learning Vector Quantisation Neural Network

Learning vector quantisation neural network is a hybrid network, it uses unsupervised and supervised learning to form the classification. LVQNN has two layers: the first layer calculates weighted inputs using negative Euclidean distance approach. The second layer has neurons with pure-line activation function and calculates weighted input using dot product weight approach. There are no biases used in LVQNN. LVQ learning in the competitive layer is based on a set of input/target pairs. Each target vector has a single 1, and the rest of its elements are 0. The 1 tells the right classification of the associated input.

LVQNN is more efficient than CNN in case of large number of inputs. The optimisation procedure implicitly in this network yields the class means by estimating optimised assignment for each class [29].

2.3.1. LVQ Learning Rule

The learning rule in LVQNN combines competitive learning with supervised learning approach [30, 31], which requires a set of examples of the suitable network behavior as follows: Each of these target vectors should contain all zeros except for a single 1. This 1 indicates the class to which the assigned input vector belongs. At each iteration, an input vector is presented to the LVQNN, and the distance from to each prototype vector is calculated. After competition, one neuron wins, and the th element of the competition outputs vector is set to 1, then this outputs are multiplied by the weight matrix to get the final output, which has one nonzero element indicating the vector class. Kohonen rule is used as a learning rule; so if the input vector is classified correctly, then the weight of the winning neuron needs to be moved toward as follows: however, if is classified incorrectly then the weight of the winning neuron needs to be moved away from as follows:

3. The Proposed System

The proposed medical volume analysis system is illustrated in Figure 1. The 3D PET volume acquired from PET scanner goes through the preprocessing block, where thresholding, histogram equalisation, and median filter are utilised to remove external artefacts and enhance the quality of each slice features. The EM algorithm is then used to find the maximum likelihood estimation for each class in the enhanced volume. The normal Gaussian density is then calculated according to (8) for the values 0–255 of each class. The mean standard deviation and the class probability (CP) for each generated class are also calculated, according to these statistical details, the SNR for each class is also obtained. The classes probabilities must add up to one. The BIC values are calculated and plotted against different values of to determine the optimal number of classes for each processed slice. This number is fed to CNN and LVQNN, where the processed volume can be then classified. The outputs of these approaches are compared in the next step, and the best classified outputs are selected and displayed.

4. Results and Analysis

4.1. Phantom Studies
4.1.1. Experimental Phantom Studies

The first data set used in this study is obtained using the NEMA IEC image quality body phantom which consists of an elliptical water filled cavity with six spherical inserts suspended by plastic rods of volumes 0.5, 1.2, 2.6, 5.6, 11.5, and 26.5 mL. The inner diameters of these spheres are 10, 13, 17, 22, 28, and 37 mm. The PET image volume consists of voxels, each voxel has dimensions of 4.07 mm × 4.07 mm × 5 mm corresponding to voxel volume of 0.0828 mL. This phantom was extensively used in the literature for the assessment of image quality and validation of quantitative procedures [3235]. Other variants of the multisphere phantoms have also been suggested [36]. The PET scanner used for acquiring the data is the Biograph 16 PET/CT scanner (Siemens Healthcare, Erlangen, Germany) operating in 3D mode [37]. Following Fourier rebinning and model-based scatter correction, PET images were reconstructed using two-dimensional iterative normalised attenuation-weighted ordered subsets expectation maximisation (NAW-OSEM). CT-based attenuation correction was used to reconstruct the PET emission data. The default parameters used were ordered OSEM iterative reconstruction with four iterations and eight subsets followed by a postprocessing Gaussian filter (kernel full-with half-maximal height, 5 mm).

To choose the optimal number of classes for each slice in the processed PET phantom volume, different values of have been used. For the proposed application, BIC values are calculated incrementally increasing from to . is not further increased, as in this medical application, any additional separation is unnecessary. BIC values tend to increase indefinitely as the number of components increases. An increase in BIC value indicates improved model fit; however, these values typically stabilise on an approximate curve plateau, the beginning of which is usually taken to indicate the optimal value. The plot of BIC values with for experimental phantom data set shows that the best value can be obtained at 4, as illustrated in Figure 2.

The mean standard deviation, SNR, and class probability (CP) for each slice have been calculated to analyse all the recommended classes in each slice. Table 1 presents these values for the optimal class number (CN) for experimental phantom data set.

4.1.2. Simulated Phantom Studies

The second data set consists of Monte Carlo simulations of the Zubal anthropomorphic model where two volumes were generated. The first volume contains a matrix with isotropic voxels, the size of this volume is voxels. The second volume contains the same matrix of the first one but without isotropic voxels, it has a size of voxels. The voxel size in both volumes is 5.0625 mm × 5.0625 mm 2.4250 mm. The second data volume has 3 tumours in the lungs whose characteristics are given in Table 2 [38].

For isotropic voxels in simulated phantom data set, the optimal class number obtained from BIC plot is 5 classes, as shown in Figure 3. The plot of BIC values flattens to an approximate plateau at , and for this reason, this statistical model selection test determines 5 to be the most appropriate number of labels for classifying this data set. The optimal number of classes is the same for tumours 1, 2, and 3. For volume with nonisotropic voxels in the simulated phantom data set, the optimal number of classes obtained from BIC plot is also 5 classes for all the three tumours.

Table 3 shows the statistical information about the best number of classes for simulated phantom data set, tumour 1, with isotropic voxels. While the statistical details about tumour 2 are presented in Table 4.

Analysing the statistical details about tumour 3 shows that there is a small difference between the SNR values calculated for classes 2, 3, 4, and 5, as presented in Table 5. Similar analysis has been performed for the simulated phantom data set with nonisotropic voxels. Table 6 compares the SNR and class probability for tumours 1, 2, and 3 with nonisotropic voxels.

The optimum chosen class number is fed to both CNN and LVQNN, where both have clearly classified all spheres in experimental phantom data set, as illustrated in Figure 4. Clear classes have been also obtained for the simulated phantom data set with isotropic and nonisotropic voxels. For example, Figure 5 presents the original and classified slice for the simulated phantom data set with isotropic voxels and tumour 1.

Better performance has been achieved using LVQNN rather than CNN; however, the required time for classifying each slice in the processed volume using LVQNN is higher than the time required for processing each slice using CNN.

Two performance metrics have been employed to evaluate the performance of the proposed ANNs. A confusion matrix is a visualisation tool typically used in supervised and unsupervised learning approaches. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. One benefit of a confusion matrix is that it is easy to see if the system is confusing two classes: (the tumour and the remaining tissues in case of clinical PET data sets). The other performance metric approach is receiver-operating characteristic (ROC). This approach can be represented by plotting the fraction of true positives rate (TPR) versus the fraction of false positives rate (FPR), where the perfect point in the ROC curve is the point (0,1) [39]. Both ANNs have performed well in classifying both experimental and simulated data sets.

4.2. Clinical PET Volume
4.2.1. Clinical Data Set 1

Clinical PET volume of patient with histologically proven NSCLC (clinical Stage Ib-IIIb) who has undertaken a diagnostic whole-body PET/CT scan was used for assessment of the proposed classification technique. Patient fasted not less than 6 hours before PET/CT scanning. The standard protocol involved intravenous injection of 18F-FDG followed by a physiologic saline (10 mL). The injected FDG activity was adjusted according to patient’s weight using the following formula: where MBq (megabecquerel) is the international system of units for radioactivity. As One Bq is defined as the activity of a quantity of radioactive material in which one nucleus decays per second. After 45 min uptake time, free-breathing PET and CT images were acquired. The data were reconstructed using the same procedure described for the phantom studies. The maximal tumour diameters measured from the macroscopic examination of the surgical specimen served as ground truth for comparison with the maximum diameter estimated by the proposed classification technique.

The optimal number of classes obtained from BIC plot for clinical PET volume of nonsmall cell lung cancer patient is 5 classes, as illustrated in Figure 6. While Table 7 shows the statistical information about these classes. A subjective evaluation based on the clinician knowledge has been carried out for the output of the proposed approaches. The classified slice using the proposed approaches has clear detection of the region of interest (ROI) as illustrated in Figure 7.

4.2.2. Clinical Data Set 2

The second clinical data set used in this study is PET volumes from seven patients with T3-T4 laryngeal squamous cell carcinoma. Prior to treatment, each patient underwent an FDG-PET study. Patients were immobilised with a customised thermoplastic mask (Sinmed, Reeuwijk, The Netherlands) fixed to a flat table-top to prevent complex neck movements. First, a 10 min transmission scan was obtained on the Siemens Exact HR camera (CTI, Knoxville, USA). Immediately after intravenous injection of 185–370 MBq (5–10 mCi) of FDG, a 1h dynamic 3D emission scan was performed. It consisted of eight frames with variable duration ranging from 90 to 600 s. All images were corrected for dead time, random, scatter attenuation and decay and then reconstructed using a 3D OSEM algorithm, as used in the clinics for patients with head and neck tumours [4042]. The size of this data set is voxels for each patient.

In the case of clinical data set, the plot of BIC values flattens to an approximate plateau at , and for this reason, this statistical model selection test determines 5 to be the most appropriate number of labels for the classification of this data set. For the data set in question, the determined by the BIC plot corresponds precisely to the number of classification levels recommended by clinicians specifically for tumour quantification. The BIC provides a useful objective methodology for classification level selection. In particular, the BIC works efficiently for tumour quantification in oncological PET data and can be computed very rapidly. In the case of a volume with dimensionality voxels, histogram computation takes approximately 1.9 seconds, and thereafter, one BIC value associated with a specific value of can be calculated every 1.2 seconds. The total BIC model selection procedure takes approximately 8.4 seconds (for ). The model timings are obtained using a single processor, 2.66 GHz, with 3 GB of RAM.

Using the proposed approach the optimum CN for each patient has been chosen. According to Figure 8 which illustrates the BIC values for different values (from 2–8), the best CN for this data set is 5 classes. This CN is fed to both CNN and LVQNN to do the classification for each slice in the processed clinical data set.

All the statistical information about each class in this data set has been calculated, Figure 9 illustrates the values for each class of clinical data set 2 from patient number one to patient number seven. The calculated SNR was important to refer to the class which contains the ROI. A subjective evaluation based on the clinician knowledge has been carried out for the output of the proposed approach. The classified volumes using the proposed approaches have a clear detection of ROI. Laryngeal tumours from seven patients were clearly classified, Figure 10 illustrates the original and classified slice for patient 1 data set.

5. Conclusion

An artificial intelligent statistical approach based on CNN and LVQNN was proposed for 3D oncological PET volume analysis. Experimental, simulated, and clinical PET studies of nonsmall cell lung cancer and pharyngolaryngeal squamous cell carcinoma were used to evaluate the performance of the proposed system. BIC and EM approaches were deployed to obtain the optimal number of classes, which was used by CNN and LVQNN to classify each slice in the processed PET volume. The mean, standard deviation, signal-to-noise ratio, and class probability were also calculated for each class. A detailed objective assessment together with subjective evaluation based on clinical knowledge was performed to characterise the performance of the proposed approach. Promising results were obtained, and the system appears to successfully classify and quantify lesions from clinical oncological PET studies.

Acknowledgments

This paper was supported by the Swiss National Science Foundation under Grant SNSF 31003A-125246, Geneva Cancer League, and the Indo Swiss Joint Research Programme ISJRP 138866.