Computational and Mathematical Methods in Medicine

Volume 2018, Article ID 2527516, 9 pages

https://doi.org/10.1155/2018/2527516

## Image Decomposition Algorithm for Dual-Energy Computed Tomography via Fully Convolutional Network

^{1}National Digital Switching System Engineering & Technological R&D Centre, Zhengzhou 450002, China

^{2}153 Central Hospital of Henan Province, Zhengzhou 450002, China

Correspondence should be addressed to Bin Yan; ybspace@hotmail.com

Received 13 April 2018; Revised 17 July 2018; Accepted 30 July 2018; Published 5 September 2018

Academic Editor: Maria E. Fantacci

Copyright © 2018 Yifu Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

*Background*. Dual-energy computed tomography (DECT) has been widely adopted because its additional spectral information improves substance identification. The quality of the material-specific images produced by DECT depends heavily on the design of the basis material decomposition method. *Objective*. The aim of this work is to develop and validate a data-driven algorithm for the image-based decomposition problem. *Methods*. A deep neural network, consisting of a fully convolutional network (FCN) and a fully connected network, is proposed to solve the material decomposition problem. The former extracts a feature representation of the input reconstructed images, and the latter calculates the decomposed basis material coefficients from the joint feature vector. The whole model was trained and tested on a modified clinical dataset. *Results*. The proposed FCN delivers images with about 60% smaller bias and 70% lower standard deviation than the competing algorithms, suggesting better material separation capability. Moreover, the FCN still performs well in the presence of photon noise. *Conclusions*. Our deep cascaded network features high decomposition accuracy and robustness to noise. The experimental results demonstrate the strong function-fitting ability of deep neural networks, and the deep learning paradigm could be a promising way to solve the nonlinear problems in DECT.

#### 1. Introduction

Conventional single-energy X-ray imaging provides information about the examined object that is not sufficient to characterize it precisely. Dual-energy computed tomography (DECT), which scans the object with two different energy spectra, provides additional information and has been presented as a valid alternative to conventional single-energy X-ray imaging. In recent years, DECT has gained increased attention in public security [1] and in the medical field [2, 3]. The key advantage of DECT is its ability to characterize and differentiate materials [4]. The decomposition of a mixture into two basis materials relies on the principle that the attenuation coefficient is both material and energy dependent; thus, measurements at two distinct energies permit the separation of the attenuation into its basis components.

The quality of the material-specific images produced by DECT depends heavily on the design of the basis material decomposition method. Existing decomposition methods fall into two main categories: projection-based [5–7] and image-based [8–10]. Projection-based methods pass the projection data through a decomposition function, followed by image reconstruction such as filtered backprojection (FBP). They commonly provide better accuracy and reconstructed images with reduced beam-hardening artifacts compared with image-based methods. However, projection-based methods require matched projection datasets: physically the same lines must be measured for each spectrum, which is usually not the case in today's CT scanners. Image-based methods use linear combinations of reconstructed images to obtain an image that contains material-selective DECT information. This is an approximate technique, and the resulting images are less quantitative than those from projection-based methods, but image-based methods can handle mismatched projection datasets and are applicable to the decomposition of three or more constituent materials, which is more expedient in practice. Thus, they have been employed more frequently in modern DECT implementations.

The material decomposition problem in the image domain can be described by the following equation:

$$\begin{pmatrix} f_L \\ f_H \end{pmatrix} = \begin{pmatrix} \mu_{1L} & \mu_{2L} \\ \mu_{1H} & \mu_{2H} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \tag{1}$$

where $f_L$ and $f_H$ are the pixels in the reconstructed images from the low- and high-energy projections, respectively, and $x_1$ and $x_2$ are the corresponding points in the decomposed basis material images. The subscripts 1 and 2 indicate the two specific materials, and $\mu_{1L}$, $\mu_{2L}$, $\mu_{1H}$, and $\mu_{2H}$ are the average attenuation coefficients of the two basis materials under the low- and high-energy spectra. These attenuation coefficients are usually obtained by manually selecting two uniform regions of interest (ROIs) on the CT images that contain the basis materials [9, 11, 12]. Direct material decomposition via matrix inversion calculates the points $x_1$ and $x_2$ in the decomposed images as follows:

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \frac{1}{\mu_{1L}\mu_{2H} - \mu_{2L}\mu_{1H}} \begin{pmatrix} \mu_{2H} & -\mu_{2L} \\ -\mu_{1H} & \mu_{1L} \end{pmatrix} \begin{pmatrix} f_L \\ f_H \end{pmatrix}. \tag{2}$$
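The direct inversion can be sketched in a few lines of numpy. The attenuation coefficient values below are hypothetical placeholders chosen only for illustration; in practice they would be measured from ROIs as described above.

```python
import numpy as np

# Hypothetical average attenuation coefficients for two basis materials
# under the low- and high-energy spectra (illustrative values only; the
# real values are measured from uniform ROIs on the CT images).
A = np.array([[0.28, 0.20],   # [mu_1L, mu_2L]
              [0.22, 0.18]])  # [mu_1H, mu_2H]

def decompose_pixel(f_low, f_high, A):
    """Direct image-domain decomposition of one pixel pair by matrix
    inversion: solve A @ [x1, x2] = [f_low, f_high]."""
    return np.linalg.solve(A, np.array([f_low, f_high]))

# A pixel mixing 0.7 parts of material 1 with 0.3 parts of material 2
# is recovered exactly in the noise-free case.
x_true = np.array([0.7, 0.3])
f_low, f_high = A @ x_true
x_est = decompose_pixel(f_low, f_high, A)
print(np.allclose(x_est, x_true))  # True
```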

Equation (2) can be easily solved as long as the determinant of the attenuation coefficient matrix, $\mu_{1L}\mu_{2H} - \mu_{2L}\mu_{1H}$, is not zero. However, the two terms in the determinant do not differ significantly from each other, so the decomposition result is very sensitive to noise in the input reconstructed images. Various methods have been proposed to address this noise problem. Precorrection methods [13, 14] reconstruct two water-precorrected images, followed by a linear combination, to yield images free from the cupping artifacts that usually appear in water-equivalent materials. Noise reduction techniques applied after image decomposition include Kalender's correlated noise reduction (KCNR) [15, 16], noise forcing (NOF) [17], and noise clipping (NOC) [18], whose most fundamental strategy is the application of a smoothing filter. Recent advanced iterative methods [9, 10] consider the statistical properties of the decomposition process, producing high-quality edge-preserving images. These methods have shown great success on the decomposition problem, but their good performance relies on careful handcrafted algorithm design.
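The noise sensitivity can be made concrete with the same hypothetical coefficient matrix as before: its rows are nearly proportional, so the determinant is small, the inversion is ill-conditioned, and a small perturbation of the reconstructed values produces a much larger relative error in the decomposed coefficients.

```python
import numpy as np

# Hypothetical coefficient matrix; its two rows are nearly proportional,
# so the determinant mu_1L*mu_2H - mu_2L*mu_1H is small.
A = np.array([[0.28, 0.20],
              [0.22, 0.18]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
print(det)                # ~0.0064: close to singular
print(np.linalg.cond(A))  # large condition number

x_true = np.array([0.7, 0.3])
f = A @ x_true

# Perturb the low-energy value by 1%; the inversion amplifies this
# sub-1% input error into a double-digit relative output error.
noise = np.array([0.01 * f[0], 0.0])
x_noisy = np.linalg.solve(A, f + noise)

rel_in = np.linalg.norm(noise) / np.linalg.norm(f)
rel_out = np.linalg.norm(x_noisy - x_true) / np.linalg.norm(x_true)
print(rel_in < 0.01, rel_out > 0.1)  # True True
```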

In recent years, deep learning techniques, which use neural networks with a deep structure of three or more layers, have attracted widespread attention, mainly by outperforming alternative machine learning methods in numerous important applications. The most popular deep model at present is the convolutional neural network (CNN), which has emerged as a powerful class of models for image classification [19, 20] and object detection [21]. In the field of computed tomography, recent studies have attempted to use deep neural networks to solve problems such as low-dose image denoising [22] and artifact reduction [23]. Wang [24] provides an analytical and global perspective on the combination of tomographic imaging and deep learning. For the material decomposition problem in DECT, several neural-network-based methods have also been proposed, but they all decompose the materials in the projection domain [7, 25, 26].

Inspired by recent learning-based methods [27, 28], in this paper we propose an end-to-end image decomposition algorithm based on deep learning techniques. A modified fully convolutional network is applied to extract features from the reconstructed images and to suppress image noise at the same time. The last layer of the model is a fully connected layer that calculates the decomposed images from the extracted features. We demonstrate the effectiveness of our algorithm in an experiment on a clinical dataset, in which two conventional algorithms are implemented and compared with the proposed FCN.

#### 2. Methods

##### 2.1. Fully Convolutional Network

The fully convolutional network (FCN) is a kind of CNN that was first proposed for semantic segmentation [29]. A standard CNN is generally composed of alternating convolutional and pooling layers. The convolutional layers learn features of the input, while the pooling layers, through downsampling, allow the deeper layers to extract features at larger scales. To map the features to class labels, a fully connected layer is added as the last output layer; it has fixed dimensions and throws away the spatial coordinates. Because of this structural design, the naive CNN requires fixed-size inputs and produces non-spatial outputs.

The main idea of the FCN is to transform the last fully connected layer into a convolutional layer whose kernels cover the entire input region. This replacement brings several advantages. First, the network can accept images of arbitrary size, which means it can be trained on image patches and then tested on full-sized images. Second, it can efficiently learn to make dense predictions for per-pixel tasks such as semantic segmentation. Lastly, applying a naive CNN to a per-pixel task generates a huge amount of redundant convolution computation at adjacent patches; the FCN avoids this by computing each convolution once over the entire input image, leading to a significant speedup in the forward pass.
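The equivalence behind this replacement can be checked numerically: a fully connected layer applied to a flattened patch computes exactly the same values as a valid convolution whose kernel covers the whole patch. The sketch below uses toy sizes of our own choosing, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy sizes (illustrative, not the paper's): C output channels, P x P patches.
P, C = 8, 3
W = rng.standard_normal((C, P, P))  # one full-patch kernel per output channel

def fc_on_patch(patch):
    """Fully connected layer: flatten a P x P patch and apply the weights."""
    return W.reshape(C, -1) @ patch.ravel()

def conv_full_kernel(image):
    """Valid convolution with kernels covering an entire P x P region:
    the same weights, now producing one prediction per spatial location."""
    H, Wd = image.shape
    out = np.zeros((C, H - P + 1, Wd - P + 1))
    for i in range(H - P + 1):
        for j in range(Wd - P + 1):
            out[:, i, j] = np.tensordot(W, image[i:i + P, j:j + P],
                                        axes=([1, 2], [0, 1]))
    return out

image = rng.standard_normal((12, 12))
dense = conv_full_kernel(image)                # dense per-pixel predictions
single = fc_on_patch(image[2:2 + P, 3:3 + P])  # FC layer on one patch
print(np.allclose(dense[:, 2, 3], single))     # True
```

The convolutional form computes the prediction for every patch position in one pass over the image, which is where the redundancy saving for per-pixel tasks comes from.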

Because of these advantages, the FCN is especially suitable for the image-based material decomposition problem, which can also be regarded as a per-pixel prediction task. In addition, the convolution operation on images is interpretable, since it can be seen as a kind of image filtering.

##### 2.2. Image Decomposition Model

For image decomposition, we designed an end-to-end decomposition model based on the FCN. The proposed model takes reconstructed images as inputs and predicts the basis material coefficients pixel by pixel in the decomposed images, completing image decomposition and noise suppression in a single step.

An overview of our model is illustrated in Figure 1. It is composed of two types of layers: convolutional and fully connected. Since pooling layers may discard important structural details in the feature maps, we omit them from the model to avoid degrading the quality of the result images. Without downsampling by pooling, however, the feature maps would have the same size at every layer. Because we still want the model to capture multiscale features of the image at different layers, the strides of the convolutional layers are set to 2 to perform the downsampling. The input of the model is an image patch from the reconstructed images. There are two independent fully convolutional networks corresponding to the reconstructed images from the low- and high-energy projections. The two networks have the same layer structure and are referred to as the L-FCN and H-FCN in this study. Each is composed of four convolutional layers. The output of layer $l$ can be formulated as

$$y_l = \sigma\left(w_l \ast y_{l-1} + b_l\right), \tag{3}$$

where $y_{l-1}$ is the input feature map (or image), $w_l$ and $b_l$ represent the convolutional kernel weights and bias parameters, respectively, $\ast$ is the convolution operation, and $\sigma$ is the nonlinear activation function of the neurons. The output of the L-FCN or H-FCN is a vector representing the features of the current input patch. The two feature vectors from the L-FCN and H-FCN are merged into a joint vector, and a fully connected layer then calculates the decomposed basis material coefficients from the joint vector:

$$c = W v + b, \tag{4}$$

where $c$ is the predicted material coefficient vector, $W$ and $b$ are the unsolved parameter matrices, and $v$ represents the merged vector from the L-FCN and H-FCN.
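The forward pass of this cascade can be sketched in numpy. The layer shapes, channel counts, and patch size below are toy assumptions of ours, not the paper's trained configuration; the sketch only shows the data flow: two branches of four stride-2 convolutions with a nonlinearity, a merged feature vector, and a final fully connected layer producing two material coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """A common choice of nonlinear activation (assumed here, not stated
    in the text above)."""
    return np.maximum(x, 0.0)

def conv2d_stride2(x, w, b):
    """Minimal valid 2-D convolution with stride 2.
    x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,)."""
    c_out, c_in, k, _ = w.shape
    h_out = (x.shape[1] - k) // 2 + 1
    w_out = (x.shape[2] - k) // 2 + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, 2 * i:2 * i + k, 2 * j:2 * j + k]
            out[:, i, j] = np.tensordot(w, patch,
                                        axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def fcn_branch(img, weights):
    """One branch (L-FCN or H-FCN): four stride-2 convolutional layers.
    The stride performs the downsampling that pooling would otherwise do."""
    x = img[np.newaxis]               # add channel axis
    for w, b in weights:
        x = relu(conv2d_stride2(x, w, b))
    return x.ravel()                  # feature vector of the input patch

def make_weights():
    """Random toy parameters: (out_channels, in_channels) per layer."""
    shapes = [(8, 1), (16, 8), (32, 16), (32, 32)]
    return [(0.1 * rng.standard_normal((co, ci, 3, 3)), np.zeros(co))
            for co, ci in shapes]

patch_low = rng.standard_normal((34, 34))   # toy patch size
patch_high = rng.standard_normal((34, 34))
v = np.concatenate([fcn_branch(patch_low, make_weights()),
                    fcn_branch(patch_high, make_weights())])  # joint vector

W_fc = 0.1 * rng.standard_normal((2, v.size))  # fully connected layer
b_fc = np.zeros(2)
c = W_fc @ v + b_fc                 # two decomposed material coefficients
print(v.size, c.shape)              # 64 (2,)
```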