Abstract

The number of medical images stored and communicated daily is rapidly increasing, driven by the growing reliance on these images in medical diagnosis. Hence, the storage space and bandwidth required to store and communicate these images are increasing accordingly, which has drawn attention toward compressing them. In this study, a new compression method is proposed for medical images based on convolutional neural networks. The proposed neural network consists of two main stages: a segmentation stage and an autoencoder. The segmentation stage recognizes the Region of Interest (ROI) in the image and provides it to the autoencoder stage, so that more emphasis is placed on the information in the ROI. The autoencoder part of the neural network contains a bottleneck layer whose dimensions are one-eighth of those of the input image. After the neural network is trained, the values in this layer are used to represent the image, while the following layers are used to decompress it. The proposed method is evaluated using the CLEF MED 2009 dataset, where the evaluation results show that it significantly outperforms existing state-of-the-art methods, producing more visually similar images from less data.

1. Introduction

With the rapidly growing use of medical images in diagnostic procedures, the storage space and bandwidth required to store and communicate these images are increasing accordingly [1]. Image compression techniques have therefore attracted significant attention, owing to their ability to reduce the amount of data required to represent an image by eliminating redundant information [2]. This reduction in data size allows more efficient use of storage space and communication bandwidth. Digital image compression techniques fall into two main categories: lossless and lossy [3]. In a lossless technique, the information retrieved from the compressed image is identical to the original information of the image before compression [4]. In contrast, in a lossy technique, the retrieved information is similar, but not identical, to the original [5]. However, lossy techniques can significantly reduce the size of the compressed image and can adjust the tradeoff between the size of the produced data and the quality of the retrieved image relative to the original [6]. Hence, several recent methods have been proposed to compress different parts of the same image at different levels, based on the significance of the information each part contains [7]. These methods attempt to keep the information in the Region of Interest (ROI) as close to the original as possible, while applying more lossy compression to the remainder of the image to significantly reduce its size.

One of the main challenges in producing such a hybrid compression is the accurate detection of the ROI. No universal segmentation method exists that can detect and segment all types of objects in medical images; the specificity of existing segmentation methods is the main reason behind this gap [8–10]. Because of this limitation, existing compression methods [7, 8, 11] that employ such segmentation algorithms are limited to the types of medical images their segmentation method can handle.

Recent breakthroughs in Artificial Neural Networks (ANNs) have significantly improved the performance of both image segmentation [12] and compression [7]. Beyond the improved segmentation performance, using ANNs for this task gives the compression method greater flexibility regarding the types of objects in the medical images being compressed. Supporting a new type of medical image only requires retraining the segmentation network to recognize the corresponding ROIs, as these networks do not rely on hand-crafted features and can learn new ones through additional training. Accordingly, the method proposed in this study uses a single hybrid neural network that combines the segmentation and compression networks, so that the compressed image is produced by passing the input through the proposed neural network only once. Nevertheless, the proposed training approach allows the different parts of the network to be trained separately, so that the segmentation part can be updated for any type of medical image.

2. Related Works

Compared to existing image compression techniques such as JPEG 2000, deep convolutional autoencoders have shown significantly better compression rates at similar complexity [13]. This performance has encouraged the use of neural networks in different approaches that tailor compression to the requirements of the application. A convolutional recurrent neural network is used by Sushmit et al. [14] to compress x-ray images by dividing each image into 16 regions. However, in addition to limiting the image information provided to the neural network per region, recurrent units are significantly more complex than convolutional ones [15]. Dividing the image also precludes defining a Region of Interest (ROI), as such a region would itself be divided. Hence, it is not possible to provide better image quality for the important region of the medical image; i.e., the entire image is compressed to the same quality.

Ahamed et al. [16] propose a near-lossless compression algorithm for medical images using an Artificial Neural Network (ANN). For each block of four pixels, a feed-forward neural network predicts the fourth pixel value from the remaining three, which serve as its inputs. The errors between the predicted values and the actual pixel values in the input image are then measured and compressed using entropy encoding. On the receiving side, the same neural network predicts the pixel values of the image, and the received error values are used to correct the network's outputs. Different heuristic optimization methods are used to adjust the parameters of the neural network, i.e., to train it: Gravitational Search (GS), Particle Swarm (PS), and Backpropagation (BP). The results show that, despite the higher compression rate achieved with the GS and PS methods, training the neural network with BP achieves the highest similarity between the original and decompressed images, indicating that the BP-trained network produces better predictions. However, as this method does not distinguish the important pixels in the medical image from less important ones, such as the background, the size of the compressed image could be further reduced by applying more lossy compression to the less important pixels. In general, each medical image consists of two main regions, distinguished by the importance of the information they present. The region that contains the most valuable information, especially for diagnosis, is known as the ROI, while the remainder of the image contains less important, or even useless, information with respect to the purpose of the image. Hence, the integrity of the information in the ROI must be given more consideration than that in the non-ROI, so that more efficient compression can be achieved while keeping the ROI's information intact.

The method proposed by Ayoobkhan et al. [7] uses hybrid compression: near-lossless compression for the ROI and lossy compression for the remainder of the image. To extract the ROI, it uses the Graph-Based Segmentation (GBS) method [17]. A Feed-Forward Neural Network (FF-NN) then compresses the image data based on the segments produced by the GBS, with the network trained to provide near-lossless predictions of pixel values from the values of their neighboring pixels. The edge weight values are optimized using two different optimization algorithms, the Particle Swarm (PS) and Gravitational Search (GS) optimizers. Despite the good performance of that method compared to methods proposed in earlier studies, a Convolutional Neural Network (CNN) can produce better results when processing images, owing to its three-dimensional filters, which take the positions of pixels into account in addition to their values, unlike an FF-NN.

Given the good performance of CNNs on image inputs, these networks are used in a variety of image-based applications. Compression is one task that has been tackled with CNNs, using an approach known as autoencoders. In this approach, the neural network consists of three main types of layers: encoding, bottleneck, and decoding. The encoding layers progressively reduce the size of the input until the bottleneck layer, the smallest layer in the network, is reached. The decoding layers then grow in size until the output layer matches the size of the input layer. The neural network is trained to minimize the difference between the output and the input image, so that the original image can be retrieved from the values in the bottleneck layer. After training, the encoding network produces the compressed version of the image, from which the decoding network can retrieve the original image. Another application of CNNs is image segmentation, where the network predicts the segment to which each pixel in the image belongs. One of the most popular segmentation neural networks, with outstanding performance in segmenting medical images, is U-net [12].
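To make the encode-bottleneck-decode pattern concrete, the following is a minimal sketch of a convolutional autoencoder in Keras; the layer sizes and filter counts are illustrative assumptions, not the configuration used in the proposed method.

# Minimal convolutional autoencoder: encoding layers shrink the input,
# the bottleneck is the smallest layer, and decoding layers grow back.
# Sizes here are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(64, 64, 1))  # grayscale input image
x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(inp)   # 32 x 32
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)     # 16 x 16
bottleneck = layers.Conv2D(1, 3, strides=2, padding='same',
                           activation='sigmoid', name='bottleneck')(x)        # 8 x 8
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(bottleneck)
x = layers.Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu')(x)
out = layers.Conv2DTranspose(1, 3, strides=2, padding='same', activation='sigmoid')(x)

autoencoder = Model(inp, out)
# Training minimizes the reconstruction error between output and input, so
# the bottleneck values alone suffice to reconstruct the image afterwards.
autoencoder.compile(optimizer='adam', loss='mse')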

In this paper, a new medical image compression method is proposed that applies different compression levels to the ROI and to the remainder of the image, so that the important information is kept as similar to the input image as possible while the size of the data representing the image is significantly reduced by compressing the remainder more lossily. By using the U-net segmentation neural network, the proposed method gains flexibility regarding the types of objects segmented in the images being compressed, as recognizing the segments in these images is a matter of training the U-net. Accordingly, the proposed training approach allows the different parts of the neural network to be trained separately. Moreover, the use of a single hybrid neural network can accelerate the computations required to reach the output, especially when these computations are parallelized [18, 19].

3. Materials and Methods

The method proposed in this study aims to compress medical images to reduce the storage space and bandwidth required to store and communicate them. However, given the importance of the information in the ROI, the proposed method uses different compression levels for the ROI and non-ROI regions, so that less information is lost in the ROI when the image is decompressed. Hence, the method requires a segmentation stage to extract the ROI and an autoencoder to compress the input image according to the detected segments. As shown in Figure 1, the input image is processed by two stacks of layers within the same neural network. One stack is the autoencoder part, responsible for the compression task, while the other is responsible for segmentation, i.e., extraction of the ROI. The segment produced by the segmentation part is appended to the input image before being processed by the autoencoder part, so that the autoencoder can recognize the significant pixels in the image. Hence, the output of the segmentation part must have the same dimensions as the input, which is the case with the U-net segmentation neural network. Two outputs are collected from the autoencoder: one is the output of the entire neural network with its two parts, compression and decompression, while the other is the output of the bottleneck layer, i.e., the compressed version of the image. According to the sizes of the filters used in the autoencoder part of the proposed neural network, the dimensions of the bottleneck layer's output are one-eighth of the input dimensions. The loss value is then computed from the outputs of the autoencoder, the segment, and the input image.

3.1. Structure of the Proposed Neural Network

As the input image is segmented into only two non-overlapping regions, i.e., each pixel belongs to exactly one region, the output of the segmentation neural network has a size of (w, h, 1). Each value in this output represents the probability that the pixel at the corresponding position belongs to the ROI; i.e., values of one are assigned to the positions of pixels in the ROI. U-net is a fully convolutional neural network that has shown significantly better segmentation performance on medical images than other CNN implementations [12, 20]; hence, it is used in the segmentation part of the proposed method. For the autoencoder part, the topology of the autoencoder proposed by Theis et al. [6] is used, which produces an array of (w/8 × h/8) values at the bottleneck layer. However, the size of the input, and hence output, layers of the autoencoder in the proposed neural network is set to 256 × 256 to handle larger images, which yields a 32 × 32 array at the bottleneck layer. Moreover, as Figure 1 shows, the input of the autoencoder contains both the original image and the segment produced by the U-net segmentation neural network; thus, the shape of the autoencoder's input is 256 × 256 × 2. Figure 2 summarizes the structure of the autoencoder part of the proposed hybrid neural network, showing the types of the layers in the model as well as the dimensions of each layer's input and output. The outputs of this neural network are collected from the output layer, which produces the decoded image, and from the bottleneck layer, which produces the compressed version of the image.
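The following sketch shows how such a hybrid network can be wired in Keras. It does not reproduce the exact topology of [6] or Figure 2; the filter counts, kernel sizes, and the reduced placeholder standing in for the full U-net are assumptions for illustration.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_unet(input_shape=(256, 256, 1)):
    # Placeholder standing in for a full U-net [12]: any network mapping
    # the image to a (256, 256, 1) per-pixel ROI probability map fits here.
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, padding='same', activation='relu')(inp)
    out = layers.Conv2D(1, 1, activation='sigmoid')(x)
    return Model(inp, out, name='unet')

image = layers.Input(shape=(256, 256, 1), name='image')
unet = build_unet()
mask = unet(image)  # ROI probability map, same spatial size as the image

# Append the segment to the image: the autoencoder input is 256 x 256 x 2.
x = layers.Concatenate(axis=-1)([image, mask])
x = layers.Conv2D(64, 5, strides=2, padding='same', activation='relu')(x)   # 128 x 128
x = layers.Conv2D(128, 5, strides=2, padding='same', activation='relu')(x)  # 64 x 64
bottleneck = layers.Conv2D(1, 5, strides=2, padding='same',
                           name='bottleneck')(x)                            # 32 x 32 code
x = layers.Conv2DTranspose(128, 5, strides=2, padding='same',
                           activation='relu', name='up1')(bottleneck)
x = layers.Conv2DTranspose(64, 5, strides=2, padding='same',
                           activation='relu', name='up2')(x)
decoded = layers.Conv2DTranspose(1, 5, strides=2, padding='same',
                                 activation='sigmoid', name='decoded')(x)

# Both the reconstruction and the bottleneck code are exposed as outputs.
hybrid = Model(inputs=image, outputs=[decoded, bottleneck, mask])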

3.2. Training the Artificial Neural Networks

In order to train the U-net for the required segmentation task, the segments are generated using the GBS method [17] and used as the ground-truth values required from the neural network. Despite the good performance of this segmentation method, using manually segmented images, i.e., more accurate training samples, could slightly improve the accuracy of the segmentation part of the proposed method. However, as the main purpose of the proposed method is compressing the images, the GBS segments are used to accelerate the training of the proposed neural network, by training the U-net part separately. After the training of the segmentation part is completed, it is wrapped by the other layers to achieve the topology shown in Figure 1. Freezing the weights of the U-net allows the entire neural network to be trained, i.e., its weights and biases updated, without changing those of the U-net part. Otherwise, the neural network would learn to shrink the ROI in order to achieve a lower data size, i.e., a lower loss. This loss is calculated based on the Structural Similarity Index (SSIM) between the image outputted by the autoencoder and the one inputted to the neural network, in addition to the size of the compressed data at the bottleneck layer. To provide more efficient compression, the values at the bottleneck layer are compressed using lossless Huffman entropy encoding. Reducing the number of unique values in the output of the bottleneck layer can significantly reduce the size of the output produced by the Huffman encoding. Hence, the number of unique values is used to represent the size of the data in the loss function; i.e., the neural network is trained to reduce the number of unique values outputted by the bottleneck layer. Thus, the loss function shown in (1) is used to calculate the loss value of the neural network:

$$\mathcal{L} = \alpha \left(1 - \mathrm{SSIM}(I_{\mathrm{ROI}}, O_{\mathrm{ROI}})\right) + \beta \left(1 - \mathrm{SSIM}(I_{\mathrm{NROI}}, O_{\mathrm{NROI}})\right) + \frac{U(B) - 1}{|B|} \tag{1}$$

where the SSIM of two images is computed as in (2):

$$\mathrm{SSIM}(I, O) = \frac{(2\mu_I \mu_O + C_1)(2\sigma_{IO} + C_2)}{(\mu_I^2 + \mu_O^2 + C_1)(\sigma_I^2 + \sigma_O^2 + C_2)} \tag{2}$$

where $I$ is the image inputted to the neural network, $O$ is the one outputted by the autoencoder, $\mu$ is the average grey value of an image, $\sigma^2$ is the variance, and $\sigma_{IO}$ is the covariance. The $C$ values are constants used to avoid division by zero, or the production of extremely large numbers, when low values occur in the denominator of the SSIM formula. The $\alpha$ and $\beta$ parameters adjust the importance of each region: $\alpha$ adjusts the effect of the ROI on the overall loss and $\beta$ that of the non-ROI (NROI), while $B$ represents the values produced by the bottleneck layer and $U(B)$ their number of unique values. Hence, the loss value approaches zero as the SSIMs approach one and the number of unique values outputted from the bottleneck layer approaches one. Moreover, to emphasize the importance of the ROI, the values of $\alpha$ and $\beta$ are set to 0.8 and 0.2, respectively, for the proposed method. Given the better performance of neural networks trained using the BP algorithm, this algorithm is used to train the proposed neural network.
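As a sketch of how such a loss might be implemented, the TensorFlow snippet below reuses the hybrid model from the earlier sketch. The masked SSIM terms only approximate region-wise SSIM, and the unique-value count $U(B)$ is not differentiable, so a stand-in penalty on the spread of bottleneck values is used here; these are assumptions, not the paper's exact implementation.

import tensorflow as tf

ALPHA, BETA = 0.8, 0.2  # region weights from the loss in (1)

def hybrid_loss(image, decoded, mask, bottleneck):
    # Region-wise SSIM, approximated by zeroing out the other region.
    roi_ssim = tf.reduce_mean(
        tf.image.ssim(image * mask, decoded * mask, max_val=1.0))
    nroi_ssim = tf.reduce_mean(
        tf.image.ssim(image * (1.0 - mask), decoded * (1.0 - mask), max_val=1.0))
    # Differentiable stand-in for the unique-value term U(B): a smaller
    # spread of bottleneck values yields fewer distinct values after
    # rounding, and hence a smaller Huffman-encoded output.
    code_penalty = tf.math.reduce_std(bottleneck)
    return ALPHA * (1.0 - roi_ssim) + BETA * (1.0 - nroi_ssim) + code_penalty

# The pretrained U-net is frozen so that training the hybrid network
# cannot shrink the ROI just to lower the loss.
unet.trainable = False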

When the training of the proposed neural network is completed, it is decomposed at the bottleneck layer into encoding and decoding, i.e., compressing and decompressing, networks. The bottleneck layer becomes the output of the compressing neural network, and the layer following it in Figure 2 becomes the input layer of the decompressing network, as shown in Figure 3. As the neural network is trained to output an image with the required characteristics, i.e., hybrid compression for the different regions of the image, and this output is computed by the decompression part from the output of the bottleneck layer, as shown in Figure 2, storing or transmitting the bottleneck layer's output is sufficient to reproduce the exact same output of the decompressing neural network, which is the main concept of autoencoders. Additionally, because the segmentation and autoencoder networks form a single hybrid neural network, both types of information are embedded in the output of the bottleneck. Hence, the decompression part of the autoencoder can accurately restore the original image.
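A sketch of this decomposition, assuming the hypothetical layer names used in the earlier sketches:

from tensorflow.keras import layers, Model

# Compressing network: image in, 32 x 32 bottleneck code out.
compressor = Model(inputs=hybrid.input,
                   outputs=hybrid.get_layer('bottleneck').output)

# Decompressing network: feed the stored/transmitted code through the
# trained layers that follow the bottleneck.
code = layers.Input(shape=(32, 32, 1))
x = code
for name in ('up1', 'up2', 'decoded'):
    x = hybrid.get_layer(name)(x)  # reuses the trained weights
decompressor = Model(code, x)

# Round trip: restored = decompressor.predict(compressor.predict(images))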

4. Performance Evaluation

The proposed medical image compression method is trained and evaluated using the CLEF MED 2009 dataset [21]. Following the evaluation setup used by Ayoobkhan et al. [7], a total of 1000 images are manually selected at random from 12 different categories among the 15K images in the dataset, as shown in Table 1. The remaining images are used to train the U-net and the proposed compression method. The models are implemented, trained, and evaluated using the Python programming language with the Keras deep learning library [22] running on a TensorFlow [23] backend.

The performance of the proposed method is evaluated based on the compression rate (CR), shown in (3), and the Peak Signal-to-Noise Ratio (PSNR), shown in (4), computed for the ROI, the NROI, and the entire image. In summary, a higher CR indicates that the amount of data produced by the compression technique is significantly lower than that of the original input, while a higher PSNR indicates that the decompressed image is more similar to the original, i.e., better image quality after compression.

$$\mathrm{CR} = \frac{\text{size of the original image}}{\text{size of the compressed data}} \tag{3}$$

$$\mathrm{PSNR} = 20 \log_{10}\left(\frac{\mathrm{MAX}_I}{\mathrm{RMSE}}\right) \tag{4}$$

where $\mathrm{MAX}_I$ is the largest value in the original image and $\mathrm{RMSE}$ is the root mean squared error between the original and decompressed images.
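For illustration, the following helper computes the two measures from (3) and (4); the function names and the optional boolean region mask are assumptions for this sketch.

import numpy as np

def compression_rate(original_bytes, compressed_bytes):
    # (3): ratio of original data size to compressed data size
    return original_bytes / compressed_bytes

def psnr(original, restored, mask=None):
    # (4): 20 * log10(MAX / RMSE), optionally restricted to the ROI or
    # NROI by a boolean mask of the same shape as the images.
    if mask is not None:
        original, restored = original[mask], restored[mask]
    diff = original.astype(np.float64) - restored.astype(np.float64)
    rmse = np.sqrt(np.mean(diff ** 2))
    return 20.0 * np.log10(float(original.max()) / rmse)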

First, the influence of the parameters α and β on the performance of the proposed method is evaluated by calculating the average CR and PSNR over all the images in the evaluation set. The results of this evaluation, illustrated in Figure 4, show that increasing the value of α, i.e., decreasing β, has a significant influence on the CR of the output of the proposed compression method and on the PSNR of the ROI, whereas the influence on the PSNR of the NROI is smaller. This behavior is due to the lower amount of information in the NROI, which can be represented with a smaller number of unique values. The large amount of information in the ROI, which requires many unique values to represent, is more sensitive to lossy compression and is the main contributor to the CR. Given the similar PSNR of the ROI and NROI when α is equal to 0.7 and 0.8, these levels of compression are selected for further investigation and comparison.

The averages of the PSNR and CR measures are calculated for the images of each category at α = 0.8 and summarized in Table 2.

As these results show, the proposed method significantly reduces the size of the data required to represent the information in a medical image. The results also show that the average CR is higher for the types of images with a larger NROI-to-ROI ratio, i.e., where the ROI occupies a smaller part of the image. Moreover, this compression is achieved while keeping the visual information of the image very similar to the original, even at an average CR of 9.68. Furthermore, despite the low overall distortion in the decompressed images, illustrated by the high PSNR values and visually in Figure 5, the distortion in the ROI is significantly lower than that in the NROI, with average PSNRs of 49.07 and 46.74 dB for the two regions, respectively.

The comparison shown in Table 2 also illustrates the superiority of the proposed method in terms of both CR and PSNR over the method proposed by Ayoobkhan et al. [7]. Although that method achieves a higher CR on the mammogram images, the quality of the images produced by the proposed method is significantly higher. The higher CR and lower PSNR indicate that the ROIs in these images are not detected accurately by that method, so more lossy compression is applied to them. Hence, the proposed method recognizes the ROI more accurately, owing to the use of the U-net segmentation neural network. Moreover, according to the comparisons in [7], the proposed method also performs significantly better than methods from earlier studies [24–26]. Additionally, despite the lower CPU frequency of the computer used to evaluate the proposed method, a 2.80 GHz Intel Core i7-7700HQ with a similar 16 GB of memory compared to the computer used by Ayoobkhan et al. [7], the presence of an Nvidia GTX 1050 GPU with 4 GB of memory reduced the time required to compress the chest image to only 308 ms, compared to the 67 s reported in [7]. This reduction is due to the GPU's ability to parallelize the computations of the neural network, unlike a CPU, which performs them serially, and to the use of a single hybrid neural network for the entire process instead of dividing it among multiple neural networks.

Additionally, the performance of the proposed method at α = 0.7 is summarized in Table 3 and compared to the performance of the two approaches used by Ahamed et al. [16]. Despite the slight reduction in the CR and in the PSNR of the NROI, the proposed method significantly increases the PSNR of the ROI in the compressed images, compared to its performance at α = 0.8. The PSNR values of the ROI in the compressed images produced by the proposed method are significantly higher than those of the BP-based approach proposed by Ahamed et al. [16], with average PSNRs of 54.27 and 53.33 dB, respectively. Additionally, the CR at α = 0.7 remains higher than that of the BP-based method, with values of 6.73 and 5.2, respectively. This compression rate is lower than that of the GS/PS method from Ahamed et al. [16]; however, comparing the performance of that approach to the performance of the proposed method in Table 2 shows that the proposed method achieves a significantly better average CR of 9.68, compared to 8.85 for the GS/PS method. This improvement in CR is accompanied by high average PSNRs, 49.07 dB for the ROI and 47.79 dB for the entire image, compared to 41.46 dB for the GS/PS method [16]. Hence, the proposed method outperforms both approaches by simply adjusting the value of α to balance the CR against the PSNR of the ROI.

5. Conclusions and Future Work

Given the importance of compressing medical images, a new method based on a convolutional neural network is proposed in this paper. The neural network used in the proposed method performs two main tasks: segmenting the image and compressing it. For the segmentation stage, the well-known U-net segmentation neural network is trained and used to detect the ROI of the image. Then, the original image and the segmentation information are forwarded to an autoencoder neural network, which uses a bottleneck hidden layer with one-eighth of the original dimensions of the image to compress the image's information into those dimensions. The proposed neural network is trained to reduce the number of unique values in the bottleneck layer, so that a smaller data size can be achieved using Huffman entropy encoding. Moreover, the loss function used to train the neural network relies on the SSIM values of the ROI and NROI, with more emphasis placed on the ROI. The proposed method is evaluated using the CLEF MED 2009 dataset and compared to existing state-of-the-art methods in the literature. The evaluation results show that the proposed method maintains most of the visual information in the image, especially in the ROI, while significantly reducing the size of the data required to store this information.

In future work, the ability to encrypt the images will be investigated, so that the original image cannot be restored without the encryption key, even if the decompression neural network is available. Such encryption can be achieved by providing the encryption key as an additional input to the compression neural network and training the autoencoder to depend on that key. Hence, the values in the bottleneck will differ when different encryption keys are used, so the original image cannot be retrieved unless the same values are provided to the decompressing network.

Data Availability

The data used to support the findings of this study are available from the CLEF MED 2009 dataset.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.