Abstract

In this paper, a new reversible data hiding (RDH) scheme for medical images based on code division multiplexing (CDM) and machine learning algorithms is proposed. The original medical image is first converted into the frequency domain with the integer-to-integer wavelet transform (IWT), and the secret data are then robustly embedded into the medium frequency subbands of the image with CDM and machine learning algorithms. Owing to the orthogonality of the different spreading sequences employed in CDM, the secret data can be embedded repeatedly while most elements of the spreading sequences cancel each other, so the proposed method obtains high data embedding capacity at low image distortion. Moreover, the to-be-embedded secret data are represented by different spreading sequences, and only a receiver who holds the same spreading sequences as the sender can extract the secret data and recover the original image completely, which effectively improves the security of the RDH. Experimental results show the feasibility of the proposed scheme for data embedding in medical images compared with other state-of-the-art methods.

1. Introduction

Recently, most hospitals have developed medical information management systems to provide better, safer, and more efficient service for patients. In such a system, the patient’s medical images are usually saved in DICOM format for future diagnosis, research, and long-term transmission, and the patient’s personal information and medical condition are usually saved along with them [1]. However, with the development of digital multimedia processing technology, illegal collection and modification of medical images have become easier and easier. Much of the sensitive information in these multimedia data is liable to be exposed to attackers in an open wireless network environment; therefore, multimedia such as medical images need to be well protected to assure their safety. Consequently, the protection of multimedia data is becoming more and more significant in cloud storage and cloud computing [2]. RDH is a special kind of method that embeds secret data into multimedia such that the cover can be losslessly reconstructed after the embedded data have been extracted. At present, traditional RDH methods for multimedia protection face many challenges, especially for medical images. As medical images are so sensitive, even a small change of pixels may have a huge impact on disease diagnosis; that is, any change to a medical image may do great harm to the doctor’s judgment. At the same time, with the development of artificial intelligence (AI), machine learning has been widely employed to protect the security of wireless multimedia data. Thus, a machine learning based RDH algorithm is highly desirable to guarantee the security of medical images in such situations [3].

Many reversible data hiding methods have been proposed in the past few years; they can be roughly classified into three categories: methods based on lossless compression, methods based on histogram shifting, and methods based on difference expansion. Lossless compression based RDH was first presented by Fridrich et al. [4, 5]. They embedded the secret data into the room vacated by compression in the least significant bit planes of the original image. Celik et al. [6] proposed a high performance scheme that enhanced the lossless compression efficiency through a prediction-based conditional entropy coder, and thus improved the capacity of lossless data embedding.

Ni et al. [7] first presented an efficient RDH method based on histogram shifting (HS). The peak point and the zero/minimum point of the histogram of the original image are employed to determine the histogram bins to be shifted by one position. The secret data are then embedded into the empty bins created by the shift. Since then, many RDH schemes have been proposed to improve the performance of HS-based RDH. Xuan et al. [8] enhanced the performance of the HS-based RDH method with the IWT algorithm; the secret data are embedded into the medium frequency subbands to achieve high data embedding capacity and imperceptibility. Fallahpour and Sedaaghi [9] separated the original image into blocks and applied the HS scheme to each one. In this scheme, the peaks and zeros (or minima) of the histogram are generated from each block, so the total count of the highest histogram bins over the whole image increases, and the data embedding capacity is improved at low image distortion. Xuan et al. [10] utilized histogram pairs of image prediction-errors for data embedding and introduced four thresholds to improve the embedding performance; they achieved better results than other methods, especially at low-to-moderate embedding capacity. In addition, Li et al. [11] proposed an RDH scheme based on two-dimensional difference histogram modification, by which the redundancy of the cover image is better exploited and high data embedding performance is achieved.

Difference expansion (DE) is another widely used RDH method, first presented by Tian [12]. In this scheme, the differences of adjacent pixel pairs are expanded and the secret data are embedded into the space created by the expansion. The largest data embedding capacity of Tian’s method is no more than 0.5 BPP (bits per pixel), and various DE-based RDH schemes have since been proposed to improve its performance. Thodi and Rodriguez [13] presented the first prediction-error expansion based RDH scheme, in which the prediction-errors of the object pixels are utilized for data embedding. Owing to the close correlation inherent in the neighbourhood of the object pixel, the distortion of the cover image is greatly reduced at high data embedding capacity. Sachnev et al. [14] improved the performance of the HS-based RDH scheme through prediction-error expansion and a sorting method. In their scheme, the location map is not necessarily needed even at large data embedding capacity, and thus the distortion of the cover image is reduced. Wang et al. [15] presented an efficient integer-to-integer transform based RDH method and demonstrated that Tian’s method can be reformulated as a special instance of the integer-to-integer transform. They then verified the superiority of their scheme over other traditional methods. Li et al. [16] suggested embedding secret information into selected pixels according to the local complexity of the cover image and adopting an adaptive prediction-error expansion method to achieve large data embedding capacity and low image distortion simultaneously. Lee et al. [17] presented a high capacity RDH scheme based on the IWT algorithm. The cover image is divided into nonoverlapping blocks and the secret data are embedded into the high frequency coefficients of each block; the scheme thus obtains large data embedding capacity at a lower level of image distortion. Fallahpour et al. [18] divided the medical image into tiles and embedded data into each tile with an HS-based RDH method. Their experimental results demonstrate that the data embedding capacity can be improved by 30%-200% while the distortion remains low. Furthermore, in [19], Coltuc and Chassery presented an RDH scheme of low mathematical complexity based on reversible contrast mapping. As it is an integer-to-integer transform applied to pixel pairs, this method does not require any additional lossless data compression. Ma et al. [20] combined code division multiple access (CDMA) and the IWT algorithm to improve the robustness of RDH. Owing to the orthogonality of the spreading sequences employed in the scheme, the method achieves high reversible data hiding performance, especially at large data embedding capacity.

As medical images play a vital role in disease analysis [21–24], they are very important and sensitive in disease diagnosis and treatment. Hence, the cover image needs to be completely recovered after the embedded data have been extracted, to guarantee the reliability and security of the medical image. Alqershi et al. [25] presented a hybrid RDH algorithm for medical images, in which the medical image is first separated into two categories: the region of interest (ROI) and the region of noninterest (RONI). The secret data are embedded into the ROI areas with a DE-based RDH scheme, while the additional information is embedded into the RONI through another data hiding algorithm. Agrawal et al. [26] introduced an RDH scheme based on the IWT and HS algorithms for data embedding in medical images and achieved better data hiding performance than other methods. However, the data embedding capacity for medical images still needs to be improved to conceal more of the patient’s private information; at the same time, the robustness and security of RDH in medical images, which would enhance the credibility of medical images in an open wireless network environment, have not yet been studied.

In this paper, a new scheme based on CDM and machine learning is proposed for RDH in medical images. In the process of data embedding, the original medical image is first converted into the frequency domain with the IWT algorithm; the secret data are then embedded into the medium frequency subbands of the image with CDM and machine learning algorithms, so that robust and secure reversible data embedding is obtained. At the same time, a small-sample neural network algorithm is employed to optimize the determination of the embedding coefficients, and thus high embedding performance is obtained. Owing to the orthogonality of the spreading sequences employed in the CDM algorithm, the elements of different spreading sequences are mutually canceled when the data are repeatedly embedded, which enables the marked image to keep low distortion even at high embedding capacity. In addition, as the secret data are represented by different spreading sequences for data embedding, only a receiver who has the same spreading sequences and the same embedding gain factor as the sender can reconstruct the secret data and the original image completely, which improves the security of the medical image. Consequently, the proposed scheme achieves both high data embedding capacity and security.

The rest of the paper is organized as follows. In Section 2, the CDM based RDH algorithm for medical images is described. In Section 3, the RDH scheme combining CDM and machine learning algorithms is provided. In Section 4, the experimental results are shown and analyzed. Section 5 draws the conclusions of the paper.

2. CDM Based Reversible Data Hiding for Medical Image

DICOM is a standard format for medical image exchange sponsored by the National Electrical Manufacturers Association (NEMA). As it integrates information from the manufacturers of imaging facilities and imaging information systems in a single file, the DICOM format is now widely used in medical image management, including image processing, storing, printing, and transmitting.

CDM is a wireless communication technique developed from spread spectrum communication. In a CDM based communication system, the to-be-transmitted signals are represented by different orthogonal spreading sequences and transmitted in the same channel without interfering with each other, which saves frequency resources. Similarly, an RDH system can be viewed as a communication system in which the secret data are the signals to be transmitted and the cover image is the communication channel.

2.1. CDM Based Reversible Data Hiding

In the CDM based RDH system, suppose $W = \{w_1, w_2, \ldots, w_k\}$, $w_i \in \{0, 1\}$, is the original secret data to be embedded. Each element of the secret data can be represented by an antipodal bit with the equation

$$b_i = \begin{cases} 1, & w_i = 1, \\ -1, & w_i = 0. \end{cases} \tag{1}$$

First, generate mutually orthogonal spreading sequences $S_1, S_2, \ldots, S_k$ from a standard Hadamard matrix. According to the character of the Hadamard matrix, the numbers of “1” and “-1” are equal in each spreading sequence (the all-ones first row is excluded); as the length of the orthogonal spreading sequence is even, the candidate spreading sequences are zero-mean and orthogonal to each other.

Let $I$ represent the original image with the size of $M \times N$; choose pixels of the image to form the original vector $X = (x_1, x_2, \ldots, x_l)$, where $l$ is the length of the vector (the same as the length of $S_i$). Then, the secret data can be embedded as

$$Y = X + \alpha \sum_{i=1}^{k} b_i S_i. \tag{2}$$

In (2), $k$ bits of secret data are embedded into vector $X$. Here, $k$ is the number of orthogonal spreading sequences which have been added repeatedly onto the original vector, and $\alpha$ is the gain factor of data embedding, which is always a positive integer. The bigger the value of $\alpha$ is, the higher the embedding strength of the proposed method is, and the more robust the data embedding is. Finally, the marked image is obtained from the vectors $Y$.
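The embedding rule above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`hadamard`, `spreading_sequences`, `embed`), the Sylvester construction of the Hadamard matrix, and the sample vector are hypothetical choices made here. The all-ones first row is skipped because it is not zero-mean.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def spreading_sequences(length):
    """Rows 1..length-1 of the Hadamard matrix: zero-mean and mutually orthogonal."""
    return hadamard(length)[1:]


def embed(x, bits, seqs, alpha=1):
    """Return x + alpha * sum_i b_i * S_i, with b_i = +1 for bit 1 and -1 for bit 0."""
    if len(bits) > len(seqs):
        raise ValueError("more bits than spreading sequences")
    y = list(x)
    for bit, s in zip(bits, seqs):
        b = 1 if bit == 1 else -1
        y = [yi + alpha * b * si for yi, si in zip(y, s)]
    return y


x = [120, 121, 119, 120]           # hypothetical vector of four coefficients
seqs = spreading_sequences(4)      # three usable sequences of length 4
y = embed(x, [1, 0, 1], seqs, alpha=1)
print(y)                           # [121, 118, 120, 121]
```

Three bits are carried by a single four-element vector here, which is the sense in which repeated embedding multiplies the capacity.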

It is also clear that the shorter the spreading sequence is, the more secret data can be embedded. Conversely, the longer the spreading sequence is, the more security is obtained. Moreover, as the spreading sequences are orthogonal to each other, most elements of the spreading sequences are mutually canceled when the data are embedded repeatedly into the object vectors, and thus less image distortion is introduced even at large data embedding capacity.

2.2. CDM Based Secret Data Extraction

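Suppose $I'$ is the marked image. Construct the marked vector $Y$ with the same method as in the process of data embedding and then calculate the cross correlation of vector $Y$ and spreading sequence $S_j$; the secret bit can be extracted as follows:

$$\langle Y, S_j \rangle = \langle X, S_j \rangle + \alpha \sum_{i=1}^{k} b_i \langle S_i, S_j \rangle, \tag{3}$$

where $X$ is the original vector, $\alpha$ is the gain factor, and $b_i$ is the embedded bit in antipodal form. As the spreading sequences are orthogonal to each other, (3) can be reduced to

$$\langle Y, S_j \rangle = \langle X, S_j \rangle + \alpha b_j \langle S_j, S_j \rangle, \tag{4}$$

where $\alpha$ is always a positive integer and $\langle S_j, S_j \rangle$ is always positive. Hence, the sign of the expression $\alpha b_j \langle S_j, S_j \rangle$ is determined by $b_j$. In the case of $|\langle X, S_j \rangle| < \alpha \langle S_j, S_j \rangle$, the embedded data can be extracted as

$$b_j = \operatorname{sign}\left(\langle Y, S_j \rangle\right). \tag{5}$$

Equation (5) shows that, under the condition that $\alpha \langle S_j, S_j \rangle$ is greater than $|\langle X, S_j \rangle|$, the value of $\operatorname{sign}(\langle Y, S_j \rangle)$ exactly equals the embedded bit $b_j$, and thus the secret data can be extracted completely. As $S_j$ is a zero-mean spreading sequence, the expression $\langle X, S_j \rangle$ amounts to summing differences of adjacent pairs of pixels. Therefore, if the elements of $X$ are similar, the magnitude of $\langle X, S_j \rangle$ is quite small, which enables more secret bits to be embedded into the original image with less image distortion. Moreover, as the secret bits are represented by different spreading sequences, only a receiver who knows the same spreading sequences as the sender can extract the secret data and recover the original image completely; the security of the proposed scheme is therefore greatly improved compared with traditional RDH methods.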
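The sign-of-correlation test can be illustrated with a short pure-Python sketch. The helper names (`correlate`, `is_embeddable`, `extract`) and the sample vectors are hypothetical and introduced only for this example; the sequences are rows 1 to 3 of a 4x4 Hadamard matrix. The second sample vector shows why smooth vectors matter: when the correlation of the original vector with a sequence is large, the sign test is no longer reliable.

```python
# Length-4 zero-mean orthogonal sequences (rows 1..3 of a 4x4 Hadamard matrix).
SEQS = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def correlate(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_embeddable(x, seqs, alpha=1):
    """True if |<x, s>| < alpha * <s, s> for every sequence, so the sign test is reliable."""
    return all(abs(correlate(x, s)) < alpha * correlate(s, s) for s in seqs)


def extract(y, seqs):
    """Recover one bit per sequence from the sign of <y, s>."""
    return [1 if correlate(y, s) > 0 else 0 for s in seqs]


smooth = [120, 121, 119, 120]         # hypothetical smooth vector
marked = [121, 118, 120, 121]         # smooth + S1 - S2 + S3 (bits 1, 0, 1 with alpha = 1)
print(is_embeddable(smooth, SEQS))    # True
print(extract(marked, SEQS))          # [1, 0, 1]

textured = [100, 200, 100, 200]       # hypothetical vector with large differences
print(is_embeddable(textured, SEQS))  # False
```

Vectors that fail the check would have to be skipped or flagged by side information such as a location map.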

In addition, as the proposed RDH scheme is realized with different orthogonal spreading sequences, the secret data can be embedded into the cover image repeatedly. The data embedding capacity is thereby multiplied and can be estimated with

$$C = k \cdot \frac{M \times N}{l} - C_a, \tag{6}$$

where $C$ denotes the embedding capacity, $k$ denotes the number of embedding levels, $M$ and $N$ are the numbers of rows and columns of the original image, $l$ denotes the length of the orthogonal spreading sequence, and $C_a$ represents the size of the additional message.
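As a rough numerical illustration of this estimate, the sketch below assumes the capacity takes the form "embedding levels times number of length-l vectors, minus the additional message". That form, the function name `embedding_capacity`, and its parameter names are assumptions made for this example, not a statement of the authors' exact formula.

```python
def embedding_capacity(rows, cols, seq_len, levels, overhead_bits=0):
    """Estimated payload in bits: one bit per vector per level, minus side information."""
    vectors = (rows * cols) // seq_len   # number of non-overlapping length-seq_len vectors
    return levels * vectors - overhead_bits


bits = embedding_capacity(512, 512, seq_len=4, levels=1)
print(bits, bits / (512 * 512))          # 65536 0.25
```

Under this assumption, doubling the number of levels doubles the payload, while doubling the sequence length halves it.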

2.3. CDM Based Original Image Recovery

After the secret data have been extracted from the marked image, according to the equations introduced above, the original image can be recovered completely with the formula

$$X = Y - \alpha \sum_{i=1}^{k} b_i S_i, \tag{7}$$

where $Y$ is the marked vector, $b_i$ are the extracted bits in antipodal form, $S_i$ are the spreading sequences, and $\alpha$ is the gain factor.

In sum, as the secret data are embedded with different spreading sequences and gain factors, a receiver who has the same spreading sequences and gain factor as the sender can extract the corresponding secret data and recover the original cover image exactly. At the same time, most elements of the different spreading sequences are mutually canceled in the process of repeated data embedding. Consequently, the proposed CDM based RDH scheme achieves both high data embedding capacity and security.
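Recovery is the exact inverse of embedding: the receiver subtracts the same weighted sequences that the sender added. The sketch below is a hypothetical pure-Python illustration (the name `recover` and the sample values are assumptions) and also shows that a receiver using the wrong gain factor does not obtain the original vector.

```python
# Length-4 zero-mean orthogonal sequences (rows 1..3 of a 4x4 Hadamard matrix).
SEQS = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def recover(y, bits, seqs, alpha=1):
    """Return y - alpha * sum_i b_i * S_i, with b_i = +1 for bit 1 and -1 for bit 0."""
    x = list(y)
    for bit, s in zip(bits, seqs):
        b = 1 if bit == 1 else -1
        x = [xi - alpha * b * si for xi, si in zip(x, s)]
    return x


marked = [121, 118, 120, 121]                       # carries bits 1, 0, 1 with alpha = 1
print(recover(marked, [1, 0, 1], SEQS, alpha=1))    # [120, 121, 119, 120]
print(recover(marked, [1, 0, 1], SEQS, alpha=2))    # [119, 124, 118, 119], wrong gain factor
```

Because all quantities are integers, no rounding occurs and the recovered vector matches the original exactly when the correct sequences and gain factor are used.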

2.4. Principle of Small-Sample Neural Network

As the small-sample neural network removes the dependence of conventional neural networks on large sample sets, it is an effective machine learning algorithm widely employed for parameter optimization in complex systems. The basic principle of the small-sample neural network is to find the optimal parameters for a complex system from a small number of samples, so that the parameters truly reflect the solution of the whole problem and blind selection of parameters is avoided.

In this paper, a two-layer small-sample neural network is employed to optimize the parameters of the proposed RDH system. In the first layer, the embedding capacity and the entropy are set as the network input, and the best length of the spreading sequence is the network output. In the second layer, the length of the spreading sequence, the Peak Signal to Noise Ratio (PSNR), and the structural similarity index (SSIM) are utilized as the input, and the gain factor is utilized as the output. The flow of the small-sample neural network algorithm is shown in Figure 1.

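The mathematical model of the proposed small-sample neural network is

$$l = f_1(H, C), \qquad \alpha = f_2(l, \mathrm{PSNR}, \mathrm{SSIM}), \tag{8}$$

where $H$ is the entropy of the original cover image, $C$ is the data embedding capacity, and PSNR and SSIM are those of the recovered image; $l$ and $\alpha$ are the length of the spreading sequence and the gain factor of data embedding, respectively, and $f_1$ and $f_2$ denote the mappings realized by the first and second layers. Part of the samples for the small-sample neural network is shown in Table 1.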

In our experiment, the sample data are first normalized and the maximum number of training samples is 1000; the training target error and the learning rate are set to 0.01 and 0.1, respectively, and the resulting training error is 0.001. The small-sample neural network is trained with pregenerated samples to optimize the coefficients for different data embedding conditions. The purpose of the training is to establish the nonlinear mapping between the parameters employed for reversible data embedding and the quality of the marked image. The results show that, for most medical images (with a data embedding capacity of no more than 5000 bits), the optimum length of the spreading sequence and the optimum gain factor are 4 and 1, respectively. The training results indicate the feasibility and effectiveness of the small-sample neural network for the proposed scheme.

3. Integer-to-Integer Wavelet Transform Based RDH

As medical images generally have large flat background areas, the IWT algorithm is quite suitable for transforming them, since most low frequency parts of the image can be filtered out. Generally, when the data are embedded into the medium frequency subbands of the image, a high quality marked image and robust data embedding can be obtained even at large embedding capacity. In addition, because medical images are highly sensitive and important, it is necessary to recover the medical image completely after the embedded data have been extracted. However, when the image is transformed with a conventional wavelet transform, the wavelet coefficients cannot be guaranteed to remain integers, and thus some embedded bits may be lost and the original image cannot be completely recovered when any floating point value is truncated. Therefore, the integer-to-integer wavelet transform is needed to guarantee the reversibility of RDH for medical images.

The IWT of an image is computed by applying (9) along the rows and then (10) along the columns.

Row transformation:

Column transformation:

where the subscripts represent the decomposition level of the coefficients, the column index, and the row index, respectively.
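To make the row-then-column structure and the integer reversibility concrete, the sketch below uses the S-transform (an integer Haar lifting step). This filter is a swapped-in assumption chosen for brevity and is not necessarily the one defined by (9) and (10); all function names are hypothetical.

```python
def s_forward(a, b):
    """Integer Haar step: detail h = a - b, approximation l = floor((a + b) / 2)."""
    h = a - b
    l = b + (h >> 1)          # >> floors for negative integers in Python
    return l, h


def s_inverse(l, h):
    """Exact inverse of s_forward."""
    b = l - (h >> 1)
    return b + h, b


def forward_1d(signal):
    """Transform an even-length list into [approximations..., details...]."""
    pairs = [s_forward(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    return [l for l, _ in pairs] + [h for _, h in pairs]


def inverse_1d(coeffs):
    half = len(coeffs) // 2
    out = []
    for l, h in zip(coeffs[:half], coeffs[half:]):
        out.extend(s_inverse(l, h))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d(img):
    """One decomposition level: rows first, then columns."""
    rows = [forward_1d(r) for r in img]
    return transpose([forward_1d(c) for c in transpose(rows)])


def inverse_2d(coeffs):
    """Undo the columns first, then the rows."""
    cols = [inverse_1d(c) for c in transpose(coeffs)]
    return [inverse_1d(r) for r in transpose(cols)]


img = [[12, 14, 200, 202], [13, 15, 198, 201], [0, 0, 5, 7], [1, 0, 6, 9]]
coeffs = forward_2d(img)
print(inverse_2d(coeffs) == img)   # True
```

Every coefficient stays an integer, so no information is lost to truncation and the inverse reproduces the input exactly; the top-left quadrant of `coeffs` holds the low frequency approximation.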

In the experiment, the image is decomposed into four subbands at the first level: the low frequency subband (LL), the medium frequency subbands (HL and LH), and the high frequency subband (HH). Figure 2 shows the subbands after the IWT of a medical image. As medical images have large flat background areas, the LL subband contains much of the image information, and high visual distortion would be introduced if this subband were employed for data embedding. Therefore, the data are preferably embedded in the HL and LH subbands to improve the robustness of data embedding and reduce the image distortion. In the process of data embedding, the length of the spreading sequence and the data embedding strength are determined with the small-sample neural network for reversible data embedding and extraction.

When the length of the spreading sequence is set to 4, the maximum data embedding capacity is 0.125 BPP when the LH and HL subbands are used for one-time data embedding. At the same time, owing to the orthogonality of the spreading sequences, the secret data can be embedded repeatedly in the same subband without interfering with each other. Therefore, the data embedding capacity for medical images is greatly improved, which ensures that the capacity of the proposed method is sufficient for hiding the patient’s private information. On the other hand, as most elements in the LH and HL subbands are modified for data hiding in the proposed scheme, an effect similar to histogram equalization tends to be achieved and the contrast of the cover image is improved; hence, the visual quality of the marked image can be enhanced with the CDM based RDH scheme.
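As a quick check of the 0.125 BPP figure, assume an image of $M \times N$ pixels whose one-level IWT yields LH and HL subbands of $\frac{M}{2} \times \frac{N}{2}$ coefficients each, and one secret bit per length-4 vector per embedding pass (these symbols are introduced only for this check):

```latex
\frac{1}{MN} \cdot \frac{2 \cdot \frac{M}{2} \cdot \frac{N}{2}}{4}
  = \frac{1}{MN} \cdot \frac{MN}{8}
  = 0.125\ \text{BPP}
```

For a 512×512 image this corresponds to 32768 bits per embedding pass, before any overhead such as a location map is subtracted.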

In the process of reversible data embedding, at the sender side, the IWT is first applied to the original cover image; then the CDM and machine learning based RDH is employed to embed secret bits into the medium frequency subbands of the medical image; finally, the inverse IWT is adopted to obtain the marked image. The process of data hiding is shown in Figure 3(a). The outline of our proposed RDH scheme in the wavelet domain is as follows:
(1) Segment the background and foreground of the medical image with the Sobel operator, remove the segmented background of the original image, and obtain the region of interest (ROI) of the medical image for further processing.
(2) Apply the IWT to the ROI of the image to obtain the low frequency subband LL, the medium frequency subbands HL and LH, and the high frequency subband HH.
(3) Utilize the CDM and machine learning based RDH for data embedding in the medium frequency subbands HL and LH.
(4) Construct the marked image by applying the inverse IWT to the medical image.

At the receiver side, the whole process of data extraction is shown in Figure 3(b); the steps of data extraction in the wavelet domain can be described briefly as follows:
(1) Convert the marked image into the frequency domain with the IWT algorithm.
(2) According to the features of CDM based RDH, extract the embedded data from the LH and HL subbands of the marked image; the process is the inverse of data embedding.
(3) Restore the marked image to its original state without distortion.

4. Experimental Results and Discussion

In the experiments, six gray scale medical images in DICOM format, all of size 512×512 and obtained from the database of The Cancer Imaging Archive (TCIA), are employed to evaluate the proposed RDH scheme. The secret data are a random binary sequence containing only the elements “0” and “1”; meanwhile, a location map is utilized to mark the locations where the secret data are embedded, and it can usually be compressed to a very small size. The medical images chosen from the TCIA database are shown in Figure 4.

4.1. Results Evaluation with PSNR Indicator

For reversible data hiding techniques, PSNR is generally utilized to measure the distortion between the marked image and the original one. A higher PSNR value indicates that the marked image has lower distortion and thus better visual quality.

The PSNR between the marked and original images can be obtained with the following equations:

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \tag{11}$$

$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i, j) - I'(i, j) \right)^2, \tag{12}$$

where $I$ denotes the cover image with the size of $M \times N$ and $I'$ is the marked image.
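The PSNR computation can be sketched in pure Python as follows. The function names and the default peak value of 255 (8-bit gray scale) are assumptions made for this illustration; DICOM images with a larger bit depth would need a different peak.

```python
import math


def mse(cover, marked):
    """Mean squared error between two equally sized 2-D lists of pixel values."""
    total = 0
    count = 0
    for row_c, row_m in zip(cover, marked):
        for c, m in zip(row_c, row_m):
            total += (c - m) ** 2
            count += 1
    return total / count


def psnr(cover, marked, peak=255):
    """PSNR in dB; returns infinity when the two images are identical."""
    err = mse(cover, marked)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


cover = [[120, 121], [119, 120]]
marked = [[121, 120], [120, 119]]      # every pixel changed by exactly 1, so MSE = 1
print(round(psnr(cover, marked), 2))   # 48.13
```

An MSE of 1 on an 8-bit scale gives about 48.13 dB, so PSNR values above 52 dB correspond to an MSE well below 1.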

Figure 5 shows the BPP-PSNR curves of the medical images (a)–(f) after data embedding. The results in Figure 5 demonstrate the superiority of the proposed scheme. When the data embedding capacity is 0.125 BPP, the PSNR of every marked image is still above 52 dB, so the proposed scheme is sufficient for RDH in medical images. Meanwhile, the results also demonstrate that images with large ROI areas achieve higher PSNR than those with large RONI areas at the same embedding capacity. For instance, image (a) has the largest ROI area of the six images, and thus it achieves higher PSNR than the others at the same data embedding capacity.

In sum, the scheme proposed in this paper achieves excellent visual quality even after high capacity RDH. Moreover, as spreading sequences are employed to embed the secret bits, the secret data and the original image can only be recovered completely by a receiver who has the same gain factor and spreading sequences as the sender; thus, the security of the cover image is guaranteed and the patient’s personal information is protected.

4.2. Results Evaluation with SSIM Indicator

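SSIM is another widely used indicator for evaluating the performance of an RDH scheme. Here, we further employ SSIM to measure the performance of the proposed scheme. The formula of SSIM is

$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \tag{13}$$

where $\mu_x$ and $\mu_y$ are the average values of $x$ and $y$, respectively, $\sigma_x^2$ is the variance of $x$, $\sigma_y^2$ is the variance of $y$, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $C_1$ and $C_2$ are small constants that stabilize the division.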
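The SSIM statistics can be illustrated with a simplified pure-Python sketch that evaluates the formula once over the whole image. Standard SSIM implementations compute it in local sliding windows and average the results, so this global version is only an approximation for illustration; the function name and the constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$ are the conventional defaults, assumed here rather than taken from the paper.

```python
def ssim_global(x, y, peak=255, k1=0.01, k2=0.03):
    """Single-window SSIM over two equally sized 2-D lists (needs at least two pixels)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Unbiased (n - 1) estimates of the variances and the covariance.
    var_x = sum((a - mu_x) ** 2 for a in xs) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in ys) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / (n - 1)
    c1 = (k1 * peak) ** 2
    c2 = (k2 * peak) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


cover = [[120, 121, 119, 120], [10, 12, 11, 13]]
marked = [[121, 118, 120, 121], [11, 9, 12, 14]]
print(ssim_global(cover, cover))    # equals 1 up to floating-point rounding
print(ssim_global(cover, marked))   # slightly below 1
```

Identical images give a value of 1, and the value decreases as the marked image departs from the cover in mean, contrast, or structure.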

The experimental results on the six medical images from the TCIA image database are shown in Figure 6.

As shown in Figure 6, the marked image is very similar to the original one. The SSIM of the medical images drops slowly as the data embedding capacity increases; at the same time, the SSIM of marked images with large ROI areas is clearly superior to that of images with large RONI areas. When the embedding capacity is 0.1 BPP, the SSIM of image (a) is 0.994, while the value is 0.991 for image (e) at the same embedding capacity. Moreover, as image (e) has more RONI areas than other images (such as image (a)), the SSIM of image (e) drops faster than that of images with more ROI areas. The experimental results show that the scheme proposed in this paper achieves high data embedding capacity and security at low image distortion, which is sufficient for the protection of medical images and the patient’s privacy.

5. Conclusions

This paper presents a novel RDH scheme based on CDM and machine learning algorithms for medical images. In the proposed scheme, the IWT algorithm is applied to the medical image to convert it into the wavelet domain; then the secret data are embedded into the medium frequency subbands with the CDM and machine learning algorithms. Owing to the orthogonality of the spreading sequences employed for data embedding, the secret data can be embedded into the same subband repeatedly, and most elements of the different spreading sequences are mutually canceled. Therefore, the data embedding capacity is improved and the image distortion is restrained at the same time. Moreover, the secret data and the original image can only be recovered completely by a receiver who has the same spreading sequences and embedding gain factor as the sender, which improves the security of the proposed RDH system as well. In the scheme, a small-sample neural network is also employed to optimize the data embedding coefficients, by which the performance of the proposed scheme is effectively improved. The experimental results show that the proposed scheme achieves high performance even at large data embedding capacity for medical images, which indicates that the scheme is promising for the protection of medical images and the patient’s privacy.

Data Availability

The DICOM-format images used to support the findings of this study are available at the website http://www.cancerimagingarchive.net/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The research reported in this paper was partially supported by the National Natural Science Foundation of China (nos. 61802212, 61872203, and 61502241) and the Project of Shandong Province Higher Educational Science and Technology Program (J18KA331).