Abstract

We propose a new markerless tracking technique of lung tumor motion by using an X-ray fluoroscopic image sequence for real-time image-guided radiation therapy (IGRT). A core innovation of the new technique is to extract a moving tumor intensity component from the fluoroscopic image intensity. The fluoroscopic intensity is the superimposition of intensity components of all the structures passed through by the X-ray. The tumor can then be extracted by decomposing the fluoroscopic intensity into the tumor intensity component and the others. The decomposition problem for more than two structures is ill posed, but it can be transformed into a well-posed one by temporally accumulating constraints that must be satisfied by the decomposed moving tumor component and the rest of the intensity components. The extracted tumor image can then be used to achieve accurate tumor motion tracking without implanted markers that are widely used in the current tracking techniques. The performance evaluation showed that the extraction error was sufficiently small and the extracted tumor tracking achieved a high and sufficient accuracy less than 1 mm for clinical datasets. These results clearly demonstrate the usefulness of the proposed method for markerless tumor motion tracking.

1. Introduction

In radiation therapy, to irradiate sufficient dose to tumors and avoid unnecessary dose to the surrounding healthy tissues are crucial to achieve significant treatment effects and reduce adverse effects. Stereotactic body radiation therapy (SBRT) can satisfy such clinical demand for accurate isocenter positioning to the static center of the target tumor volume [1]. Intrafractional tumor motion can, however, badly affect the accuracy of the irradiating position and additional margins should thus be designed to account for such geometric uncertainties [2, 3]. Inevitably, the larger margins cover the wider regions of surrounding healthy tissues. In this sense, motion management is necessary for effective treatment, especially for abdominal and thoracic tumors [24]. Indeed, such tumors can move several centimeter due to mostly respiratory and cardiac motions [5, 6].

To achieve highly accurate irradiation to moving tumors, tumor tracking to measure or monitor the motion can be an ideal direction. Image-guided techniques to capture the tumor motion [711] have thus been developed for the tumor tracking. A kV X-ray fluoroscopy is widely used for such image-guided radiation therapy (IGRT) because of its capability of direct position measurement of a target tumor inside the patient's body. However, image quality of the fluoroscopy may not be sufficient for accurate measurement of the tumor position, and thus fiducial gold markers, which form sufficient contrast to the surroundings on fluoroscopic images, are often implanted into or near the tumor [7, 12]. The markers position can be measured accurately and regarded as a good fiducial of the tumor position. Although implanted markers are very effective for the accurate monitoring, the implantation of markers is an additional burden in clinical routine [13]. Furthermore, for lung tumor cases, it is a serious problem that the implantation itself runs the risk of pneumothorax as high as 30% of such patients [14].

On the other hand, markerless techniques are fundamentally free from the risk and have thus been developed for “safer” tracking [1520]. Among them, Berbeco et al. [15] and Mostafavi [16] developed a filter to enhance tumor contrast with the surrounding tissues by averaging tumor images at the same respiratory phase over several periods. This technique assumes that the geometric relation between the tumor and the other structures such as bones, blood vessels, and other tissues is the same for anytime at the same respiratory phase. However, the relation can change, and thus the technique often fails to measure the tumor position accurately. Meyer et al. [17] and Wilbert et al. [18] have evaluated a conventional template matching technique for tumor tracking on megavoltage portal images. They compared the tracking performance of the matching technique with several objective functions to be minimized. The technique is based on the nonfluoroscopic image assuming that target image intensities can directly be defined by observing the target itself and may not be affected by the other structures intensity. In contrast, the fluoroscopic intensities of pixels at which the tumor may be located are dependent on not only the tumor itself, but also the other structures passed through by the X-ray [21]. Due to the fluoroscopic characteristics, only insufficient measurements of the tumor motion can be achieved by such conventional techniques.

In this paper, we propose a new markerless technique for accurate measurement of the target position by decomposing the fluoroscopic pixel intensity, which is the sum of intensity components of the tumor and the other structures passed through by the X-ray, into the target intensity component and the others. In other words, the proposed technique can extract the intensity component of the target tumor from the fluoroscopic intensity. The extracted tumor intensity component is independent of the other intensity components and the extracted tumor has sufficient contrast with the surrounding area. Consequently, it can be used to measure the position accurately. The performance evaluation of the technique is conducted by using both phantom and clinical data.

The rest of the paper is organized as follows. In Section 2, a new method for extracting tumor intensity component is proposed for the markerless tracking. Performance analysis to evaluate the markerless technique is given in Section 3. Concluding remarks are described in Section 4.

2. Methods

2.1. Concept of Dynamic Decomposition

Let us consider an matrix of a digital fluoroscopic image with pixel intensity at location , where and . The observed fluoroscopic intensity is the result of superimposition of a tumor intensity component and the background intensity that is the sum of all the other components of the rest of the structures passed through the X-rays. As shown in Figure 1, the extraction can then be represented by subtracting the background intensity from the observed fluoroscopic one

Note that the extraction of the tumor component from the fluoroscopic image is generally ill posed. In fact, (1) is an indefinite equation because not only the tumor image but also the background is unknown. Thus, (1) does not have a unique solution of the tumor image . Indeed, it is often very difficult for one to recognize a tumor in an X-ray fluoroscopic image. On the other hand, radiotherapists and radiologists can recognize the shape and image intensity component of the moving tumor on a fluoroscopic image sequence. This suggests that there is a mechanism to extract the shape and intensity component of the tumor not from each fluoroscopic image, but a sequence of them. In the following, we will formulate such extraction mechanism of the proposed dynamic decomposition.

It might be worth to mention that the spatial segmentation technique finds pixels or locations belonging to the target, while the intensity of a pixel may belong to more than two structures in a fluoroscopic image. Decomposition aims to extract the intensity component of the target structure from the observed fluoroscopic intensity. This is a main difference between the conventional segmentation and the proposed dynamic decomposition for tumor extraction.

Let us suppose that, for simplicity, tumor or background motion is a translation and consider each pixel of the extracted tumor component at a reference location with intensity that will move to a new location with intensity at time . In this case, the extracted tumor image at time is written by the reference image as where denotes the displacement vector at time .

Equation (2) implies that an extracted tumor image of any frame (time) of the image sequence, , can be represented by the reference and the displacement vector at each time . The tumor image extraction problem can then be solved by finding the reference image common for all the frames and each displacement .

For the background image at time with intensity , the same condition using the reference image at the reference position with intensity can also be applied as where denotes the background displacement. Similarly, the reference image is common for any frames.

If the displacements are known, we may formulate simultaneous equations consisting of a set of ill-posed equations of (1) for several frames. Such simultaneous equations accumulate temporal image constrains that must be satisfied by the extracted tumor, the background, and the observed fluoroscopic intensities. Ideally, if the accumulated constrains are sufficient, the simultaneous equations will be solved and the tumor image can be extracted. However, the number of frames required for regularization of the simultaneous equations is depended on the tumor and background motions and the other image configuration, and thus it is unknown in general. Then, we estimate the tumor image and optimize the estimation recursively, instead of solving the equations explicitly. In addition, the displacements are unknown in general, and thus we need to estimate them as well.

2.2. Dynamic Decomposition
2.2.1. Displacement Estimation

The displacement of the tumor in (2) can be measured by using a template matching technique [17]. Obviously, an ideal template for the tumor is the extracted tumor reference that is unknown. Here the current estimation of the reference can be used as an estimation of the template. Then, the displacement can be estimated by matching the template with the tumor image estimate .

From (1), an estimation of extracted tumor intensity matrix at time can be given by subtracting the current estimate of the background image matrix from the fluoroscopic image matrix as where denotes the estimation error matrix at time and The estimate of the background displacement can be given by matching the current background reference with the fluoroscopic image . This matching would be more accurate if the size of region of interest (ROI) is sufficiently large compared with the tumor size for ignoring a tumor effect on this background matching.

2.2.2. Estimation of Tumor Intensity Component

A quadratic objective function to be minimized at time , , is given as where and denote trace and transposition of a matrix , respectively. For simplicity, a sequential steepest descent method will be formulated for finding the best estimation, but many optimization methods can be applied to minimize various objective functions.

In the steepest descent method, the change of the tumor reference estimation , , to minimize is given as where is a step size of the optimization. The update of the variable is then represented as where is an iteration number. The background reference estimation can be updated by the same manner, but in this paper it is updated by using the updated estimate of the tumor image , so that the error is kept to be 0.

Finally, the algorithm is summarized as follows.

Step 1. Initialize and (see (12) and (13)); .

Step 2. Estimate the displacement by matching with and then estimate by matching with in (5); .

Step 3. Calculate the error in (4).

Step 4. Update and by (9) and (10), respectively (tumor intensity estimation).

Step 5. . Go back to Step 3 if , where ( in this paper) is the maximum number of optimizing iterations; otherwise go to next Step.

Step 6. Estimate the motion as the center of mass of the binarized image of (see Figure 5).

Step 7. . Go back to Step 2 if , where is the maximum number of frames; otherwise stop.

3. Experimental Results

We have evaluated performance of the proposed method by using phantom and clinical data sets.

The extraction performance was evaluated only for the phantom data because the ground truth of the extracted tumor image for clinical data is unknown. On the other hand, if the extraction is accurate, then the tracking is also accurate. Thus, the tracking performance can be a good index of the extraction performance as well.

For phantom motion tracking evaluation, the following error of Euclidean distance between the measured displacement and the ground truth   is calculated by averaging the distances over frames of the fluoroscopic image sequence: For clinical data evaluation, three radiologists and medical physicists manually contoured the tumor image, and then three centers of mass of the contoured image were averaged and used as the ground truth of the tumor position . For comparison purpose, we will also show motion tracking results by the same template matching technique [17] without the proposed tumor image extraction.

3.1. Phantom Data Case
3.1.1. Phantom Data

A chest phantom fluoroscopic image without any tumor was taken first, and it was fluoroscopically superimposed by a moving phantom tumor image, which created frames of size pixels with spatial resolution 0.26 mm/pixel. The phantom tumor image used in this experiment is shown as Tumor   (left) in Figure 1. Motion of fiducial markers implanted into a lung cancer patient was used as the phantom tumor motion, which was measured every 0.033 s by using the real-time tumor-tracking (RTRT) system at Hokkaido University Hospital [7].

Figure 2 shows ten phantom frames, , randomly chosen from the total hundred frames. The images shown are cropped to pixels around the center of the original size. The left-upper image in Figure 2 shows the initial phantom fluoroscopic image, and the tumor location in the initial image is considered as the reference position with zero displacement .

3.1.2. Tumor Image Extraction

In this experiment, the tumor outline was initialized manually on the first frame , the left-upper image in Figure 2. Intensities inside the outline were initialized as a constant value . Then the reference image was initialized as where denotes the region inside the initial outline. The background reference was initialized by using the initial observation of the fluoroscopy subtracted by the initial tumor image as

Figure 3 shows extracted images of the moving phantom tumor, , corresponding to the ten frames of phantom images in Figure 2. As seen in Figure 3, the extracted tumor image was initially different from the ground truth shown in Figure 1 (left), but approaching to the truth gradually. Indeed, an extraction error decreased gradually as shown in Figure 4. The error is defined by using the error between the extracted tumor image and the ground truth as where .

Figure 5 shows a comparison between the truth and the extraction results. In this figure, images were binarized for visibility. The initial shape of the tumor shown in Figure 5(b) was obviously bigger than the truth in Figure 5(a). Nevertheless, the extracted tumor intensity component shown in Figure 5(c) seems sufficiently similar to the truth and the error , reflecting the extraction error of the different shape and 2 line motion traces seen in Figure 3, was negligible for accurate tumor tracking (see Section 3.1.3). We may thus conclude that the tumor image can be extracted from the X-ray fluoroscopic image sequence.

3.1.3. Motion Tracking

By using the binarized extracted tumor, the average error of the motion measurement in (11) converged to zero, implying that the tumor motion can be measured within the spatial resolution, for example, 0.26 mm/pixel in this example. On the other hand, the error was more than  mm by using the same matching technique [17] without the proposed tumor extraction due to the fluoroscopic characteristics especially mismatching with the background structure. The comparison clearly demonstrates effectiveness of tumor extraction from the fluoroscopic images.

3.2. Clinical Data Cases

We also applied the proposed method to three clinical datasets of fluoroscopic image sequences.

3.2.1. Clinical Data

Fluoroscopic images of size pixels with spatial resolution 0.42 mm/pixel were taken by the X-ray simulator system (Ximatron CX, Varian Medical Systems, Palo Alto, CA) at Tohoku University Hospital. The sampling interval of the image observation was every 0.5 s (i.e., images/s). The number of image frames was for each case. The less sampling frequency and smaller number of images make it more difficult for the proposed method to extract the tumor image because the number of accumulated constrains described in Section 2.1 is less than the phantom case. However, the number of frames can be larger or even equal to that of phantom case without clinical difficulty before the therapeutic fraction. The peak-to-peak displacements in craniocaudal direction were 9.42 mm, 5.88 mm, and 7.14 mm for cases 1, 2, and 3, respectively.

3.2.2. Tumor Image Extraction

Figures 6 and 7 show a fluoroscopic sequence of frames chosen from tested frames of the clinical case 1 and the corresponding frames of extracted tumor images, respectively. As seen in Figure 6, clinical images have different characteristics from the phantom case and can also be different from the model supposed in Section 2.1 These include unclearness of tumor contour, low contrast, noisiness, and changes of intensities possibly caused by motion of blood vessels, cardiac motion, and changes of exposure time. Although the characteristics may badly affect the extraction accuracy, such as blurred and noisy extraction, the result demonstrates that extracted tumors can be recognized clearer than those in the original fluoroscopic images.

3.2.3. Motion Tracking

Table 1 summarizes tumor motion tracking performance for three clinical cases. First, by using the proposed extraction, the motion tracking errors for all cases are less than the conventional method without extraction. For example, the average with standard deviation of the tracking error for the three cases by the proposed method is  mm and less than  mm by the conventional method.

Second, the fact that the error is also less than a minimum clinical requirement within  mm may be more important. It might be worth to mention that even though the extraction image was blurred and noisy as shown in Figure 7, the method can still achieve a good tracking performance. These clearly demonstrate the clinical usefulness of the extraction for the markerless tumor motion tracking with sufficient accuracy.

3.3. Discussions

In the experiments, we manually initialized the tumor outline and intensities. Indeed, the radiologists can easily draw a rough outline on the fluoroscopy. Note that even started from a rough initial estimation of the outline and intensities, the proposed extraction can achieve a good tracking performance within 1 mm accuracy for both phantom and clinical cases. On the other hand, the more accurate initial estimation gives the more accurate extraction, especially for clinical cases. Such accurate outlines are available by using the X-ray CT image or digitally reconstructed radiography (DRR). Thus, the more accurate tracking can be achieved by the proposed extraction with the more accurate initial outline.

As mentioned earlier, better optimization techniques can improve extraction performance. The objective function of the sequential optimization formulated in this paper involves only one frame constrain and is good for real-time computation during treatment. On the other hand, constrains from more than 2 frames can simultaneously be incorporated into an objective function of a batch optimization, such as . This is good for offline computation and can provide better intensity components before treatment. In this case, larger iterations may be chosen. For further improvement, many tracking techniques other than the simple template matching [17] can also be incorporated into the proposed method. In fact, the phase only correlation [2225] for low contrast cases and the particle filters [2628] for noisy and stochastic deformation can be applied to the extracted tumor image.

Although the proposed technique can achieve real-time tracking, the current radiotherapy machine may have latency to control the irradiation position. In this case, motion prediction based on the real-time tracking can be used and such prediction methods have been proposed [4].

The number of clinical data used is not good enough and the image quality of clinical data is different from that of the phantom case, but this is because no new or extra data were taken other than from normal planning routine to avoid any extra radiation dose for the developmental phase. However, a large number of clinical data with the same image quality of the phantom case will be tested for the evaluation phase of the proposed method.

4. Conclusions

We have developed the dynamic decomposition method to extract the moving lung tumor image component from kV X-ray fluoroscopic images and applied it to the tumor tracking. The tracking does not require any fiducial markers implanted into the tumor and thus is fundamentally free from the risk of implantation troubles. Sufficiently high accuracy of the extraction and motion tracking has been demonstrated by using both phantom and clinical datasets. The results suggested that the proposed method is an ideal solution for the implantation risk and can achieve a low-risk and highly accurate tumor motion tracking for the real-time IGRT.

Acknowledgments

Authors would like to thank Dr. Shirato and his colleague at Hokkaido University Hospital for sharing the clinical time series data of human lung motion with them. A part of this work was supported by the A-STEP grants from the Japan Science and Technology Agency.