Abstract

For augmented fluoroscopy during cardiac ablation, a preoperatively acquired 3D model of a patient’s left atrium (LA) can be registered to X-ray images recorded during a contrast agent (CA) injection. An automatic registration method that works also for small amounts of CA is desired. We propose two similarity measures: The first focuses on edges of the patient anatomy. The second computes a contrast agent distribution estimate (CADE) inside the 3D model and rates its consistency with the CA as seen in biplane fluoroscopic images. Moreover, temporal filtering on the obtained registration results of a sequence is applied using a Markov chain framework. Evaluation was performed on 11 well-contrasted clinical angiographic sequences and 10 additional sequences with less CA. For well-contrasted sequences, the error for all 73 frames was 7.9 6.3 mm and it dropped to 4.6 4.0 mm when registering to an automatically selected, well enhanced frame in each sequence. Temporal filtering reduced the error for all frames from 7.9 6.3 mm to 5.7 4.6 mm. The error was typically higher if less CA was used. A combination of both similarity measures outperforms a previously proposed similarity measure. The mean accuracy for well contrasted sequences is in the range of other proposed manual registration methods.

1. Introduction

Atrial fibrillation is the most common heart arrhythmia affecting around 2.2 million people in the USA. A possible treatment option is catheter ablation, which is a minimally invasive procedure. Depending on the ablation device used, it is carried out using either electroanatomic mapping systems, an X-ray guided approach, or a combination of both. X-ray guidance is, for example, required if a cryoballoon is used for ablation as current cryoballoons cannot be directly located by electroanatomic mapping systems. In this paper, we focus on X-ray guided approaches. One weakness of X-ray imaging is poor soft-tissue contrast. As a consequence, the left atrium (LA) can only be seen if contrast agent (CA) is injected. However, to reduce the risk of contrast-induced nephropathy, physicians try to keep the use of CA to a minimum often highlighting only a part of the left atrium. To provide orientation to the physician when no CA is present, a model of the patient’s LA, for example, generated by a preoperative CT or MRI scan of the patient can be overlaid [1]. The use of such an overlay was found to reduce procedure time and fluoroscopy time significantly [2]. Since a preprocedural 3D scan is often acquired to obtain prior knowledge about a patient’s anatomy of the left atrium and to rule out unusual pulmonary vein configurations, it is readily available for augmented fluoroscopy applications. In general, however, the coordinate systems of the preprocedurally acquired 3D heart model and the patient during the intervention differ, and a registration step is usually needed. Today, this registration is usually carried out manually. If the 3D heart model was acquired using CT, several authors proposed to use further landmarks in addition to the LA for registration having obtained associated 3D models by segmentation. Examples are the carina [3], the coronary sinus (CS) [4, 5], the spine [6], or a combination of spine and heart anatomy [7]. For 3D volumes acquired by MRI, for example, MRI angiographies (MRAs), however, a segmentation of the CS or the carina can be very challenging or has not yet been adopted for routine clinical procedures. Fortunately, registration based on contrast injection has been shown to be a fast and accurate alternative [8]. A good moment to obtain contrast-enhanced X-ray images is after the transseptal puncture, in particular if physicians use contrast injections to verify puncture success [9]. Example images of such injections are shown in Figure 1 to illustrate the images available. Since manual registration either needs the attention of the treating physician or requires a trained assistant, more automatic registration methods are desirable.

(1) Related Work. There is a significant body of research on registration of 3D objects to 2D fluoroscopic images, for example, for bones [1012], implants, joints [13], or vessels [14]. An overview is given by Markelj et al. [15]. Compared to implants, a registration of the LA is more complicated for at least two reasons: First, for implants and bones, all parts of the object are visible. During a contrast injection, however, only a part of the left atrium may appear under X-ray.

Second, depending on the amount of CA injected, the overall LA visibility may be poor. For example, in our case EP physicians use contrast at the beginning of the procedure to verify the success of the transseptal puncture. It may also be used later to enhance the anatomy, for example, to make sure that a catheter, for example, a circumferential mapping catheter or a cryoballoon catheter, was placed correctly. This is different to vessel angiography. In this case, higher amounts of contrast are injected to derive diagnostic information, for example, about a stenosis. For ablation procedures in the LA, the CA density and so the visibility of the LA under X-ray may be poorer as the small amount of contrast is injected into the high blood volume of the LA. As a consequence, further effort is needed to develop robust registration methods that can also be applied if CA is used sparingly.

Currently, very few publications deal with registration based on contrast agent: In a first approach towards automatic LA registration, Thivierge-Gaulin et al. [16] tried to find a 3D pose of a model such that its projected shadow matches the contrasted area in a selected image, enhanced by digital subtraction angiography (DSA), best. However, if only a small amount of contrast agent is injected into a somewhat large chamber such as the left atrium, this approach will not lead to a distinct optimum, because the anatomy may not be fully opacified.

Based on CT images, a second approach by Zhao et al. [17] relied on digitally rendered radiographies (DRRs) of the segmented left atrium. The rendered image was compared to a DSA image using normalized gradient correlation. This approach uses a weighting scheme that puts the focus on the roof of the LA. How well this method performs for injections into other areas of the LA is still unclear.

Besides contrast-based registration, 3D data and 2D data can also be aligned using devices. A feature-based method by Sra et al. [4] uses a segmentation of the coronary sinus (CS) in a CT volume to register a 3D LA model to a single 2D fluoroscopic image. A further approach by Brost et al. [5] uses a segmentation of the CS in an MRI volume and a 3D reconstruction of the CS catheter from two fluoroscopic images. Unfortunately, how well the CS can be extracted from a 3D MRI data set depends very much on the MRI scan protocol. Furthermore, due to the strong motion of the CS, it is difficult to relate the position of the CS catheter to the position of the LA, especially for patients having no sinus rhythm but heart arrhythmia [18, 19].

(2) Contributions and Outline of This Work. At the beginning of Section 2, we discuss the use of the normalized cross-correlation for registration of a 3D object to 2D X-ray projections based on extracted contrast agent. Afterwards, we propose two new registration techniques for contrast agent-based registration. They require a segmented left atrium as input and can, in contrast to [17], accommodate both CT and MRI data. The first method is described in Section 2.2. Here, we take explicitly apparent edges extracted from a 3D model and compare them to LA edges present in the fluoroscopic images as proposed by [10, 11] for automatic registration of bones and by [20] for manual registration of the LA, respectively. Although our method is conceptually similar to a DRR generation followed by gradient correlation, this calculation can be carried out on a GPU much more quickly than a DRR.

The second contribution described in Section 2.3 is the introduction of a novel similarity measure for biplane X-ray that is tailored to cases in which only parts of an object are visible. Based on a 3D model of the LA, our second method estimates the contrast agent distribution inside the 3D object from a simultaneously acquired pair of fluoroscopic images taken under two different view angles. Then we evaluate how consistent the contrast agent distribution estimate (CADE) is with the acquired fluoroscopic images. As the CADE depends on the transformation used for registration, the transformation leading to the most plausible CADE is used as final position estimate. A brief description of both methods was previously published by us [21].

In Section 2.4 we propose to treat the registration results for each frame of the sequence no longer as independent. Based on our previous publication [22], we use a Markov chain approach to exploit the temporal dependency between successive frames instead.

In Section 3, we evaluate a similarity measure based on the projected shadow which is close to the approach by Thivierge-Gaulin et al. [16] and the two new methods as well as combinations of them. The results are discussed in Section 4.

2. Registration Method

For registration, two X-ray sequences showing a CA injection are used. These sequences are acquired simultaneously from two different angles using a biplane system. For each plane, the projection matrix that describes the X-ray camera setup is known. We denote the associated projection operator by and for the A-plane and the B-plane of the system, respectively. We also assume that a 3D model of the patient’s LA is available, either as a triangle mesh or as a binary volume, as they can be converted into each other.

2.1. Contrast Agent Extraction

The contrasted area is found based on a difference image (DSA) , involving a frame that contains contrast agent and an uncontrasted frame . To distinguish between contrasted and uncontrasted frames, either manual annotation or a learning based method can be used, for example, the method described in [23]. Depending on the chosen contrasted frame , may contain artifacts, for example, due to motion of the diaphragm or from catheters, if they are at different positions in and . Such motion artifacts depend, unlike the information about contrast agent, to a large degree on the choice of . For example, if the catheters in are at the same position as in , their intensities cancel out. Otherwise, has high positive values at the position of the catheter in and high negative values at the position of the catheter in . To keep motion artifacts to a minimum, we propose a best reference selection, which chooses an appropriate reference frame that matches the chosen contrasted frame as much as possible. Out of all uncontrasted frames, that frame is selected which minimizes the -norm of the resulting DSA image:By following (1), we get frames for which the catheters and the diaphragm cancel out as much as possible. See Figure 3(b) for an example. In , only pixels with positive values contain contrast agent. To extract them, we set the intensity of pixels with negative value to 0. Afterwards, we compute a filtered image by applying a median filter with a large kernel size. Smaller structures, for example, caused by motion artifacts that remained despite the optimized choice of , do not pass this filter, and the noise in the contrasted area is reduced as well. Finally, a binary image of the filtered image is computed using a threshold at , where and   denote the mean and standard deviation of , respectively. Thus, a contrasted pixel is indicated by .

A previous approach [16] tried to find a transformation of the 3D model such that the projected shadows of the model into the A-plane and the B-plane of a biplane C-arm system fit best to the contrasted region. Using the normalized cross-correlation (NCC), denoted as , of two images with corresponding mean values and standard deviations the similarity of the projected shadow and can be measured. Instead of , one can also use the binary version of it, that is, . Then a registration transformation can be estimated by maximizing either one of the two functions

2.2. Edge Feature

Unfortunately, a registration approach only based on contrasted area has multiple solutions if the amount of CA is so little that it does not completely fill the LA; see Figure 2. Fortunately, CA is often injected against the roof or into the pulmonary veins. This results in perceivable edges of the contrasted area which can then be used as registration features. Edge-based registration can be carried out using only the silhouette boundary of the projected object [13] (see Figure 2) or all apparent edges [10, 11] (see Figure 3(f)).

We decided to consider all apparent edges to improve robustness against injections of small amounts of CA, as the silhouette-based approach requires that the LA is contrasted in its entirety. For a partially contrasted left atrium, internal contours may, however, also appear in the fluoroscopic images. This was already found to be beneficial for manual LA registration [20]. Instead of considering edges implicitly by comparing the DSA image to a DRR using gradient correlation [17], we computed them explicitly. To extract edges in the fluoroscopic images, we used the filtered image . After applying a median filter, edge-like variations inside the contrasted areas may remain. They, however, correspond rarely to anatomical structures and would trigger a response, if an edge filter was applied. To obtain an edge response only at the boundaries of the contrasted area, the image needs to be homogenized before applying an edge filter. Using a simple threshold method would result in a loss of the intensity drop-off at the boundary which provides important information about the edge intensity. Therefore, we weigh all image pixels by a sigmoid functionThe value of is set to , and the parameter depends on the pixel intensity range of the input image. An example of is given in Figure 3(d). Finally, is filtered using a derivative of Gaussian (DOG) filter to obtain the edge image . The kernel size of the DOG filter is set to a large value to get a smooth similarity measure; see Figure 3(e).

The projection of the 3D triangle mesh edges into 2D is done differently than in [10, 20]. We rendered the whole surface mesh and, depending on the viewing direction and the surface normal at a point, we set the opacity of projected triangles to ; see Figure 3(f) for an example. By doing so, areas that are parallel to the imaging plane are rendered transparent while areas with a normal vector orthogonal to the viewing direction are rendered opaque. The similarity between edges extracted from the fluoroscopic images and the edge images rendered from the 3D model transformed by is measured by

2.3. Contrast Agent Distribution Estimation (CADE)

Previous approaches [16, 17] for LA registration searched for a rigid transformation of the LA such that either its projected shadow or its DRR fit to the contrasted area in both fluoroscopic images. 3D information was taken into account insofar as the resulting projections came from the same 3D position of the model. However, such an approach does not necessarily guarantee that corresponding objects in both fluoroscopic images are matched to the same 3D structure of the LA. More precisely, the registration result could be such that, in plane A, the contrast agent is located in a left pulmonary vein (PV) whereas, in plane B, the contrasted area corresponds to a right PV. This is possible as for a given 2D registration in one plane the 2D registration in the other plane has one degree of freedom, which corresponds to an out-of-plane motion in the first plane. An illustration of this problem can be found in Figure 4.

To solve this problem, we compute for a given transformation a CADE inside the LA using binary reconstruction. Then, is optimized such that the contrast agent distribution estimate is most consistent with the projection images. More precisely, a voxel is estimated as contrasted if it satisfies all of the following conditions: (a) the voxel transformed by is projected on a contrasted pixel in plane A, (b) is projected on a contrasted pixel in plane B, and (c) is part of the left atrium (as contrast agent can only be found inside the LA). To compute the CADE, we define the indicator functionGiven the binary images and with corresponding projection operators and the indicator function , the CADE for a transformed voxel, , can be computed asfor a given rigid transformation . This product is the mathematical equivalent of the three conditions introduced above.

If is chosen suboptimally, the resulting 3D CADE will be inconsistent with the CA observed in the 2D images. That is, a pixel in the 2D image is contrasted but no corresponding voxel along its projection ray is estimated as contrasted. This can be due to the following reasons as shown in Figure 5: (a) the projection ray from a contrasted pixel does not intersect the left atrium as the LA has not been placed at the proper position yet; (b) the projection ray hits the LA, but all voxels intersected by this ray cannot contain CA because their corresponding pixels in the other plane are uncontrasted. Additional inconsistencies are introduced by pixels which are erroneously labeled as contrasted, for example, due to motion artifacts. To verify the validity of the CADE, we compute binary 2D images by forward projecting all contrasted voxels in using . We assess the consistency of the CADE for the given transformation by computing the similarity between the fluoroscopic images and the projected CADE by

Alternatively, the number of contrasted voxels inside the volume could be maximized. The set of potentially contrasted voxels is defined by the intersection of the projection ray bundles of all contrasted pixels from views A and B, respectively. Maximizing the number of contrasted voxels would lead to a transformation such that as much as possible of this set is contained inside the LA. If, for example, a pulmonary vein was filled with CA, the transformation would be chosen such that the set of possible contrasted pixels was located in the main body of the LA which provides the most space to include most possibly contrasted voxels; see Figure 6. To avoid this registration bias towards large structures inside the LA, we decided to optimize the consistency to the 2D images.

2.4. Confidence-Based Temporal Markov Filtering

To exploit dependencies between successive frames of a sequence, we model the position of the LA model in 3D as a time-dependent continuous Markov chain of the first order as proposed by us previously [22]. The states are transformations, that is, a translation and a rotation of the model. The parameter denotes the actual transformation of the LA in the th contrasted frame, refers to the estimate of the transformation for the th frame, and is the probability that for frame the transformation is observed. For convenience, we define . The transition probability from one state into another is independent of the frame number . It is denoted as . Finally, a sequence of transformations is to be determined such that the termis maximized.

Transition probabilities and state probabilities control the filtering differently: Due to the transition probabilities, small motions of the LA from a frame to its next frame are preferred. This results in an averaging of the LA transformation over time. The state probability determines the impact of the averaging, that is, how much the registration result is changed by the filtering. We will model the state probability based on a confidence measure such that frames with high-confidence registration results are less subject to temporal filtering. If the confidence value is low, that is, the estimated error of the registration result is higher, we want to rely more on the results of the previous and next frame. Thus, high temporal averaging is performed for frames with a low confidence value. The state probabilities and the transition probability are determined as follows.

2.4.1. State Probability

The contrast agent visible in frame determines the probability of the LA for being transformed by in this frame. If the dependency on other frames is ignored, the most probable transformation is the transformation obtained by optimizing the similarity measures defined in (3), (5), and (8) or a combination of them.

The similarity measure depends on the contents of images . Zhao et al. suggested using the value as a confidence measure: from all registration results the frame where is maximum should be selected as registration result for the complete sequence. We confirmed a correlation of and the registration error. The value of can therefore be used as a confidence measure. By linear regression, a function can be determined to estimate the error based on the value of . With the transformation of the LA found during optimization for frame , the probability can be modelled as normal distributionThe confidence range is incorporated by setting the covariance matrix to .

2.4.2. Transition Probability

The transition probability states how likely a movement of the LA is from frame to . Over multiple breathing cycles, the LA moves about a mean position. So the mean transformation is the identity. The likelihood of a transformation change depends on the magnitude of the change. The larger the change in translation and rotation is, the less likely the transformation transition is. To account for different frame rates, we consider the translational and rotational velocity which comprises three translational velocities and three rotational velocities. In a training step, velocity vectors are computed for annotated data. The covariance matrix of the velocities gives an estimate for how likely a transition from one transformation to the next is. We model the probability of a transition as a normal distributionBesides the frame rate , the transition probability depends also on the current breathing phase. Compared to breathing motion, cardiac motion can be neglected as it rather deforms the LA. If information on the breathing phase can be estimated [24], superior motion should be, for example, more likely during the inhale phase. During exhalation phase, inferior motion should have a higher probability. To account for breathing motion, for example, the mean value and the covariance matrix could be estimated separately for each stage of the breathing cycle.

2.4.3. Most Probable State Sequence

The most likely state sequence,can be found using a log-likelihood method: By applying the logarithm to (9), we get This convex optimization problem can be solved using the BFGS method.

3. Experiments and Results

We retrospectively evaluated our method on 21 clinical biplane X-ray sequences from 10 different patients. All of the patients provided their informed consent for the analysis of their clinical data. For all patients, a segmentation of their left atrium from a preoperatively acquired MRI scan was available as a triangle mesh. When using the CADE measure, this mesh was converted into a binary volume. The data set comprised 11 sequences showing an initial contrast agent injection where 15 mL CA was injected into the LA center through a sheath to verify the success of the transseptal puncture. Besides the sheath, only a coronary sinus catheter was present. There were 10 more sequences showing subsequent injections to verify catheter placement. In these cases, about 10 mL CA was injected and additional catheters were present. All X-ray angiography sequences were acquired during normal breathing using a standard acquisition protocol. This resulted in a total number of 133 contrasted frames. For our experiments, the first contrasted frame was determined manually. For 129 frames, reference registrations performed by three clinical experts were available; for the remaining four frames registrations performed by two clinical experts were available. Reference 3D registrations were established by manually shifting the mesh in each of two orthogonal views. The resulting 3D translation was refined this way until a good match to the associated contrast distributions as seen in the two 2D X-ray projections had been found. These reference registrations covered only 3D translation for the following reasons: First, patients were positioned head first, supine, during both preoperative and intraoperative imaging. This patient positioning rules out large degrees of rotation a priori. Furthermore, given the small amounts of contrast injected in our cases, the remaining small rotations were very difficult to detect. This made it practically impossible for our clinical experts to reliably correct them. This is why we decided to evaluate our approach without rotation.

As initialization for optimization, the 3D model was placed at that 3D position which corresponded to the centers of both 2D images. In some cases, the initialization was more than 30 mm away from the correct solution and beyond the capture range for gradient-based methods. Therefore we applied an octree-like coarse-to-fine scheme where we evaluated several positions at a coarse resolution. At positions in space that yielded a good similarity value, we performed subsequent evaluations at an increasingly finer resolution. The 3D translation found by the optimization process of the respective objective function was compared to the mean translation vector of the three manual registration results. The distance was used as error measure. The significance of the results was measured using a Wilcoxon signed-rank test [25] and a significance level of .

All sequences contained 12-bit images of size pixels; all image processing, including rendering from the 3D model, was performed on the full image size. The kernel size for median filter was 30 pixels; the value of (4) was 0.1. The standard deviation of the DOG filter was 24 pixels.

In the evaluation, we compared the similarity measures , , , and and the combined similarity measures , , and . We investigated different weightings. Giving both terms equal weights, that is, setting , turned out to be a good choice. We did two types of evaluation: an evaluation on all frames and an evaluation using only a single frame of each sequence, namely, the one which provides the best similarity measure.

3.1. Evaluation on All Frames

We computed a registration for each contrasted frame and compared the result to the manual registration of the physicians. These results are clinically relevant if the physician requires a registration for a desired frame of his choice, for example, depending on the breathing phase. As potentially every frame could be selected by the physician, the overall registration accuracy should be high. The overall accuracy is also of importance if the results are postprocessed by temporal filtering.

The evaluation results are presented in Table 1. The overall interuser variability observed in the manual registrations was  mm. Considering all sequences, performed significantly better than , , and the state-of-the-art method by Thivierge-Gaulin et al. [16]. Although gave significantly worse results than all other measures, a combination with could improve the results of and significantly. For subsequent injections, performed significantly better than , , their respective combinations with , and the method by Thivierge-Gaulin et al. [16].

For and , the error distribution for each sequence is shown in Figure 8, first for all initial injections and then for subsequent injections. In Figure 9, the error distribution is given for each frame number as counted after the CA injection. An example for a result is shown in Figure 7.

3.2. Evaluation on a Single Frame

Figure 8 indicates that many sequences have at least one frame with a registration error of less than 5 mm. In fact, there exists such a frame for over 75% of all sequences when using or and for over 85% of the sequences when using , , , or . Zhao et al. [17] proposed to automatically find a good frame in each sequence and consider only these single good frames for evaluation. In other words, from all frames of a sequence, a single frame was automatically chosen for registration. From a clinical point of view, this would, however, only make sense if the physician accepted an automatically selected frame for registration or if further motion compensation steps for the other frames were done, for example, using device tracking [26].

Zhao et al. suggested selecting this single frame as follows: For each frame out of the contrasted frames, the transform that maximizes the similarity measure for this frame is computed. In a second step, going through all frames of a sequence, the frame with the highest overall similarity measure is picked. The associated transform is denoted by . The error to the average manual registration result is then computed as

The evaluation results are presented in Table 2. Recalling Section 3.1, where we found that the CADE method yielded good results for all frames, the CADE performance for a single frame was, however, not that good. This is why we investigated if a different single-frame selection strategy would lead to better results. Instead of using both for translation estimation and for best-frame selection, we used only for estimating the translation for each frame . Among the obtained translations , we selected the frame with the corresponding translation such that was maximized. So (14) was modified toThe results of this frame selection method are denoted by . We did the same also for the combination with . Although only 10 sequences were available for initial contrast injections, , , and gave significant better results when compared to . Also the performance of was significantly less. For subsequent injections, our proposed method and the corresponding combination with outperformed the method by Thivierge-Gaulin et al.

3.3. Temporal Filtering

The error estimate function and the transition probability covariance matrix were trained in a leave-one-patient-out cross-validation. The results for the temporal filtered frames are given in Table 3. For a combination of the CADE and edge similarity measure, Markov filtering reduced the mean error to  mm for initial injections and  mm for subsequent injections. The respective median errors were 4.0 mm and 5.7 mm. The results obtained by the Markov filtering were in all cases significantly better than the results without filtering.

3.4. Runtime Performance

The image preprocessing took 0.5 s on an Intel Xeon E3 with 3.4 GHz and 16 GB RAM. The evaluations of the similarity measures were performed completely on the GPU. On an NVIDIA GeForce GTX 660 the evaluation of , , or took 1.8 ms for a given translation and  ms for . The whole registration for a single frame took 2.9 s for or and  s for . For a combination with it took 5.8 s and  s, respectively. The runtime of the Markov filtering was  ms per sequence.

4. Discussion and Conclusions

4.1. Similarity Measures

Our novel CADE-based method outperformed the shadow based similarity measures and , especially for reregistration sequences where only a small amount of contrast agent was used. The problem with nondistinct optima for shadow based similarity measures which was already sketched in Figure 2 becomes also apparent in the sections of the optimization function in Figure 10 for : for a small amount of contrast agent like in Figure 10(a), the optimum is not at the position of the ground truth, but about 15 mm off. Also for secondary injections into small structures such as the pulmonary veins, the objective function for in Figure 10(c) has a large plateau. In both cases, is more distinct, although clear extrema as for bones [12] are not obtained. Especially for secondary injections where CA was often injected into pulmonary veins, registration errors leading to inconsistent results could be avoided by . For well-contrasted sequences, the improvement by was less as inconsistencies played only a minor role and the shape of the contrasted region in the image was more similar to the projected shadow of the left atrium. This becomes also apparent in Figure 10(b) where the shapes of the objective function of and look very similar.

We found that the similarity measure using explicit apparent edges, , yielded poor results when used on its own. A possible reason is that the objective function of has several local optima and also the global optimum does not necessarily correspond to the ground-truth position. However, can improve results significantly when combined with other similarity measures. It often has a more distinct local optimum at the position of the ground truth that facilitates fine registration. This is important as the other similarity measures define a rather plateau-shaped optimal region.

4.2. Time-Dependency

Figure 9 shows that the accuracy of the registration depends on when the frame was recorded after contrast agent injection. The best results were achieved at the 5th frame. This was in many cases the frame before the ejection into the ventricle; that is, it contains the most contrast agent. Especially for the first frames, where only little CA was present, registration accuracy was lower. The first frame is an exception, as it has a lower mean error than the second frame. This is probably because the previous frame, that is, the last uncontrasted frame, is used as mask frame for DSA computation which leads to low motion artifacts and a better extraction of the contrasted region.

4.3. Best-Frame Selection

We found that, for our registration to work best, a well opacified frame should be selected from the sequence. This strategy was evaluated in the context of the single-frame evaluation by selecting the frame which had the best objective function value. For methods like and , which are based on the projection shadow, the best frame usually corresponded to the most contrasted frame. The measure , however, is not related to the amount of contrast agent, but it depends on consistency. When relying on for best-frame selection, we get the frame yielding the most consistent registration. This is, however, not necessarily the frame with the most contrast agent. And it is usually the frame with the most contrast agent which provides the most information for registration. We believe that this is the reason why the errors in Table 2 for are higher. If, however, from the translations estimated by , the single frame was selected based on , better results were achieved as now again the frames with the most contrast agent were selected.

Sometimes, the automatic best-frame selection does not provide a frame with a satisfactory result. Still, 85% of all sequences contain at least one frame that is below a clinical relevant threshold of 5 mm [27, 28]. To benefit from the fact that at least one frame with a good registration result is likely to be found, the frame selection could be performed manually by stepping through the frames and the associated registration results. The user can then quickly select the frame with the best registration result. Although this implementation still involves user input, the required user interaction is less than for fully manual registration.

For the remaining 15% of the cases or if higher accuracy level, for example, 3 mm [29], is required, small manual adjustment may be necessary in these cases.

4.4. Temporal Filtering

Temporal filtering based on the Markov chain could reduce the error significantly. Particularly high errors for frames at the beginning or at the end of the contrast injection were reduced. If a registration is not desired for an arbitrary frame but rather for a frame determined by the physician, for example, based on the breathing phase, temporal filtering should be applied.

4.5. Runtime Performance

It is notable that the runtime for was considerably higher compared to, for example, . This is because the contrast agent distribution needs to be updated in each iteration before projecting it. Moreover, this part of the implementation has not yet been optimized. Although the fast runtime of, for example, will remain out of reach, we are confident that the runtime for could be reduced to a clinically acceptable level.

4.6. Impact of Clinical Issues

In general, the performance for initial registration was better, as the amount of contrast agent was higher and fewer catheter artifacts were present. We found by visual inspection that also the amount of breathing motion had an impact on the registration accuracy as it caused motion artifacts in the DSA computation. As a consequence, if the patient was not anesthetized, acquisition of the CA injection should be performed under breath-hold or shallow breathing. For cases with intubation, also jet ventilation [30] or a small period of apnea could be applied to reduce breathing motion. Since our data was not acquired using a dedicated DSA program, the different brightness levels within a sequence changed. This interfered with the subtraction image computation. Although intuitively appealing, our data did not allow us to make a statement if the registration accuracy depends on the part of the LA in which CA was injected.

For initial contrast injections, an average accuracy of mm and a median error of 3.7 mm could be achieved using together with the best-frame selection approach. This error is in the range of the interuser variability of mm. The faster registration method using reached an accuracy of mm. These numbers are also similar to the accuracy reported when performing manual registration based on segmentations of the CS, the whole heart, and the spine [7] or a segmentation of the CS when 3D and 2D data are in the same breathing and cardiac phase [31] which was achieved by ventilation and rapid pacing of anesthetized patients. However, our method does not require structures other than the LA to be segmented and requires no anesthesia. Compared to a registration based on the CS alone [32], the error was reduced by over 50%.

The average accuracy for all frames after applying temporal filtering is 5.7 mm for and 6.1 mm for . Though the results of are close to the 5 mm threshold [27, 28], it remains open if this is sufficient for a clinical application, but we believe that these results should at least provide users with an acceptable initial estimate for further manual adjustments.

4.7. Future Work

By now, the registration method was evaluated only for left atria. In a next step, the performance of this approach to contrast-based registration of the right atrium should be evaluated. A registration based on the right atrium would also provide a registration for the LA which is available for transseptal puncture or one of the ventricles could be assessed. Due to the lack of pulmonary veins and arteries, the ventricles have a more simple anatomical structure but are subject to stronger cardiac motion. How strongly this affects the registration accuracy is open.

5. Conclusions

Compared to the approach by Zhao et al. [17], the use of special weights for different heart regions is not needed for any of the proposed approaches. In addition, for , a time-consuming DRR generation can be avoided. As a result, a registration approach based on a combination of shadow and edge features can be computed fast. If sufficient computational power was available, the novel CADE-based measure, which takes consistency into account, should be used as it improves results significantly, especially when very small amounts of contrast agent are injected.

Disclosure

The concepts and information presented in this paper are based on research and are not commercially available.

Conflict of Interests

Andreas Maier and Joachim Hornegger have no conflict of interests. Matthias Hoffmann, Christopher Kowalewski, and Klaus Kurzidim are supported by Siemens Healthcare GmbH, Forchheim, Germany. Norbert Strobel is an employee of Siemens Healthcare GmbH, Forchheim, Germany.

Acknowledgments

This work was supported by the German Federal Ministry of Education and Research (BMBF) in the context of the initiative Spitzencluster Medical Valley-Europäische Metropolregion Nürnberg, Project Grants nos. 13X1012A and 13EX1012E, respectively. Additional funding was provided by Siemens Healthcare GmbH. Andreas Maier and Joachim Hornegger gratefully acknowledge funding of the Erlangen Graduate School in Advanced Optical Technologies (SAOT) by the German Research Foundation (DFG) in the framework of the German excellence initiative.