Abstract

We introduce an area-based method for remote sensing image registration. We use the orthogonal learning differential evolution (OLDE) algorithm to optimize the similarity metric between the reference image and the target image. Many local and global methods have been used to optimize the similarity metric in recent years. Because remote sensing images are usually affected by large distortions and high noise, local methods fail in some cases, so global methods are often required. The orthogonal learning (OL) strategy is efficient when searching in complex problem spaces; in addition, it can discover more useful information via orthogonal experimental design (OED). Differential evolution (DE) is a heuristic algorithm that has been shown to be efficient in solving the remote sensing image registration problem. The OLDE algorithm combines the two: it uses the OL strategy to guide the DE algorithm to discover more useful information. Experiments show that the OLDE method registers remote sensing images more robustly and efficiently than the compared methods.

1. Introduction

Image registration is an important step in many fields [1], such as change detection, image fusion, and object recognition. In order to obtain complete information, it is often necessary to register images taken from different sensors or from the same sensor at different times. The result of image registration greatly influences the performance of subsequent processing, so remote sensing image registration methods should be efficient, robust, and accurate.

Image registration methods are usually divided into two categories: feature-based and intensity-based methods [2, 3]. Many feature-based methods have been proposed [4, 5]. These methods usually need to first extract salient features, such as points, edges, contours, and regions. These features are then matched using similarity measures to establish the geometric correspondence between the two images. One of the main advantages of these approaches is that they are efficient and robust to noise, complex geometric distortions, and significant radiometric differences. However, they only perform well on the condition that suitable features are extracted and reliable matching algorithms are used [3]. For images in which salient features are not obvious, intensity-based methods perform better than feature-based methods.

The key procedure of intensity-based methods is to find the optimal similarity metric. The similarity metric measures how closely the gray values of two images are matched, and for remote sensing image registration it must be robust. There are many commonly used similarity metrics [2, 6-11]. Mutual information, which is based on the Shannon definition of entropy [9, 10], is widely used in current work. Mutual information has been shown to be robust and does not depend on the intensity scaling or specific dynamic range of the images [12].

The search strategy optimizes the similarity metric. Both local and global search strategies are commonly used. Many local methods have been used in image registration [13, 14]. These local methods yield the best registration only when the initial orientation is very close to the true transformation; moreover, they are easily trapped in local optima [14, 15]. Therefore, global optimization is often required, and global methods have been successfully applied to image registration [8, 12, 16-22].

The differential evolution (DE) algorithm was proposed by Storn and Price for global optimization over continuous search spaces [23]. DE is a variant of the evolutionary algorithm (EA) that has proven to be fast and reliable in many applications [23-29]. The DE algorithm has been shown to be efficient in remote sensing image registration [30]. The seminal idea of DE is to generate a new vector by adding the weighted difference between two trial vectors to a third vector. The new vector is defined as $\mathbf{v} = \mathbf{x}_{r_1} + F \cdot (\mathbf{x}_{r_2} - \mathbf{x}_{r_3})$, where $\mathbf{x}_{r_1}$, $\mathbf{x}_{r_2}$, and $\mathbf{x}_{r_3}$ are three randomly selected trial vectors from the population and $F$ is a multiplier, which is the main parameter of the DE algorithm [25].
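To make the mutation step concrete, the following is a minimal Python sketch of one DE/rand/1 generation with binomial crossover and greedy selection; the population layout, the crossover rate CR, and the function names are illustrative assumptions rather than details taken from the text.

import numpy as np

def de_generation(pop, cost, F=0.5, CR=0.9, rng=np.random.default_rng()):
    # pop: (NP, D) array of trial vectors; cost: callable mapping a vector to a scalar to minimize
    NP, D = pop.shape
    new_pop = pop.copy()
    for i in range(NP):
        # three distinct trial vectors, all different from the current one
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])   # weighted difference added to a third vector
        mask = rng.random(D) < CR
        mask[rng.integers(D)] = True                 # keep at least one mutant component
        trial = np.where(mask, mutant, pop[i])
        if cost(trial) < cost(pop[i]):               # greedy selection for a minimization problem
            new_pop[i] = trial
    return new_pop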

The orthogonal experimental design (OED) offers the ability to discover the best combination of levels for different factors with a reasonably small number of experimental samples [31]. Owing to the OED's orthogonal prediction and test abilities, the orthogonal learning (OL) strategy can construct a guidance exemplar able to predict promising search directions toward the global optimum [31]. The OL strategy has been successfully applied in many areas [32-38]. In this paper, we apply the OL strategy to the image registration problem. Considering the effectiveness of the DE algorithm in image registration [30], we combine the OL strategy with DE. The main idea of our method is based on the observation that the major step of DE can be regarded as an "experiment." Based on this "experiment," the OL strategy constructs a guidance exemplar with the ability to predict promising search directions toward the global optimum. In our method, the OED is used to discover the best combination of the three trial vectors.

The rest of this paper is organized as follows. In Section 2, image transformation, similarity metrics, and optimization techniques are discussed. In Section 3, the orthogonal learning strategy is formulated. In Section 4, the proposed OLDE for image registration is presented. Experimental results are described in Section 5. Finally, conclusions are drawn in Section 6.

2. Image Registration

2.1. Image Transformation

The image registration process is essentially the process of seeking a one-to-one mapping between two images that links the points corresponding to the same spatial position. The mapping is commonly referred to as a transformation; in a two-dimensional space it is a two-dimensional transformation. The approach proposed in this paper is intended for image registration in two-dimensional space. We use the widely applied affine transformation model to transform the target image, which allows us to demonstrate the efficacy of the OLDE method in image registration.

2.2. Similarity Metrics

At the correct registration, similarity metrics must be robust: they should attain a global or a very distinct local maximum. Most of the current work on remote sensing image registration utilizes mutual information, which has been shown to be robust for remote sensing image registration. Mutual information represents the relative entropy of two images [6]. The larger the value of mutual information, the better the registration of the two images. In general, given two images $A$ and $B$, their mutual information is

$$MI(A, B) = H(A) + H(B) - H(A, B), \tag{1}$$

where $H(A, B)$ is the joint entropy and $H(A)$ and $H(B)$ are the entropies of $A$ and $B$. Respectively, $H(A)$, $H(B)$, and $H(A, B)$ are given as

$$H(A) = -\sum_{a} p_A(a) \log p_A(a), \qquad H(B) = -\sum_{b} p_B(b) \log p_B(b), \qquad H(A, B) = -\sum_{a, b} p_{AB}(a, b) \log p_{AB}(a, b), \tag{2}$$

where $p_A(a)$ is the marginal probability density function of $A$, $p_B(b)$ is the marginal probability density function of $B$, and $p_{AB}(a, b)$ is the joint probability density function of $A$ and $B$. The densities $p_A$, $p_B$, and $p_{AB}$ can be estimated with Parzen windows [10]. Normalized mutual information is given as

$$NMI(A, B) = \frac{H(A) + H(B)}{H(A, B)}. \tag{3}$$

Normalized mutual information is less sensitive to the size of the overlap area. Previous studies have shown that normalized mutual information is an accurate and robust similarity metric for image registration [6, 8, 39]. Therefore, in the current study, normalized mutual information was selected as the similarity measure. In image registration, the greater the value of $NMI(A, B)$, the better the match between the two images, so this is a maximization problem. Many optimization problems are formulated as minimization problems, in which the objective function is denoted by $f$; thus, without loss of generality, it is understood that for image registration the goal is to minimize $f = -NMI(A, B)$.
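As an illustration, here is a minimal Python sketch that estimates NMI from a joint gray-level histogram; it uses a simple histogram estimate with an assumed bin count rather than the Parzen-window estimate mentioned above.

import numpy as np

def nmi(img_a, img_b, bins=64):
    # NMI(A, B) = (H(A) + H(B)) / H(A, B), estimated from a joint histogram
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()          # joint probability of gray-level pairs
    p_a = p_ab.sum(axis=1)              # marginal probability of A
    p_b = p_ab.sum(axis=0)              # marginal probability of B
    def entropy(p):
        p = p[p > 0]                    # avoid log(0)
        return -np.sum(p * np.log(p))
    return (entropy(p_a) + entropy(p_b)) / entropy(p_ab)

Maximizing this quantity (or minimizing its negative) over the transformation parameters is then the registration objective.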

2.3. Optimization of Similarity Metrics

The goal of image registration is to find the transformation parameters that maximize the objective function (the similarity metric). Both image interpolation and joint probability density estimation are involved in image registration; thus, computing the quantities in (3) is computationally expensive. Therefore, an efficient optimization method is needed to reduce this cost. The OLDE algorithm proposed in this paper is an efficient global method for image registration.

3. Motivation

3.1. Motivation of Orthogonal Learning Strategy

The OL strategy can discover more useful information toward the global optimum [31]. Here is a simple case illustrating its importance. Consider the 3-dimensional sphere function $f(\mathbf{x}) = x_1^2 + x_2^2 + x_3^2$, whose global optimum value is 0 at the minimum point $(0, 0, 0)$. Suppose that the current trial vectors are $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$, and assume that the value of the multiplier $F$ is 0.5; then the new vector is $\mathbf{v} = \mathbf{x}_1 + F \cdot (\mathbf{x}_2 - \mathbf{x}_3)$. With suitably chosen trial vectors, this can result in a new vector with a cost value of 20, which is worse than two of the three parent vectors. However, if we can discover the good dimensions of the three vectors, we can combine them to form three new trial vectors $\mathbf{x}'_1$, $\mathbf{x}'_2$, and $\mathbf{x}'_3$; the updated vector $\mathbf{v}' = \mathbf{x}'_1 + F \cdot (\mathbf{x}'_2 - \mathbf{x}'_3)$ can then attain a cost value of 2.25. Thus, the objective function moves toward the global optimum value of 0.

The simple case above illustrates the importance of designing three new trial vectors to generate the updated vector $\mathbf{v}'$. In order to find the best combination of trial vectors, we use orthogonal experimental design (OED) [40], which can obtain a relatively good vector through only a few experimental tests [31].
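The effect can be reproduced with the short sketch below. The parent vector values here are hypothetical stand-ins (the original numerical example is not reproduced), and picking the per-dimension component closest to the optimum is only possible because the toy function is known; in OLDE, the OED discovers a good combination from fitness evaluations alone.

import numpy as np

def sphere(x):
    # 3-dimensional sphere function; global optimum 0 at the origin
    return float(np.sum(np.asarray(x) ** 2))

# Hypothetical parent vectors (illustrative values, not those of the original example)
x1 = np.array([3.0, -1.0, 0.5])
x2 = np.array([-2.0, 0.5, 4.0])
x3 = np.array([1.0, 3.0, -0.5])

F = 0.5
v = x1 + F * (x2 - x3)                    # plain DE combination of the three vectors
print(sphere(v))                          # 14.875, worse than sphere(x1) and sphere(x3)

# Combine the best dimension of each parent (read off directly here, since the optimum is known)
parents = np.vstack([x1, x2, x3])
good = parents[np.argmin(np.abs(parents), axis=0), np.arange(3)]
print(sphere(good))                       # 1.5, much closer to the global optimum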

3.2. Orthogonal Experimental Design

We use a simple example to explain OED. In this example, to yield the maximum chemical product, we must find the best level combination of three factors: temperature, time, and alkali. These three factors, which affect the experimental result, are shown in Table 1; we denote them as factors $A$, $B$, and $C$. The temperature has three levels: 80°C, 85°C, or 90°C; the time can be 90 min, 120 min, or 150 min; and the alkali can be 5%, 6%, or 7%. Therefore, there are 27 ($3^3$) combinations of experimental designs in total. It is desirable to obtain or predict the best combination by sampling only a few representative experimental cases.

Let $L_M(Q^N)$ denote an orthogonal array, where $N$ is the number of factors, each factor has $Q$ levels, $M$ is the number of combinations of levels, and "$L$" denotes a Latin square. Table 2 shows an orthogonal array with nine level combinations; each row in this table shows one combination of levels. The orthogonal array has three properties. First, for the factor in any column, every level occurs an equal number of times. Second, for the factors in any two columns, every combination of two levels occurs an equal number of times. Third, the selected combinations are uniformly distributed over the whole space of all possible combinations [32]. We apply this orthogonal array to the chemical experiment example; the resulting design with three factors is shown in Table 3.
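For reference, the following Python sketch lists the standard $L_9(3^4)$ orthogonal array and maps the three factors of the chemical example onto its first three columns; the concrete arrays in Tables 2 and 3 are not reproduced in this text, so this layout is an assumption of the sketch.

import numpy as np

# The standard L9(3^4) orthogonal array: 9 level combinations for up to four
# three-level factors, with levels coded 0, 1, 2.
L9 = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 2, 2, 2],
    [1, 0, 1, 2],
    [1, 1, 2, 0],
    [1, 2, 0, 1],
    [2, 0, 2, 1],
    [2, 1, 0, 2],
    [2, 2, 1, 0],
])

temperature = [80, 85, 90]      # factor A, degrees Celsius
time_min    = [90, 120, 150]    # factor B, minutes
alkali_pct  = [5, 6, 7]         # factor C, percent

# Nine representative experiments instead of all 27 combinations
for a, b, c, _ in L9:
    print(temperature[a], time_min[b], alkali_pct[c])

In each column every level appears three times, and in any pair of columns every two-level combination appears exactly once, which is what makes the nine sampled runs representative of the full 27.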

In this paper, we use the OL strategy to guide the DE algorithm to select promising search directions toward the global optimum, which enables us to achieve the best image registration results in terms of the normalized mutual information similarity metric.

4. OLDE Algorithm for Image Registration

4.1. Encoding

We make use of the aforementioned affine transformation model

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ a_4 & a_5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_3 \\ a_6 \end{pmatrix}, \tag{4}$$

where $a_1$, $a_2$, $a_3$, $a_4$, $a_5$, and $a_6$ are six transformation parameters. Then, the transformation formulas of the image are represented as

$$x' = a_1 x + a_2 y + a_3, \qquad y' = a_4 x + a_5 y + a_6, \tag{5}$$

where $(x, y)$ is a coordinate of the target image and $(x', y')$ is the corresponding coordinate of the transformed target image. Given the 6 real parameters $a_1$, $a_2$, $a_3$, $a_4$, $a_5$, and $a_6$, trial vectors are formed from these values. Therefore, each trial vector in the initial population of trial vectors is an array with 6 positions, with the parameter vector denoted by $\mathbf{x} = (a_1, a_2, a_3, a_4, a_5, a_6)$. The initial population is randomly initialized so that each parameter varies uniformly within a range of its own.
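A minimal Python sketch of this encoding follows; the function names and the parameter ranges shown are placeholders (the actual ranges of Table 4 are not reproduced here).

import numpy as np

def affine_points(points, params):
    # Apply x' = a1*x + a2*y + a3, y' = a4*x + a5*y + a6 to an (N, 2) array of (x, y) coordinates
    a1, a2, a3, a4, a5, a6 = params
    x, y = points[:, 0], points[:, 1]
    return np.column_stack((a1 * x + a2 * y + a3, a4 * x + a5 * y + a6))

def init_population(pop_size, lower, upper, rng=np.random.default_rng()):
    # Each trial vector holds the 6 affine parameters, drawn uniformly within per-parameter ranges
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + (upper - lower) * rng.random((pop_size, lower.size))

# Placeholder ranges standing in for Table 4
population = init_population(30, lower=[0.8, -0.2, -20.0, -0.2, 0.8, -20.0],
                                 upper=[1.2,  0.2,  20.0,  0.2, 1.2,  20.0])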

4.2. Population Evolution

To create a new population, both orthogonal crossover and the DE algorithm are executed consecutively. We refer to this combination as the OLDE algorithm.

4.2.1. Orthogonal Crossover Based on Orthogonal Array

Here, we introduce the process of $K$-to-$s$ orthogonal crossover. For example, suppose we have a population of vectors and select $K$ of them to be the parent vectors, each parent vector consisting of $D$ real values. The details of the orthogonal crossover by which these parents produce candidate vectors, of which $s$ are kept as offspring, are as follows.

$K$-to-$s$ Orthogonal Crossover

Step  1. Input the $K$ parent vectors $\mathbf{p}_1, \ldots, \mathbf{p}_K$, each of which has $D$ real values.

Step  2. For all $j \in \{1, \ldots, D\}$, randomly and independently generate the factor assignment of dimension $j$, that is, the column of the orthogonal array that governs dimension $j$.

Step  3. For all $i \in \{1, \ldots, M\}$, produce the candidate vector $\mathbf{c}_i$ based on the $i$th combination of factor levels in the orthogonal array $L_M(Q^N)$.

Step  4. Evaluate the fitness of the $M$ candidate vectors $\mathbf{c}_1, \ldots, \mathbf{c}_M$, and then select $s$ of them to be the offspring (these offspring are denoted as $\mathbf{o}_1, \ldots, \mathbf{o}_s$).

Step  5. Output the offspring vectors $\mathbf{o}_i$, for all $i \in \{1, \ldots, s\}$.

Example. We choose three parent vectors $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$, each with 6 real values, and consider three-to-one orthogonal crossover. In Step 2, each of the six dimensions is randomly assigned to a factor of the orthogonal array. After that, using Step 3, we obtain 9 candidate vectors based on the 9 combinations of the orthogonal array in Table 2. Finally, we evaluate the fitness of the 9 candidate vectors and select the best one of them to be the offspring; a sketch of this procedure is given below.
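The following Python sketch outlines the $K$-to-$s$ orthogonal crossover described above. The grouping of the $D$ dimensions into factors is not fully specified in the text, so assigning each dimension independently to a column of the orthogonal array is an assumption of this sketch, and the function and variable names are likewise illustrative.

import numpy as np

def orthogonal_crossover(parents, cost, oa, s=1, rng=np.random.default_rng()):
    # parents: (K, D) array of parent vectors; oa: (M, N) orthogonal array with levels 0..K-1;
    # cost: callable to minimize; returns the s best candidates as offspring
    K, D = parents.shape
    M, N = oa.shape
    col_of_dim = rng.integers(N, size=D)        # Step 2: assign each dimension to a factor (assumption)
    candidates = np.empty((M, D))
    for i, row in enumerate(oa):                # Step 3: one candidate per level combination
        levels = row[col_of_dim]                # which parent supplies each dimension
        candidates[i] = parents[levels, np.arange(D)]
    costs = np.array([cost(c) for c in candidates])   # Step 4: evaluate all candidates
    return candidates[np.argsort(costs)[:s]]           # Step 5: keep the s fittest offspring

For the three-to-one crossover of the example, parents is a (3, 6) array, oa has 9 rows, and s = 1.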

4.2.2. Differential Evolution Algorithm

Then we execute the DE operation once for each trial vector in the population to generate new vectors. Each new vector is created by combining three randomly selected trial vectors from the population. This combination process is defined as

$$\mathbf{v}_i = \mathbf{x}_{r_1} + F \cdot (\mathbf{x}_{r_2} - \mathbf{x}_{r_3}), \tag{8}$$

where $\mathbf{x}_{r_1}$, $\mathbf{x}_{r_2}$, and $\mathbf{x}_{r_3}$ are three randomly selected trial vectors from the population and $F$ is a multiplier, which is the main parameter of DE.

4.3. Fitness of Image Registration Using the OLDE

We take (3) as the fitness function. Given a reference image $A$ and a target image $B$, the aim becomes finding the best affine transformation $T$ for $B$ so that the normalized mutual information of $A$ and the transformed image $T(B)$ is maximized.
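Combining (3) and (5), a minimal sketch of the fitness evaluation for one trial vector follows; the nearest-neighbour inverse warp and the histogram-based NMI estimate are implementation choices of this sketch, not details taken from the text.

import numpy as np

def warp_affine(image, params):
    # Inverse-map each output pixel under x' = a1*x + a2*y + a3, y' = a4*x + a5*y + a6
    # (x = column, y = row) and sample the target image with nearest-neighbour interpolation
    a1, a2, a3, a4, a5, a6 = params
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    inv = np.linalg.inv(np.array([[a1, a2], [a4, a5]]))
    sx = inv[0, 0] * (xs - a3) + inv[0, 1] * (ys - a6)
    sy = inv[1, 0] * (xs - a3) + inv[1, 1] * (ys - a6)
    sxi, syi = np.rint(sx).astype(int), np.rint(sy).astype(int)
    valid = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out = np.zeros_like(image)
    out[valid] = image[syi[valid], sxi[valid]]
    return out

def registration_cost(params, reference, target, bins=64):
    # Negative NMI of the reference image and the warped target image (to be minimized)
    warped = warp_affine(target, params)
    joint, _, _ = np.histogram2d(reference.ravel(), warped.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return -(ent(p_a) + ent(p_b)) / ent(p_ab)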

4.4. The Procedure of Image Registration Using the OLDE

The procedure of the OLDE is as follows.

Step  1 (input). Input target image and reference image.

Step  2 (initialization). Within the ranges of the 6 parameters, randomly initialize a population $P_0$ consisting of $NP$ trial vectors, denoted as $\mathbf{x}_1, \ldots, \mathbf{x}_{NP}$. Each trial vector is made up of 6 real values, represented as $\mathbf{x}_i = (a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4}, a_{i,5}, a_{i,6})$, where $i = 1, \ldots, NP$. Initialize the generation number $g$ to 0.

Step  3 (population evolution)

Step  3.1. DE, as in (8), is executed $NP$ times using the multiplier $F$. A new population $P'_g$ is generated, with members $\mathbf{x}'_i$, $i = 1, \ldots, NP$.

Step  3.2. Randomly choose three vectors in $P'_g$ to undergo three-to-three orthogonal crossover. After evaluating the fitness of the vectors as in (3), the worst three vectors are recorded and replaced with the three new vectors generated by the orthogonal crossover. This results in a new population $P''_g$.

Step  3.3. Evaluate the fitness of the vectors in $P''_g$, choose the best one, and record its fitness value. After that, increment the generation number $g$ by 1.

Step  3.4. If the stopping criterion is satisfied, the algorithm terminates and outputs the best vector $\mathbf{x}^*$; otherwise, return to Step 3.1.

Step  4 (transform the target image). Affine transform the target image as in (5), with $(a_1, a_2, a_3, a_4, a_5, a_6)$ set to the components of $\mathbf{x}^*$, obtaining the transformed target image.

Step  5 (fusion). Fuse the transformed target image and the reference image to generate the result image.

In Step 3, the population is evolved and improved iteratively until the halting criterion is satisfied. One possible halting criterion is to stop when the generation number reaches a given maximum value. In Step 3.1, we must check whether any component of a new vector has gone outside the range of the corresponding parameter; if so, we replace it with a re-initialized value drawn within that range. In Step 3.2, three-to-three orthogonal crossover is executed: an orthogonal array is generated as in Table 2, and the $K$-to-$s$ orthogonal crossover procedure is carried out with $K = 3$ and $s = 3$. In this step, the three worst vectors are eliminated. Experiments show that replacing the three worst vectors with the three vectors generated by orthogonal crossover improves the speed of convergence while keeping the diversity of the population.
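The following Python sketch assembles the whole procedure for a generic cost function. It is a minimal skeleton under the assumptions already noted (random factor assignment in the orthogonal crossover, replacement of the three worst population members), and the control parameters are placeholders.

import numpy as np

def olde(cost, lower, upper, oa, pop_size=30, max_gen=100, F=0.5, rng=np.random.default_rng()):
    # cost: callable to minimize (e.g. negative NMI of reference and transformed target);
    # lower/upper: per-parameter ranges (cf. Table 4); oa: orthogonal array with levels 0..2
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size
    pop = lower + (upper - lower) * rng.random((pop_size, D))       # Step 2: initialization
    costs = np.array([cost(x) for x in pop])
    for g in range(max_gen):                                        # Step 3: population evolution
        for i in range(pop_size):                                   # Step 3.1: DE pass
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            trial = pop[r1] + F * (pop[r2] - pop[r3])
            bad = (trial < lower) | (trial > upper)                 # re-initialize out-of-range parameters
            trial[bad] = (lower + (upper - lower) * rng.random(D))[bad]
            c = cost(trial)
            if c < costs[i]:
                pop[i], costs[i] = trial, c
        # Step 3.2: three-to-three orthogonal crossover on three random vectors,
        # then replace the three worst members of the population
        chosen = pop[rng.choice(pop_size, 3, replace=False)]
        col_of_dim = rng.integers(oa.shape[1], size=D)
        cands = np.array([chosen[row[col_of_dim], np.arange(D)] for row in oa])
        cand_costs = np.array([cost(c) for c in cands])
        worst = np.argsort(costs)[-3:]
        best3 = np.argsort(cand_costs)[:3]
        pop[worst], costs[worst] = cands[best3], cand_costs[best3]
    return pop[np.argmin(costs)]                                    # best transformation parameters

Calling olde with a registration cost such as the one sketched in Section 4.3 and an orthogonal array such as L9 gives the complete registration loop of Steps 1-4.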

5. Experiments and Discussion

To investigate the performance of our method, we have compared it against three image registration methods: genetic algorithm (GA), particle swarm optimization (PSO) [12], and the differential evolution (DE) [30] algorithm.

In all experiments, we fix the size of the population and the maximum allowed number of generations. In the GA, the probability of mutation $p_m$ and the probability of crossover $p_c$ are fixed. In the PSO algorithm, the inertia weight declines linearly from 0.9 to 0.4. In the DE algorithm, we fix the multiplier $F$ and the probability of crossover $CR$. In the OLDE algorithm, we set an initial value of $F$, the multiplier of the DE procedure within the OLDE. The allowed variation ranges of the 6 affine transformation parameters used in our experiments are shown in Table 4. To test the performance of all methods, we ran each method 20 times and recorded the NMI value of the best solution, the worst solution, and the mean value over the 20 runs, together with the standard deviation (see Tables 5 and 6).

5.1. Ottawa Dataset Task

In the first experiment, we select two images from the Ottawa dataset, shown in Figure 1, as the experimental images. These two images are portions of the city of Ottawa acquired by the RADARSAT SAR sensor in May 1997 and August 1997, respectively. They were provided by Defence Research and Development Canada (DRDC), Ottawa. Figure 1(a) shows the image acquired in May 1997 during the summer flooding, and Figure 1(b) shows the image acquired in August 1997 after the summer flooding. The resolution of both images is 290 × 350 pixels with 8 bits per pixel.

In Figure 2, image registration results are shown only for the best of the 20 experiments; that is, for each search method, the image corresponding to the highest NMI value is shown. The matching image is created by superimposing one image on the other. Table 5 records the statistical results for each method, together with the optimal transformation parameter values obtained by each optimization method. Our method obtains the best result among the four methods. In addition, our method achieves a much smaller standard deviation than the other methods; in fact, the standard deviation of 0.0002 for the OLDE is very small. Therefore, the OLDE method is robust with respect to the initial parameter values.

In Figure 3, the NMI values are shown as a function of the generation number. As shown in the figure, the OLDE has a much higher NMI value than the other methods. Thus, our method outperforms all the comparison methods.

5.2. Yellow River Dataset Task

In the second experiment, we use two images acquired by RADARSAT-2 at the Yellow River Estuary region in China in June 2008 and June 2009 as the experimental images. The two images are shown in Figure 4. The resolution of both Figures 4(a) and 4(b) is 600 × 500 pixels with an 8-bit dynamic range.

The numerical results of independently running the optimization algorithms 20 times are shown in Table 6. Also in this case, the OLDE method is the best in terms of the NMI. Our method achieves a much smaller standard deviation than DE, PSO, and GA. Therefore, the OLDE method is robust with respect to the initial parameter values.

In Figure 5, the NMI values are shown as a function of the generation number. As can be observed from the results, as the generation number increases, the performance of PSO and GA is far worse than that of the OLDE and DE methods. In more detail, after the 60th generation, the OLDE method emerges as better than the DE method. Therefore, using OL to guide DE improves the performance of the DE method.

In Figure 6, image registration results are shown only for the best of the 20 experiments; that is, for each of the four optimization methods, the image corresponding to its highest NMI value is included in the figure.

In the third experiment, we also select two images acquired by RADARSAT-2 at the Yellow River Estuary region in June 2008 and June 2009. The two images are shown in Figure 7. The resolution of both images is 400 × 400 pixels with an 8-bit dynamic range.

The numerical results of independently running the optimization algorithms 20 times are shown in Table 7. Also in this case, the OLDE method achieves the largest NMI value and a much smaller standard deviation. In Figure 8, the NMI values are shown as a function of the generation number. In Figure 9, image registration results are shown only for the best of the 20 experiments; that is, for each of the four optimization methods, the image corresponding to its highest NMI value is included in the figure.

6. Conclusion

This paper proposed a method for remote sensing image registration using the orthogonal learning differential evolution (OLDE). The orthogonal learning (OL) strategy can construct a guidance exemplar with an ability to predict promising search directions toward the global optimum. Differential evolution (DE) is a version of evolutionary algorithm (EA) that has proven to be fast and reliable in many applications. The OLDE method uses the OL strategy to guide the DE algorithm to select promising search directions towards the global optimum.

To investigate the performance of our method, we have compared it against three image registration methods: genetic algorithm (GA), particle swarm optimization (PSO), and the differential evolution (DE) algorithm. The OLDE method was shown to be able to achieve the best image registration results in terms of the normalized mutual information similarity metric. Furthermore, experiments showed that the OLDE method was robust and efficient with respect to the initial parameter values.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.