#### Abstract

This paper presents an algorithm for the classification of targets based on the fusion of the class information provided by different imaging sensors. The outputs of the different sensors are combined to obtain an accurate estimate of the target class. The performance of each imaging sensor is modelled by means of its confusion matrix (CM), whose elements are the conditional error probabilities in the classification and the conditional correct classification probabilities. These probabilities are used by each sensor to make a decision on the target class. Then, a final decision on the class is made using a suitable fusion rule in order to combine the local decisions provided by the sensors. The overall performance of the classification process is evaluated by means of the “fused” confusion matrix, i.e. the CM pertinent to the final decision on the target class. Two fusion rules are considered: a majority voting (MV) rule and a maximum likelihood (ML) rule. A case study is then presented, where the developed algorithm is applied to three imaging sensors located on a generic air platform: a video camera, an infrared camera (IR), and a spotlight Synthetic Aperture Radar (SAR).

#### 1. Introduction

The ability to quickly and reliably recognize noncooperative targets is of primary importance for surveillance operations in Homeland Security (HS) applications. The development of efficient fusion strategies and the improvements in the design of more reliable sensors have increased the research interest in classification techniques in many fields. In particular, automatic surveillance systems based on imaging sensors are gaining significant interest, as shown by recent research work [1–3] aimed at improving the quality of image-based surveillance systems. The recognition techniques can include approaches based either on the human interpretation of the data provided by a sensor system or on automatic methods. The Automatic Target Classification (ATC) techniques can use data coming from sensors of different nature, such as infrared, electro-optic cameras, and radar systems. As described in [4], the process of target recognition can be conceptualized as composed of five levels or subprocesses: the detection, that is, the process of distinguishing the target from thermal noise; the discrimination, that is, the capability to extract potential targets from surrounding clutter; the preclassification, that is, a sort of prescreen in order to exclude targets that are not of interest from further processing; the classification, that is, the process during which the targets are characterized as belonging to a specific class according to some particular features; and, finally, the identification, that is, a more sophisticated process which may refer to the determination of the target cooperativeness or to the extraction of more specific features, for example, in a maritime environment, the type or the name of a ship previously associated with a naval class. This work assumes that the first three processes [5] have occurred and is only concerned with the classification process.
In particular, we investigate how data coming from sensors of different nature can be combined to improve the classification task. In ATC the classification task can be accomplished using several approaches. A model-based technique uses a model of the target, obtained, for example, from a Computer-Aided Design (CAD) model or an Electro-Magnetic (EM) simulation [6], to compare the simulated models with the signature of the target under test. The computational load of this methodology can be very high, especially when more than one sensor is used. Another methodology consists of collecting many real versions of the target signature and comparing them with the signature of the current target under test, but in this case a very large database is required and, if the target (or the observing environment) changes significantly, the classification process may be unsuccessful.

In this work a classification algorithm is developed, which uses a different approach, where the information on the target class is provided by imaging sensors of different nature and is expressed by means of a confusion matrix (CM). This approach allows us to overcome the difficulties related to the high computational load of the methodologies described above and to insert the classification task in the analysis and simulation of a large and complex system. The CM is analytically computed for each imaging sensor and it models the performance of the sensor during the classification process. The entries of this matrix are the conditional error probabilities in the classification and the conditional correct classification probabilities. These probabilities amount to the target class likelihood functions and are used to make the decision on the target class by each sensor. The sensor CM is analytically computed as a function of its sensitivity features and its resolution, using a database of prestored reference images. Then a final decision on the class is made, using a suitable fusion rule in order to combine the decisions coming from the different sensors. The overall performance of the classification process is evaluated by means of the “*fused*” confusion matrix, that is, the CM pertinent to the final decision on the target class. Two fusion rules are considered to combine the class information coming from the different sensors: a majority voting (MV) rule and a maximum likelihood (ML) rule. The ultimate purpose of this fusion process is to combine the outputs of the different imaging sensors to obtain an accurate and reliable estimate of the target class. This analytical approach is then applied to a case study, where three imaging sensors located on a generic platform, operating in coastal surveillance, are considered: a video camera, an infrared (IR) camera, and a spotlight Synthetic Aperture Radar (SAR).
The final information on the class, considered by means of the “*fused*” CM, could then be exploited by the system where the sensors are located to perform other surveillance operations, such as the evaluation of the threat level of a noncooperative target.

In [7], different levels of abstraction in the fusion of data coming from different imaging sensors are described: the signal-level fusion is the combination of signals from different sensors, performed before the production of images; the pixel-level fusion consists of merging different digital images; and the feature-level fusion extracts specific features from different images and combines them. The approach developed in this work, using the CM to model the classification capability of the imaging sensor, refers to a higher level of abstraction. A similar approach, where the CM is used to model the classification capability of the sensor, is also used in [8], but there the classification process is used to support the data association and to improve the tracking, especially in the presence of association uncertainty in kinematic measurements. In the literature many applications are proposed where radar images are combined with images from different kinds of sensors [7, 9–11] or where heterogeneous data sets coming from dissimilar imaging sensors are combined at an information fusion level [12]. In [13] we have described the classification algorithm based on the CM applied to visible and infrared images. In the present work three sensors are considered instead of two. In particular, we have considered electromagnetic simulated images from a spotlight SAR, in addition to those from the visible camera and from the IR sensor. The results of a similar case study have already been presented in [14], in relation to three sensors. Herein we present a more complete and methodical description of the algorithm, we give more details about the numerical case study considered, including the figures of the simulated images, and we report in the appendix the full mathematical details of the analytical computation of the CM.

The main contribution of the proposed classification algorithm is the development of a methodology that allows us to emulate and incorporate the classification process in the study and simulation of a complex multisensor system, without increasing the computational load of the overall simulation. In fact, in [15] the proposed algorithm has been inserted into the simulation of a multisensor system for coastal border surveillance, without increasing the computational load of the whole simulator.

The paper is organized as follows. Section 2 describes the classification algorithm, based on the analytical computation of the CM. The fusion of the decisions on the target class coming from different imaging sensors is presented in Section 3. Two decision rules are considered, that is, a majority voting (MV) rule and a maximum likelihood (ML) rule. The performance of the decision rule is described in Section 4. In Section 5 a case study is illustrated, where the developed algorithm is applied to three imaging sensors located on a platform performing in a maritime surveillance scenario: a video camera, an IR camera, and a spotlight SAR. The numerical results for this case study are presented. Finally, in Section 6 some conclusions are drawn. The analytical details of the computation of the CM are reported in the appendix.

#### 2. The Classification Algorithm

The generic entry $c^{(k)}_{ij}$ of the CM $\mathbf{C}^{(k)}$ of the $k$th classifier is the probability that a target belonging to class $i$ is classified as belonging to class $j$:

$$c^{(k)}_{ij} = \Pr\{d_k = j \mid H_i\}, \tag{1}$$

where $H_i$ represents the hypothesis that the target belongs to class $i$ and $d_k = j$ represents the event that the $k$th sensor decides for $H_j$. Thus the $i$th row of the CM represents the event that the true class of the target is $i$, and the class likelihood function for the sensor output $j$ is the $j$th column of $\mathbf{C}^{(k)}$ [8]. The off-diagonal elements of the CM represent the conditional error probabilities during the classification and the diagonal elements are the conditional correct classification probabilities for a given sensor, under the hypothesis $H_i$:

$$c^{(k)}_{ii} = \Pr\{d_k = i \mid H_i\}. \tag{2}$$

Then the correct classification probability for the $k$th sensor is

$$P^{(k)}_{c} = \sum_{i=1}^{M} c^{(k)}_{ii}\, P(H_i), \tag{3}$$

where $M$ is the number of hypotheses (i.e., the number of classes considered), the term $c^{(k)}_{ii}$ is the conditional correct classification probability given by the diagonal element of the CM, and $P(H_i)$ is the probability that the $i$th hypothesis is true. The error probability conditioned to the $i$th class, for the $k$th sensor, is

$$P^{(k)}_{e|i} = 1 - c^{(k)}_{ii} = \sum_{j=1,\, j \neq i}^{M} c^{(k)}_{ij}. \tag{4}$$

The entries of the CM are used to model the performance of each sensor during the classification and to make the decision on the target class. This means that a target of class $i$ detected by the system is declared as belonging to class $j$ with a probability equal to the CM entry $c^{(k)}_{ij}$, and these probabilities are used by the sensor as thresholds for the decision on the class. More specifically, in order to associate a class to an incoming target, a random variable $u$ uniformly distributed in the interval $[0, 1]$ is generated and it is compared with the thresholds given by the entries of the sensor CM:

$$d_k = j \quad \text{if} \quad \sum_{l=1}^{j-1} c^{(k)}_{il} \le u < \sum_{l=1}^{j} c^{(k)}_{il}. \tag{5}$$

This is done to simulate the classification event without generating the data. The simulation of the classification event based on the elements of the CM is shown in Figure 1.
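The simulated classification event can be sketched in a few lines of Python. This is only an illustrative sketch, assuming that the thresholds are the cumulative sums of the CM row of the true class; the function name `simulate_decision` and the numerical row are hypothetical and are not taken from the paper.

```python
import random
from itertools import accumulate


def simulate_decision(cm_row, u=None, rng=random):
    """Draw a class decision from the CM row of the target's true class.

    cm_row[j] is the probability that the sensor declares class j + 1.
    The cumulative sums of the row are used as thresholds on a uniform sample u
    (an assumption of this sketch).
    """
    if u is None:
        u = rng.random()  # uniform sample in [0, 1)
    for j, threshold in enumerate(accumulate(cm_row)):
        if u < threshold:
            return j + 1  # classes are numbered from 1
    return len(cm_row)  # guard against rounding when the row sums to just under 1


# Hypothetical CM row for a target whose true class is 1 (illustrative values only).
row_class_1 = [0.7, 0.2, 0.1]
print(simulate_decision(row_class_1, u=0.65))  # -> 1 (u is below the first threshold)
print(simulate_decision(row_class_1, u=0.85))  # -> 2 (u lies between the first two thresholds)
```

Over many draws, the relative frequency of each declared class approaches the corresponding entry of the row, so no image data needs to be generated.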

In the classification approach described here, the entries of the CM are computed in an approximated closed form by means of an analytical evaluation, whose details are described in the appendix. The parameters required for the analytical evaluation for each sensor are: (i) the signal-to-noise ratio (*SNR*) at the output of the sensor; (ii) the sensor resolution; (iii) a set of reference images stored in a database; and (iv) the cross-correlation between the images of the database. The CM can be expressed as the following function:

$$\mathbf{C} = f(\mathit{SNR}, N_H, N_V, M), \tag{6}$$

where *SNR* is the signal-to-noise ratio, $N_H$ and $N_V$ represent the sensor resolution in terms of number of pixels on the horizontal and vertical planes, respectively, and $M$ is the dimension of the reference database. In order to simplify the analysis, some assumptions are made.

As described in more detail in the appendix, the computation of the entries of the CM in the $i$th row is derived from the computation of the classification error probability for the $i$th class. The error probability is computed in an incremental way, by defining the elementary error event in the space composed of all the possible hypotheses $H_1, \ldots, H_M$ and by adding the contribution of this event to the overall error probability. The partial contributions for the $i$th class are assigned to the off-diagonal elements $c_{ij}$, $j \neq i$. The diagonal elements can be computed as

$$c_{ii} = 1 - \sum_{j=1,\, j \neq i}^{M} c_{ij}. \tag{7}$$

In the case considered here, the dimension of the reference database is equal to the number of classes considered, since the hypothesis of exhaustive database is made. The images of the reference database for each sensor can be derived from a CAD model of the target. The algorithm for the computation of the CM is schematically represented in Figure 2. An example of database construction is mentioned in the case study of Section 5 and described in [13].
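As a small illustration of how a row of the CM is completed and then summarized, the sketch below fills each diagonal element as one minus the sum of the off-diagonal elements of its row and evaluates the correct classification and conditional error probabilities. The function names and the numerical values are hypothetical; the actual off-diagonal contributions come from the analytical evaluation in the appendix, which is not reproduced here.

```python
def complete_diagonal(off_diag):
    """Return a CM whose diagonal is 1 minus the sum of each row's off-diagonal entries.

    off_diag is a square list of lists; its diagonal values are ignored.
    """
    m = len(off_diag)
    cm = [row[:] for row in off_diag]
    for i in range(m):
        cm[i][i] = 1.0 - sum(cm[i][j] for j in range(m) if j != i)
    return cm


def correct_classification_probability(cm, priors=None):
    """Weighted sum of the diagonal entries; equal priors are assumed by default."""
    m = len(cm)
    if priors is None:
        priors = [1.0 / m] * m
    return sum(cm[i][i] * priors[i] for i in range(m))


def conditional_error_probability(cm, i):
    """Error probability given that class i (0-based row index) is the true class."""
    return 1.0 - cm[i][i]


# Hypothetical off-diagonal error contributions for three classes (illustrative only).
off_diag_example = [
    [0.00, 0.20, 0.10],
    [0.10, 0.00, 0.10],
    [0.05, 0.05, 0.00],
]
cm_example = complete_diagonal(off_diag_example)
print([round(cm_example[i][i], 3) for i in range(3)])
print(round(correct_classification_probability(cm_example), 3))
```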

#### 3. Fusion of the Decisions on Target Class

The purpose of the fusion process is to combine the outputs of all the imaging sensors in the system to obtain an accurate and reliable estimate of the target class. As stated before, the performance of the $k$th imaging sensor during the classification process is modelled by means of its confusion matrix $\mathbf{C}^{(k)}$.

The fusion process is described in Figure 3 in the case of $K$ imaging sensors. For simplicity, let us consider $K = 3$ sensors. For each imaging sensor, the CM is analytically computed as described in Section 2 and its entries are used to make a local decision on the class, that is, $d_1$, $d_2$, and $d_3$. Then these local decisions are combined using a suitable decision rule. Thus the observed data is a three-dimensional vector $\mathbf{d} = [d_1, d_2, d_3]$ whose elements are discrete random variables (r.v.) that take values in the set $\mathbf{I}_S = \{1, 2, \ldots, M\}$, where $M$ is the number of classes considered, and represent the decision on the target class coming from each imaging sensor. Moreover, we assume that the elements of $\mathbf{d}$ are mutually independent, that is, the decisions made by different sensors are independent.

Let us consider the set $\mathbf{I}_S^3$, that is, the set of all the observable sequences of three elements that can be constructed with the elements of the set $\mathbf{I}_S$. The dimension of $\mathbf{I}_S^3$ is $M^3$. Our purpose is to map the three-dimensional vectors $\mathbf{d}$ into a scalar value belonging to the set $\mathbf{I}_S$ and representing the final estimated class of the target, that is, $d_f$ in Figure 3. This means that there are $M$ possible hypotheses $H_1, \ldots, H_M$. We assume that these hypotheses have the same a priori probability:

$$P(H_i) = \frac{1}{M}, \quad i = 1, \ldots, M. \tag{8}$$

We indicate with $\varphi(\cdot)$ the fusion rule, that is, the function that maps the observed vector $\mathbf{d}$ into a final decision in favour of one of the hypotheses:

$$d_f = \varphi(\mathbf{d}), \quad \varphi : \mathbf{I}_S^3 \to \mathbf{I}_S, \tag{9}$$

where $d_f = j$ represents the event that we decide in favour of $H_j$. This approach, based on the fusion of the decisions made by each sensor through the CM entries, allows us to manage the combination of information coming from very dissimilar imaging sensors and to compensate for the sensor parameter differences, such as the fields of view, the resolutions, and the noise features. The overall performance of the fusion process can be expressed by means of a “*fused*” confusion matrix, that is, the matrix pertinent to the final decision on the target class $d_f$. Two fusion strategies are investigated and compared in this work: one based on the majority voting decision rule and the other based on the maximum likelihood decision rule.

##### 3.1. Majority Voting Decision Rule (MV)

The majority voting decision rule consists in choosing the target class that occurs most often in the observed sequence. In the case of the three-dimensional sequences considered here, the MV rule can be analytically expressed as follows:

$$d_f = \arg\max_{q \in \mathbf{I}_S} n_q(\mathbf{d}), \tag{10}$$

where $n_q(\mathbf{d})$ is the number of times the value $q$ appears in the sequence $\mathbf{d} = [d_1, d_2, d_3]$, that is, the number of occurrences of the $q$th class in the observed sequence. When $n_{d_k}(\mathbf{d}) = 1$ for $k = 1, 2,$ and 3, that is, when the three local decisions are all different, the MV rule is not applicable. In these cases, we choose the class in favour of which the “*more reliable sensor*” has decided. Note that the more reliable sensor is the sensor for which the conditional correct classification probability, given by the diagonal elements of the CM, is higher. For instance, if a sequence $\mathbf{d}$ with three different entries is observed, the final class will be the $d_k$ for which the diagonal element $c^{(k)}_{d_k d_k}$ of the CM of the $k$th sensor is maximum, for $k$ = 1, 2, 3: we consider the diagonal element $c^{(1)}_{d_1 d_1}$ for the first sensor, $c^{(2)}_{d_2 d_2}$ for the second sensor, and $c^{(3)}_{d_3 d_3}$ for the third one, and we decide

$$d_f = d_{k^*}, \quad k^* = \arg\max_{k \in \{1, 2, 3\}} c^{(k)}_{d_k d_k}. \tag{11}$$

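The observable three-dimensional vectors $\mathbf{d}_m$ are all the possible sequences of three objects that can be made with the elements of the previously defined set $\mathbf{I}_S$. Table 1 shows all the 64 observable sequences. According to the decision rule described by (10), we can construct a fusion table for the MV decision rule, as shown in Table 2. The last column of the table contains the final decision on the target class made according to the MV rule. Using this table, we can construct the “fused” matrix $\mathbf{C}^{(f)}$, after the fusion of the information on the target class:

$$\mathbf{C}^{(f)} = \big[c^{(f)}_{ij}\big], \quad i, j = 1, \ldots, M. \tag{12}$$

The entries of this matrix are

$$c^{(f)}_{ij} = \Pr\{d_f = j \mid H_i\} = \sum_{m \in Z_j} \Pr\{\mathbf{d} = \mathbf{d}_m \mid H_i\}, \tag{13}$$

where $Z_j$ is the decision zone of $H_j$, that is, the set of $m$’s for which we decide in favour of $H_j$. Denoting the fusion rule by $\varphi(\cdot)$, it is defined as

$$Z_j = \{m : \varphi(\mathbf{d}_m) = j\}. \tag{14}$$

The elements of the sum can be computed as

$$\Pr\{\mathbf{d} = \mathbf{d}_m \mid H_i\} = \prod_{k=1}^{3} \Pr\{d_k = d_{m,k} \mid H_i\}, \tag{15}$$

which represents the product of the entries of the CMs of the three sensors:

$$\Pr\{\mathbf{d} = \mathbf{d}_m \mid H_i\} = c^{(1)}_{i\,d_{m,1}}\, c^{(2)}_{i\,d_{m,2}}\, c^{(3)}_{i\,d_{m,3}}, \tag{16}$$

with $\mathbf{d}_m = [d_{m,1}, d_{m,2}, d_{m,3}]$, $m = 1, \ldots, 64$, and $c^{(k)}_{ij}$ denoting the entry in row $i$ and column $j$ of the CM of the $k$th sensor.

Note that the sum with respect to $j$ of the elements in (13), that is, the sum of the elements of each row of the fused CM, is equal to 1 by construction. In fact, the combination of elementary events (i.e., single-sensor decision events) belonging to three distinct probability sets, each of which sums to 1, provides a set of probabilities whose sum is again equal to 1.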
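The construction of the MV fusion table and of the fused matrix can be sketched as follows. This is a minimal sketch for three sensors and four classes, assuming independent sensor decisions; the three matrices `VIDEO`, `IR`, and `SAR` hold hypothetical values chosen only for illustration (they are not the entries of the paper's tables), and the function names are likewise assumptions.

```python
from itertools import product

# Hypothetical 4-class CMs (rows: true class, columns: declared class).
# Illustrative values only, NOT the entries of the sensor CMs in the paper.
VIDEO = [[0.85, 0.10, 0.05, 0.00], [0.10, 0.80, 0.10, 0.00],
         [0.05, 0.10, 0.85, 0.00], [0.00, 0.00, 0.00, 1.00]]
IR = [[0.80, 0.15, 0.05, 0.00], [0.15, 0.75, 0.10, 0.00],
      [0.05, 0.15, 0.80, 0.00], [0.00, 0.00, 0.00, 1.00]]
SAR = [[0.60, 0.25, 0.15, 0.00], [0.25, 0.55, 0.20, 0.00],
       [0.15, 0.25, 0.60, 0.00], [0.00, 0.00, 0.00, 1.00]]
CMS = [VIDEO, IR, SAR]


def mv_decision(d, cms):
    """Majority vote over three local decisions (classes numbered from 1)."""
    for q in d:
        if d.count(q) >= 2:
            return q  # with three sensors only one class can occur twice or more
    # All decisions differ: follow the sensor with the largest diagonal entry
    # for the class it declared (ties go to the first such sensor).
    k_star = max(range(len(d)), key=lambda k: cms[k][d[k] - 1][d[k] - 1])
    return d[k_star]


def fused_cm(cms, rule):
    """For each true class i, add the probability of every observable sequence
    to the column of the class selected by the fusion rule."""
    m = len(cms[0])
    fused = [[0.0] * m for _ in range(m)]
    for d in product(range(1, m + 1), repeat=len(cms)):
        j = rule(d, cms)
        for i in range(m):
            p = 1.0
            for k, dk in enumerate(d):
                p *= cms[k][i][dk - 1]  # product of single-sensor CM entries
            fused[i][j - 1] += p
    return fused


fusion_table = {d: mv_decision(d, CMS) for d in product(range(1, 5), repeat=3)}
C_MV = fused_cm(CMS, mv_decision)
print(len(fusion_table))  # -> 64 observable sequences
print([round(sum(row), 6) for row in C_MV])  # each row sums to 1
```

Every observable sequence is assigned to exactly one decision zone, which is why each row of the fused matrix still sums to 1.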

##### 3.2. Maximum Likelihood Decision Rule

In many applications, the most common approach used to distinguish between two or more hypotheses is based on the *Bayes rule*, which assumes a priori knowledge of the probabilities of the hypotheses under test. The Bayes rule is based on the minimization of the expectation of the cost function $C_{ij}$, defined as the cost assigned to the decision to choose in favour of $H_j$ when $H_i$ is true [16]. The analytical formulation of the Bayesian approach applied to the decision on the target class is

$$d_f = \arg\max_{j = 1, \ldots, M} P(H_j \mid \mathbf{d}). \tag{17}$$

The rule expressed by (17) is called the *M-ary maximum a posteriori probability* (MAP) decision rule, since $P(H_j \mid \mathbf{d})$ is the probability that the hypothesis $H_j$ is the true one after the observation of the data $\mathbf{d}$; thus it is an a posteriori probability. As stated before, this decision rule assumes prior knowledge of the probabilities of the hypotheses.

According to the Bayes theorem, the a posteriori probability can be expressed as

$$P(H_j \mid \mathbf{d}) = \frac{P(\mathbf{d} \mid H_j)\, P(H_j)}{P(\mathbf{d})}, \tag{18}$$

where $P(H_j)$ is the prior probability of the $j$th hypothesis and $P(\mathbf{d})$ is the probability mass function (pmf) of the discrete data $\mathbf{d}$. Since $P(\mathbf{d})$ is a positive function that does not depend on the hypothesis, it does not affect the maximization of $P(H_j \mid \mathbf{d})$. When the hypotheses can be assumed to have equal prior probabilities, the decision rule of (17) can be expressed as

$$d_f = \arg\max_{j = 1, \ldots, M} P(\mathbf{d} \mid H_j). \tag{19}$$

This is called the *M-ary maximum likelihood* (ML) decision rule, since $P(\mathbf{d} \mid H_j)$ is the likelihood function of the $j$th hypothesis. Note that the decision rule (19) provides the minimum error probability only when the prior probabilities are all equal.

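According to the ML rule, in order to decide the final class of the target using the observed data sequences $\mathbf{d}$, we have to choose the hypothesis that maximizes the following probability mass functions:

$$P(\mathbf{d} \mid H_j), \quad j = 1, \ldots, M, \tag{20}$$

where

$$P(\mathbf{d} \mid H_j) = \prod_{k=1}^{3} \Pr\{d_k \mid H_j\}. \tag{21}$$

The elements of the product in (21) are the entries of the confusion matrices $\mathbf{C}^{(1)}$, $\mathbf{C}^{(2)}$, and $\mathbf{C}^{(3)}$, respectively. The joint conditional probability mass function of $\mathbf{d}$ can be expressed as follows:

$$P(\mathbf{d} \mid H_j) = c^{(1)}_{j\,d_1}\, c^{(2)}_{j\,d_2}\, c^{(3)}_{j\,d_3}, \tag{22}$$

where $c^{(k)}_{j\,d_k}$ is the entry in row $j$ and column $d_k$ of the CM of the $k$th sensor. This is shown in Figure 4 for a sample sequence. According to the ML decision rule described above, we can derive a fusion table for all the observable sequences, as shown in Table 3. The last column of the table contains the final decision on the target class made according to (19). Similarly to the case of the MV rule, from this table we can evaluate the fused confusion matrix $\mathbf{C}^{(ML)}$, by using (13) and (14).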
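A compact sketch of the ML fusion rule and of the resulting fused matrix is given below. It assumes three sensors, four classes, independent decisions, and equal priors; the matrices are hypothetical illustrative values (not the paper's sensor CMs), the function names are assumptions, and ties between equally likely hypotheses are resolved in favour of the lowest class index, a choice made only for this sketch.

```python
from itertools import product

# Hypothetical 4-class CMs (rows: true class, columns: declared class).
# Illustrative values only, NOT the entries of the sensor CMs in the paper.
VIDEO = [[0.85, 0.10, 0.05, 0.00], [0.10, 0.80, 0.10, 0.00],
         [0.05, 0.10, 0.85, 0.00], [0.00, 0.00, 0.00, 1.00]]
IR = [[0.80, 0.15, 0.05, 0.00], [0.15, 0.75, 0.10, 0.00],
      [0.05, 0.15, 0.80, 0.00], [0.00, 0.00, 0.00, 1.00]]
SAR = [[0.60, 0.25, 0.15, 0.00], [0.25, 0.55, 0.20, 0.00],
       [0.15, 0.25, 0.60, 0.00], [0.00, 0.00, 0.00, 1.00]]
CMS = [VIDEO, IR, SAR]


def likelihood(d, cms, j):
    """Probability of observing the sequence d when class j (1-based) is true."""
    p = 1.0
    for k, dk in enumerate(d):
        p *= cms[k][j - 1][dk - 1]  # entry (j, d_k) of the k-th sensor CM
    return p


def ml_decision(d, cms):
    """Class with the largest likelihood; ties go to the lowest class index."""
    m = len(cms[0])
    return max(range(1, m + 1), key=lambda j: likelihood(d, cms, j))


def fused_cm_ml(cms):
    """Fused CM: sum the sequence probabilities over each ML decision zone."""
    m = len(cms[0])
    fused = [[0.0] * m for _ in range(m)]
    for d in product(range(1, m + 1), repeat=len(cms)):
        j = ml_decision(d, cms)
        for i in range(1, m + 1):
            fused[i - 1][j - 1] += likelihood(d, cms, i)
    return fused


C_ML = fused_cm_ml(CMS)
p_c_ml = sum(C_ML[i][i] for i in range(4)) / 4  # equal priors assumed
print(ml_decision((1, 2, 2), CMS))  # -> 2
print(round(p_c_ml, 4))
```

With equal priors, no deterministic fusion rule can yield a larger correct classification probability than the ML rule, so `p_c_ml` is at least as large as the value obtained by following any single sensor.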

#### 4. Performance Analysis

The performance of the decision rule can be expressed in terms of closeness of the fused confusion matrix $\mathbf{C}^{(f)}$ to the identity matrix, which represents the ideal case. In fact, an ideal classification process is characterized by probabilities of error (off-diagonal elements of the CM) equal to zero and by probabilities of correct classification (diagonal elements) equal to one, that is,

$$\mathbf{C}^{(f)} = \mathbf{I}_M, \tag{23}$$

where $\mathbf{I}_M$ is the identity matrix of order $M$. The conditional correct classification probabilities for the fused matrix can be expressed from its diagonal elements $c^{(f)}_{ii}$, similarly to those of the CMs of the sensors, $\mathbf{C}^{(k)}$, $k = 1, 2, 3$:

$$P_{c|i} = c^{(f)}_{ii}, \quad i = 1, \ldots, M. \tag{24}$$

The probability of correct classification is then

$$P_c = \sum_{i=1}^{M} P_{c|i}\, P(H_i) = \frac{1}{M} \sum_{i=1}^{M} P_{c|i}, \tag{25}$$

where, in the last part of the equality, we have used the assumption (8) of equal a priori probability for the hypotheses. By replacing (24) in (25), we obtain

$$P_c = \frac{1}{M} \sum_{i=1}^{M} c^{(f)}_{ii} = \frac{\mathrm{tr}\big(\mathbf{C}^{(f)}\big)}{M}, \tag{26}$$

where we have considered that the sum of the diagonal elements represents the trace of matrix $\mathbf{C}^{(f)}$. To evaluate the performance of the fusion process, we consider the probability of correct classification expressed in (26) and we select as the best performing matrix the one for which the probability of correct classification, and then the ratio $\mathrm{tr}(\mathbf{C}^{(f)})/M$, is the nearest to 1, that is, its maximum value. This occurs when the trace of matrix $\mathbf{C}^{(f)}$ at the numerator is close to $M$, which is the trace of the identity matrix. From this point of view, the correct classification probability is an indication of the closeness of matrix $\mathbf{C}^{(f)}$ to the identity $\mathbf{I}_M$.

The same performance criterion can be explained by considering an alternative interpretation. In order to evaluate the closeness of the fused matrix $\mathbf{C}^{(f)}$ to the identity, we can define the following quality factor [14]:

$$Q = 1 - \frac{M - \mathrm{tr}\big(\mathbf{C}^{(f)}\big)}{M} = \frac{\mathrm{tr}\big(\mathbf{C}^{(f)}\big)}{M}. \tag{27}$$

The parameter $Q$ belongs to the interval $[0, 1]$: it is close to 1 when the fused matrix is very close to the identity and it is close to 0 when $\mathbf{C}^{(f)}$ is significantly different from the identity. As we can see by comparing (26) and (27), the parameter $Q$ is equivalent to the probability of correct classification. Thus, the best fused matrix is the one for which this quality factor is nearest to 1, which is also the maximum value of the correct classification probability. The difference at the numerator of the first side of expression (27) represents the sum of the off-diagonal elements $c^{(f)}_{ij}$, $j \neq i$, of $\mathbf{C}^{(f)}$:

$$M - \mathrm{tr}\big(\mathbf{C}^{(f)}\big) = \sum_{i=1}^{M} \sum_{j=1,\, j \neq i}^{M} c^{(f)}_{ij}. \tag{28}$$

This property follows from the fact that the sum of all the elements of the matrix is equal to $M$, since the sum of the elements in each row of the CM is equal to 1.
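The quality factor and the property on the off-diagonal elements can be checked numerically with a short sketch. The function names and the example matrix are hypothetical and serve only as an illustration; the matrix is assumed to be row-stochastic (each row sums to 1).

```python
def quality_factor(cm):
    """Q = 1 - (M - trace) / M, which equals trace / M for an M x M CM."""
    m = len(cm)
    trace = sum(cm[i][i] for i in range(m))
    return 1.0 - (m - trace) / m


def off_diagonal_sum(cm):
    """Sum of all the entries outside the main diagonal."""
    m = len(cm)
    return sum(cm[i][j] for i in range(m) for j in range(m) if i != j)


# Hypothetical 3-class CM whose rows sum to 1 (illustrative values only).
example_cm = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(round(quality_factor(example_cm), 4))    # trace 2.5 over M = 3
print(round(off_diagonal_sum(example_cm), 4))  # equals M - trace = 0.5
print(quality_factor(identity))                # -> 1.0 for the ideal classifier
```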

#### 5. Numerical Case Study

In this section a case study is presented, where the developed algorithm is applied to three imaging sensors located on a generic air platform: a video camera, an infrared camera (IR), and a spotlight Synthetic Aperture Radar (SAR). A numerical example, concerning the classification process performed by the three imaging sensors, is provided, for four classes of naval targets. This case study allowed us to include and test the algorithm proposed in this work inside the simulation of a complex multisensor system, which performs its operations in a realistic scenario for maritime border surveillance. The considered system is notional. The numerical values considered in this example reflect a typical maritime situation, with standard environmental conditions.

##### 5.1. The System and the Scenario

The background of this case study is an integrated multisensor system for coastal surveillance. The focus herein is on the classification process, in particular on the fusion of the target class data coming from different imaging sensors. This is part of a research activity whose final goal is the development of a computer simulator that emulates the main functions of the integrated multisensor system for coastal surveillance (see [15, 17, 18]). The integrated multisensor system is composed of two platforms of multiple sensors: a land-based platform, located on the coast, and an air platform, moving in front of the coast. The land platform is equipped with a Vessel Traffic Service (VTS) radar, an infrared camera (IR), and a station belonging to an Automatic Identification System (AIS) that provides information on the target cooperativeness. The air platform carries an Airborne Early Warning Radar (AEWR), which can operate in a spotlight SAR mode, a video camera, and a second IR camera. The mission of the system is the detection, the tracking, the identification, and the classification of multiple targets that enter a sea region, the assessment of their threat level, and the selection of a suitable intervention on them. The threat evaluation and the selection of the intervention are performed by a command and control centre, which coordinates all the operations of the multisensor system. The threat evaluation logic is based on a deterministic comparison between the target kinematical parameters detected by the two radars and some tolerance thresholds on the speed, on the distance between the target and the coast, and on the direction. This logic also takes into account the class information provided by the imaging sensors inside the system. The three imaging sensors of the air platform are considered herein. After the decision on the target class is made by each imaging sensor, according to the algorithm described in Section 2, the fusion of the decisions is performed as described in Section 3. The information on the class is generally not very reliable at long distances from the air platform, when only the spotlight SAR is active, and it becomes more reliable when the target is closer to the platform, when the video camera and the IR camera are also active.

According to the evaluation logic, a *Threat Level* is assigned to the noncooperative targets in the set {TL0, TL1, TL2}, where TL0 indicates a neutral target, TL1 a suspect target, and TL2 a threat target. The intervention is only selected for the targets assessed as threats and it consists in the allocation of a system resource in order to inspect the nature of the target. Two types of resources are considered here: a helicopter and a patrol boat; both resources are used only for the target inspection [15, 18]. The architecture of the surveillance multisensor system we refer to is shown in Figure 5. The simulated scenario is composed of the geographical area considered, the position of the sensors in this area, the multiple naval targets that enter the scene, and the resources of the system. Four classes of naval targets are considered in this scenario:

(i) *Class 1*: high speed dinghy;

(ii) *Class 2*: motor boat;

(iii) *Class 3*: fishing boat;

(iv) *Class 4*: oil tanker.

##### 5.2. Numerical Results

The CMs of the three sensors have been computed by considering the analytical algorithm described above, for the four classes of naval targets. The analytical computation of the CM requires the setup of a database of reference images. In this numerical example the reference database for each sensor is composed of simulated images; no real data have been considered for now. This database has been constructed using three-dimensional (3D) CAD models of the naval targets considered in the scenario. The same CAD models have been exploited for the construction of the reference database for the video camera, for the IR camera, and for the spotlight SAR. The sizes considered for the naval targets are: (10 × 4.6 × 3) m for the dinghy; (15 × 4.7 × 5.3) m for the motor boat; (16 × 5.3 × 7) m for the fishing boat; and (100 × 33.5 × 37.6) m for the oil tanker. For the video camera the image generation is simply obtained by the projection of the 3D CAD model on the camera focal plane. For the IR camera, the images are simulated using a specific simulation software, the Open-source Software for Modelling and Simulation of Infrared Signatures (OSMOSIS) [19], developed at the Royal Military Academy of Brussels. For the spotlight SAR the CAD models have been processed by software for the simulation of electro-magnetic (EM) images. An example of the simulated images for the dinghy, for a given view angle, is shown in Figures 6(a)–6(c), respectively, for the video camera, the IR camera, and the spotlight SAR. The distance between the sensor and the target is 5 km for the video camera and 1 km for the IR camera, where the temperature information is represented by the gray scale of the images.

Figure 6: Simulated images of the dinghy: **(a)** video camera; **(b)** IR camera; **(c)** spotlight SAR.

The SNR over the single pixel of the reference images has been evaluated by considering the noise level of each imaging sensor. As concerns the electro-optical (EO) sensors, we have considered the Noise Equivalent Illuminance (NEIL) for the video camera and the Noise Equivalent Temperature Difference (NETD) for the IR camera. In both cases we have taken into account that the SNR decreases with the distance between the target and the sensor because of the atmospheric attenuation. For the video camera we have analytically computed the atmospheric extinction coefficient [20], assuming a wavelength value of 550 nm. For the IR camera the extinction coefficient has been computed by LOWTRAN (LOW resolution TRANsmission model) for standard weather conditions: a fixed air temperature, a relative humidity equal to 43%, and a sea state equal to 0. For the evaluation of the SNR in the case of the spotlight SAR, we have considered the radar equation, revisited in order to take into account the SAR geometry [21, 22]. The simulated images of the three sensors refer to the same geometrical and environmental conditions, but the SNR value can differ from one sensor to the other, due to the different nature of the sensors. The classification approach described in this work allows us to compensate for the sensor parameter differences, such as the fields of view, the resolutions, and the noise features.

In this case study we have assumed that the decisions coming from the three sensors are aligned in time. In a future development of this work we will also consider the time misalignment in the decisions, by introducing a delay in the fusion process in order to take into account the sampling rate of the slower sensor.

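The CMs of the three imaging sensors considered in the case study are $\mathbf{C}^{(1)}$ for the video camera, $\mathbf{C}^{(2)}$ for the IR camera, and $\mathbf{C}^{(3)}$ for the spotlight SAR. According to the definition given in (1), the generic entry of the matrix $\mathbf{C}^{(1)}$ is the following probability:

$$c^{(1)}_{ij} = \Pr\{d_1 = j \mid H_i\}. \tag{29}$$

Similarly, the entries of matrices $\mathbf{C}^{(2)}$ and $\mathbf{C}^{(3)}$ are defined as

$$c^{(2)}_{ij} = \Pr\{d_2 = j \mid H_i\}, \quad c^{(3)}_{ij} = \Pr\{d_3 = j \mid H_i\}. \tag{30}$$

Moreover we have:

(i) $\sum_{j=1}^{M} c^{(1)}_{ij} = 1$;

(ii) $\sum_{j=1}^{M} c^{(2)}_{ij} = 1$;

(iii) $\sum_{j=1}^{M} c^{(3)}_{ij} = 1$.

Thus, the conditional correct classification probabilities are $c^{(1)}_{ii} = 1 - \sum_{j \neq i} c^{(1)}_{ij}$ for the video camera; $c^{(2)}_{ii} = 1 - \sum_{j \neq i} c^{(2)}_{ij}$ for the IR camera; and $c^{(3)}_{ii} = 1 - \sum_{j \neq i} c^{(3)}_{ij}$ for the spotlight SAR.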

In the case study a distance between the target and the sensor equal to 10 km and a view angle equal to 180° have been considered. The resulting CMs for the video camera, the IR camera, and the spotlight SAR are reported, respectively, in Tables 4, 5, and 6. These tables show that the least reliable sensor, as concerns the classification of the four targets considered, is the spotlight SAR. On the other hand, this sensor has a larger coverage than the other two sensors. The correct classification probability conditioned to *Class 4* (oil tanker) is always very high, due to the fact that the size of this class of target (100 m) is significantly different from the size of the other targets considered. The fused CMs obtained by the majority voting rule, $\mathbf{C}^{(MV)}$, and by the maximum likelihood rule, $\mathbf{C}^{(ML)}$, are shown in Tables 7 and 8, respectively. From these tables we can observe that the best performing fused matrix, that is, the nearest to the identity matrix, is the one obtained by the ML decision rule, $\mathbf{C}^{(ML)}$. The goodness of the CMs in Tables 4–8 is expressed by means of the probability of correct classification, which is equal to the quality factor $Q$ defined in (27). As shown in Table 9, the value of $Q$ nearest to 1 is that corresponding to the fused matrix $\mathbf{C}^{(ML)}$. The fusion process can provide an improvement in the correct classification probability equal to 3.59% for the MV rule and equal to 5.24% for the ML rule, with respect to the most reliable sensor, that is, the video camera in the numerical example considered here. The value of $Q$ for all the CMs considered in this numerical example is also graphically shown in Figure 7.

#### 6. Conclusions

This work describes a classification algorithm based on the fusion of the class information provided by multiple imaging sensors. The algorithm automatically exploits the a priori knowledge provided by the sensor CM, which models the sensor performance during the classification process. The entries of the CM are the conditional error probabilities and the conditional correct classification probabilities, and each sensor uses them to make its decision on the target class. The CM is analytically computed as a function of the sensor SNR, the sensor resolution, a set of simulated reference images stored in a database, and the cross-correlation between the reference images. A final decision on the class is then made by combining the decisions coming from the three sensors through a suitable fusion rule. Operating the fusion on the single decisions makes it possible to combine information coming from very dissimilar imaging sensors and to compensate for the differences in the sensor parameters. The overall performance of the classification process is evaluated by means of the fused CM, that is, the matrix pertinent to the final decision on the target class. Two decision rules are described in the paper: a majority voting (MV) rule and a maximum likelihood (ML) rule. Finally, the algorithm is applied to a case study in which three imaging sensors, namely a video camera, an IR camera, and a spotlight SAR, are located on a generic platform and operate within a multisensor system for coastal surveillance. The final information on the class is used in the multisensor system as a support to other processes required during the surveillance operation. This methodology allowed us to include the classification process inside the simulation of a complex multisensor surveillance system without increasing the overall computational load [18].

As a final remark, we note that in this analysis we have assumed that a recognition process always occurs. Future developments of the described approach are expected to refine the model, by considering the possibility that the image under test is not contained in the image database and by evaluating the performance of the joint process of recognition and classification.

#### Appendix

#### Analytical Computation of the Confusion Matrix

The generic entry $c_{ij}$ of the CM of a sensor is the probability that a target belonging to class *i* is classified as belonging to class *j*, which is a misclassification when $j \neq i$:

$$c_{ij} = P(\text{class } j \text{ is decided} \mid H_i), \tag{A.1}$$

where $H_i$ represents the hypothesis that the target belongs to class *i*. The computation of the entries of the CM in the *i*th row is derived from the computation of the classification error probability for the *i*th class. The error probability is computed incrementally, by adding the contribution of each elemental error event to the overall error probability: these partial contributions to the error probability for the *i*th class are assigned to the off-diagonal elements $c_{ij}$ of the CM. The diagonal elements, representing the conditional correct classification probabilities, can consequently be computed as

$$c_{ii} = 1 - \sum_{\substack{j=1 \\ j \neq i}}^{M} c_{ij}, \tag{A.2}$$

where $M$ is the number of classes considered. The elemental error event in the classification of an image belonging to a given class is defined through the correlation between the reference images and through their energy differences, compared to the variance of the noise over the single pixel. The error probability can be defined as the average of the conditional error probabilities:

$$P_e = \sum_{i=1}^{M} P(H_i)\, P(e \mid H_i). \tag{A.3}$$

Consider a database of $M$ reference images $\mathbf{I}_i$, $i = 1, \ldots, M$, one for each class, and a received image. These images are represented by matrices whose dimensions depend on the sensor resolution, in terms of pixels along the horizontal and vertical axes. In general, these matrices depend on the target coordinates in azimuth and elevation. In the rest of this appendix this dependence is omitted to simplify the notation. The elements of these matrices are intensities proportional to the optical power received from the target for the video camera and the IR camera, and to the target RCS for the spotlight SAR. In this analysis the following assumptions are made.

(i) The image database is exhaustive, that is, the possibility that the image of the target under test is not contained in the database is not considered.

(ii) The reference images of each database do not contain any source of noise; the noise is added during the analytical computation of the CM.

(iii) The noise added to each image is additive, Gaussian, and independent from pixel to pixel.

Let us divide the observation space into decision zones $Y_k$, $k = 1, \ldots, M$, such that if the received image belongs to the zone $Y_k$ then the hypothesis $H_k$ is declared true. The error probability can be expressed in the two forms (A.4) and (A.5), where the term in (A.4) represents the probability that an image generated by the *i*th class belongs to the decision zone of a different class, thus generating an error in the classification, and the term in (A.5) represents the probability that an image belonging to the decision zone $Y_k$ is generated by the *i*th class, with $i \neq k$, thus generating an error in the classification.
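One possible way to write the two forms described in the text is sketched here. The notation is an assumption introduced for illustration: $\mathbf{R}$ denotes the received image, $P_e$ the overall error probability, and $P(H_i)$ the a priori probability of class *i*. Both expressions sum the joint probability of the events "the image is generated by class *i*" and "the image falls in zone $Y_k$" over all pairs with $k \neq i$, so they are equal.

```latex
\begin{align}
P_e &= \sum_{i=1}^{M} P(H_i) \sum_{\substack{k=1 \\ k \neq i}}^{M}
       P(\mathbf{R} \in Y_k \mid H_i), \\
P_e &= \sum_{k=1}^{M} P(\mathbf{R} \in Y_k) \sum_{\substack{i=1 \\ i \neq k}}^{M}
       P(H_i \mid \mathbf{R} \in Y_k).
\end{align}
```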

A given image belongs to the decision zone $Y_k$ when condition (A.6) holds, that is, when the error probability conditioned on the hypothesis $H_k$ is minimum [23]. Equation (A.6) can be rewritten as (A.7), so that the image belongs to the decision zone $Y_k$ when the set of inequalities in (A.8) is satisfied. These inequalities define the boundaries between the decision zones.
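As a hedged illustration, a minimum-error-probability rule of this kind is commonly written as a comparison of a posteriori probabilities, which Bayes' theorem turns into a comparison of weighted likelihoods. The symbols $\mathbf{R}$ (received image) and $p(\mathbf{R} \mid H_k)$ (likelihood of the received image under $H_k$) are assumptions introduced here and may differ from the notation of (A.6) to (A.8).

```latex
\begin{align}
\mathbf{R} \in Y_k
  &\iff P(H_k \mid \mathbf{R}) \ge P(H_j \mid \mathbf{R})
   \quad \forall j \neq k \\
  &\iff \frac{P(H_k)\, p(\mathbf{R} \mid H_k)}{p(\mathbf{R})}
        \ge \frac{P(H_j)\, p(\mathbf{R} \mid H_j)}{p(\mathbf{R})}
   \quad \forall j \neq k \\
  &\iff P(H_k)\, p(\mathbf{R} \mid H_k) \ge P(H_j)\, p(\mathbf{R} \mid H_j)
   \quad \forall j \neq k .
\end{align}
```

The common factor $p(\mathbf{R})$ is positive and cancels, which is why the boundaries between the zones depend only on the priors and on the likelihoods.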

Since the reference images have finite energy over their range of definition, that is, the sensor field of view, the set of the reference images can be seen as a vector space in which a scalar product is defined through the correlation function between a generic pair of elements of the set. Using the scalar product, we can describe the elements of this vector space by means of their coordinates with respect to an orthonormal basis, constructed by Gram-Schmidt orthonormalization. This representation of the reference images can be used to express the error probability conditioned on a given hypothesis.
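The orthonormalization step can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the scalar product is assumed to be the sum of pixel-by-pixel products (the correlation at zero lag), the function name `gram_schmidt_coordinates` and the tolerance `tol` are hypothetical, and the reference images are random arrays used only as placeholders.

```python
import numpy as np


def gram_schmidt_coordinates(images, tol=1e-12):
    """Return an orthonormal basis (rows) and the coordinates of each image."""
    vectors = [np.asarray(img, dtype=float).ravel() for img in images]
    basis = []
    for v in vectors:
        residual = v.copy()
        # Remove the components along the basis vectors found so far.
        for b in basis:
            residual -= np.dot(residual, b) * b
        norm = np.linalg.norm(residual)
        if norm > tol:  # skip images that are linearly dependent on earlier ones
            basis.append(residual / norm)
    basis = np.array(basis)
    # Coordinates are the scalar products with the basis vectors.
    coords = np.array([basis @ v for v in vectors])
    return basis, coords


# Placeholder reference images, one per class (hypothetical data).
rng = np.random.default_rng(0)
images = [rng.random((4, 4)) for _ in range(3)]
basis, coords = gram_schmidt_coordinates(images)
print(coords.shape)
```

Because the basis is orthonormal, distances and scalar products between images are preserved by the coordinate vectors, so the decision problem can be studied in a space whose dimension is at most the number of classes instead of the number of pixels.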

In the vector space, each image $\mathbf{I}_i$, $i = 1, \ldots, M$, is represented by a vector of coordinates with respect to the orthonormal basis, and the energy difference between two images is given by (A.9) in terms of the components of the corresponding vectors.

Let us consider the vector representing the noise, assumed to be zero-mean, additive, Gaussian, and independent from pixel to pixel with a given variance, together with the elements of the orthonormal Gram-Schmidt basis. The statistics of the noise vector are related to the sensor signal-to-noise ratio and do not depend on the class of the image under test.

Assuming that the noise is Gaussian and that the hypotheses have the same a priori probabilities, the decision criterion of (A.8) can be expressed as (A.10) in terms of the vector representing the received image in the vector space. Equation (A.10) can be further rewritten as (A.11), since two of the scalar quantities involved are equal. Then the error event can be characterized as follows:

(i) The inequality represents the elemental error event in the classification of an image belonging to the *k*th class.

(ii) By defining a normalized vector through the cross energy between an image belonging to the *k*th class and another image belonging to the *j*th class, the conditional error probability can be expressed as the probability of the union of the elemental error events.

(iii) The conditional error probability can then be divided into separate contributions, such that it can be computed by considering two elements in the union at a time and by reiterating $M-3$ times the operation expressed in (A.17).
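To make the role of the elemental error events more concrete, the following Python sketch uses standard results rather than the incremental union computation of the paper. Under the stated assumptions (white Gaussian noise, equal priors) the decision reduces to choosing the reference vector nearest to the received one, and the probability of confusing two given classes is $Q(d/2\sigma)$, where $d$ is the distance between their coordinate vectors and $\sigma$ the noise standard deviation per coordinate. The sketch compares a simple union bound (the sum of the pairwise terms, which overestimates the probability of the union) with a Monte Carlo estimate. All function names, the coordinate values, and the noise level are hypothetical.

```python
import math

import numpy as np


def pairwise_error_probability(s_k, s_j, sigma):
    """P(class j preferred to class k | class k) in white Gaussian noise."""
    distance = np.linalg.norm(np.asarray(s_k, float) - np.asarray(s_j, float))
    # Q(x) = 0.5 * erfc(x / sqrt(2)), evaluated at x = distance / (2 * sigma)
    return 0.5 * math.erfc(distance / (2.0 * sigma * math.sqrt(2.0)))


def union_bound(coords, k, sigma):
    """Upper bound on the error probability for class k (sum of pairwise terms)."""
    total = sum(pairwise_error_probability(coords[k], coords[j], sigma)
                for j in range(len(coords)) if j != k)
    return min(1.0, total)


def monte_carlo_error(coords, k, sigma, trials=20000, seed=0):
    """Estimate the error probability for class k by simulating noisy images."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, float)
    noise = sigma * rng.standard_normal((trials, coords.shape[1]))
    received = coords[k] + noise
    # Squared distance of every noisy vector from every reference vector
    d2 = ((received[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    # Minimum-distance decision; an error occurs when the nearest class is not k
    return float(np.mean(d2.argmin(axis=1) != k))


# Hypothetical coordinate vectors of three reference images.
coords = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
sigma = 1.0
print(union_bound(coords, 0, sigma), monte_carlo_error(coords, 0, sigma))
```

The gap between the two values comes from noise realizations that fall in more than one elemental error event at the same time; accounting for these overlaps two events at a time is what an exact evaluation of the union requires.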

More details about this analytical computation can be found in [18].

#### Nomenclature

| Acronym | Definition |
|---|---|
| AEWR | Airborne Early Warning Radar |
| AIS | Automatic Identification System |
| ATC | Automatic Target Classification |
| C² | Command and Control |
| CAD | Computer-Aided Design |
| CM | Confusion Matrix |
| EM | Electromagnetic |
| EO | Electro-Optical |
| HS | Homeland Security |
| IR | Infrared |
| ML | Maximum Likelihood |
| MV | Majority Voting |
| NEIL | Noise Equivalent Luminance |
| NETD | Noise Equivalent Temperature Difference |
| SAR | Synthetic Aperture Radar |
| TL | Threat Level |
| VTS | Vessel Traffic Service |

#### Acknowledgments

The authors would like to acknowledge Fabian D. Lapierre (Royal Military Academy, Brussels, Belgium) for kindly providing the temperature file exploited for the simulation of the infrared images; Ugo D’Elia and Maria Grazia Del Gaudio (MBDA, Rome, Italy), and Francesco Prodi (SELEX Sistemi Integrati, La Spezia, Italy) for kindly providing the simulated electromagnetic images exploited in this work; Paolo Marrucci (SELEX Galileo, Pomezia, Italy) for his help with the meteorological vectors for IR simulation.