#### Abstract

For real-time acoustic source localization applications, one of the primary challenges is the considerable growth in computational complexity associated with the emergence of ever larger, active or passive, distributed sensor networks. These sensors rely heavily on battery-operated system components to achieve highly functional automation in signal and information processing. In order to keep communication requirements minimal, it is desirable to perform as much processing on the receiver platforms as possible. However, the complexity of the calculations needed to achieve accurate source localization increases dramatically with the size of sensor arrays, resulting in substantial growth of computational requirements that cannot be readily met with standard hardware. One option to meet this challenge builds upon the emergence of digital optical-core devices. The objective of this work was to explore the implementation of key building block algorithms used in underwater source localization on the optical-core digital processing platform recently introduced by Lenslet Inc. This demonstration of considerably faster signal processing capability should be of substantial significance to the design and innovation of future generations of distributed sensor networks.

#### 1. Introduction

Acoustic source localization by means of distributed sensor networks requires very accurate time delay estimation. Also, due to phenomena such as reverberation or environmental additive noise, the intrasensor distance cannot be made very large without reducing the coherence between the signals whose mutual delay has to be estimated. The use of passive sensor arrays for estimating the position of a generic acoustic source represents an old and well-investigated area. Time delay estimation techniques have been applied extensively to this area. Many of these techniques are specific to the geometrical configuration adopted for array placement, thus imposing heavy restrictions on the choice of sensor configuration. For example, in the area of naval surveillance, much attention has focused on adaptive beam-forming, primarily in the context of rigid-geometry towed arrays [1–4]. Recently, however, a great deal of effort has been devoted to the extraction of spatiotemporal information from a matrix of spatially distributed sensors [5]. Some very innovative schemes for the deployment and efficient performance of distributed sensor networks have surfaced. The concept of the vector hydrophone was introduced to capture the vector characteristics of impinging underwater acoustic fields [6]. In contradistinction to conventional arrays, where the Time Difference of Arrival (TDOA) is embedded in the spatial phase offsets of the sensors, here the TDOA is captured through the intrinsic directionality of each component of the vector hydrophone. Consequently, this technology requires no a priori information on the signal frequency, and avoids complications related to possible near-field curvature effects.
Another example is the spatiotemporal inverse filter [7], a focusing technique developed primarily for medical imaging but with clear underwater acoustics applicability, in which the space and time propagation operator relating the signal source to the sensor array is inverted in the Fourier domain. Notwithstanding the considerable progress reported over the years, today's leading paradigms for acoustic source localization still face substantial degradation in the presence of realistic ambient noise and clutter [8]. Consequently, researchers have started focusing on previously unexplored areas to pose novel solutions for signal processing in distributed sensor networks. Some promising new ideas of relevance to distributed sensor-nets are emerging from the field of source localization in multimedia applications [5]. There has also been a rapidly growing interest in near-real-time remote detection and localization of underwater threats using information provided by dynamically evolving sensor networks. This interest has been driven by the requirement to improve detection performance against much stealthier targets using ever larger distributed sensor arrays under a variety of operational and environmental conditions. Figure 1 illustrates a typical distributed sensor network employed for submerged threat detection. The sensor matrix is comprised of randomly placed GPS-capable sonobuoys. The buoys are passive omnidirectional sensors that provide sound pressure measurements of the ambient conditions and of the signal emitted/reflected from the target. A self-localizing sonobuoy field provides a unique mode of underwater target detection in terms of its deployment flexibility, signal acquisition speed, focused ranging, and capability for net-centric information fusion. Once the buoys are placed, the aircraft monitors their transmissions and processes the data to detect, classify, and localize the threat. 
However, demanding calculations need to be performed to achieve source localization, and the computational complexity is known to increase significantly with the size of the sensor array. This increase in complexity may be attributed, for example, to the increasing number of sensor pairs for which correlation functions have to be computed for TDOA estimation. In fact, the development and deployment of acoustic sensors are considered to be less challenging than identifying and implementing the appropriate signal processing algorithms and computing hardware that do not stress the limited power budget of distributed sensor networks. Without the simplifying assumption of regularly placed sensors, the processing power required is substantial and cannot readily be met with standard, off-the-shelf computing hardware.

The Center for Engineering Science Advanced Research (CESAR) at the Oak Ridge National Laboratory is involved in the development and demonstration of exciting unconventional technologies for Distributed Sensor Signal (DSS) processing. The CESAR efforts in the area of DSS processing are driven by the emergence of powerful new processors such as the IBM CELL [9], and the EnLight processing platform recently introduced by Lenslet Inc. The latter, a tera-scale digital optical-core device, is optimized for array operations, which it performs in fixed-point arithmetic at 8-bit precision (per clock cycle). Its peak performance is at least two orders of magnitude faster than the fastest Digital Signal Processor (DSP) available today. The primary objective of this article is to introduce this revolutionary new processor, and to illustrate the utilization of the hardware on a typical algorithm that might be useful in distributed sensor networks. For illustrative purposes, we consider a methodology for locating underwater threat sources from uncertain sensor data, which assumes the availability of wavefront TDOA measurements at each array element of a distributed sensor network. A novel paradigm for implementing the TDOA calculation on an EnLight device is also discussed. The specific goals of this proof-of-concept effort were to demonstrate the ability to achieve required accuracy in the computations and to quantify the speedup achieved per EnLight processor as compared to a leading-edge conventional processor (Intel-Xeon or DSP). A successful demonstration of such ultra-fast signal processing capability will enable the design of building blocks for other processing-heavy distributed sensor applications such as underwater communication and large array beamforming.

This paper begins with a presentation of the key concepts of threat-detection algorithms such as TDOA estimation via sensor data correlation in both time and frequency domains. A brief overview of the EnLight device is also presented along with the above mentioned fundamental concepts. Next, the implementation of TDOA calculations on the EnLight platform is presented with the aid of numerical simulation and actual optical hardware runs. The paper concludes by highlighting the major accomplishments of this research in terms of computational speedup and numerical accuracy achieved via the deployment of optical processing technology in a distributed sensor framework. This paper omits discussions of the statistical nature and hypothesis testing associated with target detection decision. The theory assumes that the received signals are cross-correlated for an estimation of the TDOA which provides a starting point for target-tracking in time, velocity, and space. The algorithm is designed for a single sound source localization using a distributed array of acoustic sensors. Conventional TDOA estimation procedures are used. The major focus of this paper is the time-domain implementation of TDOA estimation although the frequency domain analysis is briefly discussed. The frequency domain counterpart of the analysis, complete with matched filter bank simulation for active sonar platforms detecting both target range and velocity via Doppler-sensitive waveform synthesis and generation, is presented in previous publications by the authors [10, 11]. A shorter version of this paper appeared in [12].

#### 2. Technical Background

##### 2.1. Source Localization in a Moving Sensor Field

Locating/tracking an acoustic target involves the estimation of mutual time delays between the direct-path wavefront arrivals at the sensors. Using an array of multiple sensors, the TDOAs of the received signals are measured. The TDOAs are proportional to the differences in sensor-source range, called range differences. In order to reduce analytical and computational complexity, it is common practice to make a number of critical assumptions for TDOA calculations. Far-field geometry is usually assumed concerning the location of the target, which is justified by the nominally small aperture of the sensor array. This, in turn, allows the use of the plane wave approximation in the design of TDOA algorithms. For intrasensor spacing, a regular grid is considered with a grid resolution in excess of of the target transmission, thus limiting the localization of sources emitting at higher frequencies. In a moving sensor field as depicted in Figure 1, where each individual sensor is subject to random motion, such design assumptions are no longer valid. For dynamically evolving distributed sensor-nets, the sensors may have arbitrary spacing between them, and the aperture of the distributed array may be comparable to the distance to the source. Several acoustic source localization methodologies based on TDOA estimation in distributed sensor-nets are available [13–15]. In [14], an estimate for the source location is found given the TDOAs and the distributed sensor positions using Maximum Likelihood (ML) procedures. The algorithm of Ajdler et al., as presented in [14], consists of two steps. In the first step, TDOAs are estimated and in the second step ML estimation for the source position is performed. After evaluating the Cramer-Rao bound on the variance of the location estimation, and comparison with simulation and experimental results, Ajdler et al. 
demonstrated that a general purpose distributed computing platform presents an attractive alternative to conventional rigid-geometry sensor networks. In an alternative strategy to the ML method, an attempt is made to directly obtain a closed-form solution of the source location [16]. Conventionally, the source location is estimated from the intersection of a set of hyperboloids defined by the range difference measurements and the known sensor locations. The conventional methodologies for the emitter location problem usually include iterative least squares and/or ML estimates as described above. However, closed-form noniterative solutions can be derived that are usually less computationally burdensome than iterative least squares or ML methods. Recently reported results indicate that excellent accuracy can be achieved under minimal operational constraints of sensor noncollinearity using this paradigm [17]. Source localization algorithms based on maximizing the Steered Response Power (SRP) of an array and different variations of SRP such as the SRP-PHAT (PHAse Transform), where a phase transform weight function is used to prefilter noise and reverberation interference, also deserve to be mentioned [18].
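To make the idea of a closed-form, noniterative solution concrete, the sketch below uses one well-known linearization of the range-difference equations. It is an illustrative assumption, not necessarily the formulation of [16] or [17]: the first sensor is taken as the reference, the unknowns are the source position and its range to the reference, and the sensor and source coordinates are hypothetical.

```python
import numpy as np

def locate_source(sensors, range_diffs):
    """Linear least-squares source position from range differences.

    sensors:     (M, 3) sensor positions; sensors[0] is the reference.
    range_diffs: (M-1,) values d_i = |sensors[i] - s| - |sensors[0] - s|.
    """
    ref = sensors[0]
    p = sensors[1:] - ref                      # positions relative to the reference
    d = np.asarray(range_diffs, dtype=float)
    # |p_i - s|^2 = (d_i + R0)^2 with R0 = |s| gives the linear system
    #   2 p_i . s + 2 d_i R0 = |p_i|^2 - d_i^2   in the unknowns [s, R0].
    A = np.hstack([2.0 * p, 2.0 * d[:, None]])
    b = np.sum(p * p, axis=1) - d * d
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:3] + ref

# Hypothetical, noise-free example with 7 sensors (coordinates in metres).
rng = np.random.default_rng(0)
sensors = rng.uniform(-1000.0, 1000.0, size=(7, 3))
source = np.array([350.0, -220.0, 80.0])
ranges = np.linalg.norm(sensors - source, axis=1)
estimate = locate_source(sensors, ranges[1:] - ranges[0])
```

With noisy range differences the same system is solved in the least-squares sense. The auxiliary unknown `R0` is then no longer forced to equal the distance to the estimated position, which is one reason iterative refinements are often applied afterwards.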

Explicitly accounting for uncertainties in model parameters and sensor measurements has been found critical in many areas of science and engineering. Here, the source localization problem could be addressed by adapting the recently developed Nonlinear Optimization Generalized Adjustments (NOGA) methodology [19] that has proven to be highly successful in modeling and uncertainty analysis of complex nonlinear systems. The novelty of the NOGA methodology for threat source localization resides in the fact that it enables simultaneous estimation of uncertain TDOAs and the target location. In order to simultaneously estimate the TDOAs and the threat source coordinates, a Lagrangian optimization of a generalized Bayesian loss function is carried out that simultaneously minimizes the differences between (i) the best estimate responses and the sensor based responses and (ii) the best estimates and the calculated parameters. It is important to note that the NOGA methodology is entirely based on matrix-matrix or matrix-vector multiplication operations. This makes it ideally suited for implementation on array processors such as the EnLight platforms to be described in the following subsection. It is interesting to observe (Figure 2) that most methodologies mentioned above require, as a necessary first step, accurate estimates of TDOAs for each combination of sensor/target to be obtained. Thus, for this proof-of-concept demonstration, effort has focused on TDOA computations.

A signal emanating from a remote source is attenuated and corrupted by noise as it travels through the propagation medium. The signal $s(t)$ is received as $x_1(t)$ and $x_2(t)$ at two spatially distributed sensors. The received signals can be mathematically modeled as

$$x_1(t) = s(t) + n_1(t), \qquad x_2(t) = \alpha\, s(t + D) + n_2(t). \tag{1}$$

Here the signal $s(t)$ and the noises $n_1(t)$ and $n_2(t)$ are assumed to be uncorrelated and $\alpha$ is the attenuation constant. In distributed sensor networks, it is of interest to estimate the delay, $D$. The arrival angle of the signal relative to the sensor axis may be determined from the time delay [20]. One common method of determining the time delay is to compute the cross-correlation function

$$R_{x_1 x_2}(\tau) = E\left[x_1(t)\, x_2(t - \tau)\right], \tag{2}$$

where $E$ denotes expectation. The argument $\tau$ that maximizes (2) provides an estimate of the time delay. Because of finite observation time, however, $R_{x_1 x_2}(\tau)$ can only be estimated. For example, an estimate of the correlation for ergodic processes is given by [21]

$$\hat{R}_{x_1 x_2}(\tau) = \frac{1}{T - \tau} \int_{\tau}^{T} x_1(t)\, x_2(t - \tau)\, dt. \tag{3}$$

It is also possible to extract the time domain function from its frequency domain counterpart, the cross power spectral density $G_{x_1 x_2}(f)$. The cross-correlation between $x_1(t)$ and $x_2(t)$ is related to the cross power spectral density by the following well-known equation [13]:

$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} G_{x_1 x_2}(f)\, e^{j 2 \pi f \tau}\, df. \tag{4}$$

For some applications, it may be necessary to include a frequency weighting filter in the above equation for noise cancellation. In practice, an estimate $\hat{G}_{x_1 x_2}(f)$ of the cross power spectral density can be obtained to yield an estimate of $R_{x_1 x_2}(\tau)$. This is of interest, because $\hat{G}_{x_1 x_2}(f)$ can be computed very fast by the optical-core processor introduced in the sequel. For the purpose of this research, a Generalized Cross Correlation (GCC) method in the frequency domain was implemented on the EnLight device. A time domain analysis, calculating the correlation function directly from the sliding sum of the discrete-time sampled data sequences $x_1[n]$ and $x_2[n]$, was also implemented. These results are presented and discussed in the following sections.
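The relation between the cross-correlation and the cross power spectral density can be illustrated with a short NumPy sketch. It is a minimal floating-point example under stated assumptions, not the EnLight implementation: the source waveform is white noise, no frequency weighting is applied, and the function name, delay, and noise level are hypothetical.

```python
import numpy as np

def tdoa_from_cross_spectrum(x1, x2, max_lag):
    """Return the lag m (in samples) that maximizes sum_n x1[n] * x2[n + m]."""
    n = len(x1)
    nfft = 2 * n                               # zero-pad: circular equals linear correlation
    X1 = np.fft.rfft(x1, nfft)
    X2 = np.fft.rfft(x2, nfft)
    cross_spectrum = np.conj(X1) * X2          # sample cross power spectrum
    r = np.fft.irfft(cross_spectrum, nfft)     # r[m] = sum_n x1[n] * x2[n + m]
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(r[lags])])       # negative lags wrap to the end of r

# Hypothetical test signal: x2 is an attenuated copy of x1 delayed by 17 samples.
rng = np.random.default_rng(1)
n, delay = 4096, 17
s = rng.standard_normal(n)
x1 = s + 0.5 * rng.standard_normal(n)
x2 = 0.7 * np.concatenate([np.zeros(delay), s[:n - delay]]) + 0.5 * rng.standard_normal(n)
estimated_delay = tdoa_from_cross_spectrum(x1, x2, max_lag=64)
```

A weighting filter, where needed, would multiply `cross_spectrum` before the inverse transform.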

##### 2.2. EnLight Optical-Core Processor

Research efforts at Oak Ridge National Laboratory include the feasibility demonstration of high-precision computations for grand challenge scientific problems using the novel, Lenslet-developed, processing platform. is a small factor signal-processing chip () with an optical core. The optical core performs the Matrix-Vector Multiplications (MVM), where the nominal matrix size is . The system clock is . At each clock cycle, multiply-and-add Operations Per Second (OPS) are carried out, which yields a peak performance of trillion operations per second (or TeraOPS). The architecture of such a device provides a strong rationale for using it in matrix-based applications. Due to the inherent parallelism of the architecture, the computational speed increases with the scale of the problem. The scaling penalty of the optical chip is relatively small compared to standard DSP electronics. The TDOA algorithm discussed in this paper was implemented on both the existing prototype hardware and the scaled-up simulator. The prototype board is a proof-of-concept demonstration hardware for the optical processor technology with a reduced size optical core. The hardware is in the development process while the simulator provides the opportunity to examine DSS implementation on this faster platform. Subsequent demonstrations will be carried out on the implementation platform. has an operating clock of . The optical core has input channels, comprised from vertical cavity surface emitting lasers that are configured in groups of per channel. The size of the active matrix is , which is embedded in a larger Multiple Quantum Well (MQW) spatial light modulator of size . Sixty-four light detectors, integrated with an array of analog-to-digital converters, comprise the output channels. The optical core performs the MVM function at the rate of Giga operations per second. 
Each of the 64 data components in the input and output channels has an 8-bit accuracy, which results in a data stream of 30.7 Giga bits per second. Figure 3 shows the prototype board.

#### 3. Numerical Simulation

In mobile target detection schemes, such as active sonar systems, the accurate estimation of TDOA by filtering through severely noisy data is crucial for tracking and target parameter (such as velocity) estimation. To benchmark the EnLight performance, three computer codes were written: one using the Intel Visual FORTRAN-95 compiler, one using the EnLight simulator, and one in MATLAB. The first was needed to enable the fastest possible execution on an Intel IA-32 dual Xeon processor system and to serve as a benchmark for numerical accuracy. The MATLAB code readily interfaces with the software of the simulator, which is used to design the actual algorithm that either runs on the existing hardware platform or is used to project the scaled performance for the full-scale device. In that framework, a number of operational simplifications are made. In particular, the following is assumed: only a single target is present during the TDOA estimation process, the same speed of sound is experienced at each sensor location, each sonobuoy position is known exactly (via GPS) as it drifts, and the measurement errors for TDOAs are zero-mean Gaussian and independent for each sonobuoy. For the TDOA calculation, a set of synthetic data was generated. The sensor-net comprises 10 sonobuoys. Figure 4 shows the projections on two coordinate planes of the 10 sensor locations (in red) and the target position (in blue) used to generate the synthetic data for the TDOA estimation process. It is assumed that only 7 sensors are able to detect the signal emanating from the target. The issue of detection accuracy as a function of the number of sensors is not considered in this article. It is assumed that both the optical processor and the conventional processor use the same number of sensors, and therefore have to calculate the same number of correlations.
Moreover, the issue of whether correlations for all sensor pairs or only for a selected subset should be used is not considered as that issue would face both processors. However, as the optical processor is so much faster (as demonstrated in subsequent sections), the user would have the option of considering (if warranted) a larger number of sensors without including a time-penalty compared to the conventional processor.

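**Figure 4.** (a), (b) Projections of the 10 sensor locations (red) and the target position (blue) used to generate the synthetic data.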

For assessing the accuracy of the EnLight computations, a very simple model is considered. It is assumed that the target emits a periodic pulsed signal with unit nominal amplitude. Pulse duration is 1 SI (Sample Interval) and interpulse period is 25 SIs. The size of one sampling interval is 0.08 seconds. Noise and interference are taken as Gaussian processes with varying power levels (typically up to unity). Each sensor stores sequences of measured signal samples. Sequence lengths can range from 1 K to 80 K samples. The signature from the threat source becomes harder to distinguish as the noise and interference level rises. This contributes to the rationale for using correlation techniques in the source localization process.
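The signal model described above can be sketched as follows. The pulse duration (1 SI) and interpulse period (25 SIs) are taken from the text. The function names, the integer sample delay, and the noise level are hypothetical choices for illustration.

```python
import numpy as np

def pulse_train(n_samples, period=25, width=1, amplitude=1.0):
    """Periodic pulsed signal: `width` samples high every `period` samples."""
    s = np.zeros(n_samples)
    for k in range(width):
        s[k::period] = amplitude
    return s

def sensor_record(n_samples, delay, noise_std, rng):
    """Delayed copy of the pulse train plus zero-mean Gaussian noise."""
    s = pulse_train(n_samples)
    delayed = np.concatenate([np.zeros(delay), s[:n_samples - delay]])
    return delayed + noise_std * rng.standard_normal(n_samples)

rng = np.random.default_rng(0)
clean = sensor_record(1024, delay=9, noise_std=0.0, rng=rng)   # noise-free reference
noisy = sensor_record(1024, delay=9, noise_std=1.0, rng=rng)   # unit-power noise
```

Because the emitted signal is periodic, a correlation peak repeats every 25 samples, so in this sketch delays are only unambiguous within one interpulse period.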

##### 3.1. Numerical Simulation via Frequency Domain Analysis

The simulation comprised two approaches. For the first scenario, calculations were done in the frequency domain and the cross-power spectrum for each pair of sensors was computed from the corresponding finite-length data sequences following the methodology described in (4). Cross Correlations (CC) were calculated in terms of the inverse Fourier transform of the cross-power spectra. The maximum of each CC provided an estimate of the associated TDOA. The required algorithms were implemented both in 64-bit Visual FORTRAN and at 8-bit precision for the EnLight simulator. It was assumed that only 7 of the 10 sensors were able to detect a signal emanating from the acoustic source. Since synthetic data were available, the exact results can be calculated from the definition of the TDOAs according to the relation

$$\tau_{ij} = \frac{\lVert \mathbf{r}_i - \mathbf{r}_s \rVert - \lVert \mathbf{r}_j - \mathbf{r}_s \rVert}{c}. \tag{5}$$

Here $\mathbf{r}_i$, $\mathbf{r}_j$, and $\mathbf{r}_s$ are the spatial coordinates of sensor $i$, sensor $j$, and the source, respectively. The quantity $c$ is the sonic speed (assumed to be identical at all sensor locations). Calculations were carried out using Intel Visual FORTRAN in 64-bit precision. In Figures 5 and 6, the corresponding TDOA values (5) are colored in blue. Sensor pairs are ordered lexicographically on the ordinates, that is, $(1,2)$, $(1,3)$, and so on. Next, the TDOAs were estimated from noise-corrupted data samples collected at each sensor. The correlations were calculated in terms of Fourier transforms and the computations were again carried out using 64-bit Intel Visual FORTRAN. The values of the corresponding TDOAs (4) are colored in brown in Figures 5(a) and 6(a). Next, the distributed sensor data processing was implemented on the EnLight simulator, again using (4). The TDOA values obtained from the simulator are colored in yellow in Figures 5(b) and 6(b). For benchmark purposes, two sets of data were used. Each set corresponds to a different SNR level. These levels were selected to show the break-point of correct TDOA estimation for signals buried in ever stronger noise, when calculations are performed in high precision (floating point).
This also illustrates the occurrence of potential additional discrepancies introduced by the fixed-point, limited-precision EnLight architecture. As observed in Figures 5(a) and 5(b), both the EnLight simulator and the high-precision Visual FORTRAN computations from sensor data produce TDOA estimates that are identical to the exact model results at the first SNR level. Similar quality results were obtained for all sets of equal or higher SNR, and for sequence lengths of at least 2 K samples. Next, a target signal embedded in stronger noise, at the second (lower) SNR level, was considered. Figures 6(a) and 6(b) illustrate the emergence of discrepancies in the calculated values of correlation peaks due to the increased noise level. The TDOA for the sensor pair (2,7) is estimated incorrectly (the wrong correlation peak is selected as a result of noise). Figure 6(b) shows that two discrepancies appear in the EnLight computations at this SNR. The TDOA discrepancy for the sensor pair (2,7) corresponds to the one noted in Figure 6(a) for the 64-bit Visual FORTRAN calculations. Here another error (peak misclassification) is introduced for the sensor pair (4,5). It is a direct consequence of the limited precision used in EnLight. Although the overall quality of the results is exceptional, the items discussed above do provide some indication of the slight limitations in precision exhibited by the EnLight processor.
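The exact reference values used in this comparison follow directly from the geometry: each TDOA is the range difference of a sensor pair divided by the sound speed. The sketch below assumes a sign convention (range to the first sensor of the pair minus range to the second) and a nominal sound speed of 1500 m/s, neither of which is stated in the text. Sensor numbering starts at 1 and pairs are generated in lexicographic order.

```python
import numpy as np
from itertools import combinations

def exact_tdoas(sensors, source, c=1500.0):
    """Exact TDOA for every sensor pair (i, j), i < j, in lexicographic order.

    c is the sound speed in m/s (assumed value, identical at all sensors).
    """
    ranges = np.linalg.norm(np.asarray(sensors) - np.asarray(source), axis=1)
    return {(i + 1, j + 1): (ranges[i] - ranges[j]) / c
            for i, j in combinations(range(len(ranges)), 2)}

# Hypothetical geometry: 7 detecting sensors near the surface, one submerged source.
rng = np.random.default_rng(3)
sensors = np.column_stack([rng.uniform(-2000, 2000, 7),
                           rng.uniform(-2000, 2000, 7),
                           np.zeros(7)])
source = np.array([500.0, -300.0, -150.0])
tdoas = exact_tdoas(sensors, source)
```

For 7 sensors this yields 21 pairs, which is the number of correlation functions each processor has to evaluate.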

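**Figure 5.** TDOA estimates compared with the exact values (blue) at the first SNR level: (a) 64-bit Visual FORTRAN (brown); (b) EnLight simulator (yellow).

**Figure 6.** TDOA estimates compared with the exact values (blue) at the second, lower SNR level: (a) 64-bit Visual FORTRAN (brown); (b) EnLight simulator (yellow).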

##### 3.2. Numerical Simulation via Time Domain Analysis

For the time domain analysis, the cross-correlation $R_{x_1 x_2}[m]$, for two discrete-time sequences $x_1[n]$ and $x_2[n]$ (each of length $N$) of sensor data, is calculated as

$$R_{x_1 x_2}[m] = \sum_{n=0}^{N-m-1} x_1[n]\, x_2[n+m], \tag{6}$$

where $m = 0, 1, \ldots, N-1$. The correlation values obtained from the above equation can be divided by the factor $N - m$ to obtain the estimated mean lagged product [22]. The correlation function was calculated for $x_1[n]$ and $x_2[n]$ sequences, both with length $N = 1024$ and heavily corrupted by zero-mean Gaussian noise. A 128-shift cross-correlation was calculated in MATLAB. Therefore, for the current example, (6) is evaluated with $N = 1024$ for $m = 0, 1, \ldots, 127$. These calculations were also implemented on the actual optical hardware and compared with the MATLAB simulation. Some loss of accuracy is evident due to conversion to 8-bit fixed-point representation on the EnLight hardware. However, the same values of the TDOAs, as identified by the cross-correlation peaks, were obtained as in the MATLAB simulations, even in the presence of significant noise. The hardware implementation scheme, experimental results, and simulation results from MATLAB are presented in the next section and in Figures 7, 9(a), and 9(b).
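The observation that 8-bit data change the correlation magnitudes but not the peak location can be reproduced with a small sketch. It makes several assumptions that are not taken from the text: only the inputs are quantized (accumulation is exact), a simple max-based scaling is used, and the source is a white-noise waveform so that the peak is unambiguous.

```python
import numpy as np

def lagged_products(x1, x2, max_shift):
    """Sliding sum r[m] = sum_n x1[n] * x2[n + m] for m = 0 .. max_shift - 1."""
    n = len(x1)
    return np.array([np.dot(x1[:n - m], x2[m:]) for m in range(max_shift)])

def quantize_8bit(x):
    """Scale to the signed 8-bit range [-127, 127] and round to integers."""
    scale = 127.0 / np.max(np.abs(x))
    return np.round(x * scale).astype(np.int64)

# Hypothetical data: 1024 samples, true delay of 37 samples, noisy sensors.
rng = np.random.default_rng(2)
n, true_delay = 1024, 37
s = rng.standard_normal(n)
x1 = s + 0.5 * rng.standard_normal(n)
x2 = np.concatenate([np.zeros(true_delay), s[:n - true_delay]]) + 0.5 * rng.standard_normal(n)

r_float = lagged_products(x1, x2, 128)                              # high-precision reference
r_fixed = lagged_products(quantize_8bit(x1), quantize_8bit(x2), 128)  # 8-bit inputs
peak_float = int(np.argmax(r_float))
peak_fixed = int(np.argmax(r_fixed))
```

The two results differ in scale, since each input is multiplied by its own scaling factor, but the location of the maximum is what determines the TDOA.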

#### 4. Hardware Implementation

The EnLight processor is ideal for implementing large time series correlation calculations in terms of matrix-vector multiplication operations. The processor works as a matrix-vector multiplier in which a complete MVM operation is performed in each machine cycle. Moreover, a new vector can be presented for multiplication at every machine cycle. For cases where a new vector is multiplied by the same matrix, there is no Input/Output (IO) communication latency in the processing time. Since additional IO time is currently needed to reload an entire matrix memory, there is a strong incentive to avoid algorithm constructs where this would have to be done often, and would thereby create an imbalance between IO and core computation. Changing the entire matrix for every multiply operation would be extremely inefficient, and is in any case an unlikely requirement. Therefore, the matrix is prebuffered or loaded onto the spatial light modulator ("local memory") in order to achieve the required processing speed. The algorithms employed take this into account. The particular scheme for correlation calculation on the EnLight platform depends on the length of the two time series and the maximum correlation shift to be calculated. The loading scheme for the matrix memory and the vector register needs to be modified according to the specifics of the data sets to be manipulated. A detailed description of the hardware loading scheme for the correlation calculation is presented in Figure 7. As shown in Figure 7, the initial step in the calculation is to build a matrix from the first time series, where the sequence length is $N = 1024$. Next, a matrix is built from the second time series, where each row is shifted to the left by one element with respect to the previous row. The end elements are padded with zeros. This scheme is followed for the first 128 rows, as a correlation for a maximum shift of 128 is performed for this example. Rows 129–256 are padded with zeros.
Next, the shifted matrix is partitioned into four submatrices, each with dimension $256 \times 256$, as shown in Figure 7. After both matrices are constructed, they are loaded into the optical hardware. First, the first submatrix is loaded into the EnLight matrix memory. Then the first row of the matrix built from the first time series is loaded into the vector register. The matrix-vector multiplication is then performed. Steps 5-6 are repeated three more times and the products are added at the end to produce the 128-shift correlation. For the example in hand, the data sequence length is 1024, the EnLight matrix size is $256 \times 256$, and the vector register size is $256 \times 1$. Four machine cycles are needed to implement the calculations ($1024 / 256 = 4$). Each matrix-vector multiplication in the optical core takes one machine cycle, so with one processing node a total of four machine cycles is required to complete the entire 128-shift correlation function. If multiple processing nodes are used, then this time is further reduced. The reduced computational complexity of the EnLight processor arises from the fundamental innovation enabled by optics, namely, that an MVM operation, the conventional complexity of which is of order $N^2$ (matrix dimension $N \times N$), can now be performed in order 1. That is, the processor performs a matrix-vector multiplication in a single clock cycle. This is true if the data sequence fits within the matrix memory and vector register. Otherwise more machine cycles are needed (4 in this example). As is evidenced by the previous example, one has to be somewhat aware of the hardware architecture while programming the EnLight device. The optimization of the loading schemes for the matrix memory and vector registers, as dictated by the details of the algorithm, is another area of intellectually stimulating research.
The signal processing flow diagram of Figure 8 outlines the hierarchical structure of software interfaces with the EnLight processing board, where the higher level programming languages such as FORTRAN, C, or MATLAB (current implementation) generate Hardware Description Language (HDL) files and bit-streams via the use of Xilinx Sysgen blocks of the MATLAB/Simulink module to program the FPGAs that access the optical core. As shown in Figure 9, excellent results were obtained using simple data-scaling procedures, without need to invoke (at this point) available [23], more sophisticated techniques for high-accuracy computation with low-precision devices.
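The loading scheme described in this section can be emulated in floating point to check that four matrix-vector products reproduce the direct sliding-sum correlation. This is a sketch of the data layout only. The 256-element core size is inferred from the example (1024 samples processed in four cycles), the function and variable names are hypothetical, and no 8-bit effects or hardware interfaces are modeled.

```python
import numpy as np

def mvm_correlation(x1, x2, max_shift=128, core=256):
    """Correlation r[m] = sum_n x1[n] * x2[n + m] via blocked matrix-vector products."""
    n = len(x1)
    assert n % core == 0 and max_shift <= core
    # Shifted matrix: row m holds x2 shifted left by m elements, zero-padded at the end.
    # Rows max_shift .. core-1 stay zero.
    shifted = np.zeros((core, n))
    for m in range(max_shift):
        shifted[m, :n - m] = x2[m:]
    vectors = np.asarray(x1, dtype=float).reshape(n // core, core)  # one register load per row
    r = np.zeros(core)
    for k in range(n // core):
        block = shifted[:, k * core:(k + 1) * core]   # one core-by-core matrix-memory load
        r += block @ vectors[k]                       # one MVM, i.e. one machine cycle
    return r[:max_shift]

rng = np.random.default_rng(4)
x1 = rng.standard_normal(1024)
x2 = rng.standard_normal(1024)
r_mvm = mvm_correlation(x1, x2)
r_direct = np.array([np.dot(x1[:1024 - m], x2[m:]) for m in range(128)])
```

In this layout the shifted matrix depends on only one of the two series, so it can stay in matrix memory while different vectors are streamed through, which matches the incentive to avoid frequent matrix reloads.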

**Figure 9.** (a), (b) Cross-correlation functions from the hardware runs (red) and the MATLAB simulations (blue).

#### 5. Results and Discussion

The correlation functions were calculated for each sensor pair in the time domain and implemented on the hardware in order to demonstrate the loading scheme discussed in the previous section (also illustrated in Figure 7). As the device does not exist yet, the actual hardware calculations were performed on the prototype board. The extension of the loading scheme to the board is straight-forward but more machine cycles ( times as many) are needed to perform the same calculations. Figures 9(a) and 9(b) compare the MATLAB simulations with the hardware runs. As can be seen, the numerical accuracy (with respect to the correct locations of the cross-correlation peaks) of the hardware runs compares very favorably with the high precision MATLAB simulations. The red plots represent hardware runs and the blue plots represent MATLAB simulations. The axes are expanded for each plot for better visualization of the correlation peaks. Some loss of accuracy in the magnitudes of the correlation functions is evident due to the conversion to an 8-bit precision scheme. This loss of accuracy is due to quantization. However, the locations of the correlation peaks coincide with the MATLAB results for , , , , , and . The simulation and hardware data sets were further compared by calculating the percent difference in the magnitudes of the cross-correlation function as . The values were calculated for the cross-correlation peaks that identify the estimated time delay . Table 1 lists the various values. As can be seen, the values range from () to (). For some applications, these deviations in numerical values may be considered too high. There are many applications where inherent higher numerical precision is needed. These include computation involving relative orientations of objects undergoing multiple translations and rotations, the Gauss method in Linear Algebra, multiscale problems, and so forth. 
However, for the benchmark source localization problem discussed in this paper, the absolute magnitudes of the cross-correlation functions are not important for the accuracy of TDOA estimation. It is the locations of the cross-correlation maxima and the relative ratios of the magnitudes of the maxima that are crucial for the determination of the time delay. The limited-precision EnLight optical-core processor identifies the TDOA values as correctly as the high-precision MATLAB simulation does. In order to take advantage of the processing speed of the optical-core processor for DSS applications, one has to be aware of the device architecture and its limitations. The algorithm also needs to be adapted to circumvent the device limitations. One such circumventing technique is to trade added clock cycles of the processor for higher precision (in bits) [24]. In [24], the authors present schemes to enhance the bit resolution of a charge domain device MVM processor by storing each bit of each matrix element as a separate CCD (Charge Coupled Device) charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. It is possible to extend the accuracy of the EnLight calculations by employing similar advanced parallel data processing techniques as discussed in [23]. However, as has been demonstrated, for properly structured algorithms, the 8-bit native accuracy of the optical chip is not an impediment to accurate underwater source localization. On the other hand, the high processing speed of the EnLight platform offers advantages for DSS applications that are unparalleled by conventional processors. The research presented in this paper identifies optical-core computing devices as ideal signal processing nodes for distributed sensor networks performing real-time target/threat detection and tracking.
Research is also underway to improve the native accuracy of optical-core platforms via improved hardware design. The present work serves as a preliminary investigation of the suitability of optical-core processors as distributed sensor-net compute nodes. In terms of processing speed, benchmark calculations were carried out for Fourier transforms of long signal sequences. In particular, the execution speed of the was compared to that of a computing platform using dual Intel Xeon processors running at and having 1 GB RAM. The benchmark involved the computation of 32 sets of complex samples transforms. For each sample, both the forward and the inverse Fourier transforms were calculated. The measured times were on the dual Xeon system, versus 1.42 ms on the EnLight. This corresponds to a speedup of over on a per-processor basis. More details on these computations can be found in [10, 11].

We have presented an example case in which the correlation lags (Figures 9(a) and 9(b)) are positive. However, (6) may easily be modified to handle negative lags. We refer the reader to [25] for a discussion of calculating negative lags using a modification of the methodology presented here.
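One common way to obtain negative lags from a routine that only computes positive lags is to exchange the two inputs, since for real signals r_xy[-k] = r_yx[k]. The sketch below illustrates this identity. It is not necessarily the modification of (6) discussed in [25], and the function names and the test pulse are assumptions.

```python
def positive_lag_correlation(x, y, max_lag):
    """r_xy[k] = sum_n x[n] * y[n + k] for k = 0..max_lag (positive lags only)."""
    n = len(x)
    return [sum(x[i] * y[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def full_lag_correlation(x, y, max_lag):
    """Correlation for lags -max_lag..max_lag using only the positive-lag routine."""
    positive = positive_lag_correlation(x, y, max_lag)
    # Swapping the inputs gives r_yx[k], which equals r_xy[-k].
    swapped = positive_lag_correlation(y, x, max_lag)
    lags = list(range(-max_lag, max_lag + 1))
    # swapped[max_lag], ..., swapped[1] fill lags -max_lag..-1; positive fills 0..max_lag.
    values = swapped[:0:-1] + positive
    return lags, values


# Hypothetical pulse; x is y delayed by 5 samples, so the peak of r_xy sits at lag -5.
y = [0] * 10 + [1, 3, 5, 3, 1] + [0] * 25
x = [0] * 5 + y[:-5]

max_lag = 10
lags, values = full_lag_correlation(x, y, max_lag)
peak_lag = lags[max(range(len(values)), key=values.__getitem__)]
print("peak lag:", peak_lag)
```

With this sign convention the peak appears at a negative lag because the first input lags the second. Exchanging the roles of the two sensors would move it to the corresponding positive lag.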

#### 6. Conclusion

Distributed sensors with optical computing platforms as onboard devices present an attractive alternative to conventional dedicated sensor arrays. Future advances in DSS signal processing for improved target detection, tracking, and classification in highly noise-corrupted environments can be realized through the development of distributed systems that combine superior sensors and highly efficient computational nodes consisting of optical-core devices such as the EnLight platform. Emerging classes of distributed sensors for naval target detection algorithms employ data/information fusion of diverse transmit waveforms such as Constant Frequency (CF), Linear Frequency Modulation (LFM), and Sinusoidal Frequency Modulation (SFM) [26]. The fusion scheme is not only more robust, but also preferable in terms of detection probability and estimation accuracy. Fusion algorithms are, however, notoriously computationally intensive and demand the use of highly efficient computational platforms. The numerical simulations and hardware implementation presented in this paper constitute the first stage in creating a testbed for evaluating the performance of digital, optical-core processors in facilitating DSS signal processing. Preliminary estimates for the TDOA computation, the core of many source localization algorithms, implemented on an EnLight prototype processor indicate a speedup factor of the order of compared to a dual-processor Xeon system. Combined with its low power requirements (approximately per processor), the projected tera-scale throughput of optical-core processor technology can alleviate critical signal processing bottlenecks of relevance to many distributed sensor-net programs. This, in turn, should enable the efficient implementation of new classes of algorithms not considered heretofore because of their inherent computational complexity, such as asynchronous, multisensor, multitarget tracking under uncertainty of noise characteristics and sensor spatial coordinates. 
Future research in this area will focus on demonstrating the ability to achieve the required speed and accuracy in probabilistic source localization algorithms through the seamless integration of optical-core processors in distributed sensor networks. Efforts will also be made to further quantify the speedup achieved per processor as compared to leading-edge DSP and multicore processors over a broad range of applications, to determine the scaling properties per processor as a function of the number of sensors, and to characterize the SNR gain and detection improvement as functions of various sensor network parameters such as size and geometry.

#### Acknowledgments

The authors acknowledge helpful discussions with Michael Wardlaw (Office of Naval Research), Aviram Sariel (Lenslet), Shimon Levit (Weizmann Institute and Lenslet), and Jeffrey Vetter (ORNL). Primary funding for this work was provided by the Office of Naval Research. Additional support was provided by the ORNL Laboratory Directed Research and Development (LDRD) program. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the US Department of Energy under contract number DE-AC05-00OR22725.