Research Article  Open Access
Jérôme Leclère, Cyril Botteron, René Jr. Landry, PierreAndré Farine, "FFT Splitting for Improved FPGABased Acquisition of GNSS Signals", International Journal of Navigation and Observation, vol. 2015, Article ID 765898, 12 pages, 2015. https://doi.org/10.1155/2015/765898
FFT Splitting for Improved FPGABased Acquisition of GNSS Signals
Abstract
With modern global navigation satellite system (GNSS) signals, the FFTbased parallel code search acquisition must handle the frequent sign transitions due to the data or the secondary code. There is a straightforward solution to this problem, which consists in doubling the length of the FFTs, leading to a significant increase of the complexity. The authors already proposed a method to reduce the complexity without impairing the probability of detection. In particular, this led to a 50% memory reduction for an FPGA implementation. In this paper, the authors propose another approach, namely, the splitting of a large FFT into three or five smaller FFTs, providing better performances and higher flexibility. For an FPGA implementation, compared to the previously proposed approach, at the expense of a slight increase of the logic and multiplier resources, the splitting into three and five allows, respectively, a reduction of 40% and 64% of the memory, and of 25% and 37.5% of the processing time. Moreover, with the splitting into three FFTs, the algorithm is applicable for sampling frequencies up to 24.576 MHz for L5 band signals, against 21.846 MHz with the previously proposed algorithm. The algorithm is applied here to the GPS L5 and Galileo E5a, E5b, and E1 signals.
1. Introduction
The question of computing a circular correlation between a local code replica and an incoming code having a bit sign transition is a recurrent problem in global navigation satellite system (GNSS) [1–9]. This problem, already present with the GPS L1 C/A signal, is even more important with the modern GPS and Galileo signals, because of the higher data rate and the presence of the secondary code that imply a potential bit sign transition in each consecutive period of the primary code. This problem appears with the parallel code search (PCS) acquisition method, where a circular correlation is performed through fast Fourier transforms (FFTs) over one period of the primary code [4].
The straightforward solution to this problem is to at least double the length of the sequences, by using more samples of the input signal (to observe at least two code periods and thus to be sure to observe one code period that is free of sign transition) and by zeropadding the local code replica [4, 9]. However, this method implies using longer FFTs, which increases the processing complexity, and at least half of the calculated samples are discarded, making this solution suboptimal.
Note that this straightforward solution is also a solution to other problems. (1) Still with the PCS, the length of the sequences may need to be increased to satisfy a constraint on the FFT length. For example, if the FFT length must be a power of two and if one code period corresponds to 4000 samples, applying directly zeropadding on both incoming and local sequences to get sequences of 4096 samples will result in losses (in general, the zeros will not be inserted at the same position inside the received and local codes; see [10] for more details). To avoid these losses, at least two code periods should be observed in the incoming or local signal. In the previous example, the incoming sequence would thus be composed of 8000 signal samples padded with 192 zeros, whereas the local code would be composed of 4000 samples padded with 4192 zeros. (2) With another acquisition method known as double block zero padding (DBZP) [11, 12], the incoming signal and the local code are cut in small portions and the correlations are computed on these small portions. However, in order to compute correctly the correlation, each portion of the local code is padded with zeros and correlated with two consecutive portions of the incoming signal, as in the straightforward solution.
In this paper, we propose the use of a method to reduce the zeropadding in order to improve the efficiency. The method is based on the fact that an point FFT can be computed using point FFTs [13, Section 13.47]. In particular, we focus on the computation of an point FFT using three point FFTs or five point FFTs. This method is applied in other areas (this is used, e.g., in new mobile phone long term evolution technology, where 512point FFTs are used to compute a 1536point FFT [14]), but it has not yet been applied to the acquisition of GNSS signals.
The rest of the paper is organized as follows. Section 2 only briefly recalls the problem and the current solutions, since these were already detailed in [4]. Section 3 describes the FFT splitting algorithm. Section 4 evaluates the performance of the proposed algorithm for the acquisition of the L5, E5a, E5b, and E1 signals, in terms of complexity and resources for an FPGA implementation. Finally, Section 5 concludes this paper.
2. Problem and Current Solutions
The signal received from a GNSS satellite contains a spreading code, whose beginning is unknown, and an unknown residual carrier frequency due to the Doppler effect. The aim of the acquisition is to determine the code delay and the carrier frequency for all visible satellites [15]. Among different methods, there is the basic serial search (SS) [16], there are onedimensional parallel searches such as the parallel frequency search (PFS) and the PCS [16], [22, Chap. 2], and there are twodimensional parallel searches such as the DBZP method [11, 12] (which is an extension of the PFS) or the methods proposed in [17, 18]. The parallel code search method seems to be among the most attractive methods for the acquisition of modern signals, because it can compensate the code Doppler (especially important with signals having a high chipping rate and with weak signals requiring a long integration) whereas some other methods (such as those based on the PFS) cannot compensate it [19], and onedimensional parallel methods are less complex to implement than twodimensional parallel methods.
As mentioned in the introduction, the PCS performs a circular correlation between a local code replica of the satellite searched and the received signal using FFTs, usually over the primary code period. Then, extra coherent integration or noncoherent integration can be performed, as shown in Figure 1 (note that the FFT of the local code could be precomputed and stored in a memory to use only 2 FFTs in real time). If the Doppler frequency is correctly eliminated and if there is no sign transition, a peak will appear at the correct code delay, as illustrated in Figure 2(a). However, in the presence of a sign transition between two consecutive code periods (due to the data or the secondary code), there may be a loss depending on the position of this sign transition in the received signal, as shown in Figures 2(b) and 2(c). Nevertheless, note that when the correlation peak decreases at the correct Doppler frequency bin due to a sign transition, other peaks will appear with a lower amplitude at incorrect Doppler frequency bins [8].
(a)
(b)
(c)
2.1. Straightforward Solution (3FFT Solution)
As mentioned in the introduction, the straightforward solution to this problem is to at least double the length of the sequences, by using more samples of the input signal and by zeropadding the local code [9], as shown in Figure 3. In this case, we are sure to observe a complete period of the code in the received signal and thus to obtain a peak that is not attenuated. This peak is in the first half of the correlation output, while the second half contains a peak potentially attenuated (depending on the bit sign transitions). Therefore, the second half is usually discarded, which is not optimal. For the rest of this paper, this solution will be called the 3FFT solution. As mentioned earlier, it would be possible to use only 2 FFTs in real time by precomputing the FFT of the local code and storing it in a memory, but this case is not studied here. However, it is relatively easy to adapt the results of Section 4 to this case.
The L5, E5a, and E5b signals have a spreading code period of 1 ms, and the usual minimum sampling frequency considered is 20.46 MHz (assuming a complex sampling since the signal bandwidth is 20.46 MHz) [4], which is twice the chipping rate (note that in practice this exact frequency is never used because of accuracy problems in tracking and positioning [20]). Therefore, in one code period there are at least 20 460 samples, and in two periods there are at least 40 920 samples. The E1 signal has a code period of 4 ms, and when this signal is processed as a BOC(1,1) signal with the PCS, the usual minimum sampling frequency considered is 6.138 MHz (still assuming a complex sampling since the signal bandwidth is 4.092 MHz) [4], which is six times the chipping rate (the ratio between the sampling frequency and the chipping rate is higher for BOC than BPSK modulation because the autocorrelation functions are different [21]). Therefore, in one period of the code there are at least 24 552 samples, and in two periods there are at least 49 104 samples. Thus, for all these signals, if the FFT length must be a power of two, the 3FFT solution will use 65 536point FFTs, for sampling frequencies between 20.46 and 32.768 MHz for the L5, E5a, and E5b signals and between 6.138 and 8.192 MHz for the E1 signal processed as BOC(1,1). If the sampling frequency is in the lower part of these ranges, this solution is not computationally efficient because it implies a lot of zeropadding and discarded samples.
Note that the sampling frequencies given here are those required for the acquisition. These frequencies can be the ones used by the RF frontend, but it is also possible to have an RF frontend using a higher sampling frequency and to have a resampling performed before the acquisition block (see [13], Chapter 10).
2.2. Other Solutions in the Literature
In the literature, there are different propositions to overcome this problem other than the straightforward solution. For example, some authors proposed performing averaging in the Doppler search space [8]; having two steps to first find the code delay and then the Doppler frequency [6]; or generating two local codes, with and without a transition, requiring the computation of five FFTs instead of the usual three [1, 5]. However, since these methods change the essential correlation operation (the result is not only a usual correlation between the incoming and local signals), there is a negative impact on the signaltonoise ratio (SNR) and thus on the probability of detection compared to the 3FFT solution. Another approach was developed in [3], where it is proposed to perform bit synchronization collectively using several satellites.
In a recent paper [4], the authors proposed an algorithm that reduces the complexity by computing less samples that are further discarded, while not affecting the SNR (because the required correlation samples are computed accurately). With this proposed algorithm, the zeropadding could be reduced, and it was shown that it was possible to compute five FFTs of 32 768 points instead of three FFTs of 65 536 points. Later, a slightly different algorithm was found (still computing five FFTs of 32 768 points but removing the need of a memory to store an intermediate result), using slightly less memory resources for an FPGAbased receiver (see [22, Chap. 5]). Next, this solution will be called the 5FFT solution (as for the 3FFT solution, the FFT of the local code could be precomputed, but we do not consider this case here). Note that this solution has a complexity similar to the algorithms proposed in [1, 5], since they all compute five FFTs of same length. The difference is that our algorithm computes exactly the correlation and thus the detection performance is not affected but the algorithm can be applied only for a certain range of sampling frequencies, whereas the algorithms in [1, 5] affect the detection performance but they can be applied for any sampling frequency; therefore, the best solution depends on the context.
In summary, with the 5FFT solution proposed in [22, Chap. 5], the number of operations is reduced by about 20% compared to the 3FFT solution, which is potentially interesting for digital signal processor based receivers; and the required memory is reduced by about 50% for the same processing time for FPGAbased receivers. However, this algorithm can be used only for sampling frequencies between 20.46 and 21.846 MHz for the L5, E5a, and E5b signals and at most of 5.4615 MHz for the E1 signal (this low sampling frequency implies additional losses due to the higher code step in the acquisition) [22, Chap. 5].
In the next section, a different approach is presented, still with the same goal, namely, computing exactly the correlation and reducing the complexity by decreasing the amount of zeropadding.
3. New Solution Based on Smaller Size FFTs
It is known that an FFT can be computed by combining the results of several smaller FFTs, at the expense of an increase in the number of operations (see [13], Section 13.47). This can be useful for our problem because the zeropadding can be reduced, as shown in this section.
3.1. Computing an FFT Using Three Smaller FFTs
The discrete Fourier transform (DFT) of a sequence of points is defined as with . If we divide the input sequence into three consecutive sections of points (assuming that is a multiple of three), we have Then, changing the index to and in the second and third sum, respectively, we obtain Finally, we can group again the terms in the same sum to obtain
Then, if we divide the output sequence into three sequences of points with a downsampling by a factor three (as done in polyphase decomposition), we obtain with . Therefore an point DFT (or FFT) can be computed using three point DFTs (FFTs), as shown in Figure 4, where the combination block includes the elements to compute the operation in parentheses in (5). There are different ways to compute these operations, discussed in Appendix A. The same development can be done for an inverse DFT, providing a symmetric schematic.
A similar development can be done for the computation of an point FFT using five point FFTs (assuming that is a multiple of five). The computation of the combinations for this case is discussed in Appendix B.
3.2. Application to the GNSS Signal Acquisition Problem
Using the developments of the previous section to compute one FFT using three FFTs, it is thus possible to compute a 49 152point FFT using three 16 384point FFTs. Thus, using this approach for the acquisition of the L5, E5a, E5b, and E1 signals, the sequences would need to be zeropadded only up to 49 152 (much less than 65 536). The reduction of the zeropadding directly reduces the complexity of the acquisition, as shown in the next section, with great benefits for its implementation. Next, this solution will be called the 9FFT solution.
This method works if the initial length of the sequence is lower than or equal to 49 152, which corresponds to a maximum sampling frequency of 24.576 MHz for the L5, E5a, and E5b signals and to 6.144 MHz for the E1 signal. Therefore, the range of applicability of this solution is larger than that of the 5FFT solution (21.846 MHz for the L5, E5a, and E5b signals, 5.4615 MHz for the E1 signal [4], [22, Chap. 5]), and the proposed algorithm does not introduce any loss with the E1 signal.
Similarly, it would be possible to compute a 40 960point FFT using five 8192point FFTs. The sequences would need to be zeropadded only up to 40 960; that is, only 20 zeros would be padded. However, this method works if the initial length of the sequence is lower than or equal to 40 960, which corresponds to a maximum sampling frequency of 20.48 MHz for the L5, E5a, and E5b signals and to 5.120 MHz for the E1 signal. Therefore, the range of applicability in this case is very small, and it implies a loss for the E1 signal. Next, this solution will be called the 15FFT solution.
Remember that the sampling frequencies given are at the acquisition stage. The actual sampling frequency used by the RF frontend can be higher if a resampling is performed before the acquisition.
It is clear that it is not interesting to use higher splitting in our case. Indeed, an FFT can also be split into seven smaller FFTs; however, with seven 4096point FFTs we can compute a 28 672point FFT, which is much below the length of our signal (40 960 or 49 104); and with seven 8192point FFTs we can compute a 57344point FFT, which is much higher than the length of our signal; therefore, such a solution would be less efficient (because there would be more zeropadding and because the combinations of the FFTs inputs would become much more complex), and the only advantage is that such solution could be used for higher sampling frequency, for example, up to 28.672 MHz for the L5, E5a, and E5b signals.
4. Comparison of the Solutions
In this section, we compare the 3FFT solution, the 5FFT solution (proposed in [22, Chap. 5]), and the two solutions proposed in this paper: the 9FFT and 15FFT solutions. The schematics of these solutions are given in Figure 5, except for the 15FFT solution, which can be easily obtained looking at the 9FFT implementation. As mentioned previously, for each solution, the FFTs of the local code could be precomputed and stored in a memory to reduce the actual number of FFTs; however, in this paper we do not consider these cases, although the reader can transpose the results easily to these cases.
(a)
(b)
(c)
First, we review the sampling frequency range applicable to the different solutions. Then, we compare the complexity by evaluating the number of operations, which can be interesting for a digital signal processor based receiver. Finally, we compare the resources for an FPGAbased receiver. Remember that the 5FFT solution has a complexity similar to the algorithms proposed in [1, 5] (since they all compute five FFTs of same length); thus, the following results regarding the complexity and the FPGA implementations for the 5FFT solution are valid for these algorithms too.
4.1. Sampling Frequency
As mentioned in the previous sections, each algorithm can be applied for a certain range of sampling frequency. These ranges are summarized in Table 1.
 
A indicates that there is a loss due to a higher code step during the acquisition. 
Note however that these ranges are for an exact computation of the code correlation, and it is possible to use a higher sampling frequency at the cost of a potential loss. For example, it would be possible to use a sampling frequency of 26 MHz with the 9FFT solution. In this case, two code periods would correspond to 52 000 samples, and by removing the last 2848 samples the circular correlation can be performed on the 49 152 remaining samples. This can lead to a little loss, depending on the received code delay. If the delay of the code is between 0 and 23 152 (49 152–26 000), there will be no loss. If the delay is higher, a portion of the local code will not be aligned with the received code, introducing a loss. In the worst case of this example, 2848 samples will not be aligned, leading to a loss of about 1 dB. Of course, the higher the sampling frequency is, the higher the potential loss becomes.
4.2. Complexity Analysis
To evaluate the implementations’ complexity, the number of complex multiplications and additions is computed. For this, we consider that an point FFT requires complex multiplications and complex additions [13].
The summary of the complexity analysis is given in Table 2. For the 9FFT solution, one complex multiplier and seven complex adders are considered for each combination (see Appendix A), and, for the 15FFT solution, four complex multipliers and 24 complex adders are considered for each combination (see Appendix B).
 
The percentage is the reduction compared to the 3FFT solution. 
Therefore, for a digital signal processor based receiver, the 9FFT solution is not significantly better than the 5FFT solution, but the 15FFT is (20% less multiplications and 11% less additions).
4.3. FPGA Implementation and Analysis
To compare the implementations on an FPGA, we compare the resources in terms of logic elements, memory size, and number of multipliers, when the implementations provide approximately the same processing time. For this evaluation, we consider the Altera Stratix V FPGA family [23] and the FFT core provided by Altera [24], with the streaming data flow and 16 bits of resolution for the input, output, and twiddle factor [24].
The resources of the different elements of the implementations have been evaluated separately. The FFTs resources have been estimated after place and route of a design containing one FFT with the Quartus II software 14.0, the adders and multipliers are estimated using models given in [19] (adapting the models from the Stratix III to the Stratix V FPGA family), and the control signals have been neglected. Note that this analysis is an estimation, which can slightly fluctuate depending on the FPGA family, on the complete design inside the FPGA, and on the FFT parameters.
For the 3FFT solution, we consider the implementation given by Figure 5(a). For the other solutions, the direct FPGA implementation of Figures 5(b) and 5(c) would lead to an increase of the resources and to a decrease of the processing time by factors 2 and 4, respectively (which is the ratio in the FFTs length) [22, Chap. 4 and 5]. Therefore, we will consider time multiplexing for the implementations to use fewer FFTs and to obtain approximately the same processing time as the 3FFT FPGA implementation of Figure 5(a).
For the 5FFT solution, the timemultiplexing implementation is given in Figure 6, where is computed first and is computed next. Since the FFTs length is half the FFT length of the 3FFT solution and that the FFTs are used twice to get a complete correlation result, the processing time is the same as with Figure 5(a) (see [22, Chap. 4 and 5] for more details).
For the 9FFT solution, the timemultiplexing implementation is given in Figure 7. First, the three sections of and are accessed in parallel and combined to obtain and , respectively; their circular correlation is computed with the FFTs; and the result is stored in a memory. Then, this is repeated with and , which are multiplied by a complex exponential before the FFTs; the IFFT output is also multiplied by a complex exponential to obtain , also stored in memory. Finally, this operation is repeated with and ; when is available, the memories are read, and , , and are combined to obtain the three sections of the correlation output. With this implementation, since the FFTs length is divided by four compared to the 3FFT solution and the FFTs are used three times to get a complete correlation result, the processing time is 75% the one of Figures 5(a) and 6.
The same approach is considered for the 15FFT solution. In this case, the FFTs length is divided by eight compared to the 3FFT solution, and the FFTs are used five times to get a complete correlation result; therefore, the processing time is 62.5% the one of Figures 5 and 6.
The resources used by each implementation are given in Table 3, in terms of adaptive logic module (ALM, element containing a bit of logic and four registers), memory blocks of 20 Kibit (i.e., of bits, denoted M20K), and digital signal processing (DSP) blocks (containing two 18bit hardware multipliers); see [23] for more details about the elements inside Stratix V FPGAs. The details of the resource’s estimation can be found in Table 4.


In Table 3, which summarizes the resources, it can be verified that the implementation of the 5FFT solution uses less than half the memory of the 3FFT solution (the reduction is 56.6% whereas it was about 50% in [22] because of the different FPGA families considered) and slightly less logic.
With the proposed 9FFT solution, the memory is again reduced, by 39.8% compared to the 5FFT solution; however, the logic is slightly higher (+6.4%, in part because the 16 384point FFT requires more logic than the 32 768point FFT), and the DSP blocks is also higher (+21.1%, because of the multiplication with the complex exponentials and the product between the FFTs outputs). However, the memory saving is more valuable than the increase of the DSP blocks (except in the cases where the overall system implemented into the FPGA strongly limits the number of DSP blocks available). Indeed, the minimum number of DSP blocks in a Stratix V FPGA is 256 [23]; therefore, the 8 additional DSP blocks represent 3.13% of the total available. For the same FPGA, the memory can be between 957 and 2560 M20K blocks. Therefore, the 308 M20K blocks saved (equivalent to about 6 Mbit) represent between 32.2% and 12.0% of the total available. As for the logic, the 531 additional ALMs represent less than 1% of the total available (minimum of 128 300 ALMs). Thus, it can be concluded that the gain of memory is much more valuable than the loss of other elements.
With the proposed 15FFT solution, the memory is further reduced, by 63.9% compared to the 5FFT solution, but the number of DSP blocks is again increased. However, the number of additional DSP blocks represents at most 5.5% of the total available in a Stratix V FPGA, whereas the number of M20K blocks saved represents between 20% and 53% of the total available.
It can be noted that, compared to the 3FFT algorithm, the memory used by the 9FFT and 15FFT algorithms is, respectively, 3.8 and 6.4 times smaller, which represents a huge amount of memory saved, and allows the implementation of the algorithm in the smallest of the Stratix V FPGAs.
5. Conclusions
This paper discussed the problem of the complexity of a circular correlation computed by FFTs when two code periods are used. This problem appears in the parallel code search acquisition of GNSS signals, when we want to manage the sign transitions due to data or secondary code or when we need to zeropad the signals to reach a specific FFT length (e.g., a power of two).
With the structure of new GPS L5 and Galileo E5a, E5b, and E1 BOC(1,1) signals, the straightforward solution to this problem uses three 65 536point FFTs (3FFT solution) for sampling frequencies between 20.46 and 32.768 MHz for the L5, E5a, and E5b signals and between 6.138 and 8.192 MHz for the E1 signal processed as BOC(1,1). However, for sampling frequencies in the lower part of these ranges, this solution is not computationally efficient because it implies a lot of zeropadding.
Previously, in [4, 22], we proposed an alternative algorithm that computes exactly the output samples (i.e., without any approximation), but using five 32 768point FFTs (5FFT solution). The main benefits were a reduction of the number of operations of about 20% and a reduction of the memory resources of about 50% for an FPGA implementation compared to the 3FFT solution, without increasing other resources (logic and multipliers) nor the processing time. However, the 5FFT solution can be used only for sampling frequencies between 20.46 and 21.846 MHz for the L5, E5a, and E5b signals and at most of 5.4615 MHz for the E1 signal (this low sampling frequency implying additional losses due to the higher code step in the acquisition).
In this paper, we proposed two algorithms that still compute exactly the output samples, one using nine 16 384point FFTs (9FFT solution) and one using fifteen 8192point FFTs (15FFT solution), respectively. These algorithms exploit the fact that an FFT can be computed using FFTs of length times smaller. Therefore, using three 16 384point FFTs, it is possible to compute a 49 152point FFT and thus to reduce the zeropadding necessary compared to the previous algorithms. Likewise, with five 8192point FFTs, it is possible to compute a 40 960point FFT.
Compared to the 5FFT solution, the 9FFT solution has a slightly lower number of operations, and for an FPGA implementation the processing time is reduced by 25%, and the memory is reduced by about 40% (which means a reduction of about 75% compared to the straightforward algorithm). In return, there is a small increase of the logic and DSP blocks (+6% and +21%, resp.) but, compared to the resources available in an FPGA, the memory saved represents more than the DSP blocks lost. Moreover, this algorithm can be used for sampling frequencies between 20.46 and 24.576 MHz for the L5, E5a, and E5b signals and between 6.138 and 6.144 MHz for the E1 signal, that is, larger ranges compared to the 5FFT solution. Therefore, this algorithm is overall more efficient and more versatile than the 5FFT solution.
Regarding the 15FFT algorithm, compared to the 5FFT solution, the number of operations is reduced (20% less multiplications and 11% less additions), which is interesting for software defined DSP based receivers. For an FPGA implementation, the processing time is reduced by 37.5% and the memory is reduced by about 64%, for an increase of the logic and DSP blocks, but again this increase is small compared to the resource available in an FPGA. However, the sampling frequency range is more limited since it should be between 20.46 and 20.48 MHz for the L5, E5a, and E5b signals and at most of 5.12 MHz for the E1 signal, implying additional losses due to the higher code step in the acquisition.
In conclusion, two algorithms were proposed that reduce significantly the processing time and the memory resources compared to previously proposed algorithms, one providing better performance than the other but for a limited range of sampling frequency. Therefore, the context will indicate which one is the most interesting. Note also that the mentioned sampling frequencies are those at the acquisition stage; the actual sampling frequency used by the RF frontend can be higher if a sample rate conversion is performed before the acquisition.
Appendices
A. Combinations of the FFT Inputs for the 9FFT Algorithm
When computing an FFT of points using three FFTs of points, we need to perform the following combinations between the sections of the input: with . Using a matrix notation, this gives This section details different ways to compute these combinations and discusses their FPGA implementation.
A.1. Methods to Reduce the Number of Multipliers
The direct computation of these combinations implies 4 complex multipliers and 6 complex adders, that is, 16 real multipliers and 20 real adders (assuming that a complex multiplication requires 4 real multiplications and 2 additions).
However, it is possible to exploit the fact that and are multiplied by a complex number and its conjugate. Indeed, denoting and , the real and imaginary part of , respectively, we have Therefore, these two complex products can be computed using only 4 real multipliers (counting the multiplication by ) and 4 real adders. Thus (A.1) can be computed with 8 real multipliers and 20 real adders.
There is a third option to reduce the complexity, by exploiting the fact that and are both multiplied by the same complex number and its conjugate. It is well known that the following operation can be computed as requiring 2 multiplications and 6 additions (not counting the factor ) instead of 4 multiplications and 2 additions. Applying this to our combination, we have which requires only 4 real multiplications (counting the multiplication by ) and 8 real additions. Thus, (A.1) can be computed with 4 real multipliers and 16 real adders.
A.2. FPGA Implementation
First, implementing (A.6) in an FPGA requires only 2 real multipliers instead of 4, because the multiplication by can be performed by a right shift of the bits of the signal. Thus, the combinations of the sections of can be implemented with two real multipliers and 16 real adders.
For the local code replica , two properties can be used to further reduce the resources; it is binary and real. Since is binary, the first sum and difference in (A.6) can be neglected, and the multiplication by can be replaced by a simple change of sign according to the value of the sample (which is equivalent to an adder in terms of resources). Since is real, only 3 real adders are thus needed to compute (A.6). Therefore, the combinations of the sections of can be implemented with 7 real adders only. Note that this is a basic evaluation; for example, the adders used to compute have a much lower resolution than the others since the signals added together are binary, so they will not consume the same amount of resources. Anyway, these adders represent a very little percentage of the total resources used as shown in Table 4.
It is also possible to exploit more the characteristics of the signals, although this would not allow a significant reduction of the resources. Indeed, if the second half of contains only zeros, contains only zeros, and therefore the first sum and difference in (A.6) (neglected in the previous paragraph) are not needed, and the combinations could be simplified by removing one adder for the computation of . Then, if we do not care about the second half of the output , it is not necessary to compute , which would remove four adders.
B. Combinations of the FFT Inputs for the 15FFT Algorithm
When computing an FFT of points using five FFTs of points, we need to perform the following combinations between the sections of the input: with . The direct computation of this equation implies 16 complex multipliers and 20 complex adders, that is, 64 real multipliers and 72 real adders. However, rewriting the equation as follows we find again the pattern present in (A.2). Applying the third option described in Section A.1, (B.2) can be computed with 16 real multipliers and 52 real adders.
Regarding the FPGA implementation, now neither the real nor the imaginary part of the complex exponentials is a rational number. Therefore, 16 real multipliers and 52 real adders are needed for the combinations of sections . The same amount is needed for the combinations to get the sections of (although 8 real adders could be saved since the last two sections of are not wanted).
For , since it is binary and real, only 4 real adders are needed to compute the operation similar as (A.6). Therefore, the combinations of the sections of can be implemented with 28 real adders only.
C. Details of the FPGA Resources
The details of the FPGA resources for the different solutions are given in Table 4.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors thank MarcAntoine Fortin for its suggestions that improved this paper.
References
 C. Zhu and X. Fan, “Weak global navigation satellite system signal acquisition with bit transition,” IET Radar, Sonar and Navigation, vol. 9, no. 1, pp. 38–47, 2015. View at: Publisher Site  Google Scholar
 M. Foucras, B. Ekambi, F. Bacard, O. Julien, and C. Macabiau, “Optimal GNSS acquisition parameters when considering bit transitions,” in Proceedings of the IEEE/ION Position, Location and Navigation Symposium (PLANS '14), pp. 804–817, IEEE, Monterey, Calif, USA, May 2014. View at: Publisher Site  Google Scholar
 T. Ren and M. Petovello, “Collective bit synchronization for weak GNSS signals using multiple satellites,” in Proceedings of the IEEE/ION Position, Location and Navigation Symposium (PLANS '14), pp. 547–555, Monterey, Calif, USA, May 2014. View at: Publisher Site  Google Scholar
 J. Leclère, C. Botteron, and P.A. Farine, “Acquisition of modern GNSS signals using a modified parallel codephase search architecture,” Signal Processing, vol. 95, pp. 177–191, 2014. View at: Publisher Site  Google Scholar
 S. Jeon, H. So, G. Kim, C. Kee, K. Kwon, and S. W. Choi, “Analysis of GNSS signal acquisition methods for the bittransition problem for a single code period,” Transactions of the Japan Society for Aeronautical and Space Sciences, vol. 56, no. 1, pp. 31–41, 2013. View at: Publisher Site  Google Scholar
 K. Sun and L. Lo Presti, “Bit sign transition cancellation method for GNSS signal acquisition,” Journal of Navigation, vol. 65, no. 1, pp. 73–97, 2012. View at: Publisher Site  Google Scholar
 D. Borio, “Msequence and secondary code constraints for GNSS signal acquisition,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 928–945, 2011. View at: Publisher Site  Google Scholar
 L. Lo Presti, X. Zhu, M. Fantino, and P. Mulassano, “GNSS signal acquisition in the presence of sign transition,” IEEE Journal on Selected Topics in Signal Processing, vol. 3, no. 4, pp. 557–570, 2009. View at: Publisher Site  Google Scholar
 D. Borio, M. Fantino, and L. Lo Presti, “The impact of the Galileo signal in space in the acquisition system,” in Satellite Communications and Navigation Systems, E. Del Re and M. Ruggieri, Eds., Signals and Communication Technology, pp. 151–167, Springer, New York, NY, USA, 2008. View at: Publisher Site  Google Scholar
 J. Leclère, C. Botteron, and P.A. Farine, “Resource and performance comparisons for different acquisition methods that can be applied to a VHDLbased GPS receiver in standalone and assisted cases,” in Proceedings of the IEEE/ION Position, Location and Navigation Symposium (PLANS '10), pp. 745–751, Indian Wells, Calif, USA, May 2010. View at: Publisher Site  Google Scholar
 N. I. Ziedan, “Acquisition and fine acquisition of weak GPS L2C and L5 signals under high dynamic conditions for limitedresource applications,” in Proceedings of the 18th International Technical Meeting of the Satellite Division of the Institute of Navigation (ION GNSS '05), pp. 1577–1588, September 2005. View at: Google Scholar
 M. Foucras, O. Julien, C. Macabiau, and B. Ekambi, “A novel computationally efficient Galileo E1 OS acquisition method for GNSS software receiver,” in Proceedings of the 25th International Technical Meeting of the Satellite Division of the Institute of Navigation (ION GNSS '12), pp. 365–383, Nashville, Tenn, USA, September 2012. View at: Google Scholar
 R. G. Lyons, Understanding Digital Signal Processing, PrenticeHall, 3rd edition, 2010.
 C. Yu and M.H. Yen, “Areaefficient 128 to 2048/1536point pipeline FFT processor for LTE and mobile WiMAX systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 9, pp. 1793–1800, 2014. View at: Publisher Site  Google Scholar
 F. van Diggelen, AGPS: Assisted GPS, GNSS, and SBAS, GNSS Technology and Applications Series, Artech House, 2009.
 K. Borre, D. M. Akos, N. Bertelsen, P. Rinder, and S. H. Jensen, A SoftwareDefined GPS and Galileo Receiver: A SingleFrequency Approach, Applied and Numerical Harmonic Analysis, Birkhäuser, Boston, Mass, USA, 2007.
 P. K. Sagiraju, G. V. S. Raju, and D. Akopian, “Fast acquisition implementation for high sensitivity global positioning systems receivers based on joint and reduced space search,” IET Radar, Sonar and Navigation, vol. 2, no. 5, pp. 376–387, 2008. View at: Publisher Site  Google Scholar
 C.W. Chen, S.H. Chen, H.W. Tsao, and W.L. Mao, “Memorybased twodimensionalparallel differential matched filter correlator for global navigation satellite system code acquisition,” IET Radar, Sonar and Navigation, vol. 8, no. 5, pp. 525–535, 2014. View at: Publisher Site  Google Scholar
 J. Leclere, C. Botteron, and P.A. Farine, “Comparison framework of FPGABased GNSS signals acquisition architectures,” IEEE Transactions on Aerospace and Electronic Systems, vol. 49, no. 3, pp. 1497–1518, 2013. View at: Publisher Site  Google Scholar
 D. M. Akos and M. Pini, “Effect of sampling frequency on GNSS receiver performance,” Navigation, vol. 53, no. 2, pp. 85–96, 2006. View at: Google Scholar
 W. De Wilde, J.M. Sleewaegen, A. Simsky et al., “Fast signal acquisition technology for new GPS/Galileo receivers,” in Proceedings of the IEEE/ION Position, Location, and Navigation Symposium (PLANS '06), pp. 1074–1079, IEEE, April 2006. View at: Publisher Site  Google Scholar
 J. Leclère, Resourceefficient parallel acquisition architectures for modernized GNSS signals [Ph.D. thesis], École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2014.
 Altera, Stratix V Device Handbook, Altera, San Jose, Calif, USA, 2014.
 Altera, FFT MegaCore Function User Guide, Altera, San Jose, Calif, USA, 2014.
Copyright
Copyright © 2015 Jérôme Leclère et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.