#### Abstract

Interpolation is a useful technique for storing complex functions in limited memory space: a few sample values are stored in a memory bank, and the function values in between are calculated by interpolation. This paper presents a programmable Look-Up Table-based interpolator that uses a reconfigurable nonuniform sampling scheme: the sampled points are not uniformly spaced, and their distribution can be reconfigured to minimize the approximation error on specific portions of the interpolated function’s domain. Switching from one set of configuration parameters to another, selected on the fly from a variety of precomputed parameters, and using different sampling schemes allow for the interpolation of a wide variety of functions, achieving memory savings and minimum approximation error. As a case study, the proposed interpolator was used as the core of a programmable noise generator: output signals drawn from different Probability Density Functions were produced for testing FPGA implementations of chaotic encryption algorithms. With the proposed method, the interpolation of a specific transformation function on a Gaussian noise generator reduced the memory usage to 2.71% when compared to the traditional uniform sampling scheme, while keeping the approximation error below a threshold equal to 0.000030518.

#### 1. Introduction

Nowadays, the world is experiencing a boom in the fusion of telecommunications and information technology. The merging of these two fields spreads over all kinds of information systems, requiring efforts to ensure integration among many kinds of organizations [1], from tactical to strategic operations, at different levels of information system interoperability [2]. The ISO/OSI seven-layer model serves as a lighthouse for seeking interoperability on the many different layers of networked solutions [3]. Many standards and protocols arise from this model, including cryptographic ones.

Encryption solutions can be implemented in both software and hardware. Software implementations are more related to the protection of the information itself, while hardware ones can also be used to protect the communication channels [4]. In the case of tactical telecommunication systems, which require both channel and information security, the hardware implementation of such encryption algorithms arises as a better compromise. The need to test the behavior of such systems against different sources of noise and jamming motivates the implementation, on an FPGA (Field-Programmable Gate Array), of a programmable noise generator.

A Look-Up Table- (LUT-) based interpolation system is the core of the programmable noise generator developed in this work. Using a LUT, complex and otherwise slow calculations can be sped up by storing precomputed values of the function and interpolating the desired values in between, achieving high-speed designs [5]. Look-Up Tables are very common microelectronic blocks in many applications [5–19]. Ba et al. [9] proposed a linearly interpolated LUT predistorter used to mitigate the effects of nonlinear amplifiers. Monga and Bala [10] proposed an algorithm for minimizing the approximation error on multidimensional LUTs where both sample values and distributions are optimized.

Some authors used nonuniform sampling schemes as a solution for minimizing the LUT memory size: Seidner [11] reduced the memory usage on the implementation of a conversion circuit with a LUT scaling sample scheme; Yan and Mämmelä [15] used a nonuniformly segmented interpolation LUT for simulating nonlinear radio frequency power amplifiers; Cavers [16] proposed a systematic way to describe and analyze arbitrary nonuniform LUT sampling schemes as a companding function, which was further improved by Hassani and Kamarei [17] with a LUT segmentation concept; Boumaiza et al. [18] proposed a new companding function for amplifier predistortion with built-in dependence on the nonlinearity of the power amplifier; Dutra et al. [19] used a nonuniform but fixed sampling scheme to minimize the memory size of a LUT-based interpolator designed to represent the Inverse Error Function.

All the works mentioned above used a *fixed* sampling scheme, uniform or not, to characterize a given function or class of functions. The main contribution of this work is to implement on FPGA a LUT-based interpolator system with a sampling scheme that is *not-fixed* (it can be programmed on the fly) and *not-uniformly* distributed (its sampling points are not equally spaced).

The remainder of the paper is organized as follows: based on the definition of partitions, Section 2 presents the offline calculations performed to define the parameters that configure the proposed programmable LUT-based interpolator. Section 3 describes the interpolator architecture, including the subsystem that calculates the nonuniformly distributed addresses and the corresponding displacements. An application of the proposed interpolator is presented in Section 4, where its flexibility is discussed using a range of different functions and different *not-fixed* and *not-uniformly* distributed sampling schemes. Section 5 ends this paper with a summary of the achieved results and an outlook on future work.

#### 2. Configuration Parameters

To discuss how the configuration tables for the LUT-based interpolator are determined, we will consider a generic function whose notable values are stored on the appropriate tables. As an example, the values that define a set of arbitrary intervals are listed in Table 1. The number of intervals is related to the number of resources used on the FPGA implementation. As a project decision, partitions were used in order to minimize the final approximation error. Although we focus on a specific example, the underlying method is presented in its generality.

To define the configuration tables for the LUT-based interpolator, we will consider a generic function which will have only notable values stored on appropriate tables. We start by considering the interval to be the function domain and a set of points which will induce the partitions on the domain . Samples are next drawn from each element of partition , where , with a given frequency . The set of sampling points induces the subpartition of the th interval (notice that and ). For each function , the appropriate configuration table is stored: the values stored on the table contain, among other parameters discussed in this section, both the ordinate values given by the set and the corresponding derivative values estimated by (1), where . Both ordinate and derivative values are defined for :

The configuration parameters of the LUT-based interpolator, including the content of the memories that store the ordinates and derivatives, are calculated offline and then imported into the FPGA. These parameters are calculated according to a scheme of *not-fixed* and *not-uniformly* distributed partitions, or sampling regions, as exemplified in Table 1. In this table, each partition is defined by the interval and a sampling frequency . For example, in Table 1, the fourth partition is defined on the interval , with a sampling frequency . Note also in Table 1 that .

The interval and sampling frequency of each partition should be chosen to allow the representation of with a minimum approximation error. Therefore, higher sampling frequencies should be expected on intervals where changes more abruptly, presenting higher curvature. Based on the data of Table 1, we calculate the configuration parameters of the LUT-based interpolator, as illustrated in Table 2 and explained below.
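To make the offline table construction more concrete, the following Python sketch builds ordinate and derivative tables from a list of partitions. It is only an illustration under stated assumptions: the `(start, end, step)` tuple layout, the name `build_tables`, the example partitions, and the forward-difference derivative estimate are all hypothetical and are not taken from (1) or from Table 1.

```python
def build_tables(f, partitions):
    """Sample f on nonuniformly spaced points and return the LUT contents.

    partitions: list of (start, end, step) tuples (hypothetical layout).
    A smaller step means a higher sampling frequency in that partition.
    """
    abscissas = []
    for start, end, step in partitions:
        count = round((end - start) / step)
        abscissas.extend(start + k * step for k in range(count))

    ordinates = [f(x) for x in abscissas]

    # Assumption: the derivative is estimated by a forward difference to the
    # next sampling point (the paper's own estimate is given by its Eq. (1)).
    domain_end = partitions[-1][1]
    derivatives = []
    for i, x in enumerate(abscissas):
        x_next = abscissas[i + 1] if i + 1 < len(abscissas) else domain_end
        derivatives.append((f(x_next) - f(x)) / (x_next - x))
    return abscissas, ordinates, derivatives


def cubic(x):
    return x ** 3


# Illustrative scheme: dense sampling near the domain edges, coarse near 0.
EXAMPLE_PARTITIONS = [
    (-1.0, -0.5, 1 / 64),
    (-0.5, 0.5, 1 / 8),
    (0.5, 1.0, 1 / 64),
]

abscissas, ordinates, derivatives = build_tables(cubic, EXAMPLE_PARTITIONS)
print(len(abscissas), "sampling points")
```

With these example partitions, the two outer regions receive 32 points each and the central region only 8, which mirrors the idea of spending memory where the function changes more abruptly.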

We start the construction of Table 2 by adjusting the partition limits of Table 1 according to the sampling frequency of each partition. The Precise Inferior Limit (PIL) and the Corrected Superior Limit (CSL) of each partition are calculated by (2) and (3), which use a constant binary point position , the sign function , which outputs the values or according to whether its argument is negative or positive (notice that for a null argument, ), and the function round , which calculates the largest multiple of the argument , less than or equal to the argument (note that the symbol / used in the representation of the function round has no relation to the division operation). Both PIL and CSL are necessary for adjusting the partitions of Table 1 (an empirical design choice) to the corresponding frequencies:

Table 2 also lists three parameters used to select some input bits of the LUT, necessary to calculate the addresses and differences. They are the parameter , calculated by (4) and used to select the most significant bits (MSBs) of input , required for the calculation of the nonuniformly spaced addresses; the parameter , calculated by (5) and used to slice the least significant bits (LSBs) of input , also required for the address calculation; and parameter , calculated by (6) and used to slice the least significant bits (LSBs) of input , required to calculate the difference between the LUT input and the corresponding stored sampling point. The usage of these parameters will be discussed in Section 3:

Two other important configuration parameters present in Table 2 are the Displacement () and the Address Logic (), calculated by (7) and (8). These two parameters are used in the calculation of the nonuniformly spaced addresses, as will be presented in more detail in Section 3:

The quantities QMR, SMN, EMN, MNM, SMP, and IMN are intermediate variables necessary for the recursive calculation of the Address Logic in (8). They are related, respectively, to the following entities: the Quantity of Memories Required (QMR) on each partition , the Starting Memory Number (SMN) and the Ending Memory Number (EMN) on each partition , the Maximum Number of Memories (MNM) considering that the specific sampling frequency was applied to the entire domain , the Starting Memory Position (SMP) considering that the specific sampling frequency was applied to the entire domain , and the Initial Memory Number (IMN) used on the specific sampling frequency.

The last four configuration parameters are related to the calculation of the sampling points used to define the stored ordinate and derivative values. These parameters are the Sampling Points Start (SPS), Sampling Points Final (SPF), Memory Position Start (MPS), and the Memory Position Final (MPF), calculated by (9), (10), (11), and (12), respectively:

Based on the characterization of the partitions (exemplified in Table 1), the equations described in this section are used to calculate the configuration parameters (exemplified in Table 2) used by the proposed nonuniform LUT-based interpolator. Section 3 is going to discuss the internal structure of this interpolator and how it uses the configuration parameters present in Table 2 to perform its tasks.

#### 3. Interpolator Architecture

The LUT-based interpolator designed in this paper maps a 15-bit wide input , with binary point position , belonging to the domain , using a two’s complement signed fixed-point arithmetic, into a desired output . The LUT-based interpolator can be used with different functions, no matter how wide their domains are. For example, for a domain , where , we have to scale the input from the interval to and neglect the values on the interval .

The proposed LUT-based interpolator uses the Taylor approximation described in (13) for interpolating according to the input . The higher the order of the Taylor approximation, the smaller the approximation error, but there is a trade-off involved: one extra multiplier and one extra RAM block are required every time the approximation order is increased. Therefore, increasing the approximation order brings one advantage, the reduction of the approximation error, and three disadvantages: larger memory space required to store one more derivative order, increased arithmetic resource usage, and increased latency due to the cascading of one more multiplier.

The first-order Taylor approximation arises as the best compromise between hardware cost and approximation error. It presents the biggest marginal improvement in the average approximation error at the lowest hardware cost: one multiplier and two RAM blocks for storing the ordinate and the derivative, which are calculated according to the nonuniformly spaced sampling points (abscissas ) by using (9) and (10), as demonstrated in Table 2 (columns and ).
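As a rough software analogue of this first-order scheme, the sketch below evaluates a function from a stored ordinate and a stored derivative at the nearest sampling point at or below the input, using one multiplication and one addition. The names `taylor_lut`, `xs`, `ys`, and `ds` are hypothetical, the cubic test function and its analytic derivative are chosen only for illustration, and a uniform grid is used for brevity even though the same evaluation applies to nonuniform abscissas.

```python
from bisect import bisect_right


def taylor_lut(x, abscissas, ordinates, derivatives):
    """First-order Taylor evaluation: f(x_k) + f'(x_k) * (x - x_k).

    x_k is the largest stored abscissa that does not exceed x.
    """
    k = max(bisect_right(abscissas, x) - 1, 0)
    difference = x - abscissas[k]
    # One multiplier and one adder, as in the first-order architecture.
    return ordinates[k] + derivatives[k] * difference


# Illustrative tables for f(x) = x**3 on [-1, 1) with 32 sampling points.
xs = [k / 16 for k in range(-16, 16)]
ys = [x ** 3 for x in xs]
ds = [3 * x * x for x in xs]

approx = taylor_lut(0.53, xs, ys, ds)
print(approx, 0.53 ** 3, abs(approx - 0.53 ** 3))
```

The residual error of this evaluation grows with the second derivative of the function and with the square of the distance to the stored point, which is why regions of high curvature benefit from denser sampling.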

When using a *uniform* sampling scheme, the addresses and differences can be calculated by extracting, respectively, the most significant bits (MSBs) and the least significant bits (LSBs) from the input . But in our case, because the values stored inside the RAM blocks come from a *nonuniform* sampling scheme, we have to apply a more complex operation for calculating these values. This task is performed by the purpose-built subsystem *Difference_Address*, as can be seen in the schematic top view (Figure 1) of the nonuniform LUT-based interpolator. Figure 1 shows the *Difference_Address* subsystem, two RAM blocks for storing the ordinate and the derivative, a block that multiplies the output of the derivative RAM block by the difference , a block that adds this product to the output of the ordinate RAM block, and three delay blocks used to synchronize the data flow.
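For comparison, the uniform case mentioned above reduces to plain bit slicing. The following sketch assumes a 15-bit input word treated as a raw bit pattern, with an illustrative split into 9 address bits and 6 difference bits; the constants and the name `split_uniform` are hypothetical and are not part of the described design.

```python
WIDTH = 15                      # input word length in bits
ADDR_BITS = 9                   # assumed: 2**9 = 512 table positions
DIFF_BITS = WIDTH - ADDR_BITS   # remaining low-order bits


def split_uniform(raw):
    """Split a raw input word into (address, difference) for a uniform LUT."""
    raw &= (1 << WIDTH) - 1                     # keep only the 15 input bits
    address = raw >> DIFF_BITS                  # most significant bits
    difference = raw & ((1 << DIFF_BITS) - 1)   # least significant bits
    return address, difference


print(split_uniform(0b101010101110011))
```

In a nonuniform scheme the number of bits assigned to each field changes from one partition to another, so a fixed split like this one is no longer sufficient.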

The subsystem *Difference_Address* can be seen in detail in Figure 2. It has two outputs and two branches, one for calculating the addresses to be used by the RAM blocks and the other for calculating the Differences . It is directly programmed by the parameters presented in Section 2 and illustrated in Tables 1 and 2. When we change the configuration parameters in accordance with the contents of the ordinate and derivative RAM blocks, we enable the LUT to interpolate different functions, according to different nonuniform sampling schemes. The *Difference_Address* subsystem is composed of nine blocks (six subsystems, two adders, and one binary point forcer), as discussed below.

The first subsystem (*Sampled Region and Corrected Superior Limit*) in Figure 2 is configured by the Corrected Superior Limit () parameters calculated by (3) and exemplified in column 3 of Table 2. It senses the input , and outputs a selector signal that identifies the partition where this input belongs. For instance, in the case of using the sampling scheme illustrated by Tables 1 and 2, for , it outputs a selector signal equal to 5, meaning that the input belongs to the partition ).
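The behavior of this selector can be mimicked in software by comparing the input against an ordered list of upper limits. The sketch below is a minimal illustration: the limit values in `UPPER_LIMITS` are invented (they are not the CSL values of Table 2), the name `select_partition` is hypothetical, and it assumes that each partition excludes its own upper limit.

```python
from bisect import bisect_right

# Hypothetical upper limits of six partitions covering [-1, 1).
UPPER_LIMITS = [-0.75, -0.5, 0.0, 0.5, 0.75, 1.0]


def select_partition(x, upper_limits=UPPER_LIMITS):
    """Return the 1-based index of the partition that contains x.

    Assumption: partition i covers [previous limit, limit i), so an input
    equal to a limit belongs to the next partition. Inputs at or beyond the
    last limit are clamped to the last partition.
    """
    index = bisect_right(upper_limits, x)
    return min(index, len(upper_limits) - 1) + 1


for value in (-1.0, -0.6, 0.25, 0.6, 0.99):
    print(value, "->", select_partition(value))
```

In hardware, the same decision would typically be made by parallel comparators rather than a search, but the mapping from input to selector value is the same.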

Based on the selector signal provided by the subsystem *Sampled Region and Corrected Superior Limit*, the next two subsystems, Displacement and AddLog, output the values and calculated by (7) and (8). Both values are used to calculate the nonuniform RAM addresses via the two blocks named *Displacement_Adder* and *Add_Log_Adder*. Continuing the example above, the provided selector signal equals 5, implying and .

Continuing with the description of the Address branch of Figure 2, we have two subsystems that select a configurable number of bits from their inputs. The first one, named Add_MSB, slices a configurable number of the most significant bits of input . This configurable number of selected bits is defined by the parameter in (4). This output is added to the Displacement value , and a configurable number of its least significant bits, defined by the parameter in (5), is selected by subsystem Add_LSB. Finally, the Address is calculated by adding this value to the Address Logic parameter explained above and calculated by (8).

The output Difference is calculated by the configurable subsystem named Dif_LSB. It is configured by the parameters in (6) and the sampling frequency in Table 1. This subsystem slices a configurable number (defined by parameter ) of the least significant bits of the input and forces the binary point to a fixed position . An exception must be made for the sampling regions where because no interpolation is necessary: all possible values of these regions are mapped one to one to a corresponding , and the differences are always made equal to zero.

The six subsystems discussed above are configured on the fly by the parameters enumerated in the example in Tables 1 and 2. These parameters are stored inside each subsystem by means of 28 memories (22 RAM blocks storing 2 positions each and 6 storing 22 positions). Their contents, as well as the contents of the 512-position ordinate and derivative RAM blocks, can be changed on the fly, which enables this nonuniform LUT-based interpolator to represent different functions, according to different sampling schemes, as will be seen in Section 4. The reconfiguration time is defined by the depth of the deepest RAM blocks, namely the ordinate and derivative RAM blocks: the interpolator requires 512 clock cycles for a full reconfiguration.

The presented design was implemented using the Integrated Software Environment (ISE) and System Generator (SysGen) tools from Xilinx, on a Spartan-3 development kit from Avnet with an XC3S2000-5 FG676 Spartan-3 FPGA. The synthesis details of this implementation can be seen in Table 3.

#### 4. Programmable Noise Generator

As a case study, the proposed nonuniform LUT-based interpolator was used as a programmable noise generator able to output noise with different Probability Density Functions (PDFs). A controlled level of approximation error is achieved by using the proposed programmable nonuniform sampling scheme.

A given transformation function is responsible for changing the PDF of a uniformly distributed source noise into a noise with a different and configurable PDF. The configuration parameters presented as an example in Tables 1 and 2 were constructed with the aim of minimizing the approximation error of a Gaussian noise generator. It uses a specific transformation function [20], represented in (14), for transforming a uniformly distributed noise into a Gaussian one:

The transformation function has two poles located at the abscissas and , which are characterized by high values of curvature and derivatives. Both the uniformly distributed input signal and the domain of are represented by the interval . The ordinate of this function ideally goes from to , which is expected since the output is an unbounded normally distributed signal.
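Since (14) is not reproduced here, the following sketch uses a standard inverse-CDF transformation as a stand-in to illustrate the behavior described above. It assumes that the input domain is normalized to the open interval (-1, 1) and that the target is a standard normal distribution; the name `gaussian_transform` is hypothetical, and the exact form of the paper's transformation function may differ.

```python
import random
from statistics import NormalDist, pstdev

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def gaussian_transform(x):
    """Map x in (-1, 1) to a standard normal quantile (inverse-CDF method).

    Assumption: this is mathematically sqrt(2) * erfinv(x), which diverges
    as x approaches -1 or +1.
    """
    return _STD_NORMAL.inv_cdf((x + 1.0) / 2.0)


# The output grows without bound near the edges of the domain.
for x in (0.0, 0.5, 0.9, 0.999, 0.99999):
    print(x, gaussian_transform(x))

# Feeding uniformly distributed samples yields approximately Gaussian output.
rng = random.Random(0)
samples = [gaussian_transform(rng.uniform(-0.999999, 0.999999))
           for _ in range(10000)]
print("sample standard deviation:", pstdev(samples))
```

The steep growth near the domain edges is what makes a uniform sampling grid inefficient for this kind of function: most of the curvature is concentrated in a small part of the domain.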

One advantage of implementing a nonuniform sampling scheme for the interpolation of is the lower RAM space necessary for storing both the ordinates and derivatives, combined with a lower approximation error. As a counterexample, if we use a uniform sampling scheme instead of the proposed nonuniform one, we face a high approximation error around the poles of (14), even at high sampling frequencies, as seen in Figure 3. These graphs show the absolute approximation error observed when two different uniform sampling schemes were applied to the whole domain: the upper graph shows the error for , which requires the storage of positions of ordinates plus for derivatives; and the lower graph (observe the zoom on axis) shows the error for the case where , which results in positions for both RAM blocks. The horizontal line in Figure 3 represents a boundary approximation error limit equal to : the input values are 15 bits long, and any error lower than that boundary does not decrease the quality of the interpolation.

If a uniform sampling scheme is used, the only way to keep the absolute approximation error below this boundary for all abscissas would be to use , which makes the approximation error equal to zero for all possible abscissas . This happens because, in this extreme case, there is no actual interpolation, but a one-to-one mapping of all possible input values . But such a sampling scheme requires a RAM block with a large depth, equal to positions for storing , which is hard to implement on an FPGA due to the number of bits necessary to represent each stored value.

The solution is to use the proposed nonuniform sampling scheme, which stores fewer ordinates and derivatives for input values around , and more samples near the poles and , where the approximation error is bigger, saving a significant amount of memory space (the proposed LUT-based interpolator reserves only 512 positions for each of the ordinate and derivative RAM blocks). This approach is graphically presented in Figure 4, which shows the 1st quadrant of ; the 3rd quadrant is not displayed since it is symmetric with respect to the origin . The absolute approximation error obtained with this nonuniform approach remains under the boundary limit even near the poles of , as seen in Figure 5.

When the proposed programmable nonuniform LUT-based interpolator is configured to represent (14), according to the sampling scheme of Tables 1 and 2, it works as a Gaussian noise generator: by applying to its input a signal with a uniform PDF (Figure 6(a)), it outputs a signal with Gaussian PDF (Figure 6(b)).

Tables 1 and 2 are just one example of configuration data for the proposed programmable LUT-based interpolator. In this work, 3 different sampling schemes (, , and ) were formulated. The sampling scheme was the one demonstrated in Tables 1 and 2. The abscissas of sampling schemes , , and are plotted in Figure 7. As can be seen in these figures, there are sampling points on scheme , on scheme , and on scheme . As expected, the amount of sampling points is always smaller than the depth (equal to 512) of the two RAM blocks that store the corresponding ordinates and derivatives . Observe that the inclination on these graphs is inversely proportional to the sampling frequency of each partition : the higher the frequency , the bigger the number of abscissas , and the smaller the inclination in Figure 7.

To show the flexibility of the proposed design, the three sampling schemes (, , and ) discussed above were applied to eight different transformation functions, represented by (14) to (21), which gave us a total of 24 different examples for configuring the proposed LUT-based interpolator. These equations were selected as mathematical examples, and they are not related to the generation of noise with a natural response:

Each sampling scheme was designed to minimize the approximation error for abscissa values belonging to the sampling region with higher values. In general, high frequencies should be used for regions where behaves in a strongly nonlinear way and low frequencies for regions with a linear behavior. For example, the sampling scheme was specifically designed for the Gaussian transformation function (14). It applies high frequencies () near the poles and and low frequencies () near the origin . The approximation error for the three sampling schemes , , and can be seen in Figure 8 for the displaced Gaussian transformation function (15), and in Figure 9 for the Cubic Function (18).

The frequency assignment of the partitions for sampling schemes and is presented in Table 4. The corresponding configuration parameters for these sampling schemes are calculated via (2) to (12) and are presented in Tables 5 and 6. Observe that the sampling limits for sampling scheme are the same as those of sampling scheme ; only the frequencies are distributed differently. In the sampling scheme , however, both the sampling limits and the frequencies are distributed differently from sampling schemes and , which shows the flexibility for reconfiguring the designed programmable noise generator.

As seen in Table 1, the scheme distributes the sampling frequencies symmetrically about the origin, with the lower frequencies near the origin, as can be graphically seen in Figures 8 and 9 (upper graphs): the semiarcs with bigger diameters (related to ) are located around the origin, in the interval . As seen in Table 4, the schemes and distribute the lower sampling frequencies () on the intervals and , respectively, as can be graphically seen in Figures 8 and 9 (graphs (b) and (c), respectively).

The designed programmable noise generator can generate different noise signals by properly filling the ordinate and the derivative RAM blocks (Figure 1) and configuring its internal parameters on the *Difference_Address* subsystem (Figure 2). For example, Figure 10 shows the Probability Density Functions (PDFs) of four different signals produced by the programmable noise generator when its input is fed with uniformly distributed noise, it is configured with the sampling scheme , and it interpolates four different functions: the Displaced Error Function (15), the Cubic Function (18), and the first (19) and second (20) Quadratic Functions.

#### 5. Conclusion

A programmable Look-Up Table-based interpolator with a nonuniform sampling scheme was implemented using an *Avnet* development kit containing an XC3S2000-5 FG676 Xilinx Spartan-3 FPGA. This LUT-based interpolator can be programmed on the fly by loading the proper configuration parameters presented in Section 2, including the ordinates and derivatives, into RAM blocks. The complete reconfiguration takes 512 clock cycles. When these parameters are changed, the interpolator can represent different functions, sampled according to different nonuniform sampling schemes. The ability to change the sampling scheme allows the minimization of both the approximation error and memory space: for instance, the sampling scheme (Table 1) applied to (14) was able to keep the approximation error below a threshold of 0.000030518 while reducing the memory usage to 2.71% for a Gaussian noise generator application.

As a case study, the LUT-based interpolator was used as the core of a programmable noise generator able to output signals with different Probability Density Functions (PDFs). The flexibility of this design was demonstrated by interpolating 8 different functions, according to 3 different nonuniform sampling schemes (, , and ) described in Tables 1 and 4, each one defining partitions, each characterized by a chosen sampling frequency .

As future work, we recommend the implementation of a programmable nonuniform LUT-based interpolator with a domain not fixed to and where the number of sampling regions can be changed on the fly.