#### Abstract

We investigate Residue Number System (RNS) to binary conversion, an important issue for the use of RNS numbers in Digital Signal Processing (DSP) applications. We propose two new reverse converters for the moduli set . First, we simplify the Chinese Remainder Theorem (CRT) to obtain a reverse converter that uses mod- operations instead of the mod- operations required by other state-of-the-art equivalent converters. Next, we further reduce the hardware complexity by making the resulting reverse converter architecture adder based. Two hybrid converters, Cost-Efficient (CE) and Speed-Efficient (SE), are proposed; they are obtained by combining the best state-of-the-art converter with the newly introduced area-delay efficient scheme. The proposed hybrid CE converter outperforms the best state-of-the-art CE converter in terms of delay at a similar area cost. Additionally, the proposed hybrid SE converter requires less area and has a smaller delay than the best state-of-the-art equivalent SE converter.

#### 1. Introduction

The presence of carry chains in conventional weighted number systems, such as the binary or decimal number systems, often limits the efficiency of computer arithmetic operations. The Residue Number System (RNS) is a number system with an attractive carry-free property, which has proved highly useful in many Digital Signal Processing (DSP) applications requiring high-speed computations [1, 2]. Owing to its carry-free property, RNS also has the following inherent features: modularity, parallelism, and fault tolerance. In order not to offset the speed gained in RNS operations, a fast RNS-to-binary converter is required. The complexity and the efficiency of an RNS-to-binary converter are determined by the choice of moduli and by the conversion algorithm. Many different moduli sets are available for RNS to represent the binary numbers in a certain range. Three-moduli sets have been actively investigated, for example, [3], [4], [5], and [6]. The dynamic range of these moduli sets is not sufficient for applications requiring a larger dynamic range. Thus, the moduli set , which is the most popular length-three moduli set, has been enhanced to [7] and [8], to mention just a few. Generally speaking, RNS-to-binary conversion is based either on the Chinese Remainder Theorem (CRT) [1, 2, 6] or on the Mixed Radix Conversion (MRC) [9].

In this paper, we propose two new hybrid Cost-Efficient (CE) and Speed-Efficient (SE) reverse converters for the moduli set . We obtain the two hybrid converters by combining the converters in [7] with the newly introduced area-delay efficient converters. First, we simplify the CRT to obtain a reverse converter that uses mod- operations instead of the mod- operations required by the proposal in [7]. Next, we further reduce the hardware complexity by making the resulting reverse converter architecture adder based. Basically, the newly introduced scheme is made up of two Carry-Save Adders (CSAs) and a -bit Carry Propagate Adder (CPA). The proposed CE converter outperforms the one in [7] in terms of delay at a similar area cost. Additionally, the proposed SE converter requires less area and has a smaller conversion delay than the one in [7].

The rest of the paper is organised as follows: Section 2 presents the necessary background information. In Section 3, we describe the proposed algorithm. Section 4 presents the hardware realization of the proposed algorithm, and a comparison with the state-of-the-art reverse converters is provided in Section 5. The paper is concluded in Section 6.

#### 2. Background

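RNS is defined in terms of a set of relatively prime moduli $\{m_i\}_{i=1,\dots,k}$ such that $\gcd(m_i, m_j) = 1$ for $i \neq j$, where $\gcd(m_i, m_j)$ means the greatest common divisor of $m_i$ and $m_j$, while $M = \prod_{i=1}^{k} m_i$ is the dynamic range. The residues of a decimal number $X$ can be obtained as $x_i = |X|_{m_i}$, thus $X$ can be represented in RNS as $X = (x_1, x_2, \dots, x_k)$, $0 \le x_i < m_i$. This representation is unique for any integer $X \in [0, M-1]$. We note here that in this paper we use $|X|_{m}$ to denote the $X \bmod m$ operation.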
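The definition above can be illustrated with a minimal Python sketch of forward (binary-to-RNS) conversion. The moduli `(3, 4, 5)` and the helper names `is_pairwise_coprime` and `to_rns` are illustrative assumptions chosen for brevity; they are not the moduli set studied in this paper.

```python
from itertools import combinations
from math import gcd, prod


def is_pairwise_coprime(moduli):
    """Return True if every pair of moduli has a greatest common divisor of 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def to_rns(x, moduli):
    """Map an integer x in [0, M-1] to its residues, where M is the product of the moduli."""
    if not is_pairwise_coprime(moduli):
        raise ValueError("moduli must be pairwise relatively prime")
    dynamic_range = prod(moduli)
    if not 0 <= x < dynamic_range:
        raise ValueError("x lies outside the dynamic range")
    return tuple(x % m for m in moduli)


# Illustrative moduli only (dynamic range 3 * 4 * 5 = 60).
print(to_rns(23, (3, 4, 5)))  # (2, 3, 3)
```

Because the moduli are pairwise relatively prime, each of the 60 integers in the dynamic range of this toy example maps to a distinct residue triple.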

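For a moduli set $\{m_i\}_{i=1,\dots,k}$ with the dynamic range $M = \prod_{i=1}^{k} m_i$, the residue number $(x_1, x_2, \dots, x_k)$ can be converted into the decimal number $X$, according to the Chinese Remainder Theorem, as follows [10]:

$$X = \left| \sum_{i=1}^{k} M_i \left| M_i^{-1} \right|_{m_i} x_i \right|_{M}, \qquad (1)$$

where $M = \prod_{i=1}^{k} m_i$, $M_i = M / m_i$, and $M_i^{-1}$ is the multiplicative inverse of $M_i$ with respect to $m_i$.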
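As a hedged illustration of the traditional CRT, the following Python sketch reconstructs an integer from its residues by summing, for each modulus $m_i$, the product of $M_i = M / m_i$, the multiplicative inverse of $M_i$ with respect to $m_i$, and the residue $x_i$, and then reducing the sum modulo $M$. The function name `crt_reverse` and the moduli `(3, 4, 5)` are assumptions made for the example and are not taken from the paper; the modular inverse uses Python's built-in `pow` with a negative exponent (Python 3.8 or later).

```python
from math import prod


def crt_reverse(residues, moduli):
    """Recover X in [0, M-1] from its residues using the traditional CRT."""
    dynamic_range = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        big_m_i = dynamic_range // m_i        # M_i = M / m_i
        inverse = pow(big_m_i, -1, m_i)       # multiplicative inverse of M_i modulo m_i
        total += big_m_i * inverse * x_i
    return total % dynamic_range              # final reduction modulo M


# Illustrative moduli only; (2, 3, 3) are the residues of 23.
print(crt_reverse((2, 3, 3), (3, 4, 5)))  # 23
```

The final reduction modulo $M$ is the expensive step in hardware, which is why simplifications that replace it with a reduction by a smaller modulus are attractive.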

This general scheme can actually be simplified when certain moduli sets of interest like are utilized. For this moduli set, efficient converters have been presented in [7]. In [2], the New CRT proposed in [3] is formulated as follows:

where , and is the multiplicative inverse of with respect to . For the moduli set , the following relation, based on the New CRT represented by (2) has been presented in [7]:

Given that the residues () have the following binary representations:

Equation (3) was further simplified to obtain where

Given that the moduli set is desirable, can we obtain a more effective reverse converter when compared to the ones in [7]? In the following section, we present two effective reverse converters for the moduli set by first simplifying (1).

#### 3. Proposed Algorithm

Given the RNS number with respect to the moduli set in the form , the proposed algorithm computes the decimal equivalent of this RNS number based on a further simplification of the well-known traditional CRT. First, we show that the computation of the multiplicative inverses can be eliminated for this moduli set, resulting in memoryless reverse converters. Next, we obtain reverse converters that utilize modulo- operations instead of the modulo- operations required by the state-of-the-art equivalent converters. We further reduce the hardware complexity by obtaining adder-based reverse converters.

Theorem 1. *Given the moduli set with , the following holds true:
*

*Proof. *If it can be demonstrated that , then is the multiplicative inverse of with respect to . is given by
thus (7) holds true.

In the same way if , then 1 is the multiplicative inverse of with respect to . is given by: , thus (8) holds true.

Again, if , then 1 is the multiplicative inverse of with respect to . is given by
thus (9) holds true.

The following important relations are used in the subsequent theorem: Given the moduli set with , the following holds true:

Theorem 2. *The decimal equivalent of the RNS number with respect to the moduli set in the form is computed as follows:
*

*Proof. *Equation (1) for is given by
By substituting (7), (8), and (9) into (16) we obtain the following:
Substituting (12) in the above equation, we obtain
Equation (18) can be further simplified by using the following lemma presented in [11]:
Applying (19), (18) becomes
Using (14) in (20), we obtain
Applying (19), (21) becomes
Using (13) in (22), we obtain
thus, (15) holds true.

We reduce the hardware complexity by further simplifying (15) using the following properties [7].

Property 1. *Modulo multiplication of a residue number by , where and are positive integers, is equivalent to bit circular left shifting.*

Property 2. *Modulo of a negative number is equivalent to the one’s complement of the number, which is obtained by subtracting the number from .*
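Properties 1 and 2 can be checked numerically with a short Python sketch. It assumes a modulus of the form $2^k - 1$, for which multiplication by a power of two is a circular left shift and negation is a bitwise complement. The width `k = 4` and the helper names `rotl` and `ones_complement` are illustrative choices, not values from the paper.

```python
def rotl(x, shift, width):
    """Circularly shift the width-bit value x left by `shift` positions."""
    mask = (1 << width) - 1
    shift %= width
    return ((x << shift) | (x >> (width - shift))) & mask


def ones_complement(x, width):
    """Invert every bit of the width-bit value x, i.e. compute (2**width - 1) - x."""
    return ~x & ((1 << width) - 1)


k = 4
modulus = (1 << k) - 1  # 15, an assumed example modulus of the form 2**k - 1

for x in range(modulus):
    for p in range(2 * k):
        # Property 1: multiplying by 2**p modulo 2**k - 1 is a p-bit circular left shift.
        assert rotl(x, p, k) == (x << p) % modulus
    # Property 2: negation modulo 2**k - 1 is the one's complement.
    # The all-ones pattern is a second representation of zero, hence the final reduction.
    assert ones_complement(x, k) % modulus == (-x) % modulus
```

Both operations need only wiring or inverters in hardware, which is what makes moduli of this form attractive for adder-based converters.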

*Assumption 1. *The hardware complexity is reduced based on the assumption that always holds true.

Based on the given assumption, , which is an -bit binary number, can now be represented as an -bit number. Therefore, the residues have the following binary representations:

Then (15) is further simplified using the following theorem.

Theorem 3. *Provided that holds true, the binary equivalent of the RNS number with respect to the moduli set in the form is computed as follows:
**
where
*

*Proof. *We need to show that (15) can be presented as (25), and that the values of are valid. It should be noted from (15) that and can be represented as

Also, we represent as
Next, we convert each term in to an bit binary number. No manipulation is required for since it is treated like an -bit number and also is already an -bit number. The only term to be manipulated is and this is carried out as follows:

However, if the condition does not hold true, the converter in [7], represented by (5), is utilized.

#### 4. Hardware Realization

The hardware implementations of the proposed reverse converters are based on the combination of (25) and (5). We propose two converters based on this approach, termed the “hybrid approach”. The hardware realizations of the proposed schemes are depicted by Figures 2 and 3. In Figure 2 (the proposed hybrid cost-efficient converter), is first input into the system. Given that the Most Significant Bit (MSB) of is , then is compared with “1”. If , (5) is utilized just as outlined in [7]; otherwise, the hardware realization follows (25). It should be noted that the overhead for comparison is approximately zero, as only the MSB of needs to be compared with a 1. The condition holds true only in very few cases. The probability of occurrence of this condition is denoted by , and it can be seen from Figure 1 that approaches zero as increases. This is fully explained in the next section. However, if (which occurs most of the time) holds true, the hardware realization is as follows. The operands , , and in (25) are added using CSA1 producing and , which are in turn added using a one’s complement adder (equivalent to a CPA with End-Around Carry (EAC)). Suppose that and are, respectively, used to store the results of the and -bit right shifting of the one’s complement adder. Since is a bit number, it can be concatenated with with no computational hardware. The second operand is a -bit number with -bit of zeros. must be converted to a -bit number by appending -bit of zeros to its MSB part. The third operand , which is a -bit number, is also made a -bit number by appending -bit of ones to its MSB part. , , and are all now -bit numbers and are to be added using CSA2 yielding and . It should be noted that bits of the Full Adders (FAs) in CSA2 are reduced to Half Adders (HAs). The final result would normally be obtained by a CPA, but simulation shows that it is always the same as inverting . Thus, the final CPA is eliminated.
On the other hand, Figure 3 depicts the hardware realization of the proposed hybrid speed-efficient converter. Just like the proposed hybrid CE converter, the hybrid SE converter is made up of the same levels of CSAs; the only difference is that two CPAs are utilized in parallel instead of the one’s complement adders in Figure 2. Consequently, the conversion time is significantly reduced.
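The adder structure described above, a CSA followed by a one's complement adder (a CPA with end-around carry), can be sketched in software for a generic modulus of the form $2^k - 1$. This is a behavioural illustration under stated assumptions, not the circuit of Figures 2 and 3: the width `k = 4`, the operand values, and the function names `csa_mod` and `eac_add` are hypothetical, and the actual operand widths of the proposed converters are not reproduced here.

```python
def csa_mod(a, b, c, width):
    """Carry-save add three width-bit operands modulo 2**width - 1.

    Returns a (sum, carry) pair whose total is congruent to a + b + c.
    """
    mask = (1 << width) - 1
    partial_sum = a ^ b ^ c                    # bitwise sum without carry propagation
    carry = (a & b) | (a & c) | (b & c)        # carry generated at each bit position
    # Carries move one position left; the carry out of the MSB has weight
    # 2**width, which is congruent to 1, so it wraps around to the LSB.
    carry = ((carry << 1) | (carry >> (width - 1))) & mask
    return partial_sum, carry


def eac_add(a, b, width):
    """Add two width-bit operands with end-around carry (one's complement adder)."""
    mask = (1 << width) - 1
    total = a + b
    return (total & mask) + (total >> width)   # feed the carry-out back into the LSB


k = 4
modulus = (1 << k) - 1                          # 15, an assumed example modulus
s, cy = csa_mod(9, 7, 12, k)
result = eac_add(s, cy, k) % modulus            # all-ones also represents zero
print(result)  # 13, which equals (9 + 7 + 12) % 15
```

The CSA stage has a delay of one full adder regardless of width, so the carry-propagating stage dominates the delay; this is why replacing or removing a CPA has such a visible effect on conversion time.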

*Design Example*

Given the moduli set , where , convert the residue number to binary.

In order to obtain the required binary equivalent, (25) is applied as given in Table 1, where CSA1 represents the first CSA in Theorem 3. The CPA then adds and producing . We then obtain , , and as already explained above. CSA2 is used to add , , and producing and . It can be seen from Table 1 that . Thus, the final CPA can be eliminated, and consequently the conversion delay and the area cost are significantly reduced.

#### 5. Performance Evaluation

The performance of the proposed converters is evaluated in terms of area cost and conversion delay. Two efficient reverse converters have been proposed. The performance comparisons of these converters and the best state-of-the-art converter in [7] are presented in Tables 2, 3, and 4. From Table 2, it can be seen that the proposed hybrid CE outperforms the CE in [7] in terms of delay with slightly lower or similar area cost (whenever , the same result otherwise). Under the same condition, the proposed hybrid SE converter outperforms the SE converter in [7] in terms of both speed and area. Table 4 shows the occurrence probability and other associated parameters. We note here that in Table 4, the following notations are utilized: is an integer value that determines various dynamic range requirements, is the number of times occurs within the dynamic range of the system, is the probability of occurrence of the condition within the dynamic range of the system ( is computed by ), hybrid CE stands for the delay of the hybrid CE converter, CE [7] stands for the delay of the CE converter in [7], and hybrid SE stands for the delay of the hybrid SE converter while SE [7] stands for the delay of the SE converter in [7]. As shown in Table 4, decreases as the dynamic range increases (i.e., as increases). For example, when , whereas when . This implies, as can also be deduced from Figure 1, that the occurrence probability approaches zero as continues to grow. The delays of the hybrid CE and SE converters and those of the CE and SE converters in [7] are depicted in Figure 4. It can be easily seen from Figure 4 that the rate of growth of the conversion delay is comparatively small in the hybrid SE. It is also worth noting that the proposed CE converter and the SE converter in [7] have nearly equal conversion delays.

Additionally, the proposed hybrid SE converter (whenever ) and the SE converter in [7] are implemented using Xilinx 9.2i FPGA technology for various dynamic range requirements (different values of ). The target technology is a Xilinx (Xa3s200-4vqg100) FPGA. The performance is evaluated in terms of area (measured as the number of slices) and delay (the total gate delay, measured in nanoseconds). Table 3 shows the synthesized results for various values of , which show the superiority of our scheme over the one in [7]. Consequently, the RNS-to-binary converters proposed in this paper are better than the ones in [7].

#### 6. Conclusions

In this paper, we proposed two new reverse converters for the moduli set . First, we simplified the traditional CRT to obtain a reverse converter that uses mod- operations instead of the mod- operations required by the proposal in [7]. Next, we further reduced the hardware complexity by making the resulting reverse converter architecture adder based. We proposed two hybrid converters, CE and SE. In each of the schemes, the converter in [7] is integrated into a newly proposed area-delay efficient scheme. The path to be followed depends on whether (which occurs only in a few cases) or . In terms of delay, the two proposed hybrid CE and SE converters require and , respectively, while the CE and SE converters in [7], respectively, require and , where denotes the delay of one full adder and that of a multiplexer, and is the occurrence probability of . The proposed hybrid CE converter outperforms the one in [7] in terms of delay with slightly higher or similar area cost. Additionally, with smaller delay, the proposed hybrid SE converter also requires less area cost when compared to the one in [7].