VLSI Design

Volume 2018 (2018), Article ID 9269157, 7 pages

https://doi.org/10.1155/2018/9269157

## Efficient Nonrecursive Bit-Parallel Karatsuba Multiplier for a Special Class of Trinomials

Department of Computer Science and Technology, Xinyang Normal University, Nanhu Road 237, Xinyang, Henan, China

Correspondence should be addressed to Yin Li

Received 15 August 2017; Revised 1 December 2017; Accepted 10 December 2017; Published 11 January 2018

Academic Editor: Junqing Sun

Copyright © 2018 Yin Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Recently, we presented a novel Mastrovito form of the nonrecursive Karatsuba multiplier for all trinomials. Specifically, we found that the related Mastrovito matrix is very simple for an equally spaced trinomial (EST) combined with the classic Karatsuba algorithm (KA), which leads to a highly efficient Karatsuba multiplier. In this paper, we consider a new special class of irreducible trinomial, namely, . Based on a three-term KA and the shifted polynomial basis (SPB), a novel bit-parallel multiplier is derived with better space and time complexity. As a main contribution, the proposed multiplier costs about 2/3 of the circuit gates of the fastest multipliers, while its time delay matches our former result. To the best of our knowledge, this is the first time that this space complexity bound has been reached without increasing the gate delay.

#### 1. Introduction

Efficient hardware implementation of finite field arithmetic, especially for $GF(2^m)$, is frequently desired in coding theory and public-key cryptosystems [1, 2]. Among the arithmetic operations in $GF(2^m)$, multiplication is the most important, as other complicated field operations such as exponentiation and inversion can be carried out by iterated multiplications. Thus, it is necessary to design efficient multipliers.

The field elements are usually represented in a certain basis such as a polynomial basis (PB), normal basis (NB), or dual basis (DB). In PB representation, multiplication consists of multiplying two polynomials and reducing the result modulo an irreducible polynomial. The choice of this irreducible polynomial is critical for performing the reduction efficiently, and irreducible trinomials are among the most common choices [3, 4]. In recent years, many bit-parallel multipliers using PB representation have been proposed for $GF(2^m)$ defined by irreducible trinomials, some of which can be found in [3, 5–8]. The efficiency of an architecture is evaluated by its space and time complexity. The former is expressed in terms of the number of logic gates (XOR and AND), and the latter in terms of the total XOR and AND gate delay along the critical path. Among these multipliers, the fastest bit-parallel multipliers to date were proposed by Fan and Hasan [9] and Hariri and Reyhani-Masoleh [10]. If $GF(2^m)$ is defined by , the corresponding multiplier requires AND and XOR gates with time delay (for good fields, the time delay is ), where $T_A$ and $T_X$ are the circuit delays of one AND gate and one XOR gate, respectively. Besides these multipliers for general trinomials, there are also several proposals for special types of irreducible trinomials [11–13]. These multipliers usually exploit the special form of the trinomial to obtain an efficient implementation.
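As a rough illustration of PB multiplication (a polynomial product followed by reduction modulo an irreducible trinomial), the following Python sketch stores polynomials over $\mathbb{F}_2$ as integer bit masks, with bit $i$ holding the coefficient of $x^i$. The function names `clmul`, `reduce_trinomial`, and `gf_mul`, as well as the toy field defined by $x^3 + x + 1$, are assumptions chosen for illustration and are not taken from the cited designs. It models the arithmetic in software only and says nothing about gate counts.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def reduce_trinomial(c: int, m: int, k: int) -> int:
    """Reduce c modulo x^m + x^k + 1 by repeatedly applying x^m = x^k + 1."""
    for deg in range(c.bit_length() - 1, m - 1, -1):
        if (c >> deg) & 1:
            # Clear x^deg and fold it into x^(deg-m+k) and x^(deg-m).
            c ^= (1 << deg) | (1 << (deg - m + k)) | (1 << (deg - m))
    return c


def gf_mul(a: int, b: int, m: int, k: int) -> int:
    """Field product in PB representation modulo x^m + x^k + 1."""
    return reduce_trinomial(clmul(a, b), m, k)


# Toy example (assumed field): GF(2^3) defined by x^3 + x + 1.
product = gf_mul(0b010, 0b100, 3, 1)   # x * x^2 = x^3 = x + 1
```

Processing the degrees from the top down means that any bit folded back above degree $m-1$ is handled by a later iteration, so a single pass suffices.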

The Karatsuba algorithm (KA) works recursively by breaking one large multiplication into two or more submultiplications; it is a typical divide-and-conquer algorithm. The classic KA starts from a way to multiply two 2-term polynomials using three scalar multiplications. Other variations have also been investigated; more details can be found in [14–16]. The KA can be adopted to design subquadratic-complexity multipliers [14, 17] or hybrid multipliers [18, 19]. In particular, there is another type of hybrid multiplier, namely, the nonrecursive Karatsuba multiplier, which applies the KA only once in the polynomial multiplication [8, 20]. These multipliers typically require about 3/4 of the circuit gates of the fastest bit-parallel multipliers, while their time delay increases by a small number of $T_X$. For example, the multiplier of Elia et al. [8] costs at least two more $T_X$.
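The classic two-term step can be sketched as follows, applied only once in the nonrecursive style described above. The sketch assumes polynomials over $\mathbb{F}_2$ stored as integer bit masks; the names `clmul` and `karatsuba_once` and the split size `n` are hypothetical. Over $\mathbb{F}_2$ subtraction coincides with addition, so the middle term is recovered with XORs only.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2)[x] polynomials (bit masks)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_once(a: int, b: int, n: int) -> int:
    """Multiply two polynomials of degree < 2n with three half-size products.

    Writes a = a1*x^n + a0 and b = b1*x^n + b0, then uses
    a0*b1 + a1*b0 = (a0 + a1)(b0 + b1) + a0*b0 + a1*b1 over GF(2).
    """
    mask = (1 << n) - 1
    a0, a1 = a & mask, a >> n
    b0, b1 = b & mask, b >> n
    low = clmul(a0, b0)
    high = clmul(a1, b1)
    mid = clmul(a0 ^ a1, b0 ^ b1) ^ low ^ high
    return low ^ (mid << n) ^ (high << (2 * n))
```

The three half-size products replace the four that the schoolbook method would need, which is where the roughly 3/4 gate count comes from; the extra XORs before and after the products are the source of the added delay.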

Recently, we proposed a novel nonrecursive Karatsuba multiplier based on the Mastrovito approach [21]. Our multiplier requires only one more $T_X$ than the fastest multipliers [9, 10]. However, it costs a few more logic gates than Elia's result. Besides the nonrecursive Karatsuba multipliers for general trinomials, Shen and Jin [13] proposed a new Karatsuba multiplier that fully exploits an equally spaced trinomial and the classic KA to simplify the modular reduction. Consequently, the space complexity of their scheme matches Elia's result. Meanwhile, its time complexity is , which is roughly equal to the fastest results. Furthermore, we observe that a special case of our multiplier, in which the trinomial is an equally spaced trinomial, coincides with their scheme.

In this paper, we explore another special case of our former scheme to obtain even more efficient nonrecursive Karatsuba multipliers. Our main idea is analogous to that of Shen and Jin [13], where a special type of trinomial and a KA variation are utilized to simplify the structure of the corresponding Mastrovito matrix. More explicitly, we consider the irreducible trinomial and a three-term Karatsuba algorithm. It is demonstrated that the corresponding Mastrovito matrix can be simplified further under this condition. The shifted polynomial basis (SPB) [4] is also utilized to reduce the critical path delay further. Consequently, we propose a bit-parallel multiplier that costs approximately 2/3 of the circuit gates of the fastest bit-parallel multipliers. On the other hand, its time complexity is , which almost matches the best known results.

The rest of this paper is organized as follows: In Section 2, we briefly review the Mastrovito approach based on SPB representation and some relevant notions. Then we introduce a three-term KA formula and investigate the structure of related Mastrovito matrix. A new bit-parallel multiplier architecture is then proposed in Section 3. Section 4 presents a comparison between the proposed multiplier and some others. Finally, some conclusions are drawn.

#### 2. Preliminary

In this section, we briefly review some related notations and algorithms used throughout this paper. Consider the finite field $GF(2^m)$ generated with an irreducible trinomial . Let be a root of and the set constitute a polynomial basis (PB). Therefore, every element of $GF(2^m)$ can be represented as a polynomial over $\mathbb{F}_2$ of degree less than $m$. The shifted polynomial basis (SPB) is a variation of the polynomial basis, which is obtained by multiplying the set by a certain power of .

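*Definition 1 (see [4]).* Let $v$ be an integer and the ordered set $M = \{x^{i} \mid 0 \le i \le m-1\}$ be a polynomial basis of $GF(2^m)$ over $\mathbb{F}_2$. The ordered set $x^{-v}M := \{x^{i-v} \mid 0 \le i \le m-1\}$ is called the shifted polynomial basis with respect to $M$.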

Generally speaking, the optimal choice of for an irreducible trinomial is the middle-term degree or that degree minus one [4]. In this case, we have and use this notation hereafter. It follows that a field element can be expressed with respect to the SPB as follows:

Given two elements of under SPB representation, that is, , , the field multiplication can be performed as

Obviously, the product is thus equal to

Analogous to ordinary polynomial multiplication, this product can be computed by a matrix-vector multiplication , where express the coefficient vectors of and , and the matrix is given by

The difference between the above matrix and the usual PB case [3] lies only in the row labels on the left side, which indicate the exponent of the indeterminate for each row.

We then reduce the above matrix in order to obtain the field product expressed in SPB representation. The reduced matrix, denoted by **M**, is called the Mastrovito matrix. Thus, the SPB field multiplication is rewritten as where denotes the coefficient vector of . The structure of **M** relies on and the modular reduction rule. In this case, we should obey the following reduction rule:

However, if we directly reduce the product matrix presented in (4) using the above formulae and perform the matrix-vector multiplication, there is no difference between this computation and the general case. In the following section, we will construct a new Mastrovito matrix using a three-term Karatsuba algorithm and describe a highly efficient bit-parallel multiplier.
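To make the effect of the shifted basis concrete, here is a small Python sketch of SPB multiplication. It assumes a generic trinomial $x^m + x^k + 1$ with shift $v = k$, and two reduction identities that follow directly from $x^m + x^k + 1 = 0$: $x^{d} = x^{d-m+k} + x^{d-m}$ for degrees that are too high, and $x^{d} = x^{d+m} + x^{d+k}$ for degrees that are too low. These identities and the names `spb_mul` and `pb_mul` are assumptions for illustration and need not coincide with the reduction rule (6) used in this paper.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pb_mul(a: int, b: int, m: int, k: int) -> int:
    """Reference product in the ordinary polynomial basis mod x^m + x^k + 1."""
    c = clmul(a, b)
    for j in range(2 * m - 2, m - 1, -1):
        if (c >> j) & 1:
            c ^= (1 << j) | (1 << (j - m + k)) | (1 << (j - m))
    return c


def spb_mul(a: int, b: int, m: int, k: int) -> int:
    """SPB product with shift v = k; operand bit i is the coefficient of x^(i-k)."""
    v = k
    c = clmul(a, b)                      # bit j is the coefficient of x^(j - 2v)
    # Degrees above m-1-v: apply x^d = x^(d-m+k) + x^(d-m).
    for j in range(2 * m - 2, m + v - 1, -1):
        if (c >> j) & 1:
            c ^= (1 << j) | (1 << (j - m + k)) | (1 << (j - m))
    # Degrees below -v: apply x^d = x^(d+m) + x^(d+k).
    for j in range(v):
        if (c >> j) & 1:
            c ^= (1 << j) | (1 << (j + m)) | (1 << (j + k))
    return c >> v                        # bit i is again the coefficient of x^(i-v)
```

With $v = k$, every term produced by either identity already lands inside the target degree range, so no term has to be reduced twice. An SPB result `c` can be checked against the ordinary basis through the relation `pb_mul(c, 1 << k, m, k) == pb_mul(a, b, m, k)`, since an SPB coefficient vector represents $x^{-k}$ times the corresponding PB polynomial.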

Moreover, one can check that an irreducible trinomial of the form exists when where is a nonnegative integer [1]. Although irreducible trinomials of this type are not abundant, some still exist in the range of interest for practical applications.
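For small degrees, the irreducibility of a candidate trinomial over $\mathbb{F}_2$ can be checked by brute force. The sketch below uses plain trial division by every polynomial of degree at most half that of the candidate; it is only practical for toy sizes and is not the method of [1]. The names `poly_mod` and `is_irreducible` and the sample polynomials are assumptions for illustration.

```python
def poly_mod(a: int, f: int) -> int:
    """Remainder of a divided by f in GF(2)[x]; polynomials are bit masks, f != 0."""
    deg_f = f.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)   # cancel the leading term of a
    return a


def is_irreducible(f: int) -> bool:
    """Trial division by all polynomials of degree 1 .. deg(f) // 2."""
    n = f.bit_length() - 1
    if n < 1:
        return False                              # constants are not irreducible
    for g in range(2, 1 << (n // 2 + 1)):         # every g with 1 <= deg(g) <= n // 2
        if poly_mod(f, g) == 0:
            return False
    return True


# Sample checks on small trinomials (bit i is the coefficient of x^i).
checks = {
    "x^3 + x + 1": is_irreducible(0b1011),
    "x^4 + x^2 + 1": is_irreducible(0b10101),
    "x^6 + x^3 + 1": is_irreducible(0b1001001),
}
```

A reducible polynomial of degree $n$ always has a factor of degree at most $n/2$, which is why the search can stop there. For degrees of cryptographic interest a polynomial-time test would be needed instead.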

Finally, we introduce some notations pertaining to matrices and vectors, which were already proposed in [21, 23] and are used extensively throughout this paper.

(i) represents the th row vector in matrix ;

(ii) represents the th column vector in matrix ;

(iii) represents the entry with position in matrix .

#### 3. Mastrovito Multiplier Using a Three-Term Karatsuba Algorithm

The Karatsuba algorithm [2] has been applied to improve the efficiency of bit-parallel multipliers for $GF(2^m)$ generated by an AOP [20] and by a trinomial [8, 13, 21]. It starts from a way to multiply two two-term polynomials using three scalar multiplications, which can reduce the space complexity of the multipliers to approximately 3/4 of the original. Besides the classic algorithm, there exist several generalizations of the Karatsuba algorithm [14–16]. Here, we focus only on a simple variation, the three-term Karatsuba algorithm, which multiplies two three-term polynomials using six scalar multiplications. Given two three-term polynomials in , one can check that
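A common form of the three-term identity over $\mathbb{F}_2$ can be sketched and checked numerically as follows. Writing the operands as $a_2y^2 + a_1y + a_0$ and $b_2y^2 + b_1y + b_0$, the cross terms are recovered as $a_ib_j + a_jb_i = (a_i + a_j)(b_i + b_j) + a_ib_i + a_jb_j$, so six products suffice. The symbols, the block size `n`, and the function names are assumptions for illustration and may differ from the notation of the formula in this paper.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2)[x] polynomials (bit masks)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba3_once(a: int, b: int, n: int) -> int:
    """Multiply two polynomials of degree < 3n using six n-bit products."""
    mask = (1 << n) - 1
    a0, a1, a2 = a & mask, (a >> n) & mask, a >> (2 * n)
    b0, b1, b2 = b & mask, (b >> n) & mask, b >> (2 * n)

    # Three "diagonal" products and three products of block sums.
    p0, p1, p2 = clmul(a0, b0), clmul(a1, b1), clmul(a2, b2)
    p01 = clmul(a0 ^ a1, b0 ^ b1)
    p02 = clmul(a0 ^ a2, b0 ^ b2)
    p12 = clmul(a1 ^ a2, b1 ^ b2)

    c1 = p01 ^ p0 ^ p1            # a0*b1 + a1*b0
    c2 = p02 ^ p0 ^ p2 ^ p1       # a0*b2 + a2*b0 + a1*b1
    c3 = p12 ^ p1 ^ p2            # a1*b2 + a2*b1
    return p0 ^ (c1 << n) ^ (c2 << (2 * n)) ^ (c3 << (3 * n)) ^ (p2 << (4 * n))
```

Six block products instead of the nine needed by the schoolbook method give the ratio $6/9 = 2/3$, which is consistent with the gate saving claimed for the proposed multiplier.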

In general, Mastrovito multiplication utilizing the KA increases the time complexity. Our former result shows that a Mastrovito multiplier using the classic KA costs one more $T_X$ than the fastest ones. However, the literature [13] indicates that this result can be further improved in some special cases, for example, the EST . In the following, we will show that, for the trinomial , applying the three-term Karatsuba-like formula also simplifies the reduction operation and leads to a fast implementation.

Let be an irreducible trinomial and , be two field elements in SPB representation. We partition , into three parts, each consisting of bits. In order to simplify the related expressions, we denote as . Then, where , , for . Then we multiply and using the three-term Karatsuba-like formula and apply the following transformation:

where , , , , , . We divide (9) into two parts and compute each part modulo independently.

##### 3.1. Computation of

We first consider the computation of in detail. Note that actually consists of three different parts: , , (the others can be obtained by shifting these parts). When is rewritten in matrix-vector form, we have

For simplicity, we do not write the labels of the product matrix here, which indicate the degree of in . Note that these degrees are in the range . In the above expression, represent the coefficient vectors of , respectively. is a zero matrix, () are lower-triangular Toeplitz matrices, and () are upper-triangular Toeplitz matrices. Please note that the matrix on the right side actually contains rows and the product matrix in fact contains rows. However, the last row of the above matrix is **0**, which does not affect the result. These submatrices have the following form: for . Since the products contain terms whose degrees lie outside the range , we have to perform the reduction operation for the product matrix in (21). According to the Mastrovito scheme, the reduction can be regarded as the construction of product matrices from using the reduction rule in (6). Denoted by , the Mastrovito matrix is related to . We now investigate the construction details for this matrix and obtain the following proposition.

Proposition 2. *The Mastrovito matrix can be constructed as where *

*Proof.* The proof is analogous to the proof of the observation in [21]. Note that the product matrix contains nonzero rows (the last row is a zero vector), each of which corresponds to a polynomial degree from to . It is easy to check that the first rows and the last rows correspond to degrees that are outside the range . Thus, we need to reduce these rows.

According to the reduction rule in (6), we have to reduce by adding them to the rows and and reduce the rows by adding them to the rows and . Obviously, the first rows here are and the last rows constitute We compare the row numbers and obtain the result immediately.

Based on Proposition 2, we can compute as follows:

By swapping and combining some overlapped entries, expression (16) can now be rewritten as

We need only compute two submatrix-vector multiplications and add them up to obtain . Some tricks can be applied to save more logic gates. We mainly utilize the computation strategy presented in [7] and fully exploit the overlapped parts of the two matrices above. The computation can be divided into two steps:

(i) Perform the row-vector products: in parallel. The symbol “” represents only the row-vector products related to (or ) and , . For example, represents computing the inner products , for in parallel.

(ii) Sum up all the entries of each row using a binary XOR tree. In particular, since some products in each row are zero, we first compute the following summations: using binary XOR trees and then add these results together.
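The two steps above can be modeled in software as follows. The sketch is a generic matrix-vector product over $\mathbb{F}_2$ in which structurally zero entries are skipped, all bit products are formed first, and each row is then summed by a balanced binary XOR tree whose depth is recorded. The representation of the matrix (a list of rows with `None` for structural zeros) and the names `xor_tree` and `matvec_two_step` are assumptions for illustration; the sketch does not reproduce the shared subexpressions of the actual architecture.

```python
from typing import List, Optional, Tuple


def xor_tree(bits: List[int]) -> Tuple[int, int]:
    """XOR a list of bits pairwise, level by level; return (sum, tree depth)."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])      # odd element passes through to the next level
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


def matvec_two_step(
    matrix: List[List[Optional[int]]], vec: List[int]
) -> Tuple[List[int], int]:
    """Step (i): AND products for nonzero entries; step (ii): XOR tree per row."""
    out, worst_depth = [], 0
    for row in matrix:
        products = [e & vec[j] for j, e in enumerate(row) if e is not None]
        total, depth = xor_tree(products)
        out.append(total)
        worst_depth = max(worst_depth, depth)
    return out, worst_depth


# Hypothetical 2 x 5 example; None marks an entry that is structurally zero.
M = [[1, 1, 0, 1, 1], [None, 1, None, None, 1]]
b = [1, 1, 1, 0, 1]
result, depth = matvec_two_step(M, b)
```

A row with $n$ structurally nonzero entries needs a tree of depth $\lceil \log_2 n \rceil$, so under the usual gate-delay model the critical path is one AND-gate delay plus that many XOR-gate delays for the widest row.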

*Remarks 3.* It is easy to see that the row-vector products (18) contain all the possible row-vector products in (17). In addition, since , , , and are all triangular matrices, one can easily check that each row of both and consists of at most nonzero entries. After the computation of (18) and (19), a certain number of XOR gates is required to obtain the final result. Table 1 summarizes the space and time complexity of for all the steps.