Abstract

We describe a modification to a previously published pseudorandom number generator improving security while maintaining high performance. The proposed generator is based on the powers of a word-packed block upper triangular matrix and it is designed to be fast and easy to implement in software since it mainly involves bitwise operations between machine registers and, in our tests, it presents excellent security and statistical characteristics. The modifications include a new, key-derived s-box based nonlinear output filter and improved seeding and extraction mechanisms. This output filter can also be applied to other generators.

1. Introduction

Most cryptographic protocols require unpredictable quantities; these include keys, prime numbers, and challenge values. If these values were predictable, the security of such systems would be compromised.

The most common way to obtain these values is from pseudorandom sequences. Furthermore, a suitable pseudorandom number generator (PRNG) can be used as the key-stream generator in a Vernam stream-cipher scheme (see [1, 2]).

A PRNG is a completely deterministic algorithm; the sequence it generates is a function of its inputs and, unlike a truly random generator, its output can be reproduced. This means that we only need the seed (the input to the PRNG) in order to generate the complete output sequence. The output sequence is much longer than the seed and is, in practice, undistinguishable from a really random sequence.

In security applications we need to produce sequences with large periods, high linear complexities, and good statistical properties and satisfy certain unpredictability criteria. Several statistical suites (see [35]) that can reject a sequence as nonrandom are available; they involve testing the frequency of bit patterns, autocorrelation, or linear complexity among other metrics.

Most available cryptographic PRNGs are based on linear feedback shift registers (LFSRs). They are so popular because they can be easily implemented in hardware; they produce sequences of large periods with good statistical properties and have a simple structure that can be analysed easily. LFSRs by themselves are not secure but they are commonly enhanced with other techniques to improve their cryptographic properties.

We propose a modification to a previously published PRNG (see [6]) based on word-packed block upper triangular matrices (BUTM) over (see [7]) that improves security introducing a key-derived s-box output filter involving a key-schedule algorithm, as well as enhanced seeding and extraction functions.

Similar to the LFSR case, the generator is comprised of two distinct blocks: a generator component that involves linear operations but is proven to generate sequences of a guaranteed period, perfect linear complexity (unlike LFSRs) and excellent statistical properties; and a nonlinear output filtering component that introduces unpredictability and resistance to common attacks.

One of the main contributions of this paper, together with the seeding and extraction algorithms, is the output filter, which is a new design based on the key scheduling algorithm of RC4 (see [8]) to construct four 8 × 32 s-boxes; it can be adapted to other generators, incorporating the concept of key scheduling to PRNGs.

The main key differences and improvements of the proposal regarding the original PRNG (see [6]) are included in the following.

Seed Size. The original PRNG had no minimum seed size specified and a maximum of 3072 bits. The improved PRNG has a minimum of 128 bits of seed specified and up to 2976 (in order to guarantee a valid seed).

Seeding. There was no seeding algorithm specified in the original PRNG; the improved PRNG specifies a seeding algorithm that is avalanche optimized and guarantees a valid (nonzero) seed.

Output Filter. The original PRNG had an output filter based on simple, nonlinear, Boolean functions; the improved PRNG employs an output filter based on the approximation of a 32 × 32 bit s-box through the combination of four key-dependent 8 × 32 bit s-boxes. These are indexed by one 32 bit word from the linear component of the PRNG and further combined with an addition modulo operation with another 32 bit word from the linear component.

Key Scheduling. The original PRNG had no key-scheduling algorithm; the improved PRNG specifies a key scheduling algorithm based on the construction of four 8 × 32 bit key-dependent s-boxes. The fact that these are key-dependent further improves the unpredictability of the proposed output filter.

Blank Rounds. The original PRNG specified no blank rounds; the improved PRNG specifies 64 blank rounds, skipping a total of 98304 bits before producing any output. This further improves the statistical, avalanche, and security properties.

The paper is divided as follows: a description of the generator is given in Section 2.1, packed matrices and related operations are detailed in Section 2.2, while Section 2.3 describes the key scheduling, seeding, and filtering algorithms; we analyse the randomness, avalanche, security, and performance characteristics of the proposal in Section 3; and, finally, some conclusions are given in Section 4.

2. Description

2.1. Generator

Our generator is based on the powers of a BUTM defined over , with prime [9]. As we take the different powers of a BUTM, we have as a result a series of matrices that have interesting randomness properties and very long period. Subsequently, each element of the series (each BUTM) can be processed to obtain an output sequence with good statistical values.

Consider the BUTM defined as whose entries lie in , where   is an matrix, is an matrix, is an matrix, and denotes the zero matrix.

The following result, which forms the basis of the generator, establishes the expression of the different powers of matrix . It also defines matrix in terms of , , and .

Theorem 1. Let be the BUTM given in (1). For any nonnegative integer, , then, where Also, if  , then

In order to generate the pseudorandom bit sequence, matrices and are predetermined and matrix is randomly chosen, becoming the seed of the sequence. Then, taking expression (4) we obtain the following series of matrices:

For each matrix a bit extraction operation is determined obtaining a sequence of bits like

The key for obtaining long periods for the sequence given by (6) is constructing matrices and as companion matrices to primitive polynomials so the period can be guaranteed to be at least

The value of or the sizes of and need not be very large in order to achieve long periods. For more information see [6, 10, 11].

Although any prime can be chosen for the generator, we choose for performance reasons, as highlighted in the following section.

The initial contents of matrices A and B are shown in Algorithms 5 and 6, respectively. These are constructed as companion matrices to primitive polynomials in , as described before, so the period can be guaranteed to be at least

2.2. Packed Matrices

The concept of word-packed matrices is essential for the optimized implementation of the generator over . Word-packed matrices enable adding and multiplying binary matrices just by performing binary operations between processor registers, which is very efficient. For more information see [7].

We define a matrix, whose elements lie in , as a word-packed matrix if one of its dimensions (rows or columns) is packed as word sized groups of bits.

Operations involving word-packed matrices are equivalent to those between conventional matrices since packed matrices are, essentially, just a way of storing the elements of the matrix so that the computations required can be efficiently implemented as binary operations between processor registers. Nevertheless, they present certain peculiarities of their own that must be taken into account.

The addition of word-packed matrices must be done between matrices of the same type, observing packing orientation: rows or columns. Although they could be unpacked and operated normally, the optimal way is to perform a XOR operation word by word.

The product operation between packed matrices is a little more complex than the addition. The product must be done between matrices of different types and with compatible sizes. The multiplicand has to be a row packed matrix, while the matrix corresponding to the multiplier must be packed by columns.

2.2.1. Matrices and Packness

In order to implement the generator over , we can see that taking in expression (4) leaves us with and the following operations have to be performed per iteration:

It can be observed that the computations required on each iteration imply that matrices , , and their powers and , must be kept in memory at the same time. It is also necessary to employ temporary matrices and since the same matrix cannot simultaneously be source and destination.

Taking into account the peculiarities of the product operation between packed matrices, we can identify the following matrices and types:(i)row packed matrix,(ii) row packed matrix,(iii) column packed matrix,(iv) row packed matrix,(v) column packed matrix,(vi), column packed matrices (temporary).

Although the product operation between word-packed matrices generates sparse bits instead of words, these bits can be repacked into the desired format (rows or columns) without a significant performance penalty.

2.2.2. Parameters

Besides determining the format for each matrix, their sizes must also be decided for the correct operation of the implementation.

Several sizes and the number of decimal digits of the corresponding period are shown in Table 1. The option that appears to be more adequate is the , since it is compatible with usual processor register sizes (32 or 64 bits). Moreover, the order obtained is very high.

2.3. Extraction Mechanism

In this section, we describe the different algorithms that perform seeding, bit extraction, and output filtering using a suitable pseudocode notation involving the following operators:  bitwise XOR,&bitwise AND,bitwise right shift,bitwise left shift,%modulus,assignment, + addition modulo ,product.

The generator takes a 128 bit seed (or key) as input and uses it to generate the initial state of block (Section 2.3.2) and the s-boxes (Section 2.3.1) that conform the output filter (Section 2.3.3). It is trivial to modify the generator to accept longer (or shorter) seeds.

2.3.1. S-Box Construction

The four s-boxes are constructed following a similar scheme to the one employed on the RC4 key set-up algorithm (see [8]).

We initialise a temporary s-box, , and clear the final s-boxes, (see Algorithm 1).

for i:= 0  to 255 {
s[i]:= i
S0[i]:= 0
S1[i]:= 0
S2[i]:= 0
S3[i]:= 0
}

Then, for each s-box ( shown as an example), we perform 256 substitutions on involving the key (seed) in the process. This creates an s-box, so it is repeated four times and the results are concatenated to form an s-box (see Algorithm 2).

for c:= 0 to 3 {
j:= 0
for i:= 0 to 255 {
j  :=  (j  +  s[i] + key[i % 16]) % 256
t:= s[i]
s[i]:= s[j]
s[j]:= t
}
for i:= 0 to 255
S0[i]:= (S0[i]<<8) s[i]
}

The first s-box () starts from the initialized , and each subsequent substitution process builds on the previous ones using the current state of . This is repeated for the remaining s-boxes, , constructing a total of 16 s-boxes in the process.

Note that all s-boxes constructed in this way are balanced and have properties similar to purely random, non-key-dependent, s-boxes (see [12] for more information).

2.3.2. Seeding

The generator is seeded using the block. It is configured as a conventional binary matrix which is then converted to the row and column word packed representations needed by the algorithm. The contents of this block are dependent on the 128 bit seed input.

Each row of this matrix is filled up using a 64 bit word, , that uses the output of all four s-boxes generated in the previous step, guaranteeing excellent avalanche characteristics (see Section 3). Only the least significant 48 bits of are actually used.

Rows 0 and 63 are fixed to a specific bit pattern, preventing a full zero block regardless of the key employed (see Algorithm 3).

offs:= 0
for r = 0 to 63 {
if r is 0 or r is 63 {
v:= 0x55AA55AA55AA55AA
}
else {
v:= S0[(r + key[offs  % 16]) % 256]
offs:= offs + 1
v:= v   S1[(offs + key[offs % 16]) % 256]
offs:= offs + 1
v:= v << 32
v:= v    S2[(r + key[offs % 16]) % 256]
offs:= offs + 1
v:= v    S3[(offs + key[offs % 16]) % 256]
offs:= offs + 1
}
for c:= 0 to 47
X[r*48+c]:= (v>>c) % 2
}

The generator is then iterated 64 times without generating any output, further improving avalanche characteristics and overall security.

2.3.3. Filtering and Extraction

The output filtering and extraction mechanism employs two adjacent 32 bit words from matrix ; separating the first word, , into four bytes, to , which serve as indexes into to , respectively. These four 32 bit words are XORed again and the second word from , , is added modulo obtaining a single 32 bit word of output.

This process is repeated to process all 96 32 bit words produced on each iteration of the generator, thus creating 48 32 bit words (1536 bits) of filtered output sequence per iteration.

The array corresponds to the block associated to iteration (see Section 2.2.1) (see Algorithm 4).

offr:= 0
for c:= 0 to 94 increment 2 {
b0:= Xh[c] % 256
b1:= Xh[c]>>8 % 256
b2:= Xh[c]>>16 % 256
b3:= Xh[c]>>24 % 256
Seq[offr]:= (S0[b0]S1[b1]S2[b2]S3[b3])
Seq[offr]:= Seq[offr] + Xh[c  +  1]
offr:= offr + 1
}

u32 A
 := {0x40000000, 0x00000000, 0x20000000, 0x00000000, 0x10000000, 0x00000000,
    0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000, 0x00000000, 0x01000000,
    0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000, 0x00200000, 0x00000000,
    0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000, 0x00000000, 0x00020000,
    0x00000000, 0x00010000, 0x00000000, 0x00008000, 0x00000000, 0x00004000, 0x00000000,
    0x00002000, 0x00000000, 0x00001000, 0x00000000, 0x00000800, 0x00000000, 0x00000400,
    0x00000000, 0x00000200, 0x00000000, 0x00000100, 0x00000000, 0x00000080, 0x00000000,
    0x00000040, 0x00000000, 0x00000020, 0x00000000, 0x00000010, 0x00000000, 0x00000008,
    0x00000000, 0x00000004, 0x00000000, 0x00000002, 0x00000000, 0x00000001, 0x00000000,
    0x00000000, 0x80000000, 0x00000000, 0x40000000, 0x00000000, 0x20000000, 0x00000000,
    0x10000000, 0x00000000, 0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000,
    0x00000000, 0x01000000, 0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000,
    0x00200000, 0x00000000, 0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000,
    0x00000000, 0x00020000, 0x00000000, 0x00010000, 0x00000000, 0x00008000, 0x00000000,
    0x00004000, 0x00000000, 0x00002000, 0x00000000, 0x00001000, 0x00000000, 0x00000800,
    0x00000000, 0x00000400, 0x00000000, 0x00000200, 0x00000000, 0x00000100, 0x00000000,
    0x00000080, 0x00000000, 0x00000040, 0x00000000, 0x00000020, 0x00000000, 0x00000010,
    0x00000000, 0x00000008, 0x00000000, 0x00000004, 0x00000000, 0x00000002, 0x00000000,
    0x00000001, 0xD8000000, 0x00000000};

u32 B
 := {0x40000000, 0x00000000, 0x20000000, 0x00000000, 0x10000000, 0x00000000,
    0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000, 0x00000000, 0x01000000,
    0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000, 0x00200000, 0x00000000,
    0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000, 0x00000000, 0x00020000,
    0x00000000, 0x00010000, 0x00000000, 0x00008000, 0x00000000, 0x00004000, 0x00000000,
    0x00002000, 0x00000000, 0x00001000, 0x00000000, 0x00000800, 0x00000000, 0x00000400,
    0x00000000, 0x00000200, 0x00000000, 0x00000100, 0x00000000, 0x00000080, 0x00000000,
    0x00000040, 0x00000000, 0x00000020, 0x00000000, 0x00000010, 0x00000000, 0x00000008,
    0x00000000, 0x00000004, 0x00000000, 0x00000002, 0x00000000, 0x00000001, 0x00000000,
    0x00000000, 0x80000000, 0x00000000, 0x40000000, 0x00000000, 0x20000000, 0x00000000,
    0x10000000, 0x00000000, 0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000,
    0x00000000, 0x01000000, 0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000,
    0x00200000, 0x00000000, 0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000,
    0x00000000, 0x00020000, 0x00000000, 0x00010000, 0x89400000, 0x00000000};

3. Results

3.1. Randomness Analysis

The resulting generator has been tested successfully with three different statistical suites.

The first one, shown in Table 2, is a custom suite that checks for bit frequency (frequency), bit pair frequency (Serial), 8 bit and 16 bit pattern frequency (poker 8 and poker 16, resp.), runs (contiguous set bits) and gaps (contiguous unset bits) in the sequence (runs), autocorrelation, and linear complexity. We have made available this suite on GitHub (see [3]). The proposed generator passes all the tests successfully.

The second suite is PractRand (see [4]). We have tested our proposed generator for sequences up to 64 GB ( bits) in length, passing all tests successfully as shown in Table 3. This also highlights the potential for generating very long, high quality, binary sequences.

The third suite is comprised of the 160 statistics included in TestU01  1.2.3 BigCrush battery (see [5]). The proposed generator passes all tests from this stringent suite; the extensive report is included as supplementry material.

3.2. Avalanche Analysis

Avalanche is a very important characteristic in cryptographic primitives in order to prevent successful cryptanalysis.

It is defined as the number of output bits that change when a single input bit is flipped, and the expected outcome is that roughly half of the output bits change when this happens.

In this case, we have taken 128 different seeds that differ in a single bit and have measured avalanche in two different points: the block (Figure 1, pre-output-filter) and generated sequence (Figure 2, post-output-filter). The graphs depict the mean difference value, together with one standard deviation above and below the mean value, as well as minimum and maximum values for each 64 bit word in the test sample.

In both cases, the results are excellent with no abnormal values and with most of the population very close to 32, which is the expected value.

3.3. Security
3.3.1. Security Parameters

The proposed generator could be employed as the source for random values, nonces, keys, and so forth, as well as the key stream generator in a Vernam stream cipher.

In this case, it accepts keys of 128 bits in size, but the seeding algorithm (see Section 2.3.2) can be modified to accept longer keys with ease, while maintaining the excellent avalanche characteristics (see Section 3.2).

The maximum capacity of the block is bits, since the first and last rows are fixed to a specific pattern to guarantee a nonzero block.

3.3.2. Linear Generator

The generator component is based on the powers of a 2 × 2 BUTM and has excellent statistical characteristics, passing all tests of stringent suites (see Section 3.1).

It also guarantees a period of at least that is the period of matrix since blocks and are constructed as companion matrices of primitive polynomials in . The actual expected period is that of matrix times the number of bits produced per iteration:

Regarding linear complexity, the generated sequences present the expected linear complexity of a random sequence (half the sequence length, see Table 2). This implies that the generator cannot be reduced to a simple LFSR.

The generator is a linear algorithm, involving exclusively addition and multiplication operations over . This means that the generator by itself is susceptible to linear cryptanalysis where a linear equation system can be set up in order to obtain the seed state (block ) after a certain number of iterations. This type of attack is prevented by the nonlinear filter component.

3.3.3. Nonlinear Filter

Substitution boxes (or s-boxes) are simple substitution tables where an input value is transformed into a different output value; the most common are 8 × 8 bits (byte as input and output) and 8 × 32 bits (byte as input and four byte word as output). They are essential in many cryptosystem designs since they can introduce the required nonlinearity characteristics, making cryptanalysis a more difficult endeavour (see [1215]).

The proposed nonlinear filter (see Sections 2.3.1 and 2.3.3) employs four 8 × 32 key-dependent s-boxes and takes a 32 bit word from the generator as input and combines a second word from the generator with the output.

There have been some known attacks on RC4 (see [1618]); these are mostly based on biases of the first bytes of the output sequence. Although it performs value swapping in similar ways as the key scheduling algorithm of RC4, there are several reasons in which the proposed nonlinear filter differs from it and why we believe these known attacks are not applicable in this case.(i)Static s-boxes. Unlike in RC4, which uses the key scheduling algorithm to determine the initial state of an evolving 8 × 8 s-box, our proposal uses the four 8 × 32 s-boxes to filter the output of the linear generator and the s-boxes do not evolve with sequence generation.(ii)Sequence is not sourced on s-boxes. In our proposal, the source of the output sequence is the linear generator component and not the s-box evolution; therefore, the biases associated with the RC4 key scheduling algorithm do not affect the sequences produced by the proposed PRNG.(iii)Multiple s-boxes. The proposed nonlinear filter employs four different 8 × 32 s-boxes that are combined together to generate the output, unlike in RC4 where a single 8 × 8 s-box is employed.(iv)S-boxes are not output directly. Furthermore, the output of the four 8 × 32 s-boxes is combined with a second 32 bit word from the linear generator using addition modulo . This means that, unlike in RC4, the output sequence is never the direct output of the s-boxes.(v)Blank rounds. The seeding algorithm (see Section 2.3.2) specifies that the generator is iterated 64 times (and therefore bits of output are skipped) avoiding the pitfalls of the early biases in RC4.

The proposed nonlinear filter presents defences against common attacks.(i)Linear cryptanalysis. Key-derived s-boxes behave essentially as random s-boxes and have highly nonlinear characteristics (see [12, 13]), together with the fact that the second word from the generator is added modulo (involving operations over and ) complicate approximating the whole PRNG with a linear equation system.(ii)Differential cryptanalysis. Although it is not a common attack against stream ciphers, the nonlinearity values of the 16 component 8 × 8 s-boxes are within secure bounds (see [12, 13]). Furthermore, the fact that the s-boxes are not fixed but key-dependent makes differential cryptanalysis more difficult.(iii)Correlation and statistical attacks. There are several design features to prevent these type of attacks: the 64 blank rounds in the seeding algorithm, only 48 32 bit words are output per iteration from 96 words possible, one word from the generator is used as the input to the s-boxes while the adjacent one is combined with the output of the s-boxes using a different operation. Moreover, the excellent statistical characteristics of the linear generator component help prevent biases and other problems.

3.4. Performance

We have included a performance benchmark in Table 4. This table includes speed measurements for common algorithms, such as AES with a 256 bit key in output feedback mode (OFB) (see [19]), the RC4 stream cipher (see [8]), and the Salsa20 (see [20]) stream cipher and the HC128 stream cipher (see [21]), together with our proposal.

All algorithm implementations are single thread, pure native compiler-optimized code, without hardware acceleration or processor-specific high-performance instruction set extensions.

Although not as fast as other lighter-weight stream ciphers, the proposed generator achieves acceptable performance and presents valuable characteristics.

One of them is that, being a matrix based generator, it is essentially an embarrisingly parallel problem and, therefore, capable of taking advantage of the multiple core architecture of modern CPUs or vector processing in GPUs.

Another advantage comes from the fact that the whole matrix computations involve binary operations between registers, and matrix sizes can be adjusted to profit from architectures that have bigger register sizes. The proposed design employs a 64 × 48 bit matrix size, taking advantage of current 64 bit architectures, but it is trivial to adjust this size for maximum performance on future architectures if necessary.

4. Conclusions

We have presented a modification on a block matrix PRNG introducing a key-dependent s-box output filter as well as different seeding and initialization algorithms to improve security and nonlinearity. It employs a word packing technique in order to optimise computations over reducing them to fast, native, register bitwise operations. Additionally, the word packing system can directly take advantage of more powerful processors with bigger registers.

The resulting generator produces sequences of great quality in terms of randomness, passing battery tests like PractRand [4] or TestU01 [5], and with a very long period. It is also capable of accepting very large keys if necessary.

The key-dependent s-box output filter is the result of adapting the concept of block cipher key-scheduling to a key-stream generator. It offloads some computations to each seed change cycle while maintaining high performance during sequence generation. An additional benefit is that it multiplies the cost of brute force key search. Although somewhat based on the key scheduling algorithm of RC4 [8], it is a new design generating four 8 × 32 key-dependent s-boxes that do not evolve during sequence generation.

Possible future work includes parallel implementations of the proposed generator on suitable CPU and GPU platforms, assembler optimization, and other performance analyses, as well as the adaptation of the nonlinear filter component to other PRNG and further analysis.

Appendix

For more details see supplementry material.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

Research partially supported by the Spanish MINECO under Project TIN2011-25452.

Supplementary Materials

This supplementary material includes the positive report corresponding to the BigCrush test battery from the stringent TestU01 randomness analysis suite. The report has been edited slightly (large headers and extra blank lines have been removed) for the sake of brevity.

  1. Supplementary Material