Mathematical Problems in Engineering
Volume 2015, Article ID 626408, 14 pages
http://dx.doi.org/10.1155/2015/626408
Research Article

New Statistical Randomness Tests Based on Length of Runs

1Institute of Applied Mathematics, Middle East Technical University, 06800 Ankara, Turkey
2Mathematics Department, Atılım University, 06836 Ankara, Turkey

Received 27 September 2014; Accepted 17 March 2015

Academic Editor: Anna Vila

Copyright © 2015 Ali Doğanaksoy et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Random sequences and random numbers constitute a necessary part of cryptography. Many cryptographic protocols depend on random values. Randomness is measured by statistical tests, and hence the security evaluation of a cryptographic algorithm depends heavily on statistical randomness tests. In this work we focus on the statistical distributions of runs of lengths one, two, and three. Using these distributions we state three new statistical randomness tests. The new tests use the $\chi^2$ distribution and, therefore, exact values of probabilities are needed. Probabilities associated with runs of lengths one, two, and three are stated. The corresponding probabilities are divided into five subintervals of equal probabilities. Accordingly, three new statistical tests are defined and pseudocodes for these new statistical tests are given. The new statistical tests are designed to detect deviations in the number of runs of various lengths from that of a random sequence. Together with some other statistical tests, we analyse our tests’ results on outputs of well-known encryption algorithms and on binary expansions of $e$, $\pi$, and $\sqrt{2}$. Experimental results show the performance and sensitivity of our tests.

1. Introduction

Random numbers and random sequences are extensively used in many areas such as game theory, numerical analysis, quantum mechanics, and cryptography. In cryptography, the need for random sequences emerges in many different applications, such as challenge and response authentication systems, generation of digital signatures, and zero-knowledge protocols. Among those, the most important application is key generation, which depends heavily on random values. Use of weak random values in key generation can cause a leakage in the system, and hence an adversary can gain the ability to break the whole cryptosystem. Therefore, randomness testing is an essential part of the security evaluation of a cryptographic algorithm.

Random sequences and random numbers can be generated by true random sources, such as atmospheric noise and radioactive decay. However, using these sources in an algorithm is impractical. It causes challenging problems in transmitting and storing large numbers of random bits, since reproducing the outputs of these sources is nearly impossible. Therefore, sequences and numbers used as keys in cryptographic algorithms such as block ciphers and synchronous stream ciphers should be pseudorandom, that is, random looking sequences of a specific length which are produced by deterministic processes [1]. Since proving the randomness of these generators mathematically is nearly impossible, we use statistical randomness tests for this purpose. Using statistical tests we try to detect the weaknesses that a generator could have.

Moreover, outputs of encryption algorithms should be indistinguishable from random mappings; that is, they should be random looking. This is another place where pseudorandom sequences play an important role. Also, deciding the number of rounds of a block cipher, which is an essential part of its design, is closely associated with the concept of being random looking. Therefore, the security of the system depends heavily on the production or testing of pseudorandom sequences. For these reasons, statistical randomness tests are considered an important part of evaluating the security of cryptographic algorithms.

Statistical tests are designed to test the null hypothesis $H_0$, which states that the sequence is randomly generated. Testing a binary sequence means that its degree of randomness is evaluated by a statistical test. The conclusion that the sequence is random or not is probabilistic; in other words, the hypothesis is either accepted or rejected. A statistical test considers a random variable whose distribution function is known. Depending on the distribution, a real number between 0 and 1, called the $p$-value, is calculated. If the $p$-value of the sequence is evaluated as one, we say that the sequence is completely random. On the other hand, the sequence is completely nonrandom if the $p$-value is determined as zero. If the $p$-value exceeds a predefined real number $\alpha$, then $H_0$ is accepted; otherwise, it is rejected.

Usually result of one statistical test is not enough to decide the randomness of sequence. Therefore, it is better to use a collection of statistical tests, called statistical test suites, to measure different behaviours of the sequence under consideration. These suites should be well designed to give trustable results and should not be blindly populated.

In the literature, there exist various statistical test packages. Among those, the most important ones are given in Knuth’s book [2], test suite presented by Rukhin [3], DIEHARD [4], CRYPT-X [5], TestU01 [6], and the test suite published by NIST [7] so far. Also there are works focusing on statistical tests individually such as a universal statistical test, stated by Maurer [8], a test based on diffusion characteristic of a block cipher [9], and topological binary test defined by Alcover et al. [10].

In this work, we propose three new statistical randomness tests which depend on the famous postulates of Golomb. These tests are named the runs of length one, runs of length two, and runs of length three tests. The rest of the paper is organized as follows. In Section 2, we explain Golomb’s randomness postulates. Also we discuss run tests given in the literature. In Section 3, we give proofs of our fundamental theorems. Also, in order to calculate the probabilities needed, we state corollaries and algorithms for each theorem. In Section 4, we state the new run tests and give the pseudocodes. In Section 5, we apply the new tests to the binary expansions of $e$, $\pi$, and $\sqrt{2}$, which are obtained from the NIST package [7], and to outputs of the five Advanced Encryption Standard competition finalists. In the last part of the implementation we generate some nonrandom data sets to emphasize the sensitivity of our tests. Finally, in Section 6, we summarize our results and state the topics for further research.

2. Preliminaries

2.1. Golomb’s Randomness Postulates

Deciding the pseudorandomness of a sequence is a difficult task. The basis for this task is provided by Golomb’s postulates. These postulates are one of the most important attempts to establish necessary properties for a finite (or periodic) pseudorandom sequence to be random looking. Sequences satisfying the following three properties are called pseudonoise sequences [11].

Let $s = s_0, s_1, s_2, \ldots$ be an infinite binary sequence periodic with period $n$ (or a finite sequence of length $n$). A run is defined as an uninterrupted maximal sequence of identical bits. Runs of 0’s are called gaps; runs of 1’s are called blocks. R1, R2, and R3 are Golomb’s randomness postulates, which are given as follows.
(R1) In a period of $s$, the number of 1’s should differ from the number of 0’s by at most 1. In other words, the sequence should be balanced.
(R2) In a period of $s$, at least half of the total number of runs of 0’s or 1’s should have length one, at least one-fourth should have length 2, at least one-eighth should have length 3, and the like. Moreover, for each of these lengths, there should be (almost) equally many gaps and blocks.
(R3) The autocorrelation function $C(\tau)$ should be two-valued. That is, for some integer $K$ and for all $\tau$,
$$C(\tau) = \sum_{i=0}^{n-1} (-1)^{s_i + s_{i+\tau}} = \begin{cases} n, & \tau \equiv 0 \pmod{n}, \\ K, & \text{otherwise}. \end{cases}$$
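The three postulates can be checked mechanically on a short periodic sequence. The sketch below is only an illustration: the period-7 sequence 1110100 is chosen here as an example, the helper names are hypothetical, and the autocorrelation is assumed to take the common form $C(\tau) = \sum_i (-1)^{s_i + s_{i+\tau}}$ with indices read modulo the period.

```python
def autocorrelation(s, tau):
    """C(tau) = sum over one period of (-1)^(s_i XOR s_{i+tau}), indices mod len(s)."""
    n = len(s)
    return sum((-1) ** (s[i] ^ s[(i + tau) % n]) for i in range(n))

def cyclic_run_lengths(s):
    """Run lengths of s read as one period of a periodic sequence (s must not be constant)."""
    n = len(s)
    # Rotate so that index 0 is the first bit of a run.
    start = next(i for i in range(n) if s[i] != s[i - 1])
    rotated = s[start:] + s[:start]
    lengths, current = [], 1
    for a, b in zip(rotated, rotated[1:]):
        if a == b:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return lengths

s = [1, 1, 1, 0, 1, 0, 0]                       # illustrative period, chosen for this sketch

ones, zeros = sum(s), len(s) - sum(s)
print("R1 balanced:", abs(ones - zeros) <= 1)    # 4 ones, 3 zeros
print("R2 run lengths:", sorted(cyclic_run_lengths(s)))
print("R3 autocorrelation:", [autocorrelation(s, t) for t in range(len(s))])
```

For this period the four runs have lengths 1, 1, 2, and 3, and the autocorrelation takes only the values 7 (at shift 0) and -1 (at every other shift), so all three postulates hold.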

The first postulate states that, in an $n$-bit sequence, the difference between the number of ones and the number of zeros should be 1 or 0. In other words, the number of ones in a sequence, that is, the weight of the sequence, should be approximately $n/2$. The frequency test, which measures the difference between the number of ones and the number of zeros in an $n$-bit sequence, is defined to check the first postulate of Golomb. Balancedness is a fundamental feature of an algorithm’s output. Therefore, the frequency test is used as an initial step in almost all test suites. If an algorithm fails the frequency test, then the other tests are not even applied.

The second postulate of Golomb is about number of runs in sequences. Tests, which deal with number of runs, are called run tests and these are also included in many test suites as the frequency test. Since calculating the expected number of runs of specified length in a random sequence is a difficult task (especially when specified length becomes large), most of test suites consider only the total number of runs and do not consider the number of runs of different lengths.

Lastly, the third postulate gives information about the amount of similarity between the sequence and a shifted version of it. If $s$ is a random looking sequence, the autocorrelation should be constant; that is, the correlation between the bits $s_i$ and $s_{i+\tau}$ should give no information about the sequence for $\tau \neq 0$. In this paper, we mainly focus on the first and second postulates, and the last one is not a matter of concern.

These postulates are theoretical, but difficult to check. Inspired by these postulates, we define new statistical randomness tests which are practical. In order to give the definitions, we calculate the exact probabilities. Before explaining these tests, first we give the mathematical background in order to compute the probabilities that we use in the following Section 3.

2.2. Run Test

Run tests depend on Golomb’s second postulate and investigate the number of runs in a sequence and their distribution. Run tests take place in most of the test suites. In almost all of these suites, the run tests consider only the total number of runs in a sequence. The most important of these are the suites given in [2, 4, 6, 7].

The Knuth [2] and DIEHARD [4] test suites define the run test on random numbers. They define runs as runs up and runs down in a sequence. To illustrate their definition, consider Knuth’s example of a sequence of length 10, $1\,2\,9\,8\,5\,3\,6\,7\,0\,4$. Runs are indicated by putting a vertical line between $x_j$ and $x_{j+1}$ when $x_j > x_{j+1}$. Hence, the runs of the sequence can be seen as $|1\,2\,9|8|5|3\,6\,7|0\,4|$. In other words, the run test examines the lengths of monotone subsequences. TestU01 [6] defines run and gap tests for testing the randomness of a long binary stream. This test collects runs of 1’s and 0’s until a prescribed total number of runs is reached. Then, for each run length, the numbers of runs of 1’s and of 0’s of that length in this collection are counted and recorded. A $\chi^2$ test is then applied to these counts. A longest run of 1’s test is also defined for the collection of fixed-length strings which are obtained from the original long binary string.

The NIST [7] test suite consisted at first of 16 and later of 15 statistical tests. After its first publication, some revisions were made. In 2004, it was discovered that the settings of the discrete Fourier transform test and the Lempel-Ziv test were wrong [12]; a new test, which can be used instead of the Lempel-Ziv test, was defined in [13], and a correction of the overlapping template matching test was stated in 2007 [14].

In the suite, 2 of the 15 tests are variations of run tests. They are called the run test and the longest run of ones in a block test. The first one deals with the total number of runs in a sequence. It calculates the total number of runs in a sequence and determines whether it is consistent with the expected number of runs, which is supposed to be close to $n/2$ in a random sequence of length $n$. The second one determines whether the longest run of ones in the sequence is consistent with the length of the longest run of ones expected in a random sequence. In the NIST test suite the reference distribution for the run tests is a $\chi^2$ distribution.

In its test suite, NIST assumed that the sequence length $n$ is of order $10^3$ to $10^7$. For this reason, asymptotic reference distributions were derived and used for the tests. However, an asymptotic reference distribution is misleading for smaller values of $n$; as stated in [7], “the asymptotic reference distributions would be inappropriate and would need to be replaced by exact distributions that would commonly be difficult to compute”. In other words, asymptotic reference distributions can lead to errors in testing short sequences such as outputs of block ciphers or hash functions. In 1999, to overcome this problem, Soto and Bassham [15] proposed concatenating short sequences. This method was used for testing the randomness of the Advanced Encryption Standard candidates. Another method has been proposed by Sulak et al. [16], in which the distribution functions used in the NIST test suite are replaced by exact distributions and a similar method is used for producing the $p$-values.

In this paper we use the method stated in [16]; thus we need the exact probabilities and the exact distributions of the test statistics. Finding the number of sequences having a specified number of runs of length $k$ is a hard problem. We find this number using combinatorial formulas. After that we calculate the desired probabilities by dividing the calculated number by the total number of sequences of length $n$, which is $2^n$. Calculating the exact probabilities of the number of runs of length $k$ in a sequence enables us to define the new run tests. We calculate the probabilities for the numbers of runs of lengths one, two, and three, and we give the detailed information in the following chapter. However, as the run length grows, the calculations become complex and the time required for them grows exponentially. Therefore tests involving the number of runs of length greater than three are impractical for statistical test suites.

3. Computation of Probabilities

In this chapter, we give the theorems to find the number of sequences with specified properties and hence state the exact probabilities. The probabilities depend on the number of existing shorter runs. That is, the probabilities for the number of runs of length two depend on both the total number of runs and the number of runs of length one; similarly, those for the number of runs of length three depend on the total number of runs and the numbers of runs of lengths one and two, and so on. Since they depend on other variables, these probabilities are not directly used in tests. Therefore, after stating each theorem we give the corollaries and the algorithms to find the exact probabilities which are needed for describing the tests.

In the calculations of probabilities we frequently use the following combinatorial formulas.

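Fact 1 (number of nonnegative integer solutions of linear equation [17]). The number of nonnegative integer solutions of $x_1 + x_2 + \cdots + x_r = n$, $x_i \geq 0$, is $\binom{n+r-1}{r-1}$.

Fact 2. The number of positive integer solutions of $x_1 + x_2 + \cdots + x_r = n$, $x_i \geq 1$, is $\binom{n-1}{r-1}$.

Proof. With the substitution $y_i = x_i - 1$ we get $$y_1 + y_2 + \cdots + y_r = n - r, \quad y_i \geq 0.$$ From Fact 1 it follows that the number of solutions is $$\binom{(n-r)+r-1}{r-1} = \binom{n-1}{r-1}.$$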
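Both counting facts are easy to confirm by exhaustive enumeration for small parameters. The sketch below assumes the usual stars-and-bars forms, $\binom{n+r-1}{r-1}$ nonnegative and $\binom{n-1}{r-1}$ positive solutions of $x_1 + \cdots + x_r = n$; the helper name and the values $n = 7$, $r = 3$ are chosen here for illustration.

```python
from itertools import product
from math import comb

def count_solutions(total, parts, minimum):
    """Brute-force count of integer solutions of x_1 + ... + x_parts = total, x_i >= minimum."""
    values = range(minimum, total + 1)
    return sum(1 for xs in product(values, repeat=parts) if sum(xs) == total)

n, r = 7, 3
nonnegative = count_solutions(n, r, 0)   # Fact 1 setting
positive = count_solutions(n, r, 1)      # Fact 2 setting

print(nonnegative, comb(n + r - 1, r - 1))   # both 36
print(positive, comb(n - 1, r - 1))          # both 15
```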

3.1. Number of Runs

In the rest of the paper we denote the total number of runs and the numbers of runs of lengths one, two, and three by $R$, $R_1$, $R_2$, and $R_3$, and we use samples of these variables, $r$, $r_1$, $r_2$, and $r_3$, respectively. We denote the probability that a randomly chosen binary sequence has $r$ runs by $\Pr(R = r)$. In the same way, $\Pr(R_i = r_i)$ is the probability that a randomly chosen binary sequence has $r_i$ runs of length $i$. Also we use subscripts to differentiate the blocks of a long sequence or the outputs of block ciphers and hash functions. Lastly, $S_1$, $S_2$, and $S_3$ are used to state the sets of the numbers of runs of lengths one, two, and three in the sequences, accordingly. That is, each element of $S_i$ corresponds to the number of runs of length $i$ in one of the sequences.

Moreover, in order to illustrate the runs of a sequence we use the equation $x_1 + x_2 + \cdots + x_r = n$ for a sequence of length $n$ having $r$ runs, where $x_i$ represents the number of bits in the $i$th run. An important property of this illustration is that it gives no information about the content of the $x_i$’s; that is, $x_i$ can be a run of 0’s or of 1’s. Thus, each positive integer solution of the equation corresponds to two sequences: one starts with 1 and the other starts with 0. Hence, the number of sequences of length $n$ having exactly $r$ runs is $2\binom{n-1}{r-1}$ by Fact 2.

Example 1. Let $s$ be a binary sequence of length 32 having 15 runs. Then its run lengths satisfy $x_1 + x_2 + \cdots + x_{15} = 32$ with $x_i \geq 1$.

Probabilities are calculated in a similar way as in [16]. The main difference is that, in the previous approach, sequences are viewed in a circular form. Probabilities depend on weight of the sequence and parity of number of runs. We calculate the probabilities with the above notation, which is not based on circular form, and they depend on the number of runs and number of shorter runs.

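Theorem 2. Let $s$ be a binary sequence of length $n$ having a total of $r$ runs; then $$\Pr(R = r) = \frac{2\binom{n-1}{r-1}}{2^n} = \frac{\binom{n-1}{r-1}}{2^{n-1}}.$$

Proof. We can illustrate the sequence of length $n$, having $r$ runs, as follows: $$x_1 + x_2 + \cdots + x_r = n, \quad x_i \geq 1.$$ From Fact 2, the number of all binary sequences of length $n$ having a total of $r$ runs is $2\binom{n-1}{r-1}$. Since there are $2^n$ sequences, the probability that a randomly chosen sequence has exactly $r$ runs is $$\Pr(R = r) = \frac{2\binom{n-1}{r-1}}{2^n}.$$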
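The count behind Theorem 2 can be cross-checked by listing every binary sequence of a small length and tallying its number of runs. This is a sketch under the assumption that the count is $2\binom{n-1}{r-1}$, as Fact 2 and the two possible starting bits suggest; the length $n = 10$ and the helper name are chosen here for illustration.

```python
from collections import Counter
from itertools import groupby, product
from math import comb

def number_of_runs(bits):
    """Number of maximal blocks of identical consecutive bits."""
    return sum(1 for _ in groupby(bits))

n = 10
observed = Counter(number_of_runs(bits) for bits in product((0, 1), repeat=n))

for r in range(1, n + 1):
    expected = 2 * comb(n - 1, r - 1)
    assert observed[r] == expected
    print(r, expected, expected / 2 ** n)   # runs, count, Pr(R = r)
```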

3.2. Number of Runs of Length One

In this section, the probabilities for an $n$-bit sequence having $r_1$ runs of length one are given in a combinatorial approach. We use the illustration defined in Section 3.1 to compute the number of sequences having a total of $r$ runs, $r_1$ of which are of length one, and hence we calculate the probabilities. Then, in the next chapter, we state the first new run test, which depends on the idea of Golomb’s second postulate.

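Theorem 3. The probability of a randomly chosen binary sequence of length $n$ having a total of $r$ runs, $r_1$ of which are runs of length one, is $$\Pr(R = r, R_1 = r_1) = \frac{2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1}}{2^n}.$$

Proof. As in the proof of Theorem 2, we illustrate the sequence as follows: $$x_1 + x_2 + \cdots + x_r = n, \quad x_i \geq 1. \tag{9}$$ Let us first assume that the last $r_1$ runs are the runs of length one and the rest are of length at least two. That is, $$x_1 + x_2 + \cdots + x_{r-r_1} = n - r_1.$$ Notice that, here, $x_i \geq 2$, so we use the change of variable $y_i = x_i - 2$ for $i = 1, \ldots, r - r_1$. Consider $$y_1 + y_2 + \cdots + y_{r-r_1} = n - 2r + r_1, \quad y_i \geq 0. \tag{11}$$ The number of sequences satisfying the conditions stated above is equal to the number of nonnegative solutions of (11). Consequently, by Fact 1, the number of desired solutions is $$\binom{(n-2r+r_1)+(r-r_1)-1}{r-r_1-1} = \binom{n-r-1}{r-r_1-1}.$$ The selection of the $r_1$ runs of length 1 gives us a factor of $\binom{r}{r_1}$. Since each positive integer solution of (9) corresponds to two sequences (one starts with 1; the other starts with 0), 2 appears as a factor also. Therefore, the number of all binary sequences of length $n$ having a total of $r$ runs, $r_1$ of which are of length one, is equal to $2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1}$. Hence the probability that a randomly chosen sequence has exactly $r$ runs, $r_1$ of which are of length one, is $$\Pr(R = r, R_1 = r_1) = \frac{2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1}}{2^n}.$$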
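The counting argument in the proof can be tested against exhaustive enumeration. The sketch below assumes the count takes the form $2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1}$, which is what the substitution $y_i = x_i - 2$ yields; the function name is hypothetical, and the boundary case in which every run has length one is handled explicitly because the binomial expression is not defined there.

```python
from collections import Counter
from itertools import groupby, product
from math import comb

def count_r_r1(n, r, r1):
    """Number of n-bit sequences with r runs, exactly r1 of them of length one."""
    long_runs = r - r1                       # runs of length at least two
    if long_runs == 0:
        return 2 if r == n else 0            # only the two alternating sequences
    if r >= n:
        return 0                             # not enough bits for a longer run
    return 2 * comb(r, r1) * comb(n - r - 1, long_runs - 1)

n = 10
observed = Counter()
for bits in product((0, 1), repeat=n):
    lengths = [len(list(group)) for _, group in groupby(bits)]
    observed[(len(lengths), lengths.count(1))] += 1

# Every (r, r1) pair agrees with the closed form for n = 10.
assert all(observed[(r, r1)] == count_r_r1(n, r, r1)
           for r in range(1, n + 1) for r1 in range(r + 1))
print(count_r_r1(8, 4, 2))   # 36
```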

The number of sequences having $r$ runs, $r_1$ of which are of length one, can be found using the formula above. Our aim is to compute the total number of sequences of length $n$ having $r_1$ runs of length one without depending on the total number of runs. In order to compute these probabilities we use Corollary 4.

Corollary 4. Let $N(r_1)$ denote the number of sequences of length $n$ with exactly $r_1$ runs of length one. Then, $$N(r_1) = \sum_{r} 2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1},$$ where the sum runs over all total numbers of runs $r$ compatible with $r_1$. Since the number of all sequences of length $n$ is $2^n$, the probabilities follow immediately: $$\Pr(R_1 = r_1) = \frac{N(r_1)}{2^n}.$$

Moreover, using Algorithm 1 we calculate the probabilities for a sequence of length $n$ with $r_1$ runs of length one, so that we can investigate the number of runs of length one independently.

Algorithm 1: Calculating $\Pr(R_1 = r_1)$ for $r_1 = 0, 1, \ldots, n$.
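The listing of Algorithm 1 is not reproduced in this text, so the following is only a sketch of how the marginal distribution of the number of runs of length one could be tabulated. It assumes the joint count $2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1}$ and sums it over the total number of runs; the function name is hypothetical.

```python
from math import comb

def runs_of_length_one_counts(n):
    """counts[r1] = number of n-bit sequences with exactly r1 runs of length one."""
    counts = [0] * (n + 1)
    counts[n] = 2                                   # the two alternating sequences
    for r in range(1, n):                           # total number of runs
        for r1 in range(r):                         # at least one run of length >= 2
            counts[r1] += 2 * comb(r, r1) * comb(n - r - 1, r - r1 - 1)
    return counts

n = 128
counts = runs_of_length_one_counts(n)
assert sum(counts) == 2 ** n                        # every sequence is counted once
probabilities = [c / 2 ** n for c in counts]        # Pr(R_1 = r1), r1 = 0..n

print(runs_of_length_one_counts(8))   # [26, 52, 62, 48, 40, 12, 14, 0, 2]
```

Exact integer arithmetic is used for the counts, and the division by $2^n$ is done only at the end, which avoids rounding problems for block lengths such as 128 or 512.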

After finding the exact probabilities we calculate the subinterval probabilities. Following example shows the calculations of subinterval probabilities for 128-bit sequences.

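Example 5 (calculating the subinterval probabilities).
Step 1. Calculate $\Pr(R_1 = r_1)$ for $r_1 = 0, 1, \ldots, 128$ by using Corollary 4 and Algorithm 1.
Step 2. Determine five subintervals such that the probability of each subinterval is approximately $0.2$. In our example the probability of a subinterval is calculated by summing $\Pr(R_1 = r_1)$ over the values of $r_1$ in that subinterval.
Step 3. Finally, we get Table 1 for the subinterval probabilities.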
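The steps of Example 5 can be sketched in code. The rule used below to place the subinterval boundaries (a greedy cut as soon as the cumulative probability reaches the next multiple of 1/5) is an assumption made for this sketch and may not reproduce the boundaries of Table 1; the function names are hypothetical, and the distribution is assumed to follow the count $2\binom{r}{r_1}\binom{n-r-1}{r-r_1-1}$ summed over $r$.

```python
from math import comb

def runs_of_length_one_pmf(n):
    """Exact Pr(R_1 = r1), r1 = 0..n, for uniformly random n-bit sequences."""
    counts = [0] * (n + 1)
    counts[n] = 2
    for r in range(1, n):
        for r1 in range(r):
            counts[r1] += 2 * comb(r, r1) * comb(n - r - 1, r - r1 - 1)
    return [c / 2 ** n for c in counts]

def equal_probability_subintervals(pmf, parts=5):
    """Greedy split of 0..len(pmf)-1 into `parts` consecutive (low, high) ranges."""
    intervals, start, cumulative = [], 0, 0.0
    for value, p in enumerate(pmf):
        cumulative += p
        target = (len(intervals) + 1) / parts
        if len(intervals) < parts - 1 and cumulative >= target:
            intervals.append((start, value))
            start = value + 1
    intervals.append((start, len(pmf) - 1))
    return intervals

pmf = runs_of_length_one_pmf(128)
intervals = equal_probability_subintervals(pmf)
interval_probabilities = [sum(pmf[low:high + 1]) for low, high in intervals]

for (low, high), p in zip(intervals, interval_probabilities):
    print(f"{low:3d}-{high:3d}  {p:.6f}")
```

Because the distribution is discrete, the five probabilities can only be approximately equal; the exact values, rather than 0.2, are what enter the $\chi^2$ computation.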

Table 1: Subinterval probabilities for 128-bit sequences.

In the same way we calculate the subinterval probabilities for different block lengths. All subinterval probabilities for runs of length one test can be seen in Table 2.

Table 2: Interval and probability values for runs of length one for 64-, 128-, 256-, and 512-bit blocks.

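Example 6. Let $s$ be a random sequence of length 8, having 4 runs and 2 runs of length one.
Since we have exactly 4 runs, the $x_i$’s must be at least 1: $$x_1 + x_2 + x_3 + x_4 = 8, \quad x_i \geq 1.$$ Fix $x_3 = x_4 = 1$; then $$x_1 + x_2 = 6.$$ We want $x_1, x_2 \geq 2$. Define $y_i = x_i - 2$ for $i = 1, 2$, so that $y_1 + y_2 = 2$ with $y_i \geq 0$, which has $\binom{3}{1} = 3$ solutions. Taking the two possible starting bits into account, the above construction gives us 6 different sequences of length 8 with 2 runs of length one. Also, selecting which two runs play the roles of $x_3$ and $x_4$ gives us a factor of $\binom{4}{2} = 6$. Hence, the total number of sequences of length 8 with 4 runs, 2 of which are of length one, is $6 \cdot 6 = 36$.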

3.3. Number of Runs of Length Two

In this section, we calculate the number of sequences having $r_2$ runs of length two in a combinatorial approach. As in the previous section, we use the same notation and ideas similar to those in Section 3.1 to compute the number of sequences having a total of $r$ runs, $r_2$ of which are of length two, and hence we calculate the probabilities. After that, using these calculations, we state the second new run test.

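Theorem 7. The probability of a randomly chosen binary sequence of length $n$ having $r$ runs, $r_1$ of which are of length one and $r_2$ of which are of length two, is $$\Pr(R = r, R_1 = r_1, R_2 = r_2) = \frac{2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{n-2r+r_1-1}{r-r_1-r_2-1}}{2^n}.$$

Proof. As in Theorems 2 and 3, we illustrate the sequence as follows: $$x_1 + x_2 + \cdots + x_r = n, \quad x_i \geq 1. \tag{21}$$ Let us first assume that the last $r_1$ runs are of length one and the $r_2$ runs before them are the runs of length two. The rest are of length at least three. That is, $$x_1 + x_2 + \cdots + x_{r-r_1-r_2} = n - r_1 - 2r_2.$$ Notice that here $x_i \geq 3$. We use the change of variables $y_i = x_i - 3$ for $i = 1, \ldots, r - r_1 - r_2$: $$y_1 + y_2 + \cdots + y_{r-r_1-r_2} = n - 3r + 2r_1 + r_2, \quad y_i \geq 0. \tag{23}$$ The number of sequences satisfying the conditions stated above is equal to the number of nonnegative solutions of (23). Consequently, by Fact 1, the number of desired solutions is $$\binom{n-2r+r_1-1}{r-r_1-r_2-1}.$$ The selection of the $r_1$ and $r_2$ runs of length 1 and length 2 gives us a factor of $\binom{r}{r_1}\binom{r-r_1}{r_2}$. Since each positive integer solution of (21) corresponds to two sequences (one starts with 1; the other starts with 0), 2 appears as a factor also. Therefore, the number of all binary sequences of length $n$ having a total of $r$ runs, $r_1$ and $r_2$ of which are of lengths one and two, respectively, is equal to $$2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{n-2r+r_1-1}{r-r_1-r_2-1}.$$ Hence the probability that a randomly chosen sequence satisfies the above conditions is $$\Pr(R = r, R_1 = r_1, R_2 = r_2) = \frac{2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{n-2r+r_1-1}{r-r_1-r_2-1}}{2^n}.$$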
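A small worked instance may help to see how the factors in the proof combine. The values $n = 10$, $r = 4$, $r_1 = 1$, $r_2 = 1$ are chosen here purely for illustration, and the count is assumed to be $2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{n-2r+r_1-1}{r-r_1-r_2-1}$, which is what the substitution $y_i = x_i - 3$ gives.

```latex
% Two runs remain; each has length at least three and together they cover
% 10 - 1 - 2 = 7 bits. With y_i = x_i - 3 this is y_1 + y_2 = 1,
% which has \binom{2}{1} = 2 nonnegative solutions: (3,4) and (4,3).
\[
2\binom{4}{1}\binom{3}{1}\binom{10-8+1-1}{4-1-1-1}
  = 2 \cdot 4 \cdot 3 \cdot \binom{2}{1}
  = 48,
\qquad
\frac{48}{2^{10}} = \frac{3}{64}.
\]
```

A direct count agrees: there are 4 positions for the run of length one, 3 remaining positions for the run of length two, 2 orders for the runs of lengths three and four, and 2 choices for the first bit, giving $4 \cdot 3 \cdot 2 \cdot 2 = 48$ sequences.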

We find the number of sequences having $r$ runs, $r_1$ and $r_2$ of which are of lengths one and two, respectively, using the formula above. In order to define the second new run test, we need the number of sequences of length $n$ having $r_2$ runs of length two, without depending on the other variables such as the number of runs and the number of runs of length one. Corollary 8 enables us to compute the probabilities that are needed for defining the new statistical test.

Corollary 8. Let $N(r_2)$ denote the number of sequences of length $n$ with exactly $r_2$ runs of length two. Clearly, we have at most $\lfloor n/2 \rfloor$ runs of length two; otherwise the sequence length exceeds $n$. Then, for $r_2 = 0, 1, \ldots, \lfloor n/2 \rfloor$, $$N(r_2) = \sum_{r} \sum_{r_1} 2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{n-2r+r_1-1}{r-r_1-r_2-1},$$ where the sums run over all values of $r$ and $r_1$ compatible with $r_2$. Since the number of all sequences of length $n$ is $2^n$, the probabilities follow immediately: $$\Pr(R_2 = r_2) = \frac{N(r_2)}{2^n}.$$

Also, Algorithm 2 enables the calculation of the number of sequences with the desired conditions. Furthermore, subinterval probabilities can be stated in the same way as in Example 5. The subinterval probabilities can be seen in Table 3.

Table 3: Interval and probability values for runs of length two test for 64-, 128-, 256-, and 512-bit blocks.

Algorithm 2: Calculating $\Pr(R_2 = r_2)$ for $r_2 = 0, 1, \ldots, \lfloor n/2 \rfloor$.
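Since the listing of Algorithm 2 is not reproduced in this text, the following sketch shows one way the marginal counts for runs of length two could be obtained: the joint count over the total number of runs and the number of runs of length one is summed out. The joint count is assumed to be $2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{n-2r+r_1-1}{r-r_1-r_2-1}$, with the boundary cases (no run of length three or more) treated explicitly; the function names are hypothetical.

```python
from math import comb

def count_r_r1_r2(n, r, r1, r2):
    """n-bit sequences with r runs, r1 of length one and r2 of length two."""
    rest = r - r1 - r2                       # runs of length at least three
    remaining_bits = n - r1 - 2 * r2         # bits covered by those runs
    if rest < 0:
        return 0
    if rest == 0:
        inner = 1 if remaining_bits == 0 else 0
    elif remaining_bits < 3 * rest:
        inner = 0
    else:
        inner = comb(n - 2 * r + r1 - 1, rest - 1)
    return 2 * comb(r, r1) * comb(r - r1, r2) * inner

def runs_of_length_two_counts(n):
    """counts[r2] = number of n-bit sequences with exactly r2 runs of length two."""
    counts = [0] * (n // 2 + 1)
    for r2 in range(n // 2 + 1):
        for r in range(1, n + 1):
            for r1 in range(0, r - r2 + 1):
                counts[r2] += count_r_r1_r2(n, r, r1, r2)
    return counts

n = 64
counts = runs_of_length_two_counts(n)
assert sum(counts) == 2 ** n                  # the counts partition all sequences
probabilities = [c / 2 ** n for c in counts]  # Pr(R_2 = r2)
print(runs_of_length_two_counts(4))           # [8, 6, 2]
```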

3.4. Number of Runs of Length Three

In the last section of this chapter, we focus on the number of sequences having exactly $r_3$ runs of length three. We use the same constructions as in the previous sections to compute the number of sequences having a total of $r$ runs, $r_3$ of which are of length three, and hence we calculate the probabilities. Then, using these calculations, we state the last new statistical test in the next chapter.

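Theorem 9. The probability of a randomly chosen binary sequence of length $n$ having $r$ runs, $r_1$ runs of length one, $r_2$ runs of length two, and $r_3$ runs of length three is $$\Pr(R = r, R_1 = r_1, R_2 = r_2, R_3 = r_3) = \frac{2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{r-r_1-r_2}{r_3}\binom{n-3r+2r_1+r_2-1}{r-r_1-r_2-r_3-1}}{2^n}.$$

Proof. As in Theorems 2, 3, and 7, we illustrate the sequence as follows: $$x_1 + x_2 + \cdots + x_r = n, \quad x_i \geq 1.$$ Let us first assume that the last $r_1$ runs are of length 1, the $r_2$ runs before them are of length 2, and the $r_3$ runs before those are of length 3. The rest are of length at least four. Consider $$x_1 + x_2 + \cdots + x_{r-r_1-r_2-r_3} = n - r_1 - 2r_2 - 3r_3.$$ Notice that $x_i \geq 4$, and we use the change of variables $y_i = x_i - 4$ for $i = 1, \ldots, r - r_1 - r_2 - r_3$.
The number of sequences satisfying the conditions stated above is equal to the number of nonnegative solutions of the following equation: $$y_1 + y_2 + \cdots + y_{r-r_1-r_2-r_3} = n - 4r + 3r_1 + 2r_2 + r_3, \quad y_i \geq 0. \tag{32}$$ Consequently, by Fact 1, the number of desired solutions is $$\binom{n-3r+2r_1+r_2-1}{r-r_1-r_2-r_3-1}.$$ The selection of the $r_1$, $r_2$, and $r_3$ runs gives us a factor of $\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{r-r_1-r_2}{r_3}$, and, as before, the choice of the starting bit gives a factor of 2. Therefore, the number of all binary sequences of length $n$ with the conditions stated above is $$2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{r-r_1-r_2}{r_3}\binom{n-3r+2r_1+r_2-1}{r-r_1-r_2-r_3-1}.$$ Hence, the probability that a randomly chosen sequence satisfies these conditions is $$\Pr(R = r, R_1 = r_1, R_2 = r_2, R_3 = r_3) = \frac{2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{r-r_1-r_2}{r_3}\binom{n-3r+2r_1+r_2-1}{r-r_1-r_2-r_3-1}}{2^n}.$$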
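As a check on the construction in the proof, here is a worked instance with values chosen purely for illustration: $n = 14$, $r = 5$, $r_1 = r_2 = r_3 = 1$. The count is assumed to be $2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{r-r_1-r_2}{r_3}\binom{n-3r+2r_1+r_2-1}{r-r_1-r_2-r_3-1}$, as obtained from the substitution $y_i = x_i - 4$.

```latex
% Two runs remain; each has length at least four and together they cover
% 14 - 1 - 2 - 3 = 8 bits, so both must have length exactly four.
% With y_i = x_i - 4 this is y_1 + y_2 = 0, which has \binom{1}{1} = 1 solution.
\[
2\binom{5}{1}\binom{4}{1}\binom{3}{1}\binom{14-15+2+1-1}{5-1-1-1-1}
  = 2 \cdot 5 \cdot 4 \cdot 3 \cdot \binom{1}{1}
  = 120,
\qquad
\frac{120}{2^{14}} = \frac{15}{2048}.
\]
```

The same number follows directly: the run lengths form an arrangement of the multiset $\{1, 2, 3, 4, 4\}$, of which there are $5!/2! = 60$, and each arrangement can start with 0 or 1.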

We find the number of sequences having $r$ runs, $r_1$, $r_2$, and $r_3$ of which are of lengths one, two, and three, respectively, using the formula above. In order to use the probabilities in tests we need the number of sequences of length $n$ with $r_3$ runs of length three, without depending on the other variables. Corollary 10 enables us to compute the probabilities that are needed for defining the new statistical test.

Corollary 10. Let $N(r_3)$ denote the number of sequences of length $n$ with exactly $r_3$ runs of length three. Clearly, we have at most $\lfloor n/3 \rfloor$ runs of length three; otherwise the sequence length exceeds $n$. Then, for $r_3 = 0, 1, \ldots, \lfloor n/3 \rfloor$, $$N(r_3) = \sum_{r} \sum_{r_1} \sum_{r_2} 2\binom{r}{r_1}\binom{r-r_1}{r_2}\binom{r-r_1-r_2}{r_3}\binom{n-3r+2r_1+r_2-1}{r-r_1-r_2-r_3-1},$$ where the sums run over all values of $r$, $r_1$, and $r_2$ compatible with $r_3$. Since the number of all sequences of length $n$ is $2^n$, the probabilities follow immediately: $$\Pr(R_3 = r_3) = \frac{N(r_3)}{2^n}.$$

Algorithm 3 enables the calculation of the number of sequences of length $n$ with $r_3$ runs of length three, and hence subinterval probabilities can be stated in the same way as in Example 5. The subinterval probabilities can be seen in Table 4.

Table 4: Interval and probability values for runs of length three test for 64-, 128-, 256-bit blocks.

Algorithm 3: Calculating $\Pr(R_3 = r_3)$ for $r_3 = 0, 1, \ldots, \lfloor n/3 \rfloor$.

In this chapter we formulate the exact numbers of sequences with given conditions, and hence the corresponding probabilities are given. As we mentioned before, calculating the probabilities for the number of runs of length more than three is impractical. The probabilities can be stated theoretically in the same way. However, the running time of the algorithms that find the exact values grows exponentially. Therefore, it is inconvenient to use them in test suites.

4. Tests Descriptions

Golomb’s first postulate is about the weight of a sequence, and in many test suites the postulate is implemented with a proper generalization. On the other hand, the second postulate, which is about the runs of a sequence, is mostly implemented according to the total number of runs regardless of their lengths. In this chapter, we define three new statistical tests as a proper generalization of Golomb’s second postulate: the runs of length one test, the runs of length two test, and the runs of length three test. The subjects of the new run tests are $R_1$, $R_2$, and $R_3$, as their names state.

We test the null hypothesis ($H_0$), which states that the sequence is randomly produced. There are two types of errors, which are called type I and type II errors. A type I error occurs when the data is random and $H_0$ is rejected, and a type II error occurs when the data is nonrandom and $H_0$ is accepted. The probability of a type I error is called the level of significance and is denoted by $\alpha$. A statistical test evaluates the sequence against this predefined number $\alpha$. If the $p$-value produced by the statistical test is greater than $\alpha$, then $H_0$ is accepted. The level of significance is decided based on the application. We set $\alpha$ as 0.01, as in many test suites.

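We use $\chi^2$ as the reference distribution. The measurements are compared with the expected values. In order to make a comparison we divide the number of runs of lengths one, two, and three into subintervals, as explained in Section 3. The new tests use five subintervals with the following property: $p_i \approx 0.2$ for $i = 1, \ldots, 5$, where $p_i$ denotes the probability of the $i$th subinterval. For example, the probabilities of 128-bit sequences for the runs of length two test are divided into 5 subintervals as listed in Table 3.

After calculating the subinterval probabilities, we count the number of runs of length $k$ in each sequence and increment the corresponding subinterval counter by one according to the counted number of runs. We use $F_i$ to denote the number of sequences in the $i$th subinterval and $m$ to denote the total number of sequences. Before the last step we calculate $\chi^2$ using the following formula [16]: $$\chi^2 = \sum_{i=1}^{5} \frac{(F_i - m\,p_i)^2}{m\,p_i}.$$

Lastly, the $p$-value is calculated according to the given values: $$p\text{-value} = \mathrm{igamc}\left(2, \frac{\chi^2}{2}\right),$$ where $\mathrm{igamc}$ is the regularized upper incomplete gamma function and the first argument is half the number of degrees of freedom (four, for five subintervals).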
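The last two steps can be sketched as follows. The sketch assumes five subintervals, hence a $\chi^2$ distribution with four degrees of freedom, for which the upper tail has the closed form $e^{-x/2}(1 + x/2)$; the function names and the observed counts are made up for illustration and are not results from the paper.

```python
from math import exp

def chi_square(observed, probabilities):
    """Sum over subintervals of (F_i - m*p_i)^2 / (m*p_i), with m the number of sequences."""
    m = sum(observed)
    return sum((f - m * p) ** 2 / (m * p) for f, p in zip(observed, probabilities))

def p_value_four_dof(chi2):
    """Upper tail of the chi-square distribution with 4 degrees of freedom,
    that is igamc(2, chi2/2) = exp(-chi2/2) * (1 + chi2/2)."""
    half = chi2 / 2
    return exp(-half) * (1 + half)

probabilities = [0.2, 0.2, 0.2, 0.2, 0.2]      # idealised equal subinterval probabilities
observed = [30, 20, 20, 20, 10]                # hypothetical counts for 100 sequences

chi2 = chi_square(observed, probabilities)     # (100 + 0 + 0 + 0 + 100) / 20 = 10
p = p_value_four_dof(chi2)                     # 6 * exp(-5), about 0.0404
print(chi2, p, "accept" if p > 0.01 else "reject")
```

With a significance level of 0.01 this hypothetical sample would be accepted, although not by a wide margin.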

We test $H_0$ by comparing the produced $p$-value with the level of significance $\alpha$ and accept or reject $H_0$ accordingly. That is, if $p$-value $> \alpha$, $H_0$ is accepted; otherwise it is rejected.

The new tests can be implemented on sequences of length at least $25n$ bits (where $n$ is the block size). This number is a direct consequence of creating five subintervals: in order to get reliable results, we need at least 5 blocks in each subinterval. In the NIST test suite it is suggested that the sequences should be about 20,000 bits long. Therefore, the new run tests can be implemented on short sequences also.

Remark 11 (derivative of a sequence). Let $s = s_1 s_2 \cdots s_n$ be a binary sequence of length $n$; then the derivative of $s$, denoted by $\Delta s$, is defined as follows.
For $i = 1, \ldots, n$, $$\Delta s_i = \begin{cases} s_i \oplus s_{i+1}, & 1 \leq i \leq n-1, \\ 1, & i = n. \end{cases}$$ Counting the runs of a sequence by using the definition is impractical. So we use the derivative of a sequence to count the runs. By the definition, each 1 in the derivative of a sequence indicates the end of a run. So the number of runs of a sequence can be defined as the weight of its derivative.

Also we use a variation of the derivative, of length $n+1$, obtained by adding a 1 at the beginning of the sequence $\Delta s$. The variation of the derivative is an important part of the newly defined run tests, since the numbers of runs of different lengths are determined by this sequence.

Remark 12. Let $s$ be a binary sequence and let the derivative of $s$ be denoted by $\Delta s$. Then the variation $\Delta' s$ is defined as follows: $$\Delta' s = (1, \Delta s_1, \Delta s_2, \ldots, \Delta s_n).$$

In order to count the runs at the beginning, we use the variation of the derivative instead of the original derivative definition. The number of runs of length one in a sequence is given by the number of overlapping occurrences of 11 in its variation of the derivative. In the same way, the numbers of runs of lengths 2 and 3 in a sequence are given by the numbers of overlapping occurrences of 101 and 1001, respectively. More generally, the number of runs of length $k$ is given by the number of overlapping occurrences of $1\underbrace{0 \cdots 0}_{k-1}1$.
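The counting rule can be illustrated with a few lines of code. The sketch assumes that the derivative is the XOR of neighbouring bits followed by a closing 1 (so that its weight equals the number of runs) and that the variation prepends one more 1; the function names and the 8-bit example are chosen here for illustration.

```python
def derivative(bits):
    """XOR of neighbouring bits, plus a final 1 marking the end of the last run."""
    return [a ^ b for a, b in zip(bits, bits[1:])] + [1]

def extended_derivative(bits):
    """Variation of the derivative: a leading 1 marks the start of the first run."""
    return [1] + derivative(bits)

def runs_of_length(bits, k):
    """Overlapping occurrences of the pattern 1 0^(k-1) 1 in the extended derivative."""
    d = extended_derivative(bits)
    pattern = [1] + [0] * (k - 1) + [1]
    width = len(pattern)
    return sum(1 for i in range(len(d) - width + 1) if d[i:i + width] == pattern)

s = [0, 1, 1, 0, 0, 0, 1, 0]          # runs: 0 | 11 | 000 | 1 | 0

print(sum(derivative(s)))             # 5 runs in total
print(runs_of_length(s, 1))           # 3 runs of length one
print(runs_of_length(s, 2))           # 1 run of length two
print(runs_of_length(s, 3))           # 1 run of length three
```

Occurrences must be counted with overlap because the closing 1 of one run is also the opening 1 of the next.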

Example 13. Let $s$ be a binary sequence of length 32, having 15 runs, 6 runs of length one, 4 runs of length two, and 3 runs of length three. Then
(i) The weight of $\Delta s$ is 15, which corresponds to the number of runs.
(ii) The number of overlapping occurrences of 11 in $\Delta' s$ is 6, which corresponds to the number of runs of length one.
(iii) The number of overlapping occurrences of 101 in $\Delta' s$ is 4, which corresponds to the number of runs of length two.
(iv) The number of overlapping occurrences of 1001 in $\Delta' s$ is 3, which corresponds to the number of runs of length three.

Before defining the new statistical tests, we give the general idea of the tests with the following example.

Example 14. Let $s$ be a long binary sequence. Let $F_i$ and $p_i$ be the number of sequences in the $i$th subinterval and the probability of that subinterval, respectively.
Step 1. Choose a block size $n$. In our example we choose $n$ as 128.
Step 2. Then divide the sequence into $n$-bit sequences. We get the set of sequences $S = \{s_1, s_2, \ldots, s_m\}$.
Step 3. For each $s_j$ count the number of runs of lengths one, two, and three, and increment the corresponding boxes by 1.
Step 4. Then, we get Table 5. The count rows of each test correspond to the number of sequences whose number of runs of length one, two, or three is in the given interval.
Step 5. $\chi^2$ is calculated by the given formula and the $p$-value is computed accordingly.
Step 6. Finally, we get the $p$-value for each test.
(i) Number of runs of length one test: $p$-value = 0.357056.
(ii) Number of runs of length two test: $p$-value = 0.462207.
(iii) Number of runs of length three test: $p$-value = 0.627001.

Table 5: Number of sequences in given intervals for runs of length one test, runs of length two test, and runs of length three test.

4.1. Runs of Length One Test

The subject of the first new run test is runs of length one in the sequences. Test uses the probabilities calculated in the previous chapter. First, we collect the algorithms output and generate the data set S. If the given sequence of length is a long binary sequence, the sequence is divided into -bit blocks and gets a set of sequences and generates where . In our test can be 64, 128, 256, or 512. After generating the data set, the set is formed by counting the number of runs of length one in each sequence. In order to find the number of runs of length one, first we find the derivative of the binary sequence and then we count the overlapping occurrences 11 in for . After that we apply of goodness of fit test to the values in . We propose new run test to implement the idea of Golomb’s second postulate in statistical randomness test. The pseudocode of the test is given in Algorithm 4.

Algorithm 4: Runs of length one test.
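The counting step can be sketched in a few lines. This is an illustration rather than a transcription of Algorithm 4: it takes the derivative to be the XOR of adjacent bits, and it assumes that the derivative is padded with a 1 at both ends so that runs touching the first or last bit are delimited like interior runs. The function names are hypothetical.

```python
def derivative(bits):
    """Binary derivative: XOR of each pair of adjacent bits (length n - 1)."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def count_runs_of_length_one(bits):
    """Count runs of length one via overlapping 11 in the padded derivative."""
    if not bits:
        return 0
    # Assumption: a 1 is added at each end, marking the sequence boundaries
    # as changes. A 1 in the derivative marks a change between neighbours,
    # so two adjacent 1s enclose a run consisting of a single bit.
    padded = [1] + derivative(bits) + [1]
    return sum(1 for a, b in zip(padded, padded[1:]) if a == 1 and b == 1)


bits = [0, 1, 0, 0, 1, 1, 1, 0]          # runs: 0 | 1 | 00 | 111 | 0
print(derivative(bits))                  # [1, 1, 0, 1, 0, 0, 1]
print(count_runs_of_length_one(bits))    # 3
```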

4.2. Runs of Length Two Test

After giving the first new run test, we define the runs of length two test. This test also uses the probabilities calculated in the previous section. As in the runs of length one test, we first generate the data set S. In the second test the block size can again be 64, 128, 256, or 512. From the data set S, a set of counts is formed by counting the number of runs of length two in each sequence. As in the previous test, we take the derivative of the binary sequence; to find the number of runs of length two, we count the overlapping occurrences of 101 in it. Then we apply the χ² goodness of fit test to these counts. The second new run test constitutes another approach to Golomb's second postulate. The pseudocode of the test is given in Algorithm 5.

Algorithm 5: Runs of length two test.

4.3. Runs of Length Three Test

The last new run test is the runs of length three test. This test also uses the probabilities calculated in the previous section. Data sets are created as in the previous run tests. In the last new run test the block size can be 64, 128, or 256. The set of counts is formed from S. The counting phase of this test is done by finding the total number of overlapping occurrences of 1001 in the derivative. Then we apply the χ² goodness of fit test to these counts. The pseudocode of the last new run test is given in Algorithm 6.

Algorithm 6: Runs of length three test.
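The patterns 11, 101, and 1001 used by the three tests follow one rule: a run of length k corresponds to a 1, then k - 1 zeros, then a 1 in the derivative. The sketch below generalizes the counting step and cross-checks it against a direct run count. It is an illustration under the assumption that the derivative is padded with a 1 at both ends; the function names are hypothetical and this is not a transcription of Algorithms 4 to 6.

```python
from itertools import groupby


def count_runs_by_pattern(bits, k):
    """Count runs of length k as overlapping matches of 1 0^(k-1) 1."""
    if not bits:
        return 0
    # Assumption: pad the derivative with 1 at both ends so that the first
    # and last runs are delimited like interior runs.
    padded = [1] + [a ^ b for a, b in zip(bits, bits[1:])] + [1]
    pattern = [1] + [0] * (k - 1) + [1]
    width = len(pattern)
    return sum(1 for i in range(len(padded) - width + 1)
               if padded[i:i + width] == pattern)


def count_runs_directly(bits, k):
    """Reference count: group equal neighbours and keep groups of size k."""
    return sum(1 for _, group in groupby(bits) if len(list(group)) == k)


bits = [0, 1, 0, 0, 1, 1, 1, 0]  # runs: 0 | 1 | 00 | 111 | 0
for k in (1, 2, 3):
    print(k, count_runs_by_pattern(bits, k), count_runs_directly(bits, k))
# prints "1 3 3", then "2 1 1", then "3 1 1"
```

Both functions agree on every input because each run of length k maps to exactly one window of width k + 1 in the padded derivative.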

Together, the three new run tests implement the idea of Golomb's second postulate in statistical randomness testing. The new run tests, concerning runs of lengths one, two, and three, constitute a proper generalization of Golomb's idea.

5. Implementations

In order to check the reliability of the tests stated in the previous section, we implement the new tests together with the well-known statistical tests included in the NIST test suite.

In the first part of the experiments we select five encryption algorithms, the Advanced Encryption Standard (AES) finalists: MARS [18], RC6 [19], Rijndael [20], Serpent [21], and Twofish [22]. Pseudorandom sequences of length 128 are generated by encrypting noncorrelated data with these algorithms. In other words, in the first experiment we test the outputs of the AES finalists using our tests and the NIST test suite. The new run tests are applied to the pseudorandom sequences of length 128 as described in the previous section, and NIST's tests are applied to the single binary sequence obtained by concatenating the outputs of the algorithms. The results can be seen in Table 6.

Table 6: Test results for the 128-bit outputs of AES finalists.

In the second part of the experiments, we use the binary expansions of three constants, which can be found within the NIST test suite. As in the first part, we also use the well-known tests that are included in the NIST test suite. We collect the leading bits of each binary expansion. In order to apply the new run tests, the collected long sequence is divided into 128-bit blocks; hence we get sequences of length 128. With this second implementation we show the performance of the new run tests. The test results can be seen in Table 7.

Table 7: Test results for the binary expansion of , , and .
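The preprocessing used in this experiment, cutting a long sequence into 128-bit blocks and tallying how many blocks fall into each of the five subintervals, can be sketched as follows. The interval limits passed as `upper_bounds` are hypothetical placeholders; the actual limits come from the probability tables derived earlier in the paper. Dropping a trailing partial block is also an assumption of this sketch.

```python
from bisect import bisect_left
from itertools import groupby


def split_into_blocks(bits, block_size=128):
    """Cut a bit list into consecutive full blocks (partial tail dropped)."""
    usable = len(bits) - len(bits) % block_size
    return [bits[i:i + block_size] for i in range(0, usable, block_size)]


def runs_of_length(block, k):
    """Number of maximal runs of identical bits that have length k."""
    return sum(1 for _, group in groupby(block) if len(list(group)) == k)


def tally_bins(blocks, k, upper_bounds):
    """Count the blocks whose number of length-k runs lies in each interval.

    upper_bounds -- hypothetical inclusive upper limits of all intervals
                    except the last, which is open-ended
    """
    counts = [0] * (len(upper_bounds) + 1)
    for block in blocks:
        r = runs_of_length(block, k)
        # bisect_left returns the first interval whose upper limit is >= r.
        counts[bisect_left(upper_bounds, r)] += 1
    return counts


# Toy input with 8-bit blocks so the result is easy to check by hand.
bits = [0, 1] * 4 + [0] * 8 + [1]           # 17 bits, the last one is dropped
blocks = split_into_blocks(bits, block_size=8)
print(tally_bins(blocks, 1, [1, 3, 5, 7]))  # [1, 0, 0, 0, 1]
```

The resulting list of counts is the input a χ² goodness of fit computation would need, together with the interval probabilities.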

In the last part of the experiments, we analyse the sensitivity of the new run tests. To do so, we first need to generate a nonrandom sequence.

A nonrandom sequence can be generated in two steps. First, we create a sequence of random numbers using the RNGCryptoServiceProvider class of C#. We then create nonrandom data by using the following important concept in cryptography, defined in [23].

Definition 15. Let be a binary sequence of length and element of it is represented as ; then bias is defined as follows:

In a truly random sequence we expect the bias to be 0; that is, zeros and ones are equally likely. This is also the main idea of Golomb's first postulate. To generate a nonrandom sequence we need to increase the bias. Using Algorithm 7 we can generate such a nonrandom sequence.

Algorithm 7: Generation of biased sequence.
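One common way to build such a sequence is to threshold uniform random numbers, as in the sketch below. This is an assumed construction for illustration and may differ from Algorithm 7: the bias is taken here as the deviation of the probability of a 0 from 1/2, the direction (toward zeros) is arbitrary, and Python's `secrets` module stands in for the RNGCryptoServiceProvider class mentioned above. The function names and the `scale` resolution are hypothetical.

```python
import secrets


def biased_bits(n, bias, scale=10_000):
    """Generate n bits in which 0 occurs with probability 1/2 + bias."""
    if not 0.0 <= bias <= 0.5:
        raise ValueError("bias must lie between 0 and 0.5")
    # Assumption: draw a uniform integer in [0, scale) and output 0 when it
    # falls below the threshold, so the bias is resolved to steps of 1/scale.
    threshold = round((0.5 + bias) * scale)
    return [0 if secrets.randbelow(scale) < threshold else 1
            for _ in range(n)]


def empirical_bias(bits):
    """Absolute deviation of the observed frequency of zeros from 1/2."""
    return abs(bits.count(0) / len(bits) - 0.5)


sample = biased_bits(20_000, 0.05)
print(empirical_bias(sample))  # close to 0.05, varies from run to run
```

Sweeping `bias` over a range of values gives data sets of decreasing randomness on which the sensitivity of a test can be observed.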

Example 16. Let be a random sequence with for ; from this sequence we construct a binary sequence with bias 0.05. The generation of nonrandom sequence can be summarized as follows:

In the last part of the experiments we generate nonrandom data with different biases using the above construction, and we observe the behaviour of the new run tests with respect to the randomness of a sequence. These results show the efficiency of the new tests. Moreover, the new run tests can detect deviations in the distribution of runs that the other tests cannot. The test results can be seen in Table 8.

Table 8: Test results for nonrandom data sets.

6. Conclusion

In cryptography almost all applications use random-looking sequences. Therefore randomness is one of the most important issues for cryptographic algorithms. In fact, using weak random values enables an adversary to break the whole system.

In all applications, the values used should be of sufficient size and be random, in such a manner that the probability of any chosen quantity is small enough to prevent an adversary from gaining any specific information. Therefore, sequences and numbers used as keys in cryptographic algorithms should be pseudorandom and should have good statistical properties. For these reasons statistical randomness is an important topic. While giving a mathematical proof that a generator is a random bit generator is nearly impossible, statistical tests are defined to detect weaknesses that a generator could have. Hence, they are considered an important part of evaluating the security of cryptographic algorithms.

In this work, we propose three new statistical tests based on Golomb's second postulate. Finding the exact probabilities related to the number of runs of lengths one, two, and three enables us to compare the observed values accordingly. The new run tests can be used in test suites to test the security of algorithms so that Golomb's second postulate is implemented in a proper way. Moreover, these tests can be used as an evaluation tool for short sequences such as outputs of block ciphers and hash functions. These tests can detect deviations in the distribution of runs which cannot be detected by other tests.

We also experiment with some standard encryption algorithms that behave like pseudorandom number generators and with random sequences such as the binary expansions of three constants. The implementations show the consistency of the new statistical tests with other well-known statistical tests. It is shown that, in order to detect the deviation from randomness (in the sense of the distribution of runs), the new statistical tests are more efficient than the other statistical tests.

As future work, we plan to extend the statistical tests so that they capture Golomb's randomness postulates more closely. Correlations among the new statistical tests, and between them and other statistical tests, can also be examined.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

  1. A. J. Menezes, S. A. Vanstone, and P. C. van Oorschot, Handbook of Applied Cryptography, CRC Press, Boca Raton, Fla, USA, 1st edition, 1996.
  2. D. E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Addison-Wesley Longman Publishing, Boston, Mass, USA, 3rd edition, 1997.
  3. A. L. Rukhin, “Testing randomness: a suite of statistical procedures,” Theory of Probability & Its Applications, vol. 45, no. 1, pp. 111–132, 2001.
  4. G. Marsaglia, “The Marsaglia random number CDROM including the diehard battery of tests of randomness,” 1995, http://www.stat.fsu.edu/pub/diehard/.
  5. W. Caelli, “Crypt-X package documentation,” Tech. Rep., Information Security Research, 1992.
  6. P. L'Ecuyer and R. Simard, “TestU01: a C library for empirical testing of random number generators,” ACM Transactions on Mathematical Software, vol. 33, no. 4, article 22, 2007.
  7. L. E. Bassham III, A. L. Rukhin, J. Soto et al., “A statistical test suite for random and pseudorandom number generators for cryptographic applications,” Tech. Rep. SP 800-22 rev.1a, NIST, Gaithersburg, Md, USA, 2010.
  8. U. M. Maurer, “A universal statistical test for random bit generators,” Journal of Cryptology, vol. 5, no. 2, pp. 89–105, 1992.
  9. V. Katos, “A randomness test for block ciphers,” Applied Mathematics and Computation, vol. 162, no. 1, pp. 29–35, 2005.
  10. P. M. Alcover, A. Guillamón, and M. D. C. Ruiz, “A new randomness test for bit sequences,” Informatica, vol. 24, no. 3, pp. 339–356, 2013.
  11. S. W. Golomb, Shift Register Sequences, Aegean Park Press, Laguna Hills, Calif, USA, 1982.
  12. S.-J. Kim, K. Umeno, and A. Hasegawa, “Corrections of the NIST statistical test suite for randomness,” International Association for Cryptologic Research, p. 18, 2004.
  13. K. Hamano and H. Yamamoto, “A randomness test based on T-codes,” in Proceedings of the International Symposium on Information Theory and its Applications (ISITA '08), pp. 1–6, December 2008.
  14. K. Hamano and T. Kaneko, “Correction of overlapping template matching test included in NIST randomness test suite,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 90, no. 9, pp. 1788–1792, 2007.
  15. J. Soto and L. Bassham, “Randomness testing of the advanced encryption standard finalist candidates,” NIST IR 6483, National Institute of Standards and Technology, 1999.
  16. F. Sulak, A. Doğanaksoy, B. Ege, and O. Koçak, “Evaluation of randomness test results for short sequences,” in Sequences and Their Applications - SETA 2010, vol. 6338 of Lecture Notes in Computer Science, pp. 309–319, Springer, Berlin, Germany, 2010.
  17. S. Ross, A First Course in Probability, Prentice Hall, New York, NY, USA, 6th edition, 2002.
  18. C. Burwick, D. Coppersmith, E. D'Avignon et al., MARS - a candidate cipher for AES, NIST AES Proposal, 1999.
  19. R. L. Rivest, M. J. B. Robshaw, Y. Yin, and R. Sidney, The RC6 Block Cipher, 1998.
  20. J. Daemen and V. Rijmen, The Design of Rijndael, Springer, New York, NY, USA, 2002.
  21. E. Biham, R. J. Anderson, and L. R. Knudsen, “Serpent: a new block cipher proposal,” in Fast Software Encryption: 5th International Workshop, FSE '98, Paris, France, March 23–25, 1998, Proceedings, vol. 1372 of Lecture Notes in Computer Science, pp. 222–238, Springer, Berlin, Germany, 1998.
  22. B. Schneier, J. Kelsey, D. Whiting, D. Wagner, C. Hall, and N. Ferguson, “Twofish: a 128-bit block cipher,” in Proceedings of the 1st Advanced Encryption Standard (AES) Conference, Ventura, Calif, USA, August 1998.
  23. H. M. Heys, “A tutorial on linear and differential cryptanalysis,” Cryptologia, vol. 26, no. 3, pp. 189–221, 2002.