Research Article | Open Access
An Improved Genetic Algorithm for Developing Deterministic OTP Key Generator
A genetic-based random key generator (GRKG) for the one-time pad (OTP) cryptosystem was recently proposed in the literature, but it has certain limitations. In this paper, two main characteristics of the GRKG method, speed and randomness, are significantly improved by presenting the improved genetic-based random key generator (IGRKG) method. The proposed IGRKG method generates an initial pad using a linear congruential generator (LCG) and then improves the randomness of that pad using a genetic algorithm. There are three reasons for using an LCG: it is easy to implement, it runs efficiently on computer hardware, and it has good statistical properties. The experimental results show the superiority of IGRKG over GRKG in terms of speed and randomness. Apart from GRKG, no prior experimental work directly related to OTP key generation using evolutionary algorithms has been presented in the literature. Therefore, this work can be considered a guideline for future research.
1. Introduction
Recent years have witnessed the use of information in many areas, including financial, military, and political domains. The security of this information, both in storage and in transit, is crucial: if it is compromised, the result may be financial loss, disclosure of military or commercial secrets, or even loss of life. Cryptography is one set of techniques for providing information security. Historically, cryptography has been associated with surveillance, warfare, and similar applications. However, with the advent of the information society and the digital revolution, cryptography is also useful in the everyday lives of ordinary people, for example, when buying something over the Internet with a credit card, withdrawing money from an ATM using a smart card, or locking and unlocking luxury cars.
Cryptography is concerned with the design of cryptosystems. Cryptosystems fall into two classes: symmetric key and asymmetric key. In a symmetric-key cryptosystem, the encryption function takes a text message (plaintext) as input and transforms it into unreadable text (ciphertext) using a secret key. The decryption function converts the ciphertext back to the plaintext using the same secret key. If any flaws or oversights exist in the cryptosystem, they can be exploited by an attacker. Because cryptographic algorithms are public and encrypted data travels over insecure public communication channels, the attacker may recover the plaintext from the ciphertext without knowing the secret key. For this reason, sensitive applications, for example, in the financial domain, demand perfect security, which can only be achieved by one-time pad (OTP) symmetric-key cryptosystems, in which a key used once for encryption is never used again. To achieve perfect security, the obvious choice is to generate the key from truly random sources. However, this choice is inefficient. Truly random numbers can be generated from hardware-based physical phenomena (for example, the elapsed time between emissions of particles during radioactive decay, thermal noise from a semiconductor diode or resistor, sound from a microphone, or video input from a camera) or from software-based processes (for example, the system clock, the elapsed time between keystrokes or mouse movements, and operating system values such as system load and network statistics), but both are impractical for real cryptographic applications, which need a large key for each encryption. Therefore, for sensitive applications, pseudorandom generation of the key is the only option that makes the scheme practical. Recent years have witnessed wide use of computationally secure OTPs all over the world, typically during financial transactions.
Hereinafter, "OTP key" means a computationally secure pseudorandom key and "original OTP key" means a truly random key. In this work, we present a genetic-based scheme for automatically generating OTP keys.
2. Related Work and Our Contributions
Several things in the world are naturally encoded, for example, the genomes of animals. This motivates us to use (genotype-based) genetic algorithms in developing a deterministic scheme that can generate OTP keys rapidly. In 2013, Sokouti et al. demonstrated a significant use of a genetic algorithm for automatically generating OTP keys. They proposed and compared two genetic-based OTP key generators, namely, 10P-GRKG and GRKG. Their comparison results show that the GRKG method is much better than the 10P-GRKG method in terms of speed and randomness. However, the GRKG method has certain limitations that need improvement. In this paper, we propose an improved genetic-based random key generator (IGRKG). Compared with GRKG, the proposed IGRKG generator produces the OTP key more rapidly, and the degree of randomness of the generated keys is better. In the literature, OTP key generation using an evolutionary technique has been attempted only by Sokouti et al. This paper therefore presents a detailed comparison between the GRKG and IGRKG generators. We also compare the Diehard scores of GRKG and IGRKG with those of some existing pseudorandom number generators. It is important to note that, except for GRKG and IGRKG, these other pseudorandom generators were not developed to generate OTP keys.
It should be noted that speed and randomness are the main objectives in the design of a pseudorandom key generator. To achieve these objectives, the following novelties and modifications are introduced, which are our major contributions:
(1) Unlike GRKG, we use a comparatively short secret key.
(2) In place of the corresponding parameter employed in GRKG, a new parameter is proposed; its essence is the minimum number of generations in which the initial pad is obscured almost entirely.
(3) For determining the crossover point, rather than using modular arithmetic over addition, we introduce a new approach of modular arithmetic over subtraction. The advantage of this approach is that it improves the randomness of the pad and makes the scheme faster.
(4) For evolving the existing generation, a new and efficient approach is introduced that updates two variables. These variables are employed in Algorithm 1 to decide the crossover and mutation points. This idea increases the randomness of the existing pad and evolves the obscure final pad rapidly.
(5) For increasing the speed of encryption and decryption, a more efficient encryption and decryption function is suggested.
Figure 1 shows the block diagram of the proposed work. Four integer values are taken as input for the short secret key: the initial value, the multiplier, the increment, and the modulus of the LCG. The values of these parameters are taken only once, in the presence of both the sender and the receiver. All of these values must be truly random; together they are referred to as the seed. The seed must be generated from truly random sources because it is used by the GA techniques to generate larger keys. As shown in Figure 1, the seed is first processed by one of the existing statistically sound generators, namely, the LCG. Through a feedback mechanism, an initial pad equal in size to the plaintext is generated. That is, the initial value is used to generate the first pad element, the first element is used to generate the second, and so on, where for each computation the remaining secret key parameters, that is, the multiplier, the increment, and the modulus, remain unchanged. The initial pad is then converted into a population of individuals, where each individual is the binary equivalent of the corresponding integer of the pad. Afterward, the population is evolved by three evolutionary operators: selection, crossover, and mutation (all of these operators are discussed in detail in Sections 5.1, 5.2, and 6). The probability of crossover and mutation is controlled by a common probability parameter. However, the rates of mating and mutation may differ for each instance of the problem; we determine these rates by a deterministic mathematical procedure (for details, see Algorithm 1). The selection of individuals for mating, the crossover point, the choice of genes for mutation, and the mutation point are also controlled by a deterministic mathematical procedure (see Algorithm 1). Finally, we obtain an obscure final pad, where each element is the integer equivalent of the corresponding binary individual. The number of generations is controlled by a dedicated parameter (for details about this parameter, see Section 5.2).
Advantages of IGRKG over GRKG. (1) The IGRKG generator is much faster than the GRKG generator; for instance, in generating a large secure pad, the average time taken by the IGRKG generator is about 2.432 seconds, while the GRKG generator takes 9.747 seconds (for details, see Section 6.4). These results indicate that the IGRKG generator is about four times faster than the GRKG generator. Consequently, when a significant number of encrypted messages are exchanged, the OTP-IGRKG system will outperform the OTP-GRKG system. (2) In terms of randomness, the quality of the IGRKG generator is significantly better than that of the GRKG generator (see the statistical and randomness testing results in Section 6.5).
We organize the remainder of the paper as follows. In Section 3, we present some valuable previous research in the field of cryptology in which genetic algorithms have been utilized. In Section 4, we present the basics of the one-time pad and the associated challenges. In Section 5, we propose the IGRKG method and compare it with the previously proposed GRKG method of Sokouti et al. In Section 6, statistical testing and cryptanalysis results are discussed, followed by the conclusion in Section 7.
3. Genetic Algorithm
Evolutionary algorithms (EAs) originated as an attempt to mimic some of the processes taking place in natural evolution. Although the details of biological evolution are still not completely understood, some points are supported by strong experimental evidence. The genetic algorithm (GA) is one of the most popular EA techniques and is based on the concept of imitating the evolution of a species. In a GA, a population of individuals (or chromosomes) is generated using an intelligent method or a random method [6–8]. Each of these individuals is encoded as a binary string that represents a possible candidate solution to the problem at hand. In each iteration, the survival strength of each candidate solution is measured by a fitness function [6–9]. Afterward, the evolutionary process is driven by three genetic operators: selection, crossover, and mutation. Through the selection procedure, individuals are selected to enter the crossover process. The crossover operator combines two or more parents to create offspring, where a probabilistic crossover rate is usually used to generate offspring [7, 8, 10]. The mutation operator produces one child from one parent by flipping one or more bits of the parent. A probabilistic mutation rate is usually used to determine whether a particular change occurs within an individual [7, 8, 10].
The crossover and mutation operators each have important characteristics that are not captured by the other. Błażej et al. mentioned that it has never been theoretically shown that mutation is in some sense less powerful than crossover, or vice versa. Mutation serves to create random diversity in the population, while crossover serves as an accelerator that promotes emergent behavior from components [11, 12]. The metaissue, then, is the relative importance of diversity and construction. It is impossible for mutation to simultaneously achieve high levels of construction and survival [11, 12]. This would appear to be important, since one without the other may not be very useful. High construction levels are accomplished at the expense of survival (e.g., mutation rate 0.5), while good survival comes at the expense of construction (e.g., mutation rate 0.01) [11, 12]. In our study, we obtain highly constructive results with mutation rates of 0.25 to 0.3. That is, 25% to 30% of the parents are affected by the mutation operation in our study (see Section 5.1 for details). GA parameters can be controlled in three different ways: deterministic [13, 14], adaptive [13–15], and self-adaptive [13, 14, 16]. Deterministic parameter control takes place when the value of a strategy parameter (two such parameters are used in our study) is altered by some deterministic rule. This rule modifies the strategy parameters deterministically without using any feedback from the search [13, 14, 17].
Applications of GAs in Cryptography. GAs have been applied successfully to solve real-world optimization and search problems. These techniques have also shown good potential in the domain of cryptology. Here we mention some of the notable works carried out in the last decade. Interesting work in the domain of cryptographic protocol design was carried out by Park and Hong and by Zarza et al. in 2005 and 2006, respectively. Wang et al. (2012) proposed a novel method based on a genetic algorithm and a chaotic map for designing substitution boxes (S-boxes). Jhajharia et al. (2013) utilized GAs for cryptographic key generation. Jain and Chaudhari (2014) proposed an improved GA method to attack knapsack-based cryptosystems. Faraoun (2014) proposed a block cipher design using a GA and cellular automata. Recently, GA and CGP techniques have been utilized for determining strong cryptographic Boolean functions. Jain and Chaudhari (2015) proposed an improved GA for automated cryptanalysis of substitution ciphers. Sokouti et al. (2013) proposed a GA technique for automatically generating OTP keys, which we improve in this paper.
4. One-Time Pad Cryptosystems
One-time pad cryptosystems are based on the concept of the stream cipher. In a stream cipher, a short secret key is used to generate a keystream (i.e., a string of bits). The keystream bits are XORed with the plaintext bits to produce the ciphertext. At the receiver end, the ciphertext is XORed with the same keystream to recover the original plaintext. However, because the keystream in a stream cipher is generated from a short secret key, these ciphers can be compromised if not used carefully. The advantage of stream ciphers is that they are much faster in hardware and are therefore mostly employed in resource-constrained devices. The original OTP, in contrast, is used in applications where the primary objective is perfect security rather than speed. The conventional OTP cryptosystem combines a plaintext-sized key with the given plaintext code by addition modulo 26 and thereby generates the ciphertext. An example is shown in Table 1. A plaintext message can, however, consist not only of letters of the English alphabet but also of other ASCII characters. Therefore, in this paper, encryption and decryption of the plaintext are performed modulo 256 rather than modulo 26. As a result, each plaintext character consists of 8 bits (i.e., each plaintext character is in the range 0 to 255).
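The modulo-256 combination described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names `otp_encrypt` and `otp_decrypt` are hypothetical, and the pad is assumed to be supplied as a byte string of the same length as the plaintext.

```python
def otp_encrypt(plaintext: bytes, pad: bytes) -> bytes:
    """Add each pad byte to the matching plaintext byte, modulo 256."""
    if len(pad) != len(plaintext):
        raise ValueError("the pad must be exactly as long as the plaintext")
    return bytes((p + k) % 256 for p, k in zip(plaintext, pad))


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """Subtract each pad byte from the matching ciphertext byte, modulo 256."""
    if len(pad) != len(ciphertext):
        raise ValueError("the pad must be exactly as long as the ciphertext")
    return bytes((c - k) % 256 for c, k in zip(ciphertext, pad))
```

Decryption simply inverts the addition, so applying `otp_decrypt` to the output of `otp_encrypt` with the same pad returns the original bytes. An XOR-based variant, as used in stream ciphers, would replace the addition and subtraction with `p ^ k` and `c ^ k`.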
5. Genetic Algorithm for Generating OTP Keys
There are two main challenges in developing the original OTP cryptosystem: (1) The OTP cryptosystem must generate a key of length equal to the length of the plaintext. (2) The key should be truly random in order to achieve perfect security. Plaintexts vary in size and are often large. Therefore, it is impractical to generate a truly random key of the size of the plaintext.
An efficient option for solving this kind of problem is the use of a pseudorandom key. However, it is not trivial to generate a pseudorandom key equal in length to the plaintext. In this context, Sokouti et al. (2013) proposed the GRKG method. The GRKG generator accepts a fixed-size short secret key as an initial key and from it generates the pseudorandom key. We point out that a different key is generated each time.
Two popular pseudorandom generators are candidates for the base generator because of their good statistical properties: the LCG and the Mersenne Twister. However, for cryptographic security, the use of a statistically sound generator alone is not sufficient. Therefore, we can employ either the LCG or the Mersenne Twister to generate the initial pad and then use a genetic algorithm to improve the randomness of that pad. As a result, an obscure and appropriate OTP key is generated. In this research, we have decided to use the LCG method because it is easy to implement and runs efficiently on computer hardware. Most importantly, its use allows a fair comparison between the two methods, GRKG and IGRKG, since the LCG is also employed in the GRKG method.
5.1. IGRKG: The Proposed Method
Consider an initial key that consists of LCG and GA parameters; the details are as follows.
Parameters Related to LCG:
(1) the modulus;
(2) an initial positive integer number for generating another integer number using (1);
(3) the multiplier;
(4) the increment.
Parameters Related to GA:
(1) the combined probability of crossover and mutation;
(2) the number of selections of chromosome pairs for crossover;
(3) the number of selections of chromosomes for mutation;
(4) the minimum number of iterations needed to generate a sufficiently secure OTP key.
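The recurrence that drives the LCG is not reproduced in the text above. For reference, the standard textbook form of a linear congruential generator is given below; the symbol names (seed X_0, multiplier a, increment c, modulus m) are the conventional ones and are an assumption here, not necessarily the notation of the original equation.

```latex
X_{i+1} = (a X_i + c) \bmod m, \qquad i = 0, 1, 2, \dots
```

Each output is fed back as the next input, while a, c, and m stay fixed.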
Algorithm 1 (Description). Pseudocode for the proposed IGRKG method is shown in Algorithm 1. The input to the algorithm is the secret key, which the communicating parties decide on once. Using its first four elements, that is, the LCG parameters, the initial pad is generated via the LCG method, where the length of the pad equals the size of the plaintext. The initial pad is then converted into its equivalent binary representation (see Remark 1).
GA Operators. The binary initial pad generated by the LCG method is modified by applying selection, crossover, and mutation operators that are deterministically controlled. That is, crossover and mutation are not performed at random positions of individuals; rather, the positions are determined using a deterministic procedure. We emphasize that the same key can be obtained at both ends if and only if identical evolutionary operations are applied. If a fitness function were used, this constraint would be violated. Therefore, this work does not require any fitness function; instead, intelligent selection, crossover, and mutation operators have been designed for generating secure OTP keys.
Deciding the Numbers of Crossover and Mutation Selections. For a sufficiently short initial pad, the selection of one pair of chromosomes and of a single chromosome is sufficient for reproduction and mutation, respectively (see the corresponding Remark). For larger initial pads, the two numbers are determined deterministically from the key parameters (see Algorithm 1), one of which is the common probability parameter for crossover and mutation.
Fine-Tuning of Crossover and Mutation. We have tested several types of mutation and crossover operators, but the best results were obtained using simple mutation (which flips a selected bit) and single-point crossover. It has also been shown in the literature that, among all the crossover operators, the most successful one is single-point crossover. A deterministic procedure is developed for deciding the crossover and mutation points (see Algorithm 1). The number of chromosomes mutated is defined as a fixed percentage of the total number of chromosomes (see Algorithm 1).
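The two operators named above can be sketched on 8-bit chromosomes as follows. This is an illustrative sketch under stated assumptions, not the paper's Algorithm 1: the function names are hypothetical, gene index 0 is taken to be the most significant bit, and the crossover is assumed to swap every gene from the crossover point onward.

```python
BITS = 8  # chromosome length in bits (one pad byte per chromosome)


def single_point_crossover(a: int, b: int, point: int) -> tuple[int, int]:
    """Swap the genes at indices point..BITS-1 between two chromosomes."""
    tail_mask = (1 << (BITS - point)) - 1   # selects the genes to be swapped
    head_mask = ((1 << BITS) - 1) ^ tail_mask
    child_a = (a & head_mask) | (b & tail_mask)
    child_b = (b & head_mask) | (a & tail_mask)
    return child_a, child_b


def flip_bit(chromosome: int, index: int) -> int:
    """Simple mutation: flip the gene at `index` (0 = most significant bit)."""
    return chromosome ^ (1 << (BITS - 1 - index))
```

For example, crossing 252 (11111100) with 243 (11110011) at gene index 7 swaps only the last bit of each chromosome, and flipping gene index 4 of 252 clears the bit with value 8.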
Finding the best combination of crossover rate and mutation rate is an important step in a GA. In [28, 29], it is reported that low mutation rates (0.01 to 0.1) and comparatively high crossover rates (0.5 to 0.7) generally perform very well. However, the modern view of EAs admits that specific problem types require specific EA setups. Therefore, different crossover and mutation rates were tried experimentally to investigate their capability to find good solutions (the conditionally optimal values of the crossover and mutation rates are shown in Table 2). Note that there is no prior experimental work of this kind, so this work should be considered a guideline for future research.
Use of the Two Selection Variables. For initialization, we use the last element of the initial pad only once (see Algorithm 1). That is, for security reasons, this element is never used again in the evolutionary process, whereas the GRKG method uses it more than once, which is one of the drawbacks of the GRKG method (see Table 3). One integer variable is used to select a chromosome pair for mating, so that a possibly different chromosome pair is mated each time. In each iteration, the mating operation is performed as many times as the number of crossover selections. Another integer variable is used to select an individual for mutation, so that a possibly different individual is mutated each time. In each iteration, the mutation operation is performed as many times as the number of mutation selections. By repeated application of mating and mutation, a new population is generated. During evolution of the population through crossover, the mating variable is itself updated. Similarly, during evolution of the population through mutation, the mutation variable is itself updated. In both cases, the LCG method is used. In each iteration, after crossover and mutation, the two variables are assigned their updated values rather than being reset (see Algorithm 1). This strategy has been introduced in this research for the purpose of generating a robust and secure OTP key (for detailed information, see Section 5.2).
Use of the Minimum-Generations Parameter. Until the termination condition is satisfied, the new population is fed back into the evolutionary process. This integer parameter indicates the minimum number of generations for which the pad is kept in the evolutionary process (see Section 5.2).
5.2. Comparison between GRKG and IGRKG Generators
In this section, we compare the proposed IGRKG method with the GRKG method of Sokouti et al. A feature-based comparison of the two generators is shown in Table 3. In this table, we have underlined the values of IGRKG features that are different from their GRKG counterparts. A detailed list of the proposed improvements is as follows:
(1) Rather than a secret key of size "seven," IGRKG uses a short secret key of size "six." This is possible because the crossover and mutation probabilities have been combined into a single parameter. The algorithm is designed in such a way that the same probability parameter is used for performing both the crossover and the mutation operations (see Table 3).
(2) In place of the corresponding parameter used in GRKG, IGRKG uses a new parameter. The essence of this parameter is the minimum number of generations in which the initial pad is obscured almost completely. The IGRKG scheme has been tested with different parameter values. It is observed that 50 generations are sufficient to completely obscure the initial pad. However, after each communication this parameter is increased in order to achieve computationally high security.
(3) For evolving the existing generation, a different approach is proposed, which is based on effective updates of the two selection variables. In the GRKG method, the same initial value (i.e., PGU()) is used for evolving the current generation (see Table 3, Column 2). The limitation of this approach is that a number of the individuals that were improved by crossover are selected once again for mutation. For this reason, the GRKG method requires a large number of generations to evolve the remaining individuals. This limitation is resolved by assigning updated values to the two variables (i.e., PGU() and CGU(); see Table 3, Column 3). This gives the remaining individuals, which were not improved by crossover, a chance of improvement in the same generation through mutation. The main benefit of this approach is that there is a high probability of selecting for mating chromosomes that were not selected for mutation, and vice versa. This idea increases the randomness of the pad as the iterations proceed. That is, owing to this strategy, the IGRKG method produces a more randomized pad in fewer iterations than the GRKG method.
(4) In order to determine crossover points, the GRKG method uses modular arithmetic over addition. This approach makes the GRKG scheme conceptually weak, because the sum of two chromosome values (i.e., the sum of the integers) is always the same before and after crossover. That is, if two chromosomes are mated and converted into two offspring, then whenever these two offspring are selected for crossover in the next generation, the result is the original pair of chromosomes again (see Table 3). Clearly, this phenomenon is a major obstacle to increasing the randomness of the input pad. In this paper, we resolve this weakness by using modular arithmetic over subtraction rather than addition. With this strategy, even if the same pair is selected in the next generation, a different crossover point will generally be selected, because the difference of the two chromosome values before and after the crossover operation is different. This approach improves the practical efficiency of the generator.
(5) On critical examination, we find that the encryption and decryption functions suggested by Sokouti et al. are not appropriate for use in cryptography. The design of encryption and decryption functions is not part of the OTP key generator. However, for a complete OTP scheme, we advise the simple encryption and decryption functions that are often used in stream ciphers (see Table 3).
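The weakness described in improvement (4) can be checked numerically. The sketch below rests on several assumptions that are not confirmed by the text: 8-bit chromosomes, gene index 0 as the most significant bit, a crossover that swaps all genes from the crossover point onward, and a subtraction rule of the form (a - b) mod 8. All function names are hypothetical.

```python
BITS = 8  # chromosome length in bits


def crossover(a: int, b: int, point: int) -> tuple[int, int]:
    """Single-point crossover: swap the genes at indices point..BITS-1."""
    tail = (1 << (BITS - point)) - 1
    head = ((1 << BITS) - 1) ^ tail
    return (a & head) | (b & tail), (b & head) | (a & tail)


def point_by_sum(a: int, b: int) -> int:
    """Crossover point from modular addition (the GRKG-style rule)."""
    return (a + b) % BITS


def point_by_difference(a: int, b: int) -> int:
    """Crossover point from modular subtraction (assumed form of the rule)."""
    return (a - b) % BITS


def mate_twice(a: int, b: int, rule) -> tuple[int, int]:
    """Mate a pair, then mate the offspring again with a recomputed point."""
    c, d = crossover(a, b, rule(a, b))
    return crossover(c, d, rule(c, d))
```

With the addition rule, `mate_twice(52, 31, point_by_sum)` returns the original pair (52, 31), because swapping tails leaves the sum unchanged and so the same point is chosen again. With the subtraction rule, the offspring of 52 and 31 yield a different crossover point, so the second mating does not undo the first. The difference is not guaranteed to change for every pair, so this illustrates the tendency rather than proving it.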
To compare the GRKG and IGRKG generators in terms of speed, we have implemented both generators in Java 2.0 on an Intel quad-core i7 processor (3.40 GHz). We present the results of both generators on the text "cryptology." The size of the plaintext "cryptology" is 10; that is, the pad consists of 10 elements. The parameters are chosen so that, in each iteration, two pairs of chromosomes and two chromosomes are affected by crossover and mutation, respectively. Note that this example was considered by Sokouti et al. in their work. Therefore, for a fair comparison between the GRKG and IGRKG generators, we demonstrate our work on the same example.
6.1. Common Computation
Consider a secret key of six values. Using this short secret key, an initial pad is generated iteratively via the LCG method with modulus 256. That is, the first output is 52, the second is 11, the third is 62, and so on, until a pad of 10 elements is obtained. A population of size 10 is initialized, where the ith chromosome is the binary equivalent of the ith element of the pad. This population is the input to the GRKG and IGRKG generators for generating OTP keys.
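The pad generation can be reproduced with a short sketch. The seed, multiplier, and increment are not legible in this passage, so the values 9, 5, and 7 used below are an inference, not a quotation: solving the LCG recurrence modulo 256 for the stated outputs 52, 11, and 62 gives exactly these values, and they also reproduce the chromosome values 31, 252, and 243 quoted in the walkthrough that follows. The function name `lcg_pad` is hypothetical.

```python
def lcg_pad(seed: int, a: int, c: int, m: int, n: int) -> list[int]:
    """Generate n pad elements by feeding each LCG output back as input."""
    pad = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        pad.append(x)
    return pad


# Inferred parameters (assumption): seed 9, multiplier 5, increment 7.
pad = lcg_pad(seed=9, a=5, c=7, m=256, n=10)

# Each chromosome is the 8-bit binary equivalent of one pad element.
population = [format(value, "08b") for value in pad]
```

Under these assumed parameters the pad is 52, 11, 62, 61, 56, 31, 162, 49, 252, 243, so the last element is 243 and the first chromosome is 00110100.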
6.2. Results Obtained Using the GRKG Method
In this section, we determine the OTP key from the initial pad using the GRKG generator. Table 4 shows the working of the GRKG method. Initially, the selection variable is 243, that is, the last element of the initial pad.
Mating. Initially, the 8th and 9th indexed chromosomes are selected as a pair for mating, where the selection is determined as follows: the selection variable is updated modulo 256 to 198, and 198 mod 10 = 8; it is then updated again modulo 256 to 229, and 229 mod 10 = 9. The mating between the 8th and 9th indexed chromosomes is performed at the 7th indexed gene position. The index is computed from the two chromosome values, 252 and 243, and the chromosome length, 8, as (252 + 243) mod 8 = 7. Similarly, the second mating operation is performed between the 0th and 5th indexed chromosomes, where the mating starts from the 3rd indexed gene position, since (52 + 31) mod 8 = 3. Following these selection and crossover mechanisms, the initial pad is transformed into an intermediate pad.
Mutation. The mutation operation is performed using the selection variable with value 243 (we point out that the initial value 243 is used again for the selection of chromosomes for mutation, which is one of the drawbacks of the GRKG method). As shown in Table 4, the first mutation operation changes the 4th indexed bit (252 mod 8 = 4) of the 8th indexed chromosome, and the second mutation changes the 3rd indexed bit (243 mod 8 = 3) of the 9th indexed chromosome. In this way, after the first iteration, the intermediate pad is transformed into a new pad.
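The index arithmetic quoted in the mating and mutation steps above can be checked with a few lines. The sketch assumes that the selection variable is advanced by an LCG step with multiplier 5 and increment 7 modulo 256; these two values are not legible in this copy and are inferred because they map 243 to the quoted 198 and then to the quoted 229. The names below are hypothetical.

```python
A, C, M = 5, 7, 256   # inferred LCG parameters (assumption)
POPULATION_SIZE = 10  # number of chromosomes (length of "cryptology")
BITS = 8              # genes per chromosome


def lcg_step(x: int) -> int:
    """Advance the selection variable by one LCG step."""
    return (A * x + C) % M


start = 243                      # last element of the initial pad
first = lcg_step(start)          # first update of the selection variable
second = lcg_step(first)         # second update

# Chromosome indices selected for the first mating.
mating_pair = (first % POPULATION_SIZE, second % POPULATION_SIZE)

# Crossover points from modular addition of the chromosome values.
first_crossover_point = (252 + 243) % BITS
second_crossover_point = (52 + 31) % BITS

# Bit positions flipped by the two mutation operations.
mutation_points = (252 % BITS, 243 % BITS)
```

Under these assumptions the updates are 198 and 229, the mating pair is (8, 9), the crossover points are 7 and 3, and the mutation points are 4 and 3, which agrees with the figures quoted in the text.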
Table 4 also shows the recomputation (Re) phase, which is one of the limitations of the GRKG method; recomputation occurs when the same chromosome is selected again (here, the 8th one). Similarly, during the mutation operation, if the same chromosome is selected again, the GRKG method performs a recomputation. We emphasize that this behavior needs improvement because, for large plaintexts, the efficiency of the scheme will degrade. This paper resolves the issue by removing the recomputation phase and continually updating the resulting pad through crossover, where the crossover is performed using modular subtraction rather than modular addition.
6.3. Results Obtained Using the IGRKG Method
In this section, we determine the OTP key from the same initial pad using the IGRKG generator. Table 5 shows the working of the IGRKG method. Initially, the selection variable is set to the last element of the initial pad.