Abstract

In 1996, a trinucleotide circular code which is maximum, self-complementary, and $C^3$, called $X$, was identified statistically on a large gene population of eukaryotes and prokaryotes (Arquès and Michel (1996)). Transition and transversions I and II are classical molecular evolution processes. A comprehensive computer analysis of these three evolution processes in the code $X$ shows some new results; in particular (i) transversion I on the 2nd position of any subset of trinucleotides of $X$ generates trinucleotide circular codes which are always $C^3$ and (ii) transversion II on the three positions of any subset of trinucleotides of $X$ yields no trinucleotide circular codes. These new results extend our theory of circular code in genes to its evolution under transition and transversion.

1. Introduction

We continue our study of properties of trinucleotide circular codes [1–5], trinucleotide comma-free codes [1, 6], strong trinucleotide circular codes [7], and the common trinucleotide circular code identified in genes [8] (see also the recent statistical analysis by [9]) which could be a translation code [10]. A trinucleotide is a word of three letters (triletter) on the genetic alphabet $\{A, C, G, T\}$. The set of 64 trinucleotides is a code (called genetic code), more precisely a uniform code but not a circular code (see Remarks 2 and 4). In past years, codes, comma-free codes, and circular codes have been mathematical objects studied in theoretical biology, mainly to understand the structure and the origin of the genetic code as well as the reading frame (construction) of genes, for example, [11–13]. To give an intuitive meaning to these notions, codes are written on a straight line while comma-free codes and circular codes are written on a circle, but in both cases, unique decipherability is required. Circular codes only belong to some subsets of the trinucleotide set while comma-free codes are even more constrained subsets of circular codes [1].

Before the discovery of the genetic code, Crick et al. [11] proposed a maximum comma-free code of trinucleotides for coding the amino acids. This comma-free code turned out to be invalid (see, e.g., [14]). In 1996, a maximum circular code $X$ of trinucleotides was identified statistically on a large gene population of eukaryotes and also on a large gene population of prokaryotes [8]. This code has remarkable mathematical properties as it is a self-complementary maximum circular code (see the following). Since 1996, its properties have been studied in detail by different authors, for example, [9, 15–21]. Transition and transversions I and II are classical molecular evolution processes, for example, [22]. By using an algorithm based on the necklace, we perform here a comprehensive computer analysis of these three evolution processes in the code $X$. Some new results are identified with the code $X$ by computer analysis; in particular (i) transversion I on the 2nd position of any subset of trinucleotides of $X$ generates trinucleotide circular codes which are always $C^3$ and (ii) transversion II on the three positions of any subset of trinucleotides of $X$ yields no trinucleotide circular codes.

2. Preliminaries

The classical notions of language theory and codes can be found in [23, 24]. Let $A_4 = \{A, C, G, T\}$ denote the genetic alphabet, lexicographically ordered by $A < C < G < T$. The set of words (nonempty words, resp.) on $A_4$ is denoted by $A_4^*$ ($A_4^+$, resp.). The set of the 16 words of length 2 (dinucleotides or diletters) on $A_4$ is denoted by $A_4^2$. The set of the 64 words of length 3 (trinucleotides or triletters) on $A_4$ is denoted by $A_4^3$.

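Definition 1. A subset $X \subset A_4^+$ is a code on $A_4$ if for each $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, $n, m \geq 1$, the condition $x_1 \cdots x_n = x'_1 \cdots x'_m$ implies $n = m$ and $x_i = x'_i$ for $i = 1, \ldots, n$.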

Remark 2. $A_4^3$ is a code.

Any nonempty subset of $A_4^3$ is a code, called here a trinucleotide code.

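Definition 3. A trinucleotide code $X \subset A_4^3$ is circular if, for each $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, $n, m \geq 1$, $r \in A_4^*$, $s \in A_4^+$, the conditions $s x_2 \cdots x_n r = x'_1 \cdots x'_m$ and $x_1 = rs$ imply $n = m$, $r = \varepsilon$ (empty word) and $x_i = x'_i$ for $i = 1, \ldots, n$.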

Notation 1. A trinucleotide circular code is noted .

Remark 4. $A_4^3$ is not a trinucleotide circular code.

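Let $l_1, l_2, \ldots, l_{n-1}, l_n$ be letters in $A_4$, $d_1, d_2, \ldots, d_{n-1}, d_n$ diletters in $A_4^2$, and $n$ an integer satisfying $n \geq 2$.

Definition 5. We say that the ordered sequence $l_1, d_1, l_2, d_2, \ldots, d_{n-1}, l_n, d_n$ is an $n$LDCN (Letter Diletter Continued Necklace) for a subset $X \subset A_4^3$ if $l_1 d_1, l_2 d_2, \ldots, l_n d_n \in X$ and $d_1 l_2, d_2 l_3, \ldots, d_{n-1} l_n \in X$.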

Only a few trinucleotide codes are circular. Two propositions based on the necklace concept make it possible to determine whether a trinucleotide code is circular [2, 18].

Proposition 6 (see [18]). Let $X$ be a trinucleotide code. The following conditions are equivalent: (i) $X$ is a trinucleotide circular code; (ii) $X$ has no 5LDCN.
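The necklace test of Proposition 6 can be sketched as a short depth-first search. This is a minimal illustration, not the authors' algorithm (which is not detailed in the text). It assumes the following reading of Definition 5: a sequence $l_1, d_1, \ldots, l_n, d_n$ is a necklace when every $l_i d_i$ and every $d_i l_{i+1}$ belongs to the code. The names `has_necklace` and `is_circular` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
DILETTERS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def has_necklace(code, n=5):
    """Return True if `code` admits a necklace l1, d1, ..., ln, dn.

    Assumed reading of Definition 5: each l_i d_i and each d_i l_{i+1}
    must be a trinucleotide of the code.
    """
    code = set(code)

    def extend(letter, remaining):
        # Try every diletter d such that letter + d is in the code.
        for d in DILETTERS:
            if letter + d not in code:
                continue
            if remaining == 1:
                return True  # the last pair l_n d_n has been found
            # Continue with every next letter such that d + next is in the code.
            for nxt in ALPHABET:
                if d + nxt in code and extend(nxt, remaining - 1):
                    return True
        return False

    return any(extend(letter, n) for letter in ALPHABET)


def is_circular(code):
    """Proposition 6 as read here: circular if and only if there is no 5LDCN."""
    return not has_necklace(code, 5)


print(is_circular({"ACG", "CGA"}))  # False: ACG and its permutation CGA coexist
```

A code containing a trinucleotide together with one of its circular permutations, or a periodic trinucleotide such as AAA, is rejected immediately because the search can loop on the same letter.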

Definition 7. We say that the ordered sequence is an (Letter Diletter Continued Closed Necklace) for a subset if

Proposition 8 (see [2]). Let be a trinucleotide code. The following conditions are equivalent:(i)is a trinucleotide circular code;(ii)has no for any integer .

Definition 9. A trinucleotide circular code $X \subset A_4^3$ is maximal if, for each $x \in A_4^3$, $x \notin X$, $X \cup \{x\}$ is not a trinucleotide circular code.

Definition 10. A trinucleotide circular code containing exactly $k$ elements is called a $k$-trinucleotide circular code.

Definition 11. A 20-trinucleotide circular code is maximum as no trinucleotide circular code can contain more than 20 words.
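The bound of 20 words can be recovered by a short counting argument, given here as a standard fact rather than as a statement taken from the text. A circular code contains no periodic trinucleotide (AAA, CCC, GGG, TTT) and at most one trinucleotide from each class of three circular permutations of a nonperiodic trinucleotide:

```latex
\[
|A_4^3| = 4^3 = 64, \qquad
64 - 4 = 60 \ \text{nonperiodic trinucleotides}, \qquad
\frac{60}{3} = 20 \ \text{classes of circular permutations}.
\]
```

Hence a trinucleotide circular code has at most 20 words, one per class.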

Notation 2. A maximum trinucleotide circular code is noted .

Remark 12. A 20-trinucleotide circular code is both maximal and maximum.

We recall two classical genetic maps: complementary and circular permutation.

Definition 13. The complementary genetic map $C$: $A_4^+ \rightarrow A_4^+$ is defined by $C(A) = T$, $C(C) = G$, $C(G) = C$, $C(T) = A$ and for all $u, v \in A_4^+$ by $C(uv) = C(v)C(u)$.

Example 14. $C(ACG) = CGT$. This map is associated with the property of the complementary and antiparallel double helix (one DNA strand chemically oriented in a direction and the other DNA strand in the opposite direction).

Definition 15. The complementary map on a trinucleotide is naturally extended to a trinucleotide code $X$ as follows: $C(X) = \{ C(u) : u \in X \}$.

Definition 16. The circular permutation genetic map $P$: $A_4^3 \rightarrow A_4^3$ permutes circularly a trinucleotide $l_1 l_2 l_3$, $l_1, l_2, l_3 \in A_4$, as follows: $P(l_1 l_2 l_3) = l_2 l_3 l_1$.

Example 17. $P(ACG) = CGA$.

Definition 18. The circular permutation map on a trinucleotide is naturally extended to a trinucleotide code $X$ as follows: $P(X) = \{ P(u) : u \in X \}$.

Notation 3. The $k$th iterate of $P$ is denoted by $P^k$.

Remark 19. The trinucleotide codes $P(X)$ and $P^2(X)$ are the conjugated classes of the trinucleotide code $X$.

Definition 20. A trinucleotide circular code $X$ is self-complementary if, for each $x \in X$, $C(x) \in X$.

Notation 4. A self-complementary trinucleotide circular code is noted .

Remark 21. A $k$-trinucleotide circular code for odd $k$ cannot be self-complementary.

Definition 22. A trinucleotide circular code $X$ is $C^3$ if $X$, $P(X)$, and $P^2(X)$ are trinucleotide circular codes.

Notation 5. A trinucleotide circular code is noted .

Definition 23. A trinucleotide circular code $X$ is self-complementary maximum if $X$ is maximum, $X = C(X)$ (self-complementary), and $P(X)$ and $P^2(X)$ are trinucleotide circular codes satisfying $C(P(X)) = P^2(X)$.

Notation 6. A self-complementary maximum circular code is noted .

The set $X$ of trinucleotides identified in the gene populations of both eukaryotes and prokaryotes is a self-complementary maximum circular code [8]; that is, $X$ is maximum, $X = C(X)$, $P(X)$, and $P^2(X)$ are trinucleotide circular codes, and $C(P(X)) = P^2(X)$.
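The properties listed above can be checked mechanically. The sketch below implements the complementary map and the circular permutation map and tests the conditions of Definition 23 on the common code. Two assumptions are made: the list of 20 trinucleotides is the one usually reported for the common code in the circular code literature (it is not printed in the text above), and circularity is tested by a compact letter-chain search that assumes the necklace reading "$l_i d_i$ and $d_i l_{i+1}$ in the code". All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumed list of the 20 trinucleotides of the common code (not given in the text).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def comp(word):
    """Complementary map: complement each letter and reverse the word."""
    return "".join(COMPLEMENT[letter] for letter in reversed(word))


def perm(word, k=1):
    """k-th iterate of the circular permutation: l1 l2 l3 -> l2 l3 l1."""
    k %= len(word)
    return word[k:] + word[:k]


def comp_code(code):
    return frozenset(comp(w) for w in code)


def perm_code(code, k=1):
    return frozenset(perm(w, k) for w in code)


def is_circular(code):
    """Circular if no chain of 5 letters l1 -> ... -> l5 forms a necklace."""
    code = set(code)
    # l -> l' when some l.d and d.l' are both trinucleotides of the code.
    step = {(u[0], v[2]) for u in code for v in code if u[1:] == v[:2]}
    alive = {w[0] for w in code}  # letters that can carry the last pair l_n d_n
    for _ in range(4):
        alive = {a for a, b in step if b in alive}
    return not alive


def is_self_complementary_c3(code):
    p1, p2 = perm_code(code, 1), perm_code(code, 2)
    return (
        comp_code(code) == code
        and all(is_circular(c) for c in (code, p1, p2))
        and comp_code(p1) == p2
    )


print(len(X), is_self_complementary_c3(X))
```

The identity $C(P(X)) = P^2(X)$ follows from self-complementarity, since $C(P(x)) = P^2(C(x))$ for every trinucleotide $x$.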

We recall three classical evolution genetic maps: transition and transversions I and II, for example, [22] and extend their definitions to the positions of a trinucleotide.

Definition 24. The transition evolution genetic map : is defined by

Definition 25. The transition map on a letter can be applied in different positions of a trinucleotide : , , is the transition on the position of , , with , is the transition on the two positions and of , and is the transition on the three positions of .

Example 26. , , , , , , and .

Definition 27. The transition maps ,, on a trinucleotide are also extended to a trinucleotide code , in a similar way to the genetic maps and .

Definition 28. The transversion I evolution genetic map : is defined by

Definition 29. The transversion I map on a letter can also be applied in different positions of a trinucleotide : , , is the transversion I on the position of , , with , is the transversion I on the two positions and of , and is the transversion I on the three positions of .

Example 30. , , , , , , and .

Definition 31. The transversion I maps ,, on a trinucleotide are also extended to a trinucleotide code , in a similar way to the genetic maps and .

Definition 32. The transversion II evolution genetic map : is defined by

Definition 33. The transversion II map on a letter can also be applied in different positions of a trinucleotide : , , is the transversion II on the position of , , with , is the transversion II on the two positions and of , and is the transversion II on the three positions of .

Example 34. , , , , , , and .

Definition 35. The transversion II maps ,, on a trinucleotide are also extended to a trinucleotide code , in a similar way to the genetic maps and .

Definition 36. The evolution genetic maps in trinucleotides of a trinucleotide circular code are defined by for transition, for transversion I, and for transversion II.
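The position-wise maps of Definitions 24 to 36 can be sketched as follows. The letter substitutions are assumptions, because the defining equations are not legible in the text above: transition is taken as the usual A↔G, C↔T exchange, transversion I as A↔T, C↔G, and transversion II as A↔C, G↔T (a convention consistent with the partition argument used in the proof of Proposition 37). The names `mutate` and `evolved_codes` are hypothetical.

```python
from itertools import combinations

# Assumed letter substitutions (the defining equations are not legible in the text).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSION_I = {"A": "T", "T": "A", "C": "G", "G": "C"}
TRANSVERSION_II = {"A": "C", "C": "A", "G": "T", "T": "G"}


def mutate(word, mapping, positions):
    """Apply `mapping` to the given 1-based positions of a trinucleotide."""
    return "".join(
        mapping[letter] if i in positions else letter
        for i, letter in enumerate(word, start=1)
    )


def evolved_codes(code, mapping, positions, k):
    """Yield every code obtained by mutating exactly k trinucleotides of `code`."""
    words = sorted(code)
    for chosen in combinations(words, k):
        yield frozenset(
            mutate(w, mapping, positions) if w in chosen else w for w in words
        )


print(mutate("ACG", TRANSITION, {1}))             # GCG
print(mutate("ACG", TRANSVERSION_I, {2}))         # AGG
print(mutate("ACG", TRANSVERSION_II, {1, 2, 3}))  # CAT
```

For a code of 20 trinucleotides, `evolved_codes` yields one candidate code per choice of $k$ trinucleotides. A mutated trinucleotide may coincide with a word already present, so a resulting code can contain fewer than 20 words.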

3. Results

An evolution genetic map, that is, , , and , in trinucleotides of the common trinucleotide circular code leads to trinucleotide codes which are potentially circular. Table 1 gives these numbers .

Based on Proposition 6, which makes it possible to test whether a trinucleotide code is circular (algorithm not detailed; see, e.g., [2]), computer analyses of a large number of trinucleotide codes identify here new properties of the common trinucleotide circular code observed in genes under evolution by transition and transversion.

3.1. Transition Map
3.1.1. Transition Map

Result 1 (Table 2). For As expected, the lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transitions in the trinucleotides of the common trinucleotide circular code . Precisely, for and for The transition generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for

3.1.2. Transition Map

Result 2 (Table 3). For The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transitions in the trinucleotides of the common trinucleotide circular code . Precisely, for and for The transition generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for The numbers of circular codes have a particular growth function

3.1.3. Transition Map

Result 3 (Table 4). The transition always generates trinucleotide circular codes. Indeed, for The lists of trinucleotide circular codes associated with and are different for (not shown). The transition generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for

3.2. Transversion I Map
3.2.1. Transversion I Map

Result 4 (Table 5). For The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transversions I in the trinucleotides of the common trinucleotide circular code . Precisely, for and for The transversion I generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for A remarkable code property only found with transversion I is, for , and furthermore, after a detailed computer analysis, the lists of trinucleotide circular codes and associated with and , respectively, are identical for .

3.2.2. Transversion I Map

Result 5 (Table 6). For The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transversions I in the trinucleotides of the common trinucleotide circular code . Precisely, for and for The transversion I generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for The numbers of self-complementary circular codes have a particular growth function The numbers of circular codes have a particular growth function

3.2.3. Transversion I Map

Result 6 (Table 7). The transversion I always generates trinucleotide circular codes. Indeed, for The lists of trinucleotide circular codes associated with and are different for (not shown). The transversion I generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for

3.3. Transversion II Map
3.3.1. Transversion II Map

Result 7 (Table 8). For The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transversions II in the trinucleotides of the common trinucleotide circular code . Precisely, for and for The transversion II generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for

3.3.2. Transversion II Map

Result 8 (Table 9). For The lists of trinucleotide circular codes associated with and are different for (not shown). The distribution of trinucleotide codes which are not circular under transversions II in the trinucleotides of the common trinucleotide circular code is very unusual. Indeed, for and for The transversion II generates a maximum number of trinucleotide circular codes for and a maximum number of self-complementary maximum circular codes for The numbers of circular codes have a particular growth function

3.3.3. Transversion II Map

Proposition 37. For and obviously, by letter invariance, as in Tables 4 and 7.

Proof. The common trinucleotide circular code can be partitioned according to the maps , , and as shown in Table 10.
Let a partition , , composed of two trinucleotides . For , any transversion II of a trinucleotide generates a trinucleotide which is a permuted trinucleotide of the other trinucleotide . So, any transversion II of a trinucleotide leads to a trinucleotide code which is not circular. For , the proof needs a computer analysis of the necklace for the nontrivial cases when two transversions II occur with two trinucleotides in the same partitions.
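The first step of the proof can be illustrated numerically. The sketch below pairs each trinucleotide of the common code with the trinucleotide whose circular permutation equals its image under transversion II on the three positions, and then checks that a single such transversion always gives a noncircular code. It rests on assumptions not confirmed by the text: the list of 20 trinucleotides is the one usually reported for the common code, transversion II is taken as A↔C, G↔T, and circularity is tested by a compact letter-chain search. The nontrivial case of two transversions within the same class is not covered here.

```python
# Assumed convention for transversion II and assumed list of the common code.
TRANSVERSION_II = {"A": "C", "C": "A", "G": "T", "T": "G"}
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def perm(word, k):
    """k-th iterate of the circular permutation of a trinucleotide."""
    return word[k:] + word[:k]


def transversion_ii(word):
    """Transversion II applied to the three positions of a trinucleotide."""
    return "".join(TRANSVERSION_II[letter] for letter in word)


def partner(word, code):
    """Word y of `code` whose permutation P(y) or P^2(y) is the image of `word`."""
    image = transversion_ii(word)
    for y in code:
        if image in (perm(y, 1), perm(y, 2)):
            return y
    return None


def is_circular(code):
    """Circular if no chain of 5 letters l1 -> ... -> l5 forms a necklace."""
    code = set(code)
    step = {(u[0], v[2]) for u in code for v in code if u[1:] == v[:2]}
    alive = {w[0] for w in code}
    for _ in range(4):
        alive = {a for a, b in step if b in alive}
    return not alive


# Classes of two trinucleotides exchanged by transversion II up to permutation.
classes = {frozenset((x, partner(x, X))) for x in X}

# One transversion II: the image is a permutation of a word still in the code.
single_ok = all(
    not is_circular((X - {x}) | {transversion_ii(x)}) for x in X
)

for pair in sorted(sorted(c) for c in classes):
    print(pair)
print(len(classes), single_ok)
```

Under these assumptions the code splits into 10 classes of two trinucleotides, and replacing any single trinucleotide by its transversion II image puts a word and one of its circular permutations in the same code, which rules out circularity.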

Remark 38. Very surprisingly, for the three maps of transition, transversions I and II, , , and , , , , and , with and (not for and ), the numbers of self-complementary maximum circular codes for the first even values of follow a series of binomial coefficients. For , , and , , , , and , with , the numbers of maximum circular codes for the first even values of follow a series of binomial coefficients. For , the numbers of circular codes for the values and with follow a series of binomial coefficients. These binomial properties with some numbers of circular codes for the three maps of transition, transversions I and II have no combinatorial explanation so far.

4. Conclusion

A comprehensive computer analysis of transition and transversions I and II in the self-complementary maximum circular code $X$ shows some new results; in particular (i) transversion I on the 2nd position of any subset of trinucleotides of $X$ generates trinucleotide circular codes which are always $C^3$ and (ii) transversion II on the three positions of any subset of trinucleotides of $X$ yields no trinucleotide circular codes. In addition to the classical self-complementary (Definition 20) partition of $X$ known since 1996, a new partition of $X$ based on the transversion II map (Definition 33) and the circular permutation maps $P$ and $P^2$ (Definition 18) is also identified here. These results extend our theory of circular code in genes to its evolution under transition and transversion.