Computational Biology Journal

Volume 2013 (2013), Article ID 795418, 10 pages

http://dx.doi.org/10.1155/2013/795418

## Transition and Transversion on the Common Trinucleotide Circular Code

Equipe de Bioinformatique Théorique, ICube, Université de Strasbourg, CNRS, 300 boulevard Sébastien Brant, 67400 Illkirch, France

Received 15 February 2013; Accepted 22 April 2013

Academic Editor: Alessandra Lumini

Copyright © 2013 Emmanuel Benard and Christian J. Michel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In 1996, a trinucleotide circular code which is maximum, self-complementary, and $C^3$, called $X_0$, was identified statistically on a large gene population of eukaryotes and prokaryotes (Arquès and Michel (1996)). Transition and transversions I and II are classical molecular evolution processes. A comprehensive computer analysis of these three evolution processes in the code $X_0$ shows some new results; in particular (i) transversion I on the 2nd position of any subset of trinucleotides of $X_0$ generates trinucleotide circular codes which are always $C^3$ and (ii) transversion II on the three positions of any subset of trinucleotides of $X_0$ yields no trinucleotide circular codes. These new results extend our theory of circular code in genes to its evolution under transition and transversion.

#### 1. Introduction

We continue our study of properties of trinucleotide circular codes [1–5], trinucleotide comma-free codes [1, 6], strong trinucleotide circular codes [7], and the common trinucleotide circular code identified in genes [8] (see also the recent statistical analysis by [9]) which could be a translation code [10]. A trinucleotide is a word of three letters (triletter) on the genetic alphabet $\mathcal{A}_4 = \{A, C, G, T\}$. The set $\mathcal{A}_4^3$ of 64 trinucleotides is a code (called genetic code), more precisely a uniform code but not a circular code (see Remark 2). In the past years, codes, comma-free codes, and circular codes have been mathematical objects studied in theoretical biology, mainly to understand the structure and the origin of the genetic code as well as the reading frame (construction) of genes, for example, [11–13]. To give an intuitive meaning to these notions, codes are written on a straight line while comma-free codes and circular codes are written on a circle, but in both cases, unique decipherability is required. Circular codes only belong to some subsets of the trinucleotide set while comma-free codes are even more constrained subsets of circular codes [1].

Before the discovery of the genetic code, Crick et al. [11] proposed a maximum comma-free code of 20 trinucleotides for coding the 20 amino acids. This comma-free code turned out to be invalid (see, e.g., [14]). In 1996, a maximum circular code $X_0$ of 20 trinucleotides was identified statistically on a large gene population of eukaryotes and also on a large gene population of prokaryotes [8]. This code has remarkable mathematical properties as it is a $C^3$ self-complementary maximum circular code (see the following). Since 1996, its properties have been studied in detail by different authors, for example, [9, 15–21]. Transition and transversions I and II are classical molecular evolution processes, for example, [22]. By using an algorithm based on the necklace, we perform here a comprehensive computer analysis of these three evolution processes in the code $X_0$. Some new results are identified with the code $X_0$ by computer analysis; in particular (i) transversion I on the 2nd position of any subset of trinucleotides of $X_0$ generates trinucleotide circular codes which are always $C^3$ and (ii) transversion II on the three positions of any subset of trinucleotides of $X_0$ yields no trinucleotide circular codes.

#### 2. Preliminaries

The classical notions of language theory and codes can be found in [23, 24]. Let $\mathcal{A}_4 = \{A, C, G, T\}$ denote the genetic alphabet, lexicographically ordered by $A < C < G < T$. The set of words (nonempty words, resp.) on $\mathcal{A}_4$ is denoted by $\mathcal{A}_4^*$ ($\mathcal{A}_4^+$, resp.). The set of the 16 words of length 2 (dinucleotides or diletters) on $\mathcal{A}_4$ is denoted by $\mathcal{A}_4^2$. The set of the 64 words of length 3 (trinucleotides or triletters) on $\mathcal{A}_4$ is denoted by $\mathcal{A}_4^3$.

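*Definition 1.* A subset $X \subset \mathcal{A}_4^+$ is a code on $\mathcal{A}_4$ if for each $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, $n, m \geq 1$, the condition $x_1 \cdots x_n = x'_1 \cdots x'_m$ implies $n = m$ and $x_i = x'_i$ for $i = 1, \ldots, n$.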

*Remark 2.* $\mathcal{A}_4^3$ is a code.

Any nonempty subset of $\mathcal{A}_4^3$ is a code, called here a trinucleotide code.

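*Definition 3.* A trinucleotide code $X \subset \mathcal{A}_4^3$ is circular if, for each $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, $n, m \geq 1$, $r \in \mathcal{A}_4^*$, $s \in \mathcal{A}_4^+$, the conditions $s x_2 \cdots x_n r = x'_1 \cdots x'_m$ and $x_1 = rs$ imply $n = m$, $r = \varepsilon$ (empty word) and $x_i = x'_i$ for $i = 1, \ldots, n$.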

*Notation 1. *A trinucleotide circular code is noted .

*Remark 4.* $\mathcal{A}_4^3$ is not a trinucleotide circular code.

Let be letters in , diletters in , and an integer satisfying .

*Definition 5. *We say that the ordered sequence is an (Letter Diletter Continued Necklace) for a subset if

Only a few trinucleotide codes are circular. Two propositions based on the necklace concept make it possible to determine whether a trinucleotide code is circular [2, 18].

Proposition 6 (see [18]). * Let be a trinucleotide code. The following conditions are equivalent:*(i)*is a trinucleotide circular code*;(ii)*has no *.

*Definition 7. *We say that the ordered sequence is an (Letter Diletter Continued Closed Necklace) for a subset if

Proposition 8 (see [2]). * Let be a trinucleotide code. The following conditions are equivalent:*(i)*is a trinucleotide circular code*;(ii)*has no **for any integer *.
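The circularity test can be sketched in a few lines of Python. The sketch below does not implement the necklace propositions themselves; it swaps in a graph criterion from the later circular code literature, stated here as an assumption: each trinucleotide $l_1l_2l_3$ contributes the directed edges $l_1 \to l_2l_3$ and $l_1l_2 \to l_3$, and the code is taken to be circular exactly when this graph has no directed cycle. The function name `is_circular` is illustrative and not from the paper.

```python
from itertools import product


def is_circular(code):
    """Return True if the letter/diletter graph of `code` has no directed cycle.

    Assumption: acyclicity of this graph is used as the circularity criterion;
    it stands in for the necklace test of Propositions 6 and 8.
    """
    # Each trinucleotide l1 l2 l3 contributes the edges l1 -> l2l3 and l1l2 -> l3.
    edges = {}
    for w in code:
        edges.setdefault(w[0], set()).add(w[1:])
        edges.setdefault(w[:2], set()).add(w[2])

    ON_PATH, DONE = 1, 2
    state = {}  # vertex -> ON_PATH while on the current DFS path, DONE afterwards

    def has_cycle(v):
        state[v] = ON_PATH
        for nxt in edges.get(v, ()):
            if state.get(nxt) == ON_PATH:
                return True  # back edge: a directed cycle exists
            if nxt not in state and has_cycle(nxt):
                return True
        state[v] = DONE
        return False

    return not any(v not in state and has_cycle(v) for v in list(edges))


ALL_TRINUCLEOTIDES = {"".join(t) for t in product("ACGT", repeat=3)}

if __name__ == "__main__":
    print(is_circular({"ACG"}))             # a single non-periodic trinucleotide
    print(is_circular({"AAA"}))             # a periodic trinucleotide
    print(is_circular({"ACG", "CGA"}))      # a trinucleotide and its circular permutation
    print(is_circular(ALL_TRINUCLEOTIDES))  # the full set of 64 trinucleotides
```

The graph has at most 4 letter vertices and 16 diletter vertices, so the depth-first search is cheap enough to be repeated over many candidate codes.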

*Definition 9.* A trinucleotide circular code $X \subset \mathcal{A}_4^3$ is maximal if, for each $x \in \mathcal{A}_4^3$, $x \notin X$, $X \cup \{x\}$ is not a trinucleotide circular code.

*Definition 10.* A trinucleotide circular code containing exactly $k$ elements is called a $k$-trinucleotide circular code.

*Definition 11.* A 20-trinucleotide circular code is maximum as no trinucleotide circular code can contain more than 20 words.

*Notation 2. *A maximum trinucleotide circular code is noted .

*Remark 12.* A 20-trinucleotide circular code is both maximal and maximum.

We recall two classical genetic maps: complementary and circular permutation.

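*Definition 13.* The complementary genetic map $\mathcal{C}$: $\mathcal{A}_4^+ \to \mathcal{A}_4^+$ is defined by
$\mathcal{C}(A) = T$, $\mathcal{C}(C) = G$, $\mathcal{C}(G) = C$, $\mathcal{C}(T) = A$,
and for all $u, v \in \mathcal{A}_4^+$ by $\mathcal{C}(uv) = \mathcal{C}(v)\mathcal{C}(u)$.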

*Example 14.* $\mathcal{C}(ACG) = CGT$. This map is associated with the property of the complementary and antiparallel double helix (one DNA strand chemically oriented in a direction and the other DNA strand in the opposite direction).

*Definition 15.* The complementary map on a trinucleotide is naturally extended to a trinucleotide code $X$ as follows: $\mathcal{C}(X) = \{\mathcal{C}(x) : x \in X\}$.

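*Definition 16.* The circular permutation genetic map $\mathcal{P}$: $\mathcal{A}_4^3 \to \mathcal{A}_4^3$ permutes circularly a trinucleotide $l_1l_2l_3$, $l_1, l_2, l_3 \in \mathcal{A}_4$, as follows: $\mathcal{P}(l_1l_2l_3) = l_2l_3l_1$.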

*Example 17.* $\mathcal{P}(ACG) = CGA$.

*Definition 18.* The circular permutation map on a trinucleotide is naturally extended to a trinucleotide code $X$ as follows: $\mathcal{P}(X) = \{\mathcal{P}(x) : x \in X\}$.

*Notation 3.* The $k$th iterate of $\mathcal{P}$ is denoted by $\mathcal{P}^k$.

*Remark 19.* The trinucleotide codes $\mathcal{P}(X)$ and $\mathcal{P}^2(X)$ are the conjugated classes of the trinucleotide code $X$.

*Definition 20.* A trinucleotide circular code $X$ is self-complementary if, for each $x \in X$, $\mathcal{C}(x) \in X$.

*Notation 4. *A self-complementary trinucleotide circular code is noted .

*Remark 21.* A $k$-trinucleotide circular code for $k$ odd cannot be self-complementary.

*Definition 22.* A trinucleotide circular code $X$ is $C^3$ if $X$, $\mathcal{P}(X)$, and $\mathcal{P}^2(X)$ are trinucleotide circular codes.

*Notation 5. *A trinucleotide circular code is noted .

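*Definition 23.* A trinucleotide circular code $X$ is $C^3$ self-complementary maximum if $X$ is maximum, $X = \mathcal{C}(X)$ (self-complementary), and $\mathcal{P}(X)$ and $\mathcal{P}^2(X)$ are trinucleotide circular codes satisfying $\mathcal{C}(\mathcal{P}(X)) = \mathcal{P}^2(X)$.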

*Notation 6. *A self-complementary maximum circular code is noted .

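The set $X_0$ of 20 trinucleotides identified in the gene populations of both eukaryotes and prokaryotes is a $C^3$ self-complementary maximum circular code [8]; that is, $X_0$ is maximum, $X_0$, $\mathcal{P}(X_0)$, and $\mathcal{P}^2(X_0)$ are trinucleotide circular codes, and $\mathcal{C}(X_0) = X_0$ with $\mathcal{C}(\mathcal{P}(X_0)) = \mathcal{P}^2(X_0)$.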
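These properties can be checked mechanically. The sketch below is illustrative only: the list of 20 trinucleotides is the one commonly given for the code of [8] in the circular code literature (it is not reproduced in the text above, so treat it as an assumption), the name `X0` and all function names are hypothetical, and circularity is tested with a graph acyclicity criterion swapped in for the necklace test.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumed list of the 20 trinucleotides of the common circular code (see [8]).
X0 = frozenset("""AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC
                  GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC""".split())


def complement(word):
    """Complementary map: complement each letter and reverse the word."""
    return "".join(COMPLEMENT[l] for l in reversed(word))


def permute(word, k=1):
    """k-th iterate of the circular permutation l1 l2 l3 -> l2 l3 l1."""
    k %= len(word)
    return word[k:] + word[:k]


def complement_code(code):
    return frozenset(complement(w) for w in code)


def permute_code(code, k=1):
    return frozenset(permute(w, k) for w in code)


def is_circular(code):
    """Assumed criterion: the graph with edges l1 -> l2l3 and l1l2 -> l3 is acyclic."""
    edges = {}
    for w in code:
        edges.setdefault(w[0], set()).add(w[1:])
        edges.setdefault(w[:2], set()).add(w[2])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for nxt in edges.get(v, ()):
            if state.get(nxt) == 1 or (nxt not in state and has_cycle(nxt)):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(edges))


def is_c3(code):
    """The code and its two conjugated classes are all circular."""
    return all(is_circular(permute_code(code, k)) for k in range(3))


def is_self_complementary(code):
    return complement_code(code) == code


def is_maximal(code):
    """No trinucleotide outside the code can be added without losing circularity."""
    others = {"".join(t) for t in product("ACGT", repeat=3)} - set(code)
    return all(not is_circular(set(code) | {w}) for w in others)


if __name__ == "__main__":
    print(len(X0), is_self_complementary(X0), is_c3(X0), is_maximal(X0))
    print(complement_code(permute_code(X0, 1)) == permute_code(X0, 2))
```

Representing codes as `frozenset` objects keeps set equality and set difference direct, which is convenient when comparing $\mathcal{C}(\mathcal{P}(X_0))$ with $\mathcal{P}^2(X_0)$.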

We recall three classical evolution genetic maps: transition and transversions I and II, for example, [22] and extend their definitions to the positions of a trinucleotide.

*Definition 24.* The transition evolution genetic map on the alphabet $\mathcal{A}_4$ is defined by $A \mapsto G$, $C \mapsto T$, $G \mapsto A$, $T \mapsto C$.

*Definition 25. *The transition map on a letter can be applied in different positions of a trinucleotide : , , is the transition on the position of , , with , is the transition on the two positions and of , and is the transition on the three positions of .

*Example 26. *, , , , , , and .

*Definition 27. *The transition maps ,, on a trinucleotide are also extended to a trinucleotide code , in a similar way to the genetic maps and .

*Definition 28.* The transversion I evolution genetic map on the alphabet $\mathcal{A}_4$ is defined by $A \mapsto T$, $C \mapsto G$, $G \mapsto C$, $T \mapsto A$.

*Definition 29. *The transversion I map on a letter can also be applied in different positions of a trinucleotide : , , is the transversion I on the position of , , with , is the transversion I on the two positions and of , and is the transversion I on the three positions of .

*Example 30. *, , , , , , and .

*Definition 31. *The transversion I maps ,, on a trinucleotide are also extended to a trinucleotide code , in a similar way to the genetic maps and .

*Definition 32.* The transversion II evolution genetic map on the alphabet $\mathcal{A}_4$ is defined by $A \mapsto C$, $C \mapsto A$, $G \mapsto T$, $T \mapsto G$.

*Definition 33. *The transversion II map on a letter can also be applied in different positions of a trinucleotide : , , is the transversion II on the position of , , with , is the transversion II on the two positions and of , and is the transversion II on the three positions of .

*Example 34. *, , , , , , and .

*Definition 35. *The transversion II maps ,, on a trinucleotide are also extended to a trinucleotide code , in a similar way to the genetic maps and .

*Definition 36. *The evolution genetic maps in trinucleotides of a trinucleotide circular code are defined by for transition, for transversion I, and for transversion II.
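As an illustration of how the three maps act on chosen positions and on chosen trinucleotides of a code, here is a minimal Python sketch. It rests on several assumptions: the letter assignments are the usual ones (transition $A \leftrightarrow G$, $C \leftrightarrow T$; transversion I $A \leftrightarrow T$, $C \leftrightarrow G$; transversion II $A \leftrightarrow C$, $G \leftrightarrow T$), the 20 trinucleotides are the list commonly given for the code of [8], and the names `mutate`, `mutate_code`, and `evolved_codes` are hypothetical.

```python
from itertools import combinations
from math import comb

# Assumed letter maps for the three evolution processes.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSION_I = {"A": "T", "T": "A", "C": "G", "G": "C"}
TRANSVERSION_II = {"A": "C", "C": "A", "G": "T", "T": "G"}

# Assumed list of the 20 trinucleotides of the common circular code (see [8]).
X0 = frozenset("""AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC
                  GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC""".split())


def mutate(word, letter_map, positions):
    """Apply letter_map at the given 1-based positions of a trinucleotide."""
    return "".join(
        letter_map[l] if i in positions else l
        for i, l in enumerate(word, start=1)
    )


def mutate_code(code, letter_map, positions, subset):
    """Mutate only the trinucleotides of `code` that belong to `subset`."""
    return frozenset(
        mutate(w, letter_map, positions) if w in subset else w for w in code
    )


def evolved_codes(code, letter_map, positions, n):
    """Yield every code obtained by mutating exactly n trinucleotides of `code`."""
    for subset in combinations(sorted(code), n):
        yield mutate_code(code, letter_map, positions, set(subset))


if __name__ == "__main__":
    print(mutate("ACG", TRANSITION, {1}))
    print(mutate("ACG", TRANSVERSION_I, {2}))
    print(mutate("ACG", TRANSVERSION_II, {1, 3}))
    # Number of candidate codes for each number n of mutated trinucleotides.
    print([comb(len(X0), n) for n in range(len(X0) + 1)])
```

A mutated trinucleotide may coincide with one already in the code, so a resulting code can contain fewer than 20 distinct trinucleotides; the sketch keeps such codes as they are.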

#### 3. Results

An evolution genetic map, that is, transition, transversion I, or transversion II, in $n$ trinucleotides of the common trinucleotide circular code $X_0$ leads to $\binom{20}{n}$ trinucleotide codes which are potentially circular. Table 1 gives these numbers $\binom{20}{n}$.

Proposition 6 makes it possible to test whether a trinucleotide code is circular (algorithm not detailed; see, e.g., [2]). On this basis, computer analyses of a great number of trinucleotide codes identify new properties of the common trinucleotide circular code $X_0$ observed in genes under evolution by transition and transversion.

##### 3.1. Transition Map

###### 3.1.1. Transition Map

*Result 1 (Table 2). *For
As expected, the lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transitions in the trinucleotides of the common trinucleotide circular code . Precisely, for
and for
The transition generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for

###### 3.1.2. Transition Map

*Result 2 (Table 3). *For
The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transitions in the trinucleotides of the common trinucleotide circular code . Precisely, for
and for
The transition generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for
The numbers of circular codes have a particular growth function

###### 3.1.3. Transition Map

*Result 3 (Table 4). *The transition always generates trinucleotide circular codes. Indeed, for
The lists of trinucleotide circular codes associated with and are different for (not shown). The transition generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for

##### 3.2. Transversion I Map

###### 3.2.1. Transversion I Map

*Result 4 (Table 5). *For
The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transversions I in the trinucleotides of the common trinucleotide circular code . Precisely, for
and for
The transversion I generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for
A remarkable code property only found with transversion I is, for ,
and furthermore, after a detailed computer analysis, the lists of trinucleotide circular codes and associated with and , respectively, are identical for .
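The kind of enumeration behind this result can be sketched as follows for small numbers of mutated trinucleotides. This is not the authors' program: the transversion I letter map ($A \leftrightarrow T$, $C \leftrightarrow G$), the list of 20 trinucleotides, the graph acyclicity test used in place of the necklace test, and the name `count_codes` are all assumptions made for illustration.

```python
from itertools import combinations

TRANSVERSION_I = {"A": "T", "T": "A", "C": "G", "G": "C"}  # assumed letter map

# Assumed list of the 20 trinucleotides of the common circular code (see [8]).
X0 = frozenset("""AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC
                  GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC""".split())


def mutate(word, letter_map, positions):
    """Apply letter_map at the given 1-based positions of a trinucleotide."""
    return "".join(
        letter_map[l] if i in positions else l
        for i, l in enumerate(word, start=1)
    )


def permute_code(code, k):
    """k-th circular permutation applied to every trinucleotide of the code."""
    return frozenset(w[k:] + w[:k] for w in code)


def is_circular(code):
    """Assumed criterion: the graph with edges l1 -> l2l3 and l1l2 -> l3 is acyclic."""
    edges = {}
    for w in code:
        edges.setdefault(w[0], set()).add(w[1:])
        edges.setdefault(w[:2], set()).add(w[2])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for nxt in edges.get(v, ()):
            if state.get(nxt) == 1 or (nxt not in state and has_cycle(nxt)):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(edges))


def count_codes(code, letter_map, positions, n):
    """Return (total, circular, c3) over all choices of n mutated trinucleotides."""
    total = circular = c3 = 0
    for subset in combinations(sorted(code), n):
        chosen = set(subset)
        new_code = frozenset(
            mutate(w, letter_map, positions) if w in chosen else w for w in code
        )
        total += 1
        if is_circular(new_code):
            circular += 1
            # C3 additionally requires both conjugated classes to be circular.
            if all(is_circular(permute_code(new_code, k)) for k in (1, 2)):
                c3 += 1
    return total, circular, c3


if __name__ == "__main__":
    for n in range(4):
        print(n, count_codes(X0, TRANSVERSION_I, {2}, n))
```

Comparing the second and third numbers of each printed triple is one way to explore the property stated above for transversion I on the 2nd position; the full range of $n$ requires enumerating up to $2^{20}$ subsets and is better served by a compiled implementation.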

###### 3.2.2. Transversion I Map

*Result 5 (Table 6). * For
The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transversions I in the trinucleotides of the common trinucleotide circular code . Precisely, for
and for
The transversion I generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for
The numbers of self-complementary circular codes have a particular growth function
The numbers of circular codes have a particular growth function

###### 3.2.3. Transversion I Map

*Result 6 (Table 7). *The transversion I always generates trinucleotide circular codes. Indeed, for
The lists of trinucleotide circular codes associated with and are different for (not shown). The transversion I generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for

##### 3.3. Transversion II Map

###### 3.3.1. Transversion II Map

*Result 7 (Table 8). *For
The lists of trinucleotide circular codes associated with and are different for (not shown). No trinucleotide code is circular after a certain number of transversions II in the trinucleotides of the common trinucleotide circular code . Precisely, for
and for
The transversion II generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for

###### 3.3.2. Transversion II Map

*Result 8 (Table 9). *For
The lists of trinucleotide circular codes associated with and are different for (not shown). The distribution of trinucleotide codes which are not circular under transversions II in the trinucleotides of the common trinucleotide circular code is very unusual. Indeed, for
and for
The transversion II generates a maximum number of trinucleotide circular codes for
and a maximum number of self-complementary maximum circular codes for
The numbers of circular codes have a particular growth function

###### 3.3.3. Transversion II Map

Proposition 37. *For **
and obviously, by letter invariance, as in Tables 4 and 7. *

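*Proof.* The common trinucleotide circular code $X_0$ can be partitioned according to the transversion II map on the three positions and the circular permutation maps $\mathcal{P}$ and $\mathcal{P}^2$, as shown in Table 10.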

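Let $\{x_1, x_2\}$ be a class of this partition, composed of two trinucleotides $x_1, x_2 \in X_0$. For a single mutated trinucleotide ($n = 1$), any transversion II of a trinucleotide $x_1$ generates a trinucleotide which is a permuted trinucleotide of the other trinucleotide $x_2$. So, any transversion II of a single trinucleotide leads to a trinucleotide code which is not circular, as a circular code cannot contain a trinucleotide together with one of its circular permutations. For two or more mutated trinucleotides ($n \geq 2$), the proof needs a computer analysis of the necklace for the nontrivial cases when two transversions II occur with two trinucleotides in the same class of the partition.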
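The pairing used in this proof can be reproduced with a short Python sketch. It is a sketch under stated assumptions, not the authors' computer analysis: transversion II is taken as $A \leftrightarrow C$, $G \leftrightarrow T$, the 20 trinucleotides are the list commonly given for the code of [8], circularity is tested with a graph acyclicity criterion rather than the necklace, and the names `partner` and `single_mutation_codes` are hypothetical.

```python
TRANSVERSION_II = {"A": "C", "C": "A", "G": "T", "T": "G"}  # assumed letter map

# Assumed list of the 20 trinucleotides of the common circular code (see [8]).
X0 = frozenset("""AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC
                  GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC""".split())


def transversion_ii(word):
    """Transversion II applied on the three positions of a trinucleotide."""
    return "".join(TRANSVERSION_II[l] for l in word)


def conjugates(word):
    """The two nontrivial circular permutations of a trinucleotide."""
    return {word[1:] + word[:1], word[2:] + word[:2]}


def partner(word, code):
    """Trinucleotides of `code` having the mutated word among their permutations."""
    image = transversion_ii(word)
    return [y for y in code if image in conjugates(y)]


def is_circular(code):
    """Assumed criterion: the graph with edges l1 -> l2l3 and l1l2 -> l3 is acyclic."""
    edges = {}
    for w in code:
        edges.setdefault(w[0], set()).add(w[1:])
        edges.setdefault(w[:2], set()).add(w[2])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for nxt in edges.get(v, ()):
            if state.get(nxt) == 1 or (nxt not in state and has_cycle(nxt)):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(edges))


def single_mutation_codes(code):
    """Codes obtained by a transversion II on one trinucleotide (n = 1)."""
    for x in sorted(code):
        yield (code - {x}) | {transversion_ii(x)}


# Classes {x1, x2} in which the mutation of x1 is a circular permutation of x2.
classes = {frozenset([x] + partner(x, X0)) for x in X0}

if __name__ == "__main__":
    for cls in sorted(sorted(c) for c in classes):
        print(cls)
    print(any(is_circular(c) for c in single_mutation_codes(X0)))
    # Mutating all 20 trinucleotides only renames the letters.
    print(is_circular(frozenset(transversion_ii(x) for x in X0)))
```

The sketch covers only the cases of one mutated trinucleotide and of all 20 mutated trinucleotides; the intermediate cases need the exhaustive enumeration mentioned in the proof.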

*Remark 38. *Very surprisingly, for the three maps of transition, transversions I and II, , , and , , , , and , with and (not for and ), the numbers of self-complementary maximum circular codes for the first even values of follow a series of binomial coefficients. For , , and , , , , and , with , the numbers of maximum circular codes for the first even values of follow a series of binomial coefficients. For , the numbers of circular codes for the values and with follow a series of binomial coefficients. These binomial properties with some numbers of circular codes for the three maps of transition, transversions I and II have no combinatorial explanation so far.

#### 4. Conclusion

A comprehensive computer analysis of transition and transversions I and II in the $C^3$ self-complementary maximum circular code $X_0$ shows some new results; in particular (i) transversion I on the 2nd position of any subset of trinucleotides of $X_0$ generates trinucleotide circular codes which are always $C^3$ and (ii) transversion II on the three positions of any subset of trinucleotides of $X_0$ yields no trinucleotide circular codes. In addition to the classical self-complementary (Definition 20) partition of $X_0$ known since 1996, a new partition of $X_0$ based on the transversion II map (Definition 33) and the circular permutation maps $\mathcal{P}$ and $\mathcal{P}^2$ (Definition 18) is also identified here. These results extend our theory of circular code in genes to its evolution under transition and transversion.

#### References

1. C. J. Michel, G. Pirillo, and M. A. Pirillo, “A relation between trinucleotide comma-free codes and trinucleotide circular codes,” *Theoretical Computer Science*, vol. 401, no. 1–3, pp. 17–26, 2008.
2. C. J. Michel and G. Pirillo, “Identification of all trinucleotide circular codes,” *Computational Biology and Chemistry*, vol. 34, no. 2, pp. 122–125, 2010.
3. L. Bussoli, C. J. Michel, and G. Pirillo, “On some forbidden configurations for self-complementary trinucleotide circular codes,” *Journal for Algebra and Number Theory Academia*, vol. 2, pp. 223–232, 2011.
4. L. Bussoli, C. J. Michel, and G. Pirillo, “On conjugation partitions of sets of trinucleotides,” *Applied Mathematics*, vol. 3, pp. 107–112, 2012.
5. C. J. Michel, G. Pirillo, and M. A. Pirillo, “A classification of 20-trinucleotide circular codes,” *Information and Computation*, vol. 212, pp. 55–63, 2012.
6. C. J. Michel, G. Pirillo, and M. A. Pirillo, “Varieties of comma-free codes,” *Computers and Mathematics with Applications*, vol. 55, no. 5, pp. 989–996, 2008.
7. C. J. Michel and G. Pirillo, “Strong trinucleotide circular codes,” *International Journal of Combinatorics*, vol. 2011, Article ID 659567, 14 pages, 2011.
8. D. G. Arquès and C. J. Michel, “A complementary circular code in the protein coding genes,” *Journal of Theoretical Biology*, vol. 182, no. 1, pp. 45–58, 1996.
9. D. L. Gonzalez, S. Giannerini, and R. Rosa, “Circular codes revisited: a statistical approach,” *Journal of Theoretical Biology*, vol. 275, pp. 21–28, 2011.
10. C. J. Michel, “Circular code motifs in transfer and 16S ribosomal RNAs: a possible translation code in genes,” *Computational Biology and Chemistry*, vol. 37, pp. 24–37, 2012.
11. F. H. C. Crick, J. S. Griffith, and L. E. Orgel, “Codes without commas,” *Proceedings of the National Academy of Sciences*, vol. 43, pp. 416–421, 1957.
12. S. W. Golomb, B. Gordon, and L. R. Welch, “Comma-free codes,” *Canadian Journal of Mathematics*, vol. 10, pp. 202–209, 1958.
13. S. W. Golomb, L. R. Welch, and M. Delbrück, “Construction and properties of comma-free codes,” in *Biologiske Meddelelser*, vol. 23, Det Kongelige Danske Videnskabernes Selskab, 1958.
14. B. Hayes, “The Invention of the Genetic Code,” *American Scientist*, vol. 86, no. 1, pp. 8–14, 1998.
15. A. J. Koch and J. Lehmann, “About a symmetry of the genetic code,” *Journal of Theoretical Biology*, vol. 189, no. 2, pp. 171–174, 1997.
16. R. Jolivet and F. Rothen, “Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code,” in *Proceedings of the 1st European Workshop Exo-/Astrobiology*, P. Ehrenfreund, O. Angerer, and B. Battrick, Eds., ESA SP-496, pp. 173–176, May 2001.
17. C. Nikolaou and Y. Almirantis, “Mutually symmetric and complementary triplets: differences in their use distinguish systematically between coding and non-coding genomic sequences,” *Journal of Theoretical Biology*, vol. 223, no. 4, pp. 477–487, 2003.
18. G. Pirillo, “A characterization for a set of trinucleotides to be a circular code,” in *Determinism, Holism, and Complexity*, C. Pellegrini, P. Cerrai, P. Freguglia, V. Benci, and G. Israel, Eds., Kluwer, 2003.
19. G. Pirillo and M. A. Pirillo, “Growth function of self-complementary circular codes,” *Rivista di Biologia*, vol. 98, no. 1, pp. 97–110, 2005.
20. J. L. Lassez, R. A. Rossi, and A. E. Bernal, “Crick's hypothesis revisited: the existence of a universal coding frame,” in *Proceedings of the IEEE 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW '07)*, May 2007.
21. G. Pirillo, “A hierarchy for circular codes,” *RAIRO: Theoretical Informatics and Applications*, vol. 42, no. 4, pp. 717–728, 2008.
22. M. Kimura, “Estimation of evolutionary distances between homologous nucleotide sequences,” *Proceedings of the National Academy of Sciences of the United States of America*, vol. 78, no. 1, pp. 454–458, 1981.
23. J. Berstel and D. Perrin, *Theory of Codes*, Academic Press, London, UK, 1985.
24. J. L. Lassez, “Circular codes and synchronization,” *International Journal of Computer and Information Sciences*, vol. 5, no. 2, pp. 201–208, 1976.