International Scholarly Research Notices

Volume 2014 (2014), Article ID 350276, 6 pages

http://dx.doi.org/10.1155/2014/350276

## Analysis of Unweighted Amino Acids Network

Department of Mathematics, Dibrugarh University, Dibrugarh, Assam 786004, India

Received 29 August 2014; Revised 26 November 2014; Accepted 27 November 2014; Published 16 December 2014

Academic Editor: Giovanni Colonna

Copyright © 2014 Adil Akhtar and Tazid Ali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The analysis of the amino acid network is very important for studying the various physicochemical properties of amino acids. In this paper we consider the amino acid network based on mutation of the codons. To analyze the relative importance of the amino acids we discuss different measures of centrality. Centrality measures are a powerful tool of graph theory for ranking vertices and analyzing biological networks. We also investigate the correlation coefficients between the various measures of centrality, and we discuss the clustering coefficient as well as the average clustering coefficient of the network. Finally we discuss the degree distribution as well as its skewness.

#### 1. Introduction

Amino acids are the building blocks of proteins. Each protein is formed by a linear chain of amino acids. Twenty different amino acids are known to occur in proteins. Each amino acid is encoded by a triplet of bases drawn from four possible bases. A sequence of three bases forms a unit called a codon, and a codon specifies one amino acid. The genetic code is a series of codons that specify which amino acids are required to make up a specific protein. As there are four bases, Adenine (A), Cytosine (C), Guanine (G), and Thymine (T), which is replaced by Uracil (U) in RNA, there are $4^3 = 64$ codons. Out of these 64, the three triplets UAA, UAG, and UGA are known as stop codons or nonsense codons, and their role is to stop protein biosynthesis. The codon AUG codes for the initiation of the translation process and is therefore also known as the start codon. A codon can be changed in several ways; such a change is known as a mutation. There are various types of mutation, such as substitution, insertion, deletion, and frameshift. In this paper we consider one-point mutations over all possible bases. To discuss the relative importance or significance of amino acids we investigate four centrality measures in the amino acid network. The compatibility relation of the graph is defined based on mutation of the codons. For example, the amino acid M (Methionine) is connected with K (Lysine), T (Threonine), R (Arginine), I (Isoleucine), V (Valine), and L (Leucine), because the one-point mutations of the codon AUG (M) yield codons for the amino acids K, T, R, I, V, and L.

Many researchers have contributed to this field. Kundu [1] showed that the hydrophobic and hydrophilic networks within proteins satisfy the “small-world property” and that the hydrophobic network has a larger average node degree than the hydrophilic network. In 2007 Aftabuddin and Kundu [2] discussed three types of networks within proteins and gave some insight into all three. Jiao et al. [3] discussed the weighted amino acid network based on contact energy and showed that it satisfies the “small-world” property. Fell and Wagner [4] examined whether metabolites with the highest degree may belong to the oldest part of the metabolism. Wuchty and Stadler [5] discussed various centrality measures in biological networks and concluded that the degree centrality of a vertex alone is not sufficient to distinguish lethal proteins from viable ones. Newman [6] discussed the correlation of degree centrality and betweenness centrality. Schreiber and Koschutzki [7] compared centralities for biological networks, namely, the PPI network and the transcriptional network; their study indicated that various centrality measures should be considered in the analysis of biological networks.

This paper is organized as follows. In Section 2 we define some preliminary concepts of the graphs on which we operate and briefly review the various centrality measures. In Section 3 we define the graph of amino acids based on mutation, discuss various centrality measures, and examine the bivariate correlation between them. In Section 4 we discuss some network parameters. In Section 5 we give the conclusion of this paper.

#### 2. Preliminary Concepts of Graph

An undirected graph $G = (V, E)$ consists of a finite set $V$ of vertices and a finite set $E$ of edges. If an edge $e$ connects two vertices $u$ and $v$, then $u$ and $v$ are said to be incident with the edge $e$ and adjacent to each other. The set of all vertices which are adjacent to $u$ is called the neighborhood $N(u)$ of $u$. The complete graph is a graph in which every pair of distinct vertices is connected by an edge. A directed graph or digraph $G = (V, E)$ consists of a set $V$ of vertices and a set $E \subseteq V \times V$ of edges, each of which has a direction. A graph is called loop-free if no edge connects a vertex to itself. The adjacency matrix $A = (a_{ij})$ of a graph with $n$ vertices $v_1, \ldots, v_n$ is an $(n \times n)$ matrix, where $a_{ij} = 1$ if and only if $(v_i, v_j) \in E$ and $a_{ij} = 0$ otherwise. The adjacency matrix of any undirected graph is symmetric. The degree of a vertex $v$ is defined to be the number of edges having $v$ as an end point. A walk is defined as a finite alternating sequence of vertices and edges, beginning and ending with vertices, such that each edge is incident with the vertices preceding and following it. No edge appears more than once in a walk; a vertex, however, may appear more than once. The first and last vertices of a walk are called its initial and terminal vertices. A walk is closed if its initial and terminal vertices are the same, and open otherwise. A trail is a walk without repeated edges (so, under the convention used here, every walk is a trail), and a path is a walk without repeated vertices. A shortest or geodesic path between two vertices $u$, $v$ is a path of minimal length. A graph is connected if there exists a walk between every pair of its vertices.
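As a minimal sketch of the definitions above, the following Python fragment builds the adjacency matrix of a small undirected, loop-free graph and reads off neighborhoods and the symmetry property. The four-vertex graph and all names (`vertices`, `edges`, `neighborhood`) are hypothetical illustrations, not data from the paper.

```python
# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)

# Adjacency matrix: A[i][j] = 1 if and only if vertices i and j share an edge.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[index[u]][index[v]] = 1
    A[index[v]][index[u]] = 1  # undirected graph: mirror every entry


def neighborhood(v):
    """Return the set of vertices adjacent to v, read from row v of A."""
    return {w for w in vertices if A[index[v]][index[w]] == 1}


# The adjacency matrix of an undirected graph is symmetric.
is_symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

# Loop-free: no vertex is adjacent to itself, so the diagonal is zero.
is_loop_free = all(A[i][i] == 0 for i in range(n))
```

Storing the graph as a dense matrix is only one possible choice; for sparse graphs an adjacency list is usually more economical.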

##### 2.1. Centrality in Graph

In graph theory, the centrality measure of a vertex represents its relative importance within the graph. A centrality is a real-valued function on the nodes of a graph. More formally, a centrality is a function $C: V \to \mathbb{R}$ which assigns every vertex $v \in V$ of a given graph $G$ a value $C(v) \in \mathbb{R}$. In the following we discuss the four most commonly used centrality measures.

###### 2.1.1. Degree of Centrality

The simplest centrality measure is the degree of centrality, $C_D(v)$. It is defined as the number of nodes to which the node $v$ is directly connected. The nodes directly connected to a given node are also called first neighbors of the given node. Degree centrality reflects the idea that an important node is involved in a large number of interactions, and so it captures the immediate importance or risk of the node in the corresponding network. Mathematically it is defined as $C_D(v) = \deg(v)$, where $\deg(v)$ is the degree of the vertex $v$. However, in real-world problems the degree of centrality alone is not always an adequate measure of the importance or risk of a node, because an important node may be connected only indirectly with other nodes.
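The degree of centrality can be sketched in a few lines of Python. The adjacency-list graph below is a hypothetical toy example, and the function name `degree_centrality` is chosen here for illustration only.

```python
# Hypothetical toy graph as an adjacency list (not data from the paper).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def degree_centrality(adj):
    """Return, for every node, the number of its first neighbors."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


scores = degree_centrality(adj)

# Rank nodes from most to least central; ties are broken alphabetically.
ranking = sorted(scores, key=lambda v: (-scores[v], v))
```

Here `c` ranks first because it has three first neighbors, while the pendant node `d` ranks last.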

###### 2.1.2. Eigenvector Centrality

Another important measure of centrality is eigenvector centrality [8]. An eigenvalue of a square matrix $A$ is a value $\lambda$ for which $\det(A - \lambda I) = 0$, where $I$ is the identity matrix of the same order as $A$. Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix of the corresponding graph.

In matrix-vector notation we can write $A x = \lambda x$, where $A$ is the adjacency matrix of the graph, $\lambda$ is a constant (the eigenvalue), and $x$ is the eigenvector. In general, there will be several different eigenvalues $\lambda$ for which an eigenvector solution exists. However, the eigenvector corresponding to the greatest eigenvalue gives the eigenvector centrality [8]. Eigenvector centrality captures the direct as well as the indirect importance of a node in a network.
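One common way to approximate the principal eigenvector is power iteration; the paper does not state which numerical method was used, so the sketch below is an assumption made for illustration. It assumes a connected, non-bipartite graph with at least one edge (otherwise the iteration may fail to converge), and the toy graph and function name are hypothetical.

```python
# Hypothetical toy graph: connected and non-bipartite (it contains a triangle).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the principal eigenvector of the adjacency matrix.

    Returns (x, lam), where x maps each node to its score (scaled so the
    largest score is 1) and lam estimates the greatest eigenvalue.
    """
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iterations):
        # One multiplication by A: each node sums the scores of its neighbors.
        new = {v: sum(x[u] for u in adj[v]) for v in nodes}
        lam = max(new.values())  # scaling factor, converges to the eigenvalue
        new = {v: value / lam for v, value in new.items()}
        change = max(abs(new[v] - x[v]) for v in nodes)
        x = new
        if change < tol:
            break
    return x, lam


x, lam = eigenvector_centrality(adj)
```

At convergence the returned vector satisfies $A x \approx \lambda x$, so a node scores highly when its neighbors score highly, which is how indirect importance enters the measure.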

###### 2.1.3. Closeness Centrality

Closeness centrality captures how close a vertex is to all other vertices, not only to its first neighbors but on a global scale. Generally, if a vertex is central, then it is close to all other vertices, and a vertex that is close to the other vertices can interact with them quickly. In general, closeness centrality is defined as the inverse of the sum of the shortest path distances between a node and every other node in the network [9]. A node with high closeness centrality is an important node that can easily reach or communicate with the other nodes of the network. Mathematically it is defined as $C_C(v_i) = 1 / \sum_{j=1}^{n} d(v_i, v_j)$, where $n$ is the number of vertices of the network and $d(v_i, v_j)$ is the shortest path distance between the pair of vertices $v_i$ and $v_j$. From this definition it is clear that the node with the minimum cumulative shortest path distance has the maximum closeness centrality, and a node with maximum closeness centrality is very well connected to all other nodes.
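The definition above, the inverse of the sum of shortest path distances, can be sketched with a breadth-first search on an unweighted graph. The sketch assumes a connected graph; the toy graph and function names are hypothetical and not taken from the paper.

```python
from collections import deque

# Hypothetical toy graph (connected, unweighted).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def shortest_path_lengths(adj, source):
    """Breadth-first search: shortest path distance from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Inverse of the summed shortest path distances to all other nodes."""
    scores = {}
    for v in adj:
        total = sum(shortest_path_lengths(adj, v).values())
        # An isolated node has total distance 0; score it 0 by convention.
        scores[v] = 1.0 / total if total > 0 else 0.0
    return scores


closeness = closeness_centrality(adj)
```

Some authors multiply this value by $n - 1$ to normalize it; that rescaling changes the values but not the ranking of the nodes.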

###### 2.1.4. Betweenness Centrality

Another well-known centrality measure is the betweenness centrality [9]. Interactions between two nonadjacent nodes depend on the other nodes, in particular on those lying on the paths between the two. The betweenness centrality of a node $v$ measures the number of shortest paths going through $v$. Mathematically it is defined as $C_B(v) = \sum_{s \neq v \neq t} \sigma_{st}(v) / \sigma_{st}$, where $\sigma_{st}$ is the number of shortest paths from vertex $s$ to $t$ and $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ that pass through $v$. Betweenness centrality identifies the nodes that carry most of the information flow of the network. An important node will lie on a large number of paths between other nodes in the network, and from such a node we can control the flow of information in the network. Without these nodes, some pairs of nodes would have no way to communicate with each other. In general a high-degree node has high betweenness centrality, because many of the shortest paths may pass through that node. However, a node with high betweenness centrality need not always be a high-degree node.
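The betweenness definition can be sketched directly by counting shortest paths with breadth-first search. This is a straightforward pairwise computation rather than an optimized algorithm, and it counts each unordered pair of endpoints once, which is an assumption about the convention. The toy graph and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy graph (unweighted, undirected).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def bfs_counts(adj, source):
    """Return (dist, sigma): distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to w.
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Sum over unordered pairs {s, t} of the fraction of shortest s-t paths through v."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = {v: 0.0 for v in nodes}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path iff the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


betweenness = betweenness_centrality(adj)
```

In this toy graph only `c` lies between other nodes: it is on the single shortest path from `a` to `d` and from `b` to `d`.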

#### 3. Graph of Amino Acids

Every codon, other than the stop codons, codes for a unique amino acid. A one-point mutation of a codon may or may not change the corresponding coded amino acid. The one-point mutations of a codon give nine further codons (three positions, each with three alternative bases). Several of these nine codons may code for the same amino acid, which need not be the original one. In some sense the nine mutants can be termed near or close to the original codon. In the language of topology these codons can be said to form a vicinity of the original codon; in other words, they are related to the original one. Since any mutation has its reverse mutation, this relation is bidirectional. This nearness relation or affinity is naturally carried over to the amino acids. Thus on the amino acids we have a binary relation which generates an undirected graph. In our amino acid graph the vertex set is the set of amino acids, and two amino acids $a$ and $b$ are connected by an edge if a one-point mutation of a codon coding for $a$ yields a codon coding for $b$. Two amino acids connected by an edge can therefore be interpreted as having affinity towards each other, in the sense that one may evolve from the other. Thus the amino acid graph gives a picture of the evolution of the amino acids. We will call it the evolutionary graph of amino acids. The corresponding graph is depicted in Figure 1.
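The construction described above can be sketched in Python from the standard genetic code. The sketch makes two assumptions that are not spelled out in the text: stop codons are excluded from the vertex set, and synonymous mutations (which leave the amino acid unchanged) do not create self-loops. All names (`CODON_TABLE`, `one_point_mutants`, `build_amino_acid_graph`) are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, one letter per codon; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def one_point_mutants(codon):
    """Yield the nine codons that differ from codon in exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def build_amino_acid_graph(table):
    """Link two amino acids if a one-point mutation turns a codon of one into a codon of the other."""
    graph = {aa: set() for aa in set(table.values()) if aa != "*"}
    for codon, aa in table.items():
        if aa == "*":
            continue  # assumption: stop codons are not vertices
        for mutant in one_point_mutants(codon):
            other = table[mutant]
            if other != "*" and other != aa:  # assumption: no self-loops
                graph[aa].add(other)
                graph[other].add(aa)  # the relation is bidirectional
    return graph


graph = build_amino_acid_graph(CODON_TABLE)
m_neighbors = sorted(graph["M"])
```

For the codon AUG this reproduces the neighbors of Methionine given in the Introduction, namely I, K, L, R, T, and V.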