BioMed Research International

Volume 2015, Article ID 124537, 10 pages

http://dx.doi.org/10.1155/2015/124537

## Gene Knockout Identification Using an Extension of Bees Hill Flux Balance Analysis

^{1}Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

^{2}Department of Electronics, Information and Communication Engineering, Osaka Institute of Technology, Osaka 535-8585, Japan

^{3}Biomedical Research Institute of Salamanca/BISITE Research Group, University of Salamanca, 37008 Salamanca, Spain

Received 21 August 2014; Revised 22 October 2014; Accepted 31 October 2014

Academic Editor: Juan F. De Paz

Copyright © 2015 Yee Wen Choon et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Microbial strain optimisation for the overproduction of a desired phenotype has been a popular topic in recent years. Gene knockout is a genetic engineering technique that can modify the metabolism of microbial cells to obtain desirable phenotypes. Optimisation algorithms have been developed to identify the effects of gene knockout. However, the complexities of metabolic networks have made the process of identifying the effects of genetic modification on desirable phenotypes challenging. Furthermore, a vast number of reactions in cellular metabolism often lead to a combinatorial problem in obtaining optimal gene knockout. The computational time increases exponentially as the size of the problem increases. This work reports an extension of Bees Hill Flux Balance Analysis (BHFBA) to identify optimal gene knockouts to maximise the production yield of desired phenotypes while sustaining the growth rate. This proposed method functions by integrating OptKnock into BHFBA for validating the results automatically. The results show that the extension of BHFBA is suitable, reliable, and applicable in predicting gene knockout. Through several experiments conducted on *Escherichia coli, Bacillus subtilis*, and *Clostridium thermocellum* as model organisms, extension of BHFBA has shown better performance in terms of computational time, stability, growth rate, and production yield of desired phenotypes.

#### 1. Introduction

The rapid development of genetic manipulation techniques has made the alteration of microorganisms for different purposes popular in recent years. Genetic manipulation of microorganisms aims to increase the yields of biocompounds or decrease the production of by-products [1]. The process of developing computational models to simulate the actual processes inside cells is growing rapidly because the models are of central importance to the investigation of general biological functions and applications in the area of biomedicine and biotechnology [2]. In nature, microorganisms evolve by optimising their growth rather than by overproducing specific chemical compounds due to metabolic responses to the history of selective pressures. Hence, retrofitting cellular metabolism is essential to economically developing high-yield cellular production systems. However, data ambiguity due to the complexities of the metabolic networks makes the effects of genetic modification on the desirable phenotypes difficult to predict. Furthermore, the huge number of reactions performed in the course of cellular metabolism often leads to a combinatorial problem in obtaining optimal gene knockout due to the large solution space [3]. The computational time increases exponentially as the size of the problem increases. As mentioned by de Paz et al., the use of computational methods is essential. One of the possible applications is in the use of Artificial Intelligence techniques [4]. In recent years, rational design principles based on genetic engineering have been implemented to retrofit microbial metabolism, a process that is widely known as metabolic engineering. In metabolic engineering, the main objective is to increase target metabolite production through genetic engineering. Gene knockout is one of the most common genetic engineering techniques in which one of an organism’s genes is made inoperative. 
To date, this technology has been successfully applied in many organisms, from unicellular eukaryotes to mammals, including human cells.

Computational algorithms have been developed to identify the gene knockout to obtain improved phenotypes. Burgard et al. developed the first rational modelling framework (known as OptKnock) for introducing a gene knockout, leading to the overproduction of a desired metabolite [5]. OptKnock functions by identifying a set of gene (reaction) deletions to maximise the flux of a desired metabolite without affecting the operation of the internal flux distribution so that growth or another objective function is optimised.

OptKnock uses mixed integer linear programming (MILP) to formulate a bilevel linear optimisation that is a promising method of finding the global optimal solution. OptGene is an extended approach of OptKnock, which formulates the in silico design problem using a Genetic Algorithm (GA) [6]. Metaheuristic methods are capable of producing near-optimal solutions with reasonable computation time. Furthermore, the objective function that can be optimised is flexible. OptGene is developed in two representation schemes: binary or integer. The binary representation is more complex and produces solutions with a larger number of knockouts even though it is closer to the natural evolution of microbial genomes. Although the integer representation results in a more compact genome, it still encounters problems as it needs to define the number of gene knockouts a priori [7]. Hence, Rocha et al. proposed two optimisation algorithms, Simulated Annealing (SA), and Set-based Evolutionary Algorithms (SEAs), to allow the automatic determination of the best number of gene deletions to achieve a given productivity goal. Still, these methods do not guarantee to reach optimal solutions due to their stochastic nature [8]. The computational algorithms discussed in this paper are based on constraint-based models. According to Egen and Lun, to date, more than 50 organism-specific genome-scale models have been developed and used in various applications, and it is believed that constraint-based models can produce more accurate predictions [9].

A hybrid of BA and FBA (BAFBA) was proposed by Choon et al. [10]. BAFBA showed better performance in predicting optimal gene knockout in terms of growth rate and production yield. The concept of BAFBA is based on the Bees Algorithm (BA) introduced by Pham et al. [11]. BA is a typical metaheuristic optimisation approach that has been applied to various problems, such as controller formation [12], image analysis [13], and job multiobjective optimisation [14]. BA is based on the intelligent behaviour of honeybees: it locates the most promising solutions and selectively explores their neighbourhoods in search of the global maximum of the objective function. According to previous studies, BA is efficient in solving optimisation problems. Nevertheless, BA is relatively weak in local search because it depends on random search [15]. BHFBA, a hybrid of Hill climbing and the neighbourhood searching strategy of BAFBA, was proposed to improve the performance of BAFBA, since Hill climbing is a promising algorithm for finding local optima [16]. In this paper, we propose an extension of BHFBA that integrates OptKnock into BHFBA to validate the results automatically. This paper shows that the extension of BHFBA is not only capable of solving large problems in a short computational time but also improves the performance in predicting optimal gene knockout. We also present the results obtained by the extension of BHFBA in four case studies, with *E. coli* (*Escherichia coli*) *i*JR904, *B. subtilis* (*Bacillus subtilis*), and *C. thermocellum* (*Clostridium thermocellum*) as the target microorganisms. In addition, we conducted a benchmark to test the performance of the hybrid of the Bees Algorithm and the Hill climbing algorithm.

This paper is organised as follows. First, the materials and experimental setup are described. Then, the problem formulation is introduced, and the details of BAFBA and the extension of BHFBA are described. Next, the experimental results are presented and discussed, reviewing the contributions of this work. Finally, the paper closes with the main conclusions and directions for future development.

#### 2. Materials and Methods

##### 2.1. Materials

In this study, we used *E. coli*, *B. subtilis*, and *C. thermocellum* models to test the operation of the extension of BHFBA. *E. coli* *i*JR904 (http://bigg.ucsd.edu/) was used to test the operation of BAFBA [17]. The *E. coli* model contains 904 genes, 931 unique biochemical reactions, and 761 metabolites. We used *E. coli* *i*JR904 in this work to test the reliability of BHFBA because this model was used in previous studies [5, 6, 10]. The model was preprocessed through several steps based on biological assumptions and computational approaches before it was applied, which reduced its size to 667 reactions. The second model is *B. subtilis* *i*Bsu1103 [18] (http://genomebiology.com/content/supplementary/gb-2009-10-6-r69-s4.xml), which includes 1437 reactions associated with 1103 genes. We preprocessed this model to reduce its size to 763 reactions. The last model is the *C. thermocellum* (ATCC 27405) *i*SR432 model [19] (http://www.biomedcentral.com/content/supplementary/1752-0509-4-31-s3.xml), which contains 577 reactions, representing the function of 432 genes. Preprocessing reduced the size of this model to 351 reactions. The growth rate and the biomass-product coupled yield (BPCY) were used as performance measures in this work. The unit for growth rate is hour^{−1}, while the unit for BPCY is milligram (gram-glucose·hour)^{−1}.

We compared the results with those of previous reports in the literature [5, 6, 10]. The experiments were conducted on a workstation with a 2.3 GHz Intel Core i7 processor and 8 GB of RAM. We carried out 100 individual runs to test the efficiency of BHFBA, and the result shown is the best result among the runs.

##### 2.2. Method

###### 2.2.1. Problem Formulation

The problem of identifying optimal gene knockout from biological models can be formulated as follows. Suppose that a model contains the stoichiometric matrix $S$, which provides the linear relationship between the flux rates of the reactions $v$ and the derivatives of the reactant concentrations $x$. The matrix $S$ is constant, while the flux vector $v$ is variable. Assume that there are $m$ reactants and $n$ reactions between them.

Flux vector:

$$v = (v_1, v_2, \ldots, v_n)^{T}$$

Concentration vector:

$$x = (x_1, x_2, \ldots, x_m)^{T}$$

Dynamic mass balance equation:

$$\frac{dx}{dt} = S \cdot v$$

where $t$ represents the time.
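The mass balance above can be illustrated numerically. The following is a minimal sketch on a hypothetical three-reaction toy network (the network, the flux values, and the function name `concentration_derivative` are illustrative assumptions, not taken from the models used in this work). It evaluates the product of the stoichiometric matrix and a flux vector, once for a balanced flux distribution and once for an unbalanced one.

```python
import numpy as np

# Hypothetical toy network (an assumption for illustration only):
#   R1: -> A      (uptake)
#   R2: A -> B
#   R3: B ->      (secretion)
# Rows are reactants (A, B); columns are reactions (R1, R2, R3).
S = np.array([
    [1.0, -1.0,  0.0],   # A: produced by R1, consumed by R2
    [0.0,  1.0, -1.0],   # B: produced by R2, consumed by R3
])


def concentration_derivative(S, v):
    """Return dx/dt = S @ v for the flux vector v."""
    return S @ np.asarray(v, dtype=float)


# All three fluxes equal: production and consumption cancel for A and B.
balanced = concentration_derivative(S, [2.0, 2.0, 2.0])

# Uptake exceeds conversion: A accumulates while B stays balanced.
unbalanced = concentration_derivative(S, [2.0, 1.0, 1.0])

print(balanced)
print(unbalanced)
```

A flux vector for which every entry of the result is zero corresponds to a steady state, which is the condition that constraint-based methods impose on the network.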

The chemical elements, ionic charge, and biochemical moieties must be balanced in the stoichiometric matrix. The objective is to find the optimal gene knockout to improve the product yields of industrially important chemicals while sustaining the growth rate of the microorganism. This is commonly performed using linear programming under the steady-state condition, defined as follows:

$$\text{maximise } c^{T} v \quad \text{subject to} \quad S \cdot v = 0, \quad lb \le v \le ub$$

where $v$ represents the vector of fluxes and $S$ is the stoichiometric matrix. The expression to be maximised or minimised is known as the objective function $c^{T} v$, where $c$ is a vector of weights, indicating how much each reaction contributes to the objective function. The inequalities of the lower bound $lb$ and upper bound $ub$ define the maximal rates of flux for every reaction corresponding to the columns of the stoichiometric matrix.
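To make the linear programme concrete, the following is a minimal sketch of flux balance analysis on a hypothetical four-reaction toy network, solved with `scipy.optimize.linprog`. The network, the bounds, and the helper name `fba` are assumptions made for illustration and are not the preprocessed genome-scale models or the implementation used in this work. A knockout is simulated here by forcing both bounds of a reaction to zero, which is one common way to represent it in a constraint-based model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustrative assumption):
#   R1: -> A          uptake, at most 10 units
#   R2: A -> B        main route, at most 10 units
#   R3: A -> B        alternative route, at most 4 units
#   R4: B ->          "biomass" drain, the objective
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # A
    [0.0,  1.0,  1.0, -1.0],   # B
])
c = np.array([0.0, 0.0, 0.0, 1.0])      # weight only the biomass flux
lb = [0.0, 0.0, 0.0, 0.0]
ub = [10.0, 10.0, 4.0, 1000.0]


def fba(S, c, lb, ub, knockouts=()):
    """Maximise c^T v subject to S v = 0 and lb <= v <= ub.

    Reactions listed in `knockouts` (column indices) are forced to zero flux.
    """
    lb = np.array(lb, dtype=float)
    ub = np.array(ub, dtype=float)
    for j in knockouts:
        lb[j] = 0.0
        ub[j] = 0.0
    # linprog minimises, so the objective is negated to maximise.
    res = linprog(
        -np.asarray(c, dtype=float),
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),
        bounds=list(zip(lb, ub)),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x


wild_type_growth, wild_type_flux = fba(S, c, lb, ub)
knockout_growth, knockout_flux = fba(S, c, lb, ub, knockouts=[1])  # remove R2

print(wild_type_growth)   # limited by the uptake bound of 10
print(knockout_growth)    # limited by the alternative route bound of 4
```

In this toy case the knockout only lowers the objective. In a real strain design problem the fitness of a knockout set would also account for the flux towards the desired product, which this sketch does not model.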

###### 2.2.2. A Hybrid of BA and FBA (BAFBA)

Figure 1 shows the flow of BAFBA. BAFBA is initialised by mimicking a population of bees. For gene knockout identification, a bee is represented by a binary variable that indicates the absence or presence of genes in the reaction. In this study, BAFBA starts with the bees placed randomly in the search space. The fitness of the sites visited by the bees is evaluated using FBA. Bees with the highest fitness are denoted as “selected bees”, and the sites they visited are chosen for a neighbourhood search. A small number of selected bees is expected to encourage local exploitation. After many tests, we found that an appropriate maximum number of selected bees was one quarter of the bee population size. We therefore limited the number of selected bees to the range from 1 to one quarter of the population size, to prevent the selection of too many sites for a neighbourhood search. Each bee is required to repeat this local neighbourhood search until the best possible answer is obtained. Meanwhile, the remaining bees are assigned randomly to search for new potential solutions.
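The search loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the BAFBA implementation: the fitness function `toy_fitness` is a hypothetical stand-in for an FBA evaluation, a neighbour is generated by flipping one randomly chosen bit, the cap on selected bees is read as one quarter of the population size, and all names and parameter values are invented for the example.

```python
import random


def toy_fitness(bee):
    """Hypothetical stand-in for an FBA-based fitness (assumption).

    A bee is a list of bits: 1 = gene present, 0 = gene knocked out.
    Knocking out genes 2 and 5 is rewarded; any other knockout is penalised.
    """
    target = {2, 5}
    score = 0.0
    for i, present in enumerate(bee):
        if present == 0:
            score += 1.0 if i in target else -0.5
    return score


def random_bee(n_genes, rng):
    """Place a bee at a random site in the binary search space."""
    return [rng.randint(0, 1) for _ in range(n_genes)]


def neighbour(bee, rng):
    """Return a copy of the bee with one randomly chosen bit flipped."""
    child = list(bee)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def bees_search(n_genes=8, n_bees=20, n_selected=5, n_recruits=3,
                n_iter=50, seed=0, fitness=toy_fitness):
    rng = random.Random(seed)
    # Keep the number of selected bees within [1, n_bees // 4].
    n_selected = max(1, min(n_selected, n_bees // 4))
    population = [random_bee(n_genes, rng) for _ in range(n_bees)]
    for _ in range(n_iter):
        population.sort(key=fitness, reverse=True)
        next_population = []
        # Neighbourhood search around the sites of the selected bees;
        # the site itself is kept if no recruit improves on it.
        for site in population[:n_selected]:
            candidates = [site] + [neighbour(site, rng)
                                   for _ in range(n_recruits)]
            next_population.append(max(candidates, key=fitness))
        # Remaining bees scout randomly for new potential solutions.
        next_population += [random_bee(n_genes, rng)
                            for _ in range(n_bees - n_selected)]
        population = next_population
    best = max(population, key=fitness)
    return best, fitness(best)


best_bee, best_score = bees_search()
print(best_bee, best_score)
```

Because the best site of each selected bee is never replaced by a worse neighbour, the best fitness in the population cannot decrease from one iteration to the next. In a real application `toy_fitness` would be replaced by an FBA evaluation of the knockout set encoded by the bee.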