Scalable Data Mining Algorithms in Computational Biology and Biomedicine
Research Article, Open Access
Zhongwei Li, Yuezhen Xin, Xun Wang, Beibei Sun, Shengyu Xia, Hui Li, Hu Zhu, "Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and GeneSet Based Genetic Algorithm", BioMed Research International, vol. 2016, Article ID 1358142, 7 pages, 2016. https://doi.org/10.1155/2016/1358142
Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and GeneSet Based Genetic Algorithm
Abstract
Phellinus is a fungus known as an essential component of anticancer drugs. To find optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were performed, generating a large amount of experimental data. In this work, we use the data collected from these experiments for regression analysis and obtain a mathematical model that predicts Phellinus production. We then develop a geneset based genetic algorithm to optimize the values of the culture-condition parameters, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The optimized parameter values agree with biological experimental results, which indicates that our method predicts optimal culture conditions well.
1. Introduction
Phellinus is a fungus of great medicinal value, since it is known as an essential component of anticancer drugs [1, 2]. Phellinus flavonoids are one of the most common parasitifers (hosts) of Phellinus in nature [3]. Research on Phellinus focuses on the composition and medicinal mechanisms of its polysaccharides, proteoglycans, and so forth, which are mostly extracted from the fruiting bodies of Phellinus flavonoids [4]. Phellinus rarely occurs in the wild [5], so cultivating it in the laboratory has become a promising research direction. Growing the mycelium by liquid fermentation produces flavonoids, polysaccharides, alkaloids, and other active substances in the fermentation broth, with high biological activity, a short fermentation period, and large-scale production, thus providing a possible way of producing Phellinus in the laboratory [6]. In recent years, new machine learning approaches (see, e.g., [7, 8]) have been developed and applied to biological data processing.
Based on the wild growth conditions of Phellinus, pH value, temperature, and fermentation time are believed to affect production. In general biochemical experiments, inoculum size, initial liquid volume, seed age, and rotation speed must also be considered. Many laboratory experiments have been designed and performed to maximize Phellinus production, and the methods fall into two major groups.
(i) Biological technologies: optimum media for the mycelial growth of Phellinus were studied in [9], and liquid fermentation technology was used to cultivate Phellinus in [10]. Active ingredients in Phellinus and the regulation of polysaccharide metabolism were studied in [11].
(ii) Mathematical models: some studies build mathematical models of the Phellinus production process using differential equations [12], metabolic pathways and networks [13], and complex network models [14].
Artificial intelligence algorithms and models have been used in bioprocessing, particularly for optimizing culture conditions. In [15], an artificial neural network (ANN) is used to optimize the extraction process of azalea flavonoids. Neural networks combined with evolutionary algorithms have also been used to optimize experimental conditions; for example, a neural network and particle swarm optimization were used in [16] to find culture conditions that maximize the production of pleuromutilin from Pleurotus mutilus. Recently, with the growth of biological data, regression analysis has become a useful tool for data analysis. Fitting models to biological data using linear and nonlinear regression is described in [17], and multivariate statistical analysis strategies useful to biologists are presented in [18, 19]. These results suggest using regression analysis and artificial intelligence algorithms to optimize the culture conditions for Phellinus production. To the best of our knowledge, little work has focused on optimizing culture conditions to maximize Phellinus production in the laboratory.
In this work, we first perform 45 experiments producing Phellinus from Phellinus flavonoids under different culture conditions, varying the pH value, temperature, fermentation time, inoculum size, initial liquid volume, seed age, and rotation speed. Using the data collected in these experiments, we build a mathematical model by regression analysis, which predicts the flavonoid yield and identifies the factor most important to Phellinus production. A geneset based genetic algorithm (GA) is then proposed to optimize the culture conditions, using the obtained mathematical model as the fitness function for the evolution of individuals. Computational results show that the predicted optimal parameter values agree with biological experimental results, which indicates that our method predicts optimal culture conditions well.
2. Data Collected from Experiments
In this section, we describe the biological experiments performed to find the optimal value of each single factor.
Table 1 lists the experiments performed to collect data. Rows 1–14 correspond to experiments with pH values from 1 to 14, with the temperature fixed at 28°C, the initial volume at 100 mL, the rotation speed at 140 r/min, and the seed age at 8 days. Rows 15–20 are 6 experiments with initial volume ranging from 40 mL to 140 mL, with the pH value set to 6, the best value found in the experiments with pH from 1 to 14.

Table 2 covers experiments with inoculum size ranging from 2% to 16% and temperature ranging from 25°C to 40°C. Table 3 shows the experiments with fermentation time ranging from 1 to 12 days. From these 45 experiments in total, we collect data on culture conditions for Phellinus production. Culture conditions strongly influence Phellinus production, but the optimal conditions remain unknown.


3. Methods
We use regression analysis and a geneset based genetic algorithm to find the culture conditions that maximize Phellinus production. Specifically, we build a mathematical model from the data collected in Section 2 by regression analysis, and then use this model as the fitness function for optimizing the culture conditions with the geneset based genetic algorithm.
3.1. Regression Analysis
In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables [20]. It is a highly versatile data analysis method: it establishes dependencies between variables from observational data and is widely used to uncover the laws inherent in the data and to predict outcomes. Depending on the type of relationship between the independent and dependent variables, regression analysis is divided into linear and nonlinear regression [21]. In general, one first selects the independent and dependent variables and the form of the relationship between them, which defines the regression model. The model parameters are then estimated from the measured data, and one evaluates whether the model fits the observed data. If it fits well, the model can be used to predict the dependent variable from new values of the independent variables [22].
In recent years, regression analysis has been widely used in data mining, particularly in biological data analysis, to find feasible statistical laws in large amounts of experimental data. The general process consists of the following steps [23, 24].
Step 1. Determine the variables.
Step 2. Establish the prediction model.
Step 3. Perform correlation analysis.
Step 4. Calculate the prediction error.
Step 5. Determine the predicted value.
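The five steps above can be illustrated with a small, self-contained Python sketch of multiple linear regression by ordinary least squares. It is not the SPSS procedure used in this work; the two factors, the synthetic yield values, and all function names are hypothetical, and the prediction error is measured here as the root-mean-square residual, which is one common choice among several.

```python
from statistics import mean


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, y):
    """Step 2: least-squares intercept and slopes via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 is the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    """Step 5: predicted response for one observation x."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


def pearson(u, v):
    """Step 3: Pearson correlation of two sequences (on fitted vs observed, its square is R^2)."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Step 1: two hypothetical factors (pH, temperature in °C) and a synthetic yield.
xs = [(5.0, 25.0), (5.5, 26.0), (6.0, 27.0), (6.5, 28.0), (7.0, 30.0), (6.0, 25.0)]
y = [2.0 + 0.5 * ph + 0.1 * t for ph, t in xs]  # exactly linear by construction

coef = fit_ols(xs, y)
fitted = [predict(coef, x) for x in xs]
# Step 4: prediction error, here the root-mean-square residual.
rmse = (sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y)) ** 0.5
print([round(c, 6) for c in coef], rmse, pearson(fitted, y))
```

Because the synthetic yield is exactly linear, the fit recovers the generating coefficients and the residual error is essentially zero; with real measurements the same steps yield nonzero residuals.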
The data collected in Section 2 consist of seven independent variables and one dependent variable. The seven independent variables are inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed, and the dependent variable is flavonoid yield. The experiments showed that some culture conditions are unsuitable for Phellinus production. Data measured under such extreme experimental conditions are regarded as extreme data and excluded from the regression analysis, and duplicate data are also removed. Only data within the following ranges are used in the regression analysis:
(i) inoculum size 0.5%–1.2%;
(ii) pH 5–7;
(iii) initial liquid volume 60–100 mL;
(iv) temperature 25–30°C;
(v) seed age 4–9 days;
(vi) fermentation time 6–12 days;
(vii) rotation speed 140–200 r/min.
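The filtering rule above amounts to a range check on every factor plus duplicate removal. A minimal Python sketch, assuming each run is stored as a dictionary with hypothetical key names and that a duplicate means an identical record; the range values are the ones listed above, and the sample runs are made up.

```python
# Allowed (lower, upper) range per factor; key names are hypothetical.
RANGES = {
    "inoculum": (0.5, 1.2),       # %
    "pH": (5.0, 7.0),
    "volume": (60.0, 100.0),      # mL
    "temperature": (25.0, 30.0),  # °C
    "seed_age": (4.0, 9.0),       # days
    "ferm_time": (6.0, 12.0),     # days
    "rotation": (140.0, 200.0),   # r/min
}


def filter_runs(runs):
    """Keep runs whose factors all lie inside RANGES, dropping exact duplicate records."""
    seen, kept = set(), []
    for run in runs:
        if not all(lo <= run[name] <= hi for name, (lo, hi) in RANGES.items()):
            continue  # measured under extreme conditions
        key = tuple(sorted(run.items()))  # hashable fingerprint of the whole record
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        kept.append(run)
    return kept


base = {"inoculum": 1.0, "pH": 6.0, "volume": 80.0, "temperature": 28.0,
        "seed_age": 8.0, "ferm_time": 9.0, "rotation": 160.0, "flavonoid_yield": 1.0}
runs = [base, {**base}, {**base, "pH": 2.0}, {**base, "volume": 90.0, "flavonoid_yield": 1.2}]
kept = filter_runs(runs)
print(len(kept))
```

The copy of `base` is dropped as a duplicate and the pH 2 run is dropped as extreme, leaving two runs.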
After data filtering, we built a statistical model to represent these data. Since these variables are known to be correlated, we applied linear regression analysis to fit them. At this stage, many candidate models were tested one by one with IBM SPSS software and response surface methodology. The statistical model is $y = a + \sum_{i=1}^{7} b_i x_i + \sum_{i=1}^{7} c_i x_i^2$, where $y$ is the dependent variable (flavonoid yield), $x_1, \ldots, x_7$ are the seven independent variables associated with inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed, respectively, and $a$, $b_i$, and $c_i$ are real numbers.
Although the relationship between the variables may not be linear, we can add a squared term for a variable to the data. If this term is useful, it is retained after the linear regression analysis; otherwise, it is deleted.
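Adding a squared term keeps the model linear in its coefficients: $x_i^2$ is just another column of the design matrix, so ordinary least squares still applies, and the retain-or-delete decision can be made by stepwise selection. A short Python sketch of building the 14 candidate columns, with hypothetical factor names and an illustrative (not measured) run:

```python
# Hypothetical short names for the seven culture-condition factors.
FACTORS = ["inoculum", "pH", "volume", "temperature", "seed_age", "ferm_time", "rotation"]


def with_squares(row):
    """Append the square of each factor, giving 14 candidate regressors.

    The model stays linear in its coefficients, so each squared value is
    just one more column for ordinary least squares to weight."""
    return list(row) + [v * v for v in row]


# Names of the candidate columns, in the same order as with_squares() output.
CANDIDATES = FACTORS + [name + "^2" for name in FACTORS]

# One hypothetical run (values are illustrative, not measured data).
run = [1.0, 6.0, 80.0, 28.0, 8.0, 9.0, 160.0]
expanded = dict(zip(CANDIDATES, with_squares(run)))
print(expanded["pH^2"], len(expanded))  # 36.0 14
```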
In the regression analysis, we focus on the $R^2$ value and the significance of the regression coefficients to refine the model. Using the regression tools in IBM SPSS, we selected Estimates for the regression coefficients and Model fit for display, and set the stepping method criteria to use the probability of F, with entry at 0.05 and removal at 0.10. The regression results are shown in Table 4.
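Stepwise selection with entry and removal thresholds can be sketched as follows. This is not SPSS's implementation. Because computing F-distribution probabilities needs a statistics library, the sketch swaps in the closely related F-value criterion, with illustrative thresholds (F-to-enter 4.0, F-to-remove 2.7) rather than the probability values above. The data are synthetic, with a true quadratic effect in `x1` and an unrelated factor `x2`; all names are hypothetical.

```python
def rss(cols, y):
    """Residual sum of squares after an OLS fit of y on an intercept plus `cols`."""
    n = len(y)
    rows = [[1.0] + [col[j] for col in cols] for j in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[k] for r in rows) for k in range(p)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        for r in range(p):
            if r != k:
                f = a[r][k] / a[k][k]
                a[r] = [u - f * w for u, w in zip(a[r], a[k])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, y))


def partial_f(rss_small, rss_big, n, p_big):
    """F statistic for one extra term: RSS drop over the bigger model's residual variance."""
    if rss_big <= 0:
        return float("inf")  # perfect fit; real data always leaves some residual
    return (rss_small - rss_big) / (rss_big / (n - p_big - 1))


def stepwise(cand, y, f_enter=4.0, f_remove=2.7, max_steps=50):
    """Add the best term if its partial F > f_enter, then drop the weakest
    term if its partial F < f_remove; repeat until nothing changes."""
    n, chosen = len(y), []
    for _ in range(max_steps):
        changed = False
        base = rss([cand[c] for c in chosen], y)
        best, best_f = None, f_enter
        for name in cand:
            if name not in chosen:
                f = partial_f(base, rss([cand[c] for c in chosen + [name]], y), n, len(chosen) + 1)
                if f > best_f:
                    best, best_f = name, f
        if best is not None:
            chosen.append(best)
            changed = True
        full = rss([cand[c] for c in chosen], y)
        worst, worst_f = None, f_remove
        for name in chosen:
            rest = [cand[c] for c in chosen if c != name]
            f = partial_f(rss(rest, y), full, n, len(chosen))
            if f < worst_f:
                worst, worst_f = name, f
        if worst is not None:
            chosen.remove(worst)
            changed = True
        if not changed:
            break
    return chosen


# Synthetic data: y depends on x1 and x1^2; x2 is an unrelated factor.
x1 = [float(v) for v in range(1, 13)]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
noise = [0.05 if j % 2 == 0 else -0.05 for j in range(12)]
y = [1.0 + 2.0 * v - 0.5 * v * v + e for v, e in zip(x1, noise)]
cand = {"x1": x1, "x1^2": [v * v for v in x1], "x2": x2}
chosen = stepwise(cand, y)
print(chosen)
```

Keeping the entry threshold above the removal threshold prevents a term from being added and removed in the same cycle, which is also why SPSS requires the entry probability to be smaller than the removal probability.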

The significance is 0.006 < 0.05, so the regression is statistically significant. The $R^2$ value shows that the model explains about 88% of the variance in the data. We obtain the statistical model: .
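For reference, the two statistics quoted above have standard definitions in ordinary least squares with an intercept. With $n$ observations, $k$ regressors (not counting the intercept), observed yields $y_j$, fitted values $\hat{y}_j$, and mean yield $\bar{y}$:

```latex
R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2},
\qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```

Under the usual normal-error assumptions, $F$ follows an $F_{k,\,n-k-1}$ distribution when all slope coefficients are zero, and the significance reported for the overall regression is, in this standard setup, the tail probability $P(F_{k,\,n-k-1} > F)$. An $R^2$ of about 0.88 means the residual sum of squares is about 12% of the total sum of squares, i.e., the model accounts for about 88% of the variation in flavonoid yield.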
3.2. GeneSet Based Genetic Algorithm
The genetic algorithm (GA) was first proposed by J. Holland in 1975 [25, 26]; its general process is shown in Figure 1. If, in the mutation operation, a short segment is selected with a given mutation probability and replaced by another segment, the result is a geneset based GA [27].
In a geneset based GA, a chromosome is treated as a set of genesets rather than a set of individual genes as in classical GAs. The algorithm starts with genesets whose largest size equals half the chromosome length. This representation suits our model because each geneset represents a group of adjacent genes associated with one factor of the culture conditions.
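A minimal Python sketch of the geneset mutation idea described above: with a given probability, one run of adjacent genes (a geneset) of length up to half the chromosome is overwritten with new random bits. This is an illustration under assumptions, not the operator of [27]: the function name, the uniform choice of geneset size and position, and keeping the maximum size fixed (rather than changing it during the run) are our choices, and the escape operations of [27] are not modeled.

```python
import random


def geneset_mutation(chrom, max_size, rate, rng):
    """With probability `rate`, overwrite one randomly placed segment of
    adjacent genes (length 1..max_size) with fresh random bits."""
    chrom = list(chrom)
    if rng.random() < rate:
        size = rng.randint(1, max_size)               # geneset length
        start = rng.randint(0, len(chrom) - size)     # segment fits inside chromosome
        chrom[start:start + size] = [rng.randint(0, 1) for _ in range(size)]
    return chrom


rng = random.Random(0)
parent = [0] * 20
max_size = len(parent) // 2  # largest geneset = half the chromosome length
child = geneset_mutation(parent, max_size, rate=1.0, rng=rng)
print(parent)
print(child)
```

Unlike bit-flip mutation, every change made by this operator falls inside one contiguous window, so a whole factor's encoding can be replaced in a single step.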
Selection favors the winning (fitter) individuals of the population. The selection operator, also known as the reproduction operator, determines which individuals (solutions) are passed on to the next generation; the population can be updated by fitness-proportionate selection, stochastic universal sampling, or local selection. The crossover operator recombines parts of the structures of two parent individuals to generate new individuals that replace them. Mutation gives the GA a local random search capability: when crossover has brought the population close to the optimal solution, a mutation operator with this local search capability can accelerate convergence to the optimum.
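The three operators described above can be sketched in Python as follows. Roulette wheel selection is the fitness-proportionate method; single-point crossover and bit-flip mutation are common textbook choices assumed here, since the text does not name the exact crossover operator. Fitness values are assumed non-negative and not all zero, and the function names are hypothetical.

```python
import random


def roulette_select(pop, fitness, rng):
    """Fitness-proportionate (roulette wheel) selection of one individual."""
    total = sum(fitness)
    pick = rng.random() * total  # a point on the wheel in [0, total)
    acc = 0.0
    for ind, f in zip(pop, fitness):
        acc += f
        if pick < acc:
            return ind
    return pop[-1]  # only reached through floating-point round-off


def one_point_crossover(a, b, rate, rng):
    """With probability `rate`, swap the tails of two parents after a random cut."""
    if rng.random() < rate:
        cut = rng.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def bit_flip(chrom, rate, rng):
    """Flip each bit independently with probability `rate` (local random search)."""
    return [1 - g if rng.random() < rate else g for g in chrom]


rng = random.Random(1)
pop = [[0, 0, 0, 0], [1, 1, 1, 1]]
chosen = roulette_select(pop, [0.0, 1.0], rng)  # the second individual owns the whole wheel
c1, c2 = one_point_crossover([0] * 6, [1] * 6, 1.0, rng)
print(chosen, c1, c2, bit_flip([0, 1, 1], 0.5, rng))
```

An individual's chance of selection equals its share of the total fitness, so a zero-fitness individual is never picked by the wheel.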
The statistical model obtained by regression analysis is used here as the fitness function, and the geneset based GA is used to optimize the culture conditions for maximizing Phellinus production. The simulation is run with gatool in MATLAB. Each individual in the GA population is represented by a binary string of 7 segments, each encoding the value of one of the 7 culture-condition parameters. The initial population size is 50, the crossover rate is 0.8, the mutation rate is 0.01, and roulette wheel selection is used. The GA halts when a stopping condition is met, such as a generation limit or a fitness limit.
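The binary encoding described above can be sketched in Python; this does not reproduce MATLAB's gatool. Each chromosome is 7 segments, one per factor, and each segment is decoded by linear scaling into the factor's range. The segment length (8 bits), the use of the filtered-data ranges from Section 3.1 as search bounds, and all names are assumptions; the population size, crossover rate, and mutation rate are the values stated above (the last two would be used by crossover and mutation operators not shown here).

```python
import random

# (factor, lower bound, upper bound). Using the filtered-data ranges as GA search
# bounds is an assumption made for this sketch.
BOUNDS = [
    ("inoculum size (%)", 0.5, 1.2),
    ("pH", 5.0, 7.0),
    ("initial liquid volume (mL)", 60.0, 100.0),
    ("temperature (°C)", 25.0, 30.0),
    ("seed age (days)", 4.0, 9.0),
    ("fermentation time (days)", 6.0, 12.0),
    ("rotation speed (r/min)", 140.0, 200.0),
]
BITS = 8  # hypothetical bits per segment (resolution = range / 255)
POP_SIZE, CROSS_RATE, MUT_RATE = 50, 0.8, 0.01  # GA settings stated in the text


def decode(chrom):
    """Map a chromosome of 7 * BITS bits to one real value per factor by linear scaling."""
    values = []
    for k, (_, lo, hi) in enumerate(BOUNDS):
        seg = chrom[k * BITS:(k + 1) * BITS]
        level = int("".join(map(str, seg)), 2)  # segment read as an unsigned integer
        values.append(lo + (hi - lo) * level / (2 ** BITS - 1))
    return values


rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(BITS * len(BOUNDS))] for _ in range(POP_SIZE)]
for (name, _, _), value in zip(BOUNDS, decode(population[0])):
    print(f"{name}: {value:.2f}")
```

Keeping each factor in its own contiguous segment is what lets a geneset mutation replace one factor's value at a time.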
After 156 iterations, the geneset based GA returns the best individual and terminates, as shown in Figure 2.
The optimized culture conditions obtained from the regression analysis and the GA are shown in Table 5.

The results obtained by our method agree with experimental findings reported in the literature on the growth environment of Phellinus. Specifically, a suitable environment is weakly acidic, at a pH of about 6, and the appropriate temperature range is 22°C to 28°C [10]. Suitable seed age and fermentation time vary with the strain [3, 28, 29]. These optimized parameter values agree with biological experimental results, which indicates that our method predicts optimal culture conditions well.
4. Conclusion
In this work, 45 experiments were first performed to collect data related to the production of Phellinus from Phellinus flavonoids. We used regression analysis to build a mathematical model from the collected data and then proposed a geneset based GA to optimize the culture conditions, using the model as the fitness function for the evolution of individuals. In the comparison, the predicted pH value is credible, and the predicted temperature lies within the appropriate range; taking laboratory environmental factors into account, the predicted temperature is also reliable. The predicted seed age and fermentation time are both 9 days, close to the original value of 8 days. The computational results show that the predicted optimal parameter values agree with biological experimental results, which indicates that our method predicts optimal culture conditions well.
Neural-like computing models, such as artificial neural networks [30], spiking neural networks [31], and spiking neural P systems [32–34], have been successfully used in pattern recognition and engineering practice. It would be of interest to use these neural-like computing models to optimize culture conditions for Phellinus production. Our approach may also provide guidance for precision medicine based on personal SNP data [35] and for other bioinformatics tasks [21, 22].
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The research is under the auspices of National Natural Science Foundation of China (nos. 41276135, 31172010, 61272093, 61320106005, 61402187, 61502535, 61572522, and 61572523), Program for New Century Excellent Talents in University (NCET-13-1031), 863 Program (2015AA020925), Fundamental Research Funds for the Central Universities (R1607005A), and China Postdoctoral Science Foundation funded project (2016M592267).
References
[1] T. Zhu, J. Guo, L. Collins et al., “Phellinus linteus activates different pathways to induce apoptosis in prostate cancer cells,” British Journal of Cancer, vol. 96, no. 4, pp. 583–590, 2007.
[2] D. Sliva, A. Jedinak, J. Kawasaki, K. Harvey, and V. Slivova, “Phellinus linteus suppresses growth, angiogenesis and invasive behaviour of breast cancer cells through the inhibition of AKT signalling,” British Journal of Cancer, vol. 98, no. 8, pp. 1348–1356, 2008.
[3] Y. Wang, J.-X. Yu, C.-L. Zhang et al., “Influence of flavonoids from Phellinus igniarius on sturgeon caviar: antioxidant effects and sensory characteristics,” Food Chemistry, vol. 131, no. 1, pp. 206–210, 2012.
[4] G. Xia, Y. Ge, and H. Fu, “Research on the extraction of total flavonoids from Phellinus vaninii with ultrasonic-assisted technique,” Journal of Jiangsu University, vol. 20, no. 1, pp. 40–41, 2010.
[5] H. H. Doğan and M. Karadelev, “Phellinus sulphurascens (Hymenochaetaceae, Basidiomycota): a very rare wood-decay fungus in Europe collected in Turkey,” Turkish Journal of Botany, vol. 33, no. 3, pp. 239–242, 2009.
[6] W. Liu, Study on the metabolic regulation of flavones produced by medicinal fungus Phellinus igniarius [M.S. thesis], 2012.
[7] X. Wen, L. Shao, Y. Xue, and W. Fang, “A rapid learning algorithm for vehicle classification,” Information Sciences, vol. 295, pp. 395–406, 2015.
[8] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.
[9] S. Zhong, Y. G. Li, and J. Q. Zhu, “Optimum media on mycelial growth of Phellinus,” Zhejiang Agricultural Sciences, vol. 1, pp. 173–175, 2011.
[10] S. Li, Y. X. Ding, J. Xu, and M. W. Zhao, “Optimization for medium compositions for intracellular polysaccharide of Phellinus baumii in submerged culture,” Food Science, vol. 11, pp. 236–240, 2006.
[11] X. Guo, X. Zou, and M. Sun, “Optimization of extraction process by response surface methodology and preliminary characterization of polysaccharides from Phellinus igniarius,” Carbohydrate Polymers, vol. 80, no. 2, pp. 345–350, 2010.
[12] X.-K. Ma, L. Li, E. C. Peterson, T. Ruan, and X. Duan, “The influence of naphthaleneacetic acid (NAA) and coumarin on flavonoid production by fungus Phellinus sp.: modeling of production kinetic profiles,” Applied Microbiology and Biotechnology, vol. 99, no. 22, pp. 9417–9426, 2015.
[13] N. W. Hanson, K. M. Konwar, A. K. Hawley, T. Altman, P. D. Karp, and S. J. Hallam, “Metabolic pathways for the whole community,” BMC Genomics, vol. 15, no. 1, article 619, 2014.
[14] M. Kim, G. Kim, B. Nam et al., “Development of species-specific primers for rapid detection of Phellinus linteus and P. baumii,” Mycobiology, vol. 33, no. 2, pp. 104–108, 2005.
[15] M. Zhang, D. R. Pan, and F. Zhou, “BP neural network extraction process by orthogonal beautiful azalea flavonoids,” Journal of Xinyang Normal University, vol. 2, pp. 261–264, 2011.
[16] L. Khaouane, C. Si-Moussa, S. Hanini, and O. Benkortbi, “Optimization of culture conditions for the production of pleuromutilin from pleurotus mutilus using a hybrid method based on central composite design, neural network, and particle swarm optimization,” Biotechnology and Bioprocess Engineering, vol. 17, no. 5, pp. 1048–1054, 2012.
[17] H. Motulsky and A. Christopoulos, Fitting Models to Biological Data Using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting, Oxford University Press, 2004.
[18] H. L. Seal, Multivariate Statistical Analysis for Biologists, John Wiley & Sons, 1964.
[19] R. R. Sokal and F. J. Rohlf, Biometry: The Principles and Practice of Statistics in Biological Research, W. H. Freeman, San Francisco, CA, USA, 1969.
[20] H. L. Seal, Multivariate Statistical Analysis for Biologists, Methuen, London, UK, 1964.
[21] B. Liu, F. Liu, X. Wang, J. Chen, L. Fang, and K. Chou, “Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences,” Nucleic Acids Research, vol. 43, no. 1, pp. W65–W71, 2015.
[22] R. Wang, Y. Xu, and B. Liu, “Recombination spot identification based on gapped k-mers,” Scientific Reports, vol. 6, Article ID 23934, 2016.
[23] J. Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage, 1997.
[24] G. A. F. Seber and A. J. Lee, Linear Regression Analysis, vol. 936, John Wiley & Sons, 2012.
[25] L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, 1991.
[26] D. Beasley, R. R. Martin, and D. R. Bull, “An overview of genetic algorithms: part 1. Fundamentals,” University Computing, vol. 15, no. 2, pp. 58–69, 1993.
[27] T.-P. Hong, M.-T. Wu, Y.-F. Tung, and S.-L. Wang, “Using escape operations in geneset genetic algorithms,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC '07), pp. 3907–3911, Montreal, Canada, October 2007.
[28] J. Luo, J. Liu, C. Ke et al., “Optimization of medium composition for the production of exopolysaccharides from Phellinus baumii Pilát in submerged culture and the immunostimulating activity of exopolysaccharides,” Carbohydrate Polymers, vol. 78, no. 3, pp. 409–415, 2009.
[29] D. B. Harper and J. T. Kennedy, “Effect of growth conditions on halomethane production by Phellinus species: biological and environmental implications,” Journal of General Microbiology, vol. 132, no. 5, pp. 1231–1246, 1986.
[30] R. P. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Magazine, vol. 4, no. 2, pp. 4–22, 1987.
[31] W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural Networks, vol. 10, no. 9, pp. 1659–1671, 1997.
[32] T. Song, J. Xu, and L. Pan, “On the universality and non-universality of spiking neural P systems with rules on synapses,” IEEE Transactions on NanoBioscience, vol. 14, no. 8, pp. 960–966, 2015.
[33] T. Song, Q. Zou, X. Liu, and X. Zeng, “Asynchronous spiking neural P systems with rules on synapses,” Neurocomputing, vol. 151, no. 3, pp. 1439–1445, 2015.
[34] X. Wang, T. Song, F. Gong, and P. Zheng, “On the computational power of spiking neural P systems with self-organization,” Scientific Reports, vol. 6, Article ID 27624, 2016.
[35] P. Li, M. Guo, C. Wang, X. Liu, and Q. Zou, “An overview of SNP interactions in genome-wide association studies,” Briefings in Functional Genomics, vol. 14, no. 2, pp. 143–155, 2015.
Copyright
Copyright © 2016 Zhongwei Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.