Abstract

Aims/Introduction. Evidences have shown that the deteriorated procession of disease is not a smooth change with time and conditions, in which a critical transition point denoted as predisease state drives the state from normal to disease. Considering individual differences, this paper provides a sample-specific method that constructs an index with individual-specific dynamical network biomarkers (DNB) which are defined as early warning index (EWI) for detecting predisease state of individual sample. Based on microarray data of influenza A disease, 144 genes are selected as DNB and the 7th time period is defined as predisease state. In addition, according to functional analysis of the discovered DNB, it is relevant with experience data, which can illustrate the effectiveness of our sample-specific method.

1. Introduction

A drastic change in the complex biological processes has been shown in recent studies, after which the system shifts rapidly from a stable state to another [1, 2]. This tipping point may be better known with respect to the earth climate system [3] and global finance [4] but now is applied in other areas gradually such as complex disease [5, 6]. In present studies, the disease progression is divided into three parts named normal state, predisease state (tipping point), and disease state, respectively [7]. Researches on predisease state of complex disease are only used to provide clinical patients with disease state the necessary information and are not to predict a patient in predisease state directly.

The earliest disease progression is identified by using a single molecular biomarker [8]. With further researches on disease progression, as early as 2008, Jin et al. applied the protein network to cardiovascular diseases, by identifying a group with high confidence of interacting proteins to form a network, which can be more accurate to divide into two groups of patients compared with a single molecular biomarker [9]. A more important role of network markers and a single molecular biomarker is to distinguish disease status, rather than to detect the critical state of the disease. Given this situation, Chen et al. proposed a theory of DNB to identify the critical state of the disease, which was based on model free, small sample, and high-throughput data. Three conditions to determine the DNB are put forward [10]. Generally, the studies of identifying predisease state are based on two types of data (high-throughput data and sequence data). Gao et al. extracted the larger mutation of influenza A virus proteins to form DNB based on sequence data with consecutive years, and according to the changes in DNB of each year, a warning index can be constructed to identify the outbreak year and before [11]. In addition, the development of high-throughput technology enables us to observe a large number of biomolecules by one time. Even if the number of patients’ samples in the early state is small, it can also maintain each sampling point with high-throughput data on molecular level of high dimensionality [1215].

Based on rapid advanced high-throughput technologies, we can obtain gene or protein expression at genome-wild scale with over thousands of measurements of long-term dynamics. Considering individual differences, our study is different from the method with multiple patient samples at each time period for detecting predisease state instead of proposing a sample-specific method [1618]. In our study, the data sets which are divided into case group and control group were used to select differential expressed genes (DEGs) by -test. Genes in DEGs were clustered into 40 categories by using hierarchical clustering analysis. Then, 144 genes in a group which satisfied the three criteria of DNB identification proposed by Chen were selected as DNB. Therefore, based on individual-specific data, we can predict and identify whether a time period is in predisease state by observing the variation of EWI value combined with the three indicators.

2. Materials and Methods

2.1. Data Collection

The microarray gene expression data is downloaded from the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) database (series accession number: GSE19392). The gene expression data set is generated by using Affymetrix HT Human Genome U133A (HT_ HG-U133A) Microarrays, which are obtained from an experiment of primary human bronchial epithelial cells that are infected with the wild-type PR8 influenza virus (A/PR/8/34). In our study, 10 out of 20 samples are defined as case group which are collected from primary human bronchial epithelial cells infected with the wild-type PR8 influenza virus after 0.25 h, 0.5 h, 1 h, 1.5 h, 2 h, 4 h, 6 h, 8 h, 12 h, and 18 h and the rest of the 20 samples are defined as control group which treated the same process but in the absence of virus. Moreover, 22277 probe sets are mapped to 13915 unique gene symbols involved in the influenza data set.

The Student -test, which can evaluate the significance of genes with differential expression between case group and control group, is applied in the selection of DEGs. The value figured out by -test is directly used for the subsequent filtering analysis without multiple-testing correction. Only the genes with are regarded as DEGs.

2.1.1. Dynamical System Model

Studies have shown that a biological process of the complex disease can be divided into three parts concretely. The state between normal and disease state is a tipping point which called predisease state. The system will change dramatically when the phase of disease approaches to the state. The following discrete-time state system of a living organism can be described by a nonlinear dynamical system equation:where is an -dimensional vector which represents observed values or molecule concentrations (e.g., gene expression or protein expression) at time point , for example, minutes, hours, or days. Parameter indicates the slowly changing factors about genetic factors (e.g., SNP and CNV) and epigenetic factors (e.g., methylation and acetylation). Yet is determined by its character and is not taken into consideration in this study because it is a unknown parameter with slower dynamics than . is general nonlinear functions of .

2.1.2. Data Normalization

The observed values or molecule concentrations can be classified into two groups, namely, the case group and the control group. They are denoted as and , respectively:

Due to the existing large differences in the expression values of various genes or proteins, the data normalization manner as follows is adopted to analyze the data:where is the th gene or protein expression of case group and and are the mean and standard deviation of th gene or protein expression at all time points in control group and the control group, respectively. Then a normalization matrix is obtained:where represents the normalization data of the th gene or protein at time point t.

2.1.3. Sample-Specific Dynamical Network Biomarkers Selection

To further filtrate DNB, DEGs by -test are isolated from normalization data and denoted as , while the rest are denoted as . It is assumed that and are matrix and matrix, respectively. And is clustered into 40 categories by using hierarchical clustering analysis. The Euclidean distance is applied to calculate the distance within genes or proteins of . The optimal group of genes or proteins is selected as DNB according to the following three criteria of DNB identification proposed by Chen:

(i) The average standard deviation (SD) of molecule concentration in this group is significantly higher comparing to others.

(ii) The average Pearson correlation coefficient (PCC) in absolute value of molecule concentrations in this group is relatively higher than the PCC between other molecules.

(iii) The average Pearson correlation coefficient in absolute value between molecule concentrations inside this group and anyone outside this group (OPCC) is much lower.

2.1.4. Construct Sample-Specific Early Warning Index

The optimal group containing genes or proteins is separated from , which is marked as . Additionally, the rest of the groups of are assigned to . There is a key point called predisease state during the development of the disease, in the figure of dynamic state (Figure 1(c)) originally proposed by Chen et al. [10]. The change of DNB before and after predisease state is relatively stable and smooth, whereas it turns into abrupt and drastic at predisease state. After identifying the DNB, the early warning index of each time point can be constructed by three criteria:

(i) The average coefficient variation (CV) of molecule concentrations at different time points is the value of fluctuation. The CV value approaching predisease state is higher than that of other time point.

(ii) The average value of absolute difference (DIF) in molecule concentrations inside DNB approaching predisease state drastically decreases compared with the values at other time points.

(iii) The average value of absolute difference between molecule concentrations inside DNB and any other outside DNB (ODIF) approaching predisease state is relatively higher than others.

Hence, the of all time points can be constructed as where

In the light of the characteristic in predisease state, a time point with the largest value can be considered as the predisease state. After the point, the disease progression of a living organism shifts rapidly from normal to disease state.

3. Result

All without-correct-corresponding gene symbols are screened out, and probe of the same genes is combined by the averaging method. There are 13915 genes left. Based on the 13915 genes, Student’s -test is applied to calculate the value of each gene by comparing its expression profile between case groups and control groups. We identify 264 genes with as DEGs. Next, 264 genes are classified by hierarchical clustering analysis into 40 categories. Analyzing all clusters or groups, a group of 144 genes is identified as DNB, which satisfies the three criteria of DNB identification. Among them, the values of average SD and average PCC in this group are 1.2585797 and 0.3047569, which is higher than others (e.g., 0.8802955 and 0.2940955), and the average OPCC is relatively high.

To further clarify the early warning index for influenza A disease with 10 time points, Figure 2 demonstrates variation of four indicators in detail. As shown in (a), the curve of the CV value of DNB strongly fluctuates with time and the value at 7th time point (6 h) reaches the maximum value, which indicates that the genes in DNB change drastically when approaching 7th time point (6 h). And the value of DIF at 6 h shows the relatively lower value, which indicates that the trend of genes expression in DNB is similarly approaching 7th time point (6 h). Although the change in ODIF is not obvious, the early warning index at 7th time point (6 h) reaches the maximum value. Thus, the most prominent physiological effects occur approaching 7th time point (6 h). Meanwhile, gene expression changes in HBECs in response to wild-type influenza (PR8) show a strong float after 7th time point (6 h) [19].

In order to analyze biological functional of the obtained DNB, A bioinformatics database DAVID [20] (https://david.abcc.ncifcrf.gov/) with Gene Ontology (GO) analysis and KEGG Pathway analysis is mentioned. Some enriched GO functions based on identified genes in DNB are listed in Table 1. Gene Ontology can be divided into three parts: molecular function, biological process, and cellular composition. The analysis of genes reveals that the DNB selected by the sample-specific method is particularly related to influenza disease, which confirms the validation of our theory about the increasing index approaching predisease state. The enriched GO functions underlying the identified DNBs are particularly related to immune systems that are activated to protect against influenza A virus and inordinate dysfunctions associated with the performance in the viral life cycle. In the DNBs, DCAF1 and ADGRG3, which play crucial roles in cell differentiation, and ARVCF, COL19A1, and OLR1, which are associated with cell adhesion, regulate the expression of cell adhesion molecules [21]. Further, some genes in the basic cellular processes are expressed in a disorderly manner, for example, IFNA10, which is associated with the regulation of cell death and abnormal reaction in transcription and translation. Moreover, Some of them are involved in the related triglyceride metabolic process, especially for APOC3.

According to KEGG Pathway enrichment analysis, the results show that genes in DNB of influenza A disease are closely relevant to immune system and inflammation, such as cytokine-cytokine receptor interaction, PPAR signaling pathway, and Jak-STAT signaling pathway in Table 2. As key genes in cytokine-cytokine receptor interaction, CXCR6, IFNA10, IL21R, and TGFB3 in DNB participate in immune response and immune regulation, regulate innate immune and adapt immune response, stimulate hematopoietic function, stimulate cell activation, proliferation, and differentiation, and induce apoptosis. The pathway of cytokine-cytokine receptor interaction is the same with expressed data.

Moreover, the genes in PPAR signaling pathway like LPL, APOA1, OLR1, and APOC3 play a significant role in inhibiting inflammation, regulating cell apoptosis and immune system. And the genes in DNB which are marked red are placed the critical positions in cytokine-cytokine receptor interaction and PPAR signaling pathway. As shown in Figure 3. JAK-STAT signaling pathway is a signal transduction pathway stimulated by cytokines in recent years [22], which includes IFNA10, IL21R, IL13RA1, and IL3RA in DNB, involved in cell proliferation, differentiation, apoptosis, and immune regulation, and many other important biological processes.

To further demonstrate the effectiveness of our method, we analyze symptoms of patients and their complications. Patients develop symptoms of illness of upper respiratory tract infection. However, they are also accompanied by the occurrence of pulmonary complications, and renal failure [23]. Moreover, 18 out of 144 genes are validated with significantly close relation with influenza A disease. The CX3CL1 involves both acute and chronic inflammations, which is characterized by major perturbations of the immune homeostasis [24]. Especially surfactant proteins SFTPB plays a key role in alveolar stability [25], Which is associated with influenza A disease. And this gene encodes the pulmonary-associated surfactant protein B. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung.

4. Discussion

To detect the early warning signal of influenza A disease using a small number of samples of high-throughput data, we propose an early warning index serving as a leading indicator to predict the critical transition based on the concept of dynamical network biomarkers proposed by Chen, which drives the disease progression from normal state to disease state. Compared to the general biomarkers [26], dynamical network biomarkers are more suitable for characterizing the transfer of system status. In our study, We first select the DEGs by -test between case groups and control groups. Then, a new type of normalization data is constructed by the formula defined in this study for the sake of analysis of the next step. Different from the previous methods, our work regards the gene expression with time of each gene as a vector for hierarchical clustering analysis. And the Euclidean distance is applied to calculate the distance within genes in DEGs. A group, which satisfies three criteria of DNB identification, is identified as DNB. Further, the values of CV, DIF, and ODIF are calculated to construct an index for detecting predisease state of individual sample. The index EWI is applied in early diagnosis with the microarray data of influenza A disease, which demonstrates fluctuated values with time. Although the ODIF value approaching predisease state is not completely obvious, the expression value of the other three indicators is significantly relevant with our theory. In addition, everyone with the same disease has different DNB due to different driving factors. We will focus on this important future topic and continue to refine the algorithm in later research.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study is supported by Major Research Plan of National Natural Science Foundation of China under Grant no. 91730301; National Natural Science Foundation of China under Grant no. 11271163; the Foundation of the Innovation Project of Jiangsu Province under Grant no. KYCX17_ 1478.