Abstract

The identification of bioactive components in traditional Chinese medicine (TCM) is an important part of research on the material basis of TCM. Recently, molecular docking has been used extensively for this purpose. However, the target proteins used in molecular docking may not be the actual targets of the TCM, so bioactive components may be missed or misidentified. To address this problem, this study proposed the GEPSI method, which identifies the target proteins of a TCM based on the similarity of gene expression profiles. The similarity between the gene expression profiles induced by the TCM and those induced by small molecule drugs was calculated. The pharmacological action of a TCM may be similar to that of small molecule drugs with high similarity scores, so the target proteins of those drugs can be considered TCM targets. We then identified the bioactive components of a TCM by molecular docking against these targets and verified the reliability of the method through a literature investigation. Using the proteins that the TCM actually affected as targets made the identification of the bioactive components more accurate. This study provides a fast and effective method for the identification of TCM bioactive components.

1. Introduction

Identifying the bioactive components of traditional Chinese medicine (TCM) within complex mixtures is a critical challenge in TCM research. Because it is intuitive and efficient, molecular docking has become an important means of identifying TCM bioactive components. Identification via molecular docking involves one or more target proteins and the components being screened; the components that act specifically on the target proteins are identified as the bioactive components of the TCM. In the screening process, a single target or multiple targets are chosen, usually targets associated with a specific disease. Targets are generally chosen from a database of disease-associated targets, from key targets in a signal transduction network, or from the literature [1–3]. Because diseases are complex, many targets may be associated with a single disease. Consequently, the selected target proteins may not be the actual targets affected by the TCM, or it may not be possible to screen against all of the associated targets. As a result, the components obtained by molecular docking may not be the ones that actually act on the corresponding disease, and genuinely bioactive components may be left out.

The development of chemical informatics and bioinformatics has led to the accumulation of data on TCM components, target proteins, and gene expression profiles. Guided by the ideas of systems pharmacology, this study proposed a method for determining the target proteins of a TCM and then identifying its bioactive components by molecular docking. This method has been designated the gene expression profile similarity-based identification (GEPSI) method. The basic concept is to select the gene expression profiles of small molecule drugs in the Connectivity Map (CMap) that are most comparable with the gene expression profile of a TCM and to calculate the similarity between the gene expression profiles of the TCM and those of the small molecule drugs. The target proteins of the small molecule drugs with higher similarity scores can be considered TCM targets. Virtual screening of the TCM components is then carried out against these target proteins, ultimately identifying the bioactive components. Because it takes all of the TCM components and all of the affected genes as its object, this method embodies the holistic thinking of TCM research more concretely. It provides an effective means for the identification of TCM bioactive components and could serve as a basis for drug repositioning, quality control, and TCM drug design.

2. Methods and Materials

2.1. Principle of GEPSI

Both TCMs and small molecule drugs can affect gene expression. By comparing the gene expression profiles before and after treatment with a TCM or a small molecule drug, up- and downregulated differentially expressed genes can be identified. The up- and downregulated genes affected by the TCM can then be compared with those affected by each small molecule drug to obtain a similarity score. If the similarity score is high, the TCM and the small molecule drug may have similar pharmacological actions, and the target proteins of the small molecule drugs with higher similarity scores can be considered targets of the TCM. Using these proteins as targets, the bioactive components of the TCM can finally be identified by molecular docking. Each step of the GEPSI method (Figure 1) is described in detail in this paper.

2.1.1. Gene Expression Profile Data

The Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) contains gene expression profiles of cells treated with components, herbs, and TCM formulae. These data can be used to carry out TCM-related research. The Connectivity Map (CMap, https://www.broadinstitute.org/cmap/) is a gene expression profile database related to small molecule drugs [34]. CMap establishes the relations among small molecule drugs, genes, and diseases according to the gene expression differences in human tumor cell lines after treatment with small molecule drugs. Because it compares the similarity of different gene expression profiles, CMap is mainly applied in areas of drug development such as drug repositioning. Different human tumor cell lines (HL60, MCF7, PC3, SKMEL5, and ssMCF7) were treated with small molecule drugs at different concentrations (10 nM, 100 nM, 1 μM, and 10 μM) for different times (6 h, 12 h). At present, CMap contains data for 1309 small molecule drugs and more than 7000 gene expression profiles. Of these 1309 drugs, 556 were recorded in DrugBank, and 522 of those 556 had target protein data. This study chose the CMap gene expression profiles that were most comparable with those of the TCM, that is, profiles obtained from the same cell line type and platform.

2.1.2. The Determination of Up and Downregulated Genes

The differentially expressed genes were determined using the Bioinformatics Toolbox of Matlab [35]. A t-test and a false discovery rate (FDR) correction for multiple hypothesis testing were performed on each gene. Significant differentially expressed genes were detected by random sample replacement (, FDR ≤ 0.1). Up- and downregulated genes were distinguished by the magnitude of the fold change (FC). Significant differentially expressed genes with FC ≥ 2 were classified as upregulated, and those with FC ≤ 0.5 were classified as downregulated.
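
The thresholding step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Matlab toolbox procedure used in the study: per-gene p-values and fold changes are assumed to have been computed already, the Benjamini-Hochberg procedure is substituted for the resampling-based FDR estimate, and the function names (benjamini_hochberg, classify_genes) are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(genes, p_values, fold_changes, fdr_cutoff=0.1):
    """Split significant genes into up- (FC >= 2) and downregulated (FC <= 0.5)."""
    adjusted = benjamini_hochberg(p_values)
    up, down = [], []
    for gene, q_value, fc in zip(genes, adjusted, fold_changes):
        if q_value > fdr_cutoff:
            continue  # not significant after multiple-testing correction
        if fc >= 2:
            up.append(gene)
        elif fc <= 0.5:
            down.append(gene)
    return up, down


# Toy input; gene labels and numbers are illustrative only.
up, down = classify_genes(
    ["A", "B", "C", "D"],
    [0.001, 0.02, 0.03, 0.8],
    [3.0, 0.4, 1.2, 5.0],
)
print(up, down)
```

Genes that pass the FDR cutoff but have a fold change between 0.5 and 2 fall into neither list, which mirrors the two-sided FC rule stated above.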

2.1.3. The Similarity Computation of the Gene Expression Profile

The up- and downregulated genes were used as the base data for calculating the gene expression profile similarity, which was computed automatically in CMap by the Kolmogorov-Smirnov (K-S) algorithm [34, 36]. The comparison yielded a similarity score between the gene expression profile of each small molecule drug and that of the TCM. Similarity scores fell between −1 and 1. A positive score (0 < score ≤ 1) indicated that the pharmacological actions of the small molecule drug and the TCM were similar, and a higher absolute value indicated greater similarity; a negative score (−1 ≤ score < 0) indicated that their pharmacological actions were opposite, and a higher absolute value indicated a stronger opposition.
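
A K-S based score of this kind can be sketched as follows. This Python illustration follows the commonly described CMap connectivity score and is an assumption about the computation, not the CMap implementation itself: the drug profile is assumed to be a gene list ranked from most upregulated to most downregulated, the function names are hypothetical, and the final rescaling of raw scores to the range −1 to 1 across all drug profiles is omitted.

```python
def ks_statistic(tag_genes, ranked_genes):
    """Signed K-S enrichment of tag_genes within a ranked gene list."""
    n = len(ranked_genes)
    position = {gene: idx + 1 for idx, gene in enumerate(ranked_genes)}
    hits = sorted(position[g] for g in tag_genes if g in position)
    t = len(hits)
    if t == 0:
        return 0.0
    # a is large when the tags cluster near the top of the ranking,
    # b is large when they cluster near the bottom.
    a = max((j + 1) / t - v / n for j, v in enumerate(hits))
    b = max(v / n - j / t for j, v in enumerate(hits))
    return a if a > b else -b


def connectivity_score(up_genes, down_genes, ranked_genes):
    """Raw (unnormalized) similarity between a query signature and a drug."""
    ks_up = ks_statistic(up_genes, ranked_genes)
    ks_down = ks_statistic(down_genes, ranked_genes)
    if ks_up * ks_down > 0:
        return 0.0  # both tag lists enriched at the same end: no connection
    return ks_up - ks_down


# Toy ranking of ten genes, most upregulated by the drug first.
ranked = [f"g{i}" for i in range(1, 11)]
# Positive: the query and the drug move these genes in the same direction.
print(connectivity_score(["g1", "g2"], ["g9", "g10"], ranked))
# Negative: the query signature is reversed by the drug.
print(connectivity_score(["g9", "g10"], ["g1", "g2"], ranked))
```

In this sketch the TCM signature plays the role of the query (the up- and downregulated gene lists), and each drug profile supplies one ranked list.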

2.1.4. Determination of the TCM Target Proteins

If the similarity score between the gene expression profiles of a small molecule drug and a TCM was high, their pharmacological actions were considered similar, and the target proteins of the small molecule drug were considered TCM targets. This study only considered the top 10 small molecule drugs that had a definite pharmacological action and whose target proteins were recorded in DrugBank version 4.3.
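
The selection rule above can be sketched as a simple ranking and filtering step. This is a minimal Python illustration with hypothetical names and toy data: the criterion of a definite pharmacological action is not encoded, and a drug is treated as eligible only if it has an entry in an assumed drug-to-target mapping (standing in for a DrugBank lookup).

```python
def select_reference_drugs(scores, drug_targets, top_n=10):
    """Return the top_n highest-scoring drugs that have annotated targets."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected = []
    for drug, score in ranked:
        targets = drug_targets.get(drug)
        if not targets:
            continue  # skip drugs without target annotation
        selected.append((drug, score, targets))
        if len(selected) == top_n:
            break
    return selected


# Toy data; drug and protein labels are placeholders.
scores = {"drug_a": 0.9, "drug_b": 0.8, "drug_c": 0.7}
drug_targets = {"drug_a": ["P1"], "drug_c": ["P2", "P3"]}
print(select_reference_drugs(scores, drug_targets, top_n=2))
```

Filtering while walking down the ranked list, rather than truncating first, keeps the result at top_n entries even when some high-scoring drugs lack target data.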

2.1.5. Data for the TCM Components

The components of a TCM formula were collected from TCMD [37] and TCMSP [38] and were supplemented and refined using the literature in CNKI and PubMed (1979–2017). The name, structure, and SMILES string of each component were recorded. Components that appeared more than once under synonyms were deduplicated with the “full structure” algorithm in “ChemBioFinder for Office 12.0”.
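
Structure-based deduplication of this kind can be sketched as follows. This Python illustration is not the ChemBioFinder “full structure” algorithm: it assumes that every SMILES string has already been canonicalized by a cheminformatics tool, so that identical structures have identical strings, and the function name and the toy records are hypothetical.

```python
def deduplicate_components(records):
    """Merge (name, smiles) records that share the same structure string."""
    merged = {}
    for name, smiles in records:
        key = smiles.strip()
        if key in merged:
            # Same structure seen before: keep the extra name as a synonym.
            merged[key]["synonyms"].append(name)
        else:
            merged[key] = {"name": name, "smiles": key, "synonyms": []}
    return list(merged.values())


# Toy records; two names refer to the same structure.
records = [
    ("ethanol", "CCO"),
    ("acetic acid", "CC(=O)O"),
    ("ethyl alcohol", "CCO"),
]
for component in deduplicate_components(records):
    print(component)
```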

2.1.6. Determination of Bioactive Components of a TCM

The three-dimensional structure of each target protein was downloaded from the PDB (https://www.rcsb.org/pdb/home/home.do), and structures that had active ligands and higher resolution were preferentially selected. The preprocessing of the target protein included the deletion of ligands, water, and redundant protein conformations; the completion of missing or incomplete residues; the addition of hydrogens; and the assignment of the related charges. The amino acids in the target protein that interact with the ligand were selected and defined as the active pocket. Each component was converted into a three-dimensional structure, assigned the CHARMM force field, and protonated in accordance with the corresponding pH. Molecular docking was carried out with LibDock [39], and the parameter settings were as follows: the “Conformation Method” was “BEST,” the “Docking Preferences” was “High Quality,” and the other parameters were set to their defaults. With the “LibDock Score” as the reference, the components that scored higher than the ligand and ranked in the top 10 were considered the bioactive components of the TCM.
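
The final selection criterion (a score higher than that of the reference ligand and a rank within the top 10) can be sketched as follows. This is a minimal Python illustration for a single target protein; the docking itself is not reproduced, and the function name, component labels, and scores are hypothetical.

```python
def select_bioactive(component_scores, ligand_score, top_n=10):
    """Return components in the top_n by score that also beat the ligand."""
    ranked = sorted(component_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked[:top_n] if score > ligand_score]


# Toy docking scores for one target; values are illustrative only.
component_scores = {"c1": 120.0, "c2": 95.5, "c3": 130.2, "c4": 80.0}
print(select_bioactive(component_scores, ligand_score=100.0, top_n=3))
```

Applying both conditions means that a target whose ligand scores very high may yield fewer than ten components, or none.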

2.2. The Application of GEPSI on SWT

Si-Wu-Tang (SWT) is a well-known TCM formula prepared from four medicinal herbs: Rehmanniae Radix Praeparata (Rehmannia glutinosa Libosch.), Angelicae Sinensis Radix (Angelica sinensis (Oliv.) Diels), Paeoniae Radix Alba (Paeonia lactiflora Pall.), and Chuanxiong Rhizoma (Ligusticum chuanxiong Hort.). SWT and its series of decoctions (e.g., the Xiang-Fu-Si-Wu decoction and the Tao-Hong-Si-Wu decoction) have been widely used in clinical gynecology practice for blood stasis syndrome, such as primary dysmenorrhea, breast cancer, and other estrogen-related diseases [40–44]. This study applied the gene expression profile similarity-based method for identifying TCM bioactive components to SWT. In GEO, the accession number of the gene expression dataset for MCF-7 cells treated with SWT (0.0256 mg/mL, 0.256 mg/mL, and 2.56 mg/mL) is GSE23610. GSE23610 was obtained on the GPL570 platform (HG-U133_Plus_2) [45]. A total of 3905 gene expression profiles were selected in CMap for the same cell line (MCF-7) and platform (HG-U133_Plus_2). These gene expression profiles involved 1294 small molecule drugs. In addition, 98 components of Rehmanniae Radix Praeparata, 215 components of Angelicae Sinensis Radix, 85 components of Paeoniae Radix Alba, and 258 components of Chuanxiong Rhizoma were collected. The collected components are listed in Supplemental Information 1.

3. Results and Discussions

3.1. Up- and Downregulated Genes

At SWT concentrations of 0.0256 mg/mL and 0.256 mg/mL, gene expression did not change appreciably, whereas at 2.56 mg/mL it changed markedly. Therefore, the gene expression profile elicited by SWT at 2.56 mg/mL was chosen for follow-up research.

A t-test and a false discovery rate (FDR) correction for multiple hypothesis testing were applied to each gene. A large number of genes were found to be significantly differentially expressed; 442 genes were upregulated and 189 were downregulated (Supplemental Information 2).

3.2. The Small Molecule Drugs with High Similarity Scores

After the similarity was computed, the similarity scores between the gene expression profiles of the 1294 small molecule drugs and that of SWT were obtained. The top ten small molecule drugs that had an explicit pharmacological action and whose target proteins were contained in DrugBank were retained. The results are shown in Table 1.

3.3. The Primary Pharmacological Actions of the Top Ten Small Molecule Drugs

The primary pharmacological actions of the top ten small molecule drugs in Table 1 were investigated in the literature. The results are shown in Table 2.

Table 2 shows that the pharmacological actions of all ten small molecule drugs involve diseases caused by an unbalanced estrogen level. With the exception of phenoxybenzamine and equilin, the primary pharmacological actions of the small molecule drugs were closely related to the treatment of breast cancer. Most of these eight drugs have an estrogenic effect. For example, resveratrol and genistein are phytoestrogens; estradiol is a natural estrogen that is secreted by mature ovarian follicles; and diethylstilbestrol is an estrogen that is a common endocrine medication for breast cancer. It is generally thought that the occurrence of breast cancer is related to an excessive or imbalanced level of estrogen in the female body [46], and the regulation of immunity is an important method for the treatment of cancer. In summary, SWT may have an anti-breast cancer effect because its gene expression profile has a high similarity score with those of the top ten small molecule drugs.

3.4. The Target Proteins That Were Used in Molecular Docking

Of the top ten small molecule drugs, only the target proteins of four drugs have a three-dimensional structure in the PDB with a high resolution and corresponding bioactive ligands. Therefore, the target proteins of these four drugs were used for molecular docking studies (Table 3).

3.5. Bioactive Components of SWT

After molecular docking, the components whose LibDock scores were higher than that of the ligand were identified as bioactive components (the LibDock score of the ligand is shown in Supplemental Information 3). This study identified 46 bioactive components: 12 in Paeoniae Radix Alba, 4 in Chuanxiong Rhizoma, 6 in Angelicae Sinensis Radix, and 24 in Rehmanniae Radix Praeparata (Table 4). The 46 bioactive components act on 9 target proteins.

3.6. Verification of the Reliability of GEPSI

Table 4 shows that SWT may exert anti-breast cancer activity through 46 bioactive components that act on 9 target proteins. Most of the bioactive components, such as catalpol, verbascoside, and paeoniflorin, acted on multiple targets, and the types and numbers of targets that the bioactive components acted on were diverse. If only one or a few proteins are used as targets, the bioactive components retrieved may be incomplete. For example, 19 bioactive components, such as catalpol, aucubin, and melittoside, would not be retrieved if ISGO were the only target protein. Therefore, all of the targets that a TCM could affect should be considered when a comprehensive screening of bioactive components is carried out.

The proteins in Table 4 were the targets of resveratrol, diethylstilbestrol, estradiol, and genistein. According to the literature, these four compounds are estrogens or have an estrogen-like effect, and resveratrol and genistein have potent anti-breast cancer activity. Hence, the 46 bioactive components of SWT may also have anti-breast cancer and estrogen-like effects. The naturally occurring iridoid catalpol is a Taq DNA polymerase inhibitor, and its analogs bearing one to three silyl ether groups showed antiproliferative activity against a panel of six human solid tumor cell lines, with GI50 values in the range 1.8–4.8 μM; cell cycle studies revealed an arrest in the G0/G1 phase that was consistent with DNA polymerase inhibition [47]. Orientin suppressed the proliferation of MCF-7 cells in a dose-dependent manner [48, 49]. Paeoniflorin suppressed the proliferation and spread of breast cancer cells through the Notch-1 pathway [50]. Trigalacturonic acid inhibited the proliferation of Bcap-3 breast cancer cells and may have anti-breast cancer potential [51]. In summary, the literature shows that some of the identified bioactive components do have the same effects as the small molecule drugs, which supports the reliability of the bioactive component identification method based on the similarity of gene expression profiles.

4. Conclusion

Because GEPSI uses the proteins that a TCM actually acts on as targets, it was more accurate, and its results were more comprehensive, than target selection based on disease-related target databases or signal transduction networks. For example, there are 74 breast cancer-related targets in the Therapeutic Target Database (TTD), including the estrogen receptor (ER), the vascular endothelial growth factor receptor 1 (VEGFR1), and the epidermal growth factor receptor (EGFR). However, no evidence was available to support the selection of these proteins as targets of SWT. This study identified the target proteins that SWT actually acted on by a gene expression similarity comparison, identified the bioactive components of SWT by molecular docking, and then verified the reliability of the method through a literature investigation. GEPSI could serve as a rapid and effective method for the identification of TCM bioactive components. Although some time is needed to complete the related databases, such as those of TCM components and of the target protein structures of small molecule drugs, we believe that the data used in GEPSI will become more complete and the results more accurate with the development of chemical informatics and bioinformatics.

This study also suggested that SWT may have anti-breast cancer efficacy, although this effect has not yet been studied directly. The Tao-Hong-Si-Wu decoction, a derivative formula, was shown to influence upper limb swelling after breast cancer surgery and the quality of life of chemotherapy patients [52, 53]. Research has shown that Paeoniae Radix Alba, Chuanxiong Rhizoma, Rehmanniae Radix Praeparata, and SWT have phytoestrogen-like effects, but the bioactive components have not been identified [54]. These studies indirectly support the plausibility of an anti-breast cancer effect of SWT. That is to say, GEPSI can also be used for drug repositioning. Now that the bioactive components have been identified, they can be used to control the quality of the individual herbs and to design an anti-breast cancer drug combination based on the bioactive components in SWT.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 81673697).

Supplementary Materials

Supplementary Materials 1: the 656 components of SWT that were used for molecular docking. Supplementary Materials 2: the 442 upregulated genes and 189 downregulated genes. Supplementary Materials 3: the LibDock score of the ligand.