GEPSI: A Gene Expression Profile Similarity-Based Identification Method of Bioactive Components in Traditional Chinese Medicine Formula
The identification of bioactive components in traditional Chinese medicine (TCM) is an important part of research on the material basis of TCM. Recently, molecular docking has been used extensively to identify TCM bioactive components. However, the target proteins used in molecular docking may not be the actual targets of the TCM, so bioactive components may be missed or incorrectly identified. To address this problem, this study proposed the GEPSI method, which identifies the target proteins of a TCM based on the similarity of gene expression profiles. The similarity between the gene expression profiles induced by the TCM and by small molecule drugs was calculated. The pharmacological action of a TCM may be similar to that of small molecule drugs with high similarity scores, so the target proteins of those drugs can be considered TCM targets. We then identified the bioactive components of a TCM by molecular docking against these targets and verified the reliability of the method through a literature investigation. Using the proteins that the TCM actually affected as targets made the identification of bioactive components more accurate. This study provides a fast and effective method for the identification of TCM bioactive components.
1. Introduction
Identifying the bioactive components of traditional Chinese medicine (TCM) within complex mixtures is a critical challenge in TCM research. Because it is intuitive and efficient, molecular docking has become an important means of identifying TCM bioactive components. Docking-based identification requires one or more target proteins and a set of components to be screened; the components that specifically act on the target proteins are identified as the TCM bioactive components. In the screening process, a single target or multiple targets are chosen, usually targets associated with a specific disease. Targets are generally chosen from a database of disease-associated targets, from key nodes in a signal transduction network, or from the literature [1–3]. Because diseases are complex, many targets may be associated with a single disease. Consequently, the selected target proteins may not be the actual targets affected by the TCM, or it may not be possible to screen against all of the associated targets. As a result, the components obtained by molecular docking may not be the ones that actually act on the disease, and genuinely bioactive components may be left out.
The development of chemical informatics and bioinformatics has led to the accumulation of data on TCM components, target proteins, and gene expression profiles. Guided by the ideas of systems pharmacology, this study proposed a method for determining the target proteins of a TCM and then identifying its bioactive components by molecular docking. This method is designated the gene expression profile similarity-based identification (GEPSI) method. The basic concept is to select the gene expression profiles of small molecule drugs in CMap that are most comparable with the gene expression profile of a TCM and to calculate the similarity between the profile of the TCM and those of the small molecule drugs. The target proteins of the small molecule drugs with higher similarity scores can be considered TCM targets. Virtual screening of the TCM components against these target proteins then identifies the bioactive components. Because it takes the entirety of the TCM components and all of the affected genes as its object, this method embodies the holistic thinking of TCM research more concretely. It provides an effective means for the identification of TCM bioactive components and could serve as a basis for drug repositioning, quality control, and TCM drug design.
2. Methods and Materials
2.1. Principle of GEPSI
Both TCMs and small molecule drugs can affect gene expression. By comparing the gene expression profiles before and after treatment with a TCM or a small molecule drug, up- and downregulated differentially expressed genes can be identified. The up- and downregulated genes affected by the TCM and by each small molecule drug can then be compared to obtain a similarity score. If the similarity score is high, the TCM and the small molecule drug may have similar pharmacological actions, and the target proteins of the small molecule drugs with higher similarity scores can be considered targets of the TCM. Using these proteins as targets, the bioactive components of the TCM can finally be identified by molecular docking. Each step of the GEPSI method (Figure 1) is discussed in detail below.
2.1.1. Gene Expression Profile Data
The Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) contains gene expression profiles from samples treated with individual components, herbs, and TCM formulae. These data can be used for TCM-related research. The Connectivity Map (CMap, https://www.broadinstitute.org/cmap/) is a gene expression profile database for small molecule drugs. CMap establishes relations among small molecule drugs, genes, and diseases according to the gene expression differences in human tumor cell lines after treatment with small molecule drugs. Because it compares the similarity of different gene expression profiles, CMap is mainly applied in drug development, for example in drug repositioning. Different human tumor cell lines (HL60, MCF7, PC3, SKMEL5, and ssMCF7) were treated with small molecule drugs at different concentrations (10 nM, 100 nM, 1 μM, and 10 μM) for different times (6 h, 12 h). At present, CMap contains data for 1309 small molecule drugs and more than 7000 gene expression profiles. Of these 1309 drugs, 556 are recorded in DrugBank, and 522 of those 556 have target protein data. This study selected the CMap gene expression profiles that were obtained with the same cell line and platform as the TCM gene expression profile.
2.1.2. The Determination of Up and Downregulated Genes
The differentially expressed genes were determined using the Bioinformatics Toolbox of Matlab. A t-test and a false discovery rate (FDR) correction for multiple hypothesis testing were applied to each gene. Significant differentially expressed genes were detected by random sample replacement (, FDR ≤ 0.1). Up- and downregulated genes were distinguished by the magnitude of the fold change (FC). A significant differentially expressed gene was classified as upregulated if FC ≥ 2 and as downregulated if FC ≤ 0.5.
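The thresholding step above can be illustrated with a minimal Python sketch. It is not the Matlab procedure used in this study: it assumes that per-gene p-values and fold changes have already been computed, and it uses the Benjamini-Hochberg procedure as a stand-in FDR correction, which the text does not specify. The function names and the example gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as an illustrative FDR correction.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(genes, pvalues, fold_changes,
                   fdr_cutoff=0.1, up_fc=2.0, down_fc=0.5):
    """Split significant genes into up- and downregulated lists.

    A gene is kept if its adjusted p-value is <= fdr_cutoff; it is called
    upregulated if FC >= up_fc and downregulated if FC <= down_fc.
    """
    qvalues = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, q, fc in zip(genes, qvalues, fold_changes):
        if q > fdr_cutoff:
            continue  # not significant after FDR correction
        if fc >= up_fc:
            up.append(gene)
        elif fc <= down_fc:
            down.append(gene)
    return up, down


if __name__ == "__main__":
    genes = ["gene_a", "gene_b", "gene_c", "gene_d"]  # hypothetical labels
    up, down = classify_genes(genes, [0.001, 0.02, 0.5, 0.003],
                              [3.0, 0.4, 5.0, 1.2])
    print("up:", up, "down:", down)
```

Genes that pass the FDR cutoff but have a fold change between 0.5 and 2 fall into neither list, which matches the two-sided FC rule stated above.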
2.1.3. The Similarity Computation of the Gene Expression Profile
The up- and downregulated genes were used as the base data to calculate the gene expression profile similarity, which was computed automatically in CMap by the K-S algorithm [34, 36]. The comparison yielded a similarity score between the gene expression profile of each small molecule drug and that of the TCM. Similarity scores fell between −1 and 1. If 0 ≤ similarity score ≤ 1, the pharmacological actions of the small molecule drug and the TCM were similar, and a higher absolute value indicated greater similarity; if −1 ≤ similarity score ≤ 0, the pharmacological actions of the small molecule drug and the TCM were opposite, and a higher absolute value indicated a stronger opposing effect, that is, less similarity.
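To make the scoring idea concrete, the following Python sketch computes a Kolmogorov-Smirnov style enrichment score in the manner commonly described for CMap queries. It is an illustration under stated assumptions, not the CMap implementation used in this study: the drug profile is represented as a hypothetical mapping from gene to rank (rank 1 = most upregulated by the drug), and the raw score returned here is not yet scaled to the −1 to 1 range, which in CMap is obtained by normalizing across all drug instances.

```python
def ks_enrichment(tag_ranks, n_genes):
    """K-S style enrichment of a gene set within a ranked list.

    tag_ranks: 1-based ranks of the query genes in the drug's ranked profile.
    Returns a positive value if the genes cluster near the top of the list
    and a negative value if they cluster near the bottom.
    """
    t = len(tag_ranks)
    if t == 0:
        return 0.0
    v = sorted(tag_ranks)
    a = max((j + 1) / t - v[j] / n_genes for j in range(t))
    b = max(v[j] / n_genes - j / t for j in range(t))
    return a if a > b else -b


def raw_similarity(rank_of, up_genes, down_genes):
    """Raw (unnormalized) similarity between a query signature and a drug.

    rank_of: hypothetical dict mapping each gene to its rank in the drug
    profile. Returns 0 when the up and down sets are enriched in the same
    direction, since the evidence is then contradictory.
    """
    n = len(rank_of)
    ks_up = ks_enrichment([rank_of[g] for g in up_genes if g in rank_of], n)
    ks_down = ks_enrichment([rank_of[g] for g in down_genes if g in rank_of], n)
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


if __name__ == "__main__":
    # Ten hypothetical genes, g1 most upregulated and g10 most downregulated.
    rank_of = {f"g{i}": i for i in range(1, 11)}
    print(raw_similarity(rank_of, ["g1", "g2"], ["g9", "g10"]))   # similar
    print(raw_similarity(rank_of, ["g9", "g10"], ["g1", "g2"]))   # opposite
```

A positive value means that the genes upregulated by the TCM sit near the top of the drug profile and the downregulated genes near the bottom; swapping the two sets flips the sign.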
2.1.4. Determination of the TCM Target Proteins
If the similarity score between the gene expression profiles of a small molecule drug and the TCM was high, their pharmacological actions were considered similar, and the target proteins of that small molecule drug were considered TCM targets. This study only considered the top 10 small molecule drugs that had a definite pharmacological action and whose target proteins were recorded in DrugBank version 4.3.
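The selection rule can be sketched in a few lines of Python. The sketch assumes two hypothetical inputs, a dictionary of similarity scores per drug and a dictionary of recorded targets per drug; the check for a "definite pharmacological action" is a manual curation step in this study and is not modeled here.

```python
def select_tcm_targets(similarity, drug_targets, top_n=10):
    """Return the top_n most similar drugs with recorded targets, and the
    union of their targets as candidate TCM targets.

    similarity: hypothetical dict {drug name: similarity score}
    drug_targets: hypothetical dict {drug name: list of target proteins}
    """
    # Rank drugs from the highest to the lowest similarity score.
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    # Keep positively scoring drugs that have at least one recorded target.
    selected = [d for d in ranked
                if similarity[d] > 0 and drug_targets.get(d)][:top_n]
    targets = set()
    for drug in selected:
        targets.update(drug_targets[drug])
    return selected, targets


if __name__ == "__main__":
    similarity = {"drug_a": 0.9, "drug_b": 0.8, "drug_c": 0.7, "drug_d": -0.5}
    drug_targets = {"drug_a": ["T1", "T2"], "drug_c": ["T2", "T3"],
                    "drug_d": ["T4"]}
    print(select_tcm_targets(similarity, drug_targets, top_n=2))
```

Drugs without recorded targets are skipped rather than counted toward the top N, so the list is filled from the next most similar drugs.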
2.1.5. Data for the TCM Components
The components of a TCM formula were collected from TCMD and TCMSP and were supplemented from the literature in CNKI and PubMed (1979–2017). The name, structure, and SMILES string of each component were recorded. For components with synonyms, duplicates were removed using the “full structure” algorithm in “ChemBioFinder for Office 12.0”.
2.1.6. Determination of Bioactive Components of a TCM
The three-dimensional structure of each target protein was downloaded from the PDB (https://www.rcsb.org/pdb/home/home.do); structures with active ligands and higher resolution were preferred. Preprocessing of the target protein included the deletion of ligands, water, and redundant protein conformations; the completion of missing or incomplete residues; the addition of hydrogens; and the assignment of charges. The amino acids in the target protein that interact with the ligand were selected and defined as the active pocket. Each component was converted into a three-dimensional structure, assigned the CHARMM force field, and protonated according to the corresponding pH. Molecular docking was carried out with LibDock, and the parameter settings were as follows: the “Conformation Method” was “BEST,” the “Docking Preferences” was “High Quality,” and the other parameters were left at their defaults. With the “LibDock Score” as the reference, the components that scored higher than the ligand and ranked in the top 10 were considered the bioactive components of the TCM.
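The final filtering rule, applied per target protein, can be written as a short Python sketch. It assumes the docking scores have already been exported from the docking software into a hypothetical dictionary of component name to LibDock score; the component names and scores in the example are made up for illustration.

```python
def select_bioactive(component_scores, ligand_score, top_n=10):
    """Return components that rank in the top_n by docking score and also
    score higher than the reference ligand, best first.

    component_scores: hypothetical dict {component name: docking score}
    ligand_score: docking score of the co-crystallized reference ligand
    """
    ranked = sorted(component_scores.items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked[:top_n] if score > ligand_score]


if __name__ == "__main__":
    scores = {"c1": 120.0, "c2": 95.5, "c3": 130.2, "c4": 80.0}
    print(select_bioactive(scores, ligand_score=100.0, top_n=3))
```

Both conditions must hold, so fewer than top_n components are returned when the reference ligand outscores some of the top-ranked components.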
2.2. The Application of GEPSI on SWT
Si-Wu-Tang (SWT) is a well-known TCM formula prepared from four medicinal herbs: Rehmanniae Radix Praeparata (Rehmannia glutinosa Libosch.), Angelicae Sinensis Radix (Angelica sinensis (Oliv.) Diels), Paeoniae Radix Alba (Paeonia lactiflora Pall.), and Chuanxiong Rhizoma (Ligusticum chuanxiong Hort.). SWT and its series of decoctions (i.e., the Xiang-Fu-Si-Wu decoction and the Tao-Hong-Si-Wu decoction) have been widely used in clinical gynecology practice for blood stasis syndrome, such as primary dysmenorrhea, breast cancer, and other estrogen-related diseases [40–44]. This study applied the gene expression profile similarity-based identification method to SWT. In GEO, the accession number of the gene expression dataset for MCF-7 cells treated with SWT (0.0256 mg/mL, 0.256 mg/mL, and 2.56 mg/mL) is GSE23610, which was obtained on the GPL570 platform (HG-U133_Plus_2). A total of 3905 gene expression profiles were selected in CMap for the same cell line (MCF-7) and platform (HG-U133_Plus_2). These gene expression profiles involved 1294 small molecule drugs. In addition, 98 components of Rehmanniae Radix Praeparata, 215 components of Angelicae Sinensis Radix, 85 components of Paeoniae Radix Alba, and 258 components of Chuanxiong Rhizoma were collected. The collected components are listed in Supplemental Information 1.
3. Results and Discussions
3.1. Up- and Downregulated Genes
At SWT concentrations of 0.0256 mg/mL and 0.256 mg/mL, the expression of each gene did not obviously change, but when the SWT concentration was 2.56 mg/mL, the expression of each gene obviously changed. Therefore, the gene expression profile that was elicited by SWT (2.56 mg/mL) was chosen for follow-up research.
A t-test and a false discovery rate (FDR) multiple hypothesis test were applied to each gene. A large number of genes were differentially expressed: 442 genes were upregulated and 189 were downregulated (Supplemental Information 2).
3.2. The Small Molecule Drugs with High Similarity Scores
After the similarity was computed, the similarity scores between the gene expression profiles of 1294 small molecule drugs and that of SWT were obtained. The top ten small molecule drugs that had an explicit pharmacological action and whose target proteins were recorded in DrugBank were retained. The results are shown in Table 1.
3.3. The Primary Pharmacological Actions of the Top Ten Small Molecule Drugs
Table 2 shows that the pharmacological actions of the ten small molecule drugs all involve diseases caused by an unbalanced estrogen level. Except for phenoxybenzamine and equilin, the primary pharmacological action of the remaining eight small molecule drugs was closely related to the treatment of breast cancer. Most of the eight drugs have an estrogenic effect. For example, resveratrol and genistein are phytoestrogens; estradiol is a natural estrogen that is secreted by mature ovarian follicles; and diethylstilbestrol is an estrogen that is commonly used as an endocrine medication for breast cancer. The occurrence of breast cancer is generally thought to be related to an excessive or imbalanced level of estrogen in the female body, and the regulation of immunity is an important method for the treatment of cancer. In summary, SWT may have an anti-breast cancer effect because it has a high similarity score with the top ten small molecule drugs.
3.4. The Target Proteins That Were Used in Molecular Docking
Of the top ten small molecule drugs, only the target proteins of four drugs have a three-dimensional structure in the PDB with a high resolution and corresponding bioactive ligands. Therefore, the target proteins of these four drugs were used for molecular docking studies (Table 3).
3.5. Bioactive Components of SWT
After molecular docking, the components whose LibDock scores were higher than that of the ligand were identified as bioactive components (the LibDock score of the ligand is shown in Supplemental Information 3). This study identified 46 bioactive components: 12 in Paeoniae Radix Alba, 4 in Chuanxiong Rhizoma, 6 in Angelicae Sinensis Radix, and 24 in Rehmanniae Radix Praeparata (Table 4). The 46 bioactive components act on 9 target proteins.
3.6. Verification of the Reliability of GEPSI
Table 4 shows that SWT has anti-breast cancer activity through 46 bioactive components and that these components act on 9 target proteins. Most of the bioactive components, such as catalpol, verbascoside, and paeoniflorin, acted on multiple targets, and the types and numbers of targets differed among components. If only one or a few proteins are used as targets, the set of bioactive components retrieved may be incomplete. For example, 19 bioactive components, such as catalpol, aucubin, and melittoside, would not be retrieved if ISGO were the only target protein. Therefore, all the targets that a TCM could affect should be considered when a comprehensive screening of bioactive components is carried out.
The proteins in Table 4 are the targets of resveratrol, diethylstilbestrol, estradiol, and genistein. According to the literature, these four drugs are estrogens or have estrogen-like effects, and resveratrol and genistein have potent anti-breast cancer activity. Hence, the 46 bioactive components of SWT may also have anti-breast cancer and estrogen-like effects. Studies showed that catalpol, a DNA polymerase inhibitor, inhibited the proliferation of six human solid tumor cell lines by acting during the G0/G1 phase. The naturally occurring iridoid catalpol is a Taq DNA polymerase inhibitor; however, the formation of analogs bearing one to three silyl ether groups led to compounds that were antiproliferative against a panel of six human solid tumor cell lines, with GI50 values in the range 1.8–4.8 μM. Cell cycle studies revealed an arrest in the G0/G1 phase that was consistent with DNA polymerase inhibition. Orientin suppressed the proliferation of MCF-7 cells with a specific dose-response relationship [48, 49]. Paeoniflorin suppressed the proliferation and spread of breast cancer cells through the Notch-1 pathway. Trigalacturonic acid showed a better inhibitory effect on the proliferation of Bcap-3 breast cancer cells and may have anti-breast cancer potential. In summary, the literature shows that some of the identified bioactive components do have the same effects as the small molecule drugs, which supports the reliability of the bioactive component identification method based on the similarity of gene expression profiles.
Because it uses the proteins that the TCM actually acted on as targets, this method was more accurate, and its results were more comprehensive, than determining target proteins from disease-related target databases and signal transduction networks. For example, there are 74 breast cancer-related targets in the Therapeutic Target Database (TTD), including the estrogen receptor (ER), the vascular endothelial growth factor receptor 1 (VEGFR1), and the epidermal growth factor receptor (EGFR). However, no evidence was available to support the selection of these proteins as targets of SWT. This study identified the target proteins that SWT actually acted on by a gene expression similarity comparison, identified the bioactive components of SWT by molecular docking, and then verified the reliability of the method through a literature investigation. GEPSI could serve as a rapid and effective method for the identification of TCM bioactive components. Although some time is necessary to perfect the related databases, such as those of TCM components and of the target protein structures of small molecule drugs, we believe that the data used in GEPSI will become more complete and the results more accurate with the development of chemical informatics and bioinformatics.
This study has also suggested that SWT has anti-breast cancer efficacy, although there have been no direct studies of this effect. The Tao-hong Si-wu Decoction, a derivative formula, has been reported to affect upper limb swelling after breast cancer surgery and the quality of life of chemotherapy patients [52, 53]. Research has shown that Paeoniae Radix Alba, Chuanxiong Rhizoma, Rehmanniae Radix Praeparata, and SWT have phytoestrogen-like effects, but the bioactive components have not been identified. These studies indirectly support the plausibility of an anti-breast cancer effect of SWT. That is to say, GEPSI can also be used for drug repositioning. Now that the bioactive components have been identified, the quality of the individual herbs can be controlled, and an anti-breast cancer drug combination can be designed based on the bioactive components in SWT.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
This work was supported by the National Natural Science Foundation of China (Grant no. 81673697).
Supplementary Materials 1: the 656 components of SWT that were used for molecular docking. Supplementary Materials 2: the 442 upregulated genes and 189 downregulated genes. Supplementary Materials 3: the LibDock score of the ligand.
L. Chen, J. Du, Q. Dai, H. Zhang, W. Pang, and J. Hu, “Prediction of anti-tumor chemical probes of a traditional Chinese medicine formula by HPLC fingerprinting combined with molecular docking,” European Journal of Medicinal Chemistry, vol. 83, pp. 294–306, 2014.
H. Chen, C. Z. Geng, G. Kuang et al., “Diethylstilbestrol intervention carcinogenesis of breast cancer in wistar rats,” Chinese Journal of Cancer, vol. 26, no. 6, pp. 596–600, 2007.
J. Geisler, B. Haynes, G. Anker et al., “Treatment with high-dose estrogen (diethylstilbestrol) significantly decreases plasma estrogen and androgen levels but does not influence in vivo aromatization in postmenopausal breast cancer patients,” The Journal of Steroid Biochemistry and Molecular Biology, vol. 96, no. 5, pp. 415–422, 2005.
D. Monaghan, E. O'Connell, F. L. Cruickshank et al., “Inhibition of protein synthesis and JNK activation are not required for cell death induced by anisomycin and anisomycin analogues,” Biochemical and Biophysical Research Communications, vol. 443, no. 2, pp. 761–767, 2014.
P. C. J. Schmeits, M. R. Katika, A. A. C. M. Peijnenburg, H. van Loveren, and P. J. M. Hendriksen, “DON shares a similar mode of action as the ribotoxic stress inducer anisomycin while TBTO shares ER stress patterns with the ER stress inducer thapsigargin based on comparative gene expression profiling in Jurkat T cells,” Toxicology Letters, vol. 224, no. 3, pp. 395–406, 2014.
H. F. Zhang, D. Z. Qian, Y. S. Tan et al., “Digoxin and other cardiac glycosides inhibit HIF-1a synthesis and block tumor growth,” PNAS, vol. 105, no. 50, pp. 19579–19586, 2008.
D. Samanta, D. M. Gilkesa, P. Chaturvedia, L. Xiang, and G. L. Semenza, “Hypoxia-inducible factors are required for chemotherapy resistance of breast cancer stem cells,” Proceedings of the National Academy of Sciences of the United States of America, vol. 111, no. 50, pp. E5429–E5438, 2014.
J. G. M. C. Damoiseaux, R. Theunissen, C. P. M. Broeren, P. J. C. Van Breda Vriesman, and A. M. Duijvestijn, “Comparison of detection techniques for cytokine reverse transcriptase polymerase chain reaction; Digoxigenin-labeled polymerase chain reaction permits sensitive detection of cytokine mRNA in rat heart allografts,” Journal of Immunological Methods, vol. 217, no. 1-2, pp. 185–193, 1998.
T. H. Kim, Y. J. Shin, A. J. Won et al., “Resveratrol enhances chemosensitivity of doxorubicin in multidrug-resistant human breast cancer cells via increased cellular influx of doxorubicin,” Biochimica et Biophysica Acta (BBA) - General Subjects, vol. 1840, no. 1, pp. 615–625, 2014.
P. Chalasani, A. Stopeck, K. Clarke, and R. Livingston, “A pilot study of estradiol followed by exemestane for reversing endocrine resistance in postmenopausal women with hormone receptor-positive metastatic breast cancer,” The Oncologist, vol. 19, no. 11, pp. 1127-1128, 2014.
K. B. Wisinski, W. Xu, A. J. Tevaarwerk et al., “Targeting Estrogen Receptor Beta in a Phase 2 Study of High-Dose Estradiol in Metastatic Triple-Negative Breast Cancer: A Wisconsin Oncology Network Study,” Clinical Breast Cancer, vol. 16, no. 4, pp. 256–261, 2016.
L. Silvey, J. T. Carpenter Jr., R. H. Wheeler, J. Lee, and C. Conolley, “A randomized comparison of haloperidol plus dexamethasone versus prochlorperazine plus dexamethasone in preventing nausea and vomiting in patients receiving chemotherapy for breast cancer,” Journal of Clinical Oncology, vol. 6, no. 9, pp. 1397–1400, 1988.
M. Markman, V. Sheidler, D. S. Ettinger, S. A. Quaskey, and E. D. Mellits, “Antiemetic efficacy of dexamethasone. Randomized, double-blind, crossover study with prochlorperazine in patients receiving cancer chemotherapy,” The New England Journal of Medicine, vol. 311, no. 9, pp. 549–552, 1984.
M. Kogiso, T. Sakai, K. Mitsuya, T. Komatsu, and S. Yamamoto, “Genistein suppresses antigen-specific immune responses through competition with 17β-estradiol for estrogen receptors in ovalbumin-immunized BALB/c mice,” Nutrition Journal, vol. 22, no. 7-8, pp. 802–809, 2006.
J. S. Strobl, K. L. Kirkwood, T. K. Lantz, M. A. Lewine, V. A. Peterson, and J. F. Worley, “Inhibition of Human Breast Cancer Cell Proliferation in Tissue Culture by the Neuroleptic Agents Pimozide and Thioridazine,” Cancer Research, vol. 50, no. 17, pp. 5399–5405, 1990.
M. T. Ene, “Nonparametric Statistical Methods,” Statistics in Medicine, vol. 19, no. 10, pp. 1386–1388, 1999.
S. Sarvagalla, V. K. Singh, Y.-Y. Ke et al., “Identification of ligand efficient, fragment-like hits from an HTS library: Structure-based virtual screening and docking investigations of 2H- and 3H-pyrazolo tautomers for Aurora kinase A selectivity,” Journal of Computer-Aided Molecular Design, vol. 29, no. 1, pp. 89–100, 2015.
P. Liu, W. Li, Z.-H. Li et al., “Comparisons of pharmacokinetic and tissue distribution profile of four major bioactive components after oral administration of Xiang-Fu-Si-Wu Decoction effective fraction in normal and dysmenorrheal symptom rats,” Journal of Ethnopharmacology, vol. 154, no. 3, pp. 696–703, 2014.
L. Liu, H. Y. Ma, Y. P. Tang et al., “Discovery of estrogen receptor a modulators from natural compounds in Si-Wu-Tang series decoctions using estrogen-responsive MCF-7 breast cancer cells,” Bioorganic & Medicinal Chemistry Letters, vol. 22, no. 1, pp. 154–163, 2012.
K.-D. Yu, N.-Y. Rao, A.-X. Chen, L. Fan, C. Yang, and Z.-M. Shao, “A systematic review of the relationship between polymorphic sites in the estrogen receptor-beta (ESR2) gene and breast cancer risk,” Breast Cancer Research and Treatment, vol. 126, no. 1, pp. 37–45, 2011.
W. Jiang, Studies on synthesis and structure characterization and biological activity of orientin-zinc complexes of trollius chinensis bunge, Hebei North University, 2013.
Y. Y. Zhang, T. H. Mu, and M. Zhang, “Effects of modified sweet potato pectins on the proliferation of cancer cells,” Scientia Agricultura Sinica, vol. 45, no. 9, pp. 1798–1806, 2012.
S. F. Yan, S. X. Wang, Y. B. Zhou et al., “Treatment of 91 cases of edematous upper limb following mammary cancer surgery by, Taohong Siwu Dicoction,” Shanghai Journal of Traditional Chinese Medicine, vol. 43, no. 6, pp. 39-40, 2009.
J. Dong, “Effects of Tao-hong Si-wu Decoction Combined with Neoadjuvant Chemotherapy on,” Guiding Journal of Traditional Chinese Medicine and Pharmacy, vol. 20, no. 5, pp. 41–43, 2014.
Q. Hao, J. Wang, J. Niu et al., “Study on phytoestrogenic-like effects of four kinds of Chinese medicine including Radix Rehmanniae Preparata, Radix Paeoniae Alba, Radix Angelicae Sinensis, Rhizoma Chuanxiong,” China Journal of Chinese Materia Medica, vol. 34, no. 5, pp. 620–624, 2009.