Six-Gene Signature Associated with Immune Cells in the Progression of Atherosclerosis Discovered by Comprehensive Bioinformatics Analyses
Background. As a multifaceted disease, atherosclerosis is often characterized by the formation and accumulation of plaque anchored to the inner wall of the arteries and causes some cardiovascular diseases and vascular embolism. Numerous studies have reported on the pathogenesis of atherosclerosis. However, fewer studies focused on both genes and immune cells, and the correlation of genes and immune cells was evaluated via comprehensive bioinformatics analyses. Methods. 29 samples of atherosclerosis-related gene expression profiling, including 16 human advanced atherosclerosis plaque (AA) and 13 human early atherosclerosis plaque (EA) samples from the Gene Expression Omnibus (GEO) database, were analyzed to get differentially expressed genes (DEGs) and the construction of protein and protein interaction (PPI) networks. Besides, we detected the relative fraction of 22 immune cell types in atherosclerosis by using the deconvolution algorithm of “cell type identification by estimating relative subsets of RNA transcripts (CIBERSORT).” Ultimately, based on the significantly changed types of immune cells, we executed the correlation analysis between DEGs and immune cells to discover the potential genes and pathways associated with immune cells. Results. We identified 17 module genes and 6 types of significantly changed immune cells. Correlation analysis showed that the relative percentage of T cell CD8 has negative correlation with the C1QB expression (, ), and the relative percentage of macrophage M2 has positive correlation with the CD86 expression (, ) in EA. Meanwhile, four gene expressions (CD53, C1QC, NCF2, and ITGAM) have a high correlation with the percentages of T cell CD8 and macrophages (M0 and M2) in AA samples. Conclusions. In this study, we suggested that the progression of atherosclerosis might be related to CD86, C1QB, CD53, C1QC, NCF2, and ITGAM and that it plays a role in regulating immune-competent cells such as T cell CD8 and macrophages M0 and M2. These results will enable studies of the potential genes associated with immune cells in the progression of atherosclerosis, as well as provide insight for discovering new treatments and drugs.
Atherosclerosis is a multifaceted, progressive, and chronic inflammatory arterial disease that is recognized to be the leading cause of morbidity and mortality around the world ). It is characterized by the formation and build-up of atherosclerosis plaque inside the damaged arteries [2, 3]. Plaque is composed of low-density lipoprotein (LDL) cholesterol, fat, calcium, and other substances existing in the blood, which can harden and narrow the arteries [4–7]. Many studies have shown that atherosclerosis can affect any arterial blood vessels in the body and can lead to the atherosclerosis-related diseases, including ischemic heart, carotid artery, peripheral artery, and chronic kidney diseases [8–13]. Risk factors include high blood pressure, abnormal cholesterol levels, diabetes, obesity, family genetic history, smoking, age, and an unhealthy lifestyle.
Data mining has been used in various applications, including sequencing , microarray gene expression analysis [15–17], single-nucleotide polymorphism detection [18, 19], and genomic loss and amplification (copy number variation) analysis [20, 21]. Using microarrays, integrated bioinformatics enables researchers to quickly identify differentially expressed target genes between atherosclerosis samples in a single experiment [22, 23]. CIBERSORT is a deconvolution computational method for quantifying immune cell fractions from bulk tissue gene expression profiles. This method can accurately calculate the relative proportion of 22 types of immune cell compositions in lesion samples [24, 25].
The detailed mechanism of the pathogenesis of atherosclerosis is unclear. Although studies have revealed that chronic inflammation can drive atherosclerosis, which is the leading cause of cardiovascular disease as confirmed by molecular and cellular experiments, fewer studies have been conducted to analyze the correlation of genes and immune cells in atherosclerosis-related big data.
In this study, we reanalyzed the GSE28829 dataset reported previously in Doring et al.’s team research  and detected potential target genes for atherosclerosis treatment from the perspective of big data analysis. We firstly identified candidate DEGs and significant immune cells. Then, we explored the correlation between gene expressions and the relative percentages of immune cells to identify potential gene signatures useful for the diagnosis and therapeutic treatment of atherosclerosis.
2. Materials and Methods
2.1. Data Acquisition
The robust multiarray averaging normalized microarray expression profile GSE28829  and affiliated annotation file were downloaded from the National Center Biotechnology Information Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) website , which was tested on the GPL570 platform based on the Affymetrix Human Genome U133 Plus 2.0 array. GSE28829 contains 13 early atherosclerotic plaque samples (EA group) and 16 advanced atherosclerotic plaque samples (AA group) from the human carotid artery. Figure 1(a) provides an overview of the analysis workflow.
2.2. Data Preprocessing
After the GSE28829 expression matrix was downloaded, probe identification was matched to the corresponding gene symbol. For multiprobes to one gene, we retained the probe showing a significant gene expression value after deleting the non-mRNA probe. Based on this gene expression matrix information, we identified the significant differentially expressed genes and immune cells.
2.3. Identification of DEGs
The limma package was utilized to identify differentially expressed genes (DEGs) between advanced atherosclerotic plaques and early atherosclerotic plaque samples in RStudio [27–29]. The criteria were as follows: (1) the adjusted , a moderate -test corrected by Benjamini and Hochberg’s method ; (2) log fold change (FC) of or log fold change (FC) of .
2.4. Gene Ontology and Pathway Analysis
Gene Ontology (GO) is used to describe the roles of genes and gene products in any organism based on existing biological knowledge and is divided into three independent branches: biological process (BP), cellular component (CC), and molecular function (MF) (Harris et al., 2004; Harris et al., 2006). Metabolic pathways and gene signaling networks based on available databases such as KEGG  and Reactome  were used to describe the pathway enrichment analyses. We used the DAVID website (https://david. http://ncifcrf.gov/) for gene annotation and visualization to perform the GO and pathway analysis. , calculated via Fisher’s exact test , was used as the threshold for statistical significance [32, 34].
2.5. PPI Network Construction and Module Analysis
First, the identified DEGs were uploaded to the STRING  (version 11.0) website which includes 2 billion interactions associated with 24.6 million proteins referred to 5090 organs. STRING was used to determine the PPIs between DEG-encoded proteins. Second, the minimum interaction score was set to 0.4. The PPI networks were constructed using Cytoscape software . The built-in Molecular Complex Detection (MCODE), a well-known automated method for detecting highly interconnected subgraphs as molecular complexes or clusters in large PPI networks was utilized to screen the modules in the PPI network. The correlated parameter criteria were set by default, except . Moreover, functional enrichment analysis was performed for DEGs in the significant module with , calculated via Fisher’s exact test , as the cutoff criterion.
2.6. Immune Cell Infiltration Analysis
Normalized gene expression data were utilized to evaluate the relative proportions of 22 types of infiltrating immune cells via using the CIBERSORT algorithm . The gene expression matrix was uploaded to the CIBERSORT online website (https://cibersort.stanford.edu) by setting the default signature matrix at 1000 permutations. CIBERSORT is a deconvolution algorithm that depends on a set of reference gene expression values (a “signature matrix” of 547 genes in 22 types of immune cells). Next, significant immune cells between EA and AA samples were identified with the threshold Wilcoxon test at .
2.7. Correlation Analysis of Genes and Immune Cells
Pearson correlation test analysis was carried out to illustrate the relationship between gene expressions and the relative percentages of immune cells in EA and AA samples, respectively . The value of the correlation coefficient between gene expressions and the relative proportion of immune cells could indicate the strong, weak, or no correlation. Based on the paired -test, was considered statistically significant.
2.8. Statistical Analysis
The moderate -test was used to identify differentially expressed genes. Fisher’s exact test was applied to perform GO and KEGG analysis. The Wilcoxon test was applied to immune cell analysis. Paired -test was applied to correlation analysis between genes and cells. All statistical analyses were carried out in R version 3.5.2 software.
3.1. Identification of DEGs
In the study, we identified 91 differentially expressed genes (DEGs) in the AA group compared to the EA group (Figure 1(b) and Table 1). Among them, 59 DEGs were upregulated (adjusted and ), and the remaining 32 DEGs were downregulated (adjusted and ).
3.2. GO and Pathway Analysis
DEGs were uploaded to the DAVID website to identify the GO and pathway terms. As shown in Figure 1(c), significantly enriched GO and pathway DEGs were involved in defense response (BP), extracellular space (CC), immunoglobulin receptor binding (MF), and Staphylococcus aureus infection (KEGG pathway). As shown in Table 2, the significant GO terms of upregulated DEGs were mainly enriched in defense response (BP), immune response (BP), and regulation of immune response (BP), while significant GO terms of downregulated DEGs mainly were enriched in muscle contraction (BP), muscle system process (BP), and movement of cell or subcellular component (BP). The pathway terms of upregulated DEGs were enriched in Staphylococcus aureus infection (KEGG), phagosome (KEGG), and leukocyte transendothelial migration (KEGG), while the pathways of downregulated DEGs were unavailable (Table 3).
3.3. PPI Network Construction and Module Analysis
After uploading the 91 DEGs into the STRING online database and downloading the TSV format file of the interaction of multiple genes to Cytoscape software for PPI network construction, 59 DEGs (48 upregulated and 11 downregulated genes) were filtered from the 91 DEGs to construct the PPI networks, which contained 59 nodes/genes and 306 edges (Figure 2(a)); 32 genes did not participate in the PPI networks. Among these 59 nodes/genes, 17 central nodes/genes in module 1 were identified by the MCODE app and significantly associated with immune system function (Figure 2(b), Figure S1).
3.4. Immune Cell Infiltration Analysis
We first used the CIBERSORT algorithm to investigate the relative proportion of the 22 subpopulations of immune cells among EA and AA samples (Figure 3(a)). The relative proportions of 6 types of immune cells were significantly different between the EA and AA groups (Figure 3(b)). The cell types were T cell CD8 (), T cell gamma delta (), monocytes (), macrophage M0 (), macrophage M2 (), and dendritic cells (activated) (). Among these 6 types of immune cells, we found that T cell CD8, monocytes, and dendritic cells (activated) in the EA group were present at higher fractions than in the AA group, while the other three types of immune cells showed the opposite results (Figure 3(b)).
3.5. Correlation Analysis of Genes and Immune Cells
Correlation analysis (Pearson test) was carried out to illustrate and display the relationship between candidate genes and immune cells in EA and AA samples (Figures 4 and 5). As shown, CD86 and macrophage M2 (, ) and C1QB and T cell CD8 (, ) represented good correlation (Figures 4(a) and 4(b)) in early atherosclerosis plaque samples. Moreover, most genes have a close correlation with immune cells in advanced atherosclerosis plaque samples. It is worth noting that four common genes (CD53, C1QC, NCF2, and ITGAM) from module 1 have a close correlation with T cell CD8 and macrophages M0 and M2 (Figures 5(a) and 5(b)). However, we have discarded that situation: although the value was less than 0.05 and had a high absolute value of , the scatter plot shows that the point distribution was aggregated to zero (supplement Figures S2-3).
Atherosclerosis is a disease caused by plaque accumulation within the arteries. Current studies have confirmed that immune cells including dendritic cells, several T cells, monocyte/macrophage subsets, and neutrophils are associated with atherosclerosis [2, 38–40]. It has been shown that specific therapies targeting the pro/anti-inflammatory cytokines such as CCL2, TNFα, and IL-6 have suggested slowing in the progression of atherosclerosis in animal models and might improve cardiovascular outcomes in human subjects in large-scale phase III trials. [41, 42]. Notably, the monoclonal antibody canakinumab, targeting to IL-1β, has reduced the risk of adverse cardiovascular events [41, 42].
In the current study, we aimed to identify the potential molecular gene signatures associated with the immune system during the progression of atherosclerotic disease. We first figured out the genes in module 1 (17 genes) and significantly changed types of immune cells (6 types of immune cells) between advanced atherosclerosis and early atherosclerosis. Afterward, according to the correlation analysis between genes and immune cells, we inferred that CD86 and C1QB have a good correlation with macrophage M2 and T cell CD8 in EA, respectively. What is more, most of the genes have associated with T cell CD8, macrophages (M0 and M2), and four common genes (CD53, C1QC, NCF2, and ITGAM), and all have correlation with the three types of immune cells in AA.
CD86 (cluster of differentiation 86) is a protein encoded by CD86, which is expressed on antigen-presenting cells (APCs) and provides costimulatory signals to T cells . Meletta et al. used CD86/CD80 as a probe for atherosclerosis imaging . Transfer of native Foxp3+ T cells showed a protective effect against experimental atherosclerosis (Ait-Oufella et al.; [44, 45]). CD53 (leukocyte surface antigen) is a member of the “tetraspan family” of membrane proteins and is expressed on various immune cells [46, 47]. CD53 can contribute to improving the transduction of CD2-generated signals in T cells and natural killer cells . C1QB (complement C1q B chain) or C1QC (complement C1q C chain) encodes the C-chain or B-chain polypeptide of serum complement subcomponent C1q, respectively, and deficiency of C1q is associated with glomerulonephritis and lupus erythematosus. The Bos et al. team has suggested that C1QB might be associated with atherosclerosis and coronary artery disease . What is more, the Khoonsari et al. group has revealed that the lower levels of C1QB and C1QC were involved in cell adhesion, migration, regulation of the synapse, and the immune system . NCF2 (neutrophil cytosol factor 2) encodes a subunit of NADPH oxidase, and mutation in this gene can result in chronic granulomatous disease . However, no research has revealed that NCF2 was involved in atherosclerosis. ITGAM (integrin alpha M) is known as complement receptor 3A (CR3A) or cluster of differentiation molecule 11B (CD11B)  and primarily expressed on the surface of innate immune cells . Recent reports revealed that NCF2 and ITGAM play significant immune-regulatory roles in autoimmune disease [54, 55]. To date, fewer studies were focused on how genes (C1QB, C1QC, NCF2, and ITGAM) regulate immune cells (T cell CD8 and macrophages M0 or M2) and their relationship between gene expressions and the relative percentages of immune cells.
Based on this fact, (1) atherosclerotic diseases are related to immune cells and (2) there are examples of researchers using modified genes (such as IL-1β, CCL2, and IL-6) to treat atherosclerotic diseases [41, 42]. The innovation of this study is to screen out gene signatures associated with immune cells in the progression of atherosclerosis. According to the relationship between gene expressions and the relative proportions of immune cells, these genes may interact with immune cells through some unknown cell membrane receptors or ligands. It may be possible to modify the interactions of these genes with membrane receptors or ligands to identify new therapeutic treatments for atherosclerosis, as well as mechanisms of atherosclerosis regulation.
Nevertheless, there exist some inevitable difficulties in this study that should be taken into consideration. For instance, the atherosclerosis-related datasets were fewer and not as easily acquired and collected from the open public database as cancer datasets, which leads to the lack of comprehensiveness of the study which cannot be used to verify our results. Although the limited sample size of atherosclerosis may reduce the confidence, the approaches and ideas in this study are helpful in enlightening the inspiration of other researchers. Of course, additional molecular and cellular experiments should be performed to assess their characteristics.
The identification of 6 specific gene signatures and correlated immune cells in the progression of atherosclerosis may give us a clue to explore the mechanism of cardiovascular disease and its therapeutic treatments.
|NCBI:||National Center for Biotechnology Information|
|GEO:||Gene Expression Omnibus|
|DEGs:||Differentially expressed genes|
|PPI:||Protein and protein interaction|
|CIBERSORT:||Cell type identification by estimating relative subsets of RNA transcripts|
|KEGG:||Kyoto Encyclopedia of Genes and Genomes|
|MCODE:||Molecular complex detection.|
The data associated with this article has been deposited in the NCBI-GEO website (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi).
Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
BZ performed comparative analysis using bioinformatics tools. DW, YLL, XHZ, ZW, JLW, TS, LSD, YW, and YHZ participated in data analysis and discussion. BZ interpreted the data and wrote the manuscript. YLZ organized and offered funding for the project. All authors read and approved the final manuscript.
We also would like to thank Huabo Yang from the College of Foreign Languages and Cultures, Xiamen University, for correcting the grammar in the manuscript. We also thank Yingying Zhang (the General Hospital of PLA Rocket Force) for the statistics analysis. This study was supported by the National Natural Science Foundation of China (No. 81770294).
Figure S1: the chord plot for functional enrichments of module 1 genes. Figure S2: the scatterplot of correlation between CTSS expression and the relative proportion of T cell gamma delta. Gray-shaded areas in scatterplots represent the standard errors of the blue regression lines. : correlation coefficient. Figure S3: the scatterplot of correlation between 17 gene expressions and the proportion of dendritic cells activated. Gray-shaded areas in scatterplots represent the standard errors of the blue regression lines. : correlation coefficient. (Supplementary Materials)
W. Jin, W. Wu, K. Yang et al., “The single nucleotide polymorphisms of chromosome 9p21 and CD147 were relevant with the carotid plaque risk in acute cerebral infarction patients among Chinese Han population,” Journal of Molecular Neuroscience, vol. 70, pp. 1–11, 2020.View at: Publisher Site | Google Scholar
V. Aboyans, J.-B. Ricco, M.-L. E. L. Bartelink et al., “2017 ESC guidelines on the diagnosis and treatment of peripheral arterial diseases, in collaboration with the European Society for Vascular Surgery (ESVS),” European Heart Journal, vol. 39, no. 9, pp. 763–816, 2018.View at: Publisher Site | Google Scholar
S. S. Anand, S. Yusuf, V. Vuksan et al., “Differences in risk factors, atherosclerosis, and cardiovascular disease between ethnic groups in Canada: the Study of Health Assessment and Risk in Ethnic groups (SHARE),” The Lancet, vol. 356, no. 9226, pp. 279–284, 2000.View at: Publisher Site | Google Scholar
A. A. Brown, A. Viñuela, O. Delaneau, T. D. Spector, K. S. Small, and E. T. Dermitzakis, “Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues,” Nature Genetics, vol. 49, no. 12, pp. 1747–1751, 2017.View at: Publisher Site | Google Scholar
B. Phipson, S. Lee, I. J. Majewski, W. S. Alexander, and G. K. Smyth, “Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression,” The Annals of Applied Statistics, vol. 10, no. 2, pp. 946–963, 2016.View at: Publisher Site | Google Scholar
S. Bos, M. Phillips, G. F. Watts, A. J. M. Verhoeven, E. J. G. Sijbrands, and N. C. Ward, “Novel protein biomarkers associated with coronary artery disease in statin- treated patients with familial hypercholesterolemia,” Journal of Clinical Lipidology, vol. 11, no. 3, pp. 682–693, 2017.View at: Publisher Site | Google Scholar