Integrated Analysis of Multiscale Large-Scale Biological Data for Investigating Human Disease 2020
Research Article | Open Access
The Blood Gene Expression Signature for Kawasaki Disease in Children Identified with Advanced Feature Selection Methods
Kawasaki disease (KD) is an acute vasculitis that can be accompanied by coronary artery aneurysm, coronary artery dilatation, arrhythmia, and other serious cardiovascular complications. The etiology of KD remains unclear, so its molecular mechanisms and related factors need further study. In this study, we analyzed the blood gene expression profiles of 75 DB (definite bacterial infection), 122 DV (definite viral infection), 71 HC (healthy control), and 311 KD (Kawasaki disease) samples. Using a combination of advanced feature selection methods, namely (1) Boruta, (2) Monte-Carlo Feature Selection (MCFS), and (3) Incremental Feature Selection (IFS), we narrowed down the number of candidate genes step by step and identified 332 key genes related to KD and pathogen infections. Their functions were then characterized by KEGG and GO enrichment analyses. Our results provide clues to the potential molecular mechanisms of KD and may be helpful for KD detection and treatment.
Kawasaki disease (KD) is an acute vasculitis that can be accompanied by coronary artery aneurysm, coronary artery dilatation, arrhythmia, and other serious cardiovascular complications [1, 2]. It was first described by the Japanese physician Kawasaki in the late 1960s and has since been reported worldwide with increasing incidence [3, 4]. According to a recent survey, Japan has the highest incidence of KD, with 265 cases per 100,000 children under the age of five. KD initially manifests as high fever, cervical lymphadenopathy, and mucocutaneous inflammation. Aspirin therapy and intravenous immunoglobulin (IVIG) injection play a key role in the effective treatment of KD, reducing the incidence of coronary artery complications from 25% to 5%. KD occurs mainly in infancy and early childhood but can also occur in adolescence. The young age of onset suggests that susceptibility may be related to the maturity of the immune system.
The etiology of KD is still unclear, but its epidemiological features point to a possible connection with as-yet-undefined pathogen infections. In the surveys of Uehara and Belay, the incidence of KD peaked in winter and spring, similar to many respiratory diseases. This seasonality suggests that KD may be triggered by one or several pathogens associated with respiratory diseases [2, 8, 9]. According to reported statistics, 8–42% of patients with KD had a concurrent respiratory viral infection and 33% had a bacterial infection [10–13]. Viral infection leads to abnormal lymphocyte subsets and inflammation, which are positively correlated with the occurrence of vascular inflammation in KD. Rowley et al. found upregulated expression of interferon-stimulated genes in lung tissue from acute KD, indicating a cellular immune response after viral infection. They also observed that coronary artery inflammation in KD was characterized by an antiviral immune response, including upregulation of genes induced by type I interferon and activation of cytotoxic T lymphocytes [15–17]. A related study suggested that common respiratory viruses, such as enteroviruses, adenoviruses, coronaviruses, and rhinoviruses, were associated with KD cases. Among these viruses, human coronavirus (HCoV)-229E has been reported to be possibly involved in the occurrence of KD. Together, these findings support the hypothesis that viral and bacterial infections may be related to KD.
To date, there is no specific clinical diagnostic test for KD, and diagnosis still depends heavily on symptoms and ultrasound imaging results. It is therefore necessary to study the molecular mechanisms and related factors of KD. In this study, we analyzed the expression profiles of DB (definite bacterial infection), DV (definite viral infection), HC (healthy control), and KD (Kawasaki disease) samples. By selecting the genes whose expression best discriminates among these groups, we obtained 332 key genes related to KD and pathogen infections. Their functions were then characterized by KEGG and GO enrichment analyses. Our study points to directions for investigating the potential molecular mechanisms of KD.
2. Materials and Methods
The gene expression profiles of 75 DB (definite bacterial infection), 122 DV (definite viral infection), 71 HC (healthy control), and 311 KD (Kawasaki disease) samples were downloaded from GEO (Gene Expression Omnibus) under accession number GSE73464. These samples were measured on two microarray platforms: the Illumina HumanHT-12 V3.0 and Illumina HumanHT-12 V4.0 expression BeadChips. Only the 25,159 genes common to both platforms were analyzed. To make samples from the different batches comparable, we performed quantile normalization using the R function “normalize.quantiles” in the package preprocessCore (https://bioconductor.org/packages/preprocessCore/).
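Quantile normalization forces every sample to share the same empirical distribution of expression values. The NumPy sketch below illustrates the idea; it is not the preprocessCore implementation the authors used, and it assumes a genes × samples matrix without missing values. Unlike `normalize.quantiles`, this sketch breaks ties by position instead of averaging them, and the function name is hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a genes x samples matrix so that every sample
    (column) ends up with the same empirical distribution."""
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (ties broken by position).
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace every value by the reference value at its rank.
    return reference[ranks]

# Toy matrix: 4 genes x 3 samples.
expr = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 4.0, 6.0],
                 [4.0, 2.0, 8.0]])
normalized = quantile_normalize(expr)
```

After normalization, sorting any column yields the same reference vector, which is what makes arrays from the V3.0 and V4.0 batches directly comparable.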
2.2. Boruta Feature Filtering
Since there were many genes and most of them were not associated with KD, we first applied Boruta feature filtering to detect all relevant genes. Boruta is an all-relevant feature selection method wrapped around a random forest classifier. First, shuffled copies of all features (shadow features) were added to the real dataset. Then, a random forest was trained and the importance of each feature was calculated. Features whose real importance was significantly higher than that of the shuffled features were kept. By repeating this procedure iteratively, all relevant features were selected. Boruta feature filtering yielded a much smaller set of features for further analysis. We used the Python package Boruta (https://pypi.org/project/Boruta/) to apply Boruta feature filtering.
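The shadow-feature logic can be illustrated with a deliberately simplified sketch built on scikit-learn's `RandomForestClassifier`. It is an approximation under stated assumptions, not the algorithm of the Boruta package: the function name `boruta_sketch`, the 20 rounds, the 100 trees per forest, and the single one-sided binomial test at α = 0.05 are illustrative choices, whereas the real package also rejects features during the iterations and corrects for multiple testing.

```python
import math
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def boruta_sketch(X, y, n_iter=20, alpha=0.05, seed=0):
    """Simplified all-relevant selection in the spirit of Boruta.

    Returns a boolean mask of confirmed features and their hit counts.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: each real column permuted independently,
        # which destroys any association with y.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=100, random_state=i)
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        # A "hit": the real feature beats the best shadow feature this round.
        hits += importance[:n_features] > importance[n_features:].max()
    # One-sided binomial test: P(at least h hits | hit probability 0.5).
    p_values = np.array([
        sum(math.comb(n_iter, k) for k in range(int(h), n_iter + 1)) / 2 ** n_iter
        for h in hits
    ])
    return p_values < alpha, hits

# Toy data: only the first two of ten features determine the class.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 10))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
confirmed, hit_counts = boruta_sketch(X_demo, y_demo)
```

Comparing against the best shadow rather than a fixed threshold makes the cutoff adapt to how much importance pure noise can reach on the given data.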
2.3. Monte-Carlo Feature Selection
We adopted Monte-Carlo Feature Selection (MCFS) to rank the relevant features. MCFS generates many randomly selected feature subsets and constructs classification trees on them [23–25]. By aggregating these classification trees, the importance of each feature is calculated; in general, a feature is important if it is selected by many classification trees. Suppose $d$ was the total number of relevant features selected by Boruta. Then $m$ features ($m \ll d$) were randomly selected $s$ times, and $t$ trees were constructed for each of the $s$ subsets, giving $s \cdot t$ classification trees in total. The relative importance (RI) of feature $g_k$ was

$$
RI_{g_k} = \sum_{\tau=1}^{s \cdot t} \left(\mathrm{wAcc}_{\tau}\right)^{u} \sum_{n_{g_k}(\tau)} \mathrm{IG}\left(n_{g_k}(\tau)\right) \left(\frac{\text{no. in } n_{g_k}(\tau)}{\text{no. in } \tau}\right)^{v},
$$

where $\mathrm{wAcc}_{\tau}$ was the weighted classification accuracy of decision tree $\tau$, $\mathrm{IG}(n_{g_k}(\tau))$ was the information gain of node $n_{g_k}(\tau)$, a decision rule on feature $g_k$, $\text{no. in } n_{g_k}(\tau)$ was the number of samples under node $n_{g_k}(\tau)$, $\text{no. in } \tau$ was the number of samples in decision tree $\tau$, and $u$ and $v$ were adjustable parameters.

Based on RI, the features were ranked as

$$
F = [f_1, f_2, \ldots, f_d],
$$

where $d$ was the total number of relevant features, and a feature with a smaller index had a greater RI.
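To make the RI formula concrete, the sketch below accumulates it with scikit-learn decision trees. Several details are assumptions rather than the study's settings: wAcc is computed as balanced accuracy (the mean per-class accuracy) on a random one-third hold-out, IG is the entropy decrease read from scikit-learn's tree structure (`tree_.impurity`, `tree_.n_node_samples`), and the hypothetical function name and default values of s, t, u, and v are illustrative. The original MCFS software of Draminski et al. differs in its tree learner and sampling details.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def mcfs_relative_importance(X, y, m, s=20, t=3, u=1.0, v=1.0, seed=0):
    """Sketch of the MCFS relative importance RI for every column of X."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    ri = np.zeros(d)
    for _ in range(s):                                  # s random feature subsets
        subset = rng.choice(d, size=m, replace=False)   # m << d features each
        for _ in range(t):                              # t trees per subset
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[:, subset], y, test_size=1 / 3, stratify=y,
                random_state=int(rng.integers(2**31 - 1)))
            clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
            clf.fit(X_tr, y_tr)
            w_acc = balanced_accuracy_score(y_te, clf.predict(X_te))  # wAcc_tau
            tree = clf.tree_
            n_tau = tree.n_node_samples[0]              # samples in tree tau
            for node in range(tree.node_count):
                left, right = tree.children_left[node], tree.children_right[node]
                if left == -1:                          # leaf: no decision rule
                    continue
                n_node = tree.n_node_samples[node]
                # Information gain of the split made at this node.
                ig = tree.impurity[node] - (
                    tree.n_node_samples[left] * tree.impurity[left]
                    + tree.n_node_samples[right] * tree.impurity[right]) / n_node
                feature = subset[tree.feature[node]]    # map back to a column of X
                ri[feature] += w_acc ** u * ig * (n_node / n_tau) ** v
    return ri

# Toy data: only feature 0 determines the class.
rng_mc = np.random.default_rng(2)
X_mc = rng_mc.normal(size=(150, 5))
y_mc = (X_mc[:, 0] > 0).astype(int)
ri_mc = mcfs_relative_importance(X_mc, y_mc, m=3)
ranking = np.argsort(-ri_mc)  # column indices ordered as F = [f_1, ..., f_d]
```

Sorting by decreasing RI gives the ranked list F used by the next step.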
2.4. Incremental Feature Selection
After the features were ranked by MCFS, it was still difficult to decide how many of them should be selected. To avoid an arbitrarily chosen cutoff, we applied Incremental Feature Selection (IFS) [26–30]. For the ranked feature list $F = [f_1, f_2, \ldots, f_d]$, we created a series of nested feature subsets $F_i = \{f_1, f_2, \ldots, f_i\}$ by iteratively adding the next top-ranked feature to the previous subset. The performance of each subset was evaluated by building a support vector machine (SVM) classifier and applying leave-one-out cross-validation (LOOCV). The feature subset with the highest LOOCV accuracy was selected.
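IFS can be sketched as a loop over nested prefixes of the ranked feature matrix, scoring each prefix with scikit-learn's `cross_val_score` and `LeaveOneOut`. The paper does not state the SVM kernel or its parameters, so the linear kernel and default C used here are assumptions, as is the hypothetical function name.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

def incremental_feature_selection(X_ranked, y, max_features=None):
    """Evaluate nested subsets {f_1}, {f_1, f_2}, ... with SVM LOOCV.

    X_ranked: samples x features, columns already sorted by MCFS rank.
    Returns (best subset size, best LOOCV accuracy, full IFS curve).
    """
    n = X_ranked.shape[1] if max_features is None else max_features
    curve = []
    for k in range(1, n + 1):
        # Each LOOCV fold scores one held-out sample as 0 or 1,
        # so the mean over folds is the LOOCV accuracy.
        acc = cross_val_score(SVC(kernel="linear"), X_ranked[:, :k], y,
                              cv=LeaveOneOut()).mean()
        curve.append((k, acc))
    # max() returns the first maximum, i.e. the smallest best-scoring subset.
    best_k, best_acc = max(curve, key=lambda point: point[1])
    return best_k, best_acc, curve

# Toy data: two informative columns ranked first, four noise columns after.
rng_ifs = np.random.default_rng(3)
X_ifs = rng_ifs.normal(size=(60, 6))
y_ifs = (X_ifs[:, 0] + X_ifs[:, 1] > 0).astype(int)
best_k, best_acc, curve = incremental_feature_selection(X_ifs, y_ifs)
```

On data of the size used here (579 samples, up to 663 features), every point on the curve costs 579 SVM fits, so in practice one might evaluate subset sizes with a coarser step; that shortcut is our suggestion, not part of the described procedure.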
3. Results and Discussion
3.1. The Irrelevant Genes of Kawasaki Disease Were Filtered by Boruta
Genome-wide expression measurements provide a powerful way to study the molecular mechanisms of Kawasaki disease. However, most genes are not associated with KD and act as noise in sophisticated bioinformatics analyses. We therefore applied the Boruta algorithm to filter out the irrelevant genes and keep the relevant ones. After Boruta filtering, the number of genes was reduced from the original 25,159 to 1,485.
3.2. The Genes Were Ranked Based on Their Importance in Kawasaki Disease
For the 1,485 KD-relevant genes, we wanted to know how strongly each was associated with KD. To rank them by importance, we used the MCFS method, which scores genes by their contributions across a large number of classification trees. Because MCFS aggregates many trees, its ranking is expected to be more robust than one based on a single classifier. The ranked genes and their relative importance are listed in Table S1. The top 663 genes were marked as “top ranking” genes by MCFS.
3.3. The Kawasaki Disease Signature Genes Selected with IFS Method
A set of 663 genes was still too large for a gene signature. To reduce it further, we applied the IFS procedure to the top 663 genes in Table S1: we tried different numbers of top-ranked genes and calculated the SVM LOOCV accuracy for each. The IFS curve is shown in Figure 1. The highest LOOCV accuracy, 0.933, was achieved with 332 genes. Therefore, these 332 genes were selected as the final Kawasaki disease signature genes.
The confusion matrix of the SVM classifier using the 332 genes is shown in Table 1. Most samples were correctly classified. Among the four groups, only DB was classified relatively poorly; the other three groups were classified with high accuracy.
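Per-group performance of the kind summarized in Table 1 can be read off a confusion matrix: the diagonal entry of each row divided by the row total is the fraction of that group classified correctly. The sketch below uses scikit-learn's `confusion_matrix`; the labels are toy values for illustration, not the study's results, and the helper `per_group_recall` is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

GROUPS = ["DB", "DV", "HC", "KD"]

def per_group_recall(y_true, y_pred, labels=GROUPS):
    """Confusion matrix (rows = true group, columns = predicted group),
    per-group fraction classified correctly, and overall accuracy."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    recall = cm.diagonal() / cm.sum(axis=1)
    overall = np.trace(cm) / cm.sum()
    return cm, dict(zip(labels, recall)), overall

# Toy labels for illustration only (not Table 1). With real data, y_pred
# could come from LOOCV, e.g. sklearn.model_selection.cross_val_predict(
#     SVC(kernel="linear"), X[:, :332], y, cv=LeaveOneOut()).
y_true = ["DB", "DB", "DV", "HC", "KD", "KD"]
y_pred = ["DV", "DB", "DV", "HC", "KD", "KD"]
cm, recall, overall = per_group_recall(y_true, y_pred)
```

Here one DB sample is predicted as DV, so DB's recall drops to 0.5 while the other groups stay at 1.0, the same kind of pattern the text describes for DB.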
3.4. The Biological Significance of 332 Selected Genes
Some of the selected genes have already been linked to KD. For example, haptoglobin (HP), an acute-phase protein synthesized by the liver, responds to inflammatory cytokines and has been associated with vascular disease [31, 32]. Huang et al. comprehensively evaluated acute-phase reactants in patients with KD and found that serum HP levels in KD cases were significantly higher than those in other febrile diseases. The ratio of HP to apolipoprotein A-I could accurately distinguish KD from other febrile diseases and could serve as an auxiliary laboratory index in the acute phase of KD. Early diagnosis and treatment of KD are very important for a better prognosis and survival rate in children. By studying the relationship between HP phenotype and coronary artery abnormality (CAA) formation in patients with KD, Lee et al. found that the clinical symptoms of HP-1 patients were delayed or incomplete and that late diagnosis of KD was related to the haptoglobin phenotype.
BAX is an essential mediator of intrinsic (mitochondrial) apoptosis that permeabilizes the mitochondrial outer membrane. Tsujimoto et al. measured the expression levels of the antiapoptotic protein A1 and the proapoptotic protein BAX, as well as the A1/BAX ratio, in a viral infection group, a bacterial infection group, a KD group, and a group of healthy children. The A1/BAX ratio was significantly increased in patients with acute KD, suggesting that spontaneous apoptosis of polymorphonuclear neutrophils (PMNs) was inhibited in acute KD.
To comprehensively study the biological functions of the 332 selected genes, we tested their enrichment in KEGG pathways and GO terms using a hypergeometric test. The enrichment results with FDR < 0.05 are given in Table 2.
Because many GO biological process (BP) and cellular component (CC) terms were enriched, only the top 10 terms of each category are listed.
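The enrichment test can be sketched with SciPy's `hypergeom` distribution. Two choices are assumptions, since the text does not specify them: the background gene set (for example, the 25,159 analyzed genes) and the FDR procedure, taken here to be Benjamini–Hochberg. Both function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom

def enrichment_pvalue(k, M, n, N):
    """One-sided hypergeometric p-value P(X >= k).

    M: background genes, n: background genes annotated to the term,
    N: signature genes tested, k: signature genes annotated to the term.
    """
    return hypergeom.sf(k - 1, M, n, N)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up: each adjusted value is the minimum over all larger ranks.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

For a given term, N would be 332 and k the number of signature genes annotated to that term; terms are then kept when their adjusted value is below 0.05. `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")` should return the same adjusted values if statsmodels is preferred.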
KEGG enrichment analysis showed that the key genes were significantly enriched in pathways related to influenza A, Epstein-Barr virus (EBV) infection, hepatitis C, systemic lupus erythematosus, and measles. Several previous studies have associated KD with influenza, coronavirus, and EBV. Wang et al. reported a case of incomplete KD (IKD) with concurrent influenza A (H1N1) pdm09 virus infection, suggesting that influenza infection may be a potential cause of KD. In addition, a study from Korea showed that the monthly incidence of KD was significantly correlated with the monthly overall viral detection, including human bocavirus, enterovirus, and influenza. A case-control study found that respiratory secretions from 8 of 11 children with KD, but from only 1 of 22 control subjects, tested positive for New Haven coronavirus (HCoV-NH), suggesting that HCoV-NH infection is associated with KD. However, another study in Taiwan found no association between HCoV-NH infection and KD. A recent report described an unusually high incidence of Kawasaki disease in children at a French centre for emerging infectious diseases: 17 cases in 11 days. IgG antibodies against SARS-CoV-2 were detected in 82% of these cases, suggesting an association between the virus and this syndrome in children. Regarding EBV, Huang et al. detected EBV sequences in 83% of KD patients tested repeatedly within 3 months after onset, a much higher proportion than in the control group. These virological studies suggest that an unusual EBV-cell interaction may occur in KD. Moreover, the prevalent ages at onset of KD and EBV infection are known to be similar in Korea and Japan. In a case report, Pavone et al. suggested that KD may be caused by a new virus that cross-reacts with EBV.
Therefore, in febrile children with EBV infection or similar conditions, the possibility of Kawasaki disease should be considered and a differential diagnosis made so that intravenous immunoglobulin therapy can be started in time. Lee et al. reported that children with KD had a normal immune response to EBV infection, and that children with a history of KD appeared to be infected with EBV later than those without such a history. It has also been reported that in patients with a history of KD, the occurrence of a second autoimmune disease should be considered, and that the initial manifestations of lupus may mimic KD.
Furthermore, GO enrichment analysis showed that these key genes were enriched in functions related to cytoplasmic components, immune response, enzyme activity, and molecular binding. Substantial evidence indicates that the innate immune system plays an important role in initiating and mediating host inflammation. The acute phase of Kawasaki disease is characterized by a deficiency of inhibitory T cells, and the marked activation of T cells, B cells, and monocytes is associated with increased secretion of cytokines by these immune effector cells. This immune activation promotes injury to vascular endothelial cells in Kawasaki disease. Light and electron microscopy studies by Rowley and Shulman of antigens in the ciliated bronchial epithelium in acute KD showed that the KD-associated antigens were located in cytoplasmic inclusion bodies consistent with aggregates of viral proteins and associated nucleic acids.
In summary, we analyzed the gene expression profiles of KD samples and identified a blood gene signature of KD. Functional analysis of these signature genes suggests that the association between KD and pathogen infection, especially with the influenza A (H1N1) virus, deserves more attention. In addition, the mechanisms by which viral infection may mediate KD merit further study, which may provide a scientific basis and new insights into KD pathogenesis. Our study also points to directions for future research on the etiology of KD.
The gene expression data is available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73464.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Table S1: the ranked genes and their relative importance calculated with MCFS. (Supplementary Materials)
- J. W. Newburger, M. Takahashi, and J. C. Burns, “Kawasaki disease,” Journal of the American College of Cardiology, vol. 67, no. 14, pp. 1738–1749, 2016.
- R. Uehara and E. D. Belay, “Epidemiology of Kawasaki disease in Asia, Europe, and the United States,” Journal of Epidemiology, vol. 22, no. 2, pp. 79–85, 2012.
- Y. Nakamura, H. Yanagawa, and T. Kawasaki, “Mortality among children with Kawasaki disease in Japan,” The New England Journal of Medicine, vol. 326, no. 19, pp. 1246–1249, 1992.
- G. F. Porter and T. L. Gentles, “Images in clinical medicine. Giant coronary-artery aneurysm in Kawasaki's disease,” The New England Journal of Medicine, vol. 345, no. 2, p. 98, 2001.
- N. Makino, Y. Nakamura, M. Yashiro et al., “Descriptive epidemiology of Kawasaki disease in Japan, 2011-2012: from the results of the 22nd nationwide survey,” Journal of Epidemiology, vol. 25, no. 3, pp. 239–245, 2015.
- H. Kato, S. Koike, M. Yamamoto, Y. Ito, and E. Yano, “Coronary aneurysms in infants and young children with acute febrile mucocutaneous lymph node syndrome,” The Journal of Pediatrics, vol. 86, no. 6, pp. 892–898, 1975.
- J. W. Newburger, M. Takahashi, A. S. Beiser et al., “A single intravenous infusion of gamma globulin as compared with four infusions in the treatment of acute Kawasaki syndrome,” The New England Journal of Medicine, vol. 324, no. 23, pp. 1633–1639, 1991.
- S. Singh, P. Vignesh, and D. Burgner, “The epidemiology of Kawasaki disease: a global update,” Archives of Disease in Childhood, vol. 100, no. 11, pp. 1084–1088, 2015.
- Y. Nakamura, “Kawasaki disease: epidemiology and the lessons from it,” International Journal of Rheumatic Diseases, vol. 21, no. 1, pp. 16–19, 2018.
- S. M. Benseler, B. W. McCrindle, E. D. Silverman, P. N. Tyrrell, J. Wong, and R. S. M. Yeung, “Infections and Kawasaki disease: implications for coronary artery outcome,” Pediatrics, vol. 116, no. 6, pp. e760–e766, 2005.
- L. Y. Chang, C. Y. Lu, P. L. Shao et al., “Viral infections associated with Kawasaki disease,” Journal of the Formosan Medical Association, vol. 113, no. 3, pp. 148–154, 2014.
- A. Jordan-Villegas, M. L. Chang, O. Ramilo, and A. Mejías, “Concomitant respiratory viral infections in children with Kawasaki disease,” The Pediatric Infectious Disease Journal, vol. 29, no. 8, pp. 770–772, 2010.
- J. L. Turnier, M. S. Anderson, H. R. Heizer, P. N. Jone, M. P. Glodé, and S. R. Dominguez, “Concurrent respiratory viruses and Kawasaki disease,” Pediatrics, vol. 136, no. 3, pp. e609–e614, 2015.
- J. Wang, F. Sun, H. L. Deng, and R. Q. Liu, “Influenza A (H1N1) pdm09 virus infection in a patient with incomplete Kawasaki disease: a case report,” Medicine, vol. 98, no. 15, article e15009, 2019.
- A. H. Rowley, “Is Kawasaki disease an infectious disorder?” International Journal of Rheumatic Diseases, vol. 21, no. 1, pp. 20–25, 2018.
- A. H. Rowley, S. C. Baker, S. T. Shulman et al., “Ultrastructural, immunofluorescence, and RNA evidence support the hypothesis of a “new” virus associated with Kawasaki disease,” The Journal of Infectious Diseases, vol. 203, no. 7, pp. 1021–1030, 2011.
- A. H. Rowley, K. M. Wylie, K. Y. A. Kim et al., “The transcriptional profile of coronary arteritis in Kawasaki disease,” BMC Genomics, vol. 16, no. 1, 2015.
- K. Shirato, Y. Imada, M. Kawase, K. Nakagaki, S. Matsuyama, and F. Taguchi, “Possible involvement of infection with human coronavirus 229E, but not NL63, in Kawasaki disease,” Journal of Medical Virology, vol. 86, no. 12, pp. 2146–2153, 2014.
- H. C. Kuo, K. D. Yang, W. C. Chang, L. P. Ger, and K. S. Hsieh, “Kawasaki disease: an update on diagnosis and treatment,” Pediatrics and Neonatology, vol. 53, no. 1, pp. 4–11, 2012.
- V. J. Wright, J. A. Herberg, M. Kaforou et al., “Diagnosis of Kawasaki disease using a minimal whole-blood gene expression signature,” JAMA Pediatrics, vol. 172, no. 10, article e182293, 2018.
- M. Kursa and W. Rudnicki, “Feature selection with the Boruta package,” Journal of Statistical Software, vol. 36, no. 11, pp. 1–13, 2010.
- M. Draminski, A. Rada-Iglesias, S. Enroth, C. Wadelius, J. Koronacki, and J. Komorowski, “Monte Carlo feature selection for supervised classification,” Bioinformatics, vol. 24, no. 1, pp. 110–117, 2007.
- D. Wang, J.-R. Li, Y.-H. Zhang, L. Chen, T. Huang, and Y.-D. Cai, “Identification of differentially expressed genes between original breast cancer and xenograft using machine learning algorithms,” Genes, vol. 9, no. 3, p. 155, 2018.
- L. Chen, J. Li, Y. H. Zhang et al., “Identification of gene expression signatures across different types of neural stem cells with the Monte-Carlo feature selection method,” Journal of Cellular Biochemistry, vol. 119, no. 4, pp. 3394–3403, 2018.
- X. Pan, X. Hu, Y. Zhang et al., “Identifying patients with atrioventricular septal defect in down syndrome populations by using self-normalizing neural networks and feature selection,” Genes, vol. 9, no. 4, p. 208, 2018.
- Y. Zhou, T. Huang, G. Huang, N. Zhang, X. Y. Kong, and Y. D. Cai, “Prediction of protein N-formylation and comparison with N-acetylation based on a feature selection method,” Neurocomputing, vol. 217, Supplement C, pp. 53–62, 2016.
- Y. Cai, T. Huang, L. Hu, X. Shi, L. Xie, and Y. Li, “Prediction of lysine ubiquitination with mRMR feature selection and analysis,” Amino Acids, vol. 42, no. 4, pp. 1387–1395, 2012.
- L. Chen, Y.-H. Zhang, G. Lu, T. Huang, and Y.-D. Cai, “Analysis of cancer-related lncRNAs using gene ontology and KEGG pathways,” Artificial Intelligence in Medicine, vol. 76, pp. 27–36, 2017.
- P. W. Zhang, L. Chen, T. Huang, N. Zhang, X. Y. Kong, and Y. D. Cai, “Classifying ten types of major cancers based on reverse phase protein array profiles,” PLoS ONE, vol. 10, no. 3, article e0123147, 2015.
- L. Chen, S. Wang, Y.-H. Zhang et al., “Identify key sequence features to improve CRISPR sgRNA efficacy,” IEEE Access, vol. 5, pp. 26582–26590, 2017.
- K. Carter and M. Worwood, “Haptoglobin: a review of the major allele frequencies worldwide and their association with diseases,” International Journal of Laboratory Hematology, vol. 29, no. 2, pp. 92–110, 2007.
- N. Fiotti, C. Giansante, E. Ponte et al., “Atherosclerosis and inflammation. Patterns of cytokine regulation in patients with peripheral arterial disease,” Atherosclerosis, vol. 145, no. 1, pp. 51–60, 1999.
- M. Y. Huang, M. Gupta-Malhotra, J. J. Huang, F. K. Syu, and T. Y. Huang, “Acute-phase reactants and a supplemental diagnostic aid for Kawasaki disease,” Pediatric Cardiology, vol. 31, no. 8, pp. 1209–1213, 2010.
- W. C. Lee, K. P. Hwang, Y. T. King et al., “Late diagnosis of Kawasaki disease is associated with haptoglobin phenotype,” European Journal of Clinical Investigation, vol. 30, no. 5, pp. 379–382, 2000.
- A. Y. Robin, S. Iyer, R. W. Birkinshaw et al., “Ensemble properties of Bax determine its function,” Structure, vol. 26, no. 10, pp. 1346–1359.e5, 2018.
- H. Tsujimoto, S. Takeshita, K. Nakatani, Y. Kawamura, T. Tokutomi, and I. Sekine, “Delayed apoptosis of circulating neutrophils in Kawasaki disease,” Clinical and Experimental Immunology, vol. 126, no. 2, pp. 355–364, 2001.
- G. B. Kim, S. Park, B. S. Kwon, J. W. Han, Y. W. Park, and Y. M. Hong, “Evaluation of the temporal association between Kawasaki disease and viral infections in South Korea,” Korean Circulation Journal, vol. 44, no. 4, pp. 250–254, 2014.
- F. Esper, E. D. Shapiro, C. Weibel, D. Ferguson, M. L. Landry, and J. S. Kahn, “Association between a novel human coronavirus and Kawasaki disease,” The Journal of Infectious Diseases, vol. 191, no. 4, pp. 499–502, 2005.
- L.-Y. Chang, B. L. Chiang, C. L. Kao et al., “Lack of association between infection with a novel human coronavirus (HCoV), HCoV-NH, and Kawasaki disease in Taiwan,” The Journal of Infectious Diseases, vol. 193, no. 2, pp. 283–286, 2006.
- A. Moreira, “Kawasaki disease linked to COVID-19 in children,” Nature Reviews Immunology, 2020.
- X. Huang, P. Huang, L. Zhang et al., “Influenza infection and Kawasaki disease,” Revista da Sociedade Brasileira de Medicina Tropical, vol. 48, no. 3, pp. 243–248, 2015.
- M. Okano, N. Hase, Y. Sakiyama, and S. Matsumoto, “Long-term observation in patients with Kawasaki syndrome and their relation to Epstein-Barr virus infection,” The Pediatric Infectious Disease Journal, vol. 9, no. 2, pp. 139–140, 1990.
- P. Pavone, S. Cocuzza, E. Passaniti et al., “Otorrhea in Kawasaki disease diagnosis complicated by an EBV infection: coincidental disease or a true association,” European Review for Medical and Pharmacological Sciences, vol. 17, no. 7, pp. 989–993, 2013.
- S. J. Lee, K. Y. Lee, J. W. Han, J. S. Lee, and K. T. Whang, “Epstein-Barr virus antibodies in Kawasaki disease,” Yonsei Medical Journal, vol. 47, no. 4, pp. 475–479, 2006.
- J. C. Diniz, R. T. Almeida, N. E. Aikawa, A. M. E. Sallum, P. T. Sakane, and C. A. Silva, “Kawasaki disease and juvenile systemic lupus erythematosus,” Lupus, vol. 21, no. 1, pp. 89–92, 2011.
- A. H. Rowley and S. T. Shulman, “New developments in the search for the etiologic agent of Kawasaki disease,” Current Opinion in Pediatrics, vol. 19, no. 1, pp. 71–74, 2007.
Copyright © 2020 Bing Hu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.