Genetics Research International
Volume 2013 (2013), Article ID 546909, 7 pages
http://dx.doi.org/10.1155/2013/546909
Research Article

Regression Modeling and Meta-Analysis of Diagnostic Accuracy of SNP-Based Pathogenicity Detection Tools for UGT1A1 Gene Mutation

1 Golestan Blv. Toxicology Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
2 Genetic Department, Faculty of Science, Shahid Chamran University, Ahvaz, Iran
3 Department of Medical Genetics, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
4 Research Center of Thalassemia & Hemoglobinopathy, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran

Received 11 May 2013; Revised 30 June 2013; Accepted 12 July 2013

Academic Editor: Kenta Nakai

Copyright © 2013 Fakher Rahim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Aims. This review summarized all available evidence on the accuracy of SNP-based pathogenicity detection tools and introduced a regression model based on functional scores, mutation score, and genomic variation degree. Materials and Methods. A comprehensive search was performed to find all mutations related to Crigler-Najjar syndrome. Pathogenicity was predicted using the SNP-based pathogenicity detection tools SIFT, PHD-SNP, PolyPhen2, fathmm, Provean, and Mutpred. Overall, 59 different SNPs corresponding to missense mutations in the UGT1A1 gene were reviewed. Results. Comparing the diagnostic OR, our model showed high detection potential (diagnostic OR: 16.71, 95% CI: 3.38–82.69). The highest MCC and ACC were achieved by our suggested model (46.8% and 73.3%), followed by SIFT (34.19% and 62.71%). The AUC analysis showed significantly better overall performance of our suggested model than the selected SNP-based pathogenicity detection tools. Conclusion. Our suggested model is comparable to the well-established SNP-based pathogenicity detection tools and can appropriately reflect the role of a disease-associated SNP in both local and global structures. Although the accuracy of our suggested model is not particularly high, it highlights the functional impact of pathogenic mutations at the protein level, which improves the understanding of the molecular basis of mutation pathogenesis.

1. Introduction

Crigler-Najjar syndrome (CNS) types I and II (MIM nos. 218800, 606785) are inherited as autosomal recessive conditions that result from mutations in the UGT1A1 gene (MIM no. 191740) [1–4]. Type I is characterized by almost complete absence of UGT1A1 enzyme activity, and these patients are refractory to phenobarbital treatment, whereas type II is a less severe form of the deficiency [5, 6]. Patients with CNS are at permanent risk of severe neurologic complications, such as hearing problems, mental retardation, and choreoathetosis, due to severe unconjugated hyperbilirubinemia [7]. Because UGT1A1 is expressed specifically in the liver, expression analysis directly in patients would require an invasive liver biopsy. Instead, to show that a mutation inactivates the enzyme, an in vitro functional study can be performed by cloning the mutated UGT1A1 cDNA into an expression vector and transfecting the constructs into hepatic cell lines such as HepG2 or HUH7; expression analysis in these cells overexpressing the mutated forms of UGT1A1 can then demonstrate inactivation of the enzyme [8]. The UGT1A1 gene comprises five consecutive exons located on chromosome 2q37, and complete or partial inactivation of any exon causes CNS. Single base-pair variations in protein-coding DNA, known as coding single nucleotide polymorphisms (SNPs), can change amino acids and ultimately affect protein structure and function. Such variants include missense, nonsense, silent, and splice-site mutations. The majority of missense mutations lead to considerable changes in protein structure and function, causing the disease symptoms. Data on nonsynonymous SNPs are available in public repositories such as SWISSPROT [9], dbSNP [10], and HGVBASE [11].
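
The SNP categories listed above differ in how a single base change alters the encoded amino acid. The Python sketch below is illustrative only and is not part of the study's pipeline: it classifies a single-base substitution inside one codon using the standard genetic code. The function name `classify_coding_snv` is hypothetical, the extra `stop-loss` label is added only so that every case returns a label, and splice-site variants are out of scope because they cannot be recognized from the codon alone.

```python
# Standard genetic code, codons enumerated with bases in T, C, A, G order
# at each of the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution within one codon (hypothetical helper).

    position is 0-based within the codon (0, 1, or 2).
    Splice-site effects are NOT detectable here and are not handled.
    """
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1, or 2")
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"       # same amino acid (or stop to stop)
    if alt_aa == "*":
        return "nonsense"     # amino acid replaced by a premature stop
    if ref_aa == "*":
        return "stop-loss"    # stop codon turned into an amino acid
    return "missense"         # one amino acid replaced by another
```

For example, a G>A change at the third position of the tryptophan codon TGG yields TGA and is classified as nonsense, whereas G>T yields TGT (cysteine) and is classified as missense.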

Genetic methods, including the detection of genes linked to disease phenotypes and the identification of aberrant functions of these genes, have in recent years provided valuable insight into the biological foundations of genetic mutations [12]. The present review summarizes all available evidence on the accuracy of SNP-based pathogenicity detection tools, introduces a regression model based on functional scores, mutation score, and genomic variation degree, and compares the results with published clinical results.

2. Materials and Methods

2.1. SNP Data Sources and Collection

A comprehensive search was performed to find all CNS-related mutations. The major data repositories, including HGMD, dbSNP, SNPdbe, and Ensembl, were reviewed. All CNS-related mutations were extracted, checked for duplicate entries, and tabulated (Table 1).

Table 1: Prediction results of SNP-based pathogenicity detection tools compared with the published results.

2.2. Inclusion Criteria

Only UGT1A1-gene-related missense mutations were included.

2.3. Exclusion Criteria

Other mutation types, such as synonymous or nonsense mutations, were excluded.

2.4. Data Extraction

Pathogenicity was predicted using the SNP-based detection tools SIFT [13], PHD-SNP [14], PolyPhen2 [15], fathmm [16], Provean [17], and Mutpred [18]. A regression model was then designed using functional scores, mutation score, and genomic variation degree. For each SNP-based pathogenicity detection tool and for our regression model, we constructed a 2 × 2 table of true positives (TP, disease-associated variants predicted as pathogenic), true negatives (TN, neutral variants predicted as neutral), false positives (FP, neutral variants predicted as pathogenic), and false negatives (FN, disease-associated variants predicted as neutral). To assess the phenotypic characterization and clinical features of the disease of interest, we searched databases including SWISSPROT [9], dbSNP [10], Ensembl [19], OMIM [20], DECIPHER [21], and HGVBASE [11]. We then compared the results of the SNP-based pathogenicity detection tools with those of the phenotypic description tools. Finally, we calculated the diagnostic odds ratio (diagnostic OR), a single indicator of test performance that ranges from 0 to infinity [22].
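
As a worked illustration, the diagnostic OR is (TP × TN)/(FP × FN): a value of 1 means the test does not discriminate, and larger values mean better discrimination. The following Python sketch computes it with a 95% confidence interval on the log scale (Woolf's method). The function name and the rule of adding 0.5 to every cell when any cell is zero are assumptions chosen for illustration (a common convention in diagnostic meta-analysis), not a description of the authors' exact computation.

```python
import math


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int, z: float = 1.96):
    """Return (DOR, lower, upper) with a log-scale (Woolf) confidence interval.

    ASSUMPTION: if any cell is zero, 0.5 is added to all four cells so that
    the ratio and its standard error are defined.
    """
    cells = [tp, fp, fn, tn]
    if any(c < 0 for c in cells):
        raise ValueError("counts must be non-negative")
    if 0 in cells:
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    dor = (tp * tn) / (fp * fn)
    # Standard error of log(DOR).
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log)
    upper = math.exp(math.log(dor) + z * se_log)
    return dor, lower, upper
```

With hypothetical counts TP = 20, FP = 5, FN = 4, TN = 10, the sketch gives DOR = 10 (95% CI roughly 2.2–45.6). Because the interval is symmetric on the log scale, its bounds multiply to DOR²; the interval reported in the Results (3.38–82.69 around 16.71) is consistent with this, since √(3.38 × 82.69) ≈ 16.72.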

2.5. Statistical Analysis

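All analyses were performed with SPSS 16.0. A regression model was designed using three categories of scores: functional score [23], structural score (GV, genomic variation score) [24], and conservation score [25]. Each SNP-based pathogenicity detection tool was compared with the reference values using logistic regression. The sensitivity (Sen), specificity (Spe), accuracy (ACC), diagnostic odds ratio (DOR), and Matthews correlation coefficient (MCC) were calculated using the following formulas:

$$\mathrm{Sen} = \frac{TP}{TP + FN}, \qquad \mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\mathrm{DOR} = \frac{TP \times TN}{FP \times FN}, \qquad \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$
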
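Sensitivity, specificity, accuracy, and MCC as used in this section can all be computed from the four counts of a 2 × 2 table. A minimal Python sketch follows; the names `ConfusionCounts` and `accuracy_metrics` are hypothetical, and returning an MCC of 0 when any marginal total is zero is a convention assumed here. Table 2 reports MCC and ACC as percentages, so the returned fractions would be multiplied by 100 for comparison.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2 x 2 table of predictions versus the reference."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy_metrics(c: ConfusionCounts) -> dict:
    """Sensitivity, specificity, accuracy, and MCC (all as fractions).

    Assumes at least one disease-associated and one neutral variant,
    so that the sensitivity and specificity denominators are non-zero.
    """
    total = c.tp + c.fp + c.fn + c.tn
    sen = c.tp / (c.tp + c.fn)
    spe = c.tn / (c.tn + c.fp)
    acc = (c.tp + c.tn) / total
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    # ASSUMPTION: MCC is reported as 0 when a marginal total is zero.
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {"Sen": sen, "Spe": spe, "ACC": acc, "MCC": mcc}
```

A perfect predictor (FP = FN = 0) gives MCC = 1, while a predictor no better than chance gives an MCC near 0, which is why MCC is a stricter summary than accuracy when the classes are unbalanced.
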
Meta-DiSc was used to calculate individual and pooled diagnostic OR, sensitivity, specificity, negative likelihood ratio, and positive likelihood ratio [26]. We also compared the area under the curve (AUC), a widely used index of the overall performance of a test, using the summary receiver operating characteristic (SROC) curve [27].
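
Meta-DiSc offers the Moses–Shapiro–Littenberg SROC model, in which each tool's 2 × 2 table is converted to D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), and the line D = a + bS is fitted. The text does not state which fitting options were used, so the Python sketch below rests on assumptions: an unweighted least-squares fit, 0.5 added to every cell, and the AUC obtained by numerically integrating the back-transformed curve logit(TPR) = [a + (1 + b) logit(FPR)]/(1 − b) over FPR from 0 to 1. The function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def moses_littenberg_sroc(tables, n_grid=10_000):
    """Return (a, b, auc) for an unweighted Moses-Littenberg SROC fit.

    tables: sequence of (tp, fp, fn, tn) counts, one per tool or study.
    ASSUMPTION: 0.5 is added to every cell to avoid logit(0).
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tpr = (tp + 0.5) / (tp + fn + 1.0)
        fpr = (fp + 0.5) / (fp + tn + 1.0)
        d_vals.append(logit(tpr) - logit(fpr))  # log diagnostic OR
        s_vals.append(logit(tpr) + logit(fpr))  # proxy for the positivity threshold
    n = len(d_vals)
    mean_d = sum(d_vals) / n
    mean_s = sum(s_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("need at least two tables with different S values")
    # Ordinary least squares for D = a + b * S.
    b = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals)) / sxx
    a = mean_d - b * mean_s
    if b >= 1:
        raise ValueError("slope b >= 1 gives a degenerate SROC curve")
    # Back-transform and integrate TPR over FPR in (0, 1) with the midpoint rule.
    h = 1.0 / n_grid
    auc = sum(
        expit((a + (1 + b) * logit((k + 0.5) * h)) / (1 - b)) * h
        for k in range(n_grid)
    )
    return a, b, auc
```

If every table lies on the chance diagonal (D = 0), the fit gives a = b = 0 and the AUC is 0.5, which serves as a quick sanity check; tables with high diagnostic OR push the AUC toward 1.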

3. Results

Overall, 59 different SNPs related to missense mutations in the UGT1A1 gene were reviewed using the designed protocol (Figure 1). Our regression model was y = 3.39 − 0.24 × (functional score) − 0.14 × (GV score) − 2.44 × (conservation score). Comparing the diagnostic OR, our model showed high detection potential (diagnostic OR: 16.71, 95% CI: 3.38–82.69) (Figure 2). The highest MCC and ACC were achieved by our suggested model (46.8% and 73.3%), followed by SIFT (34.19% and 62.71%) (Table 2). The SROC curves reflected an acceptable and fairly good overall diagnostic performance for our suggested model compared with the SNP-based pathogenicity detection tools (Figure 3). The AUC analysis showed significantly better overall performance of our suggested model than the selected SNP-based pathogenicity detection tools (Table 3).
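
To show how the reported model would be applied, the Python sketch below evaluates y for a variant from its three scores. Only the coefficients come from the Results; the input scales of the functional, GV, and conservation scores [23–25] are not described here, so any inputs are hypothetical. Treating y as the log-odds of a logistic model, and which class a high probability refers to, are assumptions that the text does not confirm.

```python
import math

# Coefficients as reported in the Results section.
INTERCEPT = 3.39
COEF_FUNCTIONAL = -0.24
COEF_GV = -0.14
COEF_CONSERVATION = -2.44


def model_score(functional: float, gv: float, conservation: float) -> float:
    """Linear predictor y of the reported regression model."""
    return (
        INTERCEPT
        + COEF_FUNCTIONAL * functional
        + COEF_GV * gv
        + COEF_CONSERVATION * conservation
    )


def logistic_probability(y: float) -> float:
    """ASSUMPTION: y is a log-odds from a logistic model (not stated in the text)."""
    return 1.0 / (1.0 + math.exp(-y))
```

For hypothetical inputs of 1 for all three scores, y = 3.39 − 0.24 − 0.14 − 2.44 = 0.57; the conservation coefficient dominates, so a one-unit change in conservation score shifts y about ten times more than the same change in the functional score.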

Table 2: Calculated Matthews correlation coefficient (MCC) and accuracy (ACC) of the selected SNP-based pathogenicity detection tools and the suggested model.
Table 3: Area under curve for all the selected SNP-based pathogenicity detection tools.
Figure 1: Flowchart of searching for SNPs.
Figure 2: The individual and pooled diagnostic OR, sensitivity, specificity, negative likelihood ratio, positive likelihood ratio.
Figure 3: The summary receiver operating characteristic (SROC) curve of the selected SNP-based pathogenicity detection tools.

4. Discussion

Since the late 1990s, research in genetic testing and molecular medicine, the development of diagnostic accuracy tests, and molecular assays that measure gene expression levels or detect specific mutations have been used to tailor therapy to an individual’s disease. We suggested a regression model based on different scores, including functional scores, conservation score, and genomic variation degree, and compared the results with published clinical results as the reference. We evaluated the effect of a set of disease-causing missense mutations identified in the general population. Susceptibility to Mendelian inherited diseases is most frequently associated with SNPs; however, the mechanisms by which this occurs are still poorly understood. From a biological point of view, the mutated residues are important for maintaining the protein structure needed for proper function [28].

Genetic variation underlying disease phenotypes is often difficult to detect because of the complex genetic nature of many traits. Using the functional characteristics of genetic mutations can provide a powerful tool for uncovering genetic traits in more complex species and novel insights into the molecular mechanisms of disease [29]. More importantly, associations between SNPs in candidate genes and the phenotypes selected to represent a disease are variable, which is an important consideration in disease studies [30].

Sensitivity was not reduced: the highest sensitivity was observed with our suggested model, followed by PolyPhen2, Mutpred, and SIFT. In comparison with several well-established SNP-based pathogenicity detection tools, the satisfactory performance of our model and of SIFT indicates the importance of a mutation’s position in the context of the entire protein. It is therefore reasonable to conclude that analyzing the results of SNP-based pathogenicity detection tools such as our proposed model, SIFT, and PolyPhen2 is feasible and promising, although their performance is not yet excellent.

Saunders and Baker [31] and Bao and Cui [32] reported that, when a conservation score is unavailable, structural characteristics are valuable predictors. Our results support the sequence conservation score as a good predictor, since an acceptable level of accuracy was achieved using it. Dobson et al. used machine learning methods to measure the sequence conservation score, showed that it is the most powerful single predictor, and reported a high level of accuracy using the conservation score alone [33]. They also reported higher accuracy when structural characteristics were combined with the conservation score. Similarly, we showed that combining structural characteristics with the conservation score improves prediction accuracy and can reduce the error rate relative to the conservation score alone.

Ng and Henikoff used sequence and/or structure information in a mathematical model to predict the effect of a missense mutation on protein function and claimed that their model is a good SNP-based pathogenicity detection tool [13]. Capriotti et al. [14] developed a method that, starting from protein sequence information, predicts whether a new phenotype derived from an nsSNP can be related to a genetic disease in humans; they reported more than 74% accuracy in predicting whether a single point mutation is disease related. Stitziel et al. [15] introduced a tool based on hidden Markov models (HMMs) for analyzing the sequence homology of SNPs and reported 68% accuracy in predicting whether a single point mutation is disease related. Shihab et al. [16] described fathmm (Functional Analysis Through Hidden Markov Models), a software tool and server, and reported 71% prediction accuracy, lower than SIFT (74%) but equal to PolyPhen2 (71%). Choi et al. [17] developed an algorithm that provides a generalized approach to predicting the functional effects of protein sequence variations, including single or multiple amino acid substitutions and in-frame insertions and deletions; they reported 84.8% accuracy, compared with SIFT (84.5%) and PolyPhen2 (84.7%), in predicting whether a mutation is disease related. In the present study, the highest accuracy was observed with our suggested model (73.33%), followed by SIFT (62.71%) and by PolyPhen2 and Mutpred (61.02% each).

5. Conclusions

Our suggested model is comparable to the well-established SNP-based pathogenicity detection tools and can appropriately reflect the role of a disease-associated SNP in both local and global structures. A major drawback of weighted SNP-based pathogenicity detection tools is their inherent restriction to variants that fall within conserved protein domains. However, unlike other sequence-based prediction tools, which are too slow for practical use in large-scale sequencing projects, the weighted tools are computationally inexpensive and fast. Although the accuracy of our suggested model is not particularly high, it highlights the functional impact of pathogenic mutations at the protein level, which improves the understanding of the molecular basis of mutation pathogenesis.

References

  1. B. Lodoso Torrecilla, E. Palomo Atance, C. Camarena Grande et al., “Crigler-Najjar syndrome: diagnosis and treatment,” Anales de Pediatria, vol. 65, no. 1, pp. 73–78, 2006.
  2. K. M. Nair, P. Lohse, and S. Nampoothiri, “Crigler-Najjar syndrome type 2: novel UGT1A1 mutation,” Indian Journal of Human Genetics, vol. 18, no. 2, pp. 233–234, 2012.
  3. H. Sagili, N. Pramya, D. Jayalaksmi, and R. Rani, “Crigler-Najjar syndrome II and pregnancy outcome,” Journal of Obstetrics and Gynaecology, vol. 32, no. 2, pp. 188–189, 2012.
  4. H. Aloulou, A. B. Thabet, S. Khanfir et al., “Type I Crigler Najjar syndrome in Tunisia: a study of 30 cases,” Tunisie Medicale, vol. 88, no. 10, pp. 707–709, 2010.
  5. A. Iolascon, A. Melon, B. Coppola, and M. C. Rosatelli, “Crigler-Najjar syndrome type II resulting from three different mutations in the bilirubin uridine 5′-diphosphate-glucuronosyltransferase (UGT1A1) gene,” Journal of Medical Genetics, vol. 37, no. 9, pp. 712–713, 2000.
  6. M. I. Shevell, B. Bernard, and J. W. Adelson, “Crigler-Najjar syndrome type I: treatment by home phototherapy followed by orthotopic hepatic transplantation,” Journal of Pediatrics, vol. 110, no. 3, pp. 429–431, 1987.
  7. M. I. Shevell, A. Majnemer, and D. Schiff, “Neurologic perspectives of Crigler-Najjar syndrome type I,” Journal of Child Neurology, vol. 13, no. 6, pp. 265–269, 1998.
  8. M. Kanou, T. Usui, H. Ueyama, H. Sato, I. Ohkubo, and T. Mizutani, “Stimulation of transcriptional expression of human UDP-glucuronosyltransferase 1A1 by dexamethasone,” Molecular Biology Reports, vol. 31, no. 3, pp. 151–158, 2004.
  9. B. Boeckmann, A. Bairoch, R. Apweiler et al., “The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 365–370, 2003.
  10. S. T. Sherry, M.-H. Ward, M. Kholodov et al., “DbSNP: the NCBI database of genetic variation,” Nucleic Acids Research, vol. 29, no. 1, pp. 308–311, 2001.
  11. D. Fredman, G. Munns, D. Rios et al., “HGVbase: a curated resource describing human DNA variation and phenotype relationships,” Nucleic Acids Research, vol. 32, pp. D516–D519, 2004.
  12. A. Szalontai and K. Csiszar, “Genetic insights into the functional elements of language,” Human Genetics, 2013.
  13. P. C. Ng and S. Henikoff, “Predicting the effects of amino acid substitutions on protein function,” Annual Review of Genomics and Human Genetics, vol. 7, pp. 61–80, 2006.
  14. E. Capriotti, R. Calabrese, and R. Casadio, “Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information,” Bioinformatics, vol. 22, no. 22, pp. 2729–2734, 2006.
  15. N. O. Stitziel, Y. Y. Tseng, D. Pervouchine, D. Goddeau, S. Kasif, and J. Liang, “Structural location of disease-associated single-nucleotide polymorphisms,” Journal of Molecular Biology, vol. 327, no. 5, pp. 1021–1030, 2003.
  16. H. A. Shihab, J. Gough, D. N. Cooper et al., “Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models,” Human Mutation, vol. 34, no. 1, pp. 57–65, 2013.
  17. Y. Choi, G. E. Sims, S. Murphy, J. R. Miller, and A. P. Chan, “Predicting the functional effect of amino acid substitutions and indels,” PLoS ONE, vol. 7, no. 10, Article ID e46688, 2012.
  18. B. Li, V. G. Krishnan, M. E. Mort et al., “Automated inference of molecular mechanisms of disease from amino acid substitutions,” Bioinformatics, vol. 25, no. 21, pp. 2744–2750, 2009.
  19. P. Flicek, M. R. Amode, D. Barrell, K. Beal et al., “Ensembl 2012,” Nucleic Acids Research, vol. 40, pp. D84–D90, 2012.
  20. A. Hamosh, A. F. Scott, J. S. Amberger, C. A. Bocchini, and V. A. McKusick, “Online mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders,” Nucleic Acids Research, vol. 33, pp. D514–D517, 2005.
  21. H. V. Firth, S. M. Richards, A. P. Bevan et al., “DECIPHER: database of chromosomal imbalance and phenotype in humans using ensembl resources,” The American Journal of Human Genetics, vol. 84, no. 4, pp. 524–533, 2009.
  22. A. S. Glas, J. G. Lijmer, M. H. Prins, G. J. Bonsel, and P. M. M. Bossuyt, “The diagnostic odds ratio: a single indicator of test performance,” Journal of Clinical Epidemiology, vol. 56, no. 11, pp. 1129–1135, 2003.
  23. B. Reva, Y. Antipin, and C. Sander, “Predicting the functional impact of protein mutations: application to cancer genomics,” Nucleic Acids Research, vol. 39, no. 17, p. e118, 2011.
  24. E. Mathe, M. Olivier, S. Kato, C. Ishioka, P. Hainaut, and S. V. Tavtigian, “Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods,” Nucleic Acids Research, vol. 34, no. 5, pp. 1317–1325, 2006.
  25. H. Ashkenazy, E. Erez, E. Martz, T. Pupko, and N. Ben-Tal, “ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids,” Nucleic Acids Research, vol. 38, no. 2, Article ID gkq399, pp. W529–W533, 2010.
  26. J. Zamora, V. Abraira, A. Muriel, K. Khan, and A. Coomarasamy, “Meta-DiSc: a software for meta-analysis of test accuracy data,” BMC Medical Research Methodology, vol. 6, article 31, 2006.
  27. S. D. Walter, “Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data,” Statistics in Medicine, vol. 21, no. 9, pp. 1237–1256, 2002.
  28. Z. Wang and J. Moult, “SNPs, protein structure, and disease,” Human Mutation, vol. 17, no. 4, pp. 263–270, 2001.
  29. M. C. Shelden and U. Roessner, “Advances in functional genomics for investigating salinity stress tolerance mechanisms in cereals,” Frontiers in Plant Science, vol. 4, article 123, 2013.
  30. S. Rajasekaran, R. M. Kanna, N. Senthil et al., “Phenotype variations affect genetic association studies of degenerative disc disease: conclusions of analysis of genetic association of 58 single nucleotide polymorphisms with highly specific phenotypes for disc degeneration in 332 subjects,” Spine Journal, vol. 9430, no. 13, pp. 1529–30, 2013.
  31. C. T. Saunders and D. Baker, “Evaluation of structural and evolutionary contributions to deleterious mutation prediction,” Journal of Molecular Biology, vol. 322, no. 4, pp. 891–901, 2002.
  32. L. Bao and Y. Cui, “Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information,” Bioinformatics, vol. 21, no. 10, pp. 2185–2190, 2005.
  33. R. J. Dobson, P. B. Munroe, M. J. Caulfield, and M. A. S. Saqi, “Predicting deleterious nsSNPs: an analysis of sequence and structural attributes,” BMC Bioinformatics, vol. 7, article 217, 2006.