Advances in Bioinformatics


Research Article | Open Access

Volume 2016 | Article ID 2632917 | 15 pages | https://doi.org/10.1155/2016/2632917

Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene

Academic Editor: Ming Chen
Received 26 Nov 2015
Revised 10 May 2016
Accepted 12 May 2016
Published 10 Jul 2016

Abstract

This study examined single variants (SNPs/Indels) in the coding and noncoding regions of the Homo sapiens BRAF gene. Variant data were obtained from the database of SNP (dbSNP) as of its November 2015 update. Several bioinformatics tools were used to identify SNPs and Indels with functional effects on protein function, structure, and expression. For coding polymorphisms, 111 SNPs were predicted to be highly damaging and six others less so. In the UTRs, five SNPs and one Indel altered microRNA binding sites (3′ UTR), whereas no SNP or Indel functionally altered transcription factor binding sites (5′ UTR). For the 5′/3′ splice sites, one SNP within the 5′ splice site and one Indel in the 3′ splice site showed potential alteration of splicing. In conclusion, these functional SNPs and Indels could alter the gene, which may contribute directly or indirectly to the occurrence of many diseases.

1. Introduction

Genetic alterations (mutations) can in general be divided into two categories: inheritable (germline mutations), with 2% to 4% occurrence, and sporadic (somatic mutations) [1, 2]. The BRAF gene, a member of the RAF family, is located on chromosome 7 (7q34) in the region from base pair 140,715,951 to 140,924,764, which covers approximately 190 kb, and is composed of 18 exons; its translated protein is named “B-Raf proto-oncogene serine/threonine protein kinase.” This protein belongs to the raf/mil family and plays a role in regulating the MAP kinase/ERK signaling pathway, which affects cell division, differentiation, and secretion [3]. Several studies have reported the prevalence of BRAF mutations in various cancers, including non-Hodgkin lymphoma, colorectal cancer, malignant melanoma, thyroid carcinoma, non-small-cell lung carcinoma, and adenocarcinoma of the lung [3-5]. Mutations in this gene have also been associated with various diseases such as cardiofaciocutaneous syndrome (a disease characterized by heart defects, mental retardation, and a distinctive facial appearance), Noonan syndrome, multiple lentigines syndrome (LEOPARD syndrome), giant congenital melanocytic nevus, and Erdheim-Chester disease [6, 7].

Single nucleotide polymorphisms (SNPs) are single-base changes in the DNA sequence with an allele frequency of 1% or greater in a population. They occur throughout the genome at a frequency of about one in every 1000 nucleotides and are considered the simplest and most common type of genetic marker underlying DNA variation among individuals [8]. Nonsynonymous SNPs (nsSNPs) are one type of coding SNP and an important source of diversity in encoded human proteins; they affect gene regulation by altering DNA and transcription factor binding, maintain the structural integrity of the cell, and affect protein function in different signal transduction pathways [9]. About 2% of all known single nucleotide variants associated with genetic diseases are nonsynonymous SNPs, and these contribute to the functional diversity of the encoded proteins in the human population [10]. SNPs may be responsible for genetic diversity, evolutionary processes, differences in traits, drug response, and complex and common diseases such as diabetes, hypertension, and cancers. Therefore, identifying and analyzing the numerous SNP variations in genes may help in understanding their effects on gene products and their association with diseases, and could also help in the development of new medical testing markers and individualized medication [11].

The 1000 Genomes Project showed that most human genetic variation is represented by SNPs. The database of SNP (dbSNP) is one of the main databases serving as a central, public store for genetic variation since its initiation in September 1998 [12]. Any laboratory or individual can use the indexed variation, the sequence information around a polymorphism, and the specific experimental conditions for further research applications. As with all NCBI resources, the data within dbSNP are available for free and in a variety of forms. On November 17, 2015, dbSNP contained 160508575 Homo sapiens variants, of which 144205811 were SNPs and 16064552 were Indels (single- or multi-base insertions/deletions). dbSNP contains the results of the HapMap and 1000 Genomes Projects (http://www.ncbi.nlm.nih.gov/snp/).

In noncoding regions (3′ UTR, 5′ UTR), polymorphisms such as SNPs in the microRNA (miRNA) binding sites of mRNA, called mirSNPs, can affect miRNA function and hence gene expression, resulting in many human diseases such as cancers [13]. Identifying the SNPs responsible for phenotype changes is difficult because it requires multiple testing of different SNPs in candidate genes [9]. One way to overcome this problem is to prioritize SNPs according to their structural and functional significance using different bioinformatics prediction tools. This study focused on functional simple polymorphisms (SNPs/Indels) in the human BRAF gene within coding regions, the 5′ UTR, 3′/5′ splice sites, transcription factor binding sites, and miRNA binding sites.

2. Materials and Methods

SNPs located in the target gene were obtained from the database of SNPs (dbSNP), a public-domain archive for a broad collection of simple genetic polymorphisms. This collection includes single-base nucleotide substitutions (SNPs), small-scale multibase deletions or insertions (also called deletion-insertion polymorphisms), retroposable element insertions, and microsatellite repeat variations (short tandem repeats or STRs) (http://www.ncbi.nlm.nih.gov/snp/). The related protein sequences were obtained from the UniProt database (http://www.uniprot.org/).

dbSNP contains SNPs and Indels within the 3′/5′ UTRs, 3′/5′ splice sites, and introns, as well as coding synonymous and nonsynonymous variants, the latter comprising missense, nonsense, stop-gain, and frameshift variants. In this study, Homo sapiens SNPs and Indels (single insertions or deletions) within coding (nonsynonymous) regions, the 3′/5′ UTRs, and the 3′/5′ splice sites were selected and submitted to bioinformatics tools for further investigation. The distribution of single variants is shown in Table 1.


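Table 1

| Regions | Number of SNPs | Number of Indels | Total |
|---|---|---|---|
| All over BRAF gene | 9585 | 1111 | 10696 |
| Coding | 350 | 2 | 352 |
| (i) Missense | 228 | | 228 |
| (ii) Nonsense and stop gain | 4 | | 4 |
| (iii) Frame shift | | 2 | 2 |
| 3′ UTR | 331 | 31 | 362 |
| 5′ UTR | 5 | | 5 |
| 3′ splice site | 1 | 1 | 2 |
| 5′ splice site | 1 | 0 | 1 |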

The overall analysis workflow was as follows. Missense SNPs were analyzed with three tools (SIFT server, PolyPhen, and SNAP2); SNPs predicted as functional or damaging by all three servers are listed in Table 2, and more information about them is shown in Table 3. Frameshift variants were analyzed using the SIFT server. 3′ UTR SNPs and Indels were analyzed with the PolymiRTS database (Table 6), 5′ UTR SNPs (in transcription factor binding sites) with the PROMO tool (Table 7), and 3′/5′ splice site SNPs and Indels with the HSF tool (Table 8).


Table 2

| SNP ID | Ch7 location | Nucleotide change | Protein ID | Amino acid change | Clin/sig |
|---|---|---|---|---|---|
| rs150050723 | 140534634 | T\|G | ENSP00000288602 | Q93H | |
| rs180177034 | 140501336 | C\|G | ENSP00000288602 | A246P | Path/3 |
| rs387906660 | 140501350 | G\|A | ENSP00000288602 | T241M | Path/3 |
| rs387906660 | | G\|C | ENSP00000288602 | T241R | |
| rs387906661 | 140501351 | T\|G | ENSP00000288602 | T241P | Path/5 |
| rs397507466 | 140501337 | T\|G | ENSP00000288602 | L245F | Path/2 |
| rs397507467 | 140501332 | A\|G | ENSP00000288602 | F247S | L.Path/1 |
| rs397509343 | 140501331 | A\|C | ENSP00000288602 | F247L | L.Path/1 |
| rs267601317 | 140494148 | G\|A | ENSP00000288602 | P367L | |
| rs199927105 | 140482930 | G\|T | ENSP00000288602 | P402H | |
| rs121913348 | 140481417 | C\|G | ENSP00000288602 | G464A | Path/3 |
| rs121913348 | | C\|A | ENSP00000288602 | G464V | L.Path/1 |
| rs121913348 | | C\|T | ENSP00000288602 | G464E | |
| rs121913353 | 140481412 | C\|G | ENSP00000288602 | G466R | L.Path/1 |
| rs121913357 | 140481403 | C\|T | ENSP00000288602 | G469R | |
| rs121913376 | 140481397 | C\|A | ENSP00000288602 | V471F | Path/1 |
| rs121913376 | | C\|T | ENSP00000288602 | V471I | L.Path/1 |
| rs180177033 | 140481420 | A\|C | ENSP00000288602 | I463S | Path/1 |
| rs397507473 | 140481405 | A\|G | ENSP00000288602 | F468S | Path/2 |
| rs121913349 | 140481418 | C\|G | ENSP00000288602 | G464R | |
| rs121913371 | 140481478 | G\|A | ENSP00000288602 | R444W | |
| rs121913351 | 140481411 | C\|G | ENSP00000288602 | G466A | Path/2 |
| rs121913351 | | C\|A | ENSP00000288602 | G466V | |
| rs121913351 | | C\|T | ENSP00000288602 | G466E | |
| rs121913355 | 140481402 | C\|G | ENSP00000288602 | G469A | Path/1 |
| rs121913355 | | C\|A | ENSP00000288602 | G469V | |
| rs121913355 | | C\|T | ENSP00000288602 | G469E | |
| rs180177036 | 140477853 | C\|G | ENSP00000288602 | L485F | Path/2 |
| rs180177038 | | C\|T | ENSP00000288602 | E501K | |
| rs180177039 | | T\|C | ENSP00000288602 | E501G | |
| rs397507474 | 140477861 | T\|G | ENSP00000288602 | K483Q | Path/2 |
| rs397507475 | 140477854 | A\|G | ENSP00000288602 | L485S | Path/2 |
| rs397507476 | 140477811 | T\|G | ENSP00000288602 | K499N | Path/1 |
| rs397507477 | 140477795 | G\|A | ENSP00000288602 | L505F | L.Path/1 |
| rs375520366 | 140476806 | G\|A | ENSP00000288602 | P490S | Path/1 |
| rs180177041 | 140476806 | C\|G | ENSP00000288602 | G534R | Path/1 |
| rs397507479 | 140476811 | C\|T | ENSP00000288602 | C532Y | Path/1 |
| rs180177040 | 140453987 | T\|G | ENSP00000288602 | N581H | Path/2 |
| rs180177040 | | T\|C | ENSP00000288602 | N581D | |
| rs397507481 | 140454006 | G\|C | ENSP00000288602 | H574Q | Path/1 |
| rs121913341 | 140453150 | A\|C | ENSP00000288602 | F595L | Path/2 |
| rs121913361 | 140453149 | C\|G | ENSP00000288602 | G596R | Un.S/1 |
| rs121913364 | 140453134 | T\|C | ENSP00000288602 | K601E | |
| rs121913366 | 140453145 | A\|T | ENSP00000288602 | L597Q | Path/1 |
| rs121913366 | | A\|C | ENSP00000288602 | L597R | |
| rs121913370 | 140453193 | T\|C | ENSP00000288602 | N581S | L.Path/1 |
| rs121913375 | 140453139 | G\|A | ENSP00000288602 | T599I | Path/1 |
| rs397507483 | 140453148 | C\|A | ENSP00000288602 | G596V | Path/3 |
| rs397507484 | 140453133 | T\|A | ENSP00000288602 | K601I | Path/1, L.Path/1 |
| rs121913225 | 140453151 | A\|G | ENSP00000288602 | F595S | |
| rs121913337 | 140453153 | A\|T | ENSP00000288602 | D594E | |
| rs121913362 | 140453159 | T\|C | ENSP00000288602 | I592M | |
| rs121913365 | 140453132 | T\|G | ENSP00000288602 | K601N | |
| rs372569965 | 140453127 | C\|T | ENSP00000288602 | R603Q | |
| rs180177042 | 140449165 | A\|T | ENSP00000288602 | D638E | Path/2 |
| rs397507485 | 140439727 | C\|T | ENSP00000288602 | R671Q | Path/1 |
| rs397507486 | 140439613 | T\|C | ENSP00000288602 | Q709R | Path/1 |
| rs55715359 | 140439664 | A\|G | ENSP00000288602 | L692S | |
| rs397507487 | 140434543 | G\|A | ENSP00000288602 | R719C | Path/1 |
| rs200490285 | 140434452 | G\|T | ENSP00000288602 | A749D | |
| rs368528867 | 140434542 | C\|A | ENSP00000288602 | R719L | |
| rs180177040 | 140453987 | T\|G | ENSP00000418033 | N9H | Path/2 |
| rs121913341 | 140453150 | A\|C | ENSP00000418033 | F23L | Path/1, L.Path/2 |
| rs121913361 | 140453149 | C\|G | ENSP00000418033 | G24R | Un.S |
| rs121913364 | 140453134 | T\|C | ENSP00000418033 | K29E | Path/2, L.Path/1 |
| rs121913366 | 140453145 | A\|T | ENSP00000418033 | L25Q | Path/1 |
| rs121913366 | | A\|C | ENSP00000418033 | L25R | |
| rs121913375 | 140453139 | G\|A | ENSP00000418033 | T27I | Path/1 |
| rs397507483 | 140453148 | C\|A | ENSP00000418033 | G24Y | Path/3 |
| rs397507484 | 140453133 | T\|A | ENSP00000418033 | K29I | Path/1, L.Path/1 |
| rs121913225 | 140453151 | A\|G | ENSP00000418033 | F23S | |
| rs121913337 | 140453153 | A\|T | ENSP00000418033 | D22E | |
| rs121913362 | 140453159 | T\|C | ENSP00000418033 | I20M | |
| rs121913365 | 140453132 | T\|G | ENSP00000418033 | K29N | |
| rs180177042 | 140449165 | A\|T | ENSP00000418033 | D66E | Path/2 |
| rs199927105 | 140482930 | G\|T | ENSP00000419060 | P10H | |
| rs121913357 | 140481403 | C\|T | ENSP00000419060 | G77R | Path/3, L.Path/1, Un.S/0 |
| rs180177033 | 140481420 | A\|C | ENSP00000419060 | I71S | Path/1 |
| rs397507473 | 140481405 | A\|G | ENSP00000419060 | F76S | Path/2 |
| rs121913371 | 140481478 | G\|A | ENSP00000419060 | R52W | |
| rs121913351 | 140481411 | C\|A | ENSP00000419060 | G74V | Path/2 |
| rs121913351 | | C\|T | ENSP00000419060 | G74E | |
| rs121913355 | 140481402 | C\|G | ENSP00000419060 | G77A | Path/5 |
| rs121913355 | | C\|A | ENSP00000419060 | G77V | |
| rs121913355 | | C\|T | ENSP00000419060 | G77E | |
| rs180177036 | 140477853 | C\|G | ENSP00000419060 | L93F | Path/2 |
| rs180177037 | 140477813 | T\|C | ENSP00000419060 | K107E | Path/2 |
| rs180177038 | 140477807 | C\|T | ENSP00000419060 | E109K | Path/3, L.Path/1 |
| rs180177039 | 140477806 | T\|C | ENSP00000419060 | E109G | Path/2, L.Path/1 |
| rs397507479 | 140476811 | C\|T | ENSP00000419060 | C140Y | Path/1, Un.S/1 |
| rs180177040 | 140453987 | T\|G | ENSP00000419060 | N189H | Path/2 |
| rs180177040 | | T\|C | ENSP00000419060 | N189D | |
| rs397507481 | 140454006 | G\|C | ENSP00000419060 | H182Q | Path/1 |
| rs121913341 | 140453150 | A\|C | ENSP00000419060 | F203L | Path/2, L.Path/1 |
| rs121913361 | 140453149 | C\|G | ENSP00000419060 | G204R | Un.S/0 |
| rs121913364 | 140453134 | T\|C | ENSP00000419060 | K209E | Path/5 |
| rs121913366 | 140453145 | A\|T | ENSP00000419060 | L205Q | Path/1 |
| rs121913366 | | A\|C | ENSP00000419060 | L205R | |
| rs121913370 | 140453193 | T\|C | ENSP00000419060 | N189S | L.Path/1 |
| rs397507483 | 140453148 | C\|A | ENSP00000419060 | G204V | Path/3 |
| rs121913225 | 140453151 | A\|G | ENSP00000419060 | F203S | |
| rs121913337 | 140453153 | A\|T | ENSP00000419060 | D202E | |
| rs121913362 | 140453159 | T\|C | ENSP00000419060 | I200M | |
| rs372569965 | 140453127 | C\|T | ENSP00000419060 | R211Q | |
| rs180177042 | 140449165 | A\|T | ENSP00000419060 | D246E | Path/2 |
| rs397507485 | 14043972 | C\|T | ENSP00000419060 | R279Q | Path/1 |
| rs397507486 | 140439613 | T\|C | ENSP00000419060 | Q317R | Path/1 |
| rs397507487 | 140434543 | G\|A | ENSP00000419060 | R327C | Path/1 |
| rs200490285 | 140434452 | G\|T | ENSP00000419060 | A357D | |
| rs397507476 | 140477811 | T\|G | ENSP00000419060 | K107N | Path/1 |
| rs150050723 | 140534634 | T\|G | ENSP00000420119 | Q78H | |
| rs180177032 | 140481423 | C\|A | ENSP00000419060 | R70I | Path/1 |
| rs121913378 | 140453137 | C\|T | ENSP00000288602 | V600M | L.Path/1 |
| rs121913369 | 140453146 | G\|C | ENSP00000288602 | L597V | Path/4 |
| rs121913369 | | G\|C | ENSP00000419060 | L205V | |
| rs121913378 | 140453137 | C\|T | ENSP00000419060 | V208M | L.Path/1 |
| rs397507481 | 140454006 | G\|C | ENSP00000418033 | H2Q | Path/1 |

SNP ID refers to dbSNP. Ch7: location within chromosome 7 (assembly GRCh37/hg19). Clin/sig: clinical significance according to the ClinVar database; the result can be one or more of the following: pathogenic (Path), benign, likely pathogenic (L.Path), and/or unsignificant (Un.S). The number after each result is the number of diseases associated with this SNP.
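The Clin/sig encoding described in this footnote can be unpacked programmatically. The following is a minimal Python sketch, not part of the original study: the function name `parse_clin_sig` and the assumption that multiple entries for one SNP are separated by commas or whitespace are illustrative; only the abbreviations and the "label/number" pattern come from the footnote.

```python
import re
from typing import List, Optional, Tuple

# Abbreviations as defined in the Table 2 footnote.
ABBREVIATIONS = {
    "Path": "pathogenic",
    "L.Path": "likely pathogenic",
    "Un.S": "unsignificant",
}

def parse_clin_sig(cell: str) -> List[Tuple[str, Optional[int]]]:
    """Split a Clin/sig cell into (significance, number of diseases) pairs.

    Assumes entries are separated by commas and/or whitespace (hypothetical
    convention); the disease count is None when no "/number" is present.
    """
    entries = []
    for token in re.split(r"[,\s]+", cell.strip()):
        if not token:
            continue  # empty cell or stray separator
        label, _, count = token.partition("/")
        entries.append((ABBREVIATIONS.get(label, label), int(count) if count else None))
    return entries

print(parse_clin_sig("Path/3"))
print(parse_clin_sig("Path/3, L.Path/1, Un.S/0"))
```

An empty cell yields an empty list, which keeps SNPs without a ClinVar entry distinguishable from those with a recorded significance.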

Table 3

| Protein ID | Amino acid change | SIFT prediction | SIFT score | PolyPhen-2 prediction | PolyPhen-2 score | SNAP2 prediction | SNAP2 score |
|---|---|---|---|---|---|---|---|
| ENSP00000288602 | Q93H | Damaging | 0 | Probably damaging | 0.974 | Effect | 15 |
| ENSP00000288602 | A246P | Damaging | 0 | Probably damaging | 0.999 | Effect | 73 |
| ENSP00000288602 | T241M | Damaging | 0 | Probably damaging | 1 | Effect | 72 |
| ENSP00000288602 | T241R | Damaging | 0 | Probably damaging | 1 | Effect | 73 |
| ENSP00000288602 | T241P | Damaging | 0 | Probably damaging | 1 | Effect | 86 |
| ENSP00000288602 | L245F | Damaging | 0 | Probably damaging | 0.999 | Effect | 64 |
| ENSP00000288602 | F247S | Damaging | 0.01 | Possibly damaging | 0.443 | Effect | 77 |
| ENSP00000288602 | F247L | Damaging | 0 | Probably damaging | 0.987 | Effect | 70 |
| ENSP00000288602 | P367L | Damaging | 0.01 | Probably damaging | 0.999 | Effect | 36 |
| ENSP00000288602 | P402H | Damaging | 0 | Probably damaging | 0.948 | Effect | 29 |
| ENSP00000288602 | G464A | Damaging | 0 | Probably damaging | 0.999 | Effect | 50 |
| ENSP00000288602 | G464V | Damaging | 0 | Probably damaging | 1 | Effect | 70 |
| ENSP00000288602 | G464E | Damaging | 0 | Probably damaging | 1 | Effect | 84 |
| ENSP00000288602 | G466R | Damaging | 0 | Probably damaging | 0.996 | Effect | 95 |
| ENSP00000288602 | G469R | Damaging | 0 | Probably damaging | 1 | Effect | 90 |
| ENSP00000288602 | V471F | Damaging | 0 | Probably damaging | 0.954 | Effect | 78 |
| ENSP00000288602 | V471I | Damaging | 0 | Possibly damaging | 0.22 | Effect | 37 |
| ENSP00000288602 | I463S | Damaging | 0 | Possibly damaging | 0.714 | Effect | 75 |
| ENSP00000288602 | F468S | Damaging | 0 | Possibly damaging | 0.67 | Effect | 82 |
| ENSP00000288602 | G464R | Damaging | 0 | Probably damaging | 1 | Effect | 79 |
| ENSP00000288602 | R444W | Damaging | 0 | Probably damaging | 0.999 | Effect | 67 |
| ENSP00000288602 | G466A | Damaging | 0 | Probably damaging | 0.97 | Effect | 87 |
| ENSP00000288602 | G466V | Damaging | 0 | Probably damaging | 0.983 | Effect | 93 |
| ENSP00000288602 | G466E | Damaging | 0 | Probably damaging | 0.996 | Effect | 94 |
| ENSP00000288602 | G469A | Damaging | 0 | Probably damaging | 0.969 | Effect | 73 |
| ENSP00000288602 | G469V | Damaging | 0 | Probably damaging | 1 | Effect | 89 |
| ENSP00000288602 | G469E | Damaging | 0 | Probably damaging | 1 | Effect | 93 |
| ENSP00000288602 | L485F | Damaging | 0 | Probably damaging | 0.999 | Effect | 5 |
| ENSP00000288602 | E501K | Damaging | 0.01 | Probably damaging | 0.997 | Effect | 84 |
| ENSP00000288602 | E501G | Damaging | 0 | Probably damaging | 1 | Effect | 81 |
| ENSP00000288602 | K483Q | Damaging | 0 | Probably damaging | 0.999 | Effect | 79 |
| ENSP00000288602 | L485S | Damaging | 0 | Probably damaging | 0.999 | Effect | 42 |
| ENSP00000288602 | K499N | Damaging | 0 | Probably damaging | 0.884 | Effect | 58 |
| ENSP00000288602 | L505F | Damaging | 0.01 | Possibly damaging | 0.698 | Effect | 41 |
| ENSP00000288602 | P490S | Damaging | 0 | Possibly damaging | 0.494 | Effect | 31 |
| ENSP00000288602 | G534R | Damaging | 0 | Probably damaging | 1 | Effect | 34 |
| ENSP00000288602 | C532Y | Damaging | 0 | Probably damaging | 1 | Effect | 75 |
| ENSP00000288602 | N581H | Damaging | 0 | Probably damaging | 1 | Effect | 73 |
| ENSP00000288602 | N581D | Damaging | 0.05 | Possibly damaging | 0.503 | Effect | 76 |
| ENSP00000288602 | H574Q | Damaging | 0 | Probably damaging | 0.999 | Effect | 89 |
| ENSP00000288602 | F595L | Damaging | 0 | Probably damaging | 1 | Effect | 92 |
| ENSP00000288602 | G596R | Damaging | 0 | Probably damaging | 1 | Effect | 95 |
| ENSP00000288602 | K601E | Damaging | 0 | Probably damaging | 0.997 | Effect | 58 |
| ENSP00000288602 | L597Q | Damaging | 0 | Probably damaging | 1 | Effect | 71 |
| ENSP00000288602 | L597R | Damaging | 0 | Probably damaging | 0.999 | Effect | 81 |
| ENSP00000288602 | N581S | Damaging | 0.04 | Possibly damaging | 0.517 | Effect | 56 |
| ENSP00000288602 | T599I | Damaging | 0 | Probably damaging | 0.997 | Effect | 62 |
| ENSP00000288602 | G596V | Damaging | 0 | Probably damaging | 1 | Effect | 92 |
| ENSP00000288602 | K601I | Damaging | 0 | Probably damaging | 0.986 | Effect | 18 |
| ENSP00000288602 | F595S | Damaging | 0 | Probably damaging | 1 | Effect | 95 |
| ENSP00000288602 | D594E | Damaging | 0 | Probably damaging | 0.999 | Effect | 90 |
| ENSP00000288602 | I592M | Damaging | 0 | Probably damaging | 0.997 | Effect | 55 |
| ENSP00000288602 | K601N | Damaging | 0 | Probably damaging | 0.939 | Effect | 49 |
| ENSP00000288602 | R603Q | Damaging | 0.03 | Probably damaging | 0.971 | Effect | 37 |
| ENSP00000288602 | D638E | Damaging | 0 | Probably damaging | 1 | Effect | 91 |
| ENSP00000420119 | Q78H | Damaging | 0 | Probably damaging | 0.948 | Effect | 41 |
| ENSP00000288602 | R671Q | Damaging | 0 | Probably damaging | 0.996 | Effect | 28 |
| ENSP00000288602 | Q709R | Damaging | 0.01 | Possibly damaging | 0.776 | Effect | 45 |
| ENSP00000288602 | L692S | Damaging | 0 | Probably damaging | 0.999 | Effect | 30 |
| ENSP00000288602 | R719C | Damaging | 0 | Probably damaging | 0.996 | Effect | 34 |
| ENSP00000288602 | A749D | Damaging | 0.03 | Possibly damaging | 0.819 | Effect | 47 |
| ENSP00000288602 | R719L | Damaging | 0 | Possibly damaging | 0.551 | Effect | 45 |
| ENSP00000418033 | N9H | Damaging | 0 | Probably damaging | 0.98 | Effect | 66 |
| ENSP00000418033 | F23L | Damaging | 0 | Probably damaging | 0.993 | Effect | 69 |
| ENSP00000418033 | G24R | Damaging | 0 | Probably damaging | 1 | Effect | 90 |
| ENSP00000418033 | K29E | Damaging | 0.04 | Probably damaging | 0.909 | Effect | 41 |
| ENSP00000418033 | L25Q | Damaging | 0 | Probably damaging | 0.996 | Effect | 42 |
| ENSP00000418033 | L25R | Damaging | 0 | Probably damaging | 0.996 | Effect | 61 |
| ENSP00000418033 | T27I | Damaging | 0.01 | Probably damaging | 0.996 | Effect | 64 |
| ENSP00000418033 | G24Y | Damaging | 0 | Probably damaging | 1 | Effect | 79 |
| ENSP00000418033 | K29I | Damaging | 0 | Probably damaging | 0.907 | Effect | 8 |
| ENSP00000418033 | F23S | Damaging | 0 | Probably damaging | 0.996 | Effect | 87 |
| ENSP00000418033 | D22E | Damaging | 0 | Probably damaging | 0.997 | Effect | 80 |
| ENSP00000418033 | I20M | Damaging | 0 | Probably damaging | 0.945 | Effect | 28 |
| ENSP00000418033 | K29N | Damaging | 0 | Probably damaging | 0.949 | Effect | 35 |
| ENSP00000418033 | D66E | Damaging | 0 | Probably damaging | 0.997 | Effect | 93 |
| ENSP00000419060 | P10H | Damaging | 0 | Probably damaging | 0.976 | Effect | 6 |
| ENSP00000419060 | G77R | Damaging | 0 | Probably damaging | 0.962 | Effect | 89 |
| ENSP00000419060 | I71S | Damaging | 0 | Probably damaging | 0.999 | Effect | 86 |
| ENSP00000419060 | F76S | Damaging | 0 | Probably damaging | 0.999 | Effect | 64 |
| ENSP00000419060 | R52W | Damaging | 0 | Probably damaging | 0.976 | Effect | 55 |
| ENSP00000419060 | G74V | Damaging | 0 | Possibly damaging | 0.351 | Effect | 81 |
| ENSP00000419060 | G74E | Damaging | 0 | Possibly damaging | 0.51 | Effect | 87 |
| ENSP00000419060 | G77A | Damaging | 0 | Possibly damaging | 0.344 | Effect | 52 |
| ENSP00000419060 | G77V | Damaging | 0 | Possibly damaging | 0.276 | Effect | 82 |
| ENSP00000419060 | G77E | Damaging | 0 | Probably damaging | 0.883 | Effect | 89 |
| ENSP00000419060 | L93F | Damaging | 0 | Probably damaging | 0.919 | Effect | 6 |
| ENSP00000419060 | K107E | Damaging | 0 | Probably damaging | 0.989 | Effect | 50 |
| ENSP00000419060 | E109K | Damaging | 0.01 | Probably damaging | 0.992 | Effect | 91 |
| ENSP00000419060 | E109G | Damaging | 0 | Probably damaging | 0.998 | Effect | 90 |
| ENSP00000419060 | C140Y | Damaging | 0 | Probably damaging | 0.932 | Effect | 81 |
| ENSP00000419060 | N189H | Damaging | 0 | Probably damaging | 0.996 | Effect | 66 |
| ENSP00000419060 | N189D | Damaging | 0.03 | Probably damaging | 0.998 | Effect | 66 |
| ENSP00000419060 | H182Q | Damaging | 0 | Probably damaging | 0.999 | Effect | 72 |
| ENSP00000419060 | F203L | Damaging | 0 | Probably damaging | 1 | Effect | 89 |
| ENSP00000419060 | G204R | Damaging | 0 | Probably damaging | 0.999 | Effect | 90 |
| ENSP00000419060 | K209E | Damaging | 0 | Possibly damaging | 0.477 | Effect | 51 |
| ENSP00000419060 | L205Q | Damaging | 0 | Probably damaging | 1 | Effect | 56 |
| ENSP00000419060 | L205R | Damaging | 0 | Probably damaging | 1 | Effect | 75 |
| ENSP00000419060 | N189S | Damaging | 0.03 | Probably damaging | 0.989 | Effect | 43 |
| ENSP00000419060 | G204V | Damaging | 0 | Possibly damaging | 0.242 | Effect | 87 |
| ENSP00000419060 | F203S | Damaging | 0 | Probably damaging | 1 | Effect | 94 |
| ENSP00000419060 | D202E | Damaging | 0 | Probably damaging | 1 | Effect | 87 |
| ENSP00000419060 | I200M | Damaging | 0 | Probably damaging | 1 | Effect | 32 |
| ENSP00000419060 | R211Q | Damaging | 0.05 | Probably damaging | 0.979 | Effect | 8 |
| ENSP00000419060 | D246E | Damaging | 0 | Probably damaging | 0.998 | Effect | 96 |
| ENSP00000419060 | R279Q | Damaging | 0.02 | Probably damaging | 1 | Effect | 56 |
| ENSP00000419060 | Q317R | Damaging | 0.03 | Probably damaging | 0.975 | Effect | 2 |
| ENSP00000419060 | R327C | Damaging | 0 | Possibly damaging | 0.291 | Effect | 43 |
| ENSP00000419060 | A357D | Damaging | 0.03 | Possibly damaging | 0.507 | Effect | 22 |
| ENSP00000419060 | K107N | Damaging | 0 | Probably damaging | 0.994 | Effect | 32 |

2.1. SIFT (Sorting Intolerant from Tolerant) Server

The SIFT server is an online bioinformatics tool used to predict the damaging effect of nucleotide substitutions and frameshift Indels (insertions/deletions) on protein function. The prediction is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, with the main assumption that evolutionarily conserved regions tend to be less tolerant to mutations, so mutations in these regions mainly affect function [14]. The SIFT server accepts several types of input: dbSNP reference number (rs ID), protein sequence, and chromosome location. For this tool, coding SNPs and Indels were separated from the total and submitted as rs IDs for missense, nonsense, and stop-gain SNPs and as chromosome locations for frameshift Indels. SIFT assigns each residue a score from 0 to 1; a score ≤0.05 is considered by the algorithm to indicate a damaging amino acid substitution, and a score >0.05 is predicted to be tolerated [15]. SIFT version 5.2.2 is available at http://sift.bii.a-star.edu.sg/index.html.
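The SIFT decision rule can be illustrated with a minimal Python sketch. The function name `classify_sift` and the constant name are illustrative and are not part of the SIFT server; only the 0.05 cutoff and the 0 to 1 score range are taken from the description above.

```python
SIFT_CUTOFF = 0.05  # threshold stated in the text: scores <= 0.05 are damaging

def classify_sift(score: float) -> str:
    """Label a SIFT score as 'Damaging' or 'Tolerated' (illustrative helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    return "Damaging" if score <= SIFT_CUTOFF else "Tolerated"

# 0.0 and 0.05 are SIFT scores reported in Table 3; 0.2 is a made-up
# example of a tolerated score.
for score in (0.0, 0.05, 0.2):
    print(score, classify_sift(score))
```

Note that the boundary value 0.05 itself falls on the damaging side, which matters for variants such as N581D (SIFT score 0.05 in Table 3).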

2.2. PolyPhen-2 (Polymorphism Phenotyping) Server

PolyPhen-2 is an online bioinformatics server that automatically predicts, using a comparative method, the effect of an amino acid substitution caused by an nsSNP on the structure and function of a protein. PolyPhen searches several protein databases for protein 3D structures, multiple alignments of homologous sequences, and amino acid contact information; it then calculates position-specific independent count (PSIC) scores for each of the two variants and computes the difference between them, where a higher PSIC score difference indicates that a functional impact of the amino acid substitution is more likely [16]. The PolyPhen-2 outcome is one of the following: probably damaging, possibly damaging, or benign, with a score ranging from 0 to 1 [9]. The PolyPhen server is available at http://genetics.bwh.harvard.edu/pph2/index.shtml.

2.3. SNAP2 Server

SNAP2 is a trained classifier based on a machine learning method called a “neural network.” It distinguishes between effect and neutral variants (nonsynonymous SNPs) by taking a variety of sequence and variant features into account. The most important input signal for the prediction is the evolutionary information taken from an automatically generated multiple sequence alignment. Structural features such as predicted secondary structure and solvent accessibility are also considered. If available, annotations (i.e., known functional residues, patterns, and regions) of the sequence or close homologs are also pulled in. SNAP2 outputs a score ranging from −100 (strong neutral prediction) to +100 (strong effect prediction); analysis suggests that the prediction score is to some extent correlated with the severity of the effect [17] (https://rostlab.org/services/snap/).

From the functional nsSNPs predicted by all three tools (SIFT server, PolyPhen, and SNAP2), the 15 nsSNPs with the highest predicted scores were selected for further analysis.
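The consensus filtering and ranking step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the text does not say how the three scores were combined for ranking, so the sketch ranks by SNAP2 score and breaks ties by PolyPhen-2 score as one plausible choice. The names `Prediction`, `is_functional_by_all_three`, and `top_candidates` are hypothetical; the example rows are copied from Tables 3 and 4.

```python
from typing import List, NamedTuple, Optional

class Prediction(NamedTuple):
    protein_id: str
    change: str
    sift_score: float
    polyphen_label: str
    polyphen_score: float
    snap2_label: str
    snap2_score: Optional[int]  # not reported for some rows of Table 4

def is_functional_by_all_three(p: Prediction) -> bool:
    """True when SIFT, PolyPhen-2, and SNAP2 all flag the substitution."""
    return (
        p.sift_score <= 0.05  # SIFT cutoff from Section 2.1
        and p.polyphen_label in ("Probably damaging", "Possibly damaging")
        and p.snap2_label == "Effect"
    )

def top_candidates(preds: List[Prediction], n: int = 15) -> List[Prediction]:
    """Keep consensus hits and rank them (the ranking rule is an assumption)."""
    hits = [p for p in preds if is_functional_by_all_three(p)]
    hits.sort(key=lambda p: (p.snap2_score or 0, p.polyphen_score), reverse=True)
    return hits[:n]

# Example rows copied from Tables 3 and 4.
rows = [
    Prediction("ENSP00000288602", "G466R", 0, "Probably damaging", 0.996, "Effect", 95),
    Prediction("ENSP00000419060", "D246E", 0, "Probably damaging", 0.998, "Effect", 96),
    Prediction("ENSP00000288602", "Q93H", 0, "Probably damaging", 0.974, "Effect", 15),
    Prediction("ENSP00000419060", "R70I", 0, "Benign", 0.311, "Effect", 34),
    Prediction("ENSP00000288602", "V600M", 0, "Probably damaging", 0.99, "Neutral", None),
]

for p in top_candidates(rows, n=2):
    print(p.protein_id, p.change, p.snap2_score)
```

In this sketch R70I and V600M are excluded because one of the three tools does not flag them, which mirrors how such variants are reported separately in Table 4.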

2.4. I-Mutant Suite

I-Mutant version 3.0 is a suite of support vector machine-based predictors integrated in a single web server. It predicts protein stability changes upon single-site variations from the protein structure or sequence. The I-Mutant result is interpreted as follows: DDG < 0, decreased stability; DDG > 0, increased stability; DDG = 0, neutral [18]. I-Mutant 3.0 is available at http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi.
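The sign convention for DDG can be written as a small Python helper. This is only a sketch of the interpretation rule stated above; the function name `classify_ddg` is hypothetical, and the example DDG values are made up for illustration rather than taken from I-Mutant output.

```python
def classify_ddg(ddg: float) -> str:
    """Map a DDG value (kcal/mol) to the stability effect described in the text."""
    if ddg < 0:
        return "Decrease"  # DDG < 0: decreased stability
    if ddg > 0:
        return "Increase"  # DDG > 0: increased stability
    return "Neutral"       # DDG = 0

# Hypothetical DDG values, for illustration only.
for ddg in (-1.2, 0.4, 0.0):
    print(ddg, classify_ddg(ddg))
```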

2.5. CPH Models

CPHmodels is a protein homology modeling server used to predict the 3D structure of proteins whose structure is unknown; its template recognition is based on profile-profile alignment guided by secondary structure and exposure predictions [19]. Protein sequences were submitted to the CPH server to obtain a model as a PDB file (for structures that could not be predicted by the automated Project HOPE server). The resulting PDB files were opened and visualized with the Chimera program (http://www.cbs.dtu.dk/services/CPHmodels/).

2.6. UCSF Chimera Model Software

Chimera is a high-quality extensible molecular graphics program designed for interactive visualization and analysis of molecular structures and related data [20]. The software was produced by the University of California, San Francisco [9]. Chimera was used to obtain high-quality images of, first, the whole 3D structure of the proteins with IDs ENSP00000288602, ENSP00000418033, and ENSP00000419060 (Figure 1) and, second, the native and mutant residues for mutations that could not be handled by the automated Project HOPE server (Figure 2) (http://www.cgl.ucsf.edu/chimera/).

2.7. Automatic Protein Structural Analysis and Information Using HOPE Server

HOPE is an automatic mutant analysis server that can provide insight into the structural effects of a mutation. HOPE collects information from a wide range of sources, including calculations on the 3D coordinates of the protein using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data are stored in a database and used in a decision scheme to identify the effects of a mutation on the protein’s 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers [21] (http://www.cmbi.ru.nl/hope/method) (Figure 2).

2.8. PolymiRTS Database (3′ UTR)

PolymiRTS is an integrated platform for analyzing the functional impact of genetic polymorphisms (SNPs and Indels) within microRNA binding sites [13]. Single variants within the 3′ UTR were selected from the total variants and submitted to the PolymiRTS server to check whether they disrupt existing miRNA binding sites, create new ones, or have no impact at all. PolymiRTS is available at http://compbio.uthsc.edu/miRSNP/ (Table 6).
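The idea of a variant disrupting or creating a miRNA site can be illustrated with a deliberately simplified Python sketch. It only checks whether the target-site motif (the bases complementary to the miRNA seed, shown in capitals in Table 6) is present in the ancestral and derived sequence contexts; PolymiRTS itself relies on target prediction and context scores, which are not reproduced here. The function name `site_effect` and the label "N" for "no change" are hypothetical; the labels "D" and "C" and the sequences for rs184804021 come from Table 6.

```python
def site_effect(ancestral_context: str, derived_context: str, site_motif: str) -> str:
    """Return 'D' if the derived allele disrupts the site, 'C' if it creates it,
    and 'N' (hypothetical label) if presence of the site is unchanged."""
    motif = site_motif.lower()
    in_ancestral = motif in ancestral_context.lower()
    in_derived = motif in derived_context.lower()
    if in_ancestral and not in_derived:
        return "D"
    if in_derived and not in_ancestral:
        return "C"
    return "N"

# Sequence context of rs184804021 from Table 6 (T ancestral, C derived).
ancestral = "tagacattgctaa"
derived = "tagacactgctaa"

print(site_effect(ancestral, derived, "CATTGCTA"))  # hsa-miR-3908 site
print(site_effect(ancestral, derived, "CACTGCTA"))  # hsa-miR-4519 site
```

A plain substring test like this ignores site conservation and context score, so it should be read as an explanation of the D and C function classes rather than a replacement for the database query.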

2.9. Effect of SNPs within 5′ UTR on Transcription Factor Binding Sites

PROMO is a virtual laboratory for the identification of putative transcription factor binding sites (TFBS) in DNA sequences from a species or groups of species of interest. TFBS defined in the TRANSFAC database are used to construct specific binding site weight matrices for TFBS prediction. The user can inspect the result of the search through a graphical interface and downloadable text files [22]. The input data were two sequences for each SNP within the 5′ UTR: the first sequence contained the wild-type nucleotide allele and the second contained the substituted nucleotide, as in Table 7 (http://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3).
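Preparing the two input sequences per SNP can be sketched in Python as follows. The helper name `make_snp_pair`, the 0-based position convention, and the example sequence are all hypothetical and chosen for illustration; they are not taken from the study or from PROMO.

```python
from typing import Tuple

def make_snp_pair(sequence: str, position: int, ref: str, alt: str) -> Tuple[str, str]:
    """Return (wild-type, substituted) sequences for one SNP.

    `position` is a 0-based index into `sequence` (assumed convention).
    The reference base is checked to catch off-by-one errors.
    """
    if sequence[position].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    mutant = sequence[:position] + alt + sequence[position + 1:]
    return sequence, mutant

# Hypothetical 5' UTR fragment with a G>A substitution at index 5.
wild_type, substituted = make_snp_pair("ACGTTGCAAGT", 5, "G", "A")
print(wild_type)
print(substituted)
```

Both sequences would then be submitted separately, and the predicted TFBS lists compared to see whether the substitution removes or adds a site.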

2.10. Effect of 3′/5′ Splice Sites SNPs/Indels (HSF Tool)

Human Splicing Finder (HSF) is a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-β serine-arginine proteins and the hnRNP A1 ribonucleoprotein. It also includes new position weight matrices to assess the strength of 5′ and 3′ splice sites and branch points [23]. In this study, HSF was used to detect functional SNPs and Indels within the 3′/5′ splice sites. The input data were nucleotide sequences containing the single substitution (SNP) or the insertion/deletion (Indel), as in Table 8 (http://www.umd.be/HSF3/index.html).
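For Indels, the mutant input sequence differs in length from the reference. A minimal Python sketch of how such sequences could be built is shown below; the helper names `apply_insertion` and `apply_deletion`, the 0-based positions, and the example sequence are hypothetical and not taken from the study or from HSF.

```python
def apply_insertion(sequence: str, position: int, inserted: str) -> str:
    """Insert `inserted` before the base at 0-based index `position`."""
    return sequence[:position] + inserted + sequence[position:]

def apply_deletion(sequence: str, position: int, length: int) -> str:
    """Delete `length` bases starting at 0-based index `position`."""
    if position + length > len(sequence):
        raise ValueError("deletion runs past the end of the sequence")
    return sequence[:position] + sequence[position + length:]

# Hypothetical sequence around a splice site, for illustration only.
reference = "CTTTCAGGTAAGT"
print(apply_insertion(reference, 3, "T"))
print(apply_deletion(reference, 3, 2))
```

The reference and modified sequences would then be analyzed side by side so that changes in splice-site or enhancer motif scores can be attributed to the Indel.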

3. Results and Discussion

Information about the total single variants and the functional nsSNPs predicted by three or two tools was obtained from several databases (dbSNP, UniProt, HapMap, 1000 Genomes Project, GenBank, and ClinVar) (Tables 1 and 2). None of the functional SNPs was present in the HapMap or 1000 Genomes Project databases.

3.1. Predicted Results by SIFT, PolyPhen, and SNAP2 Servers

Of the 232 nsSNPs of the BRAF gene, 111 variants were predicted to be damaging or to have an effect by all three servers (SIFT, PolyPhen, and SNAP2) (Table 3). In addition, one SNP (rs180177032, R70I) was predicted to be functional by two tools only (SIFT and SNAP2). Furthermore, five SNPs (V600M, L597V, L205V, V208M, and H2Q) were predicted as functional by two servers only (SIFT and PolyPhen) (Table 4). On the other hand, the two frameshift Indels (rs35546910, ch7:140834611; rs777474487, ch7:140783126-) showed no effect on the protein at all.


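Table 4

| Protein ID | Amino acid change | SIFT prediction | SIFT score | PolyPhen-2 prediction | PolyPhen-2 score | SNAP2 prediction | SNAP2 score |
|---|---|---|---|---|---|---|---|
| ENSP00000419060 | R70I | Damaging | 0 | Benign | 0.311 | Effect | 34 |
| ENSP00000288602 | V600M | Damaging | 0 | Probably damaging | 0.99 | Neutral | |
| ENSP00000288602 | L597V | Damaging | 0 | Probably damaging | 0.862 | Neutral | 0 |
| ENSP00000419060 | L205V | Damaging | 0 | Probably damaging | 0.995 | Neutral | |
| ENSP00000419060 | V208M | Damaging | 0.03 | Probably damaging | 0.996 | Neutral | |
| ENSP00000418033 | H2Q | Damaging | 0 | Probably damaging | 0.998 | Neutral | |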

From the previous results (Table 3), the 15 nsSNPs with the maximum predicted scores across the three servers were selected to predict their stability index (Table 5) and to visualize the wild-type and mutant residues in their protein 3D structure (Figure 2).


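Table 5

| Protein ID | Amino acid position | WT | MT | pH | Temperature (°C) | SVM2 prediction effect | DDG value prediction (kcal/mol) | RI |
|---|---|---|---|---|---|---|---|---|
| ENSP00000288602 | 466 | G | R | 7.0 | 25 | Increase | | 4 |
| ENSP00000288602 | 469 | G | R | 7.0 | 25 | Increase | | 4 |
| ENSP00000288602 | 466 | G | V | 7.0 | 25 | Increase | | 2 |
| ENSP00000288602 | 466 | G | E | 7.0 | 25 | Decrease | | 6 |
| ENSP00000288602 | 469 | G | E | 7.0 | 25 | Increase | | 5 |
| ENSP00000288602 | 595 | F | L | 7.0 | 25 | Decrease | | 8 |
| ENSP00000288602 | 596 | G | R | 7.0 | 25 | Decrease | | 1 |
| ENSP00000288602 | 596 | G | V | 7.0 | 25 | Decrease | | 2 |
| ENSP00000288602 | 595 | F | S | 7.0 | 25 | Decrease | | 8 |
| ENSP00000288602 | 638 | D | E | 7.0 | 25 | Decrease | | 7 |
| ENSP00000418033 | 24 | G | R | 7.0 | 25 | Decrease | | 8 |
| ENSP00000418033 | 66 | D | E | 7.0 | 25 | Increase | | 3 |
| ENSP00000419060 | 109 | E | K | 7.0 | 25 | Decrease | | 8 |
| ENSP00000419060 | 203 | F | S | 7.0 | 25 | Decrease | | 8 |
| ENSP00000419060 | 246 | D | E | 7.0 | 25 | Decrease | | 7 |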

WT: wild type amino acid. MT: mutant type amino acid. DDG: delta DG (units of free energy) (DDG < 0: decreased stability, DDG > 0: increased stability). RI: reliability index.

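Table 6

| dbSNP ID | Variant type | Ancestral allele | Allele | miR ID | Conservation | miRSite | Function class | Context + score change |
|---|---|---|---|---|---|---|---|---|
| rs114105685 | SNP | G | A | hsa-miR-30a-5p | 3 | gTGTTTACAggtg | C | |
| | | | | hsa-miR-30b-5p | 3 | gTGTTTACAggtg | C | |
| | | | | hsa-miR-30c-5p | 3 | gTGTTTACAggtg | C | |
| | | | | hsa-miR-30d-5p | 3 | gTGTTTACAggtg | C | |
| | | | | hsa-miR-30e-5p | 3 | gTGTTTACAggtg | C | |
| | | | | hsa-miR-3607-3p | 4 | gtgTTTACAGgtg | C | No change |
| rs184804021 | SNP | T | T | hsa-miR-3908 | 4 | tagaCATTGCTAa | D | |
| | | | | hsa-miR-3942-5p | 5 | tagacATTGCTAa | D | |
| | | | | hsa-miR-4703-5p | 5 | tagacATTGCTAa | D | |
| | | | | hsa-miR-4766-3p | 5 | tagacATTGCTAA | D | |
| | | | C | hsa-miR-374b-3p | 5 | tagacaCTGCTAA | C | |
| | | | | hsa-miR-4274 | 5 | tagacACTGCTAa | C | |
| | | | | hsa-miR-4519 | 4 | tagaCACTGCTAa | C | |
| | | | | hsa-miR-4524a-5p | 5 | tagacaCTGCTAA | C | |
| | | | | hsa-miR-4524b-5p | 5 | tagacaCTGCTAA | C | |
| | | | | hsa-miR-6499-3p | 7 | tagACACTGCtaa | C | |
| | | | | hsa-miR-6733-3p | 5 | taGACACTGctaa | C | |
| rs140083479 | SNP | A | A | hsa-miR-3908 | 4 | atagaCATTGCTA | D | |
| | | | | hsa-miR-3942-5p | 5 | atagacATTGCTA | D | |
| | | | | hsa-miR-4703-5p | 5 | atagacATTGCTA | D | |
| | | | | hsa-miR-4766-3p | 5 | atagacATTGCTA | D | |
| | | | | hsa-miR-4796-5p | 3 | ATAGACAttgcta | D | |
| | | | T | hsa-miR-548p | 5 | atagacTTTGCTA | C | |
| rs200393520 | Indel | | | hsa-miR-4311 | 2 | acTCTCTTTtttt | O | 0.004 |
| | | | T | | | | | |
| rs143647707 | SNP | A | A | hsa-miR-5580-3p | 3 | gaaCATATGTttg | D | |
| | | | T | hsa-miR-7-1-3p | 4 | gaacATTTGTTtg | C | |
| | | | | hsa-miR-7-2-3p | 4 | gaacATTTGTTtg | C | |
| rs202148822 | SNP | G | G | hsa-miR-1976 | 3 | tCAGGAGAgtagc | D | |
| | | | | hsa-miR-6845-3p | 3 | tcAGGAGAGtagc | D | |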

Conservation: occurrence of the miRNA site in other vertebrate genomes in addition to the query genome. miRSite: sequence context of the miRNA site: bases complementary to the seed region are in capital letters and SNPs are highlighted in bold font. Function class: D: the derived allele disrupts a conserved miRNA site (ancestral allele with support > 2); C: the derived allele creates a new miRNA site; O: the ancestral allele can not be determined. Context score: negative increase = increase in SNP functionality.

Table 7

| SNP ID | Transcription factor predicted | Prediction |
|---|---|---|
| rs71645935 | Pax-5, p53 | No effect |
| rs71645936 | FOXP3 | No effect |
| rs397507453 | None | |
| rs762432076 | None | |
| rs769116177 | None | |


Table 8

| Polymorphism ID | Type of splice site | Prediction |
|---|---|---|
| rs199910929 (SNP) | 5′ splice site | (i) Alteration of an exonic ESE site; (ii) Potential alteration of splicing |
| rs776683449 (SNP) | 3′ splice site | No significant splicing motif alteration detected. This mutation has probably no impact on splicing |
| rs775598011 (Indel) | 3′ splice site | (i) Alteration of an exonic ESE site; (ii) Potential alteration of splicing |

3.2. UTRs and Splice Sites

The untranslated regions contained fewer functional SNPs and indels than the coding regions. In the 3′ UTR, five SNPs and one indel altered microRNA binding sites, either disrupting existing sites or creating new ones (Table 6). Furthermore, the miRNAs associated with these functional SNPs/indels also target many other genes, so a defect in these miRNAs could affect the expression of all of their associated genes.
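The disrupt/create classification reported in Table 6 can be tallied per variant with a few lines of code. The following is only a minimal sketch: the rows are a small subset of Table 6 as read from the flattened table, the function names `seed_match` and `tally_function_classes` are hypothetical, and the assumption that the capitalized run of each miRSite string is the seed-complementary segment follows the table legend rather than any PolymiRTS API.

```python
import re
from collections import Counter

# Subset of Table 6 rows: (dbSNP ID, miRNA ID, miRSite, function class).
# Values are copied from the table; the field boundaries are an assumption
# made when reading the flattened rows.
ROWS = [
    ("rs143647707", "hsa-miR-5580-3p", "gaaCATATGTttg", "D"),
    ("rs143647707", "hsa-miR-7-1-3p", "gaacATTTGTTtg", "C"),
    ("rs143647707", "hsa-miR-7-2-3p", "gaacATTTGTTtg", "C"),
    ("rs202148822", "hsa-miR-1976", "tCAGGAGAgtagc", "D"),
    ("rs202148822", "hsa-miR-6845-3p", "tcAGGAGAGtagc", "D"),
]


def seed_match(mir_site: str) -> str:
    """Return the first capitalized run, i.e. the bases complementary to the seed."""
    match = re.search(r"[A-Z]+", mir_site)
    return match.group(0) if match else ""


def tally_function_classes(rows):
    """Count function classes (D = disrupted, C = created, O = undetermined) per SNP."""
    counts = {}
    for snp_id, _mirna, _site, function_class in rows:
        counts.setdefault(snp_id, Counter())[function_class] += 1
    return counts


if __name__ == "__main__":
    for snp_id, classes in tally_function_classes(ROWS).items():
        print(snp_id, dict(classes))
    print(seed_match("gaaCATATGTttg"))
```

Extending `ROWS` to the full table would reproduce the per-variant summary of disrupted and created miRNA sites described above.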

On the other hand, of the five SNPs obtained for the 5′ UTR, two were located in predicted transcription factor (TF) binding sites but neither altered them, and the remaining three were not located within any TF binding site; thus none of the five SNPs showed an effect on TF binding sites (Table 7). In addition, of the three variants (two SNPs and one indel) within the 5′/3′ splice sites, one SNP in the 5′ splice site and one indel in the 3′ splice site showed potential alteration of splicing (Table 8).

To date, the mechanisms by which a nucleotide variant results in a phenotypic change remain largely unknown. In silico analysis using powerful software tools can facilitate prediction of the phenotypic effect of nonsynonymous coding SNPs on the physicochemical properties of the proteins concerned. Such information is critical for genotype-phenotype correlations and for understanding disease biology. Because nsSNPs in critical cellular genes such as BRAF modify the normal programs of cell proliferation, differentiation, and death, they are believed to play an important role in disease predisposition. Therefore, efforts were made to identify SNPs that can modify the structure, function, and expression of the BRAF gene.

One of the most significant BRAF mutations is the substitution of thymine with adenine at nucleotide 1799, which replaces valine (V) with glutamic acid (E) at amino acid position 600. This mutation, called V600E, is located in the activation segment and has been found in many human cancers. For example, it was reported as the most common genetic mutation related to papillary thyroid cancer, occurring in approximately 45% of patients [24, 25]. In silico investigation using the SIFT and PolyPhen online tools also predicted this mutation to be a highly damaging substitution that could cause disease. Furthermore, Project HOPE server results showed that the wild-type residue (V) is smaller in size (Figure 3), neutral in charge, and more hydrophobic, whereas the mutant residue (E) is bigger in size (Figure 3), negatively charged, and less hydrophobic. In addition, the mutated residue is located in a domain that is important for the activity of the protein and is in contact with another domain that is also important for the activity. The interaction between these domains could be disturbed by the mutation, which might affect the function of the protein.
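The relationship between nucleotide 1799 and amino acid 600 can be checked with simple codon arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the codon table is reduced to the two codons needed, and the wild-type codon GTG at position 600 is taken from general knowledge of BRAF rather than from the results reported here.

```python
# Reduced codon table: only the two codons involved in V600E.
CODON_TABLE = {"GTG": "V", "GAG": "E"}


def codon_location(cds_position: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (1-based codon number, 0-based offset)."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3


def substitute(codon: str, offset: int, ref: str, alt: str) -> str:
    """Replace the base at `offset`, checking that the reference base matches."""
    if codon[offset] != ref:
        raise ValueError(f"expected {ref} at offset {offset}, found {codon[offset]}")
    return codon[:offset] + alt + codon[offset + 1:]


if __name__ == "__main__":
    codon_number, offset = codon_location(1799)
    wild_type = "GTG"  # assumed wild-type codon 600 (valine)
    mutant = substitute(wild_type, offset, "T", "A")
    print(f"codon {codon_number}, base {offset + 1} of 3: "
          f"{wild_type} ({CODON_TABLE[wild_type]}) -> {mutant} ({CODON_TABLE[mutant]})")
```

Nucleotide 1799 falls on the second base of codon 600, so a T-to-A change there turns the assumed GTG codon into GAG, consistent with the valine-to-glutamic-acid substitution described above.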

4. Conclusion

The current study presents an in silico analysis of single genetic variants within the coding region, the 3′/5′ UTRs, and the 3′/5′ splice sites of the BRAF gene. These polymorphisms could directly or indirectly influence the intermolecular and intramolecular interactions of amino acid residues and protein expression, and can culminate in disease risk. By analyzing the conformational changes and interactions of amino acid residues within BRAF proteins, we have identified significant structural and functional changes that can explain the deviations in activity caused by several mutations. Furthermore, many of the detected SNPs are recorded as pathogenic or likely pathogenic in the clinical variation database (http://www.ncbi.nlm.nih.gov/clinvar/), in association with the following diseases: cardiofaciocutaneous syndrome, Noonan syndrome, LEOPARD syndrome, RASopathy, non-small-cell lung cancer, carcinoma of colon, adenocarcinoma of lung, thyroid cancer, malignant lymphoma, and non-Hodgkin lymphoma. Screening for BRAF variants may be useful for molecular diagnosis and for the development of molecular inhibitors of the associated gene pathways. This study demonstrates the value of different bioinformatics tools for predicting phenotypic changes and changes in protein function associated with the structure-function relationship of the BRAF gene. More evidence is required for the involvement of deregulated miRNA networks in cancer development. The SNPs identified here can be used for further investigation and diagnosis of many associated diseases.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

References

  1. M. Volante, P. Collini, Y. E. Nikiforov et al., “Poorly differentiated thyroid carcinoma: the Turin proposal for the use of uniform diagnostic criteria and an algorithmic diagnostic approach,” American Journal of Surgical Pathology, vol. 31, no. 8, pp. 1256–1264, 2007.
  2. R. H. Grogan, E. J. Mitmaker, and O. H. Clark, “The evolution of biomarkers in thyroid cancer-from mass screening to a personalized biosignature,” Cancers, vol. 2, no. 2, pp. 885–912, 2010.
  3. E. Domingo and S. Schwartz Jr., “BRAF (v-raf murine sarcoma viral oncogene homolog B1),” Atlas of Genetics and Cytogenetics in Oncology and Haematology, vol. 8, no. 4, pp. 302–306, 2004.
  4. M. R. M. Hussain, M. Baig, H. S. A. Mohamoud et al., “BRAF gene: from human cancers to developmental syndromes,” Saudi Journal of Biological Sciences, vol. 22, no. 4, pp. 359–373, 2015.
  5. R. D. Hall and R. R. Kudchadkar, “Braf mutations: signaling, epidemiology, and clinical experience in multiple malignancies,” Cancer Control, vol. 21, no. 3, pp. 221–230, 2014.
  6. M. R. M. Hussain, M. Baig, H. S. A. Mohamoud et al., “BRAF gene: from human cancers to developmental syndromes,” Saudi Journal of Biological Sciences, vol. 22, no. 4, pp. 359–373, 2015.
  7. J. Bosco, A. Allende, W. Varikatt, R. Lee, and G. J. Stewart, “Does the BRAFV600E mutation herald a new treatment era for Erdheim-Chester disease? A case-based review of a rare and difficult to diagnose disorder,” Internal Medicine Journal, vol. 45, no. 3, pp. 348–351, 2015.
  8. R. Guerra and Z. Yu, “Single nucleotide polymorphisms and their applications,” in Computational and Statistical Approaches to Genomics, W. Zhang and I. Shmulevich, Eds., chapter 16, pp. 311–349, Springer, Berlin, Germany, 2006.
  9. M. M. Hassan, A. A. Dowd, A. H. Mohamed et al., “Computational analysis of deleterious nsSNPs within HLA-DRB1 and HLA-DQB1 genes responsible for Allograft rejection,” International Journal of Computational Bioinformatics and in Silico Modeling, vol. 3, no. 6, pp. 562–577, 2014.
  10. M. Alanazi, Z. Abduljaleel, W. Khan et al., “In silico analysis of single nucleotide polymorphism (SNPs) in human β-globin gene,” PLoS ONE, vol. 6, no. 10, Article ID e25876, 2011.
  11. A. A. Komar and Humana Press, Single Nucleotide Polymorphism-Methods and Protocols, vol. 578, Humana Press, Totowa, NJ, USA, 2009.
  12. E. M. Smigielski, K. Sirotkin, M. Ward, and S. T. Sherry, “dbSNP: a database of single nucleotide polymorphisms,” Nucleic Acids Research, vol. 28, no. 1, pp. 352–355, 2000.
  13. A. Bhattacharya, J. D. Ziebarth, and Y. Cui, “PolymiRTS Database 3.0: Linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways,” Nucleic Acids Research, vol. 42, no. 1, pp. D86–D91, 2014.
  14. P. Kumar, J. Hu, S. Henikoff, G. Schneider, C. Pauline, and P. C. Ng, “SIFT web server: predicting effects of amino acid substitutions on proteins,” Nucleic Acids Research, vol. 40, no. 1, pp. W452–W457, 2012.
  15. P. Kumar, S. Henikoff, and P. C. Ng, “Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm,” Nature Protocols, vol. 4, no. 7, pp. 1073–1082, 2009.
  16. S. M. O. Sarour, A. M. Zayed, M. O. M. Ibrahim et al., “New mutation found within OTOR gene involved in deafness in two Sudanese families from Al-Jazirah state-Sudan: using Next Generation Sequencing (NGS),” Bio-Genetics Journal, vol. 2, no. 6, pp. 46–50, 2014.
  17. M. Hecht, Y. Bromberg, and B. Rost, “News from the protein mutability landscape,” Journal of Molecular Biology, vol. 425, no. 21, pp. 3937–3948, 2013.
  18. E. Capriotti, P. Fariselli, R. Calabrese, and R. Casadio, “Predicting protein stability changes from sequences using support vector machines,” Bioinformatics, vol. 21, no. 2, pp. ii54–ii58, 2005.
  19. M. Nielsen, C. Lundegaard, O. Lund, and T. N. Petersen, “CPHmodels 3.2: remote homology modeling using structure-guided sequence profiles,” Nucleic Acids Research, vol. 38, pp. W576–W581, 2010, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896139/pdf/gkq535.pdf.
  20. G. S. Couch, D. K. Hendrix, and T. E. Ferrin, “Nucleic acid visualization with UCSF Chimera,” Nucleic Acids Research, vol. 34, no. 4, article e29, pp. 1–5, 2006.
  21. H. Venselaar, T. A. H. te Beek, R. K. P. Kuipers, M. L. Hekkelman, and G. Vriend, “Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces,” BMC Bioinformatics, vol. 11, article 548, 2010.
  22. D. Farré, R. Roset, M. Huerta et al., “Identification of patterns in biological sequences at the ALGGEN server: PROMO and MALGEN,” Nucleic Acids Research, vol. 31, no. 13, pp. 3651–3653, 2003.
  23. F. Desmet, D. Hamroun, M. Lalande, G. Collod-Beroud, M. Claustres, and C. Beroud, “Human Splicing Finder: an online bioinformatics tool to predict splicing signals,” Nucleic Acids Research, vol. 37, no. 9, pp. e67–e67, 2009, http://nar.oxfordjournals.org/content/early/2009/04/01/nar.gkp215.full.pdf+html.
  24. Y. H. Tan, Y. Liu, K. W. Eu et al., “Detection of BRAF V600E mutation by pyrosequencing,” Pathology, vol. 40, no. 3, pp. 295–298, 2008.
  25. M. Yarchoan, V. A. LiVolsi, and M. S. Brose, “BRAF mutation and thyroid cancer recurrence,” Journal of Clinical Oncology, vol. 33, no. 1, pp. 7–8, 2015, http://jco.ascopubs.org/content/early/2014/11/20/JCO.2014.59.3657.full.pdf+html.

Copyright © 2016 Mohamed M. Hassan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

