A vicilin-like glycoprotein from the seeds of Nicotiana sylvestris, flowering tobacco, has been identified using nanoLC/ESI-MS/MS. Sequences from a fragment of the protein demonstrated homology with vicilins from other members of the Solanaceae family, notably potato (Solanum demissum). Reducing and nonreducing SDS-PAGE analyses of the identified protein indicated that fragments resulting from in situ proteolytic processing are joined by intrachain disulphide bonds. Staining with Con A lectin, which was specifically inhibited by mannose, suggested the presence of N-linked glycosylation, which was confirmed by carbohydrate compositional analysis of PVDF-bound protein subunits. HPAEC-PAD analysis of the monosaccharides released from the glycoprotein by acid hydrolysis revealed glucosamine and mannose. N-acetylglucosamine termination of attached oligosaccharides was further verified by inhibitable WGA lectin staining. Immunostaining of PVDF-bound N. sylvestris proteins with antibodies against G. max total protein demonstrated cross-staining at masses corresponding to fragments from the proteolytically processed protein subunits.

1. Introduction

Solanaceae encompasses many plants that are nutritionally and/or economically important including potatoes, tomatoes, eggplant (all Solanum spp.) and tobacco (Nicotiana spp.). Species of the genus Nicotiana have become leading proof-of-concept laboratory platforms for plant-based biopharmaceuticals [1–6] and have gained popularity as recombinant hosts through the development of successful transformation protocols including viral and bacterial methods [7, 8]. In 1990, tobacco was stably transformed with a gene encoding Streptococcus mutans protein antigen A, demonstrating that plants could be useful as vaccine-producing organisms [7]. Fully functional recombinant antibodies, which may find eventual use as therapeutics, have also been produced in tobacco plants [9, 10]. Seed deposition of recombinant proteins in tobacco species has also been demonstrated [9, 11, 12].

Seeds represent a highly desirable depository for recombinant protein products due to their excellent accumulation and stability [9, 11, 13]. Endogenous storage proteins are highly concentrated in seeds and constitute one of the largest caches of non-animal protein in nature. Storage proteins are systematically degraded during germination to ensure a constant supply of nitrogen and carbon-rich intermediates useful for synthesis of new materials [14, 15].

Previously, Sano and Kawashima [16] partially characterized the abundant 11S (also called legumin) protein component from N. tabacum seeds. As is the case with many nonleguminous, dicotyledonous (dicot) plants, the seed protein content of Nicotiana spp. is composed largely of 11S proteins [16]. However, dicot species may also produce large amounts of 7S (sometimes 8S) globulins more commonly referred to as vicilins.

Vicilin-type globulins may comprise up to 80% of protein found in some legume seeds [15, 17], but usually constitute a significantly smaller percentage in nonlegumes. Typically, vicilins are sparsely glycosylated trimeric clusters of combined molecular mass between 140 and 190 kDa [18]. Under physiological conditions, vicilin complexes associate without interchain disulphide bonds. However, at higher concentrations, and in the presence of magnesium and calcium ions, they may also form complexes with greater numbers of subunits [15, 19]. Many vicilins undergo proteolytic modification, or “nicking,” and this processing creates a high degree of heterogeneity in the subunit population [12, 20, 21]. Vicilins of adzuki bean (Vigna angularis) have been shown to have trends in thermostability and surface hydrophobicity attributed to posttranslational proteolytic processing [21].

Vicilin subunits contain two cupin domains. Each cupin domain is a 6-stranded, short beta-barrel structure common to several subfamilies of the globulin superfamily of proteins, including the 11S storage proteins, to which they are closely related (Figure 1) [15, 22, 23]. Characterized vicilin glycoproteins from dicots soy (Glycine max) [24], pumpkin (Cucurbita maxima) [25], and carrots (Daucus carota) [26] have also been identified as allergens. Vicilin-like proteins from monocotyledonous species with characteristics similar to dicot 7S vicilins have been identified in wheat (Triticum spp.), barley (Hordeum spp.), and oat (Avena spp.) [27, 28].

While Higgins et al. [12] noted the presence of endogenous vicilin in their report of recombinant pea (Pisum sativum) vicilin expression in N. tabacum seeds, based on cross-reactivity of an antibody fraction made against the pea protein, no characterization of the endogenous protein was carried out at that time. This is the first report, to our knowledge, describing an endogenous, glycosylated, vicilin-like protein from plants of the Nicotiana genus.

2. Materials and Methods

2.1. Soluble Protein Extraction

Seeds from Nicotiana sylvestris were purchased from Johnny’s Seeds (Winslow, Me, USA). All procedures performed at unless specified otherwise. Seeds (28 g) were ground to a uniform fine powder in a coffee grinder, and defatted in 100 mL cold acetone with rapid stirring for 1 hour. The acetone slurry was filtered, and the residue washed with one additional volume of fresh acetone and dried completely under vacuum. 2 g portions of the dry delipidated seed powder were first extracted using 20 mL double-distilled water (ddH2O) with gentle shaking for 30 minutes. Albumin-containing supernatant was removed following centrifugation at for 15 minutes. To extract the total soluble globulin protein (TSG) the pellet was suspended in 20 mL of extraction buffer containing 50 mM Tris-HCl, 200 mM NaCl, and 0.1 mM PMSF, pH 8.0 and the slurry continuously inverted for 1 hour. The majority of the insoluble content was removed by centrifugation for 15 minutes at . Supernatant, in 1.5 mL aliquots, was further centrifuged ( , 30 minutes) to yield TSG. Extracts from dried uncooked seeds of G. max and C. maxima were prepared using the same extraction buffer and general process but were subjected to neither delipidation nor separate extraction of albumins prior to globulin extraction. Protein content was estimated by Bradford method [29] (absorbance at 595 nm) using a Spectramax spectrophotometer (Molecular Devices, Sunnyvale, Calif, USA).

2.2. Size-Exclusion Chromatography

Approximately 2.5 mg TSG prefiltered through μm acetate membrane (Millipore, Billerica, Mass, USA) was subjected to fast protein liquid chromatography (FPLC) using a Superdex 200 10/30 size exclusion column (GE Health Sciences, Uppsala, Sweden). Isocratic elution was performed at a flow rate of 250 μL/min using 50 mM Tris-HCl, 200 mM NaCl, pH 8.0. The eluant was monitored by absorption at 280 nm, collected in fractions, and pooled appropriately. Elution calibration was performed using goat IgG (Mr 150 kDa), human transferrin (Mr 81 kDa), and human myoglobin (Mr 39 kDa).

2.3. SDS-PAGE and Enrichment of 7S Proteins

Crude and post-size-exclusion TSG were subjected to SDS-PAGE analysis under reducing (R-SDS-PAGE) and nonreducing (NR-SDS-PAGE) conditions. Enrichment of 7S proteins was carried out by first separating pooled post-size-exclusion protein fractions under nonreducing conditions in 12% Bis-Tris polyacrylamide gel (Invitrogen, Carlsbad, Calif, USA) followed by direct excision of the nonreduced vicilin (Mr 48 kDa) band. The excised gel containing the 7S protein was macerated to a fine paste in a microcentrifuge tube. The protein was then re-extracted from the paste with the addition of 40 μL of 5× R-SDS-PAGE sample buffer containing β-mercaptoethanol (β-ME) followed by agitation at for 30 minutes. An additional 20% v/v of 1× reducing sample buffer was added, the contents vortexed briefly, and the gel pieces sedimented by centrifugation at 5000 × g for 1 minute. Supernatant containing the reduced protein was loaded onto 4–12% Bis-Tris polyacrylamide gel and separated in MES running buffer (Invitrogen).

2.4. Protein Sequence Identification

Protein samples were run on 4–15% Tris-HCl gels (Bio-Rad) under reducing conditions after completion of size exclusion chromatography. Gels were stained with Coomassie Brilliant Blue (CBB) in a fixative solution containing 0.05% (w/v) CBB, 30% methanol, and 10% acetic acid and partially destained in 30% methanol, 10% acetic acid for protein visualization. Bands of interest were excised and completely destained, macerated in deionized water, and subjected to exhaustive trypsin digestion. Peptides were extracted from the gel pieces with 5% formic acid/50% acetonitrile, evaporated to dryness in a Speedvac vacuum centrifuge (Savant, Farmingdale, NY), and reconstituted in μL of 1% formic acid.

A microbore HPLC system (Surveyor, Thermo, San Jose, Calif, USA) was modified to operate at capillary flow rates using a simple T-piece flow-splitter. Columns (8 cm × 100 μm I.D.) were prepared by packing , 5 μm Zorbax C18 resin at 500 psi into columns with integrated electrospray tips made from fused silica, pulled to a 5 μm tip using a laser puller (Sutter Instrument Co., Novato, Calif, USA). Electrospray voltage of 1.8 kV was applied using a gold electrode via a liquid junction upstream of the column. Samples were introduced onto the analytical column using a Surveyor autosampler (Thermo, San Jose, Calif, USA). The HPLC column eluent was directed into the electrospray ionization source of a Thermo LCQ Deca ion trap mass spectrometer.

Peptides were eluted in a gradient using 5% acetonitrile, 0.1% formic acid (buffer A) and 90% acetonitrile, 0.1% formic acid (buffer B), at a flow rate of 500 nL/min. Following an initial wash with buffer A for 10 minutes, peptides were eluted with a linear gradient from 0–50% buffer B over a 25-minute interval, followed by 50–98% B over 5 minutes and a 5-minute wash at 98% B. Automated peak recognition, dynamic exclusion, and daughter ion scanning of the top three most intense ions were performed using the Xcalibur software as previously described [30]: (i) full mass survey scan 400–1500 amu, (ii) MS/MS of most abundant ion from survey scan, (iii) MS/MS of 2nd most abundant ion from survey scan, (iv) MS/MS of 3rd most abundant ion from survey scan. Other instrument parameters included: collision energy 39%, activation Q 0.25, activation time 30 milliseconds, isolation width 2.0 amu, dynamic exclusion enabled with repeat count 2, duration 0.5 minute, exclusion duration 5 minutes, exclusion mass width low 1.5 amu, high 1.5 amu.
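The elution program described above can be written down as a piecewise-linear function of time. The following is a minimal sketch, not the instrument method itself: it assumes that the initial 10-minute wash with buffer A corresponds to 0% buffer B and that the four segments run back to back, and the function names are illustrative only.

```python
# Sketch of the described gradient; segment boundaries assume the steps
# run consecutively starting from t = 0 (an assumption, not stated explicitly).
SEGMENTS = [
    # (start_min, end_min, %B at start, %B at end)
    (0.0, 10.0, 0.0, 0.0),    # initial wash with buffer A (assumed 0% B)
    (10.0, 35.0, 0.0, 50.0),  # linear 0-50% B over 25 minutes
    (35.0, 40.0, 50.0, 98.0), # 50-98% B over 5 minutes
    (40.0, 45.0, 98.0, 98.0), # 5-minute wash at 98% B
]

ACN_IN_A = 5.0   # % acetonitrile in buffer A
ACN_IN_B = 90.0  # % acetonitrile in buffer B


def percent_b(t_min: float) -> float:
    """Return the percentage of buffer B at time t_min (minutes)."""
    for start, end, b_start, b_end in SEGMENTS:
        if start <= t_min <= end:
            fraction = (t_min - start) / (end - start)
            return b_start + (b_end - b_start) * fraction
    raise ValueError("time is outside the 0-45 minute program")


def acetonitrile_percent(t_min: float) -> float:
    """Return the acetonitrile percentage of the mixed mobile phase."""
    b = percent_b(t_min) / 100.0
    return ACN_IN_A * (1.0 - b) + ACN_IN_B * b
```

For example, `percent_b(22.5)` gives the composition halfway through the main gradient, and `acetonitrile_percent` converts that to the actual organic content given that both buffers contain acetonitrile.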

2.5. Database Searching, Result Filtering, and Validation

MS/MS data were analyzed using SEQUEST run under Bioworks 3.1 (Thermo, San Jose, Calif, USA) [31, 32]. All spectra were searched against the NCBI nonredundant protein sequence database (downloaded August 2007). Search parameters included peptide and fragment mass tolerance of 2 Da and 0.2 Da, respectively, modification of cysteine with 57 amu (iodoacetamidation), and differential modification of methionine with 16 amu (oxidation).

Initial criteria for a preliminary positive peptide identification for a doubly charged peptide were a correlation factor (Xcorr) greater than 2.5, a delta cross-correlation factor (dCn) greater than 0.1, a minimum of one tryptic peptide terminus, and a high preliminary scoring [33]. For triply charged peptides the correlation factor threshold was set at 3.5, and for singly charged peptides the threshold was set at 1.8. False positive rates for protein identification were assessed using reversed database searching [34, 35]. Imposing a minimum of two peptides per protein on each nanoLC-MS/MS data set [36], no proteins fitting the criteria were detected in the reversed database searches, indicating a protein identification confidence level of greater than 99% [37].
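The charge-dependent thresholds and the two-peptide rule can be illustrated with a short filter. This is a sketch under stated assumptions rather than the Bioworks workflow: it assumes the dCn and tryptic-terminus requirements apply to all three charge states, it omits the "high preliminary scoring" criterion because the text gives no numeric value for it, and the `PeptideHit` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Xcorr thresholds by precursor charge state, as given in the text.
XCORR_MIN = {1: 1.8, 2: 2.5, 3: 3.5}
DCN_MIN = 0.1  # assumed here to apply to every charge state


@dataclass
class PeptideHit:
    """Hypothetical record for one peptide-spectrum match."""
    protein: str
    sequence: str
    charge: int
    xcorr: float
    dcn: float
    tryptic_termini: int  # 0, 1, or 2


def passes(hit: PeptideHit) -> bool:
    """Apply the per-peptide criteria to a single match."""
    threshold = XCORR_MIN.get(hit.charge)
    if threshold is None:  # charge states not covered by the criteria
        return False
    return (hit.xcorr > threshold
            and hit.dcn > DCN_MIN
            and hit.tryptic_termini >= 1)


def accepted_proteins(hits, min_peptides=2):
    """Keep proteins supported by at least min_peptides distinct peptides."""
    peptides = defaultdict(set)
    for hit in hits:
        if passes(hit):
            peptides[hit.protein].add(hit.sequence)
    return {protein: sorted(seqs)
            for protein, seqs in peptides.items()
            if len(seqs) >= min_peptides}
```

Running the same filter on matches from a reversed-sequence database and counting the proteins that survive gives the false-positive estimate described in the text; an empty result corresponds to the outcome reported here.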

Peptide sequences identified from the NCBI nonredundant protein sequence database were used with the BLAST algorithm [38, 39] to find homologues. Homologues with greatest identity and greatest amount of characterization were considered. FASTA amino acid sequences were aligned using the ClustalW alignment tool (http://align.genome.jp/). Isoelectric point and mass estimation were performed using the ExPASy online algorithm (http://www.expasy.ch/tools/pi_tool.html).

2.6. Transfer of Proteins to PVDF and Lectin Staining

μg of protein per well was loaded onto 4–15% Tris-HCl gels (Bio-Rad) and electrophoresed for approximately 1 hour at 50 mA (starting potential 150 V). Proteins were transferred to sequencing grade polyvinylidene difluoride (PVDF) μm membrane (Millipore) in a semidry transfer apparatus (Bio-Rad) at 1.5 mA/cm2 for 2 hours at a maximum of 15 V. Prior to transfer, PVDF was washed twice in 100% methanol and rinsed in transfer buffer. Blots were used directly after transfer or washed in ddH2O and dried for storage at room temperature for later use.

Concanavalin A (Con A lectin) conjugated to alkaline phosphatase (EY Laboratories, San Mateo, Calif, USA) was used to probe for N-linked oligosaccharides. PVDF membranes with bound proteins from seeds of N. sylvestris were washed in 20 mM Tris-HCl, 100 mM NaCl, 0.05% Tween 20, pH 7.4 (TBST), and then blocked for 1 hour in 1% porcine gelatin (Sigma-Aldrich, St. Louis, Mo, USA) in TBST. After washing twice (all washes 10 minutes) in TBST and once in lectin buffer (TBST plus 1 mM concentrations of MgCl2, MnCl2, and CaCl2), Con A in lectin buffer ( g in 15 mL) was added to the blot and incubated for 1 hour at room temperature with gentle agitation. A control blot was incubated in an identical manner with the addition of 10 mM mannose (Sigma-Aldrich) in the lectin buffer. After incubation in lectin solution, the blot was washed twice with TBST and once with ddH2O. Lectin binding was visualized colorimetrically by reaction of BCIP/NBT (Sigma-Aldrich) and was stopped by addition of water. Alkaline phosphatase conjugated WGA (EY Labs) was used with the same protocol with the exception of the inhibitory procedure which consisted of preincubation of the lectin in 100 mM GlcNAc for 1 hour prior to and 10 mM GlcNAc during the membrane incubation. Blots were dried at room temperature in the absence of light prior to being scanned with white light and stored as digital files.

2.7. Immunostaining

Polyclonal antibody prepared against G. max total seed protein (S2519, Sigma-Aldrich) was used to probe PVDF-bound proteins for cross-reactivity. SDS-PAGE samples of G. max, C. maxima, and N. sylvestris proteins were prepared with β-ME as mentioned previously and separated in 4–12% Bis-Tris gel (Invitrogen). Control samples consisting of human transferrin, bovine fetuin, bovine serum albumin (BSA), asialofetuin, peanut lectin (PNA), and ribonuclease B (all Sigma-Aldrich) were prepared in the same manner. After protein was transferred to PVDF, the membranes were washed for 10 minutes in PBST (20 mM phosphate buffer, 140 mM NaCl, 20 mM KCl, 0.05% Tween 20, pH 7.4) prior to blocking for 30 minutes in 1% BSA + PBST. Antibody was used at dilution in PBST, and incubation time was 1 hour at room temperature with gentle shaking. Membranes were washed in PBST twice for 5 minutes prior to the addition of secondary antibody against rabbit IgG (A3812, Sigma-Aldrich), conjugated to alkaline phosphatase, used at a concentration of in PBST. BCIP/alkaline phosphatase reaction was stopped by addition of water. The blot was rinsed twice for 30 seconds in ddH2O and allowed to dry at room temperature in the dark prior to being scanned with white light and stored as a digital image.

2.8. Carbohydrate Analysis

Unstained protein bands were excised from PVDF membrane and hydrolyzed in sealed glass ampules containing 4 N trifluoroacetic acid (TFA) for 3 hours at . As a positive control for the hydrolysis conditions, bovine fetuin (Sigma-Aldrich) was also subjected to TFA. To correct for background, identical portions of PVDF from buffer control lanes were subjected to TFA hydrolysis. Hydrolysates were neutralized by dropwise addition of 0.5 N sodium hydroxide and dried in a vacuum centrifuge (Labconco, Kansas City, Mo, USA). Dry carbohydrates were reconstituted in L HPLC grade water prior to high-pH anion exchange chromatography with pulsed amperometric detection (HPAEC-PAD, Dionex, Sunnyvale, Calif, USA). L aliquots of the carbohydrate solutions were separated on a CarboPac PA10 column (4 × 250 mm, analytical) preceded by an amino trap column (4 × 50 mm) to minimize interference from protein degradation (Dionex, Sunnyvale, Calif, USA). Isocratic elution of 18 mM NaOH was performed at a flow rate of 1 mL/min over 18.5 minutes at a column temperature of . A standard panel consisting of fucose (Fuc), galactosamine (GalN), glucosamine (GlcN), galactose (Gal), glucose (Glc), and mannose (Man) was used as a reference.

3. Results

3.1. Protein Extraction and Visualization

200 mM NaCl was used during the protein extraction due to the insignificant increase in solubility with 500 mM NaCl and to aid in downstream analysis and processing. Soluble globulin protein was readily extracted from N. sylvestris seeds at slightly alkaline pH, but solubility dropped sharply below pH 8.0. Although tris-(hydroxymethyl)-aminomethane (Tris) is a weak chelator, no additional measures were taken to sequester metal ions during protein extraction. Solubility reached a maximum of approximately 5 mg/mL at at which point aggregates of protein began to form and precipitate. The presence of free Mg2+ and Ca2+ ions may have been a limiting factor on the maximum total protein solubility [40]. R-SDS-PAGE of TSG produced one major band at approximately 48 kDa and several bands between 31 and 35 kDa and between 20 and 23 kDa as well as many smaller polypeptides which were poorly resolved (Figure 2, Total protein, lane R). A separation of protein aggregates 300 kDa by centrifugation in 300 kDa molecular weight cut-off (MWCO) spin filters resulted in a pellet which could be resolubilized to give a similar protein profile to the total extract (data not shown), leading to the conclusion that the globulins aggregate into poorly soluble, high molecular mass complexes when in high concentration as noted by Freitas et al. [14]. A portion of the protein also aggregated after heat denaturation with SDS and did not readily migrate within polyacrylamide gel, resulting in streaking and areas of poor resolution at relative masses greater than 75 kDa as shown in Figure 2 (Total protein, lanes R and NR). Molecular masses corresponding to complexes of proteins ( 100 to 200 kDa) were visualized in the gel and were prevalent in the total protein mixture (Figure 2, Total protein, R and NR and in size exclusion pool A, bands 1 and 2).

3.2. Size Exclusion Chromatography and SDS-PAGE

Size exclusion chromatography was used as a preparative separation method for TSG. Isocratic elution from size-exclusion media via FPLC produced three major peaks in the 280 nm absorbance curve (Figure 2, Size exclusion pools A, B, and C) with the largest percentage of proteins eluting together over a range of 140–160 kDa in pool A. Post-size-exclusion fractions of the protein demonstrated a similar NR-SDS-PAGE profile in comparison to total soluble protein (Figure 2, lane NR versus size exclusion pool A), although significantly fewer major bands were present under nonreducing conditions in the post-size-exclusion pool A. Nonreducing conditions produced a pair of distinct bands at 44 kDa and 48 kDa (Figure 2, Size exclusion pool A, bands 7S and 11S).

Based on the previous characterization of the 11S proteins from N. tabacum [16], it was surmised that the 44 kDa band is 11S protein (Figure 2, Size exclusion pool A, band 11S).

Attempted anion exchange separation of intact 7S proteins resulted in poor resolution and considerable precipitation on the column (data not shown). Alternatively, for the purpose of this initial characterization, an enrichment step for the individual protein fractions was performed using a two-step SDS-PAGE procedure following size-exclusion chromatography. Individual bands were isolated from the post-size-exclusion NR-SDS-PAGE gel as indicated (Figure 2, Size exclusion pool A, bands 1, 2, 7S, and 11S), treated with β-ME, and electrophoresed a second time (Figure 2, Enriched).

Protein bands at 100 kDa and 150 kDa Mr which may have, respectively, represented di- and tri-meric complexes comprised of 50 kDa structures are visible in both NR-SDS-PAGE (Figure 2, Size exclusion pool A, bands 1 and 2) and R-SDS-PAGE (Figure 2, Enriched, lanes 1 and 2) gels. It is unlikely, however, that these are complexes of 7S proteins based on the subsequent R-SDS-PAGE patterns (Figure 2, Enriched, lanes 1 and 2). R-SDS-PAGE of the excised and enriched 7S band (Figure 2, Size exclusion pool A, band 7S) displayed a large diffuse band above 50 kDa, a small but distinct band at 48 kDa, and several fragments between 20 and 37 kDa (Figure 2, Enriched, lane 7S).

3.3. Sequence Identity and Comparison

Bands excised from R-SDS-PAGE gels were subjected to trypsin digestion and subsequent mass spectrometric analysis by nanoLC/ESI-MS/MS. Four sequence fragments were identified from a protein band of 22 kDa (Figure 2, Total protein, lane R, arrow and Enriched, lane 7S, arrow) as shown in Table 1. All four sequences matched portions of the sequence of the putative vicilin protein from Solanum demissum. Of these, two were adjacent and the sequences overlapped by two residues. In the NCBI nonredundant protein database, there were no additional identical hits for the individual sequences obtained from MS/MS. A number of less homologous matches, including nonstorage proteins and functional proteins and precursors, were identified when each sequence component was compared to sequences from other members of Solanaceae (not shown).

3.4. Lectin Blotting and Carbohydrate Analysis

Concanavalin A (Con A) has a high affinity for mannose [41] found in N-linked oligosaccharide chains. Staining of N. sylvestris seed protein with Con A (Figure 3) was readily inhibited by the addition of 10 mM mannose to the incubation buffer. Monosaccharides hydrolyzed from PVDF-bound bands (unstained) after R-SDS-PAGE separation of N. sylvestris protein were analyzed by HPAEC-PAD. Quantities of each monosaccharide (picomoles) were then used to establish a ratio in relation to Man as shown in Table 2. Deacetylation of amino sugars occurs as a result of the TFA hydrolysis, and the resulting peaks corresponding to GlcN and GalN are representative of N-acetylglucosamine (GlcNAc) and N-acetylgalactosamine (GalNAc), respectively. The monosaccharide profile of the 48 kDa mass and smaller proteolytically processed components ( 20–24 kDa) showed the presence of differing glycosylation among these populations (Table 2). Glc, which is a ubiquitous contaminant, was also present and was not evaluated. No evidence of Fuc was present.
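The normalization used for Table 2 amounts to dividing each monosaccharide quantity by the quantity of Man. A minimal sketch follows; the function name is illustrative, Glc is excluded by default because the text treats it as a contaminant, and the numbers in the usage example are placeholders, not measured values.

```python
def ratios_to_mannose(picomoles, exclude=("Glc",)):
    """Express monosaccharide quantities (pmol) as ratios relative to Man.

    picomoles: mapping of monosaccharide abbreviation -> quantity in pmol.
    exclude:   sugars left out of the result (Glc by default, since it is
               described as a ubiquitous contaminant).
    """
    man = picomoles.get("Man", 0.0)
    if man <= 0:
        raise ValueError("a positive Man quantity is required as the reference")
    return {sugar: quantity / man
            for sugar, quantity in picomoles.items()
            if sugar not in exclude}


# Placeholder quantities for illustration only (not data from this study).
example = {"GlcN": 40.0, "Man": 20.0, "Glc": 99.0}
example_ratios = ratios_to_mannose(example)
```

Because GlcNAc is deacetylated during TFA hydrolysis, the GlcN ratio obtained this way is read as the GlcNAc : Man ratio of the original oligosaccharide.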

WGA lectin, which has been reported to bind to GlcNAc residues [42], was used to probe the post-size-exclusion pool A proteins. The pool A protein was separated via R-SDS-PAGE and transferred to PVDF. Although the band at 48 kDa was lightly stained by WGA, prominent staining of bands between 20 and 25 kDa was observed (Figure 4). Staining was inhibited by preincubating the lectin followed by sustained presence of 10 mM GlcNAc in the blot incubation, indicating a legitimate carbohydrate-mediated binding between the PVDF-bound glycoproteins and WGA (Figure 4).

Prediction of glycosylation for a portion of the S. demissum vicilin incorporating sequences obtained from N. sylvestris, shown in Figure 5, indicated a possible N-linked site (Asn-Gly-Ser/Thr). The lack of defined mass to charge (m/z) for a peptide sequence at this region in the N. sylvestris protein spectra was most likely due to the presence of the oligosaccharide during MS analysis. Further comparison of amino acid sequences from G. max, C. maxima, D. carota, and L. esculentum (from the mRNA) to the S. demissum peptide sequence confirmed that the glycosylation point within the cupin domain at the carboxy end of each of these peptide chains is highly conserved (Figure 5).
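Candidate N-linked sites of this kind can be located by scanning a sequence for the general consensus sequon Asn-X-Ser/Thr, where X is any residue except proline; the site described above is one instance of that pattern. The sketch below is a simple pattern search, not the prediction tool used in this work, and the test sequence is made up for illustration. A sequon match indicates only a potential site, not confirmed occupancy.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Made-up sequence: NGS (match), NPT (excluded, X is Pro), NAT (match).
demo_positions = find_sequons("AANGSAANPTANAT")
```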

3.5. IgG Cross-Reactivity

The hypothesis that sequence homology alone may be an indicator of antibody cross-reactivity was explored by immunoblotting. An IgG fraction prepared against G. max total seed protein was used to probe PVDF-bound extracts from N. sylvestris and C. maxima. Anti-G. max IgGs bind to proteins from C. maxima seeds and N. sylvestris (Figure 6). Staining of N. sylvestris protein is most intense at molecular masses between 20 and 25 kDa and also corresponds to the known vicilin proteins and fragments from C. maxima (MP27/MP32) and G. max (Bd 28 K and 23 kDa glycopeptides from that protein) (Figure 6, arrow), reinforcing the possibility that vicilin proteins from tobacco may be similar immunogenic agents. Further testing of N. sylvestris protein samples with sera from presensitized individuals was not undertaken in this study.

4. Discussion

A glycoprotein from the seeds of N. sylvestris has been identified that undergoes proteolytic cleavage and exhibits homology with vicilin proteins from other members of Solanaceae as well as other plant families. The protein forms intrachain disulfide bonds, forms noncovalently bound complexes of high molecular mass, has asparagine-linked glycosylation, and is recognized by antibodies raised against total seed protein from G. max.

One of the hallmarks of the 7S vicilin protein family is the lack of cysteine residues and, consequently, of interchain disulphide bonds. The putative vicilin protein identified from N. sylvestris does, however, form intrachain disulphide bonds which can be broken under reducing conditions. The presence of cysteines in the sequence of the S. demissum vicilin, with which the N. sylvestris protein fragments share homology, indicated the possibility of intrachain disulphide bonding between residues near the middle and toward the carboxy terminus of the polypeptide chain as indicated in Figure 5 (marked S below alignment). It can be inferred from this that proteolytic cleavage must occur at a point between these two cysteines, as the addition of β-ME resulted in separation of polypeptides with masses less than those observed after the initial NR-SDS-PAGE separations. Vicilins from C. maxima and G. max share these cysteine residues at conserved positions, although it should be noted that the D. carota protein, which was reported to be isolated from tap root tissue instead of seeds, has only one cysteine residue within the cupin domain at the amino end and no cysteine residues in the carboxy cupin domain as shown in Figure 5.

Although vicilins from D. carota (AAC15238) and G. max (BAB21619) have two complete cupin regions and vicilin symmetry as a result of this, the vicilin from S. demissum (AAT40548) varies in that the amino-terminal cupin region is incomplete at the amino end and incorporates a 26-amino acid insertion. Although insertions such as this have been proposed [23] as the evolutionary mechanism that led to the emergence of the extant structure of 11S seed proteins (Figure 1), the S. demissum protein shares more significant identity with 7S proteins (as evidenced through sequence comparison, data not shown). A search for nucleotide homology (tBLASTn) limited to the family Solanaceae produced a nearly identical match for the carboxy and amino terminal gene sequences of vicilin mRNA from S. lycopersicum (Figure 7), and a cDNA fragment from N. plumbaginifolia, commonly referred to as “Tex-Mex” tobacco (Figure 8). In silico translation of the mRNA from S. lycopersicum (BT013421.1) resulted in a polypeptide containing one partial cupin domain. This domain, which is toward the carboxy terminus of the peptide, has 93% identity and 96% homology with the putative S. demissum vicilin (Figure 7). At the amino end, the first 143 amino acids of S. demissum vicilin gave an identity of 76% (109/143) and homology of 78% (112/143) when compared to the S. lycopersicum sequence. Comparison of the sequences also revealed the highly conserved N-linked glycosylation site found in each of the other vicilins examined here.
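The identity and homology percentages quoted for the amino-terminal region follow directly from the match counts over the 143 aligned residues. The short sketch below reproduces that arithmetic and shows how an identity value can be computed from two pre-aligned sequences; the helper names and the toy alignment are illustrative, and scoring of similar (non-identical) residues is not implemented because it depends on the substitution matrix used by the alignment tool.

```python
def percent(matches: int, aligned_length: int) -> float:
    """Return matches as a percentage of the aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * matches / aligned_length


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length, pre-aligned sequences.

    Gap characters ('-') never count as identical; every alignment
    column contributes to the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b)
                    if a == b and a != "-")
    return percent(identical, len(aligned_a))


identity_n_term = percent(109, 143)  # about 76.2, reported as 76%
homology_n_term = percent(112, 143)  # about 78.3, reported as 78%
```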

The theoretical isoelectric point (pI) of the complete S. demissum vicilin sequence is 5.9 as determined by using the ExPASy online algorithm. In comparison, vicilin-like proteins from N. sylvestris were poorly solubilized below pH 7.5 and nearly insoluble at pH 6. The rapid decrease in solubility below pH 8 corresponds well with the findings detailed in a proteome analysis of endosperm and embryos from S. lycopersicum in which vicilin-like proteins isolated from the endosperm of seeds were determined to have pI values ranging from 6.9 to 8.1 [43].

Plants have many advantages as hosts for recombinant technologies, including the possibility of high yield, freedom from animal pathogen contamination, and low production cost overhead [7, 44]. It is plausible, however, that endogenous protein allergens may be coextracted along with recombinant proteins [18, 22]. The vicilin glycoproteins from soy beans, pumpkin seeds, and tap roots of carrots compared here have also been identified as allergens, and each of these proteins has a highly conserved peptide sequence (Figure 5). Previously, evidence has been presented to suggest that those individuals presensitized to vicilin proteins from one plant species may be affected by IgE recognition of vicilins derived from these and additional species [20, 45]. Although allergy is a function of IgE antibodies, cross-reactive IgG binding (Figure 6) demonstrates the possibility that antibodies of other classes may bind to epitopes present on N. sylvestris protein.

Considerable interest has been given to cross-reactivity of seed proteins in relation to the increasing occurrence of severe allergic reaction [22]. General areas of conserved peptide sequences comprising large domains across the cupin superfamily (e.g., the cupin domains themselves) appear to be similar enough to produce cross-reactivity [22]. Of particular interest is the cross-species similarity of the more carboxy-localized cupin domains (Figure 5). It is tempting to assume that homology alone is a suitable determinant of allergenicity and Ig cross-reactivity, and this has previously been addressed without a firm consensus having been established [18]. It is possible that antibody cross-reactivity in vivo is dependent upon a combination of conformational similarity and sequence homology [18, 22, 46] and not exclusively one or the other. Given the degree of homology between the putative vicilin we have identified from N. sylvestris and the other vicilins discussed here, cross-reactivity prediction on the basis of peptide sequence homology alone is reasonable. Our results demonstrating that antibodies against G. max total seed protein also bind to proteins from C. maxima and N. sylvestris are in agreement with the above supposition.

Based on compositional analysis of monosaccharides released by acid hydrolysis, glycosylation of the vicilin-like protein from N. sylvestris is not uniform among subunits and fragments (Figures 3 and 4, Table 2). High molecular mass bands (Figure 3, band A), representative of vicilin monomers not subjected to proteolytic modifications, showed a nearly two-fold presence of GlcN with respect to Man, while the 22 kDa (Figure 3, band B) fragments separated under reducing conditions produced a GlcN : Man ratio closer to 1 as shown in Table 2. Interestingly, when exposed to anti-G. max IgG, the 22 kDa fragments appeared to stain with greater intensity than the larger ( 50 kDa) mass (Figure 6, arrow, lane 3). These bands also stained with Con A and WGA lectins (Figures 3 and 4). The vicilin Bd 28K from G. max was reported to have a dependence upon the N-linked glycans attached to that protein in the binding of serum IgE from patients with G. max allergy [47]. It was determined that the Asn41 residue of Bd 28K is glycosylated (this site is shown in Figure 5) and that IgE binding to the protein appeared to be greatly enhanced by the presence of the N-linked oligosaccharide. Later work focusing primarily on the carboxy region of Bd 28K and on its proteolytically derived 23 kDa fragments demonstrated that Asn349 is glycosylated and that sera from G. max-sensitive patients showed reduced binding to fragments produced in E. coli due to the absence of the oligosaccharide [24]. However, recombinant peanut proteins rAra h2 and rAra h3, also produced in E. coli and therefore lacking glycosylation, were found to be allergens comparable to their counterparts naturally produced in peanut seeds [48, 49]. From these contrasting findings, it may be inferred that epitope similarities between seed proteins which originate both from the amino acid sequence and carbohydrate modifications have the potential to produce an immune response in presensitized individuals.


CBB: Coomassie Brilliant Blue
FPLC: Fast protein liquid chromatography
HPAEC-PAD: High-pH anion exchange chromatography with pulsed amperometric detection
NanoLC/ESI-MS/MS: Nanoscale liquid chromatography-electrospray ionization tandem mass spectrometry
SDS-PAGE: Sodium dodecyl sulfate polyacrylamide gel electrophoresis


J. Q. Gerlach and L. Joshi would like to thank Amy D. Smith, Mostafa Sheykhnazari, Miti Shah, Vinay Nagaraj, Michelle Kilcoyne, and Sasha Daskolova for valuable discussions and analysis advice. P. A. Haynes would like to thank Vicki Chandler and Stuart Bourne for support. Funding for this research was provided by the Biodesign Institute, Arizona State University, the Wallace Research Foundation, and Bio5 Institute.