Abstract

Cryo-electron microscopy (cryo-EM) has produced density maps of various resolutions. Although α-helices can be detected from density maps at 5–8 Å resolutions, β-strands are challenging to detect in such density maps because of their close spacing. The variety of shapes of β-sheets adds to the complexity of β-strand detection from density maps. We propose a new approach, StrandRoller, to model traces of β-strands for β-barrel density regions that are extracted from cryo-EM density maps. In a test containing eight β-barrels extracted from experimental cryo-EM density maps at 5.5 Å–8.25 Å resolution, StrandRoller detected about 74.26% of the amino acids in the β-strands with an overall 2-way distance of 2.05 Å between the detected β-traces and the observed ones, if the best of the fifteen detection cases is considered.

1. Introduction

Cryo-electron microscopy (cryo-EM) has become a major experimental technique to study structures of large protein complexes [1, 2]. Many large complexes have been resolved to about 3 Å resolution recently [3, 4], at which the position of the protein backbone can be distinguished. For cryo-EM maps at lower resolutions, such as 5–8 Å (referred to as medium resolution in this paper), detailed molecular features are not resolved. It is a challenging problem to derive atomic structures from such density maps. Two types of approaches have been proposed. Fitting relies on a suitable atomic structure [5–9], and de novo modeling relies on the match of secondary structures between those in the density map and those in the protein sequence [10–17]. The most characteristic patterns in cryo-EM density maps at medium resolutions come from secondary structures of a protein chain. An α-helix often appears as a cylinder or a stick and can be identified using image processing methods [14, 18–22]. A β-sheet consists of multiple β-strands (Figure 1). A β-sheet often appears as a thin layer of density and can be detected computationally [14, 20, 23–25]. The spacing between two neighboring β-strands is between 4.5 and 5 Å, and therefore β-strands are not resolved at medium resolutions [26, 27].

The position of β-strands provides important constraints in backbone modeling of a protein. We previously proposed an approach to predict the location of β-strands using StrandTwister [28]. StrandTwister is built on the principle of the right-handed twist of β-strands, which was discovered as early as the 1970s [29]. The right-handed twist was measured along the peptide orientation as about 0° to 30° per residue [30], and an estimation of the strand orientation was proposed using the corners of a β-sheet [31]. StrandTwister was able to detect β-strands from single β-sheets. However, β-sheets have a variety of shapes that add complexity to β-strand detection. Some β-sheets appear as rolls and propellers, and others are β-barrels (Figure 2). Currently there is no computational method to detect β-strands from a β-barrel density map. In this paper, we propose a method to predict β-strands by utilizing prior knowledge about β-barrels. The proposed method, StrandRoller, is quite different from StrandTwister.

A β-barrel is a large β-sheet in which the first β-strand is hydrogen-bonded with the last β-strand. β-barrels are commonly found in porins and other proteins that span cell membranes [32]. McLachlan noticed in 1979 that the number of strands and their relative stagger completely determine the overall structure of a β-barrel [33]. The main structural characteristics of an ideal β-barrel have been discussed based on a cylindrical barrel [33–35]. Studies have shown that the tilt angle and interstrand distance for all β-barrel structures vary within a fairly small range [35–37]. Our method, StrandRoller, is designed to utilize such characteristics of tilt angles and interstrand distance.

A helix identified from the medium resolution cryo-EM map is often represented as a line (colored line in Figure 1), referred to as an α-trace that corresponds to the central axis of a helix. We define a β-trace as the central line along a β-strand. In particular, the observed β-trace is the line interpolating all geometrical centers of three consecutive Cα atoms on a β-strand plus two Cα atoms at the ends of the β-strand (black line in Figure 3). An observed β-trace represents the line along the atomic structure of a β-strand (Figure 3). Given the β-barrel density voxels, the problem of β-strands detection is to find the orientation (Figure 3(a)) and location (Figure 3(b)) of β-traces from the three-dimensional cryo-EM map.

Preliminary results of StrandRoller were reported in [39]. Details of the method and more thorough tests on different sizes of β-barrels are presented in this paper. In addition, we tested our method on eight pieces of experimental cryo-EM data that were downloaded from EM DataBank. The results suggest that StrandRoller can be used for the prediction of β-traces from medium resolution cryo-EM density maps of β-barrels when the rough barrel region is segmented.

2. Method

β-Barrels have a characteristic shape. Atomic structures of β-barrels have been modeled as hyperboloid surfaces [40–42] or catenoid surfaces [43]. Although atomic structures of β-barrels are different from β-barrel density maps, many small β-barrels visually appear as cylinders with nonuniform ends. Although various models can be used to approximate the major area of a β-barrel, β-barrel density maps often deviate from the mathematical models in certain regions. In order to create a surface that represents the density map well, we initially used a simple elliptical model and adjusted the model in those regions that do not fit. Strand generation was then performed on the adjusted barrel surface model (Figure 4). We assume that the β-barrel density has been segmented from the entire cryo-EM density map.

2.1. β-Barrel Surface Model from Cryo-EM Density Voxels

In order to represent the shape of a β-barrel, we reduced the β-barrel density map to a surface. Two steps are involved in creating the surface model. The first step involves identification of the axis of the barrel density map (Figure 5(a)). The barrel density map was first translated to the global origin based on its geometric center. An elliptical cylinder (1) was then utilized to search for the orientation of the central axis. The orientation was selected using exhaustive search and least squares fitting to the cylinder. The entire barrel density was then rotated such that the central axis aligns with the z-axis (Figure 5(b)).
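The axis search described above can be illustrated with a minimal sketch. It is not the StrandRoller implementation: the angular grid, the algebraic conic fit used as the least squares score, and all function names are assumptions introduced here. Each candidate axis is scored by projecting the centered points onto the plane perpendicular to it and fitting a centered conic A·x² + B·xy + C·y² = 1 by least squares; the axis with the smallest residual is kept.

```python
import math

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def perpendicular_basis(axis):
    """Two orthonormal vectors spanning the plane perpendicular to a unit axis."""
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = cross(axis, helper)
    norm = math.sqrt(dot(u, u))
    u = (u[0] / norm, u[1] / norm, u[2] / norm)
    return u, cross(axis, u)

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def ellipse_fit_residual(points, axis):
    """Sum of squared residuals of a least squares fit A*x^2 + B*x*y + C*y^2 = 1
    to the points projected along `axis` (an assumed scoring function)."""
    u, v = perpendicular_basis(axis)
    rows = []
    for p in points:
        x, y = dot(p, u), dot(p, v)
        rows.append((x * x, x * y, y * y))
    # Normal equations (R^T R) c = R^T 1, solved with Cramer's rule.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] for r in rows) for i in range(3)]
    d = det3(ata)
    if abs(d) < 1e-12:
        return math.inf
    coeffs = []
    for k in range(3):
        m = [row[:] for row in ata]
        for i in range(3):
            m[i][k] = atb[i]
        coeffs.append(det3(m) / d)
    return sum((dot(coeffs, r) - 1.0) ** 2 for r in rows)

def find_barrel_axis(points, step_deg=5):
    """Exhaustive search over axis directions on a hemisphere (grid is assumed)."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [tuple(p[i] - center[i] for i in range(3)) for p in points]
    best_axis, best_residual = None, math.inf
    for theta_deg in range(0, 91, step_deg):        # polar angle
        for phi_deg in range(0, 360, step_deg):     # azimuth
            theta, phi = math.radians(theta_deg), math.radians(phi_deg)
            axis = (math.sin(theta) * math.cos(phi),
                    math.sin(theta) * math.sin(phi),
                    math.cos(theta))
            residual = ellipse_fit_residual(centered, axis)
            if residual < best_residual:
                best_axis, best_residual = axis, residual
    return best_axis
```

Once an axis is found, the density can be rotated so that this axis coincides with the coordinate axis used for the cross-sections; that rotation is omitted here.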

Instead of a mathematical formula, our β-barrel model consists of a thin layer of density voxels that closely represents the morphed barrel shape and the outline of the barrel at its two ends (yellow points in Figure 5(e)). The barrel model was generated using cross-sections from the bottom to the top of the volume. The density voxels on each cross-section along the z-axis (Figure 5(c), gray) lie near the ideal ellipse. The voxels that are closest to the ideal ellipse were selected as the points of the barrel model (Figure 5(c), yellow). Note that such a discrete model closely represents the three-dimensional distribution of the voxels. For example, where the fitted ideal elliptical cylinder is outside the density (arrows in Figure 5(c)), the voxels of the density map were used to adjust the barrel model. The resulting barrel model clearly represents the morphed regions, especially at the two ends of the barrel (arrows in Figure 5(e)). We find that an accurate barrel surface is important for modeling the β-traces accurately.
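The voxel selection on each cross-section can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: the barrel axis is taken to be the third coordinate, each cross-section is divided into angular sectors (the sector count is hypothetical), and closeness to the ellipse with semi-axes a and b is approximated by the normalized deviation |sqrt((x/a)² + (y/b)²) − 1| instead of a true Euclidean distance.

```python
import math

def barrel_surface_points(voxels, a, b, n_sectors=36):
    """Pick, for every cross-section and angular sector, the voxel closest
    to the ideal ellipse x^2/a^2 + y^2/b^2 = 1.

    voxels: iterable of (x, y, z) with the barrel axis along z and the
            barrel centered on the origin (assumed preprocessing).
    """
    best = {}  # (z, sector) -> (deviation, voxel)
    for x, y, z in voxels:
        angle = math.atan2(y, x)
        sector = int((angle + math.pi) / (2 * math.pi) * n_sectors) % n_sectors
        # Normalized deviation from the ellipse; zero exactly on the ellipse.
        deviation = abs(math.hypot(x / a, y / b) - 1.0)
        key = (z, sector)
        if key not in best or deviation < best[key][0]:
            best[key] = (deviation, (x, y, z))
    return [voxel for _, voxel in best.values()]
```

Because the selected points are actual density voxels, a sector in which the density lies inside the ideal ellipse contributes a point inside the ellipse, which is how such a discrete model can follow the morphed regions.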

2.2. Strand Generation on the Barrel Model

McLachlan noticed in 1979 that the number of strands and their relative stagger completely determine the overall structure of a β-barrel [33]. The main structural characteristics of an ideal β-barrel have been discussed based on a cylindrical barrel [33–35]. Studies have shown that the tilt angle (Figure 5(f)) of a β-strand can vary between 30° and 60°, as reflected in the known structures of membrane proteins [35–37]. The tilt angle may also vary among different strands in the same β-barrel [37]. However, the interstrand distance remains 4.5–5 Å because of the hydrogen bonds between two neighboring β-strands. StrandRoller uses this prior knowledge about the tilt angle and the interstrand distance in the modeling of β-strands.

An initial β-trace (blue in Figure 5(f)) was produced by tilting the barrel axis by the tilt angle and projecting it onto the barrel surface model. The second β-trace was then generated from the previous one by traveling a horizontal distance (2) on the barrel surface (Figure 5(f)).
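As a geometric sketch of how such a horizontal distance can follow from the interstrand distance, consider the barrel surface unrolled into a plane. The symbols below are introduced here for illustration and are not taken from formula (2): d is the interstrand distance measured perpendicular to the strands, α is the tilt angle measured from the barrel axis, and d_h is the horizontal offset between neighboring β-traces. The example value d = 4.8 Å is an assumption within the 4.5–5 Å range.

```latex
d_h = \frac{d}{\cos\alpha},
\qquad
d = 4.8\ \text{\AA},\ \alpha = 45^{\circ}
\;\Rightarrow\;
d_h = \frac{4.8}{\cos 45^{\circ}} \approx 6.79\ \text{\AA}
```

Under this reading, a larger tilt angle increases the horizontal step, so fewer β-traces fit around a barrel of a given circumference.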

Given a tilt angle, the entire set of β-traces can be built iteratively on the barrel surface (Figure 5(g)) until the last β-trace is generated (Figure 5(f)). The tilt angle was sampled every 5° between 35° and 55°, and three translations were sampled at each tilt angle. There are fifteen sets of β-traces generated for one barrel volume. Note that the fifteen sets are within a very small range of tilt angle (20°) and translation distance (4.8 Å). The barrel density along with the detected β-strands was eventually translated and rotated back to the original position in the map after the detection is done (Figure 5(h)).
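The sampling above (five tilt angles times three translations) and the iterative trace generation can be sketched on an ideal circular cylinder. This is only an illustration: StrandRoller builds traces on the adjusted voxel surface, whereas the cylinder, the interstrand spacing of 4.8 Å, the evenly spaced translation offsets, and the rule for the number of traces are all assumptions made here.

```python
import math

INTERSTRAND_DISTANCE = 4.8  # Å; assumed value within the 4.5-5 Å range

def sampled_parameters(tilts_deg=(35, 40, 45, 50, 55), n_shifts=3,
                       spacing=INTERSTRAND_DISTANCE):
    """All (tilt angle, translation) pairs; offsets evenly divide one spacing."""
    return [(tilt, k * spacing / n_shifts)
            for tilt in tilts_deg for k in range(n_shifts)]

def traces_on_cylinder(radius, height, tilt_deg, shift=0.0,
                       spacing=INTERSTRAND_DISTANCE, n_points=20):
    """Generate tilted traces around a circular cylinder whose axis is z."""
    tilt = math.radians(tilt_deg)
    horizontal_step = spacing / math.cos(tilt)  # offset between neighbors
    # Whole number of traces that fit around the circumference; the
    # leftover arc is not redistributed in this sketch.
    n_traces = int(2 * math.pi * radius // horizontal_step)
    traces = []
    for s in range(n_traces):
        start_arc = shift + s * horizontal_step
        trace = []
        for p in range(n_points):
            z = -height / 2 + height * p / (n_points - 1)
            # A line tilted from the axis drifts horizontally by z * tan(tilt).
            arc = start_arc + z * math.tan(tilt)
            phi = arc / radius
            trace.append((radius * math.cos(phi), radius * math.sin(phi), z))
        traces.append(trace)
    return traces

if __name__ == "__main__":
    for tilt, shift in sampled_parameters():
        n = len(traces_on_cylinder(8.0, 20.0, tilt, shift))
        print(f"tilt={tilt} deg, shift={shift:.1f} A, traces={n}")
```

Each generated set would then be compared with the density or, in a benchmark, with the observed β-traces.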

3. Result

StrandRoller was tested using three sets of β-barrel density maps: eighteen small simulated maps, fourteen large simulated maps, and eight experimental cryo-EM β-barrel maps. The proteins used in the simulated test set were collected from the β-barrel transmembrane superfamily of the Orientations of Proteins in Membranes (OPM) database [44] with less than 40% sequence similarity. The atomic structures of β-barrels were used to generate β-barrel density maps at 10 Å resolution using the pdb2mrc function in EMAN [38], with a sampling of 1 Å/pixel. The experimental cryo-EM density maps were downloaded from EMDB (http://www.emdatabank.org/). Since atomic structures are available for such cryo-EM maps, the density region that corresponds to one chain of the protein was first segmented using the atomic structure as an envelope. The β-sheet voxels were then manually outlined based on the atomic structure of the β-barrel. Such segmented test maps bear the characteristics of a β-sheet and have the outline of a β-barrel. The accuracy of β-strand detection was evaluated using two parameters as previously implemented [28]: the 2-way distance between the set of detected β-traces and the set of observed β-traces, and the number of amino acids covered by the detected β-traces. The observed β-trace is the line interpolating all geometrical centers of three consecutive Cα atoms on a β-strand plus the two Cα atoms at the ends of the β-strand, as shown in Figure 3.

In order to estimate how much of a β-strand was detected, the percentage of the detected Cα atoms of an observed β-strand was calculated. An amino acid of a β-strand is considered detected if the projection distance from its Cα atom to the corresponding detected β-trace is less than 2.5 Å, which is about half of the β-strand spacing. Since the number of detected β-strands may differ from the number of observed β-strands, a one-to-one correspondence needs to be established between subsets of the β-traces. For example, if the detected set contains five β-traces while the observed set contains six, the five observed β-traces that have the overall smallest 2-way distance to the five detected β-traces are selected for the calculation of the 2-way distance. This ensures that the same number of detected and observed β-traces are compared, with each detected β-trace paired with exactly one observed β-trace. The number of missed (and/or wrongly detected) β-strands can be inferred from the difference between the total number of observed β-traces and that of detected β-traces. The 2-way distance of a β-strand was calculated for each pair of corresponding detected and observed lines. The overall 2-way distance reflects how far the two sets of β-traces (detected and observed) are from each other.
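The 2.5 Å detection criterion can be illustrated with a short sketch. The function names are hypothetical, a β-trace is assumed to be stored as an ordered list of 3D points, and a Cα atom whose projection falls beyond the end of the trace is measured to the nearest end point, which is an assumption for this criterion.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to segment ab, clamping the projection to the ends."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0.0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / length_sq))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)

def point_trace_distance(p, trace):
    """Smallest distance from p to any segment of the polyline `trace`."""
    return min(point_segment_distance(p, trace[k], trace[k + 1])
               for k in range(len(trace) - 1))

def detected_fraction(ca_atoms, trace, cutoff=2.5):
    """Fraction of C-alpha atoms lying within `cutoff` Å of the detected trace."""
    detected = sum(1 for atom in ca_atoms
                   if point_trace_distance(atom, trace) < cutoff)
    return detected / len(ca_atoms)

if __name__ == "__main__":
    trace = [(float(x), 0.0, 0.0) for x in range(11)]
    atoms = [(2.0, 1.0, 0.0), (5.0, 3.0, 0.0), (12.0, 0.0, 0.0), (8.0, 0.0, 2.4)]
    print(detected_fraction(atoms, trace))  # three of the four atoms are within 2.5 Å
```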

In formula (3), $m$ and $n$ are the numbers of points on the detected β-trace $L^{D}$ and the observed β-trace $L^{O}$, respectively; $i$ and $j$ are the indices of a point along $L^{D}$ and $L^{O}$, respectively; and $d(i, L^{O})$ is the projection distance from point $i$ of $L^{D}$ to $L^{O}$. The projection of point $i$ is required to fall within line $L^{O}$. If it falls outside, the distance between point $i$ and an end of $L^{O}$ was used as an approximate distance.
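A minimal sketch of a 2-way distance between two traces is given below. The exact form of formula (3) is not reproduced here; the sketch assumes that the 2-way distance is the mean of the two one-way averages (detected to observed, and observed to detected), which is one symmetric form consistent with the definitions above. The end-point fallback for projections that fall outside a line is handled by clamping, and the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from p to segment ab; projections outside the segment
    fall back to the distance to the nearer end point."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0.0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / length_sq))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)

def point_trace_distance(p, trace):
    """Smallest distance from p to any segment of the polyline `trace`."""
    return min(point_segment_distance(p, trace[k], trace[k + 1])
               for k in range(len(trace) - 1))

def one_way_distance(source, target):
    """Average distance from the points of `source` to the line `target`."""
    return sum(point_trace_distance(p, target) for p in source) / len(source)

def two_way_distance(detected, observed):
    """Assumed symmetric form: mean of the two one-way average distances."""
    return 0.5 * (one_way_distance(detected, observed)
                  + one_way_distance(observed, detected))
```

With this form, a detected trace that is much shorter than the observed one is penalized through the observed-to-detected term, because the uncovered observed points are measured to the end of the detected trace.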

3.1. Performance on the Simulated Density Data

The purpose of this test is to investigate if traces of β-strands can be modeled from β-barrel density maps simulated to 10 Å resolution, at which the separation of β-strands is not visible. To discuss the ability of our β-trace detection, we use the best of fifteen sampled sets. The best set is the one that is closest to the observed set in terms of 2-way distance.

3.1.1. Small-Medium Barrels

Small-medium β-barrels refer to those with fewer than 15 β-strands each. The test of eighteen simulated small-medium sized β-barrel density maps shows that one of the fifteen sets of β-traces aligns very well with the observed set of β-traces, with an overall 2-way distance of 1.61 Å for the detected β-traces (Table 1). In the case of sheet A13 of PDB structure 1G7K, the detected set of β-traces appears to align with the β-strands very well (Figure 6(a)). In this case, all eleven strands were detected with a small 2-way distance of 1.8 Å (Table 1 row 1).

To analyze the sensitivity of the detection, we calculated the percentage of the detected Cα atoms of an observed β-strand. For example, 1TX2_B has all eight β-strands detected (Table 1 row 5), with three amino acids missed. For the eight detected β-strands, the 2-way distance is only 0.92 Å. Among the eighteen test cases, StrandRoller appears to be able to detect 78.26% of the β-strands fairly accurately in one of the fifteen sampled sets of β-traces (Table 1). In seventeen test cases, the number of detected β-strands is the same as the number observed. The number of detected amino acids and the 2-way distance are two parameters that have been used previously in accuracy measurement. A length-association method was proposed recently and is potentially a more sensitive way to evaluate secondary structure detection [45].

3.1.2. Large Barrels

Large barrels in this paper refer to those with more than 15 β-strands. Large barrels appear to be more challenging. Some extremely large β-barrels, such as the 22-stranded β-barrels 2GUF_D23 and 2HDI_D23 (Table 2), were still well detected. The 2-way distance is 2.03 Å in the case of 2GUF_D23 (Table 2 row 1) and 1.93 Å in the case of 2HDI_D23 (Table 2 row 2). β-Barrel 2GUF_D23 has twenty-one of its twenty-two β-strands detected (Table 2 row 1). Sixty-six amino acids were missed, most of which are at the edge (arrows in Figure 6(b)). Among the fourteen test cases of large β-barrels, StrandRoller appears to be able to detect 69.46% of the β-strands fairly accurately in one of the fifteen possible sets of β-traces, with an overall 2-way distance of 2.12 Å for the detected β-traces (Table 2).

We noticed that StrandRoller performs better on small-medium sized barrels than on large barrels. Large β-barrels are more likely to adopt flexible shapes. Missed detections appear to occur more at the edges of large β-barrels, where the β-strands tend to be more flexible (arrows in Figure 6(b)). The number of β-strands in large β-barrels also tends to be hard to detect because of the error accumulated during the strand generation step. Since each β-trace is derived from the previously generated one, error can propagate while traveling around the barrel.

3.2. Performance on Experimental Cryo-EM Data

StrandRoller was tested using eight β-barrels obtained from experimental cryo-EM density maps. The eight test cases are small ribosomal proteins in which the first β-strand is hydrogen-bonded with the last β-strand. Experimental data are often more challenging to analyze because of noise and missing density. Figure 7 shows three density regions that were segmented from cryo-EM maps at 5.8 Å, 5.5 Å, and 6.7 Å resolutions, respectively. At these resolutions, β-strands are not visible in density maps. StrandRoller was able to detect all β-strands on the barrels, and they align fairly well with the observed β-traces. In the case of 70S ribosome EMD_1657 (sheet AH4 in protein 4V5H), the 2-way distance for the five-stranded barrel is 1.94 Å, and 25 of 30 amino acids on the β-barrel were detected (Figure 7(a)). In the case of 80S Ribosome EMD_1780 (sheet_AH4 in protein 4V7E), the 2-way distance for the six-stranded barrel is 1.88 Å, and 24 of 28 amino acids on the β-barrel were detected (Figure 7(b)). We noticed that the eight β-barrels in the cryo-EM maps are all small barrels with fewer than nine β-strands. StrandRoller appears to be fairly accurate in detecting β-strands from such cryo-EM maps, with an overall 2-way distance of 2.05 Å and 74.26% of amino acids detected (Table 3).

4. Discussion

We previously showed that the accuracy of β-strand detection is affected by the accuracy of β-sheet detection [28]. This is also true in the context of β-barrels. A β-barrel is a closed structure, and the number of β-strands may be estimated from the diameter of the barrel. Figures 7(c) and 7(d) show the β-traces detected from β-barrel 5036_4V69_AD5, which was manually extracted from the cryo-EM density map using two different segmentations. The segmentation in Figure 7(c) is more conservative than that in Figure 7(d). The relaxed segmentation of the same barrel includes more density volume at the edge of the barrel. Although the 2-way distance is 1.73 Å in the more conservative segmentation versus 1.83 Å in the other, the main difference between the resulting β-traces appears to be their length. Both segmentations yielded the same number of β-strands with similar orientation and position. Our result suggests that the number of detected β-strands is not sensitive to density segmentation errors at the two ends of a β-barrel.

5. Conclusion

The position of β-strands is critical for modeling atomic structures of proteins. However, it has been a challenging problem to detect β-strands when no separation of the β-strands is visible in the density maps. The variety of shapes of β-sheets adds to the complexity of this problem. We previously proposed StrandTwister to detect β-strands from a single β-sheet using the right-handed twist [28]. Here we propose a new method to predict β-strands from a β-barrel density map directly using the characteristic tilt angles of the β-barrel. This approach bypasses the need to measure twist angles. Our results show that this approach is feasible. As long as the rough density region of a β-barrel is isolated from the entire density map, the location of β-strands can be modeled. However, the current limiting factor is the lack of automatic methods for detecting β-barrels in a cryo-EM density map. A β-barrel has a fundamental shape characteristic in which a hole is surrounded by a β-sheet. However, accurate detection of β-barrels needs to consider the different characteristics of the hole for different sizes of β-barrels. We are hopeful that such a detection tool will be available in the near future.

StrandRoller does not require the resolution of the cryo-EM density map to be higher than 5 Å, at which the separation of β-strands is resolved. It applies to maps with lower resolutions. In the test containing eight experimental cryo-EM β-barrel maps between 5.5 Å and 8.25 Å, StrandRoller detected about 74.26% of the amino acids in the β-strands in one of the fifteen sets of predicted traces. We demonstrate again that it is possible to derive β-strands from density maps at medium resolutions. To our knowledge, StrandRoller is the first method that attempts to address the problem of β-strand detection from medium resolution β-barrel maps. Future work includes developing more accurate methods for identifying β-traces and generating alternative β-traces for further evaluation in modeling.

Competing Interests

The authors declare that they have no competing interests.

Authors’ Contributions

Dong Si and Jing He developed the method. Dong Si implemented the method and conducted the tests. Dong Si and Jing He wrote the manuscript.

Acknowledgments

The work in this paper is partially supported by NSF DBI-1356621, NIH R01-GM062968, and the FP3 fund of Old Dominion University.