Abstract

This paper introduces an original algorithm for labeling the regions of a partitioned image according to the stacking level of membranes in transmission electron microscopy (TEM) images. Image analysis of membrane protein TEM images is a particularly challenging task because of the strong noise and heterogeneity present in these images. The proposed algorithm adapts automatically to the fluctuations and gray level ranges characterizing each membrane stacking level. Some information about the organization of the objects in the images is introduced as prior knowledge. Three types of qualitative and quantitative experiments have been specifically devised and implemented to assess the algorithm.

1. Introduction

Biological objects appear fairly transparent in electron microscopy images, as they only weakly scatter the electron beam that crosses them. The superposition of several objects is frequent when a sample in suspension is dropped on a support film; the image gray level is then determined by the set of successive layers. Instances of occlusion, which complicate object recognition in other imaging techniques, are here replaced by an overlapping of slightly opaque elements.

This paper introduces a technique for estimating the object stacking level of each region in a previously segmented TEM image, based on the mean gray level and the neighborhood of the region. The strong noise in the images and the various sources of fluctuations increase the difficulty of the task. The proposed solution, on the one hand, defines a local contrast level and, on the other hand, computes the classification thresholds for each image using some a priori knowledge on the nature of the observed samples.

This study lies within the more general scope of the automation of the analysis of the 2D crystallization process of artificial membrane proteins in transmission electron microscopy (TEM) [1]. The samples are placed in an auto-loader (of 96 grids) and analyzed without human intervention. Our team has developed image processing algorithms that interact with the electron microscope to reproduce the microscopist's screening steps. A preliminary study led to the conclusion that mimicking the 3-step approach of the microscopist would be best. First, the overall grid quality is rapidly assessed at very low magnification (about ×50–300) to retain a number of nondamaged grid squares that are interesting for further analysis. These regions are then acquired at medium magnification (about ×2000–5000) to select potentially interesting regions. Finally, the diffraction pattern of these latter regions observed at high resolution (about ×20000–50000) is analyzed to assess the crystallinity of the sample. An experienced microscopist can screen a grid for crystals in 15–20 minutes and give a quality grade. During this period of concentration he switches back and forth between the medium magnification mode to choose a region of interest and the high magnification mode to check the diffraction pattern.

In this paper we concentrate on the selection of potentially crystalline regions at the medium magnification level. The biologist's brain, with a good bit of training, learns to discriminate these regions. This capability is however very challenging to mimic with a computational approach. The information that we think is relevant for this region selection is the stacking level of the membranes combined with their sizes and contour characteristics.

The proposed method estimates, for each region, the gray level variation compared to the empty support film, and is followed by a recursive analysis of the segmented image to estimate a membrane stacking level. The original contribution is to select all the regions belonging, with high confidence, to stacking level 𝐿, based on two criteria: the region has a mean gray level weaker than the threshold at level 𝐿−1, and the region is contiguous to at least one region at level 𝐿−1. These criteria are justified by the nature of the analyzed samples. A preselection of stacking level 𝐿 regions makes it possible to determine a characteristic contrast 𝑄𝐿 and to extend the level 𝐿 labeling to all regions presenting a similar gray level range.

Section 2 presents the general context of this work. Section 3 introduces the characteristics of the membrane images acquired in TEM. Section 4 presents our algorithm of threshold determination for the identification of object stacking level. Section 5 presents the results analyzed both qualitatively and quantitatively.

2. General Context

Several studies have been performed to develop the automation of each of the methods leading to the structural analysis of proteins. Wilson [2] proposed a method for the evaluation of 3D-crystallization experiments using a neural network classification from characteristics extracted with image processing tools. Zhu et al. [3] contributed to the automatic selection of single particles in transmission electron images. For the assessment of 2D-crystallization experiments, two software packages are suitable. First, Oostergetel et al. [4] developed the GRACE package, a semiautomatic tool where Regions Of Interest (ROI) are selected manually by the user who targets the potentially crystallized membranes. Second, the Leginon [5] application, initially developed for single particle experiments, has been extended for the detection of well contrasted rectangular crystals. To enhance the automation of 2D-crystallization, the HT-3DEM [6] project, of which this study is a part, has led to the elaboration of a fully automated platform, which comprises the control of the microscope as well as the analysis of the images.

Considering the specimen (biological membranes) and the image acquisition process (Transmission Electron Microscopy), we deduce that the gray level of the membranes is a function of their stacking level. Therefore, the local stacking level assessment uses a classification of regions based on their gray level histograms. As will be seen, this task is far from trivial: the histogram of such images gives no clue concerning the different classes, and a unimodal histogram is not sufficient to extract the classes given by the specialists. The dotted line in Figure 1 represents the gray level histogram of the TEM image in Figure 2(a). This histogram is clearly unimodal. It is therefore very difficult to identify pixels belonging to different classes: those from nonstacked objects (red histogram), those from bistacked objects (green histogram), or those from overstacked objects (cyan histogram). The black histogram refers to background pixels. An efficient algorithm has therefore been developed to identify each class.

Many thresholding algorithms have already been proposed; they are generally used for bilevel thresholding, to identify foreground objects from the background [7, 8]. Many of them are extendable to multilevel thresholding as well. Sezgin and Sankur [8] proposed to classify these methods in six groups, according to the information they are exploiting: (i) histogram shape-based methods, (ii) clustering-based methods, (iii) entropy-based methods, (iv) object attribute-based methods, (v) spatial methods and (vi) local methods.

The first group of methods (i) uses histogram characteristics, such as peaks, valleys or curvatures. For our kind of images, such methods can only help in the identification of the background, and only if the histogram is multimodal. For the estimation of thresholds separating the membrane stacking level classes, these approaches are not suitable: in most cases, there is a strong overlap between the pixel gray level distributions of those classes, and the resulting image histogram does not have the characteristics used by methods (i). The second group of methods (ii) can only be used to identify two clusters, which correspond to the two lobes of the histogram. The classical Otsu [9] algorithm belongs to this group. Modifications have been proposed for fast multilevel thresholding, for instance by Liao et al. [10] or by Hammouche et al. [11], who use a genetic algorithm. However, multimodal histograms are required. In entropy-based thresholding methods (iii), the background and the foreground are considered as two different signal sources. Thresholding methods based on attribute similarity (iv) require the objects to have identifiable specificities, such as shapes or textures. Crystals do have specific textures due to the arrangement of proteins within the membranes, but these can only be observed at very high magnification (about 0.5 nm per pixel). The structure cannot be observed at our magnification (about 15 nm per pixel), and membranes can take any shape. Therefore, most of the methods (i) to (iv) are used to distinguish the background from the foreground. The last two classes seem more appropriate to our difficult context. Spatial thresholding methods (v) have the advantage of using both the pixel gray levels and their spatial relationships. These methods allow the insertion of a priori knowledge, such as context probabilities or local linear dependence models of pixels. Locally adaptive thresholding methods (vi) compute thresholds considering parameters which vary in the image. White and Rohrer [12] propose an interesting approach, which uses local contrast for character recognition. This method is compared to other methods in [13]. The benefit of local approaches is significant in noisy and heterogeneous images with nonhomogeneous illumination.

3. Comments on Image Characteristics

Membrane protein crystallization experiments are conducted with lipidic components in a liquid environment. The lipids aggregate to create an artificial membrane and when the right experimental conditions are met, proteins are densely and regularly inserted within the membrane, forming the crystalline structure which can be detected at high resolution. The analysis of resulting images at medium magnification has two objectives: the global evaluation of the experiment in terms of membrane class distribution, size distribution, and so forth, and the selection of potentially crystalline regions to be analyzed at high resolution. The stacking level criterion is therefore the important characteristic to achieve both goals.

3.1. Image Content

Figure 2 shows three examples of medium magnification TEM images. They illustrate the diversity of the membrane objects, in terms of shapes, sizes and contrasts. Protein membranes appear in various shapes. For example, sheets generally have no particular geometric attributes, except when they are crystallized, in which case they tend to present rectilinear edges (Figure 2(a)). Vesicles appear as spherical objects collapsed on themselves (Figure 2(b)). Figure 2(c) shows a very crowded image, with numerous objects, consisting mostly of sheets and a few vesicles. Artifacts such as drops, overstained regions, and so forth, can also appear in some cases and need to be identified in the classification phase.

The superposition of membranes is a natural random process that cannot be avoided or controlled; it occurs when the sample is dropped on the carbon support film covering the 3 mm grid typically used in TEM. The stacked membranes can be compared to sand piles, the number of membranes being higher at the center than at the periphery. This property will be used as prior knowledge in our classification algorithm.

The classification of membrane regions according to their stacking level is also complicated by the variability induced by the formation of the images.

3.2. Image Formation

Images of biological samples in TEM are very noisy and have low contrast. To enhance the contrast and to protect the sample from being damaged by the electron beam, negative staining with heavy metals is used. This diffusion process often leads to a heterogeneous distribution of the stain. A high concentration of stain on some parts of the grid leads to artifacts in the observed image. The way stain deposits on membranes depends on their structure. For aggregated membranes, stain can sometimes even infiltrate between layers of membranes.

Also, image quality may vary greatly according to the settings retained by the microscopist. Parameters which may vary from experiment to experiment include the illumination intensity, its coherence and the defocus. (i) The illumination intensity influences the contrast of the objects, which are better imaged with high illumination (limited in practice to avoid damaging the sample). (ii) The illumination coherence factor affects the background of microscope images, which can be heterogeneous even when the support film is empty. (iii) The defocus parameter is linked to the contrast transfer function of a TEM and is another parameter which can lead to heterogeneities.

Because of these numerous parameters, the evaluation of the stacking level based on the gray level image is made all the more difficult. Therefore, the characteristics of the stacking level should be evaluated for each image, since the thresholds and classes defined in one image are not applicable to others. In addition, heterogeneities of the illumination imply that the stacking level cannot be associated with an absolute gray level. It is therefore better to rely on the notion of locally contrasted objects, as will be shown in the next section.

4. Stacking-Level Extraction Algorithm

In the proposed method, different thresholds are extracted in an iterative manner. Each threshold is computed in a two-step process. During the first step, regions having a high probability of belonging to the considered stacking level are selected based on a priori knowledge. The second step is a generalization step during which all the regions of the considered class are identified. Local contrasts are used to cope with the gray level heterogeneities generated during the image formation. This iterative two-step approach can be used until the number of thresholds is sufficient to label each object. The stacking level algorithm is presented in Figure 3.

4.1. Segmented Images and Main Background

The proposed algorithm takes partitioned images as input. Each region is represented by its average gray level and will be labeled according to its stacking level 𝐿. This is a region-based labeling, and therefore any appropriate image segmentation that partitions the image into regions that are homogeneous in terms of stacking level can be applied. Level 𝐿=0 defines the background regions: image regions where no object has been dropped on the support film. The corresponding background gray level is needed to initialize the recursive algorithm. It is therefore essential to first determine a characteristic background region. The observation of the organization of the sample in the image leads to a simple algorithmic rule: membrane objects never fully cover the support film, so a large contiguous high intensity region always exists; it is easy to identify and will be designated as the main background 𝐵. All other regions are initially assumed to be foreground and are denoted 𝑅𝑛 (with 𝑛, from 1 to 𝑁, the index of the region). Aside from the main background 𝐵, the algorithm will also identify all the remaining background regions.

4.2. Step 1: Knowledge-Based Selection

Each stacking level $L$ is characterized by an average contrast value $Q_L$ obtained by averaging the contrast of automatically selected reference regions $R$, as presented in Section 4.2.1. The contrast $C_m$ between the region $R_m$ and the background is defined by

$$C_m = \left| G_{R_m} - G_{B_m} \right|, \tag{1}$$

with $G_{R_m}$ being the average gray level of the region $R_m$ and $G_{B_m}$ the average gray level of the associated background. The local approach is discussed in Section 4.2.2. We hence obtain

$$Q_L = \frac{\sum_{m=1}^{M} t_{R_m} C_m}{\sum_{m=1}^{M} t_{R_m}}, \tag{2}$$

where $t_{R_m}$ represents the size of $R_m$.
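As a minimal illustration of (1) and (2), the following Python sketch computes the contrast of a few reference regions and their size-weighted average. It assumes that the contrast is the absolute difference between the region gray level and the local background gray level; the function names and the sample values are hypothetical and are not taken from the experiments.

```python
from typing import Sequence


def contrast(region_gray: float, background_gray: float) -> float:
    """Contrast of a region against its associated background.

    Assumption: Eq. (1) is read as an absolute gray level difference.
    """
    return abs(region_gray - background_gray)


def characteristic_contrast(contrasts: Sequence[float], sizes: Sequence[int]) -> float:
    """Size-weighted average contrast Q_L of the reference regions, Eq. (2)."""
    if len(contrasts) != len(sizes) or not contrasts:
        raise ValueError("need one size per contrast and at least one region")
    return sum(t * c for t, c in zip(sizes, contrasts)) / sum(sizes)


# Hypothetical reference regions:
# (mean gray level, local background gray level, size in pixels)
regions = [(5.20e4, 5.60e4, 4000), (5.30e4, 5.66e4, 1000)]
contrasts = [contrast(g_r, g_b) for g_r, g_b, _ in regions]
sizes = [t for _, _, t in regions]
q_l = characteristic_contrast(contrasts, sizes)
print(q_l)
```

Weighting by region size means that a large reference region dominates the estimate, which is consistent with the low confidence given to small regions.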

4.2.1. Automatic Selection of Reference Regions 𝑅

Regions used to extract 𝑄𝐿 must be carefully chosen since there should be a high probability that they belong to the class 𝐿. To avoid bias in the computation, a priori knowledge is introduced (see Section 3). (a) Neighborhood. Between adjacent regions, the one that will be labeled as nonstacked is the brightest one. Background 𝐵 is not considered. An algorithm looking for local minima in regions is used. Figure 4(a) (obtained from the gray level image in Figure 2(b)) shows: in white, regions corresponding to this criterion during the first iteration; in black, the identified background; in gray, nonselected foreground. (b) Background Proximity. In case of membrane superposition and due to the nature of our biological objects, the more piled-up regions are situated in the center of the membrane pile rather than at its periphery. Therefore, among regions selected by step (a), only those that are adjacent to the background or to regions having a lower stacking level are considered. Figure 4(b) shows these regions at the first iteration.

Very small regions are rejected even though they meet the characteristics of both previous criteria. Noise in the images may lead to a bias in the positioning of the contours extracted during the segmentation. Confidence in those small regions is low, and they are therefore ignored in the computation of 𝑄𝐿.

The selection of the regions is based on statistical characteristics; some regions belonging to stacking levels above level 𝐿 may therefore be selected. Those erroneous selections are however rare, and they do not bias the value of 𝑄𝐿 enough to compromise the final classification of the objects.
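The selection criteria of this section can be sketched on a region adjacency structure. The Python example below is only one possible reading, under stated assumptions: a region is kept when it is brighter than all of its still unlabeled neighbors (neighborhood criterion), touches at least one already labeled region (background or lower stacking level), and is not smaller than a hypothetical `min_size`. The data layout (dictionaries keyed by region index) and all values are illustrative.

```python
from typing import Dict, List, Optional, Set


def select_reference_regions(
    gray: Dict[int, float],
    size: Dict[int, int],
    neighbors: Dict[int, Set[int]],
    labels: Dict[int, Optional[int]],
    min_size: int,
) -> List[int]:
    """Return the unlabeled regions usable as references for the next level."""
    selected = []
    for region, label in labels.items():
        if label is not None:
            continue  # already labeled: background or a lower stacking level
        unlabeled = [n for n in neighbors[region] if labels[n] is None]
        labeled = [n for n in neighbors[region] if labels[n] is not None]
        # (a) Neighborhood: brighter than every adjacent unlabeled region.
        is_brightest = all(gray[region] > gray[n] for n in unlabeled)
        # (b) Proximity: adjacent to the background or to a lower level.
        touches_lower = bool(labeled)
        # Very small regions are ignored (unreliable contours).
        if is_brightest and touches_lower and size[region] >= min_size:
            selected.append(region)
    return sorted(selected)


# Hypothetical scene: 0 is the main background, 2 lies at the center of a
# pile bordered by 1, and 3 is a tiny isolated region.
gray = {0: 56000.0, 1: 52000.0, 2: 48000.0, 3: 53000.0}
size = {0: 500000, 1: 3000, 2: 1500, 3: 20}
neighbors = {0: {1, 3}, 1: {0, 2}, 2: {1}, 3: {0}}
labels = {0: 0, 1: None, 2: None, 3: None}
references = select_reference_regions(gray, size, neighbors, labels, min_size=100)
print(references)
```

In this toy scene only region 1 is retained: region 2 is darker than its neighbor and does not touch a labeled region, and region 3 is rejected by the size filter.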

4.2.2. Local Approach for the Contrast Measurement

In (1), the contrast is measured relative to $G_{B_m}$, the average gray level of a portion of the background taken near region $R_m$. This local approach limits the bias that would be induced by averaging the whole background region. Globally, the images are often corrupted by spurious intensity variations. These imperfections have various causes, discussed in Section 3.2 (image formation process, negative staining artifacts, carbon film support). This problem must not be ignored since it leads to false classifications, such as labeling membranes with very low contrast as background instead of nonstacked membranes.

The correction of the fluctuations of the background intensity, known as shading, is very difficult. Several methods have been considered in the literature. Tomaževič et al. [14], for example, describe the problem for optical devices and propose a comparative evaluation of retrospective shading correction methods. Most of the methods rely on a simplified model of the shading, considering it as the sum of two components: one additive and one multiplicative. These methods can easily be transposed to electron microscopy. One of them, the surface fitting technique, seemed the most appropriate and has also been applied to TEM images [15]. This method has been tested on our images, but it did not give good results. Surface fitting uses a few points of the background to model gray level variations with a least-squares approximation. Fitting methods deal with additive or multiplicative issues. However, the model, most of the time a second-order polynomial, is not satisfactory. Furthermore, the proportion of background and its distribution in the image can be limiting factors.

Since none of the above methods is satisfactory, a local approach has been developed. Indeed, undesirable fluctuations are generally spread all over the image, whereas at a local scale they are less significant. $G_{B_m}$ is defined as the mean gray level of a 50×50 pixel area (for a 1024×1024 image sampled at 15 nm per pixel) neighboring the considered region $R_m$. Similarly, if the region is large enough, the average gray level of this region is also measured on a centered 50×50 pixel window. With this method, the measure is less influenced by the gray level fluctuations due to possible stain concentration near the contours, or due to a bias in the contour positioning. Figure 4(c) shows regions $R$ with their associated background regions. The 50×50 pixel area is a compromise: it allows a good evaluation of the region gray level, it is sound considering the large sizes of the membrane regions, and it keeps the background area as close as possible to the associated region.
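The benefit of the local measurement can be illustrated on a tiny synthetic image. The Python sketch below is not the implementation used in this work: the helper `window_mean`, the window centers and the image values are hypothetical, the window is simply clipped at the image border, and it is not masked to the region outline. The image has a background that gets brighter from left to right (shading) and a membrane, 30 gray levels darker than its surroundings, in the two rightmost columns.

```python
from typing import List, Sequence


def window_mean(image: Sequence[Sequence[float]], row: int, col: int, side: int = 50) -> float:
    """Mean gray level of a side x side window placed around (row, col).

    The window is clipped to the image borders (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    half = side // 2
    r0, r1 = max(0, row - half), min(height, row - half + side)
    c0, c1 = max(0, col - half), min(width, col - half + side)
    values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def shaded_background(col: int) -> int:
    """Synthetic shading: the empty film gets brighter toward the right."""
    return 100 + 10 * col


# 4 x 8 image; the membrane occupies columns 6 and 7.
image: List[List[int]] = [
    [shaded_background(c) - (30 if c >= 6 else 0) for c in range(8)]
    for _ in range(4)
]

global_background = sum(image[0][:6]) / 6             # mean over all background columns
region_mean = window_mean(image, 1, 7, side=2)        # window inside the membrane
local_background = window_mean(image, 1, 5, side=2)   # window just beside it

print(global_background, region_mean, local_background)
```

Compared with the global background mean, the membrane appears brighter than the background and would be misread, whereas the neighboring window yields a background brighter than the membrane, as expected.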

4.3. Step 2: Threshold-Based Selection

After the computation of 𝑄𝐿, a contrast range is defined; this range corresponds to the average contrasts representing stacking level 𝐿. The contrast 𝐶𝑛 of each nonlabeled region 𝑅𝑛 is compared to this range to identify whether it can belong to stacking level 𝐿 or not. The remaining regions are therefore more stacked, and will be used during the next iteration for the computation of 𝑄𝐿+1.

The limits of each class, $Q_L^-$ and $Q_L^+$, are defined as follows. For $L>1$:

$$Q_L^+ = 1.5\,Q_L, \qquad Q_L^- = Q_{L-1}^+. \tag{3}$$

For $L=1$ we have

$$Q_1^+ = 1.5\,Q_1, \qquad Q_1^- = 0.2\,Q_1. \tag{4}$$

The 0.2 and 1.5 factors have been experimentally adjusted to obtain a convenient compromise. For 0.2, for instance, based on our representative test images, we have observed that if this factor is too small, more background regions are classified as nonstacked, and if it is too large, membrane regions are labeled as background.
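The class limits can be sketched in Python as follows, reading (3) and (4) as: the upper limit of level L is 1.5 times its characteristic contrast, the lower limit is the upper limit of the previous level, and the lower limit of level 1 is 0.2 times its characteristic contrast. This is a simplification for illustration: the characteristic contrasts are passed in as a hypothetical list, whereas the method computes them one iteration at a time, and the treatment of values falling exactly on a limit is an assumption.

```python
from typing import List, Tuple


def class_limits(
    q_values: List[float], low_factor: float = 0.2, high_factor: float = 1.5
) -> List[Tuple[float, float]]:
    """Return (lower, upper) contrast limits for levels 1..len(q_values)."""
    limits: List[Tuple[float, float]] = []
    for index, q in enumerate(q_values):
        upper = high_factor * q
        # Level 1 starts at 0.2 * Q_1; higher levels start where the
        # previous level ends.
        lower = low_factor * q if index == 0 else limits[-1][1]
        limits.append((lower, upper))
    return limits


def label_region(region_contrast: float, limits: List[Tuple[float, float]]) -> int:
    """0 = background, L = stacking level, len(limits) + 1 = anything above."""
    if region_contrast < limits[0][0]:
        return 0
    for level, (_, upper) in enumerate(limits, start=1):
        if region_contrast < upper:  # assumed half-open range [lower, upper)
            return level
    return len(limits) + 1


# Hypothetical characteristic contrasts for levels 1 to 3.
limits = class_limits([1000.0, 2200.0, 3600.0])
labels = [label_region(c, limits) for c in (100.0, 900.0, 1500.0, 4000.0, 6000.0)]
print(limits, labels)
```

With three computed levels, any region whose contrast exceeds the last upper limit falls into the fourth, overstacked class.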

4.4. Concluding Remarks

Figure 5 shows the color code used to present the labeling results. The various thresholds and average contrast values are also represented. Five classes are distinguished: “level 1” to “level 4”, respectively, in red, green, cyan and magenta, and “background” in yellow. “level 1” corresponds to the class of nonstacked membranes; “level 2” corresponds to the class of bistacked membranes; “level 4” corresponds to everything that is above “level 3” and corresponds to overstacked regions. If $C_n < Q_1^-$, then $R_n$ is considered as a “background” region. It is therefore possible to refine the background detection, especially when it is composed of several parts, as illustrated by Figure 6.

The superiority of the local approach is illustrated by comparing Figures 7(a) and 7(b). In Figure 7(a), labeled without using the local approach, some regions have been labeled as background (in yellow) although they normally belong to stacking level 1, the nonstacked membranes. Let us take region A in Figure 7(a) as an illustration: region A (mean gray level: $5.42 \times 10^4$) appears brighter than the main background 𝐵 in black (mean gray level: $5.34 \times 10^4$). Using the local approach, in Figure 7(b), this same region is correctly labeled: the local background is brighter than the region (local mean gray level: $5.66 \times 10^4$).

5. Experimental Validation of the Algorithm

In this section a number of experiments are introduced and implemented to assess the validity of the proposed method. Biological images are complex and their evaluation is not easy. The performance of the algorithm is therefore presented both qualitatively and quantitatively.

In the experiments presented, the stacking level evaluation algorithm has been applied to images of 2D-crystals acquired with a 1024×1024 CCD camera on the Tecnai T12 (FEI) transmission electron microscope at the Biozentrum, Basel. Images have been acquired under various conditions, using samples from various 2D-crystallization experiments.

The images have been partitioned using a split-and-merge method specifically developed for such images [16, 17]. Based on a multiresolution edge detection for the splitting step and a statistical transition validation for the merging step, the output is an image where the foreground is identified and split into relevant membrane regions; that is, adjacent membranes as well as membranes of different stacks are well identified and separated. The background detection algorithm is presented in [18]. It is based on the hypothesis that the largest and brightest region corresponds to the main background region.

Three experiments are presented to assess the validity of the proposed algorithm. To start with, the results are reviewed visually by specialists using a qualitative evaluation and a comparative study. Then, a more objective study is introduced, based on the analysis of the diffraction patterns of the nonstacked and bistacked regions. Finally, the robustness of the method is tested on another series of experiments and concisely discussed.

5.1. Performance Evaluation by Specialists

The performance has been evaluated at two levels: the capacity to select interesting targets for high magnification assessment, and the stacking level selection itself.

To qualitatively assess the stacking levels, regions of segmented membrane images were manually labeled as “Bkg” for background, “Level 1” for nonstacked membranes, “Level 2” for bistacked membranes, “Level 3” and “Level 4” for overstacked membranes, according to the color code presented in Figure 5.

Considering the complexity of the images, the qualitative comparison shows a good correspondence between the manual and automatic classifications (see Figure 8): 68% of the foreground regions have been classified identically, and 28% of the remaining regions were classified in the next upper or lower class (meaning a small error). We underline that the 68% of correctly classified regions represent 97% of the pixels: the differences in classification therefore mainly occur on the small regions, which are of minor interest to the biologist. One can therefore conclude that the automatic classification is very similar to the manual expert one, especially for the large regions, which are the regions of interest to the biologist: large nonstacked membranes.
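The difference between the proportion of regions and the proportion of pixels that are classified identically can be made concrete with a short Python sketch. The function name, the dictionaries and all the values below are hypothetical and only illustrate how the two rates are obtained; they do not reproduce the data of Figure 8.

```python
from typing import Dict, Tuple


def classification_agreement(
    manual: Dict[int, int], automatic: Dict[int, int], sizes: Dict[int, int]
) -> Tuple[float, float, int]:
    """Compare two labelings of the same regions.

    Returns the fraction of regions with identical labels, the fraction of
    pixels they cover, and the number of regions that differ by one level.
    """
    regions = list(manual)
    same = [r for r in regions if manual[r] == automatic[r]]
    off_by_one = [r for r in regions if abs(manual[r] - automatic[r]) == 1]
    region_rate = len(same) / len(regions)
    pixel_rate = sum(sizes[r] for r in same) / sum(sizes[r] for r in regions)
    return region_rate, pixel_rate, len(off_by_one)


# Hypothetical labels (0 = background, 1..4 = stacking levels) and sizes.
manual = {1: 1, 2: 1, 3: 2, 4: 4, 5: 1}
automatic = {1: 1, 2: 1, 3: 2, 4: 3, 5: 0}
sizes = {1: 5000, 2: 3000, 3: 1500, 4: 300, 5: 200}

region_rate, pixel_rate, off_by_one = classification_agreement(manual, automatic, sizes)
print(region_rate, pixel_rate, off_by_one)
```

In this toy case only three regions out of five agree, yet they cover 95% of the pixels, because the two disagreements concern small regions.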

The analysis of the results on representative 2D-crystalline membrane images confirmed that large nonstacked regions, as identified by the proposed method, constitute interesting targets. When comparing a visual selection of regions of interest with the stacking level selection, the following can be concluded. (i) The nonstacked regions constitute mono-layer membranes. It has been assessed that around 90% of the manual selections would lie in the large nonstacked regions (this number varies from experiment to experiment from 70 to 100%). (ii) The intermediate bistacked regions constitute regions that appear twice as stacked as 1-stacked regions, or 1-stacked regions that are overstained. Those intermediate regions are often difficult to classify and may constitute interesting regions in less than 10% of the cases. (iii) The overstacked regions appear to be regions where the membranes are heavily superposed, and hardly any manually selected targets appear in those regions (less than 1% of the selections).

5.2. Objective Validation of the Method

A methodology based on the diffraction patterns of the membrane crystals has been devised to achieve an objective evaluation of the stacking level approach. The approach is based on the consideration that mono-layer crystals display a single diffraction pattern at high magnification, while bilayer crystals display a double diffraction pattern, one for each layer, under certain conditions. One condition is that the crystal layers are not oriented in the same direction, and the second is that the negative stain, which preserves the samples and allows the visualization of the diffraction pattern by enhancing the contrast, is deposited on both the upper and lower layers of the membranes.

By carefully selecting a good number of such samples, the nonstacked and bistacked levels have been clearly validated, as shown in the example presented in Figure 9. Proteins used for this experiment lead to a hexagonal diffraction pattern visible once (first order, i.e. the first ring of the diffraction pattern) on mono-layer crystals (correctly labeled as nonstacked, Figure 9, patterns A, C, and D), and visible as two shifted hexagonal patterns, the first in red and the second in green, on bilayer crystals (correctly labeled as bistacked, Figure 9, patterns B and E).

5.3. Robustness Overview

The robustness of the stacking level algorithm has been evaluated in two ways. First, the influence of the noise was tested by comparing its results on several images of the same object acquired under similar conditions. Second, the influence of the acquisition settings was tested by comparing its results on several images of the same object acquired under different conditions (exposure time).

The statistical distribution of the noise seems to have a negligible impact on the results (Figure 10): 95% of the regions, representing more than 99% of the pixels, are similarly labeled for the different acquisitions.

Similarly, the exposure times tested on our images (from 0.1 to 1 s) appeared to have a low impact on the results of the labeling (Figure 11): 63% of the regions representing more than 97% of the pixels are similarly labeled for the different acquisitions. As expected, these results show again that the smaller regions are more affected. The labeling of the larger regions, where the gray level statistics tend to vary less from acquisition to acquisition, is robust.

6. Conclusion

In this paper, we propose an algorithm to characterize the stacking level of 2D crystalline membranes acquired in TEM. The goal of this automatic method is twofold: first, assist the microscopist in his analysis and interpretation of thousands of images; second, contribute to the automation of a TEM, one-stacked membranes being interesting biological targets.

This algorithm is applied to images where the foreground has been partially identified and partitioned into coherent regions (adjacent objects are separated). The two-step iterative method has two main advantages: first, stacking level thresholds are computed, and therefore adapted, for each image, and the method does not require any manual input to determine the number of stacking levels; second, the proposed local approach handles the different sources of heterogeneities of the background and therefore allows the algorithm to successfully label most of the regions.

The qualitative and quantitative tests performed show the robustness of the method, and its ability to assess the stacking levels of the regions. It has been applied to thousands of images acquired under different conditions. Furthermore, owing to this good performance, the algorithm has been integrated into an online process that performs the detection of targets as part of TEM control.

Finally, it should be emphasized that the proposed approach is not limited to the analysis of 2D crystalline membranes, but could be applied to any thin biological samples acquired in TEM.

Acknowledgments

This work was supported by the EU 6th framework (HT3DEM, LSHG-CT-2005-018811), in collaboration with the Biozentrum of Basel and FEI company who provided the TEM images.