#### Abstract

3D measurement and reconstruction of rigid microstructures rely on the precise matching of structural feature points (SFPs) between microscopic images. However, most existing algorithms fail to extract and track such points at the microscale because of the poor quality of microscopic images. This paper presents a novel framework for extracting and matching SFPs in microscopic images under the two stereo microscopic imaging modes of our system, namely, the fixed-positioning stereo mode and the continuous-rotational stereo mode. A 4-DOF (degree of freedom) micro visual measurement system is developed for 3D projective structural measurement of rigid microstructures using the SFPs obtained from microscopic images by the proposed framework. Under the fixed-positioning stereo mode, a similarity-pictorial structure algorithm is designed to preserve illumination invariance in SFP matching, while a method based on a particle filter with affine transformation is developed for accurate tracking of multiple SFPs in image sequences under the continuous-rotational stereo mode. The experimental results demonstrate that the problems of visual distortion, illumination variability, and irregular motion estimation in the micro visual measurement process can be effectively resolved by the proposed framework.

#### 1. Introduction

Micro stereovision technology makes it possible to extract 2D/3D information and to perform 3D reconstruction. It has been used extensively in micromanipulation, microassembly, microrobot navigation, bioengineering, and related fields. 3D reconstruction of small objects in microscopic images is challenging because of their small size. Existing methods such as structured-light-based 3D reconstruction [1, 2] and binocular stereo methods [3] are not directly suitable for micro applications and need to be modified. For rigid 3D microstructures, the accurate matching of structural feature points between microscopic images is one of the most important steps in micro stereovision computation. In this paper, we define structural feature points (SFPs) as the key points that can fully represent a rigid microstructure, namely, the intersection points of three planes on the micro 3D structure (Figure 1(a)). Although vision methods based on tracking feature points play an important role in normal-scale computer vision applications such as 3D reconstruction, segmentation, and recognition, little research has focused on microscale applications other than medical ones. This is due to the poor contrast and low quality of microscopic images, which result mainly from the imaging complexity of the optical microscope. The complex structure of a microscopic lens system [4] requires a variety of optical elements, which can introduce a wide array of image distortions.

**Figure 1:** (a) SFPs; (b) Harris; (c) Harris-affine; (d) SIFT.

In this paper, we present a framework for extracting and matching SFPs in microscopic images under the *fixed-positioning and continuous-rotational* stereo imaging modes, respectively. We believe that the proposed framework can play an important role in 3D reconstruction based on microscopic images.

*First*, microscopic images suffer from specific drawbacks such as blurred edges, geometrical distortions, serious dispersion, and noise (especially from illumination changes), all of which make SFP detection and matching more difficult. As a result, many popular key point detection and matching methods [5–9] fail to reach in microscopic images (Figures 1(b)–1(d)) the accuracy they achieve on normal-scale images. New methods suited to SFP extraction and matching in microscopic images are therefore required. We develop a novel illumination-invariant method based on a similarity-pictorial structure algorithm (similarity-PS) to solve the SFP extraction and matching problem in the *fixed-positioning stereo mode*.

*Second*, existing feature point tracking algorithms for continuous image sequences are usually classified as methods based on templates, motion parameters, or color patches [10], and none of them meets the needs of SFP tracking in microscale images. Representative work includes the following. Continuously adaptive mean shift (Camshift) [11] uses a color histogram as the object model for tracking in color-rich images, but it performs poorly when tracking feature points against a complicated background that contains areas of similar color. Yao and Chellappa designed a probabilistic data association filter with an extended Kalman filter (EKF) [12] to estimate rotational motion and continuously track feature points across frames, which effectively resolved the occlusion problem. However, the real location must be predicted by probability analysis while the object moves arbitrarily.

Buchanan and Fitzgibbon proposed combining local and global motion models with a Kanade-Lucas-Tomasi (KLT) tracker to accurately track multiple feature points of nonrigid moving objects [13]. However, this strategy fails if motion predictions cannot be made for the subsequence consisting of the initial frames. Kwon et al. presented a template tracking approach that applies the particle filtering (PF) algorithm on the affine group [14, 15]; it can accurately track a large single object, but it performs well only for single-template tracking. We extend their method to multiple-point tracking in the proposed monocular microvision system under the *continuous-rotational stereo mode*. Because the images of the measured rigid microstructure keep a fixed spatial relationship between global structures and retain the affine invariance of local features during rotation, the tracking problem is greatly simplified by using an affine transformation and a covariance descriptor in our framework.

#### 2. Proposed Framework

##### 2.1. Imaging Modes of Proposed Micro Stereoscopic Measurement System

We developed a 4-DOF micro stereoscopic measurement system for measuring 3D micro objects. The system consists of a 4-DOF stage, a fixed-mounted illumination source, and an SLM-based optical imaging system. A monocular tube microscope is used in the system to reduce the imaging complexity. In our system, the stereoscopic imaging relationship can be realized in two modes.

*(i) Fixed-Positioning Stereo Mode (Mode 1)*

A rotation by a fixed angle is performed, followed by a tilting motion. Images are captured at the end of each movement.

*Problem*: The changes in illumination direction cause large contrast and intensity changes between the microscopic images captured at different positions.

*(ii) Continuous-Rotational Stereo Mode (Mode 2)*

A tilting movement is performed first, followed by a continuous rotational movement. Image sequences are captured during the rotation.

*Problem*: The motion blur caused by the continuous rotational motion degrades the quality of the microscopic image sequences.

##### 2.2. Similarity-PS Method for SFPs Matching of Mode 1

###### 2.2.1. Pictorial Structure Method

A PS model represents an object as a collection of parts with spatial relationships between certain pairs of parts. It can be expressed by a graph $G = (V, E)$, where the vertices $V = \{v_1, \ldots, v_n\}$ correspond to the parts and there is an edge $(v_i, v_j) \in E$ for each pair of connected parts $v_i$ and $v_j$. An object is expressed by a configuration $L = (l_1, \ldots, l_n)$, where $l_i$ represents the location of part $v_i$. For each part $v_i$, the appearance match cost function $m_i(I, l_i)$ represents how well the part matches the image $I$ when placed at location $l_i$. A simple pixel template matching is used for this cost function in [16]. The connections between the locations of parts define the structure match cost: the cost function $d_{ij}(l_i, l_j)$ represents how well the locations $l_i$ of $v_i$ and $l_j$ of $v_j$ agree with the object model. Therefore, the cost function for PS consists of two parts, the appearance cost and the structure cost. The best match of SFPs is the configuration that minimizes the total cost:

$$L^{*} = \arg\min_{L}\Bigl(\sum_{i=1}^{n} m_i(I, l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)\Bigr).$$
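As a minimal sketch of this minimization, the following brute-force search assumes that each part has only a small set of candidate locations and that the structure cost is the squared deviation from a preferred offset between connected parts. Both choices, and the names `structure_cost` and `best_configuration`, are illustrative assumptions rather than the cost functions used in [16].

```python
from itertools import product


def structure_cost(loc_i, loc_j, mean_offset):
    """Squared deviation of the offset (loc_j - loc_i) from the model's mean offset."""
    dx = (loc_j[0] - loc_i[0]) - mean_offset[0]
    dy = (loc_j[1] - loc_i[1]) - mean_offset[1]
    return dx * dx + dy * dy


def best_configuration(candidates, appearance_cost, edges):
    """Exhaustively minimize the sum of appearance and structure costs.

    candidates:      list (one entry per part) of candidate (x, y) locations
    appearance_cost: list (one entry per part) of dicts {location: cost}
    edges:           dict {(i, j): mean_offset} for each connected pair of parts
    """
    best_cost, best_locs = float("inf"), None
    for locs in product(*candidates):
        cost = sum(appearance_cost[i][loc] for i, loc in enumerate(locs))
        cost += sum(structure_cost(locs[i], locs[j], offset)
                    for (i, j), offset in edges.items())
        if cost < best_cost:
            best_cost, best_locs = cost, locs
    return best_locs, best_cost


# Toy model (hypothetical numbers): two parts that should lie 10 pixels apart in x.
candidates = [[(0, 0), (5, 0)], [(10, 0), (30, 0)]]
appearance_cost = [{(0, 0): 0.2, (5, 0): 0.1}, {(10, 0): 0.3, (30, 0): 0.0}]
edges = {(0, 1): (10, 0)}
locs, cost = best_configuration(candidates, appearance_cost, edges)
```

In the toy model the configuration with the best appearance scores is rejected because it violates the expected spacing. Exhaustive search grows exponentially with the number of parts, so it is only practical for a handful of SFPs with short candidate lists.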

###### 2.2.2. The Local Self-Similarity Descriptor

Self-similarity descriptor is proposed by Buchanan and Fitzgibbon [13].

Figure 2 illustrates the procedure for generating the self-similarity descriptor centered at a pixel $q$ of the input image.

The green square region is a small image patch centered at $q$. The larger blue square region is a big image region that is also centered at $q$. First, the small image patch is compared with the larger image region using the sum of squared differences (SSD), which yields a distance surface $\mathrm{SSD}_q(x, y)$. Color images are first transformed into the CIE color space. Second, the distance surface is normalized to eliminate illumination influences. Finally, the normalized distance surface is transformed into a “correlation surface” $S_q(x, y)$:

$$S_q(x, y) = \exp\left(-\frac{\mathrm{SSD}_q(x, y)}{\max\bigl(\mathrm{var}_{\mathrm{noise}},\ \mathrm{var}_{\mathrm{auto}}(q)\bigr)}\right),$$

where $\mathrm{SSD}_q(x, y)$ is the normalized distance surface and $\mathrm{var}_{\mathrm{noise}}$ is a constant that corresponds to acceptable photometric variations (in color, illumination, or due to noise), which is 150 in the paper. $\mathrm{var}_{\mathrm{auto}}(q)$ is the maximal variance of the difference of all patches within a very small neighborhood of $q$ (of radius 1) relative to the patch centered at $q$.

The correlation surface is then transformed into log-polar coordinates centered at $q$ and partitioned into bins by angle and radius. We take the maximal value in every bin, which helps the descriptor adapt to nonrigid deformation, and collect these maximal values into a vector that serves as the self-similarity descriptor centered at $q$. Finally, this descriptor vector is normalized to the range $[0, 1]$ by linearly stretching its values.
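The procedure above can be sketched in plain Python for a grayscale image stored as a list of rows. The patch size, region radius, and bin counts used as defaults are assumptions chosen to keep the example small, the surrounding region is taken to be circular for simplicity, and the separate normalization of the SSD surface is omitted; only the value 150 for the noise variance comes from the text.

```python
import math


def patch_ssd(img, ay, ax, by, bx, half):
    """Sum of squared differences between two patches of width 2*half + 1."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = img[ay + dy][ax + dx] - img[by + dy][bx + dx]
            total += diff * diff
    return total


def self_similarity_descriptor(img, qy, qx, half=2, radius=10,
                               n_angles=8, n_radii=3, var_noise=150.0):
    """Log-polar self-similarity descriptor at pixel (qy, qx).

    The caller must keep (qy, qx) at least radius + half pixels from the border.
    """
    # var_auto: largest SSD between the central patch and its radius-1 neighbours.
    var_auto = max(patch_ssd(img, qy + dy, qx + dx, qy, qx, half)
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0))
    denom = max(var_noise, var_auto)
    bins = [0.0] * (n_angles * n_radii)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r = math.hypot(dx, dy)
            if r == 0.0 or r > radius:
                continue  # skip the centre and offsets outside the region
            s = math.exp(-patch_ssd(img, qy + dy, qx + dx, qy, qx, half) / denom)
            angle = (math.atan2(dy, dx) + math.pi) / (2.0 * math.pi)
            a_bin = int(angle * n_angles) % n_angles
            r_bin = min(int(math.log(r) / math.log(radius) * n_radii), n_radii - 1)
            idx = r_bin * n_angles + a_bin
            bins[idx] = max(bins[idx], s)  # keep the maximum in each bin
    lo, hi = min(bins), max(bins)
    if hi == lo:
        return [0.0] * len(bins)
    return [(b - lo) / (hi - lo) for b in bins]  # linear stretch to [0, 1]
```

Because the descriptor is built only from differences between patches of the same image, adding a constant brightness offset to every pixel leaves it unchanged, which is the kind of illumination robustness the method relies on.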

###### 2.2.3. Similarity-PS Algorithm

We introduce the local “self-similarity” descriptor-matching approach for SFP detection into a simplified PS model. This subsection covers three main steps, as follows.

*Extraction of the “Template” Description T*

Manually marked SFP coordinates on the microstructure are used for the training process. We describe the marked points by self-similarity descriptors. For every point, the descriptors of all training examples are averaged to obtain a trained descriptor, which acts as the appearance description for the model.

*Dense Computation of the Local Self-Similarity Descriptor*

These descriptors are computed throughout the test image *I* on a grid with a spacing of 2 pixels in this paper. Higher precision can be obtained if the search covers every pixel.

*Detection of Similar Descriptors of T within I*

The region centered at an SFP is chosen as the region of interest in *I*: it is the region whose self-similarity descriptor has the smallest weighted Euclidean distance to the trained descriptor. The coordinates of all interest points in *I* are recorded as the locations of the candidate key points. Many candidate key points are usually found for one marked point. In our experiment, 200 points are chosen (the number varies for different templates), and the Euclidean distances between the descriptors of the candidate key points and the trained descriptors are recorded and linearly normalized. These normalized distances determine the appearance cost function. We substitute the appearance model in (2.1) with the obtained self-similarity model; the best-matching SFPs for the PS model are then obtained by minimizing the resulting PS cost.
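The candidate selection in the third step can be sketched as follows. The sketch assumes that descriptors have already been computed on a sparse grid and stored in a dictionary keyed by pixel location, and that the recorded distances are rescaled to $[0, 1]$; the target range, the function names, and the toy values are assumptions for illustration.

```python
import heapq
import math


def weighted_distance(d1, d2, weights):
    """Weighted Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(d1, d2, weights)))


def candidate_key_points(descriptors, template, weights, k):
    """Return the k locations closest to the trained descriptor.

    descriptors: dict {(x, y): descriptor vector} computed on a sparse grid
    Returns a list of (location, appearance_cost), with costs rescaled to [0, 1].
    """
    scored = [(weighted_distance(d, template, weights), loc)
              for loc, d in descriptors.items()]
    best = heapq.nsmallest(k, scored)
    lo, hi = best[0][0], best[-1][0]
    span = (hi - lo) or 1.0  # avoid division by zero when all distances agree
    return [(loc, (dist - lo) / span) for dist, loc in best]


# Toy descriptors (hypothetical values) on a grid with a spacing of 2 pixels.
descriptors = {(0, 0): [0.0, 0.0], (2, 0): [1.0, 0.0],
               (4, 0): [0.0, 3.0], (6, 0): [5.0, 5.0]}
candidates = candidate_key_points(descriptors, [0.0, 0.0], [1.0, 1.0], k=3)
```

The returned costs can be used directly as the appearance term when searching for the best PS configuration.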

Since the microscopic images are not always of the same size, we calculate the self-similarity descriptors at multiple scales on both the patterns and the test images to achieve scale invariance. Moreover, scale-invariant structures are obtained by multiplying the mean and variance values of the structural distances between SFPs by a scale factor.

##### 2.3. PF-Based Multiple SFPs Tracking of Mode 2

###### 2.3.1. The Affine Motion of Tracking Points

A single tracked point (indexed by point and by frame, with $N$ and $T$ denoting the numbers of tracked points and frames, respectively) cannot be tracked accurately in microscopic sequences on its own; therefore, it is necessary to take advantage of templates. The multiple points are initialized with a set of given center coordinates, and a set of small point templates is defined as a group of windows that surround each tracked point and describe its main characteristics. The coordinates of each template are obtained by multiplying its homogeneous coordinates by an affine transformation matrix, so the most important step in the tracking process is to calculate the state of every point template in each frame. An affine motion can be divided into a translation and a rotation, and all the points of a rigid object rotate with the same angular velocity. Denoting the translation vector by $b$, the affine motion acting on a point $p$ can be written as

$$f(p) = Gp + b \quad\text{and}\quad \begin{bmatrix} f(p) \\ 1 \end{bmatrix} = \begin{bmatrix} G & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p \\ 1 \end{bmatrix},$$

where $f$ is the function representing the affine motion of the points, $G$ is an invertible rotation matrix, and $b \in \mathbb{R}^{2}$ is a translation vector. The second form of (2.1), to which the exponential map applies, is used for the convenience of denoting the affine transformation of the imaging process.
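As a small illustration of moving a point template with an affine transformation in homogeneous coordinates, the sketch below builds a 3×3 matrix from a rotation angle and a translation and applies it to the corners of a square window. The helper names and the numerical values are hypothetical.

```python
import math


def affine_matrix(angle, tx, ty):
    """Homogeneous 3x3 matrix combining a rotation by `angle` and a translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def apply_affine(X, points):
    """Map each (x, y) through X using homogeneous coordinates (x, y, 1)."""
    return [(X[0][0] * x + X[0][1] * y + X[0][2],
             X[1][0] * x + X[1][1] * y + X[1][2]) for x, y in points]


def template_window(center, half):
    """Corner coordinates of a square window around a tracked point."""
    cx, cy = center
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]


# Rotate a window around the origin by a quarter turn, then shift it by 10 in x.
X = affine_matrix(math.pi / 2, 10.0, 0.0)
corners = apply_affine(X, template_window((0.0, 0.0), 2.0))
```

Because every template of a rigid object shares the same rotation, the same rotation block can be reused for all tracked points, with only the translation differing from template to template.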

###### 2.3.2. Tracking by Particle Filter with Affine Motion

*State estimate and sample for tracked points in each frame*: the efficiency of the particle filter tracking algorithm relies mainly on the importance sampling of random particles and on the calculation of their weights. The measurement likelihood is independent of the current state particles. The state and measurement equations are expressed in the discrete setting, where $v_t$ is zero-mean Gaussian measurement noise, the state follows an AR process with a single parameter, $\tau_t$ is zero-mean Gaussian noise, and the time step corresponds to a capture rate of 12 frames per second. The basis elements of the affine transformation describe the translation, shearing, or scaling that the templates have undergone.

It is necessary to approximate the particles and resample them according to their weights at every time step. To simplify the computation and avoid calculating the sample mean directly, all the resampled particles are assumed to be quite similar to one another. Under this assumption, the sample mean of the resampled particles can be approximated in terms of an arithmetic mean.

*Calculate covariance with descriptor for the tracked point templates*: a spatial structure constraint vector $(r, \theta)$ is added in our method to minimize the influence of illumination, distortion, motion blur, and noise. Here $r$ and $\theta$ represent the distance and the angle between the origin (or polar axis) and each tracked pixel in the polar coordinate system, $I$ denotes the image pixel intensity, and $I_x$, $I_y$, $I_{xx}$, $I_{yy}$ represent the first- and second-order image derivatives in the Cartesian coordinate system [14, 16]. With $N$ the size of the template window, $h_k$ the feature vector of the $k$-th pixel, and $\bar{h}$ the mean value of the $h_k$, the covariance descriptor of the point template patches can be given as

$$C = \frac{1}{N - 1}\sum_{k=1}^{N}\bigl(h_k - \bar{h}\bigr)\bigl(h_k - \bar{h}\bigr)^{\mathsf{T}}.$$

*Measure relative distance*: to ensure that the covariance descriptors change smoothly from frame to frame, it is necessary to collect image covariance descriptors and to calculate the principal eigenvectors and the geodesic distance between two group elements. The measurement function is defined using the distance-from-feature space and the distance-in-feature space for similarity comparison purposes [18], which makes the measurement equation in [19] more explicit. In this equation, $c_n$ is the projection coefficient and $\rho_n$ is the eigenvalue corresponding to the $n$-th principal eigenvector; these two quantities are used to calculate the distance-in-feature space, and a further term represents the mean intensity of the point.

The measurement likelihood is Gaussian, with a covariance equal to that of the zero-mean Gaussian noise $v_t$. Once the measurement likelihood has been obtained, we can calculate and normalize the importance weights for the tracked points and thereby track multiple SFPs in long microscopic sequences.
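The three ingredients of this subsection (a covariance descriptor, Gaussian likelihood weights, and weight-based resampling) can be sketched generically as below. This is a simplified illustration under stated assumptions, not the tracker itself: the measurement is reduced to a scalar residual per particle with a single variance, systematic resampling is one common choice that the text does not specify, and all function names are hypothetical.

```python
import math
import random


def covariance_descriptor(features):
    """Sample covariance matrix of a list of per-pixel feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    return [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in features) / (n - 1)
             for b in range(d)] for a in range(d)]


def normalized_weights(residuals, variance):
    """Gaussian likelihood weights exp(-r^2 / (2 * variance)), scaled to sum to 1."""
    raw = [math.exp(-0.5 * r * r / variance) for r in residuals]
    total = sum(raw)
    return [w / total for w in raw]


def systematic_resample(particles, weights, rng=random):
    """Draw len(particles) particles with probability proportional to weight."""
    n = len(particles)
    start = rng.random() / n  # one random offset shared by all draws
    out, cumulative, j = [], weights[0], 0
    for i in range(n):
        u = start + i / n
        while u >= cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        out.append(particles[j])
    return out


# Toy usage (hypothetical residuals): particles with smaller residuals get more weight.
weights = normalized_weights([0.0, 1.0, 2.0], variance=1.0)
resampled = systematic_resample(["a", "b", "c"], weights, random.Random(0))
```

In a full tracker the residual for each particle would come from comparing the covariance descriptor of the warped template with the model, and each particle would be an affine state rather than a label.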

#### 3. Experimental Results and Discussion

The SFP detection and matching experiments are carried out on a standard microchip with 3D microstructures. Some results of both the PS algorithm and the proposed method are shown in Figure 3, and a comparison of the accumulated error, measured as the Euclidean distance in pixels between the tracked points and the ground-truth positions, is shown in Figure 4. The results indicate that our method improves the correct detection rate of SFPs for images with large illumination changes in Mode 1. The reason for this improvement is that the change in gray value is caused largely by the varying illumination, while the local structure remains static. It is therefore a good choice to introduce local self-similarity descriptors into PS for our application.

We also provide SFP tracking results in Mode 2 obtained by our proposed method and by KLT for comparison, as shown in Figure 5. The proposed method shows higher localization accuracy of the SFPs over long microscopic image sequences.

To further demonstrate the effectiveness of our method, we present some 3D projective reconstruction results based on the tracking results in Figure 6. The proposed method reconstructs the 3D structure from the microscopic sequences, whereas the KLT method fails.


#### 4. Conclusions

This paper developed a framework for SFP extraction and matching in microscopic images for 3D micro stereoscopic measurement under two stereo imaging modes. The proposed framework provides illumination invariance in the fixed-positioning stereo mode and robustness in the continuous-rotational stereo mode. The effectiveness of our tracking framework has been verified empirically in visual 3D projective reconstruction with microscopic images. In Mode 2 of our system, there is an inevitable tracking error caused by motion blur; we therefore plan to use the method described in [20] to deal with this problem in future work. Our future research will also focus on micro visual measurement planning (similar to the method described in [21]) and on optimal 3D micro model representation [22].

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC-60870002, 60802087, 60573123, 60605013), NCET, and the Science and Technology Department of Zhejiang Province (2009C21008, 2010R10006, Y1090592, Y1110688).