Abstract

In the field of noninvasive sensing techniques for civil infrastructure monitoring, this paper addresses the problem of crack detection on the surface of French national roads by automatic analysis of optical images. The first contribution is a state of the art of the image-processing tools applied to civil engineering. The second contribution concerns fine-defect detection in the pavement surface; the approach is based on a multi-scale extraction and a Markovian segmentation. Third, an evaluation and comparison protocol designed for this difficult task, road pavement crack detection, is introduced. Finally, the proposed method is validated, analysed, and compared to a detection approach based on morphological tools.

1. Introduction

The evaluation of road quality is an important task in many countries, such as France, where the national roads are inspected every three years in order to estimate the needed repairs. To estimate this quality, the following aspects can be taken into account: the adherence, the microtexture, the macrotexture, and the surface degradations. Before 1980, all these inspections were carried out manually. Since 1980, this task can be automated with noninvasive techniques that are more comfortable, less dangerous for the employees and the users of the road, and also more efficient and less expensive than manual methods. Many systems have been proposed, based on ground-penetrating radar [1] or laser systems [2]. However, for the noninvasive evaluation of surface degradations, recent research results seem more promising with optical image-processing approaches, for these reasons [3]:
(1) The acquisition systems based on optical devices are easier to design and to use than other kinds of systems (they are less sensitive to movement and to vibrations).
(2) They also allow a dense acquisition (every millimeter), that is, the acquisition can cover the whole road surface, whereas for other systems, like lasers, the measurements are available only every 4 millimeters at normal speed (90 km/h) (see endnote 1).
(3) The measurement of the defects is more precise than with other systems because, as explained in (2), enough information is available.
(4) Even if the images are not always well contrasted, they are more contrasted than the images/signals given by other devices, that is, the signal-to-noise ratio is greater with optical sensors than with other kinds of sensors.

Nowadays, many acquisition systems are available [3, 4], see Table 1 (interested readers can find details about the evaluation of such systems in [5, 6]). Moreover, to the best of our knowledge, many methods for the semiautomatic detection of road defects can be found in the literature, but only one is commercialized (by INO, see endnote 2). Among all the existing approaches, it is difficult to know which one is the best adapted to the task and which method is actually the most widely used. This is why the first goal of this paper is to present a state of the art of assessment methods in noninvasive control based on image processing.

The detection of cracks is difficult in the context of road surface evaluation because the signal to detect is weakly represented (about 1.5% of the whole image) and weakly contrasted (the road possesses a texture that hides the crack). Recent methods have shown their limits: the detection contains many false positives (induced by the particular texture of the road), and it is not precise enough (the given result is a region of detection and not the skeleton and the width of the crack). The main shortcoming of the existing methods is that the specific geometry of the crack, a thin and linear object, is not taken into account. In consequence, the second aim of this work is to introduce a new method that takes into account some geometric properties of the cracks.

Even though this problem is hard and very important in the field of civil engineering, to the best of our knowledge, there is no protocol for evaluating and comparing the existing methods, and it is difficult to know which kind of method should be chosen for this task. In consequence, with the multiple methods proposed in the literature, it seems important to evaluate and to compare them in order to validate previous work and to identify the approaches that can be employed and/or the methods that need improvements. So, the third aspect discussed in this paper is the introduction of such a protocol.

In consequence, the objectives are as follows: first, to give a state of the art of the existing methods in noninvasive control based on image processing for estimating the quality of the road surface; second, to present our method; and, third, to introduce a protocol of evaluation and comparison that highlights the advantages and drawbacks of each method.

2. Automatic Road Crack Detection

In the literature, many methods have been introduced to detect thin objects in textured images, as in medical imaging, for the detection of blood vessels [27], and in satellite imaging, for road network detection [28]. Since 1990, algorithms have been proposed for the semiautomatic detection of road cracks (interested readers can see [29] for details about road imaging systems and their limits). For the detection of cracks, three components have to be taken into account:
(1) acquisition (see Table 2 for details),
(2) storage, and
(3) image processing.

In this paper, only the last step is studied, but the choices made for the two first steps are important for the success of the image processing. Moreover, most of the references are given in the field of road quality assessment, but some of them come from different applications, like cracks and defects in concrete (for bridges or pipelines), on ceramics, or on metallic surfaces (for industrial applications). For road cracks, most of the time, these hypotheses can be exploited.
(1) Photometric hypotheses
(H1) The crack pixels are darker than the road pixels.
(H2) The gray-level distributions of road crack and road surface are independent.
(2) Geometric hypotheses
(H3) A crack is a thin continuous object.
(H4) A crack is a set of connected segments with different orientations.
(H5) A crack does not have a constant width over its whole length.
(3) Photometric and geometric hypotheses
(H6) The points inside a crack can be considered as points of interest, from a photometric and/or a geometric point of view.

These different hypotheses can be complementary, like (H1) and (H3) or (H3) and (H5), but some of them are opposite, like (H3) and (H4). The hypothesis (H6) combines two kinds of constraint because the definition of a point of interest (POI), that is, a significant point in a scene, can be expressed both with photometric constraints (some hypotheses about the distribution of gray levels near the POI can be made) and geometric constraints (a point of interest can be a corner, an edge, or any kind of geometric structure).

In the field of image processing, both semiautomatic methods and automatic detection approaches are considered, and these five families can be distinguished, see Table 3.
(1) Methods based on histogram analysis (hypotheses (H1) and (H2)). These methods are the oldest and the most popular. They use a thresholding based on a histogram analysis [7, 30, 31], with Gaussian hypotheses [9] and/or an adaptive or local thresholding [32, 33]. These approaches are simple and not time consuming, but they also give many false detections. In fact, these methods assume that the two gray-level distributions (the road pavement distribution and the crack distribution) can be separated based on global statistics (the histogram, see endnote 3). In Figure 1, we can see that, most of the time, this hypothesis is not valid.
(2) Methods based on mathematical morphology tools [15, 33-38] (hypotheses (H1) and (H3)). An initial thresholding is needed, and the results contain fewer false detections than those of methods based on histogram analysis. However, the major drawback of this kind of technique is that the quality of the results is highly dependent on the parameter choices.
(3) Methods based on a learning phase, introduced in order to alleviate the problems of the two first groups [39, 40] (hypotheses (H1) and (H2)). Most of these approaches are based on neural networks [8, 41, 42]. The drawback is the learning step, which prevents a fast and fully automatic analysis.
(4) Methods based on filtering, the most recent ones (hypotheses (H1), (H3), and (H5)). Edge extraction by filtering at a fixed scale is not adapted to the detection of road cracks because the width of the crack is not constant, and this is why many methods are based on wavelet decompositions [17, 25, 43, 44] with adaptive filtering [27, 45, 46] (these approaches will be detailed in Section 3.2), contourlets [47], Gabor filters [48], finite impulse response (FIR) filters [26], and methods using models based on partial differential equations (PDE) [49, 50]. Some techniques also use autocorrelation filtering [51, 52] (a similarity score is estimated between some targets that simulate cracks and all the targets of the original image). Another kind of algorithm is based on texture analysis [53, 54] (the crack is considered as a noise inside a texture).
(5) Methods based on the analysis of a model [55, 56] (hypotheses (H3), (H4), and (H6)). Most of these approaches combine a local analysis with a global analysis in order to take into account both the local and the global properties of a crack, by multiscale analysis of texture combined with a minimal-path algorithm [55] or by local detection of points of interest combined with geodesic contours [56].

In conclusion, we can notice the following.
(i) Many methods have been proposed, but the problem is still not solved. The results contain many false positives, and the detections are incomplete. Moreover, most of the existing techniques give interesting results for only one given class of road pavement, that is, the performance of the method depends on the road texture.
(ii) Methods based on histogram analysis, even local ones, do not express the problem correctly, that is, they do not take into account the geometric characteristics of the cracks and the photometric characteristics of the road pavement.
(iii) Learning methods are efficient, but the learning step is expensive (in time and in investment from users who are not experts in image processing).

For all these reasons, even if learning methods have been used in our previous work, this paper focuses on the presentation of two methods proposed to alleviate the limits of the older ones by obtaining a dense detection with a low rate of false detections.

3. Proposed and Compared Methods

Before introducing the proposed method, we briefly present the preliminary work that motivates and justifies our proposition. First of all, a neural-network-based method has been tested [13] on the real images presented in Section 4.2. The results are interesting, but learning methods are not easy to use for a nonspecialist in image processing, and the users have to spend a lot of time setting the parameters and building the database for the learning step before using the method. The main goal is to propose a system that facilitates the work of the users, not a system that wastes time by including a learning phase and a yearly maintenance in order to maintain the performance of the system (see endnote 4). In consequence, we have focused our work on methods that allow automatic processing, and, in particular, we present these two approaches.
(1) The first, Morph, belongs to families (1) and (2) because it combines thresholding and refinement by morphological analysis.
(2) The second, GaMM, of families (4) and (5), is based on the advantages of multiscale analysis and local modelling of the crack.

Morph was proposed before GaMM and is broadly equivalent to the method presented in [15]. The contributions of this section concern GaMM: we propose a new model for the sites and for the potentials used in the Markovian model. The advantages of this new method will be illustrated with qualitative and quantitative results in Section 5.

3.1. Morphological Method (Morph)

The chosen approach is based on hypotheses (H1), (H3), and (H5), and it follows these steps.
(1) Preprocessing of the images, to reduce the influence of the texture and to increase the contrast between the road pavement and the crack.
(2) Binarization by thresholding (the threshold differs between the variants, and a local threshold can be used).
(3) Refinement by closing.
(4) Segmentation with shape analysis.
(5) Extraction of the characteristics of the cracks.

For step (1), three variants are developed, based on combinations of these local tools: an erosion in gray levels, a conditional median filtering, a histogram equalization, and a mean filtering (these preprocessings are detailed in Section 5.1). Step (4) is realised in two stages. First, a labeling by analysis of the connected components is realized. Second, the size and the shape of each component are determined in order to remove the components whose shape is not similar to a crack; a crack has to be a thin object, which induces constraints on the width and the height of the component. More precisely, from an expert point of view, a crack is not significant if its length is below a threshold of a few centimeters, but, since we may manage to detect only a small part of the crack, this constraint is relaxed to a threshold of a few millimeters. Moreover, the mean width and the maximal width of the crack have to stay below thresholds of a few millimeters. All these thresholds are empirically set. In Figure 2, we illustrate the kind of results obtained at each step for the 3 variants. The next step of the method Morph merges the 3 results (with a weighted sum, whose weights are chosen with a learning phase). The final stage refines the result by computing the closing in gray levels of the fusion result.
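To make the shape-analysis stage of step (4) concrete, here is a minimal sketch in Python (NumPy/SciPy). The function name, the elongation test, and the pixel thresholds are illustrative assumptions, not the empirically set values of the paper; a real implementation would convert the millimeter thresholds into pixels using the image resolution.

import numpy as np
from scipy import ndimage
def morph_shape_filter(binary, min_len_px=20, max_mean_width_px=10):
    # Keep only connected components whose shape is crack-like (long
    # and thin); thresholds here are illustrative placeholders.
    labels, n = ndimage.label(binary, structure=np.ones((3, 3)))
    out = np.zeros_like(binary, dtype=bool)
    for i in range(1, n + 1):
        comp = labels == i
        rows, cols = np.nonzero(comp)
        length = max(rows.max() - rows.min(), cols.max() - cols.min()) + 1
        mean_width = comp.sum() / length          # rough width estimate
        if length >= min_len_px and mean_width <= max_mean_width_px:
            out |= comp
    return out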

3.2. Adaptive Filtering and Markovian Modelling (GaMM)

More recently, our work has focused on wavelet decomposition. As it is difficult to choose a mother wavelet (see endnote 5) well adapted to the detection of road cracks, the adaptive filter theory seems convenient and, in particular, makes it possible to build a mother wavelet adapted to our task. We present the first step of the algorithm, based on adaptive filtering (hypotheses (H1) and (H5)), and the second stage, based on a Markovian segmentation that can take into account the particular geometry of the crack (hypotheses (H3) and (H4)).

3.2.1. Algorithm

The goal of this algorithm, presented in Algorithm 1, is to obtain, in step (1), a binarization (black pixels for the background and white pixels for the cracks) and, in step (2), a refinement of this detection by using a Markovian segmentation. Using adaptive filtering is important in order to allow the detection of cracks with nonconstant width (hypothesis (H5), see endnote 6). The number of scales for the adaptive filtering has to be chosen and depends on the resolution of the image. By supposing a resolution of 1 mm per pixel and by choosing 5 scales, a crack with a width from 2 mm to 1 cm can be detected. Moreover, the number of directions (for the filtering) also has to be chosen, and it seems natural to take the four directions, 0°, 45°, 90°, and 135°, that correspond to the four usual directions used for crack classification. The adaptive filtering is applied at each scale and in each direction, and then all the results are merged at each scale (mean of the coefficients). The result of this filtering is the initialization of the Markovian segmentation step.

Input
Road images
Initialization
Number of scales and angles
Steps
(1) For each scale do
(1a) For each direction do
  Estimate Adaptive Filter (AF)
(1b) Merge AF in all the directions
(2) For each scale do
(2a) Initialization of the sites (Markov)
(2b) While not (stop condition) do
  Updating of the sites
(3) Fusion of the results on each scale
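As a rough illustration of how Algorithm 1 fits together, the following Python skeleton mirrors its three steps; adaptive_filter, init_sites, update_sites, and merge_scales are placeholders for the operations detailed in Sections 3.2.2 and 3.2.3, and the iteration cap is an assumption.

import numpy as np
def detect_cracks(image, n_iter_max=100):
    # Skeleton of Algorithm 1 (a sketch, not the reference code).
    n_scales, angles = 5, (0, 45, 90, 135)
    per_scale = []
    for s in range(n_scales):                                # step (1)
        responses = [adaptive_filter(image, s, a) for a in angles]  # (1a)
        per_scale.append(np.mean(responses, axis=0))         # (1b)
    segmentations = []
    for coeffs in per_scale:                                 # step (2)
        sites = init_sites(coeffs)                           # (2a)
        for _ in range(n_iter_max):                          # (2b)
            sites, changed = update_sites(sites, coeffs)
            if not changed:                                  # stop condition
                break
        segmentations.append(sites)
    return merge_scales(segmentations)                       # step (3)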

3.2.2. Adaptive Filtering

Some details are provided in order to realize steps (1a) and (1b) of Algorithm 1. A function $\psi \in L^2(\mathbb{R}^2)$ (see endnote 7) is a wavelet if

$$C_\psi = \int \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\,d\omega < +\infty, \qquad (1)$$

where $\hat{\psi}$ is the Fourier transform of $\psi$. Equation (1) implies that $\int \psi(x)\,dx = 0$. The wavelet family is defined, for each scale $s$, orientation $\theta$, and position $b$, by

$$\psi_{s,\theta,b}(x) = \frac{1}{s}\,\psi\!\left(r_{-\theta}\,\frac{x-b}{s}\right), \qquad (2)$$

where $r_\theta$ is a rotation of angle $\theta$.

One of the main difficulties in applying a wavelet decomposition is the choice of the mother wavelet $\psi$. Numerous functions are used in the literature: the Haar wavelet, the Gaussian derivatives, the Mexican hat filter, the Morlet wavelet. It is very hard to determine which one is the best for a given application. In the case of crack detection, two elements are present: the crack (if there is a crack) and the background (the road surface can be viewed as a repetitive texture). The goal of the crack detection is to recognize a signal (whose shape is known up to a factor) mixed with a noise whose characteristics are known. Consequently, adaptive filtering is well designed for the problem: extracting singularities in the coefficients estimated by a wavelet transform. Let $s = \{s_n\}_{0 \le n < N}$ be a discrete and deterministic signal, with $N$ the number of samples, and let $o = s + b$ be a noisy observation of $s$, where $b$ is supposed to be an additive noise. The main hypothesis is that this second-order noise is centered and stationary, with an autocorrelation function of terms $r_b(k) = E[b_n\,b_{n+k}]$, independent of the signal $s$. The adaptive filter of $s$ is the filter that maximizes the signal-to-noise ratio of the filtered observation, that is, in the Fourier domain,

$$\hat{h}(\omega) = \frac{\hat{s}^*(\omega)}{S_b(\omega)}, \qquad (3)$$

where $S_b$ is the power spectral density of the noise (the Fourier transform of $r_b$). The crack signal $s$ depends on the definition of the crack. In this paper, as in most papers of this domain, crack pixels correspond to black pixels surrounded by background pixels (road pixels). This is why, in [46], a crack is modelled as a piecewise constant function, defined for each position $x$ by

$$s(x) = \begin{cases} -K & \text{if } |x| \le T, \\ 0 & \text{otherwise}, \end{cases} \qquad (4)$$

where the factor $K$ and the threshold $T$ have to be determined. This does not correspond to a realistic representation of the crack. Because of subsampling, lighting, and the orientation of the camera, the signal looks more like a Gaussian function with zero mean,

$$s(x) = -K\,e^{-x^2/(2\sigma^2)}, \qquad (5)$$

where the size of the crack depends on $\sigma$, the standard deviation of the Gaussian law. Consequently, the term $\sigma$ allows fixing the width of the crack (like the threshold $T$ in (4)). Finally, for step (1), the filter is estimated for each of the 5 scales, as explained at the beginning of Section 3.2.1, and is interpolated in order to have the same size at every scale. Then the filter is rotated in order to cover the 4 orientations.
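The sketch below shows one plausible reading of this construction in Python: the textbook matched filter for colored noise, with the Gaussian crack profile of (5). The relation between the crack width and $\sigma$ (here width close to $3\sigma$) and the PSD estimation from a crack-free road profile are our assumptions, not values from the paper.

import numpy as np
def gaussian_crack(width_px):
    # Crack profile of equation (5): a dark Gaussian valley; tying the
    # width to 3*sigma is an illustrative assumption.
    sigma = width_px / 3.0
    size = int(8 * sigma) | 1                     # odd support
    x = np.arange(size) - size // 2
    return -np.exp(-x ** 2 / (2 * sigma ** 2))
def matched_filter_1d(crack, road_sample):
    # Matched filter for colored noise, our reading of equation (3):
    # conjugate of the crack spectrum divided by the noise power
    # spectrum, estimated here from a crack-free road profile.
    n = len(road_sample)
    s = np.zeros(n)
    start = (n - len(crack)) // 2
    s[start:start + len(crack)] = crack
    noise = road_sample - road_sample.mean()
    psd = np.abs(np.fft.rfft(noise)) ** 2 / n + 1e-12   # crude PSD estimate
    h = np.fft.irfft(np.conj(np.fft.rfft(s)) / psd, n)
    return h / np.linalg.norm(h)                        # unit-norm filter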

3.2.3. Segmentation

The goal of this part is to extract shapes, that is, cracks, using the detection maps estimated at the first stage of the algorithm (step (2a) of Algorithm 1). For the first step of the segmentation (initialization), the sites are small segments of fixed size laid out on a regular grid over the image. In [46], four configurations are possible, represented in Figure 3 (the part inside the rectangle with low gray levels). The initialization of the sites is based on the configuration that maximizes the coefficients obtained with the adaptive filtering. More formally, if we denote $c_1$, $c_2$, $c_3$, and $c_4$ the four configurations, the best configuration is

$$c^{\star} = \arg\max_{i \in \{1,\dots,4\}} \mu(c_i), \qquad (6)$$

where $\mu(c_i)$ is the mean of the coefficients on the considered configuration $c_i$. These four configurations do not represent all the possibilities and are not realistic. In fact, all four are centered, whereas noncentered configurations are also possible. Consequently, we use the set of sixteen configurations illustrated in Figure 3 (all the presented sites). By modifying the number of configurations, we need to adapt the initialization of the sites, and (6) becomes

$$c^{\star} = \arg\max_{i \in \{1,\dots,16\}} \mu(c_i), \qquad (7)$$

where $\mu(c_i)$ is the mean of the coefficients on the considered configuration $c_i$.
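A minimal sketch of this initialization (equation (7)) in Python follows. The site size of 4 pixels is an assumption (the actual value did not survive in this copy), and only 4 centered configurations are encoded; the 16 configurations of Figure 3 would be described by 16 such offset lists.

import numpy as np
def init_sites(coeffs, site=4):
    # Step (2a): on a regular grid of site x site cells, keep, for each
    # cell, the segment configuration with the highest mean coefficient.
    c = site // 2
    configs = [
        [(c, j) for j in range(site)],             # horizontal segment
        [(i, c) for i in range(site)],             # vertical segment
        [(i, i) for i in range(site)],             # diagonal segment
        [(i, site - 1 - i) for i in range(site)],  # anti-diagonal segment
    ]
    h, w = coeffs.shape
    best = np.zeros((h // site, w // site), dtype=int)
    for gi in range(h // site):
        for gj in range(w // site):
            cell = coeffs[gi * site:(gi + 1) * site,
                          gj * site:(gj + 1) * site]
            means = [np.mean([cell[i, j] for i, j in cfg])
                     for cfg in configs]
            best[gi, gj] = int(np.argmax(means))   # equation (7)
    return best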

The image is considered as a finite set of sites denoted $S$. For each site, the neighborhood is defined by the 8 nearest sites (8-connectivity). A clique is defined as a subset of sites of $S$ in which every pair of distinct sites are neighbors.

These random fields are considered as follows.
(1) The observation field $O = \{o_s\}_{s \in S}$, where $o_s$ is the mean of the coefficients on the site $s$.
(2) The descriptor field $D = \{d_s\}_{s \in S}$, with $d_s = 1$ if there is a crack and $d_s = 0$ elsewhere.

At each iteration, a global cost, a sum of potentials that depends on the values of the sites and on the links between neighboring sites, is updated. This global cost takes into account the coefficients of the sites (computed from the coefficients estimated during the first part of the algorithm, the adaptive filtering) and the configurations of each site and its neighboring sites (the 8 neighbors). More formally, the global cost is the sum of the potential functions of all the sites, and this potential function contains two terms,

$$V(d_s) = \alpha\,V_d(d_s) + (1-\alpha)\,V_n(d_s). \qquad (8)$$

The first term, $V_d$, corresponds to the data term, and it evaluates how similar a site is to a crack from a photometric point of view (hypotheses (H1) and (H2)). This term is based on the results given by the adaptive filtering. The second term, $V_n$, represents the constraints induced by the neighbors of the site. More precisely, it estimates the consistency between a site and each neighboring site, and it takes into account the geometric hypotheses (H3) and (H4). The choice of the value $\alpha$ depends on the importance given to each part of (8), and it will be discussed in Section 5.1.1.
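In code, the per-site potential of (8) reduces to a simple weighted combination; v_data and v_neigh stand for the two terms just described (a sketch of v_neigh is given after equation (10) below), and the equal default weight is only a placeholder.

def site_energy(site, coeffs, neighbors, alpha=0.5):
    # Potential of equation (8) for one site: a data term driven by the
    # adaptive-filtering coefficients (hypotheses (H1)/(H2)) plus a
    # regularization term over the 8 neighboring sites ((H3)/(H4)).
    u_data = v_data(site, coeffs)
    u_neigh = sum(v_neigh(site, nb) for nb in neighbors)
    return alpha * u_data + (1 - alpha) * u_neigh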

The data term $V_d$ is given by a function of the filter coefficients whose three parameters, $\tau$, $k_1$, and $k_2$, have to be fixed (see endnote 8). For the definition of $V_n$, we have to determine the number of cliques. In [46], 4 cliques are possible and 8-connectivity is considered. The potential function proposed in that previous work only considers the difference of orientations between two neighboring sites and not the relative position of the two sites of the clique, see Table 4. Some cases are thus not penalized with the old configuration, for example, these two unfavorable cases:
(i) two sites with the same orientation but with no connection between them,
(ii) two sites with the same orientation but whose positions make them parallel.

This is why, with the sixteen configurations presented in Figure 3, the potential has to take into account both the difference of orientations between two sites (four possibilities, since the sites take one of four orientations) and the relative position of the two sites (8 possibilities, because we consider the 8 neighbors). Consequently, the new potential function follows these two important rules. The lower the difference of orientations between two sites, the lower the potential. The lower the distance between two sites, the lower the potential (here, the distance means the minimal distance between the extremities of the two segments).

More formally, if
(i) $d(s_i, s_j)$ denotes the Euclidean distance between the two closest extremities of the sites $s_i$ and $s_j$ (this distance is bounded, see endnote 9),
(ii) $\theta_i$ and $\theta_j$ are the orientations of, respectively, $s_i$ and $s_j$,
(iii) $\theta_{ij}$ is the angle between the two sites,
the function $V_n$ is defined by

$$V_n(s_i, s_j) = \gamma_1\left(|\theta_i - \theta_{ij}| + |\theta_j - \theta_{ij}|\right) + \gamma_2\,\delta(s_i, s_j)\,\mathrm{NbC}(s_i, s_j) + \gamma_3\,(1 - \delta(s_i, s_j))\,d(s_i, s_j), \qquad (10)$$

where NbC indicates the number of connected pixels between the two sites $s_i$ and $s_j$, and $\delta(s_i, s_j)$ equals 1 if the sites are connected and 0 elsewhere. The first term is induced by the rule about the orientations: it equals zero when the sites have the same orientation and this orientation is the same as the orientation between the sites, that is, $\theta_i = \theta_j = \theta_{ij}$. This first term penalizes the configurations where the sites do not have the same orientation, but also the particular case where they are parallel, see example (a) in Figure 4. The second and third terms express the rule about the distances. Two aspects have to be distinguished: the number of connected pixels, when the sites are connected, and, on the contrary, the distance between the sites, when they are not connected. This gives a low influence to disconnected sites and also increases the cost of sites that are parallel but connected, see example (b) in Figure 4. To study the influence of all these terms, the equation has been normalized, and the different terms have been weighted (using the weights $\gamma = (\gamma_1, \gamma_2, \gamma_3)$, whose choice will be discussed in Section 5.1.1).
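The sketch below implements these two rules for a pair of sites in Python. The functional form and the equal default weights are illustrative (the exact normalization of (10) was lost in this copy), the site representation is our own, and nbc stands for NbC, precomputed by the caller.

import numpy as np
def v_neigh(site_a, site_b, nbc=0, gamma=(1 / 3, 1 / 3, 1 / 3)):
    # Illustrative neighbor potential in the spirit of equation (10).
    # Sites are dicts with 'theta' (orientation, degrees), 'center' and
    # 'ends' (pixel coordinates).
    def ang_diff(t1, t2):                      # angular gap in [0, 90]
        d = abs(t1 - t2) % 180
        return min(d, 180 - d)
    dy, dx = np.subtract(site_b["center"], site_a["center"])
    th_ab = np.degrees(np.arctan2(dy, dx)) % 180   # inter-site direction
    # Rule 1: zero when both orientations match the inter-site direction;
    # penalizes misaligned sites and the parallel case (example (a)).
    term_orient = ang_diff(site_a["theta"], th_ab) \
                + ang_diff(site_b["theta"], th_ab)
    # Rule 2: minimal distance between the extremities of the segments.
    d_min = min(np.linalg.norm(np.subtract(p, q))
                for p in site_a["ends"] for q in site_b["ends"])
    connected = d_min <= np.sqrt(2)            # touching in 8-connectivity
    term_overlap = nbc if connected else 0     # parallel-but-connected cost
    term_dist = 0.0 if connected else d_min    # cost of disconnected sites
    g1, g2, g3 = gamma
    return g1 * term_orient + g2 * term_overlap + g3 * term_dist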

4. Evaluation Protocol

For the evaluation of automatic crack detection by image-processing methods, to the best of our knowledge, no evaluation and comparison protocol has been proposed in the community. However, in all countries, for estimating the quality of the road surface, it is important to know exactly the size and the width of the defects, that is, to detect the defects precisely. This is why it seems important to characterize quantitatively the performances of the methods. For building this kind of protocol, it is necessary, first, to choose the test images, second, to choose how to build the reference segmentations, and, third, to determine the criteria used for the quantitative analysis. For estimating the reference segmentations, two approaches can be used.
(1) To compute synthetic images with synthetic defects. The exact position of the defects is known, and the reference segmentations can be considered as ground truth.
(2) To propose reliable segmentations of real images. This supposes that we are able to provide a segmentation that is reliable enough to be employed as a reference. For the evaluation, these segmentations can be called "pseudoground truth".

The two solutions are studied, and we explain how the manual segmentations (which are our references) are computed. Beforehand, we briefly describe the acquisition system.

4.1. Acquisition

The acquisition system used for the dataset of our experiments is described in Figure 5. It contains 4 cameras: 3 gray-level sensors at the back of the vehicle and 1 color camera at the front. The front camera is used to determine the environment conditions (weather, location, traffic), whereas the three others are used for the crack detection. Its resolution is smaller than that of the 3 others and, moreover, its optical axis is not perpendicular to the road surface, unlike the 3 others. The 3 cameras have been physically synchronized during the acquisition. To be independent of illumination problems, nine stroboscopic lights have been added. The lights point perpendicularly to the road plane, at a distance of 1 meter from the surface. The light power has been chosen so as not to deteriorate the visualisation of the road pavement and the defects.

4.2. Reference Images

The most difficult task is to propose images with a reference segmentation. On the one hand, we introduce synthetic images with a simulated crack (see endnote 10). As shown in Figure 6, the result is not realistic enough: the contrast between the road and the crack is too strong. Moreover, the interruptions of the crack, the changes in direction, the presence of multiple paths, and so forth, are not simulated. In order to be more realistic, it seems that we would have to design and implement a complex heuristic to simulate the crack, and this represents too much effort for obtaining only a synthetic defect. This is why, on the other hand, we have simulated different defects on real images that previously contained no defect. The result is more realistic, but the shape and the photometric aspect of the cracks (which are randomly chosen) do not seem realistic enough. This is why it appears important to propose a set of real images with manual segmentations that are reliable enough to be considered as reference segmentations. To summarize, the two first kinds of images allow an exact evaluation and illustrate theoretically the behavior of the methods, whereas the last kind of images allows validating the work on real images with a pseudoground truth.
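For reference, endnote 10 describes the synthetic generator; a minimal Python version might look like the following, where the gray-level ranges and the mostly horizontal walk are our assumptions.

import numpy as np
def synthetic_crack_image(h=256, w=256, length=150, seed=0):
    # Synthetic test image in the spirit of endnote 10: a uniformly
    # random road texture crossed by a dark random-walk crack; the
    # intensity ranges are illustrative, not the paper's values.
    rng = np.random.default_rng(seed)
    img = rng.integers(100, 256, size=(h, w))     # bright random texture
    r = h // 2
    c = int(rng.integers(0, w // 4))              # crack starts on the left
    for _ in range(length):                       # mostly horizontal walk
        img[r, c] = rng.integers(0, 60)           # dark crack pixel
        r = int(np.clip(r + rng.integers(-1, 2), 0, h - 1))
        c = min(c + 1, w - 1)
    return img.astype(np.uint8)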

4.3. Reference Segmentations

For the real images, we briefly explain how the manual segmentations are validated. Four experts have manually segmented the images with the same tool (see endnote 11) and in the same conditions. Then, the four segmentations are merged following these rules.
(1) A pixel marked as a crack by more than two experts is considered as a crack pixel.
(2) Every pixel marked as a crack by an expert and next to a pixel kept by rule (1) or (2) is also considered as a crack pixel.

The second rule is iterative and stops when no pixel is added. Then, the result is dilated with a square structuring element. To evaluate the reliability of the reference segmentations, we estimate, first, the percentage of covering between the operators and, second, the mean distance, $D$, between the reference segmentation and each pixel detected by only one expert and not kept in the reference image.
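A compact sketch of this fusion in Python, assuming binary expert masks and a 3 x 3 structuring element (the actual element size did not survive in this copy):

import numpy as np
from scipy import ndimage
def merge_expert_masks(masks, struct_size=3):
    # Majority vote (rule (1)), iterative growth of expert-marked pixels
    # touching the kept set (rule (2)), then a final dilation.
    votes = np.sum(masks, axis=0)
    kept = votes >= 3                    # marked by more than two experts
    marked = votes >= 1                  # marked by at least one expert
    while True:                          # rule (2): iterate until stable
        grown = ndimage.binary_dilation(kept, np.ones((3, 3))) & marked
        if np.array_equal(grown, kept):
            break
        kept = grown
    struct = np.ones((struct_size, struct_size))
    return ndimage.binary_dilation(kept, struct)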

Table 5 shows the results for 5 of the 42 manually segmented images. We have distinguished 5 families: the first one contains images acquired under static conditions, whereas the four other ones were acquired in motion. Moreover, we have 4 different kinds of road pavement acquired in motion. The 5 images have been chosen to show results for each of these families. We can notice that the first 2 images are the most reliable because the mean error is less than 2 pixels; the precision of these results is satisfactory. On the contrary, the last 3 images show the important variabilities between operators and how difficult it is to extract a segmentation for these images, in particular for image 936, where the error is due to a misinterpretation by one of the four operators, who found a defect that does not exist.

By analyzing the results for the criterion $D$, presented in Table 5, we can classify the 42 tested images into 3 categories, that is, images with the following.
(1) A reliable segmentation: the criterion $D$ is below a first threshold $T_1$. It means that all the operators have built segmentations that are quite near to each other.
(2) A segmentation that is moderately reliable: the criterion $D$ lies between $T_1$ and a second threshold $T_2$. It means that some parts of the crack are not easy to segment, and there are local errors.
(3) An ambiguous segmentation: the criterion $D$ is above $T_2$. It clearly shows that the images are difficult to segment and, in most of the cases, it means that some parts are detected as a crack whereas they are not, and conversely.

The thresholds $T_1$ and $T_2$ have been empirically chosen. In Figure 7, first, we present the mean distance between the final reference segmentation and each manual segmentation (obtained with each operator) and, second, the criterion $D$ for each real image of our protocol. The first graph illustrates how important it is to combine the four manual segmentations instead of using only one. Indeed, we can notice that each operator, alternately, gives an interpretation that is different from the three others. The second graph explains how the thresholds are chosen for determining the detections that are "accepted" for the evaluation, see Section 4.4 for explanations about accepted detections.

The three categories of reference segmentations are illustrated in Figure 8. Overall, the four segmentations are near to each other and, when they are combined, the width of the crack can be recovered. However, these examples also show some of the difficulties of crack segmentation: areas where the cracks are less visible and regions where texture elements have the same size and/or the same gray levels as the crack pixels. Thus, in some cases, one operator extends the crack or gives it a different shape. In some extreme cases, an operator can even confuse a crack with another object of the scene (a piece of wood, e.g.). More generally, these examples highlight the interest of combining different segmentations in order to obtain reference segmentations that are as reliable as possible.

4.4. Criteria of Efficiency

In this section, we introduce how the reference segmentation and the estimated segmentation are compared. In Figure 9, we present common criteria that are used for segmentation evaluation:
(1) the percentage of correct detections, or true positives (TP),
(2) the percentage of false positives (FP),
(3) the percentage of false negatives (FN), and
(4) the similarity coefficient (DICE).

This last criterion seems the most significant because it combines the FP and the FN, and it summarizes the results of all the criteria. Moreover, it directly expresses what is important to evaluate: how well the method reduces detection errors whilst increasing the density of good detections.

For the real images, "accepted" detections have been added in order to tolerate a small error in the localization of the crack pixels. This tolerance is needed because a perfect detection seems, for the moment, difficult to reach, see the results in Table 5. In consequence, these accepted pixels have been included in the estimation of the similarity coefficient, or DICE. The threshold for accepted detections equals 0 for the synthetic images, whereas it depends on the mean distances $D$, see Table 5, for the real images.
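The criteria can be computed as follows in Python; applying the tolerance through a distance transform from the reference is our reading of the accepted-detection rule, not necessarily the paper's exact implementation.

import numpy as np
from scipy import ndimage
def dice_with_tolerance(detected, reference, tol_px=0):
    # DICE = 2TP / (2TP + FP + FN), with "accepted" detections: detected
    # pixels within tol_px of the reference count as true positives
    # (tol_px = 0 for synthetic images).
    dist_to_ref = ndimage.distance_transform_edt(~reference)
    tp = np.sum(detected & (dist_to_ref <= tol_px))
    fp = np.sum(detected & (dist_to_ref > tol_px))
    fn = np.sum(reference & ~detected)
    return 2 * tp / (2 * tp + fp + fn)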

5. Experimental Results

In this section, two aspects are studied and presented:
(1) the evaluation of the method based on adaptive filtering and Markovian modelling, in order to characterize its behavior, to estimate the best parameters, and to determine the best variant;
(2) the comparison with the Morph method.

5.1. Adaptive Filtering and Markovian Modelling

We want to determine, first, how to fix the different parameters, second, which preprocessing steps are necessary, and, finally, which variant is the most efficient. In consequence, these points have been studied.
(i) Parameter values. The weights $\alpha$, see (8), and $\gamma$, see (10), are tested from 0 to 1 with a step of 0.1.
(ii) Preprocessings. The following preprocessings have been evaluated, to reduce the noise induced by the texture, to increase the contrast of the defect, and to reduce the light halo present in some images (a sketch of the Restoration chain is given after this list).
(1) Threshold. This preprocessing has been proposed in order to reduce the light halo in the last 4 images presented in Figure 6 and in all the images of the last four categories in Figure 7. Each pixel lower than a given threshold is replaced by the local average of the gray levels.
(2) Smoothing. A mean filter is applied to reduce the granularity of the texture.
(3) Erosion. An erosion (in gray levels) with a square structuring element is also applied to reduce the granularity of the texture.
(4) Restoration. It combines the advantages of all the previous methods in three steps: a histogram equalization, a thresholding (like Threshold), and an erosion (like Erosion).

In order to preserve the crack signal, each pixel under a given threshold is not filtered (see endnote 12).
(iii) Algorithm variants. Four variants are compared.
(1) Init. This is the initial method proposed in [46].
(2) Gaus. This variant supposes that the distribution of the gray levels inside a crack follows a Gaussian function, see Section 3.2.2.
(3) InMM. This is the initial version with an improved Markovian modelling (new definition of the sites and of the potential function), see Section 3.2.3.
(4) GaMM. This is the method Gaus with the new Markovian modelling.
(iv) Comparison. We have compared these variants with Morph, the method based on morphological tools that is quite similar to [15].
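As referenced above, here is a sketch of the Restoration chain in Python, assuming 8-bit images; the local-mean window and the structuring-element size are illustrative assumptions, while endnote 12 supplies the preservation threshold of 40.

import numpy as np
from scipy import ndimage
def restoration(img, halo_thresh=40, dark_thresh=40):
    # Three steps: histogram equalization, Threshold-style replacement
    # by the local mean, then gray-level erosion. Pixels at or below
    # dark_thresh bypass the filtering to preserve the crack signal.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size
    eq = (255 * cdf)[img].astype(np.uint8)        # equalized image
    local_mean = ndimage.uniform_filter(eq.astype(float), size=15)
    thresholded = np.where(eq < halo_thresh, local_mean, eq)
    eroded = ndimage.grey_erosion(thresholded, size=(3, 3))
    return np.where(img <= dark_thresh, img, eroded).astype(np.uint8)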

5.1.1. Influence of Parameters

Among all the results, two conclusions can be drawn.
(1) For each variant and each preprocessing, the weights of the adaptive-filtering term and of the Markovian-modelling term should be the same, see (8), that is, $\alpha = 0.5$. However, when the weight given to the adaptive filtering is the largest, the quality of the results is lower than in the reverse case. It means that, in this kind of application, the geometric information is more reliable than the photometric information, which seems coherent with the difficulties induced by the acquisition.
(2) For the Markovian modelling, we have noticed that the results are the best when the weights of the orientation term and of the distance terms are the same, see (10). However, better results are obtained when the weight of the orientation term is greater than the weight of the distance term rather than the reverse. It means that the orientation characteristics are more reliable than the distance ones, and this remark is coherent with the fact that cracks present strong spatial constraints. Moreover, it is also linked to the difficulties induced by the acquisition (the lighting system makes the photometric information less reliable).

5.1.2. Preprocessing

These tests have been done with the real images, because the synthetic images do not need preprocessing. The following conclusions can be made about the needed preprocessing per method: (i) Init: restoration; (ii) Gaus: restoration; (iii) InMM: threshold; (iv) GaMM: erosion.

However, for the first dataset (acquired with lighting conditions more favorable than those of the 4 other ones), the preprocessing does not significantly increase the quality of the results. Moreover, with the new Markovian modelling, the preprocessing step does not significantly increase the quality of the results either.

5.1.3. Variants

The results are presented for (1) synthetic images and (2) real images.

For the first category, the ground truth is available, whereas, for the second category, a pseudoground truth is used and the accepted detections are taken into account in the evaluation, that is, a threshold is applied to the distance between the segmentation estimated by the evaluated method and the pseudoground-truth segmentation. The thresholds applied to this distance for the accepted detections are determined from the results given in Table 5, column D.

In Figure 10, the evolution of the similarity coefficient, or DICE, is presented for the 11 synthetic images, in Figure 10(a), and for 10 of the real images, in Figure 10(b). With the synthetic images, the method GaMM is clearly the best for most of the images. However, for one image (the fifth), the results are worse than those of the method Gaus, but they are still correct (DICE = 0.72). Conversely, for the most difficult images (the first 3, which contain a real road background), the method GaMM obtains acceptable results (DICE > 0.5), whereas the other methods are not efficient at all. Illustrations are given in Figures 11 and 12; they show how the method GaMM can reduce the false detections.

5.2. Results and Comparison with Morph

Finally, we have compared the results of GaMM with those of Morph on the complementary dataset (32 images). The mean DICE is 0.6 with GaMM whereas it is 0.49 with Morph, see Figure 13. It shows how GaMM can outperform Morph. However, if we compare image by image, the results show that GaMM is the best in 50% of the cases, see illustrations of these results in Figures 14 and 15. More precisely, GaMM seems more efficient on ambiguous images, whereas Morph is the best on reliable images. Finally, we can also report the execution times of the two methods: about 1 minute for GaMM and 5 seconds for Morph on a 2 GHz Intel Core 2 Duo processor. These execution times are only indicative because the implementations, in particular that of GaMM, have not been optimized.

6. Conclusions

In conclusion, this paper gives a review of image-processing methods for crack detection on road pavement. It can help researchers who want to choose and adapt an inspection method to the constraints of the transport structure under study (depending on the quality of the surface and the needs of the inspection). Moreover, a new method for the detection of road cracks has been introduced, and we have presented a new evaluation and comparison protocol for the automatic detection of road cracks. To the best of our knowledge, we are the first to propose real images with ground truth to the community. This dataset is available on this website: http://perso.lcpc.fr/sylvie.chambon/FISSURES/. The new method, GaMM, has been validated with the proposed protocol and compared to a previous one, Morph. This evaluation shows the complementarity of the two methods: the Morph method obtains more true positives than the GaMM method, whereas the latter reduces the percentage of false positives.

Our first improvements of this work will focus on the evaluation and comparison protocol. We want to enlarge our dataset by taking into account the different qualities of road surface or road texture (because, for the moment, each proposed method seems very dependent on the quality of the road texture). In a second step, our future work will include new experiments on the acquisition system. Indeed, the acquisitions, and the results obtained with the presented acquisition system, have shown its limits: for example, in Figure 6, some parts of the crack are not "visible". This comes from the fact that highlighting the crack depends on the orientation of the lights and of the sensors. Using one single sensor and one light always in the same position/orientation, we can sometimes miss some defects during the acquisition. So, it seems important to study other kinds of systems to improve the quality of the automatic treatments. Finally, we want to improve the GaMM method by adding the extraction of the crack characteristics, as in Morph.

Endnotes

  1. It has been determined from the most recent systems.
  2. http://www.ino.ca/fr-CA/Realisations/Description/project-p/systeme-laser-mesure-fissures.html.
  3. This separation would be possible based on local statistical analysis around the crack.
  4. This maintenance is necessary because the conditions and the systems of acquisition can change every year, and the road pavement also evolves.
  5. It is useful for generating the wavelet family for multiscale analysis.
  6. This hypothesis is realistic for this application.
  7. $L^2(\mathbb{R}^2)$ denotes the space of square-integrable functions.
  8. The choice of $\tau$ is related to the maximal number of pixels that can belong to a crack (it depends on the resolution of the images and on hypotheses about the size and the configuration of the cracks). We have chosen $\tau$ in order to consider at most 5% of the image as a crack. Moreover, our experiments have brought us to take $k_1 = k_2 = 100$.
  9. As the sites are of fixed size, this distance is bounded.
  10. The road is a random texture, that is, each intensity is randomly chosen by supposing a uniform distribution over the gray-level range used for the background. Then, the user gives the position of the beginning, the length, and the orientation (vertical, horizontal, or oblique) of the crack. The crack points are built by randomly selecting the next point in the neighborhood and its intensity in a darker gray-level range.
  11. We use a "home-made" software with an interface that helps the person to segment the defect. The principle is that the user selects points on the crack. These points have to be close enough to each other (from 5 to 20 pixels apart). Then, the path between two close points is automatically detected by using a simple heuristic: the path that minimizes the mean intensity is selected. The interface is complete enough to allow the displacement of points, the removal of points, and the removal of cracks. The user can also select the width of the path (crack). Some filters are also proposed to improve the contrast between the crack and the road in order to help the user.
  12. Experimentally, this threshold equals 40.