Warm Mix Asphalt (WMA) and Hot Mix Asphalt (HMA) are prepared at lower temperatures, making it more susceptible to moisture damage, which eventually leads to stripping due to the adhesion failure. Moreover, the assessment of the adhesion failure depends on the expertise of the investigator’s subjective visual assessment skills. Nowadays, image processing has gained popularity to address the inaccuracy of visual assessment. To attain high accuracy from image processing algorithms, the loss of pixels plays an essential role. In high-quality image samples, processing takes more execution time due to the greater resolution of the image. Therefore, the execution time of the image processing algorithm is also an essential aspect of quality. This manuscript proposes a parallel k means for image processing (PKIP) algorithm using multiprocessing and distributed computing to assess the adhesion failure in WMA and HMA samples subjected to three different moisture sensitivity tests (dry, one, and three freeze-thaw cycles) and fractured by indirect tensile test. For the proposed experiment, the number of clusters was chosen as ten (k = 10) based on k value and cost of k means function was computed to analyse the adhesion failure. The results showed that the PKIP algorithm decreases the execution time up to 30% to 46% if compared with the sequential k means algorithm when implemented using multiprocessing and distributed computing. In terms of results concerning adhesion failure, the WMA specimens subjected to a higher degree of moisture effect showed relatively lower adhesion failure compared to the Hot Mix Asphalt (HMA) samples when subjected to different levels of moisture sensitivity.

1. Introduction

The image processing method has been widely used as a nondestructive system to evaluate 2D or 3D geometry in numerous scientific fields [1]. In the field of civil engineering, image processing has been successfully applied in multiple applications such as pavement distress assessment, site evaluation using satellite imaging, and analysis of crack propagation and microstructures in cement-based materials [2]. Concerning asphalt pavement binders, a number of computer-vision based system has been developed, which is broadly categorized into field assessment and laboratory applications. Concerning field assessment, algorithms were developed for the identification of pavement distress type and size [25]. Nonetheless, these algorithms also allow identifying the fractured surfaces of asphalt mixtures such as broken aggregates or the adhesion and cohesion failures at the interface of failed specimens [13].

In this work, the type and degree of failure due to the moisture effect is evaluated by image analysis for a series of Warm Mix Asphalt (WMA) mixtures. WMA technology uses lower temperatures in comparison with conventional Hot Mix Asphalt (HMA). Even though in contrast with conventional Hot Mix Asphalt (HMA), the WMA technologies are more environmentally friendly and cost-effective; however, due to the lower production temperature, the WMA mixtures have shown less resistance against moisture damage [68]. Concerning distress or stripping identification using a computer vision-based system, it is essential to know that the distressed/stripped area pixel always possesses a lower pixel intensity compared to the unstripped area [9]. Conventional tests carried out to identify moisture damage region of interest (ROI) in asphalt mixtures include the use of visual assessment depending on the perception of the investigator [10], [11]. In such a scenario, the investigator might miss out to select some part of the ROI, which finally hampers the end result. Therefore, to further understand the mechanisms, it seems necessary to use an image segmentation algorithm that can quantify these failures more accurately. It is deemed necessary to note that these types of image processing involve high-performance computation because the high-end image processing cannot be handled efficiently on single computing node. In this regard, a framework is needed that allows the researchers to concentrate on the image processing tasks and refrains by getting them involved into complicated details of distributed computing. Additionally, the framework should provide the researchers with the familiar image processing tools.

This work proposes an implementation of clustering on high-resolution images of WMA broken samples using a parallel k means for image processing (PKIP) algorithm. To implement the k means algorithm parallelly, multiprocessing on a single node and Map-Reduce based programming on multiple nodes have been used [12].

An essential factor in the visual assessment of the stripping is concerning to its representation. In terms of pixel value intensity, the stripped and the unstripped area must have a significant difference. In order to classify the pixels with respect to different intensities or color similarities, the k means clustering algorithm is widely used. Using the k means algorithm, n observations can be segregated into k classes based on a particular mean value. The proximity of each observation to the cluster is iteratively processed using the nearest mean. The variant of k means comprises k median and k medoid (see Appendix section). It is worth to be noted that k median is best suited for local optimization problem, whereas k means algorithm is tailored for both, i.e., global and local optimization [13]. On the other hand, k medoid has high algorithmic time complexity (see Appendix section), and thus, it is compute intensive compared to k means [14, 15].

After this short introduction, the manuscript is arranged as follows, Section 2 describes the previous work carried out on asphalt mixture analysis using conventional image processing techniques and also describes the challenges in terms of processing time required to analyse the high-end image datasets using sequential k means clustering. Section 3 starts with WMA specimens’ preparation and proposes the PKIP algorithm for central processing unit- (CPU-) based multiprocessing execution and Map-Reduce based distributed computing, followed by a thresholding process to assess the adhesion failure. Section 4 presents the results in terms of the accuracy and performance of the proposed PKIP algorithm. Section 5 presents the discussion related to the outcome of the results. Finally, Section 6 summarizes the most significant conclusions.

2. Background

WMA technology was developed to place the asphalt mixture at a lower temperature compared to the conventional mixture, and, in recent years, WMA has gained prominence over conventional HMA approaches due to its sustainability factor. Nevertheless, due to the lower production temperature of WMA, the characteristics of probable moisture damage are more essential to be considered [7], [8]. In this sense, it is necessary to understand the effects occurring at the interface of the mineral aggregate and the asphalt binder. As per the observation of the National Center for Asphalt Technology (NCAT), the reduced mixing temperature used in the preparation of WMA leads to improper drying of the aggregates, and less aging takes place. Figure 1 shows the schematic diagram of moisture entrapment at the asphalt/aggregate interface during WMA production. Digital image processing has become a powerful tool to accurately accomplish the assessment of this type of damage [11].

Generally, image processing experiments are divided into destructive and nondestructive evaluations. In the case of destructive testing, experiments are ought to be carried out on specimen’s failure to assess the performance of the material under different conditions by breaking it, whereas in nondestructive testing, the sample is kept intact during its analysis using digital image processing. In the work carried out by Li et al. [17], an artificial neural network was used to detect the cracks in the pavement whereby classification and preprocessing step was performed, followed by Gaussian filter utilization to smoothen the background. At the final step, histogram transformation was applied to highlight the region of crack. The results obtained by image processing were compared with the experimental parameters, which were obtained in lab, and an accuracy of approximately 80% was achieved [17]. Previous studies dealt with the modeling of the microstructure of the asphalt mixture also included sensitivity exploration of aggregate size inside sand mastic [18]. Image analysis is also used for the characterization of air void distribution in asphalt mixes utilizing X-ray Computed Tomography [19, 20]. Moreover, the 3D microstructure of the asphalt concrete can also be reconstructed from pieces of 2D X-ray computed tomography graphics to obtain the required macroscopic specification [21, 22].

Numerous studies and analysis present how to build up and enhance the image processing technique research-based commercial applications such as ImageJ, Image Processing, and Analysis (IPas) or Photo Pro Plus [20, 23, 24] to be implemented on specific domains. Generally, the sizes of modern image collections are large (terabytes and petabytes of data); such vast collections of data cannot be stored and processed efficiently on a single machine. In addition, current image processing algorithms are becoming very challenging and, hence, computationally intensive. There are many compute intensive steps involved in processing large and complex image datasets to derive modified products, and several efforts are required towards integration of high-performance computing models with image processing algorithms. Although we can process the images batch-wise in these single-processor systems, there will be problems with such processing due to limited capabilities. Henceforth, a parallel programming paradigm for high-end image processing has become deemed necessary.

In image processing, the input image undergoes from several modifications [25, 26]. Implementation of image processing performs four steps: (i) image acquisition, (ii) image enhancement, (iii) image restoration, and (iv) multiresolution processing. In order for an image to be segmented and examined, initially it has to undergo a thresholding process. In image processing of asphalt sample, the image sample undergoes from several preprocessing methods focused on image smoothening to remove the noise from the image [27]. For the analysis of asphalt samples, acquisition of high-quality image is a necessary parameter and so is its intensive computation which is considerably time-consuming using sequential processing. In order to determine ROI in asphalt image sample, initially, the sample image has to be smoothened to segregate the object that determines fractured surface and the remaining unwanted background. For example, to implement this process of segregation, several image processing functions were applied by Dong et al. [28] to analyse the cracks present in asphalt samples. These image processing functions comprised of implementation of filter (i.e., Kalman, Gaussian, Sobel, polynomial, bilateral filter) to remove the noise and then the implementation of thresholding process with Canny edge detection [2] was done followed by morphological implementation operations to analyse contours. Thresholding on grayscale images were also applied to estimate stripping in untreated and treated asphalt mixture [2]. There are limitations of simple thresholding as it is a manual process and it does not resonate well with the dark color aggregate sample. Moreover, simple thresholding does not classify the shadow area accurately [29]. In order to overcome these shortcomings, Lantieri et al. [30], proposed a method of color space conversion, where they converted their sample image from Red-Green-Blue (RGB) color space to YUV color space (Y denotes luminance component, U and V are the chrominance color component).

The conventional image segmentation algorithm primarily contains the segmentation method depending on threshold value, the segmentation method based on the border, and the segmentation method based on the region [31, 32]. However, in order to determine the dominance of any pattern in the image, the k means clustering has turned out to be an efficient method compared to other clustering method as highlighted in Section 1. Cluster analysis is an important aspect for pattern/behavioral analysis [33]. Using k means algorithm, one can find out how to distinguish different kinds of elements by continuously enhancing the subliminal clustering pattern. Different clustering methods are continuously proposed and enhanced with the help of k means algorithm which are widely being used in medical image analysis using both CPU and graphical processing unit (GPU) computing [34]. The clustering process in k means algorithm is time-consuming and increases substantially with the increase in dataset size if implemented sequentially. However, due to the high efficiency of k means algorithm, it is widely utilized at the clustering of large-scale data using GPU [35, 36]. At present, many algorithms are extended and improved by keeping k means algorithm at the pivot. Compared with the traditional k means procedure, the adaptive k means algorithm used by Zheng et al. [29] transforms an image to the LAB color space before segmentation and places the luminance (l) into an adjusted value to reduce the disturbance resulting from the background. In one of the recent studies, sequential k means clustering and machine learning-based classification was used to estimate the stripping in asphalt coating [2]. It is essential to note that with respect to machine learning algorithm, the training process could be time-consuming. However, machine learning-based classification could be helpful to predict the stripping in asphalt mixture sample. It could be inferred from previous research that k means algorithm has the capability to be executed in parallel to cut down the execution time [37, 38]; therefore, a parallel implementation of k means algorithm can considerably enhance the execution time of image segmentation and additionally optimize the algorithm structure to a specific extent on multiple cores. In order to estimate the average count of pixels associated to region of interest (ROI) on multiple image dataset, Akhtar et al. [39] implemented parallel image segmentation using Map-Reduce framework. In addition to this, Akhtar et al. [12], also made detailed analysis of execution pattern with respect to CPU run time and accuracy for the multiple input image dataset to be segmented in batch. It was evident from their results that higher-order dataset size scaled well with increasing number of compute nodes or compute cores.

In contrast to supervised learning, clustering is seen as an unsupervised learning method, as we do not have the ground truth data in comparing the output of the cluster algorithm with real labels to assess its performance. Numerous k means-based clustering algorithms use distance measurements to determine the similarity between data points [4042]; however, it is advisable to normalize the data to have a zero average and standard deviation, as the characteristics in each data set would almost have different measurements. The k means algorithm is good at capturing data structures if clusters have a spherical shape [43, 44]. Moreover, in the field of artificial intelligence, k means clustering is used for hard clustering problems. Before discussing the methodology with respect to the k means clustering, Sections 3.1 and 3.2 give a brief description regarding WMA sample preparation and moisture conditioning.

3. Methodology

3.1. Asphalt Mixture Preparation

Conventional asphalt binder PG-76 was utilized in this study. In Table 1, the rheological properties of the PG-76 binder is summarized. Aggregate-type granite provided by Kuad Quarry Sdn. Bhd., Penang, was utilized in the preparation of the mixtures. The midrange aggregate gradation, type AC 14, used was in agreement with the Malaysian PWD gradation determinations [45, 46] for asphalt concrete. To evaluate the impacts of filler as anti-stripping agent in asphalt mixture, conventional Ordinary Portland Cement (OPC) and Pavement Modifier (PMD) were incorporated. PMD is a greyish-black powder mineral filler created in Malaysia which is utilized as anti-stripping agent. An incorporation of 5% of PMD by total weight of mixture, act as mineral filler in mixtures prepared [27].

Along with two HMA mixtures (with filler types OPC and PMD), WMA mixtures were prepared at different temperatures. The test specimens were compacted to the desired dimensions (height 63.5 mm and diameter 100 mm) by using the Servopac Gyratory Compactor. Air voids of the specimens for moisture sensitivity evaluation were kept at 7 ± 1%. All the specimens were prepared according to the procedures defined by Asphalt Institute Manual (MS-2) [47].

In this study, the HMA mixtures taken as reference material were prepared at 180 degree Celsius and compacted at 170 degree Celsius. However, the WMA specimens were produced at a temperature of up to 30 degree Celsius lower than HMA as shown in Table 2 along with mixture designations.

The optimum binder content was determined as per the procedure of Marshall Mix design ASTM D1559 [48] which follows the the Malaysian Public Works Department (JKR) guidelines [46] for mix type AC-14. Moreover, a gyratory compactor was utilized corresponding to an assumed 30,000,000 equivalent single axle load [27]. Number of gyrations corresponding to initial compaction (Ninitial), design compaction (Ndesign), and maximum compaction (Nmax) was respectively 8, 100, and 125. In mixture design, the target air voids of (4 ± 0.1) % were used for all the specimens. The designed optimum binder content for both HMA and WMA were obtained as 5.2%. Moreover, for WMA, Cecabase® was used as warm mix additive at an application rate of 0.3% by mass of binder.

3.2. Moisture Conditioning and Assessment

On compacted asphalt specimens, moisture conditioning was performed to assess the effect of accelerated water conditioning followed by freeze-thaw cycles. For all compacted specimens, the conditioning was carried out as per the guidelines of ASTM D4867 [49]. The only exception in the guideline was the usage of distilled water with an addition of sodium carbonate at 6.662 grams concentration. This addition was used to make the pH level high in order to enhance the stripping rate/damage inside asphalt specimens [11]. The specimens were immersed in the solution and vacuumed for 15 minutes to achieve saturation levels between 55% and 80%. Afterwards, these specimens were exposed to freezing condition at (−18 ± 3) degree Celsius for 16 hours and thawing at 60 degree Celsius for 24 hours as one cycle according to ASTM D4867 [7, 49]. Three sets of specimens, unconditioned dry, conditioned after one freeze-thaw cycle (1FT), and conditioned after thee freeze thaw cycles (3FT), were separated.

The prepared specimens were then subjected to moisture sensitivity test according to American Society for Testing Materials (ASTM D4867) [7, 11, 45, 49]. The indirect tensile strength (ITS) test was used to assess the moisture susceptibility of mixtures at a test temperature of 20 degree Celsius as per the guidelines of ASTM, 2006.

After WMA sample preparation and moisture conditioning assessment, the next step was to develop a parallel image processing-based methodology to determine the adhesion failure in the WMA sample in the lowest possible execution time. For the proposed experiment, we have developed a parallel k means for image processing (PKIP) algorithm to perform the clustering on the WMA sample. Sections 3.3, 3.4, and 3.5 will illustrate the feature space, linearization model, and the implementation of the PKIP algorithm.

3.3. Features of the Input Image (Features Space)

In image processing, a kernel function is used as a linear classifier to solve a nonlinear problem. For the classification of the kernel function-based algorithm, input space comprises of the original image, and feature space comprises the features of corresponding input image. The objective of applying the kernel-based k means clustering on the WMA specimens to distinguish between adhesion failure region and non-adhesion failure region is to bring down the value of its expected square distance d (x) of the point from its center of cluster:where d (x) is the distance from the center of the cluster, Q is the site of the training set, k is the number of clusters, is the current pattern, and Ci is the cluster center within the cluster i.

Now features of the input asphalt specimen could be computed from input space using

At the ground reality, the value of X is unknown. However, the inner product is known as kernel function (k) shown below:where represents the feature space in x and y direction.

3.4. Apply Kernel Function to Linearize the Model

Assuming in cluster , the average features space could be rewritten as

Now there may exist a point which lies at the outer segment of by appearing as an image to the input space.

In that condition becomes inaccessible, but it is possible to compute its norm.where k = .Now, assuming that, .Therefore,

Then the distance dij when the input space is mapped to the features space could be computed as

It is possible to kernelize the function in equation (7) by using a median filter given by the function: k′ (m, n) = median {x [i, j], (i, j) } (median filter function) where represents a neighborhood defined by the user which is centered around location (m, n) in the image and x [i, j] is 2D array of pixels comprising of ith row and jth column. In addition to the aforementioned filters, there are other filters as well, i.e., Gaussian filter, Bilateral blur filter, and Sigmoid filter which may be used as per the applications requirement [50, 51]. For the proposed experiment, the median filter has been used to kernelize the equation (7).

3.5. Implementation of PKIP Clustering Algorithm

This section intends to highlight the execution of a streamlined sequential and multicore CPU variant of the k means clustering algorithm. Therefore, the proposed algorithm was customized using multiprocessing programming to obtain comparative outcomes between parallel and sequential execution. The algorithm was tried with different asphalt mixture image dataset. The image datasets were acquired using a Nikon D800 model. The dimensions of the image samples were 5520 pixels (length) and 3680 pixelsl (width), and the color space of each sample was maintained in RGB prethresholding process. It is deemed necessary to detail out the system configuration in terms of number of logical cores, threading mechanism, and memory on which the PKIP algorithm has to be implemented. The work relied on OS X 10.1 64-bit operating system with a dual-core i5 (Hyperthreading support) at 1.8 GHz with a turbo boost of up to 2.9 GHz having 3 MB shared L3 cache and 8 GB RAM. The proposed PKIP implementation was tested using 4 threads. The version of the C++ compiler used was 4.6.4, while the version of OpenCV (image processing library) used was 3.4.2.

A clear sequential execution was initially performed using the algorithm shown in Pseudocode 1; however, this could not be viewed as an ideal execution on the grounds that the algorithm had a few matrix-like relations which were basic for getting better execution. The matrix computation was involved in the kernelization phase and in the distance computation of each pattern with respect to center of cluster as shown in the Pseudocode 1. In order to get substantial speedup, iterative matrix multiplications could be implemented in a computer environment where it could be executed in parallel.

In the proposed algorithm shown in Pseudocode 1, phase no. 5 holds the key to our algorithm. Phase number 5 could be categorized into two stages, stage I and stage II, using which substantial speedup is obtained for the proposed algorithm. Stage I involves the kernelization phase, and stage II involves an iterative procedure to compute the distance of every pattern to the different center of clusters. Stage I includes registering of the kernel matrix, which uses the kernel function that is connected for each matching pattern. This calculation is similar to matrix multiplication with the exception that for the proposed experiment, several task operations are involved rather than basic matrix multiplication. Moreover, these task operations require dependencies on the utilized kernel functions comprising of subtraction, squaring, and division of the median filter kernel function (Pseudocode 1).

Input sample
 Kernel function:
Phase no
(1) Apply kernel function to transform the input sample
(2) Compute distance matrix
(3) List pattern to the cluster randomly using label matrix,
(4) Change ← 0
(5) Distance computation b/w each pattern w.r.t center of cluster)
while change do
  for to do
   compute cluster size
    for to do
     compute distance of each pattern from the cluster using:
    end for
   end for
  Previous L = L
  update label L
  L = min. val of d along column
  if Previous L = L then
   change = 0
  end if
 end while

The distance computation between each pattern with respect to the cluster center comprises of three terms as

For equation (8), the first term is computed in Stage I. The computation of Stage I comprises of Phase no. (1) to Phase no. (4). The second and third term were computed in stage II. The second term is a direct multiplication of matrices, i.e., L and followed by multiplication by . The third term in the above equation depends upon the number of cluster and is continuously computed for every cluster. Since the third term is an iterative procedure and requires immense matrix multiplication, it was parallelized using multiprocessing programming. As a result, for equation (8), serial execution was exploited to compute the first term, whereas the second and the third term was implemented parallely using multiprocessing as it involved higher degree of iterative matrix multiplication.

In order to assess the performance of the PKIP algorithm for the proposed experiment, we have implemented it using C++ programming language by importing Open Computer Vision (OpenCV) library, which is used for image processing functions. The sequential execution of k means clustering can be further parallelized on multicore CPU. There are numerous application programming interfaces (API) available for parallel processing; however, for the proposed experiment, we have used Open Multi-Processing (OpenMP) for parallelizing the execution. OpenMP is an API for multiprocessing programming which supports shared memory architecture [52]. The prime feature of OpenMP is its vast instruction set i.e., OpenMP pragmas which is used for auto parallelization. For the proposed experiment, OpenMP pragma omp for was used to parallelize the intensive matrix computation involved in stage II. In order to divide the execution of Stage I between multiple cores, OpenMP shared construct was used.

3.6. Parallelization of K-Means Using Map-Reduce

In order to parallelize the k means clustering algorithm using Map-Reduce program, Hadoop framework with Hadoop Interface for Image Processing (HIPI) was used. Hadoop is an Apache open source framework written using java programming language by enabling distributed processing of large datasets across clusters of computer nodes using simple programming models [53]. The environment in which Hadoop works comprises of distributed storage and computation across clusters of computer nodes. Hadoop is designed to scale up from single node to thousands of nodes, of which each node offers local computation and storage.

Maps are the individual tasks which converts input records into an intermediate record. Any given input pair may map towards a minimum of zero or a maximum of n output pair. A Hadoop Map-Reduce structure spawn single map task for each input split was produced by the Input-format for the job. The Reduce function of Hadoop framework is the second phase. Reducer minimizes the set of intermediate values passed by mapper and shares a similar key. The quantity of reduces for the job is set by the client by means of the reduce task function. In general, Reducer executions are passed the job configuration for the job through the job configurable class. The framework then calls the function reduce () for each key-value pair for the gathered inputs.

The Map function operates on every point x on the given image dataset. On a given point x, the squared distance between x and every mean is calculated and subsequently the mean Mi is determined which minimizes this distance. On the basis of these parameters, a key-value pair gets emitted with an index of mean “i” as key and (x, 1) as the value. Therefore, a map function could be framed as

On the other hand, the Reduce function is just a pairwise summation performed on the values associated to each key. For instance, if two value pairs [(x, q), (y, r)] are associated to a particular key, then a combined formation is possible by adding all elements in the given pair. In this regard, k means reduce function could be formed as follows:

Set of k values could be formed using the Map-Reduce characteristic of two functions shown aswhere denotes the set of points closest to mean Mi. Now the updated means could be computed as

For the proposed experiment, initial cluster centres (means) were determined using the canopy algorithm [54] whereby a set of initial distance threshold, i.e., T1 and T2 were assigned for each sample image such that T1 > T2. Post threshold initialization, the mapper function selects a sample vector randomly from the sample image and assigns it as a central vector of the canopy, and accordingly, it traverses the entire sample image. The distance between the scene image and the canopy central vector has to be less than T1 in order to get classified as canopy. If it is less than T2, then it gets discarded from the dataset. Finally, the output of the mapper function gets processed by the reducer to integrate the central vectors of the canopy. This generates a new canopy of the central vector which is considered as the initial clustering center.

In order to calculate the distance between a point x and each of the means using Map function, every compute node must possess set of current values of means. Therefore, in this regard, new means are circulated to all the compute nodes after the completion of each iteration. If the convergence is achieved after a specific iteration, then the execution gets completed; else, the new means/centroid are computed again using iterate () function.

Pseudocode 2 shows the implementation of k means using Map-Reduce.

(1)Choose k initial value for the input image.
(2)Apply Map-Reduce ∀ k.
 iterate (point, centroids)
 {Assign points => clustersmap {(point => (point nearest to centroids) => (point, 1), (point n, count n))}reduce {((point1, count1), (point2, count2)) => (point1 + point2, count1 + count2)}Total_mapValues {(point_Total, count_Total) => point_Total/count_Total}
(3)Record the new means (Updated Centroid) post Map-Reduce.
(4)Circulate the updated centroid to each node in the cluster.
(5)If (converged (centroids, Updated_Centroids))
Then finish execution
Else iterate (points, Updated_Centroids)

Our experimental setup consists of 4 machines comprising of 1 master and 3 slaves. The master node is responsible to take the input image files. For the proposed experiment, Hadoop (version 2.7.1) framework was used in order to implement the parallel computation using 4 nodes. Figure 2 shows the setup configuration. The setting of Hadoop parameters was amended as shown in Table 3. Table 4 shows the specs of the nodes.

3.7. Thresholding of Processed Samples

For the analysis of asphalt mixture specimens post k means clustering, HSV thresholding was used (see Appendix section) [55].

In the proposed experiment, background and foreground segregation was significant to gain clear visibility of the ROI. For instance, the HSV color space for the boundary condition {[0, 1], [0, 1], [0, 255]} belongs to {Hueinput, Satinput, Valinput}, the region of interest (ROI) being a specific tuned color object. Then the computation of the threshold pairs, i.e., (Huelower, Hueupper), (Satlower, Satupper), and (Vallower, Valupper) could be utilized to convert the HSV color space image to the binary form using the below equation:

Here C (x, y) is the segmented part. The equation (13) illustrates that if the HSV values for the pixels of the input image lies within the range of lower bound to upper bound values, then its associated output pixel belongs to class object 1, otherwise it gets designated to null (0) [56].

The flowchart in Figure 3 illustrates the HSV thresholding along with computation of zero and nonzero pixels. Figure 4 shows different images of specimens from the WMA dataset (Dry, 1FT, and 3FT) and Figure 5 shows its raw RGB model whereby it could be observed that the variable intensities of the brown component represents the adhesion failure in WMA samples. Figure 6 shows the different images of specimens from the HMA dataset and Figure 7 shows its associated raw RGB model. From the visual assessment point of view, the white region within the sample represents broken aggregates. If the RGB model is observed minutely for WMA and HMA samples, then a slight difference could be observed with respect to the brown component pattern for WMA and HMA samples, i.e., higher degree of the brown component is visible for HMA samples if compared to WMA for all the three categories, i.e., Dry, 1FT, and 3FT. (Figures 47).

4. Results and Discussion

The proposed work highlights the efficiency of PKIP algorithm for CPU-based execution by testing several datasets of asphalt mixtures. Analysis was done for the performance of proposed algorithms in terms of execution time and accuracy. For this study, three categories of dataset were evaluated, namely, dry, 1FT, and 3FT. At the same time, the influence of the mixing temperature (i.e., 180°C for HMA and 170°C, 160°C and 150°C for WMA) and type of filler (OPC and PMD) were analysed as well. The details of the different datasets are shown in Table 5.

In order to remove the noise, median filter was applied, and subsequently, the images of WMA along with its RGB color model and HMA samples along with its RGB color model is shown in Figures 8, 9, 10, and 11. After having a clear observation of the filtered images and its RGB color model, it could be seen that substantial noise elements have been blurred and the sample images has been polished. Post the application of median filter, the process of k means clustering becomes fine-tuned and thus helps in providing the accurate results.

Before initiating the explanation of CPU-based performance, it is required to highlight the accuracy of its execution timing result. With respect to execution time, it is important to note that there were fluctuations in the reading after each run of PKIP algorithm for every sample; therefore, the recorded errors were the best-obtained ones post four runs of the proposed algorithm. This performance comparison is shown in Figure 12. The fluctuation in the obtained values was within the range of from the best recorded value. Moreover, it is necessary to clarify that this error is due to the communication overhead among the threads (virtual version of CPU core) which results is extra waiting time.

As mentioned earlier in Section 3.5, with respect to the PKIP, the algorithm constitutes two stages, i.e., stage I involves the kernelization phase and stage II involves iterative procedure to compute the distance of every pattern to the different center of clusters. As stage II is an iterative process, it has to complete a predefined maximum number of iterations to attain the convergence. With respect to the proposed experiment, for each value of k, the cost of k means function was computed using vegas-viz plot library for Scala (see Appendix section) [57]. The best choice for k was chosen by observing the value which got minimized with little return gain. For the proposed PKIP algorithm, the best segregation was obtained by setting the cluster size to 10. Figure 13 shows one of the specimens (WP170 subjected to 3FT) with varying number of clusters (k). It is observed from each clustering stage that the different intensities of brown color (adhesion failure) are getting normalized. By keeping the value of k = 10, a fine segregation is achieved with respect to different intensities of color component which makes it a favorable factor for HSV thresholding. For the proposed experiment, the noise was removed using median filter function. From the visual assessment also, it is observed from Figure 13 that post cluster 5, maximum degree of segregation is achieved, or the intra-cluster variation has reduced within the sample region. However, we have chosen k = 10 as per the k value and cost of k means function as the best case.

Figure 14 shows the RGB color model along with histogram in RGB and HSV color space for the sample input image. In the RGB model, the encircled portion represents the adhesion failure. It is important to note that before initiating the clustering process, histogram equalization was applied on input image to maintain the uniform luminosity. It is observed from the histogram that most of the variation in the RGB components corresponds to a uniform lower range value of the pixel (low luminosity). However, at the extreme right of the histogram, higher peaks of RGB is to be seen, which clearly signifies clustered noise with high luminosity. It is also essential to note that the more the luminosity is, the higher the value in HSV histogram is.

4.1. Discussion on CPU Implementation

In regard to the CPU implementation of the proposed PKIP algorithm, several ideas are noteworthy, especially with respect to the parallel code which leverages on the number of CPU thread(s) which is considered to be a virtual version of CPU core. Figure 15 shows the effect of incrementing the threads to execute PKIP clustering on different asphalt mixture datasets for k = 10. As shown in Figure 15, increasing the number of threads from 1 to 2 brings down the computational time in a nonlinear manner. This is due to the fact that all stages of computation involved in equation (8) do not scale directly with the available cores. For equation (8), stage I which does not possess iterative computation showed higher degree of linear scaling compared to stage II. This particularly signifies that all the components of stage II were not completely parallelized, and there was some composition of serial execution remaining in it. It is evident from the execution time of C program (serial execution) and C + OpenMP (see Appendix section) program (parallel execution) that there is an approximately 25% to 30% improvement in terms of execution time with respect to C + OpenMP program, i.e., parallel execution of k means clustering.

Moreover, when the number of threads is increased from 2 to 3, then it could be observed from the execution timing results that there is not much difference between single thread and triple thread results. In addition to this, when the number of threads was increased from 3 to 4, then there was an approximately 25% to 30% increment in the execution time of C + OpenMP. Therefore, it is evident from the obtained results that increasing the number of threads does not always increase the execution time subjected to the specification of the CPU. With context to effect of hyper-threading, refer to Appendix section.

Tables 6, 7, and 8 show the execution timing (sec) results of PKIP algorithm implemented both sequentially and parallelly on the WMA and HMA samples. Table 9 shows the gain in speedup between sequential implementation using C program (Serial Execution) and the parallel implementation using C + OpenMP for k = 10.

It is inferred from Tables 6, 7 and 8 that when the number of clusters (k) increases then PKIP algorithm scales well if compared with the serial or sequential programming for all the datasets. In particular, the best timing difference was obtained for k = 2, between parallel k means and sequential. It can be observed that for dry conditions, a wider gap by considering lower execution time for C + OpenMP is 9.49 sec, similarly for k = 4, 6, 8, and 10, the associated timing difference is 14.84 sec, 22.30 sec, 30.73 sec, and 37.17 sec, respectively. For 1FT condition specimens, the best timing difference for k = 2, between parallel k means and sequential k means is 9.43 sec while for k = 4, 6, and 10 the associated timing difference is 18.87 sec, 28.43 sec, 38.34 sec, and 47.47 sec, respectively. Finally, for the specimens conditioned with 3FT, the best timing difference for k = 2, between parallel k means and sequential k means is 9.03 sec, whereas for k = 4, 6, 8 and 10 the associated timing difference is 18.07 sec, 22.34 sec, 31.66 sec, and 45.18 sec, respectively.

4.2. Map-Reduce-Based Implementation

Tables 10, 11, and 12 compare the execution timing result of Map-Reduce-based implementation of PKIP algorithm over single node and multiple nodes for Dry, 1FT, and 3FT image datasets. Table 13 shows the speedup comparison of Map-Reduce-based PKIP algorithm implementation with its sequential implementation using C-program and parallel implementation using C + OpenMP. It could be inferred from the results that the Map-Reduce based PKIP algorithm scales well with the increase in nodes. For the execution with respect to single-node implementation of PKIP algorithm, an overhead resulted post splitting of algorithm in Map and Reduce phase. It is also essential to note that Map-Reduce-based PKIP algorithm implementation can outperform C + OpenMP-based PKIP implementation by increasing the number of compute nodes due to the fact that C + OpenMP cannot be implemented over distributed nodes. After performing the initial experiments, we observed that Hadoop block sizes should depend upon the input image dataset. In Hadoop, the data files are divided into blocks. When the block size is too small, the number of data block collaborations involved may increase, thereby affecting the overall results [58].

If we closely observe the execution timing in Tables 1013, then we can see some similarity between the execution timing of C + OpenMP and single-node implementation of PKIP algorithm for k = 10. Moreover, up to 26% reduction could be observed in execution timing for k = 10, implemented over 4 nodes if compared to single-node execution timing. However, only up to 4.3% reduction could be observed in execution timing for k = 4, implemented over 4 nodes if compared to single node execution timing.

Obtained results illustrate the superiority of the proposed Map-Reduce-based PKIP algorithm when applied to different image datasets. Map-Reduce-based PKIP algorithm enhanced the clustering response time by leveraging on Hadoop framework characteristics. Impact of the number of compute nodes used on the clustering algorithms can also be studied for 3 categories of image dataset by increasing the number of compute nodes (i.e., from 2 to n). Scalability of Map-Reduce based PKIP algorithm can be tested by researchers by maintaining the same type of sample image datasets but varying the cluster size.

With respect to the measure in accuracy of execution timing for PKIP implementation using Map-Reduce, the recorded errors were the best-obtained ones post four runs similar to the case of C + OpenMP implementation. Figure 16 shows the fluctuation or the accuracy in execution time reading for Map-Reduce-based PKIP implementation. For the PKIP implementation using Map-Reduce, the fluctuation in the obtained values were within the range of from the best recorded value. The probable reason for this fluctuation is due to the communication cost of mapping and reducing. Moreover, in this regard, researchers are free to explore the possibility of reducing this fluctuation by optimized scaling and tight scheduling of Hadoop parameters. It is also to be observed that the fluctuation in the execution time reading is lower if compared to C + OpenMP readings. This shows the stability of Hadoop’s Map Reduce framework when scaling is taken into account. However, in terms of single-node implementation, OpenMP may possess lesser degree of fluctuation if compared to Hadoop subjected to the system specification.

4.3. Precision, Recall, and F-Measure Evaluation

Precision, recall, and F measure are known as the influencing parameters for the evaluation purpose of the proposed PKIP algorithm. The dataset contains 600 image samples comprising of HMA and WMA. The samples were categorized as 300 positive samples and 300 negative samples. The positive samples were the one with adhesion failure, whereas the negative samples were the HMA samples with less than 0.5% adhesion failure. The sample images were cropped into 500 × 500 pixels and its labelling was done using MS Paint. For the training purpose, the sample images were randomly divided.

Precision is the proportion of genuine positives to the cases that are anticipated as positive. For the proposed experiment, the genuine positives are the pixels that are correctly identified. It is the level of chosen cases that are correct and is computed as follows:

Recall is the proportion of genuine positives to the all ground truth pixels. The ground truth pixels were the all segmented pixels. Recall is computed as follows:

To determine the weighted average of precision and recall, the parameter of F-measure is used which is computed as follows:

For the initial tests, different patch sizes considered as the container of pixels were used with the dimensions 3 × 3, 5 × 5, and 9 × 9. Table 14 shows the impact of patch size on the influencing parameters. It could be clearly observed that the precision, recall, and F value is almost same for 3 × 3 and 5 × 5. For 9 × 9, the precision values are slightly greater than the other two; however, the execution time is substantially high if compared to 3 × 3 and 5 × 5. Therefore, for the training evaluation, we have used 3 × 3 patch for different values of k. The analysis of influencing parameters has been implemented over single node using sequential processing.

Table 15 shows the impact of k value on the influencing parameter, and it could be clearly observed that k = 10 gives the highest F value.

4.4. Analysis of Adhesion Failure Using HSV Thresholding

Tables 16, 17, and 18 represent the percentage of adhesion failure after moisture conditioning (freeze-thaw cycles) and indirect tensile tests for all the k means clustering-based asphalt mixture image samples (i.e., WMA and HMA prepared with PMD and OPC). The images processed using HSV image thresholding for a WMA sample image dataset and HMA sample image dataset for all three moisture sensitivity test is shown in Figures 17 and 18, where the number of clusters has been set to ten (k = 10), and the composition of nonzero and zero pixels were computed using equation (13) to determine the percentage of adhesion failure for all asphalt mixture datasets. For the proposed experiment, Scala version 2.11 programming language has been used for the zero and non-zero-pixel computation.

From Tables 1618 it becomes evident that the percentage of adhesion failure in HMA and WMA sample subjected to different level of moisture conditioning process depends on mixing temperature and also the type of fillers used. The effect of lower mixing temperature on percent adhesion failure of WMA mixtures has consistent increment in adhesion failure at different levels of moisture conditioning. Moreover, for both the categories, i.e., HMA and WMA, the PMD filler comes out to be better if compared with OPC filler, and in this regard, a difference in adhesion failure for up to 4-5% could be observed in 3FT condition. Section 5 discusses some of the key elements regarding the outcome of the results pertaining to PKIP implementation.

5. Discussion

With respect to the analysis of execution timing of PKIP using C + OpenMP and sequentially, it was inferred from Tables 68 that the best execution timings are obtained from the processing of WP150, WP160, and WP170 samples. Further research is required to find the reasoning behind the faster execution time for these samples. From one of the recent studies [2], it is essential to note that in terms of stripping analysis using k means algorithm, the larger execution time arises due to classification of dark shaded areas in the image sample. This reason prevails for the proposed experiment also. Moreover, in order to reduce the execution time of the k means clustering, it is not recommended to downsize the resolution of image sample as it results in loss of pixels which will ultimately affect the accuracy of the final results significantly. From the speedup Table 9, it could also be inferred that a maximum speedup of 1.41X is obtained for WP150 and WP170 for the case of 1FT dataset which in turn implies that execution time using C + OpenMP got reduced up to 30% for these datasets.

After analysing the Map-Reduce implementation of PKIP, it could be inferred from speedup Table 13, that for single node implementation, C + OpenMP-based k means implementation is a viable option if k is lesser than or equal to 5 subjected to the specs of the system. However, if we want to increase the number of clusters up to 10+, then Map-Reduce-based implementation of PKIP algorithm is highly suitable as it can give speedup up to 1.86X, i.e., 46.5% reduction in execution timing for the cases of WP150 and WP170 in 1FT dataset category if compared directly with sequential implementation of k means using C Program. Higher gain in speedup is possible if more compute nodes are added to the cluster. For the proposed experiment, only 3 additional compute nodes were added. It is essential to note that Hadoop framework has a vast set of scheduling configuration. Therefore, users can exploit these configurations to exploit the maximum resource benefits from Hadoop and thus explore the possibilities of better execution timing results compared to default scheduling algorithms. This also highlights the fact that more room of optimization prevails in the Map-Reduce algorithms. It is deemed necessary to understand that the execution timing plays a prominent role in clustering process. The proposed Map-Reduce-based PKIP algorithm is well suited for clustering large datasets compared to sequential k means algorithm.

It was also be depicted from Tables 1618 that the WMA specimens prepared using PMD fillers when subjected to 3FT condition show relatively less damage evolution in percentage of adhesion failure if compared to 1FT and dry condition. However, when HMA prepared using PMD filler was subjected to 3FT condition, the evolution of the percentage of adhesion failure was only slightly higher if compared to 1FT and dry conditions. It is to be noted that an environment with indirect light condition should be used in order to prevent light reflections. If such condition is inevitable to avoid, then a heavy usage of advanced filters will be required which will consume higher processing time. A light blue or cream background is suitable to use as a background for the image analysis, as it becomes easier to differentiate between the foreground and the background prior to the implementation of thresholding process. For the proposed experiment apart from differentiating the stripping using adhesion failure, fractured surfaces due to broken aggregate were also observed. Resultant image sample subjected to different freeze-thaw cycles show that large areas of non-zero pixels (brown color pixels) are reduced with the addition of the Cecabase warm mix additive in general. The results therefore indicate that the surfactant-based chemical additive has worked well to substantially reduce the exposure to moisture in mixes in particular with the use of PMD filler. In addition, the average stripping rate of WMA blend treated with PMD anti-stripping agents was less than 5.5 percent compared to the maximum level of 11% attained by HMA mixture using OPC filler after 3FT cycles. Amongst the different level of moisture conditioning (dry, 1FT, and 3FT) cycles, both HMA and WMA mixtures with OPC filler have shown the highest percentage of adhesive failure (Figure 1(a)). On the other hand, the use of PMD anti stripping agent in both HMA and WMA has shown relatively lower amount of percent adhesion failure in comparison with OPC filler using different conditioning cycles. The analysis of the results indicates that the amount of coating has been increased in mixes that contain PMD filler using WMA additives irrespective of the conditioning cycles used.

6. Conclusion

From the experimental results, it is concluded that the proposed PKIP algorithm is a viable and effective alternative to assess the percentage of adhesion failure in asphalt mixture and thereby encourages an approach to identify the extent of moisture damage in warm mix asphalt. It is important to be noted that in this experiment, high-resolution (5520 × 3680) images of asphalt mixtures were used to keep the image quality intact. Therefore, when PKIP algorithm was applied on high-resolution asphalt mixture image using OpenMP and Map-Reduce, the execution time got reduced up to 30% and 46% if compared with sequential processing of k means clustering. Nonetheless, researchers are encouraged to apply the proposed PKIP algorithm on quadcore and octacore compute nodes to exploit its efficiency to the best possible extent. Moreover, a distributed version of k means clustering can also be applied in 3D image analysis of asphalt mixture by experimenting with different types of filters [39].

From the results obtained in terms of adhesion failure, with respect to HMA, WMA specimens prepared using PMD filler in dry condition have higher moisture resistance. It is also evident that HMA mixtures with OPC filler after three freeze-thaw (3FT) cycles were more prone to moisture damage. However, the WMA mixture with PMD filler was considered more efficient if compared to 1FT and Dry specimens, as the increase in percent adhesion failure was not too high if compared to Dry and 1FT. The percent adhesion failures in WMA mixtures subjected to 3FT cycles were more prevalent than unconditioned mixtures (Dry) and mixtures subjected to 1FT conditioning, confirming that PG-76 based WMA mixtures when subjected to 3FT conditioning compared to HMA were more efficient. Therefore, to analyse the adhesion failure in asphalt mixtures, PKIP algorithm could potentially be used to quantify moisture damage due to its high accuracy and lower execution time.


Hue Saturation Value (HSV) Threshold. The hue (H) component of any color represents its similarity to any of the purest form of color, while the saturation (S) component of any color shows the non-white element present in it. The value () component of any color shows its subsequent darkness or lightness.k medians is a variation of k means where instead of calculating mean to compute centroid of the cluster, median is computed.K-medoid minimizes the sum of dissimilarities between points labelled to be in a cluster and a point designated as the center of the cluster.Note on Hyperthreading. It is essential to note that the CPU having two physical cores have the capability to process four threads parallelly using hyper threading. However, practically, one physical core is designated to run only single thread but with the help of hyper threading, the CPU is capable to exploit the idle pipeline stages to execute another thread. By doing this, it gives an illusion of a separate physical core to the Operating System. Certain sections of the processor are duplicated to store the architectural state, but the main execution resources are not. A basic RISC processor instruction pipeline comprises of five stages, namely, instruction fetch (IF), instruction decode (ID), memory access (MEM), execute (EX), and register write back (WB). One thread can be in the MEM stage and another thread can be in the IF stage. There are some hazards though. If, for example, the MEM stage is dependent on the EX stage, then it could cause problems. Hyper threading aims at minimizing the number of dependent instructions so that another thread may be run in the idle pipeline stages without dependency issues. With respect to OpenMP code parallelization, data points and cluster centres were marked within the shared construct. Moreover, under the shared construct, the iterative matrix multiplication as mentioned in equation (8) third term was implemented using omp for construct.OpenMP. Library to implement parallel programming for shared-memory-processors.Otsu’s threshold. An algorithm used for image segmentation which returns a single-intensity threshold which separates foreground and background pixels.. Here n is the number of objects, k is the number of cluster and i is the number of iterations.Scala. A general purpose programming language which supports both object oriented and functional paradigm of programming.

Data Availability

The input image samples of asphalt mixtures used in this study are available in the Mendeley data repository via the link: https://data.mendeley.com/datasets/x8ptnsjd7x/1 DOI: 10.17632/x8ptnsjd7x.1. The source code for PKIP is available at https://github.com/nishatakhtar/PKIP.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


The proposed experiments have been carried out in School of Aerospace Engineering and School of Civil Engineering of Universiti Sains Malaysia. The authors would like to acknowledge technical staff of Highway Engineering Lab, School of Civil Engineering, Universiti Sains Malaysia, for their kind support in fulfilling this project. The authors would like to acknowledge the RU-Top-Down grant (1001. PAERO.87052) and RUI grant (1001.PAERO.8014035) provided by the Research Creativity and Management Office, Universiti Sains Malaysia, to support this research. In addition, the authors would like to acknowledge Universiti Teknologi Petronas and its Industrial Grant (015MD0-052).