Abstract

Most building structures today are built from concrete, owing to its various favorable properties. Compressive strength is one of the mechanical properties of concrete that is directly related to the safety of the structures. Therefore, predicting the compressive strength can facilitate the early planning of material quality management. A series of deep learning (DL) models that suit computer vision tasks, namely convolutional neural networks (CNNs), were used to predict the compressive strength of ready-mixed concrete. To demonstrate the efficacy of computer vision-based prediction, its effectiveness using numerical data converted to images was compared with that of the deep neural network (DNN) technique, which uses conventional numerical data. Various DL prediction models were compared and the best ones were identified with the relevant concrete datasets. The best DL models were then optimized by fine-tuning their hyperparameters using a newly developed bio-inspired metaheuristic algorithm, called the jellyfish search optimizer, to enhance their accuracy and reliability. Analytical experiments indicate that the computer vision-based CNNs outperform the numerical data-based DNNs in all evaluation metrics except the training time. Thus, the bio-inspired optimization of computer vision-based convolutional neural networks is a promising approach to predicting the compressive strength of ready-mixed concrete.

1. Introduction

Structures like buildings, bridges, highways, and dams are currently built using concrete as their construction material, owing to its numerous advantages, such as strength, durability, and versatility. Its compression capacity, adaptability, and resistance to climate-induced erosion and corrosion make concrete one of the best construction materials. Compressive strength is one of the principal mechanical properties of concrete that is directly related to the safety of the structures that are built from it. The compressive strength of concrete must comply with relevant standard codes, which vary among countries.

To determine the compressive strength of concrete, a cubic or cylindrical sample is typically tested using a compressive testing machine after the required curing time. These tests are labor-intensive and time-consuming. Methods such as regression methods and numerical simulation have been proposed to solve this problem and to predict the compressive strength of concrete. However, the complex nonlinear correlation between relevant variables makes the obtaining of accurate values of compressive strength very difficult.

With advances in artificial intelligence (AI) and increases in computing power [1, 2], deep learning (DL) is being applied in an increasing number of fields. DL, which is a form of AI, has been shown to make more accurate predictions than conventional methods in many situations. One DL technique, computer vision, is often used to extract information from visual media, such as images and videos. Computer vision-based techniques are used in various fields and are effective for image classification, object detection, and semantic segmentation.

Several studies of the prediction of concrete compressive strength have involved the use of DL techniques [3] to improve model performance, but few have involved image recognition. The latest study of the use of image recognition to determine compressive strength had good results, but it used only 74 sets of concrete data and a single-layer convolutional neural network [4]. To examine and improve the effectiveness of image recognition, in this study, a large dataset of ready-mixed concrete is used with convolutional neural networks (CNNs) that involve a prediction model with deep layers to extract high-level features from inputs.

Model accuracy is often evaluated with the use of a cross-fold validation or random split method to partition the source data for the testing of the training model [5]. Such methods are often called into question because overfitting can occur owing to information leakage within the original dataset during the training process. Consequently, when the model is put into practice, it often shows relatively poor forecast performance. Because the concrete data are accumulated over time in ready-mixed plants, the built model should be tested on the most recent dataset so that the reported accuracy reasonably reflects its accuracy in future use.

This investigation tests the effectiveness of a computer vision-based approach that predicts the compressive strength of ready-mixed concrete by converting numerical data to images. Most research on the prediction of the compressive strength of concrete uses numerical data as inputs. Here, collections of numerical values are represented as images and input to a DL technique that uses CNN-based models, which have been shown to provide accurate image classification in the domain of computer vision. The results thus obtained are compared with those obtained using numerical data directly.

The effectiveness of the computer vision-based technique was tested by comparing the results with those of another DL technique that uses deep neural networks (DNNs) with numerical data for model construction. To maximize the accuracy, a metaheuristic optimization algorithm was used to fine-tune the hyperparameters of the best DL models. Instead of using the cross-fold validation or random split method within the original dataset, a dataset newly collected in the following year was used for the testing of the training model. This approach meets the practical needs and operations in estimating compressive strength at ready-mixed concrete plants.

This paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the methodology and performance metrics that are used herein. Section 4 presents the collection and preprocessing of data, implementation of the DL models, the experimental results obtained using the optimized DL models, and sensitivity analysis of modeling performance. The final section summarizes the findings and limitations of the method and makes recommendations for future studies.

2. Literature Review

2.1. Conventional Compressive Strength Prediction of Ready-Mixed Concrete

Ready-mixed concrete is typically manufactured in a concrete plant before being transported to a construction site. In a concrete plant, ready-mixed concrete is manufactured by combining several raw materials with a specific design mix ratio to create concrete with certain desirable properties. Figure 1 presents the manufacturing process of ready-mixed concrete.

The compressive strength of concrete is commonly tested using a compression test machine, which performs a mechanical test to measure the maximum compressive load that can be borne by a concrete sample [6]. Before testing, the sample must be cured for a specified curing period. Non-destructive tests (such as ultrasonic or pulse velocity tests [7] and conductivity tests [8]) have also been proposed to determine the compressive strength because the standard compression test value correlates poorly with the real strength of concrete in a structure. These tests, however, have disadvantages with respect to time, cost, and labor.

Owing to the disadvantages of mechanical tests, empirical models [9, 10] for calculating the compressive strength of concrete have been developed. Empirical methods (e.g., multiple linear regression), however, have been shown to be somewhat ineffective for calculating the compressive strength of concrete because of the nonlinear behavior in relevant concrete variables. The compressive strength of concrete is influenced by numerous factors, as it is formed by complex reactions among concrete materials (such as cement and aggregates) and the environment (as in curing) [11].

2.2. Deep Learning to Determine Concrete Compressive Strength

In recent years, the field of artificial intelligence (AI) has grown very rapidly. AI methods are used in a wide variety of fields, including seismology [12], energy systems [13], and civil engineering [14]. In several studies, AI has been used to determine the concrete compressive strength, using real data for concrete to build a prediction model. During the training of the prediction model, various composite materials of concrete, such as cement, water, sand, and gravel, are used as predictors to yield a model that best fits the given training data. After validation, the model is then used to predict the compressive strength.

An advanced branch of AI, deep learning (DL), has performed excellently in fields such as computer vision [15]. Many studies [16–18] have shown that DL exhibits outstanding prediction performance, especially in image and video recognition. In this field, the commonly used DL techniques include those based on convolutional neural networks (CNNs) [19]. A recent study confirmed that the CNN model (Visual Geometry Group, VGG) achieved a 98% accuracy in concrete compressive strength prediction, which was 2% and 12% greater than the machine learning models, random forest (RF) and support vector regression (SVR), respectively [20].

2.3. Hyperparameter Optimization with Metaheuristic Algorithm

In the training of the DL models, additional optimizers are often required, as the models have several hyperparameters (such as the epsilon of batch normalization, batch size, epoch, learning rate, and dropout rate) that influence their predictive performance [21, 22]. To find the values of hyperparameters that yield the best prediction model, optimization algorithms (such as the greedy algorithm [23]) that are based on iterative methods (such as gradient descent [24]) or heuristic methods [25] are often used. However, such methods do not always lead to the optimal solution and can consume considerably more computational time than modern metaheuristic algorithms.

The metaheuristic algorithm, with its ease of implementation and effectiveness in various fields, is becoming increasingly popular for use in solving optimization problems. Recently, several newly developed metaheuristic optimizers have outperformed the well-known metaheuristic algorithms [26, 27]. The Jellyfish Search (JS) algorithm [27], in particular, has great efficacy because it requires little tuning of algorithm-specific parameters. Consequently, the JS algorithm was used in this study to optimize the DL models.

3. Methodology

3.1. Deep Learning and Computer Vision-Based Techniques
3.1.1. Deep Neural Networks

Artificial neural networks (ANNs) consist of information processing units that are arranged in layers similar to neurons in the human brain. An ANN typically comprises layers of three types: an input layer, hidden layers, and an output layer. The architecture that is used in a deep learning model typically consists of more than four hidden layers. Figure 2 displays a simple ANN model architecture.

An input layer receives data and an output layer generates a prediction. In the hidden layers, inputs are processed and the information that is obtained from the processes is passed to the next layer. Values from the input layer are transformed by multiplying them by weights and adding bias values.

Several types of ANN vary in implementation. A fully connected neural network is an ANN that consists of connected neurons. In such an ANN, all neurons in a layer are connected to the neurons in the next layer. Likewise, standard feedforward neural networks (FNNs) consist of numerous connected neurons, and each connection transmits information to other neurons in one forward direction [28].

Notably, internal hyperparameters affect the learning of an ANN model. A hyperparameter is a constant parameter that is set before the training begins. Some examples of hyperparameters in ANNs are the number of hidden layers, learning rate, batch size, and epoch. In contrast, parameters such as weights and bias values change throughout the learning process.

A deep neural network (DNN) is a neural network that differs from a typical ANN with respect to architecture. DNNs have multiple hidden layers (Figure 3) that are used to extract high-level features from the input data. Additional layers typically correspond to additional parameters (such as weights and biases) in a model. Accordingly, DNNs can capture complex nonlinear relationships [28].

3.1.2. Convolutional Neural Network-Based Models

A convolutional neural network (CNN or ConvNet) is a connected neural network that is generally effective for solving computer vision problems, such as image feature extraction, classification, object detection, and semantic segmentation. A CNN commonly learns patterns by processing image or video data. It can detect objects, identify the locations of the objects, and differentiate or segment them inside an image.

A generic CNN usually comprises an input layer, multiple hidden layers, which include convolutional layers, pooling layers, fully connected layers, and dropout layers, and an output layer (Figure 4). In the input layer, the model receives images as inputs and creates input tensors that contain the pixel values of the images. Input matrixes of dimensions w × h × c are then fed to the hidden layer, where w represents the width of the image, h represents the height of the image, and c represents the number of channels. A standard colored image typically has three channels for red, green, and blue.

The convolutional layer in the CNN model processes the input matrixes (n × n) into smaller forms while retaining their features by generating weight values of a filter or kernel of a certain size (m × m) and then multiplying the filter by the input matrixes. The convolution operation is defined as follows [29]:

$$C = I \otimes F$$

Here, I is the input image data; F is the filter; ⊗ denotes the convolution operation; C is the convolution map of size (o × o), in which $o = \frac{n - m + 2\,zp}{s} + 1$; s is the stride and denotes the number of pixels by which F slides over I; and zp is the zero padding. Usually, it is necessary to add a border of zeros around I to preserve complete image information. The products thus obtained are summed (Figure 5). Sliding over all parts of the input matrixes, the convolutional layer generates, as an output, a new feature map of certain features in the image.
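The sliding-window computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single-channel square input of side n, a square filter of side m, and the standard output-size relation o = (n − m + 2·zp)/s + 1; the function names `conv_output_size` and `conv2d` are hypothetical and are not taken from the study. As in most DL frameworks, the filter is applied without flipping.

```python
def conv_output_size(n, m, s=1, zp=0):
    """Side length o of the convolution map for an n x n input and m x m filter."""
    return (n - m + 2 * zp) // s + 1


def conv2d(image, kernel, s=1, zp=0):
    """Slide `kernel` over a zero-padded `image`, summing element-wise products."""
    n, m = len(image), len(kernel)
    # Surround the input with a border of zp zeros.
    size = n + 2 * zp
    padded = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            padded[r + zp][c + zp] = image[r][c]
    o = conv_output_size(n, m, s, zp)
    out = [[0.0] * o for _ in range(o)]
    for r in range(o):
        for c in range(o):
            # Multiply the filter by the patch under it and sum the products.
            out[r][c] = sum(
                padded[r * s + i][c * s + j] * kernel[i][j]
                for i in range(m)
                for j in range(m)
            )
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
feature_map = conv2d(image, kernel)  # 2 x 2 map, since o = (3 - 2) / 1 + 1 = 2
```

With zero padding of one pixel (`zp=1`), the same input yields a 4 × 4 map, which shows how padding preserves information at the image border.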

After the multiplication processes, a CNN model typically applies an activation function that introduces nonlinearity to the model to help it learn complex patterns in the data. A general form of activation function is defined as follows:

$$C^{a} = f(C)$$

Here, $C^{a}$ is the convolution map after applying the nonlinear activation function $f$. Of the many available activation functions, the rectified linear unit (ReLU) is commonly used, as it provides better training results than other activation functions [30]. A ReLU function is a simple calculation that returns the original input value if it is positive and returns zero if the input is less than or equal to zero (Figure 5).
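As a small illustration of the activation step, the following sketch applies ReLU element-wise to a convolution map. The helper names `relu` and `apply_activation` are illustrative assumptions, not part of the study's code.

```python
def relu(x):
    """Return x when it is positive, otherwise zero."""
    return x if x > 0 else 0.0


def apply_activation(conv_map, activation=relu):
    """Apply an activation function to every element of a 2D convolution map."""
    return [[activation(v) for v in row] for row in conv_map]


activated = apply_activation([[-1.5, 2.0], [0.0, -3.0]])
```

Negative and zero entries are mapped to zero while positive entries pass through unchanged, which is the source of the nonlinearity.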

The pooling layer in the model reduces the size of the input matrixes, thereby reducing the number of parameters and the amount of computation in the network and helping to prevent overfitting. Similar to a convolutional layer, a pooling layer takes several input values inside a filter from the previous layer, and the filter is shifted over some pixels at a time until all parts of the input matrix are processed. Common pooling layer types are average pooling and max pooling (Figure 6). The pooling operation, also called the downsampling operation, is expressed as follows:

$$P = g(C^{a})$$

where $P$ is the pooling map, $g$ is the pooling operation, and $C^{a}$ is the convolution map after activation.
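The two pooling types can be sketched as follows. This is a minimal example that assumes a 2D input and a square window; the name `pool2d` and its parameters are hypothetical.

```python
def pool2d(matrix, size=2, stride=2, mode="max"):
    """Downsample a 2D matrix with max or average pooling."""
    rows, cols = len(matrix), len(matrix[0])
    reduce = max if mode == "max" else (lambda w: sum(w) / len(w))
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values under the pooling window.
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce(window))
        out.append(out_row)
    return out


m = [[1, 3, 2, 4], [5, 6, 1, 2], [7, 2, 9, 0], [1, 8, 3, 4]]
max_pooled = pool2d(m, mode="max")      # 4 x 4 input reduced to 2 x 2
avg_pooled = pool2d(m, mode="average")
```

A 2 × 2 window with a stride of 2 quarters the number of values passed to the next layer.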

After the operation of several convolutional layers and pooling layers, a CNN model typically flattens the output matrix of the previous layer into a single vector of values. The single vector of values is input to a fully connected layer to extract the features that were learned in the previous layers and to classify the input images. In this layer, the probabilities that an object in the input image is a member of the possible classes are calculated. The model output of the ith fully connected hidden layer is expressed as follows [29]:

$$y_{i} = f(z_{i})$$

where the weight sum vector is

$$z_{i} = w_{i}\, y_{i-1} + b_{i}$$

Here, $w_{i}$ is the connected weight of the artificial neurons, $f$ is a nonlinear activation function (e.g., sigmoid, tanh, or ReLU), and the bias value $b_{i}$ defines the activation level of the artificial neurons.
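The flatten-then-dense computation can be sketched as below. The example assumes a sigmoid activation and small hand-picked weights purely for illustration; `flatten`, `dense`, and `sigmoid` are hypothetical helper names.

```python
import math


def flatten(matrix):
    """Turn a 2D feature map into a single vector of values."""
    return [v for row in matrix for v in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Compute f(W x + b) for one fully connected layer."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


vector = flatten([[1.0, 2.0], [3.0, 4.0]])
weights = [[0.5, -1.0, 0.5, 0.0], [1.0, 1.0, 1.0, 1.0]]  # illustrative values
biases = [0.0, -10.0]
outputs = dense(vector, weights, biases, sigmoid)
```

Both weighted sums equal zero for these illustrative values, so each neuron outputs sigmoid(0) = 0.5.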

In neural networks, when the parameters of a layer change, so does the distribution of inputs to subsequent layers. These shifts in input distributions can be problematic for neural networks. To alleviate this concern, many normalization operations, such as Batch Normalization (BN), Layer Normalization (LN), and Instance Normalization (IN), have been proposed. For example, given an input batch $x \in \mathbb{R}^{n \times c \times h \times w}$ of n samples with c channels, height h, and width w, BN normalizes the mean and standard deviation for each individual feature channel during training [31]:

$$\mathrm{BN}(x) = \gamma \left( \frac{x - \mu(x)}{\sigma(x)} \right) + \beta$$

where $\gamma, \beta \in \mathbb{R}^{c}$ are referred to as the scale and the shift parameters for the channel; $\mu(x), \sigma(x) \in \mathbb{R}^{c}$ are the mean and standard deviation, respectively, computed across batch size and spatial dimensions independently for each feature channel.
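A minimal sketch of per-channel batch normalization at training time follows. It assumes a batch stored as nested lists indexed `[sample][channel][row][column]` and adds a small constant `eps` to the variance for numerical stability, as is common practice (this is the "epsilon of batch normalization" hyperparameter mentioned in Section 2.3). The function name `batch_norm` is hypothetical.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each channel over the batch and spatial dimensions."""
    n_channels = len(batch[0])
    out = [[None] * n_channels for _ in batch]
    for ch in range(n_channels):
        # Gather every value of this channel across all samples and pixels.
        values = [v for sample in batch for row in sample[ch] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        for i, sample in enumerate(batch):
            # Scale by gamma and shift by beta after standardizing.
            out[i][ch] = [
                [gamma[ch] * (v - mean) / std + beta[ch] for v in row]
                for row in sample[ch]
            ]
    return out


batch = [[[[1.0, 2.0]]], [[[3.0, 4.0]]]]  # 2 samples, 1 channel, 1 x 2 pixels
normalized = batch_norm(batch, gamma=[1.0], beta=[0.0])
```

With `gamma = 1` and `beta = 0`, the normalized channel has approximately zero mean and unit variance.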

Adding a dropout layer is an effective regularization technique to improve the generalization capability of models and mitigate overfitting. The dropout function can be formulated as follows [32]:

$$\hat{y}^{l} = y^{l} * m^{l}$$

where ∗ denotes the element-wise product and $y^{l}$ and $\hat{y}^{l}$ are the original and distorted features, respectively. In addition, $m^{l} \in \{0, 1\}^{d_{l}}$ is the binary mask applied to the feature map $y^{l}$, in which $d_{l}$ is the dimension of the feature map of the l-th layer, and each element in $m^{l}$ is drawn from a Bernoulli distribution whose parameter is set by the dropping probability. Implementing dropout on the features in the training phase forces the network to pay more attention to the non-zero regions and partially mitigates overfitting.
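The masking operation can be sketched as follows. In this illustration a mask value of 1 keeps a feature and 0 drops it, and each feature is dropped independently with probability `drop_prob`; the rescaling of kept features that many frameworks apply is omitted for simplicity. The function name `dropout` and its parameters are hypothetical.

```python
import random


def dropout(features, drop_prob, rng=random):
    """Return the distorted features and the binary mask that produced them."""
    # Draw one Bernoulli mask element per feature: 0 means the feature is dropped.
    mask = [0 if rng.random() < drop_prob else 1 for _ in features]
    distorted = [f * m for f, m in zip(features, mask)]
    return distorted, mask


features = [0.2, 1.5, 0.7, 3.1]
distorted, mask = dropout(features, drop_prob=0.5, rng=random.Random(0))
```

Dropout is applied only during training; at prediction time all features are used.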

In this decade, various CNN models and their advanced variants have been developed. Some common and popular CNN models are VGG [33], residual neural networks (ResNets) [34, 35], Inception [36, 37], extreme inception (Xception) [38], MobileNet [39, 40], DenseNet [41], NASNet [42], and EfficientNet [43]. These CNN-oriented models have different architectures, which are briefly introduced as follows:

(1) VGG. VGG [33] uses a very small kernel (3 × 3) rather than one of a previously common size, 5 × 5 or 7 × 7, which would have a wider scanning area. The small kernel is used uniformly throughout all layers. Although the overall architecture is simple, the VGG has an enormous number of parameters. Figure 7 displays the architectures of two common VGG models, VGG16 and VGG19, which comprise 16 and 19 deep layers, respectively. In the figure, the convolutional layer is denoted as “<kernel size> Conv, <filter>.”

(2) ResNet. Increasing the depth of a CNN by adding layers to its architecture up to a certain limit should help the corresponding CNN model to learn more complex features, but a vanishing gradient problem typically prevents the effective training of a CNN model in many-layered networks. A vanishing gradient problem can prevent the weights in the network from being updated, potentially stopping the training of the CNN model. To solve this problem in residual neural networks (ResNets), the network implements “residual connections” or “skip connections.”

A residual connection refers to a shortcut connection that is added inside a CNN architecture to allow information to be passed or added through layers of the convolutional block (Figure 8). In the original ResNet, a shortcut connection is added before the activation function is implemented, while in ResNet v2 [34], activation functions are implemented before the convolutional layer and the shortcut connection is added after. Figure 9 presents the architectures of ResNet50, ResNet101, and ResNet152, which comprise 50, 101, and 152 deep layers, respectively.

(3) Inception. Inception architecture [36] is the first CNN model architecture that exhibits the advantages of branching a convolutional path into multiple paths. In Inception, the CNN model uses filters of various sizes in various paths. At the end of the block, the model concatenates the outputs of the paths. In Inception-v3 [36], the Inception model is improved by changing the original 5 × 5 and 7 × 7 convolution kernels to two 3 × 3 and three 3 × 3 convolutional kernels, respectively. These changes in the architecture help the model reduce the amount of computation that is required during the training process.
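The computational saving from replacing large kernels with stacks of 3 × 3 kernels can be checked with simple arithmetic. The sketch below assumes equal input and output channel counts (64 is an arbitrary illustrative value) and ignores biases; the helper names are hypothetical.

```python
def conv_weights(kernel_size, channels_in, channels_out):
    """Number of weights in one convolutional layer (biases ignored)."""
    return kernel_size * kernel_size * channels_in * channels_out


def stacked_receptive_field(n_layers, kernel_size=3):
    """Receptive field covered by n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel_size - 1)


channels = 64  # illustrative channel count, not taken from the paper
comparison = {}
for large_kernel, n_small in ((5, 2), (7, 3)):
    single = conv_weights(large_kernel, channels, channels)
    stacked = n_small * conv_weights(3, channels, channels)
    comparison[large_kernel] = (single, stacked)
    print(f"{large_kernel}x{large_kernel}: {single} weights; "
          f"{n_small} stacked 3x3: {stacked} weights")
```

Two stacked 3 × 3 kernels cover the same 5 × 5 area with 18 instead of 25 weights per channel pair (28% fewer), and three stacked 3 × 3 kernels cover a 7 × 7 area with 27 instead of 49 (about 45% fewer).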

In Inception-ResNet-v1 [37] and Inception-ResNet-v2 [37], the original inception blocks are converted into residual inception blocks. The Inception-ResNet-v2 model differs from the Inception-ResNet-v1 model in that it is more computationally burdensome. However, it outperforms the original Inception and ResNet models. Figure 10 displays the Inception-v3 and Inception-ResNet-v2 models’ architectures.

(4) Xception. The Xception (or Extreme Inception) [38] architecture (Figure 11) is inspired by the Inception model. In Xception, the original inception blocks are replaced by depthwise separable convolutions. A depthwise separable convolution consists of a depthwise convolution followed by a 1 × 1 convolution. A depthwise convolution is a spatial convolution that performs convolutional multiplications independently over each channel. In depthwise convolution, a convolutional kernel iterates over only one channel of the input, not all channels.
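A depthwise separable convolution can be sketched in plain Python as a per-channel spatial convolution followed by a 1 × 1 convolution that mixes channels. This is a simplified illustration (stride 1, no padding, no activation); the function names are hypothetical.

```python
def conv2d_single(channel, kernel):
    """Stride-1, unpadded convolution of one 2D channel with one 2D kernel."""
    k = len(kernel)
    out_rows = len(channel) - k + 1
    out_cols = len(channel[0]) - k + 1
    return [
        [
            sum(channel[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def depthwise_separable(channels, depthwise_kernels, pointwise_weights):
    """Depthwise step (one kernel per channel), then a 1 x 1 pointwise step."""
    # Depthwise: each input channel is filtered independently by its own kernel.
    filtered = [conv2d_single(ch, k) for ch, k in zip(channels, depthwise_kernels)]
    rows, cols = len(filtered[0]), len(filtered[0][0])
    # Pointwise: every output channel is a weighted sum of the filtered channels.
    return [
        [
            [sum(w * filtered[ch][r][c] for ch, w in enumerate(weights)) for c in range(cols)]
            for r in range(rows)
        ]
        for weights in pointwise_weights
    ]


channels = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]          # 2 channels, 2 x 2 each
depthwise_kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]  # one kernel per channel
pointwise_weights = [[1, 1], [2, -1]]                     # 2 output channels
result = depthwise_separable(channels, depthwise_kernels, pointwise_weights)
```

For a k × k kernel with c_in input and c_out output channels, this factorization uses k·k·c_in + c_in·c_out weights instead of k·k·c_in·c_out, which is why it also suits compact models such as MobileNets.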

(5) MobileNets. MobileNets [39] refer to a type of CNN model whose objectives are to reduce the number of parameters and the number of computations while maintaining accuracy. Accordingly, MobileNets use depthwise separable convolutions. They are typically used in mobile devices or embedded applications, and so have a small architecture. In MobileNets, width multiplier and resolution multiplier hyperparameters are implemented to thin the network and to rescale the input image, respectively.

Similar to the original MobileNet, MobileNetV2 [40] is built for mobile devices. In MobileNetV2, an inverted residual structure, which consists of linear bottleneck layers, is used. An inverted residual structure expands a low-dimensional feature map to a high-dimensional one, uses depthwise convolutions, and projects back features to a low-dimensional representation using a linear convolution. MobileNetV2 has fewer parameters than the original MobileNet. Figure 12 displays the original MobileNet and MobileNetV2 architectures.

(6) DenseNet. The main intent of a dense convolutional network (DenseNet) [41] is to use short connections between layers by connecting the network layers to every other layer in the forward direction. Therefore, the inputs of each network layer include the feature maps of all preceding layers. This approach has been shown to improve the accuracy of a CNN. Figure 13 displays the DenseNet architecture.

(7) NASNet. The neural architecture search network (NASNet) [42] is used to solve the problem of finding a good CNN architecture by finding a neural network architecture or the best combination of parameters in a CNN with a recurrent neural network (RNN) acting as a controller. Figure 14 presents the neural architecture search method that is used in a NASNet model. Figure 15 displays one of the model architectures, NASNet-A, for the mobile version, which is found using the neural architecture search method.

(8) EfficientNet. EfficientNet is a type of CNN model that uniformly scales all depth, width, and resolution dimensions using a compound scaling coefficient. A total of eight CNN models are developed based on this idea. The models are named EfficientNets followed by B0, B1, B2, B3, B4, B5, B6, and B7. The EfficientNet architecture includes a total of seven network blocks (Figure 16). The number of subblocks inside varies with the EfficientNet models that are used [43].

3.2. Metaheuristic Optimization Algorithm: Jellyfish Search Optimizer

One of the challenges that is associated with deep learning models is finding optimal hyperparameters. To solve this hyperparameter optimization problem, a metaheuristic optimization algorithm is frequently used. Considerable research has been done on the development of metaheuristic algorithms, and some of them have become well known for their effectiveness in solving optimization problems [44–46]. The metaheuristic algorithms primarily vary in the balance between their two main phases: exploration and exploitation [47].

A newly developed metaheuristic optimization algorithm, the Jellyfish Search (JS) optimizer [27], has considerably outperformed many other well-known metaheuristic optimization algorithms and it requires less algorithm-specific parameter tuning than some well-known metaheuristic algorithms. The optimizer requires the setting of only two controlling parameters, which are the number of iterations and population size. In a JS optimizer, the population of jellyfish is initialized using a logistic map, which generates varying initial populations.
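The logistic-map initialization can be sketched as follows. The sketch assumes the commonly used chaotic form X_{k+1} = η·X_k·(1 − X_k) with η = 4 and a random starting value that avoids the map's fixed and periodic points; these specifics, the function name `logistic_map_population`, and its parameters are assumptions for illustration and are not stated in the text above.

```python
import random


def logistic_map_population(pop_size, lower, upper, eta=4.0, seed=None):
    """Generate an initial population inside [lower, upper] with a logistic map."""
    rng = random.Random(seed)
    excluded = {0.0, 0.25, 0.5, 0.75, 1.0}  # starting values that degenerate
    x = []
    for _ in lower:
        v = rng.random()
        while v in excluded:
            v = rng.random()
        x.append(v)
    population = []
    for _ in range(pop_size):
        # Advance the chaotic sequence, then scale it to the search bounds.
        x = [eta * v * (1.0 - v) for v in x]
        population.append([lo + v * (hi - lo) for v, lo, hi in zip(x, lower, upper)])
    return population


population = logistic_map_population(5, lower=[-5.0, 0.0], upper=[5.0, 1.0], seed=1)
```

Each row of `population` is one candidate solution, for example one combination of hyperparameter values within its allowed range.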

Since the optimization algorithm is inspired by the behavior of jellyfish as they search for food in the ocean, the objective function value in the JS optimizer corresponds to the quantity of food at the location of a jellyfish, and the best location is the one with the most food. In a JS optimizer, the exploration phase involves the movement of jellyfish as they follow ocean currents in search of food, while the exploitation phase involves the passive and active motions of the jellyfish inside a jellyfish swarm. Figure 17 presents the six phases of jellyfish in the ocean [27], including phase 1: jellyfish in the ocean; phase 2: following the ocean current; phases 3–5: passive and active motions inside the jellyfish swarm, between which the jellyfish switch according to a time control mechanism; and phase 6: reaching the jellyfish bloom.

3.2.1. Movement Following Ocean Current

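Ocean currents carry a large amount of food, which attracts jellyfish, and thus jellyfish follow them. The following equations represent the direction of the ocean current, $\overrightarrow{trend}$, and the new location of a jellyfish after it moves, $X_{i}(t+1)$ [27].

$$\overrightarrow{trend} = X^{*} - 3 \times rand(0,1) \times \mu$$

$$X_{i}(t+1) = X_{i}(t) + rand(0,1) \times \overrightarrow{trend}$$

Here, $X^{*}$ is the jellyfish at the best location, $\mu$ is the average location of all jellyfish, $X_{i}(t)$ are the current locations of the jellyfish at time t, $X_{i}(t+1)$ are the updated locations of the jellyfish at time (t+1), and $rand(0,1)$ is a random number between 0 and 1.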
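The ocean-current update can be sketched as below. The sketch assumes the form trend = X* − β·rand(0,1)·μ with β = 3, as in the JS algorithm [27], and draws a separate random number for each dimension, which is an implementation choice. The function name `ocean_current_step` and the `rng` parameter are hypothetical.

```python
import random


def ocean_current_step(population, best, beta=3.0, rng=random):
    """Move every jellyfish along the ocean-current trend toward the best location."""
    dim = len(best)
    # Average location of all jellyfish, per dimension.
    mean = [sum(x[d] for x in population) / len(population) for d in range(dim)]
    new_population = []
    for x in population:
        trend = [best[d] - beta * rng.random() * mean[d] for d in range(dim)]
        new_population.append([x[d] + rng.random() * trend[d] for d in range(dim)])
    return new_population


population = [[1.0, 2.0], [3.0, 4.0]]
best = [1.0, 2.0]
moved = ocean_current_step(population, best, rng=random.Random(42))
```

In a full optimizer, locations that leave the search bounds after this step would need to be brought back inside them.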

3.2.2. Motions Inside Jellyfish Swarm

The motions of jellyfish in a swarm can be grouped into passive motion (type A) and active motion (type B). Passive motion signifies a movement of a jellyfish around its original position, and active motion signifies its movement to another position. Initially, most jellyfish exhibit type A motion, but after some time, more jellyfish exhibit type B motion [27]. The new location of a jellyfish that exhibits type A motion is formulated as follows:

$$X_{i}(t+1) = X_{i}(t) + 0.1 \times rand(0,1) \times (U_{b} - L_{b})$$

where $rand(0,1)$ is a random number between 0 and 1, $U_{b}$ is the upper bound on the search space, and $L_{b}$ is the lower bound on the search space.
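Passive (type A) motion can be sketched as a small random step scaled by the width of the search space. The motion coefficient of 0.1 follows the JS algorithm [27]; the function name `passive_motion` and the `rng` parameter are hypothetical.

```python
import random


def passive_motion(x, lower, upper, gamma=0.1, rng=random):
    """Move a jellyfish a short random distance around its current location."""
    return [
        xi + gamma * rng.random() * (ub - lb)
        for xi, lb, ub in zip(x, lower, upper)
    ]


new_location = passive_motion([0.0, 0.0], lower=[-10.0, -1.0], upper=[10.0, 1.0],
                              rng=random.Random(7))
```

Because the step is at most 10% of the range in each dimension, the jellyfish stays close to its original position.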

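For type B motion, one other jellyfish, $X_{j}$, is randomly selected for use in determining the new location of the jellyfish of interest, $X_{i}$. If the amount of food at the location of $X_{j}$ exceeds that at the location of $X_{i}$, then $X_{i}$ will move toward $X_{j}$. Otherwise, $X_{i}$ will move away from $X_{j}$. The direction of type B motion and the updated jellyfish location are given by the following equations for minimization problems:

$$\overrightarrow{Direction} = \begin{cases} X_{j}(t) - X_{i}(t), & \text{if } f(X_{i}) \geq f(X_{j}) \\ X_{i}(t) - X_{j}(t), & \text{if } f(X_{i}) < f(X_{j}) \end{cases}$$

$$X_{i}(t+1) = X_{i}(t) + rand(0,1) \times \overrightarrow{Direction}$$

where $f(X_{i})$ and $f(X_{j})$ denote the objective functions at locations $X_{i}$ and $X_{j}$, respectively.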
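Active (type B) motion for a minimization problem can be sketched as follows, where a lower objective value means more food. The function name `active_motion`, the `rng` parameter, and the sphere objective used in the example are illustrative assumptions.

```python
import random


def active_motion(x_i, x_j, objective, rng=random):
    """Move x_i toward x_j if x_j is at least as good, otherwise away from it."""
    if objective(x_i) >= objective(x_j):
        direction = [b - a for a, b in zip(x_i, x_j)]  # toward the better x_j
    else:
        direction = [a - b for a, b in zip(x_i, x_j)]  # away from the worse x_j
    return [a + rng.random() * d for a, d in zip(x_i, direction)]


def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


updated = active_motion([2.0, 2.0], [0.0, 0.0], sphere, rng=random.Random(3))
```

In the example, the second jellyfish sits at the optimum, so the first one moves toward it.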

3.2.3. Time Control Mechanism

A time control mechanism in a JS optimizer determines the type of jellyfish motion and controls the switching between the phases of the JS optimizer (following an ocean current and moving inside a jellyfish swarm). The equation below provides the time control function, $c(t)$.

$$c(t) = \left| \left( 1 - \frac{t}{Max_{iter}} \right) \times \left( 2 \times rand(0,1) - 1 \right) \right|$$

Here, $t$ is the time specified as the iteration number and $Max_{iter}$ is the maximum number of iterations.

If the value of $c(t)$ exceeds 0.5, then the jellyfish will follow the ocean current; if it is less than or equal to 0.5, the jellyfish will move in a jellyfish swarm [27]. To determine the type of jellyfish motion inside a jellyfish swarm (passive motion and active motion), the function $(1 - c(t))$ is used. When $rand(0,1)$ exceeds $(1 - c(t))$, the jellyfish will exhibit passive motion (type A). When $rand(0,1)$ is less than $(1 - c(t))$, the jellyfish will exhibit active motion (type B). As $t$ increases, the value of $(1 - c(t))$ also tends to increase, so active motion becomes more likely over time [27].
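The switching logic can be sketched as follows, assuming the time control function c(t) = |(1 − t/Max_iter)·(2·rand(0,1) − 1)| and a threshold of 0.5. The function names `time_control` and `choose_motion` and the returned labels are hypothetical.

```python
import random


def time_control(t, max_iter, rng=random):
    """Time control value c(t); its upper bound shrinks as t approaches max_iter."""
    return abs((1 - t / max_iter) * (2 * rng.random() - 1))


def choose_motion(t, max_iter, c0=0.5, rng=random):
    """Select the movement type of a jellyfish at iteration t."""
    c = time_control(t, max_iter, rng)
    if c > c0:
        return "ocean_current"  # exploration
    if rng.random() > 1 - c:
        return "passive"        # type A motion inside the swarm
    return "active"             # type B motion inside the swarm


rng = random.Random(0)
early = [choose_motion(1, 100, rng=rng) for _ in range(1000)]
late = [choose_motion(99, 100, rng=rng) for _ in range(1000)]
```

Early iterations mix ocean-current moves with swarm moves, whereas near the final iteration c(t) is close to zero, so nearly all moves are active swarm motions.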

3.2.4. Algorithmic Flowchart and Pseudocode

The algorithmic flowchart and the pseudocode of the JS algorithm, starting from problem definition, controlling parameters’ definition, initialization, to the loop of phases, are presented in Figures 18 and 19, respectively.

3.3. Validation and Performance Evaluation

Validating the capability of a DL model that classifies data or analyzes datasets to make predictions for a new dataset is essential. In neural network models, a loss function quantifies the prediction error that training seeks to minimize. The training error, which is the average loss over the training sample, is not useful for evaluating the performance of the model because a low training error may indicate that the model is overfitting the training data, and so will generally perform poorly given new data [48]. The validation, therefore, should be conducted using the error on a separate sample.

During the development of a DL model, a dataset is typically split into three sets–the training set, the validation set, and the test set. The training set is used to learn the pattern of the inputs that correspond to a certain output; the validation set is used to evaluate the prediction error of the training model and to tune its hyperparameters; the test set is used to assess the error of the final model. No exact rule for splitting the dataset exists, as the split depends on the number and complexity of the available data.

3.3.1. Validation Method

A validation set is used as the input of a previously trained prediction model to evaluate the performance of the model when used with new, never-seen-before data. The validation process is repeated multiple times with various hyperparameter combinations, and thus the purpose of using a validation set is to assess the performance of the training model and to find the optimal hyperparameters.

Two of the most popular methods for evaluating the generalization ability of the prediction model are holdout method and cross-validation. The holdout method randomly splits the data into a training set, a validation set, and a test set. The cross-validation method partitions a dataset into several subsets, implements the learning process on all but one of those subsets, and evaluates the performance using the left-out subset in turn. The cross-validation method is particularly suitable for a small dataset to enhance model validity.
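The partitioning used by cross-validation can be sketched with a small index generator. This is a minimal illustration without shuffling; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    # Distribute samples as evenly as possible across the folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(5, 2))
```

Each sample appears in exactly one validation fold, so every sample contributes to both training and evaluation across the k rounds.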

For practical use in the ready-mixed concrete plant, the model is built based on the accumulated historical data, and subsequently will be used for a new concrete dataset in the prediction of compressive strength. To fairly reflect the prediction accuracy on-site, this study adapted the holdout method by training/validating the model with the whole historical dataset and testing it with newly collected concrete data in the upcoming year. By doing so, one would not overestimate the model performance in practice and could prevent information leakage from model training.
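The time-based holdout described above can be sketched as a split on the collection year. The record layout (dictionaries with `year` and `strength` keys), the example values, and the function name `temporal_holdout` are illustrative assumptions, not the study's data schema.

```python
def temporal_holdout(records, first_test_year, year_key="year"):
    """Split records into historical data and newly collected data by year."""
    history = [r for r in records if r[year_key] < first_test_year]
    newest = [r for r in records if r[year_key] >= first_test_year]
    return history, newest


# Illustrative records only; these are not measurements from the study.
records = [
    {"year": 2017, "strength": 31.2},
    {"year": 2018, "strength": 28.4},
    {"year": 2019, "strength": 35.0},
]
history, newest = temporal_holdout(records, first_test_year=2019)
```

Because no sample from the test period is seen during training or validation, the test error is a fairer estimate of performance on future batches than a random split of the same data.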

3.3.2. Performance Metrics

The performance metrics that are used in this study are the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), training time, and synthesis index (SI). The MAE is the average of the absolute differences between the actual and predicted values. Taking the absolute difference makes all error values positive, avoiding the false determination of an accurate prediction when negative and positive differences are summed.

Mean squared error (MSE) is the average of the squared differences between the actual and predicted values. The RMSE is the square root of the MSE; taking the square root returns the error to the same units as the predicted quantity. MAPE is the average of the absolute errors divided by the actual values. The training times of the various modeling techniques are compared to examine the practicability of their implementation.

A low value of MAE, RMSE, or MAPE indicates good performance, and a short training time is desirable. SI is the mean of the normalized values of the performance metrics and indicates the overall model performance; it ranges from zero to one, with zero indicating the best performance among all models. Table 1 provides the formulas for the performance measures.
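The error metrics described above can be sketched in a few lines of Python. The SI function assumes min-max normalization of each metric across the compared models, which is consistent with the stated zero-to-one range; the exact formula is given in Table 1, so this should be read as an illustrative sketch rather than the definitive implementation. All function names and the numbers in the example are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error (square root of the MSE)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def synthesis_index(metrics_by_model):
    """Mean of min-max normalized metrics per model (assumed form; lower is better)."""
    names = list(metrics_by_model)
    n_metrics = len(metrics_by_model[names[0]])
    si = {}
    for name in names:
        total = 0.0
        for j in range(n_metrics):
            column = [metrics_by_model[m][j] for m in names]
            lo, hi = min(column), max(column)
            # A metric that is identical for all models contributes zero
            total += (metrics_by_model[name][j] - lo) / (hi - lo) if hi > lo else 0.0
        si[name] = total / n_metrics
    return si


# Hypothetical actual and predicted strengths
actual, predicted = [20.0, 40.0], [22.0, 36.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))

# Hypothetical [MAE, RMSE, MAPE] values for three models
scores = synthesis_index({"A": [3.0, 4.0, 10.0], "B": [5.0, 6.0, 20.0], "C": [4.0, 5.0, 12.0]})
print(scores)
```

With this normalization, a model that is best on every metric receives an SI of zero and a model that is worst on every metric receives an SI of one.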

4. Analytical Results and Discussion

4.1. Experimental Settings
4.1.1. Software and Hardware

Model building and testing were implemented in the Anaconda environment with the Python programming language on a computer with an NVIDIA GeForce RTX 2080 Ti graphics card. The Jupyter Notebook application in Anaconda [49] was used to display the inputs and outputs of the prediction models. Anaconda manages the Python packages that support specific programming tasks and guards against incompatibilities among them. Numerous Python packages that are available for use with Anaconda were used, such as NumPy, pandas, and matplotlib. The TensorFlow package [50] was used to build and test the DNN models, and the Keras Applications package [51] was used to build and test the CNN-based models.

In particular, the Keras Applications package supports the implementation of CNN models for prediction, fine-tuning, and feature extraction. It provides CNN models with pretrained weights from “ImageNet.” The package also supports transfer learning, which helps solve the practical problem of a lack of data resources and improves the accuracy of prediction by using pretrained weights. Table 2 presents information about the models, with accuracies that were obtained using the 2012 ILSVRC ImageNet validation set [51]. The depth refers to the number of layers in each Keras Applications CNN model, including the activation layers, batch normalization layers, and other layers.

4.1.2. Collection and Preprocessing of Data

A total of 8,310 data samples on ready-mixed concrete, relating to 32 variables, were collected from 2001 to 2019 by the Taiwan Construction Research Institute (TCRI). The data were split by the time of sample collection so that a prediction model could be built using historical data and tested using new data.

Accordingly, 339 data samples that covered one year (2019) were used in the testing process and the remaining 7,971 data samples were used in the training process. Of the 339 data samples for testing, 15 were removed because the value of compressive strength was missing, creating a test set of 324 data samples. The 7,971 data samples for training were further preprocessed according to the practical recommendations by a panel of domain experts in TCRI.
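The time-based split described above can be sketched as follows. The record layout (`year`, `strength`) and the sample values are hypothetical and serve only to show the mechanics: samples from the test year are held out, and test samples with a missing compressive strength are dropped.

```python
def temporal_holdout(records, test_year):
    """Split records into historical (before test_year) and new (test_year) sets."""
    history = [r for r in records if r["year"] < test_year]
    new = [r for r in records if r["year"] == test_year]
    return history, new


def drop_missing_target(records, target="strength"):
    """Remove records whose target value is missing."""
    return [r for r in records if r.get(target) is not None]


# Hypothetical records; field names and values are illustrative only
records = [
    {"year": 2017, "strength": 31.2},
    {"year": 2018, "strength": 28.5},
    {"year": 2018, "strength": 35.0},
    {"year": 2019, "strength": 30.1},
    {"year": 2019, "strength": None},  # missing compressive strength
]

train_val, test = temporal_holdout(records, test_year=2019)
test = drop_missing_target(test)
print(len(train_val), "training/validation samples,", len(test), "test samples")
```

Because the split is by collection time rather than at random, no sample from the test year can influence training or hyperparameter selection.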

Among the 32 variables, the manufacturer’s name, category of data, and date of collection were removed because the corresponding data were apparently irrelevant to the variability of concrete compressive strength. Ten other variables were removed because data were incomplete; these were the amount of admixture, the surface moisture content of sand (from a computer report and sieve analysis, respectively), silt charge, fineness modulus of sand, the strength of cement, specific surface area of cement, percentage of active blast furnace slag, fineness of blast furnace slag, and the ratio of water-reducing admixtures.

In total, 19 concrete variables remained for the prediction of the concrete compressive strength. The output variable is the test value of ready-mixed concrete compressive strength, and the 18 input variables are the design strength of concrete, target strength of concrete, slump test, chloride ion content, temperature (temperature of the concrete taken on site), water-binder ratio, water content of concrete, cementitious material consumption, cement ratio, amount of cement, amount of slag powder, amount of fly ash, amount of fine aggregate, amount of coarse aggregate, sand ratio, location (north), location (middle), and location (south).

The preprocessed data were processed again to yield three sets of data with different variables for use in numerical experiments for various purposes. Dataset 1 included 13 variables that are recommended by the TCRI; dataset 2 included 7 variables that are frequently used in prior studies [52–55] on the prediction of compressive strength; and dataset 3 included the resulting 18 variables after preprocessing. Tables 3–5 display the variables in the datasets, the number of data points in the datasets, and the descriptive statistics of variables in the datasets, respectively.

4.1.3. Converting Numerical Data into Images

The original numerical data were converted to images to be used as inputs to the CNN-based models. Each collection of values in a data sample for concrete was represented as an image. To create the image, the numerical data were first normalized to values between 0 and 1. These normalized data were then multiplied by 255 to encode them as grayscale values between 0 and 255 (Figure 20). Black represents 0 and white represents 255.
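The normalization and grayscale encoding steps can be sketched as below. This is a minimal illustration under stated assumptions: min-max normalization is applied per variable across the dataset, intensities are rounded to integers, and each sample is returned as a flat row of pixel values. How the pixels are arranged into a two-dimensional image is not covered here, and the function name and sample values are hypothetical.

```python
def to_grayscale_rows(samples):
    """Map each numeric sample to a list of grayscale intensities in 0..255."""
    n_features = len(samples[0])
    mins = [min(s[j] for s in samples) for j in range(n_features)]
    maxs = [max(s[j] for s in samples) for j in range(n_features)]
    rows = []
    for s in samples:
        row = []
        for j in range(n_features):
            span = maxs[j] - mins[j]
            # Min-max normalization to [0, 1]; constant columns map to 0
            norm = (s[j] - mins[j]) / span if span > 0 else 0.0
            # Scale to grayscale; rounding to integers is an assumption
            row.append(int(round(norm * 255)))
        rows.append(row)
    return rows


# Hypothetical samples: [water-binder ratio, water content]
samples = [[0.40, 150.0], [0.45, 165.0], [0.60, 210.0]]
pixels = to_grayscale_rows(samples)
print(pixels)
```

The smallest value of each variable becomes a black pixel (0) and the largest becomes a white pixel (255), matching the encoding described above.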

For each of datasets 1 and 2, a total of 6,705 images were created. For dataset 3, a total of 5,856 images were created. Figure 21 presents an example (dataset 3) of the labeling of the image data. Each image is labeled with the corresponding continuous output value, which is the compressive strength of the ready-mixed concrete.

4.2. Implementation and Comparison

Prediction models were built and sensitivity experiments were carried out for various purposes (Table 6). Baseline models were used with the hyperparameters set to their default values in TensorFlow and Keras Applications. The DNN takes numerical data as input, whereas for the CNN-based models, the input numerical data are converted to image data. In this study, the size of the image input to each CNN-based model was the minimum possible size that met the practical needs.

4.2.1. Deep Learning Models and Performance

Since the same model and hyperparameters yielded different performance values in different runs, each model was tested five times and the average performance value was taken as the reported result. For both the CNN and DNN models, the loss function was set to the MSE. In the DNN model, 50 hidden layers with selected numbers of hidden nodes (Table 7) yielded the best prediction accuracy in comparison with other numbers of hidden layers and hidden nodes. This architecture was thus used to build the baseline DNN prediction model.

Tables 8–10 compare the performances of the DL models in predicting the compressive strength of ready-mixed concrete when they are trained and tested using the given data. The results indicate that the CNN models ResNet50V2, MobileNet, and DenseNet121, with their default parameters, performed best on datasets 1, 2, and 3, respectively. These CNN models, with image data, outperformed the baseline DNN model with numerical data. The results also indicate that the best CNN model on each dataset outperformed the DNN in terms of every performance metric except the training time.

4.2.2. Optimized Convolutional Neural Network-Based Models

Because the CNN models ResNet50V2, MobileNet, and DenseNet121 performed best on their corresponding datasets, a metaheuristic optimization algorithm, the jellyfish search (JS) optimizer, was used to optimize them. The hyperparameters of the CNN models were optimized to minimize the errors of prediction of the compressive strength of ready-mixed concrete.

The JS optimizer was used to find the best hyperparameter values within a set of ranges. Several hyperparameters of a CNN, such as the epsilon of batch normalization, batch size, number of epochs, learning rate, and dropout rate, were selected for adjustment during the search. For DenseNet121, two additional hyperparameters were optimized: the growth rate and the reduction value. Table 11 presents the default values of the hyperparameters in the reference papers [34, 41] and the ranges over which the hyperparameters were fine-tuned in this study.
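A search of this kind can be sketched with a simplified jellyfish search loop. The sketch below is not the implementation used in this study: it assumes uniform random initialization, clipping to the bounds, greedy acceptance of better candidates, and coefficient values of 3 (ocean current) and 0.1 (passive motion). The objective `toy_error` is a hypothetical stand-in for the prediction error of a trained CNN as a function of two hyperparameters (log10 of the learning rate and the dropout rate).

```python
import random


def jellyfish_search(objective, lower, upper, n_jellyfish=20, max_iter=100, seed=0):
    """Minimize `objective` within box bounds using a simplified jellyfish search."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(x):
        # Keep a candidate inside the search ranges (a simplification)
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n_jellyfish)]
    fit = [objective(x) for x in pop]
    best_i = min(range(n_jellyfish), key=fit.__getitem__)
    best_x, best_f = pop[best_i][:], fit[best_i]
    history = [best_f]

    for t in range(1, max_iter + 1):
        for i in range(n_jellyfish):
            # Time control function: decides between current and swarm motion
            c = abs((1 - t / max_iter) * (2 * rng.random() - 1))
            if c >= 0.5:
                # Follow the ocean current toward the best location
                mean = [sum(p[d] for p in pop) / n_jellyfish for d in range(dim)]
                cand = [pop[i][d] + rng.random() * (best_x[d] - 3 * rng.random() * mean[d])
                        for d in range(dim)]
            elif rng.random() > 1 - c:
                # Passive motion: small move scaled by the range width
                cand = [pop[i][d] + 0.1 * rng.random() * (upper[d] - lower[d])
                        for d in range(dim)]
            else:
                # Active motion: move toward a better jellyfish or away from a worse one
                j = rng.choice([k for k in range(n_jellyfish) if k != i])
                sign = 1 if fit[j] <= fit[i] else -1
                cand = [pop[i][d] + rng.random() * sign * (pop[j][d] - pop[i][d])
                        for d in range(dim)]
            cand = clip(cand)
            f = objective(cand)
            if f < fit[i]:  # greedy acceptance (a simplification)
                pop[i], fit[i] = cand, f
                if f < best_f:
                    best_x, best_f = cand[:], f
        history.append(best_f)
    return best_x, best_f, history


def toy_error(x):
    """Hypothetical error surface over (log10 learning rate, dropout rate)."""
    return (x[0] + 3.0) ** 2 + (x[1] - 0.2) ** 2


lower, upper = [-5.0, 0.0], [-1.0, 0.5]
best_x, best_f, history = jellyfish_search(toy_error, lower, upper)
print(best_x, best_f)
```

In an actual hyperparameter search, each call to the objective would train and validate a CNN with the candidate settings, so the population size and number of iterations would be chosen with the training cost in mind.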

Table 12 compares the performances of the best CNN models with default hyperparameters and with hyperparameters optimized by JS in predicting the compressive strength of ready-mixed concrete. The results indicate that using the JS optimizer to tune the hyperparameters improved the accuracy of the prediction models. Table 13 shows the best hyperparameter settings for each optimized CNN model.

4.3. Influence of Feature and Image Pixel Orientation on Modeling Accuracy

To examine the sensitivity of the generalization ability of the prediction model, the 18 resulting variables (features) were tested with the best CNN model (DenseNet121) by removing one variable at a time and training the model on the remaining variables. The three location variables were removed simultaneously, and the remaining variables were used for the sensitivity analysis. These tests were conducted to investigate the effect of each feature (attribute) on the generalization ability of the model. Table 14 displays the resulting MAPEs, in which a lower MAPE indicates better model performance without the specified attribute. The experiment demonstrated that the MAPEs do not differ much from one another. However, the slight increase in MAPE relative to the baseline MAPE (11.72%) in the corresponding experiments implies that including those variables (X1–X3, X5, and X16–X18) has a positive impact on the prediction accuracy of ready-mixed concrete compressive strength.
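The construction of the feature subsets for this sensitivity analysis can be sketched as follows. The labels X1 to X18 and the assumption that the three location indicators are X16 to X18 are made here for illustration; the function name is hypothetical, and model training on each subset is omitted.

```python
def ablation_sets(features, groups):
    """Return {removed label: remaining features}; grouped features are removed together."""
    grouped = {f for members in groups.values() for f in members}
    sets = {}
    for f in features:
        if f not in grouped:
            # Remove a single variable and keep the rest
            sets[f] = [g for g in features if g != f]
    for label, members in groups.items():
        # Remove all variables of the group at once
        sets[label] = [g for g in features if g not in members]
    return sets


features = [f"X{i}" for i in range(1, 19)]
# Assumption: the location indicators (north, middle, south) are X16-X18
groups = {"location": ["X16", "X17", "X18"]}
experiments = ablation_sets(features, groups)

for removed, remaining in experiments.items():
    print(f"without {removed}: {len(remaining)} input variables")
```

Each subset would then be converted to images and used to retrain the model, and the resulting MAPE would be compared with the baseline obtained using all variables.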

Another numerical experiment was conducted to examine the influence of image pixel orientation (pixel row order) on the computer vision-based modeling performance. Two types of image pixel orientation (IPO) formed by the input attributes (pixels) were tested: the original pixel array and the correlated pixel array, the latter being ordered according to the correlation between each input attribute and the compressive strength. Specifically, the input image data were shaped by arranging the input attributes (pixels) in the original, effectively random order and in descending order of their correlation coefficients, respectively.

Table 15 displays the correlation coefficients between the input variables and the compressive strength of ready-mixed concrete. Two new datasets were created by ordering the pixels according to the correlation coefficients. In one IPO, the pixels are arranged in descending order of the signed values of the correlation coefficients; in the other, they are arranged in descending order of their absolute values.
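The two correlated orderings can be illustrated with a short sketch. It assumes the Pearson correlation coefficient, which the text does not name explicitly, and uses hypothetical variable names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_orders(columns, target):
    """Return the coefficients and the two descending orderings of the variables."""
    r = {name: pearson(values, target) for name, values in columns.items()}
    by_value = sorted(r, key=lambda k: r[k], reverse=True)      # signed values
    by_abs = sorted(r, key=lambda k: abs(r[k]), reverse=True)   # absolute values
    return r, by_value, by_abs


# Hypothetical input variables and target (compressive strength)
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [4.0, 2.0, 3.0, 1.0],
    "c": [2.0, 1.0, 4.0, 3.0],
}
target = [10.0, 20.0, 30.0, 40.0]

r, by_value, by_abs = correlated_orders(columns, target)
print(by_value)  # ordering by signed coefficient
print(by_abs)    # ordering by absolute coefficient
```

The two orderings differ whenever a variable has a strong negative correlation: it is placed last when sorting by signed value but near the front when sorting by absolute value.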

Table 16 presents the sensitivity analysis of image pixel orientation on the computer vision-based modeling performance. All metrics obtained with the correlated orders of image pixel orientation are worse than those obtained with the original order by the same optimized CNN model (JS-DenseNet121). Therefore, the analytical results indicate that ordering the image pixels by correlation when converting ready-mixed concrete data to images does not improve, and does not significantly influence, the performance of the prediction model.

5. Conclusions

The effectiveness of computer vision in predicting the compressive strength of ready-mixed concrete was analyzed to improve the predictions thereof. Deep learning (DL) models were constructed by imaging the numerical data as inputs to predict the compressive strength of ready-mixed concrete. Various prediction models were compared and the best DL prediction models were identified for different sets of input concrete-related features and optimized after their performances were further analyzed.

Models for the prediction of concrete compressive strength are frequently built using cross-validation or a random split of in-sample data to evaluate prediction accuracy, which often gives optimistic results (overfitting) in the training/test process while exhibiting poor performance in future use. This is mainly because the processes, materials, machines, and technicians that are involved in manufacturing ready-mixed concrete in batch plants are continually improved and periodically replaced. Up-to-date samples of ready-mixed concrete may therefore differ from historical samples as batch processes evolve.

A prediction model is built using historical data and makes predictions on newly collected data, which should be independent of the training data samples. Therefore, the suitability of using randomly split in-sample data to test models for the prediction of concrete compressive strength, as is common in the literature, is questionable. To capture the actual performance of predicting the compressive strength of concrete, out-of-sample data (newly collected data) should be used for model testing to avoid potential information leakage. Although the model accuracy may be lower than that obtained by in-sample cross-validation or by randomly splitting data for training and testing, the out-of-sample test reflects the real predictive performance in practice.

Furthermore, CNN-oriented models are often trained without tuning the hyperparameters. This study adopts a metaheuristic optimization algorithm to optimize the prediction model. The predictive accuracy of computer vision-based deep learning models was improved herein using the jellyfish search (JS) metaheuristic optimization algorithm. The JS optimizer finds the best hyperparameters, optimizing the performance metrics of the CNN models. This study contributes a novel application of the computer vision-based method, which integrates the latest CNN models with a newly developed JS optimizer to predict the compressive strength of ready-mixed concrete. The analytical experiments show that the models using image-converted data outperform the models using the original numerical data.

In this investigation, the training data were samples of ready-mixed concrete only. Using data on high-performance concrete or more complex engineering data would extend this work on the computer vision approach to predicting a numerical output such as the compressive strength of concrete. More cases should be studied to confirm the effectiveness of imaging data on ready-mixed concrete and other types of concrete in identifying patterns of compressive strength by the bio-inspired metaheuristic optimization of computer vision-based deep learning models.

Future studies could consider environment-oriented factors that may affect the ready-mixed concrete compressive strength, such as the type of manufacturing equipment, transporting process of concrete, and the handling speed of on-site operators in addition to the material-oriented attributes herein. A fair comparison between laboratory-determined concrete compressive strength and on-site evaluation of concrete compressive strength should be investigated.

Abbreviations

AI:Artificial intelligence
ANN:Artificial neural network
BN:Batch normalization
CNN/ConvNet:Convolutional neural network
DenseNet:Dense convolutional network
DL:Deep learning
DNN:Deep neural network
FNN:Feedforward neural network
IN:Instance normalization
IPO:Image pixel orientation
JS:Jellyfish search
LN:Layer normalization
MAE:Mean absolute error
MAPE:Mean absolute percentage error
MSE:Mean squared error
NASNet:Neural architecture search network
ReLU:Rectified linear unit
ResNet:Residual neural network
RF:Random forest
RMSE:Root mean squared error
RNN:Recurrent neural network
SI:Synthesis index
SVR:Support vector regression
TCRI:Taiwan Construction Research Institute
VGG:Visual geometry group
Xception:Extreme inception.

Symbols

:Width of image × height of image × number of channels
m:Dimension of an input image
n:Filter size
C:Convolution map
I:Input image data
:Convolution operation
F:Filter
o:Dimension of convolution map
s:Stride
zp:Zero padding
:Nonlinear activation function
:Convolution map after applying the nonlinear activation function
:Pooling map
:Pooling operation
:Model output of the ith fully connected hidden layer
:Weight sum vector
:The activation level of the artificial neurons
:Batch normalization at a given layer from x
:Scale parameter for the channel
:Shift parameter for the channel
:Mean of the batch
:Standard deviation of the batch
∗:Element-wise product
:Original feature
:Distorted features
:Binary mask
:Dimension of the feature map of the l-th layer
:Direction of the ocean current
:Jellyfish with the optimal location
:Average location of all jellyfish
:Jellyfish of interest
:Time specified as an iteration number
:Current location of a jellyfish
:New location of a jellyfish after a movement
:Random number between 0 and 1
:Upper bounds of the search spaces
:Lower bounds of the search spaces
:Jellyfish other than the jellyfish of interest
:Quantity of food at the location of
:Quantity of food at the location of
:Direction of active motion (type B) of jellyfish
:Time control function
:Maximum number of iterations
:Number of predictions
:Actual value
:Predicted value
:Number of performance metrics
:Value of performance metric
:Minimum value of performance metric
:Maximum value of performance metric.

Data Availability

The datasets, codes, and replication of results generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank Taiwan Construction Research Institute and the Ministry of Science and Technology, Taiwan, for financially supporting this research under grants NTUST-TCRI-No.109-0139-9257 and MOST 109-2221-E-011-040-MY3, respectively.