Several natural and human factors are responsible for the defacement of the external walls and tiles of buildings, and the related deterioration can be a public safety hazard. Therefore, active building maintenance and repair processes are essential for ensuring building sustainability. However, conventional inspection methods are time-, cost-, and labor-intensive. This study therefore proposes a convolutional neural network (CNN) model for image-based automated detection and localization of key building defects (efflorescence, spalling, cracking, and defacement). Based on a pretrained VGG-16 classifier, the model applies class activation mapping for object localization. After identifying the limitations of the approach in real-life applications, this study assessed the model’s robustness and its ability to accurately detect and localize defects in the external wall tiles of buildings. For real-time detection and localization, the model was applied using mobile devices and drones. The results show that combining deep learning with unmanned aerial vehicles (UAVs) can effectively detect various kinds of external wall defects and improve detection efficiency.

1. Introduction

Changes in customer preference may negatively affect building sustainability, well-being, and safety and may eventually increase competitiveness in the market. For proactive and prompt building maintenance and repair work, customers seek quick, effective building monitoring approaches to avoid severe damage and unnecessary expenditure [1]. Conventional approaches for examining building structures typically require the involvement of building surveyors who conduct assessments of building elements. These assessments include lengthy site inspection for systematic recording of the building elements’ physical condition on the basis of note-taking, photographs, drawings, and customer-supplied information [2], followed by analysis of the collected data and writing of a health assessment report of the building. The components of this report include the assessed building’s current state, recent updates, maintenance and repair records, and future long-term repair cost estimates [3]. However, this approach is a time-, labor-, and cost-intensive process and can endanger the surveyors’ health and safety, particularly when the building to be assessed is a mid- to high-rise structure.

Convolutional neural networks (CNNs) have been applied to detect the deterioration of many structures such as roads, bridges, and tunnels but have rarely been employed to detect deterioration of building external walls [4–6]. Moreover, unmanned aerial vehicles (UAVs) have wide applications in deterioration detection. Consequently, a UAV-CNN combination for external wall deterioration detection could have practical applications, ensuring surveyor safety.

In this study, we focused on the automated image-based detection and localization of key defects (efflorescence, spalling, cracking, and defacement) in the external wall tiles of buildings. However, this study was only a pilot study and thus has a few limitations: (1) the model could not consider multiple defect types simultaneously; in other words, all the considered images belonged to only one category; (2) the model considered only images with visible defects.

Herein, this study reports a CNN application for the automated assessment of the external wall tile condition of buildings, with a brief discussion of the method for selecting the most common defects of these tiles. First, we provide a brief overview of various applications of CNNs, including deep learning techniques, for resolving computer vision-related problems, followed by a description of the theoretical basis for the current study. This research proposes a model for detection and localization that is based on transfer learning, involving the use of VGG-16 to execute feature extraction as well as feature classification. Next, the localization problem and the class activation mapping (CAM) technique—incorporated within the defect localization model—are discussed. Subsequently, we discuss the employed dataset, the developed model, and the obtained results, finally followed by conclusions and directions for future studies.

2. Literature Review

2.1. Factors Leading to Building Deterioration

Building lifespan can vary from decades to centuries. In general, building durability can be increased through constant protection, repair, and maintenance activities [7, 8]. The deterioration rate and degree differ among building components, with construction design, material, method, construction quality, and environment being the crucial influencing factors [9]. The factors leading to building deterioration may be divided into the following categories: natural environment (temperature, relative humidity, sunshine, wind, and water), natural disasters (earthquakes and typhoons), and human factors (design, construction, users, management, and maintenance) [10–13].

2.2. Building External Wall Tile Defects and Their Types

External wall tile defects not only influence the overall appearance of buildings but also endanger public safety; for instance, they may lead to injuries due to their falling. External wall tile defects can be roughly divided into five types: defacement, efflorescence, cracking, spalling, and bulging. Of these, defacement, efflorescence, cracking, and spalling have been the main focus of most studies:

(1) Defacement. Defacement, the most significant and common type of external wall tile deterioration in buildings, is closely related to the architectural shape and design of a building and long-term influence of wind and rain on it [14]. Several major factors result in the defacement of external wall tiles. For instance, when rebar is exposed due to external wall cracks, water containing rust from the corroded iron flows out of the walls, defacing the affected areas. Moreover, installation of accessories can damage external wall tiles, thus promoting algal and fungal growth on the affected walls.

(2) Efflorescence. Efflorescence, commonly known as whiskering, saltpetering, or “wall cancer,” often affects the hollow bricks of building finishes, joints of external wall tiles, or joints of stone veneers. Efflorescence prevention in cement mortar or concrete-based structures is impossible.

(3) Cracking. The main causes of external wall cracking include overloading of buildings, uneven land subsidence, and violent shaking during earthquakes [15]. The drying shrinkage of external wall concrete, corrosion expansion of rebar, secondary construction of external wall accessories, and man-made disasters of fire and explosion can aggravate this cracking. Furthermore, tile breakage can lead to entry of rainwater into the main bodies of buildings, resulting in internal and external structural deterioration. Hence, cracks on a building’s facade can influence the building’s appearance and cause rainwater invasion, possibly leading to inconvenience in daily life or loss of property or even affecting building safety and durability.

(4) Spalling. Spalling is characterized by falling off of surface decorative materials (e.g., tiles and coating) due to reduction in adhesive strength, aging of cement mortar and concrete, poor tile quality, high temperature caused by fire, or natural forces (e.g., strong wind and violent shaking during earthquakes) [16–18].

(5) Bulging. Bulging mainly occurs between concrete and the base cement mortar. Gaps form between the layers of cement mortar and surfaces of external wall tiles, resulting in material separation. Long-term changes in temperature or humidity lead to a reduction in adhesive strength and separation of adhesive interfaces for various adhesives.

2.3. UAVs for Building Deterioration Detection

Currently, UAVs are widely used in construction for applications that can be broadly divided into six areas: (1) building inspection, where UAVs are used for data collection to assess the current building condition [19–22]; (2) damage assessment, where the data collected by UAVs are used to assess the damage to buildings after disasters [23–26]; (3) site survey and drawings, where UAVs are used to obtain the spatial scope of a survey to make two- or three-dimensional drawings [27–29]; (4) safety inspection, where construction sites are frequently assessed according to safety standards [30, 31]; (5) schedule monitoring, where the data (mainly visual data) collected by UAVs are used to monitor construction schedules [32, 33]; and (6) other applications, which include building maintenance [34], 3D building reconstruction [35–38], material tracking, and air volume measurement [39]. Many studies have reported that UAVs improve work efficiency, reduce cost incurred, and increase convenience [40–43].

The methods of building deterioration detection include visual assessment, percussion-based identification, rebound intensity assessment, ultrasonic wave propagation assessment, pull-out testing, infrared thermography, and UAV use [44–46]. Compared with the other methods, UAVs offer a more efficient way to collect large amounts of building data [47, 48].

In addition to deterioration detection, UAVs can be used in environment monitoring, traffic management, pollution monitoring, and security [49–51]. UAVs are also an important emerging technology for developing sustainable communities [52].

2.4. CNN Use for Building Deterioration Detection

With the development of deep learning, applications of automatic defect detection for community infrastructure and the built environment are increasing. CNNs have been used for rapid structural damage detection and maintenance cost estimation after a serious earthquake so as to provide a reference for owners and decision-makers to make accurate and timely risk management decisions [53]. Region-based CNN (R-CNN) and faster R-CNN have also been used for road damage detection and classification [54]. Other CNN applications include the detection of concrete cracks [55–57], automated detection of deformation at the bottom of steel box girders of long-span bridges [58], and automated detection of building types in street images [59]. In addition, CNNs have gradually been used in building external wall defect detection. Agyemang and Bader applied a CNN for detecting cracks on building external walls and assessing the defects therein [3]. Perez et al. also used CNNs to detect building defects [9]. As shown in the related research, VGG-16 and CAM are commonly used methods in building defect detection.

In summary, although deep learning has been used in many engineering fields [60, 61], it has rarely been used for detecting external wall deterioration. Moreover, integrating UAV and deep learning applications may increase the practical value of automated external wall deterioration detection.

3. Materials and Methods

This study developed a deep learning model to classify four defects (efflorescence, spalling, cracking, and defacement) in the external wall tiles of buildings. In applying CNNs, we identified the related limitations and challenges arising from the nature of both the defects under investigation and their surroundings. Images showing the defect types, collected from different external wall tile sources, were first cropped and resized, and the resulting dataset was then used to train the network model. Next, using a transfer learning technique, this study took a VGG-16 model pretrained on ImageNet as the base model and customized and initialized its weights. Subsequently, a separate set of images, not seen by the trained model thus far, was used to validate the trained model and examine its robustness. Finally, this study applied CAM to address the localization problem.

3.1. Dataset

All external wall tile images were obtained using mobile phones, handheld cameras, and drones; thus, they differed in resolution and size. Accordingly, to increase the study dataset size, the obtained images were sliced into images with resolutions of 224 × 224 and 3024 × 4032 pixels. In total, 5680 images were used as the training dataset for our model, all of which were labeled and categorized as efflorescence (n = 1382), spalling (n = 1386), cracking (n = 1551), and defacement (n = 1361) images (Figure 1). Additionally, 10% of the images in the dataset were randomly selected to form a validation dataset. To prevent overfitting, this study applied several image augmentation processes, namely, rescaling, rotation, and height and width shifts, to the training dataset. The datasets can be viewed at the following public links:

Defacement dataset: https://drive.google.com/file/d/1EFYwA3GCD5gbWoQR4P_n7t6Z-haE0IjF/view?usp=sharing

Efflorescence dataset: https://drive.google.com/file/d/1l5eBPtT1HnBCGNLKawZUq8_tbjoLK-D4/view?usp=sharing

Cracking dataset: https://drive.google.com/file/d/1YzHglz4f6sKu-Pw8D2PBbHQNkJU3wuwT/view?usp=sharing

Spalling dataset: https://drive.google.com/file/d/1ktIzpu2u3fakRTEh5H_IF1FSoKMxio1f/view?usp=sharing
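The slicing and validation split described in this subsection can be illustrated with a minimal sketch. It assumes that a 3024 × 4032 photo is cut into full, non-overlapping 224 × 224 tiles and that partial tiles at the edges are discarded; the text does not state either detail, and the function names, the fixed random seed, and the use of integer image identifiers are illustrative assumptions rather than the authors' code.

```python
import random


def tile_boxes(width, height, tile=224):
    """Return (left, top, right, bottom) boxes for full, non-overlapping tiles.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


def split_train_val(items, val_fraction=0.1, seed=0):
    """Shuffle a copy of items and hold out val_fraction of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# One 3024 x 4032 photo yields 13 x 18 full tiles under the assumptions above.
boxes = tile_boxes(3024, 4032)

# Hypothetical image identifiers 0..5679 stand in for the 5680 labeled images.
train_ids, val_ids = split_train_val(range(5680))
print(len(boxes), len(train_ids), len(val_ids))
```

Under these assumptions a single photo produces 234 tiles, and holding out 10% of 5680 images leaves 5112 for training and 568 for validation. Whether the 10% was drawn from the 5680 images or held in addition to them is not specified in the text.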

3.2. Method for Automated Defect Detection

This study used a modified model as the feature extractor (Figure 2) and applied fine-tuned transfer learning to an ImageNet-pretrained VGG-16 network [62]. In transfer learning, the network is first trained on a large dataset so that it acquires a basic ability to recognize objects. Subsequently, the classification layers of the network are replaced with layers for the required categories, making the network more robust for the target task.

This study used VGG-16 because it is powerful yet has a simple architecture with relatively few layers. The architecture comprises five convolutional layer blocks with max pooling for feature extraction, followed by three fully connected layers and a final 1 × 1000 Softmax layer. The input to the CNN comprises 224 × 224-pixel RGB images, and the first block consists of two convolutional layers with 64 filters, each of size 3 × 3. The second, third, and fourth convolutional blocks use filters of sizes 128 × 3 × 3, 256 × 3 × 3, and 512 × 3 × 3, respectively. This simple architecture eases model modification for transfer learning and CAM while preserving the model’s accuracy.
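To make the effect of the five pooling stages concrete, the following sketch traces the spatial size of the feature maps through the convolutional blocks. It assumes the standard VGG-16 configuration, in which every 3 × 3 convolution uses stride 1 with padding that preserves the spatial size and every block ends with a 2 × 2 max pooling of stride 2; the function name is illustrative.

```python
def vgg16_block_output_sizes(input_size=224, num_blocks=5):
    """Spatial side length of the feature maps after each convolutional block.

    Assumption: size-preserving 3x3 convolutions, then one 2x2 max pooling
    with stride 2 per block, so each block halves the side length.
    """
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size //= 2  # the pooling layer at the end of the block
        sizes.append(size)
    return sizes


print(vgg16_block_output_sizes())  # [112, 56, 28, 14, 7]
```

The coarse maps of the last block (14 × 14 before its pooling layer, 7 × 7 after) are the ones a localization technique such as CAM upsamples back to the input resolution, which is why CAM heat maps look blurry compared with the input image.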

Some hyperparameters were left at their default values, and others were determined by testing on the training data and adjusting. The default optimizer (SGD), momentum (0.9), and weight decay (5e−4) were used without modification [63]. The learning rate (lr) was tested over the range from 0.001 to 0.01, and convergence was more efficient at 0.01. Although many loss functions are available, the cross-entropy loss was used because the research objective is basic classification. Batch size is usually set in multiples of 2, and 25 was determined by the system performance. To fine-tune the VGG-16 model, the initial four convolutional layer blocks were first used as the generic feature extractor, and then, the final 1 × 1000 Softmax layer was replaced with a 1 × 4 classifier (for efflorescence, spalling, cracking, and defacement). Finally, the newly modified model was retrained with only the weights of the fifth convolutional block allowed to update during training.
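The training setup above can be sketched in plain Python to show how its pieces fit together: which convolutional blocks are frozen, how the cross-entropy loss is computed for the four classes, and what a single SGD update with momentum and weight decay looks like. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the update rule follows one common convention (weight decay added to the gradient, velocity accumulated, then scaled by the learning rate), whereas frameworks differ in the exact form.

```python
import math

# Values quoted in the text; the constant names are illustrative.
LEARNING_RATE = 0.01
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4
CLASSES = ["efflorescence", "spalling", "cracking", "defacement"]


def trainable_blocks(num_blocks=5, frozen_blocks=4):
    """Map each convolutional block index to whether its weights are updated."""
    return {block: block > frozen_blocks for block in range(1, num_blocks + 1)}


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy loss for one image whose true class is target_index."""
    return -math.log(softmax(logits)[target_index])


def sgd_momentum_step(weight, grad, velocity,
                      lr=LEARNING_RATE, momentum=MOMENTUM,
                      weight_decay=WEIGHT_DECAY):
    """One scalar SGD update (assumed convention, see the lead-in)."""
    grad = grad + weight_decay * weight       # L2 penalty folded into the gradient
    velocity = momentum * velocity + grad     # accumulate momentum
    return weight - lr * velocity, velocity


plan = trainable_blocks()
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], CLASSES.index("spalling"))
new_weight, new_velocity = sgd_momentum_step(weight=1.0, grad=0.5, velocity=0.0)
```

With four classes, an untrained classifier that outputs equal scores has a loss of ln 4 ≈ 1.386, which gives a reference point for reading a loss curve.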

3.3. CAM-Based Object Localization

Problems in object localization differ from those in image classification. In localization, an algorithm not only determines the class of the image features or objects but also detects and labels the objects within the image, usually by placing a rectangular bounding box accompanied by the algorithm’s confidence that the object exists [64]. Moreover, for a detected object, a neural network provides four numbers as the output; these numbers parameterize the aforementioned bounding box.

For the identification of discriminative regions in the image, CAM can be combined with classification-trained CNNs. In CAM, the image regions that are relevant to a specific class are highlighted by reusing the CNN classifier layers so as to obtain optimal localization results. In this study, applying CAM to the study model increased the accuracy of image localization.
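The idea of reusing classifier weights for localization can be sketched as follows. In the commonly cited CAM formulation, the map for class c at location (x, y) is the weighted sum of the last convolutional feature maps, with the classifier weights of class c as coefficients. The sketch assumes a network whose final feature maps are globally average-pooled and fed to a single linear classifier; how the authors adapted VGG-16 to this form is not detailed in the text, and the function names and toy values are illustrative.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using the K weights of one class."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam):
    """Rescale a map to [0, 1] so it can be rendered as a heat map."""
    low = min(min(row) for row in cam)
    high = max(max(row) for row in cam)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(value - low) / span for value in row] for row in cam]


# Toy example: two 2 x 2 feature maps and hypothetical weights for one class.
toy_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
toy_weights = [0.5, 1.0]
heat = normalize(class_activation_map(toy_maps, toy_weights))
```

In practice the normalized map is upsampled to the input resolution and overlaid on the photo, so that the strongest responses mark the region that drove the class prediction.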

4. Results

4.1. External Wall Tile Prediction

Figures 3(a) and 3(b) illustrate the loss and learning curves derived for our model for the training dataset. Epoch, presented on the horizontal axis in both curves, represents a training cycle in which the entire dataset was entered into the network. When the loss curve presents a lower value, the probability of image recognition error is low, and when the learning curve presents a value close to 1.0, the model training accuracy is high. As indicated in Figure 3(a), at around the 50th cycle, the loss curve reached stable convergence to achieve good image recognition. As presented in Figure 3(b), model training remained in a good state.

The training dataset included 5680 images, and the training involved 500 cycles. As shown in Figure 3, our model was well trained. Moreover, the accuracy for the optimal training dataset was 86%, with a final loss of 0.0576 at the end of the 500th cycle of training, and no model overfitting was identified during training. As presented in Table 1, the model’s accuracy rates for efflorescence, cracking, and defacement were 91%, 86%, and 98%, respectively, but that for spalling was only 76%.

4.2. Defect Localization Using CAM

To further analyze why the accuracy rate for spalling was low, we visualized the dataset by applying CAM, a low-cost computation method. In the resulting image (Figure 4), large network responses, indicated in red, were noted. Figure 4 shows the focus of the various artificial neural networks.

Next, a confusion matrix (Table 2) indicated that most of the misclassified defacement, efflorescence, and cracking images in the test datasets were classified as spalling, suggesting that the spalling images in the training dataset may have exhibited the characteristics of defacement, efflorescence, and cracking. Thus, the training dataset’s images may have been defective. This study therefore re-examined the 1386 spalling images in this dataset and found that 94.44% and 5.56% of these images presented mosaic tiles and lath bricks, respectively.

Because of the small unit area of mosaic tiles, as these tiles fell, they left dirty, black stains behind. Moreover, during the process of capturing images of sample areas, trees may have blocked the light and created shadows (Figure 5, red circles). Thus, during model training, the model misclassified various images of defacement as those of spalling (Figure 6). Similarly, lighting problems during image capture were the reasons for the misclassification of efflorescence as spalling (Figure 7, red circles). Thus, when sunlight was too bright or when the spalling pattern was irregular, the model misclassified efflorescence as spalling during model training (Figure 8). Finally, some cracking was also misclassified as spalling during model training (Figures 9 and 10, red circles).
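The per-class rates discussed in this section follow directly from a confusion matrix, and the sketch below shows the calculation. The counts are invented toy values chosen only to reproduce the qualitative pattern described above (other defects absorbed into the spalling class); they are not the values of Table 2, and the function name is illustrative. Rows are taken to be true classes and columns predicted classes, which is an assumed layout.

```python
def precision_recall(confusion, labels):
    """Per-class (precision, recall) from a square confusion matrix.

    confusion[i][j] is the number of images of true class i predicted as class j.
    """
    results = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_as_k = sum(row[k] for row in confusion)  # column total
        actually_k = sum(confusion[k])                      # row total
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        results[label] = (precision, recall)
    return results


labels = ["efflorescence", "spalling", "cracking", "defacement"]

# Hypothetical counts for illustration only (NOT the data of Table 2).
toy_confusion = [
    [8, 2, 0, 0],   # true efflorescence
    [0, 10, 0, 0],  # true spalling
    [0, 1, 9, 0],   # true cracking
    [0, 2, 0, 8],   # true defacement
]
rates = precision_recall(toy_confusion, labels)
```

In this toy case every spalling image is found (recall of 1.0), but the spalling column also collects images of the other three defects, which lowers its precision. This is the same kind of asymmetry that a high recall combined with a lower accuracy rate for spalling indicates.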

5. Conclusions

This study combined a UAV with a deep learning model for automated detection of external wall tile deterioration of buildings and made modifications to improve the efficiency of the method. The results indicated that our model had high accuracy and recall, the respective rates of which were 91% and 80% for efflorescence, 76% and 100% for spalling, 86% and 86% for cracking, and 98% and 78% for defacement (Table 1).

Compared with traditional detection methods, the use of UAVs is inexpensive and affords higher mobility, efficiency, and safety. However, UAV efficiency can be affected by the climate, lighting, wind, and blind spots in the test area and by the limitation of UAV operational technology. In the future, these limitations may be overcome through the use of relatively robust camera lenses, sensors, systems, and automation technologies, making UAVs safer and more efficient and increasing their application in the field of construction.

In the current study, the recognition accuracy for spalling was slightly low, indicating some limitations in spalling recognition from the existing images. Therefore, in future studies, the use of infrared scanners, which detect differences in depth and recognize whether tiles have fallen, is highly recommended to improve recognition accuracy. Besides a larger dataset, a deeper network can also be considered, because a deeper network can identify more detailed characteristics and thereby improve accuracy. Moreover, to identify multiple defect types simultaneously, multiple tags can be assigned to each image and the corresponding loss functions used. For normal photos (without deterioration), the model would be expected to assign relatively low probabilities of belonging to each of the four deterioration types. Two methods are considered to further improve model adaptation: (1) setting a basic threshold in the model, so that input photos whose probabilities fall below the threshold are classified as background (not belonging to the four types of deterioration), and (2) taking photos of normal exterior wall tiles, equal in number to the photos of a single deterioration type, as a background type (the fifth type) and then retraining the model.
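The first of the two proposed methods, a basic threshold for rejecting normal photos, can be sketched in a few lines. The threshold of 0.5, the function name, and the example probabilities are hypothetical; the text does not specify a threshold value, and a suitable one would have to be chosen on validation data.

```python
DEFECT_LABELS = ["efflorescence", "spalling", "cracking", "defacement"]


def classify_with_background(probabilities, labels=DEFECT_LABELS, threshold=0.5):
    """Return the most probable defect, or "background" if no class is confident.

    Assumption: probabilities are the softmax outputs of the four-class model.
    """
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return "background"
    return labels[best]


# A photo with no dominant class is rejected; a confident one keeps its label.
undecided = classify_with_background([0.30, 0.30, 0.20, 0.20])
confident = classify_with_background([0.10, 0.70, 0.10, 0.10])
```

One caveat of this approach is that softmax outputs always sum to 1, so a normal photo can still receive a high probability for one class; this is a reason the second method, retraining with an explicit fifth background class, may be the more reliable of the two.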

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.