Locating a fire inside a structure that is not in the robot’s direct field of view has been researched for intelligent firefighting robots. By classifying fire, smoke, and their thermal reflections, firefighting robots can assess local conditions, decide on a proper heading, and autonomously navigate toward a fire. Long-wavelength infrared camera images were used to capture the scene because of the camera’s ability to image through zero-visibility smoke. This paper analyzes motion and statistical texture features acquired from thermal images to discover the features suitable for accurate classification. A Bayesian classifier is implemented to probabilistically classify multiple classes, and a multiobjective genetic algorithm optimization is performed to find the combination of features that has the lowest errors and the highest performance. The distributions of multiple feature combinations that have 6.70% or less error were analyzed and the best solution for the classification of fire and smoke was identified.

1. Introduction

Intelligent firefighting humanoid robots are actively being researched to reduce firefighter injuries and deaths as well as to increase their effectiveness in performing tasks [15]. One task is locating a fire inside a structure when the fire is outside the robot’s field of view (FOV). Fire, smoke, and their thermal reflections can be clues for determining a heading that will ultimately lead the robot to the fire so that it can suppress it. However, research on accurately classifying these clues has been incomplete.

2. Previous Features

In conventional fire and/or smoke detection systems [6, 7] (Table 1), temperature, ionization, and ultraviolet light were mainly used to indicate the presence of fire and/or smoke inside a structure, but these systems can have a long response time in large spaces [8] and do not provide sufficient data on the location of the fire and/or smoke. More recently, vision systems using color [9–12], motion [13, 14], both [8, 15–17], and texture features [12, 18, 19] have been researched to characterize fire or smoke (Table 1). However, color features from RGB cameras are not applicable to firefighting robots because RGB cameras operate in the visible to short-wavelength infrared (IR) range (less than 1 micron) and are not usable in smoke-filled environments where visibility has sufficiently decreased [2, 14]. Motion (e.g., dynamical motion, shape change) can be another clue for detecting fire and smoke, by characterizing flickering flames and smoke flow from a stationary vision system. However, the vision system onboard a robot moves with the robot itself, which introduces a large amount of noise and requires extensive computation for motion compensation. Texture features, researched in [12, 18, 19], were used to identify fire or smoke. The spatial characteristics of textures can be useful for recognizing patterns of fire and smoke by remote sensing and are less influenced by rotation and motion [18].

Long-wavelength infrared cameras, similar to the handheld thermal infrared cameras (TICs) that are typically used to aid firefighting tasks within smoke-filled environments [20–22] as well as fire-front and burned-area recognition in remote sensing [23], are used in this research. Because TICs absorb infrared radiation in the long-wavelength IR range (7–14 microns), they are able to image surfaces even in dense smoke and zero-visibility environments [2, 14]. In addition, a TIC can provide useful information under local or global darkness, for example, shadows or darkness caused by damaged lighting. Recently, thermal images from TICs have been studied to recognize patterns and motion remotely [24]. The cameras detect hot objects as well as thermal reflections off surfaces. As a result, image processing of detected objects must be sufficiently robust to discern between desired objects and their thermal reflections.

The ultimate aim of this study is to enable the shipboard autonomous firefighting robot (SAFFiR), whose prototype is displayed in Figure 1, to autonomously navigate toward a fire outside its FOV in indoor fire environments. For this, the robot needs to identify clues such as smoke, smoke-reflections, and fire-reflections by itself to correctly navigate toward the fire. However, the recognition of key features has not been fully studied. This paper analyzes the appropriate combination of features to accurately classify fire, smoke, their thermal reflections, and other hot objects using thermal infrared images. Large-scale fire tests were conducted to create actual fire environments with a range of both temperature and smoke conditions. A long-wavelength IR camera was installed to produce 14-bit thermal images of the fire environment. These images were used to extract motion and statistical texture features in regions of interest (ROI). Bayesian classification was performed to probabilistically identify multiple classes in real time. To identify the best combination of features for accurate classification, a multiobjective optimization was implemented using two objective functions: resubstitution and cross-validation errors.

3. Motion and Texture Features

In a pattern recognition system, the choice of features plays an important role in classification performance. Both motion and texture features were selected because they were crucial in previous studies of fire and/or smoke detection and are also well suited to thermal image analysis, which is the major source of information the firefighting robot can acquire in fire environments. Optical flow, a popular motion measurement, was used for the motion features, while first- and second-order statistical texture features were applied for the texture measurement.

A FLIR A35 long-wavelength IR camera, which is capable of imaging through zero-visibility environments, was used to produce images. All images came from a 320 × 256-pixel focal plane array operating at a 60 Hz frame rate, which produces 14-bit images with an intensity range of −16384 for −40°C to −1 for 550°C. Fifteen features from optical flow and the statistical texture features are evaluated to find the best feature combination. Optical flow shows temporal variations due to moving objects in the FOV or motion of the robot. The first- and second-order statistical texture features capture spatial characteristics of objects in the scene.

3.1. Motion Features by Optical Flow

Optical flow is a useful tool for recognizing the motion of an object in sequential images [30]. Optical flow methods are either local or global. Lucas-Kanade (LK) is a local method that is relatively robust and yields a less dense flow field, while Horn-Schunck (HS) is a global method that yields a dense flow field but is highly sensitive to noise [31]. Because the intensities in the thermal image change with the varying fire environment, the LK method, which is more robust than HS, was selected in this research to measure motion features of the objects. Two features, optical flow vector number (OFVN) and optical flow mean magnitude (OFVMM), were computed to quantitatively characterize the motions of fire, smoke, and their reflections. Figure 2 contains RGB and thermal images of dense smoke in a hallway and a wood crib fire in a room. Red arrows in the thermal images indicate the direction and magnitude of the optical flow vectors, and red boxes mark smoke, fire, and thermal reflections.
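As a rough illustration of how the two motion features might be computed once optical flow vectors are available, the following Python sketch assumes that the flow inside one ROI is given as a list of (u, v) displacements, that OFVN counts the vectors whose magnitude exceeds a small noise threshold, and that OFVMM is the mean magnitude of those vectors. The function name, the threshold `min_magnitude`, and these exact definitions are assumptions made for illustration; the LK computation that would produce the vectors is not shown.

```python
import math

def motion_features(flow_vectors, min_magnitude=0.5):
    """Return (OFVN, OFVMM) for one region of interest.

    flow_vectors: iterable of (u, v) displacements in pixels per frame.
    min_magnitude: hypothetical noise threshold; shorter vectors are ignored.
    """
    magnitudes = [math.hypot(u, v) for u, v in flow_vectors]
    kept = [m for m in magnitudes if m >= min_magnitude]
    ofvn = len(kept)                           # number of flow vectors
    ofvmm = sum(kept) / ofvn if ofvn else 0.0  # mean magnitude of those vectors
    return ofvn, ofvmm

# Three made-up flow vectors; the second is treated as noise.
roi_flow = [(3.0, 4.0), (0.1, 0.1), (-6.0, 8.0)]
print(motion_features(roi_flow))
```

Returning zero for an ROI without flow vectors is one possible convention; it keeps the feature defined for static objects.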

3.2. First- and Second-Order Statistical Texture Features

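The first- and second-order statistical features were considered in this study for object classification. The first-order statistical features estimate properties of individual pixels, without characterizing any relationship between neighboring pixels, and can be computed from the intensity histogram of the candidate region of interest (ROI) in the image. As described in [32], mean (MNI), variance (VAR), standard deviation (STD), skewness (SKE), and kurtosis (KUR) were calculated by

$$
\begin{aligned}
\mathrm{MNI} &= \mu = \frac{1}{N}\sum_{x,y} I(x,y), \\
\mathrm{VAR} &= \sigma^{2} = \frac{1}{N}\sum_{x,y}\bigl(I(x,y)-\mu\bigr)^{2}, \\
\mathrm{STD} &= \sigma = \sqrt{\mathrm{VAR}}, \\
\mathrm{SKE} &= \frac{1}{N}\sum_{x,y}\left(\frac{I(x,y)-\mu}{\sigma}\right)^{3}, \\
\mathrm{KUR} &= \frac{1}{N}\sum_{x,y}\left(\frac{I(x,y)-\mu}{\sigma}\right)^{4},
\end{aligned}
\tag{1}
$$

where $I(x,y)$ refers to the intensity of a pixel at $x$ and $y$ and $N$ denotes the number of pixels (NOP) of the object in the image.

The second-order statistical features represent spatial relationships between a pixel and its neighbors. The gray-level cooccurrence matrix (GLCM) [33] is used to account for adjacent pixel relationships in four directions (horizontal, vertical, left, and right diagonals) by quantizing the spatial cooccurrence of neighboring pixels. A total of seven second-order statistical features were used: dissimilarity (DIS), entropy (ENT), contrast (CON), inverse difference (INV), correlation (COR), uniformity (UNI), and inverse difference moment (IDM). To measure these features, a normalized cooccurrence matrix is used, which can be defined as

$$
p(i,j) = \frac{C(i,j)}{\sum_{i=1}^{G}\sum_{j=1}^{G} C(i,j)},
\tag{2}
$$

where $C(i,j)$ refers to the frequency of occurrences of the gray-levels $i$ and $j$ at adjacent pixels within the four directions and $G$ denotes the number of gray-levels in the quantized image. The denominator of (2) normalizes $C(i,j)$ so that the entries $p(i,j)$ are estimates of the cooccurrence probabilities. After building the normalized cooccurrence matrix $p(i,j)$, the seven second-order statistical features were computed by

$$
\begin{aligned}
\mathrm{DIS} &= \sum_{i,j} p(i,j)\,\lvert i-j\rvert, &
\mathrm{ENT} &= -\sum_{i,j} p(i,j)\,\log p(i,j), \\
\mathrm{CON} &= \sum_{i,j} p(i,j)\,(i-j)^{2}, &
\mathrm{INV} &= \sum_{i,j} \frac{p(i,j)}{1+\lvert i-j\rvert}, \\
\mathrm{COR} &= \sum_{i,j} \frac{(i-\mu_{x})(j-\mu_{y})\,p(i,j)}{\sigma_{x}\sigma_{y}}, &
\mathrm{UNI} &= \sum_{i,j} p(i,j)^{2}, \\
\mathrm{IDM} &= \sum_{i,j} \frac{p(i,j)}{1+(i-j)^{2}},
\end{aligned}
\tag{3}
$$

where $\mu_{x}$, $\mu_{y}$ and $\sigma_{x}$, $\sigma_{y}$ are the means and standard deviations of the row and column marginal distributions of $p(i,j)$.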
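The texture features described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the first-order moments are normalized by the number of pixels, kurtosis is the plain fourth standardized moment, the cooccurrence counts are accumulated over four neighbor offsets at a distance of one pixel without symmetrization, and the seven second-order features follow their commonly used GLCM definitions. All function names are hypothetical.

```python
import math

def first_order_features(pixels):
    """MNI, VAR, STD, SKE, KUR of a flat list of ROI pixel intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    if std == 0.0:  # flat region: higher moments are undefined, report 0
        skew = kurt = 0.0
    else:
        skew = sum(((p - mean) / std) ** 3 for p in pixels) / n
        kurt = sum(((p - mean) / std) ** 4 for p in pixels) / n
    return {"MNI": mean, "VAR": var, "STD": std, "SKE": skew, "KUR": kurt}

def normalized_glcm(image, levels):
    """Normalized cooccurrence matrix of a quantized image (values 0..levels-1)."""
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    counts[image[r][c]][image[rr][cc]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def second_order_features(p):
    """DIS, ENT, CON, INV, COR, UNI, IDM from a normalized cooccurrence matrix."""
    g = len(p)
    cells = [(i, j, p[i][j]) for i in range(g) for j in range(g)]
    mu_x = sum(i * v for i, _, v in cells)
    mu_y = sum(j * v for _, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, _, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells)
    return {
        "DIS": sum(v * abs(i - j) for i, j, v in cells),
        "ENT": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "CON": sum(v * (i - j) ** 2 for i, j, v in cells),
        "INV": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "COR": cov / (sd_x * sd_y) if sd_x * sd_y > 0 else 0.0,
        "UNI": sum(v * v for _, _, v in cells),
        "IDM": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }

# Tiny made-up ROI quantized to two gray levels.
roi = [[0, 1],
       [0, 1]]
glcm = normalized_glcm(roi, levels=2)
print(first_order_features([1, 2, 3, 4]))
print(second_order_features(glcm))
```

In practice the 14-bit intensities would be quantized to a modest number of gray levels before the cooccurrence matrix is built, since the matrix size grows with the square of the number of levels.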

4. Object Extraction and Bayesian Classification

One of the main characteristics of fire, smoke, and their thermal reflections in thermal images is that they are higher in intensity than the background. Because intensity is related to temperature in the thermal image, higher temperature objects appear brighter than the background. Hence, intensity is a primary factor for object extraction from the background. Assuming that the thermal image histogram has a bimodal distribution for foreground (i.e., object) and background, the clustering-based image autothresholding method [34], called the Otsu method, can calculate an optimum threshold that separates objects and background, creating a binary image with 0 being the background and 1 being the objects. The binary images were filtered through morphological filtering techniques to remove small regions and holes inside objects. After masking the original 14-bit image with the filtered binary image (element-wise multiplication), a final image was obtained that contains the original 14-bit intensities in objects and zeroes in the background.
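The thresholding and masking steps can be sketched as below. This is a simplified illustration, not the pipeline used in the study: it assumes non-negative integer intensities, picks the threshold that maximizes the between-class variance (the Otsu criterion), and omits the morphological filtering step. The function names and the sample image are hypothetical.

```python
from collections import Counter

def otsu_threshold(pixels):
    """Return the intensity t that maximizes between-class variance.

    Pixels with intensity > t are treated as foreground (objects).
    """
    hist = Counter(pixels)
    values = sorted(hist)
    n = len(pixels)
    total_sum = sum(pixels)
    best_t, best_score = values[0], -1.0
    w0 = 0      # number of background pixels so far
    sum0 = 0    # sum of background intensities so far
    for t in values[:-1]:  # the largest value cannot be a threshold
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = n - w0
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t

def mask_objects(image, threshold):
    """Keep original intensities above the threshold and zero the background."""
    return [[p if p > threshold else 0 for p in row] for row in image]

# Made-up 3 x 3 image: a hot object on a cool background.
image = [[10, 12, 11],
         [13, 200, 210],
         [12, 205, 11]]
t = otsu_threshold([p for row in image for p in row])
print(t)
print(mask_objects(image, t))
```

Working from the histogram means the cost of the threshold search depends on the number of distinct intensity values rather than on the number of pixels, which suits a per-frame computation.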

There are several classification methods commonly used in supervised machine learning: k-nearest neighbors (kNN), decision tree (DT), neural networks (NN), support vector machine (SVM), and Naïve Bayesian. For this study, these classification methods were analyzed by considering three points: (i) the capability to classify multiple classes such as fire, smoke, and their thermal reflections; (ii) a low chance of overfitting, because fire environments can present many situations that were not learned or trained; and (iii) real-time implementation, because a firefighting robot must make decisions in real time to carry out its task. kNN is insensitive to outliers but needs a large amount of memory and expensive computation [35]. DT has a low computational burden but, for multiclass classification, it may generate a complicated tree structure and may cause overfitting [35, 36]. NN shows high performance when processing multidimensional and continuous features but cannot overcome the overfitting problem. SVM provides fast computation and the highest accuracy but cannot be used for multiclass classification because it produces binary results [37]. Naïve Bayesian classification is a probabilistic classification based on Bayes’ theorem and is popular for pattern recognition applications. Although this method has lower accuracy than other classifiers and assumes that each feature is independent, it offers fast computation, robustness to untrained cases, and a low chance of overfitting [35]. In addition, it supports probabilistic decision making over multiple classes with computation fast enough for real-time implementation. In this study, Bayesian classification is used for the evaluation of each feature.

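Given several features $\mathbf{F} = (f_{1}, \ldots, f_{n})$ (motion and texture features), we can calculate the probability that one class $C_{i}$ (fire, smoke, thermal reflections, etc.) corresponds to the candidate by using a conditional probability, $P(C_{i} \mid \mathbf{F})$, also known as the posterior probability. By using Bayes’ theorem, it can be written with prior, likelihood, and evidence as shown in

$$
P(C_{i} \mid \mathbf{F}) = \frac{P(C_{i})\,P(\mathbf{F} \mid C_{i})}{\sum_{j} P(C_{j})\,P(\mathbf{F} \mid C_{j})},
\tag{4}
$$

where $P(C_{i})$ is the prior probability, that is, the probability that a candidate belongs to $C_{i}$, and can be calculated as the number of samples of class $C_{i}$ divided by the total number of samples. $P(\mathbf{F} \mid C_{i})$ is the likelihood function, and the denominator of (4) is the evidence, which acts as a normalizing constant given by the sum over all classes of the product of the prior and the likelihood. By applying the conditional independence assumption, the likelihood function can be rewritten as

$$
P(\mathbf{F} \mid C_{i}) = \prod_{k=1}^{n} P(f_{k} \mid C_{i}).
\tag{5}
$$

The conditional probability density function can be described as

$$
P(f_{k} \mid C_{i}) = \frac{1}{\sqrt{2\pi\sigma_{ik}^{2}}}\exp\!\left(-\frac{(f_{k}-\mu_{ik})^{2}}{2\sigma_{ik}^{2}}\right),
\tag{6}
$$

where

$$
\mu_{ik} = \frac{1}{N_{i}}\sum_{m=1}^{N_{i}} f_{k}^{(m)}, \qquad
\sigma_{ik}^{2} = \frac{1}{N_{i}}\sum_{m=1}^{N_{i}}\bigl(f_{k}^{(m)}-\mu_{ik}\bigr)^{2},
\tag{7}
$$

with the sums taken over the $N_{i}$ training samples of class $C_{i}$.

As shown in Table 2, Gaussian parameters for the fifteen features with respect to smoke, smoke thermal reflection, fire, and fire thermal reflection were estimated by maximum likelihood estimation [38]. Probability density distributions for all features are illustrated in Figure 3. With (5), the evidence and then the posterior probability of each class were calculated. By applying the maximum a posteriori decision rule in (8), the Bayesian classification was used to predict the class and probability of each candidate in the scene:

$$
\hat{C} = \arg\max_{i} P(C_{i} \mid \mathbf{F}).
\tag{8}
$$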
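The classification procedure described above can be sketched as a small Gaussian naive Bayes classifier. This is an illustrative sketch, not the code used in the study: priors are class frequencies, the Gaussian parameters are maximum likelihood estimates, the posterior is computed in log space for numerical stability, and the class with the largest posterior is returned. The variance floor `1e-9`, the function names, and the training data are assumptions introduced for the example.

```python
import math

def fit_gaussian_nb(samples, labels):
    """Estimate the prior, per-feature mean, and per-feature variance of each class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum likelihood variance; the floor (an assumption) avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[c] = (n / len(samples), means, variances)
    return model

def posterior(model, x):
    """Posterior probability of every class for feature vector x."""
    log_joint = {}
    for c, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[c] = lp
    top = max(log_joint.values())
    unnormalized = {c: math.exp(lp - top) for c, lp in log_joint.items()}
    evidence = sum(unnormalized.values())  # normalizing constant
    return {c: u / evidence for c, u in unnormalized.items()}

def classify(model, x):
    """Return the class with the largest posterior and that posterior."""
    post = posterior(model, x)
    best = max(post, key=post.get)
    return best, post[best]

# Made-up two-feature training data for two classes.
samples = [[9.0, 1.0], [10.0, 1.2], [11.0, 0.8],
           [2.0, 3.0], [3.0, 3.5], [4.0, 2.5]]
labels = ["fire", "fire", "fire", "smoke", "smoke", "smoke"]
model = fit_gaussian_nb(samples, labels)
print(classify(model, [9.5, 1.1]))
```

Because each class stores only a prior plus one mean and one variance per feature, classifying a candidate costs a handful of arithmetic operations per class, which is consistent with the real-time requirement stated in the text.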

Figure 3 shows the probability density distribution of each class using the Gaussian parameters of Table 2, illustrating how fire, fire-reflection, smoke, and smoke-reflection are distributed over the fifteen features. Some features separate the distributions of the four classes while others cause overlap. For example, MNI gives the best-separated case, although smoke and its reflection, and fire and its reflection, do overlap. SKE shows the worst case, in which all classes overlap, making it impossible to distinguish any of the four classes.

5. Result and Discussion

The accuracy in classifying fire objects was analyzed using data from a series of large-scale tests in the facility [1] using actual fires up to 75 kW. Fires included latex foam, wood cribs, and propane gas fires from a sand burner. These different types of fires produced a range of temperature and smoke conditions. Latex foam fires produced lower temperature conditions but dense, low-visibility smoke. Conversely, propane gas fires produced higher gas temperatures and light smoke. Wood crib fires resulted in smoke and gas temperatures between those of latex foam and propane gas fires; however, these fires also produced sparks from the burning wood. Thermal images were collected by driving a wheeled mobile robot through the setup during a fire test. A total of 10,775 objects were collected from the experiments and categorized as smoke, smoke-reflection, fire, fire-reflection, or other hot objects, to serve as clues that lead the firefighting robot toward the fire source outside the FOV. In addition, as each object has sixteen corresponding data points (fifteen features and a class), the total number of data points used in this paper is 172,400. The number of objects in each class is shown in Table 3.

Two types of error criteria (resubstitution and k-fold cross-validation errors [39]) were used to measure how accurately each feature performs in the classification. Resubstitution error uses the entire dataset to compare the actual classes with the classes predicted by the Bayesian classification in order to examine how well they match. When this criterion is used alone to enhance accuracy, the classification can be overfitted to the training dataset. Cross-validation error is useful for detecting and preventing overfitting. Instead of using the entire dataset at once, cross-validation randomly splits the dataset into k partitions of approximately equal size and estimates a mean error by testing each partition against a classifier trained on the remaining partitions.
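The two error criteria can be sketched as follows. This is a minimal illustration under stated assumptions: `fit` is any function that trains on samples and labels and returns a prediction function, the default `k=10` and the fixed random seed are choices made here (the text does not state the value of k), and the nearest-class-mean classifier is only a stand-in to keep the example self-contained.

```python
import random

def error_rate(predict, samples, labels):
    """Fraction of samples whose predicted class differs from the actual class."""
    wrong = sum(predict(x) != y for x, y in zip(samples, labels))
    return wrong / len(samples)

def resubstitution_error(fit, samples, labels):
    """Train and test on the entire dataset."""
    predict = fit(samples, labels)
    return error_rate(predict, samples, labels)

def kfold_error(fit, samples, labels, k=10, seed=0):
    """Mean error over k random partitions of roughly equal size (requires k <= len(samples))."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        errors.append(error_rate(predict,
                                 [samples[i] for i in fold],
                                 [labels[i] for i in fold]))
    return sum(errors) / k

def fit_nearest_mean(samples, labels):
    """Stand-in classifier on a single feature: predict the class with the closest mean."""
    totals = {}
    for x, y in zip(samples, labels):
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + x, n + 1)
    means = {y: s / n for y, (s, n) in totals.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

# Made-up, well separated one-feature data.
samples = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(resubstitution_error(fit_nearest_mean, samples, labels))
print(kfold_error(fit_nearest_mean, samples, labels, k=4))
```

A resubstitution error far below the cross-validation error would suggest that the classifier has been fitted too closely to the training data.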

5.1. Single Feature Performance

The performance results of each feature are shown in Table 4. The first-order statistical texture features MNI, VAR, and STD produced the lowest errors while NOP, SKE, and KUR show the highest. These results show that MNI and VAR are beneficial for distinguishing fire, smoke, and thermal reflections while motion features are not. NOP shows the highest error, and OFVMM, one of the motion features, shows the second highest error compared with the other features. This is in part attributed to the dynamic motion of the robot. The second-order statistical texture features ENT and COR show 42–45% error, which is higher than the other second-order features.

5.2. Multiple Feature Combination Performance

The error results in Table 4 demonstrate that a single feature cannot accurately classify fire, smoke, and thermal reflections. Thus, combinations of multiple features were considered and analyzed to find the best combination. The total number of all possible combinations that have two or more features is

$$
\sum_{r=2}^{n} \binom{n}{r} = \sum_{r=2}^{n} \frac{n!}{r!\,(n-r)!},
\tag{9}
$$

where $n$ refers to the total number of features (i.e., $n = 15$) and $r$ is the number of features in the combination. Based on all possible combinations, the multiobjective genetic algorithm optimization [40] in the Global Optimization Toolbox of MATLAB was used to find the combination of features that has the highest performance in the classification. The objective functions in the optimization, resubstitution and k-fold cross-validation errors [39], were used to measure how accurately different feature combinations perform in the classification.
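The size of the search space can be checked with a few lines of Python. The sketch also shows one possible way to encode a feature combination as a bit string, as a genetic algorithm might; this encoding and the feature ordering are assumptions for illustration and are not taken from the study.

```python
from math import comb

FEATURES = ["OFVN", "OFVMM", "NOP", "MNI", "VAR", "STD", "SKE", "KUR",
            "DIS", "ENT", "CON", "INV", "COR", "UNI", "IDM"]

def count_combinations(n, min_size=2):
    """Number of feature subsets that contain at least min_size features."""
    return sum(comb(n, r) for r in range(min_size, n + 1))

def decode(mask):
    """Map an integer bit mask (hypothetical chromosome) to feature names."""
    return [name for bit, name in enumerate(FEATURES) if (mask >> bit) & 1]

print(count_combinations(len(FEATURES)))  # 32752, i.e. 2**15 - 15 - 1
print(decode(0b1001000))                  # ['MNI', 'SKE']
```

With 32,752 candidate subsets, an exhaustive evaluation with cross-validation is possible but costly, which is one motivation for a guided search such as a genetic algorithm.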

Figure 4 contains a plot of the error associated with the most promising feature combinations. The behavioral solution set is defined as feature combinations with less than 7% error for both objective functions while the general set refers to all other possible feature combinations. The behavioral solution set contains 0.0061% of all possible feature combinations.

The occurrence probability of features in the behavioral solution set is illustrated in Figure 5. In the behavioral solution set, the first-order statistical texture features MNI and SKE are always present while OFVN, NOP, and OFVMM never are. Both the first-order statistical texture features STD and VAR and the second-order statistical texture features COR, ENT, and DIS show a higher occurrence than the other first- and second-order texture features, while KUR, IDM, UNI, INV, and CON show lower occurrence. Note that, due to the robot’s dynamical motion, motion features were not successful and were not even included in the top 10 feature combinations of the behavioral set.
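The selection of the behavioral solution set and the occurrence probability of each feature within it can be sketched as below. The data structure, the function names, and the placeholder feature names and error values are assumptions for illustration only; they are not results from the study.

```python
from collections import Counter

def behavioral_set(results, limit=0.07):
    """Keep combinations whose resubstitution and cross-validation errors are both below limit.

    results: dict mapping a frozenset of feature names to (resub_error, cv_error).
    """
    return [combo for combo, (resub, cv) in results.items()
            if resub < limit and cv < limit]

def occurrence_probability(combos, feature_names):
    """Fraction of the selected combinations that contain each feature."""
    if not combos:
        return {name: 0.0 for name in feature_names}
    counts = Counter(name for combo in combos for name in combo)
    return {name: counts[name] / len(combos) for name in feature_names}

# Placeholder features and made-up errors, for illustration only.
results = {
    frozenset({"A", "B"}): (0.05, 0.06),
    frozenset({"A", "C"}): (0.06, 0.065),
    frozenset({"B", "C"}): (0.09, 0.10),
}
selected = behavioral_set(results)
print(occurrence_probability(selected, ["A", "B", "C"]))
```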

The top features based on the occurrence probability in Figure 5 are COR, ENT, DIS, SKE, STD, VAR, and MNI. However, the combination of these seven features does not result in the best solution for classification. Table 5 contains the classification performance of the feature combinations in the behavioral solution set. To evaluate the performance of each feature combination, four performance measures were used: precision, sensitivity (recall), F-measure, and accuracy. Precision measures the fraction of truly positive instances among those that the classifier predicted to be positive, and sensitivity measures the fraction of the actual positive instances that the classifier correctly predicted [35]. F-measure is the harmonic mean of precision and sensitivity, and accuracy is the proportion of true results. These measures can be mathematically defined as

$$
\begin{aligned}
\text{Precision} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, &
\text{Sensitivity} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \\
\text{F-measure} &= \frac{2\cdot\text{Precision}\cdot\text{Sensitivity}}{\text{Precision}+\text{Sensitivity}}, &
\text{Accuracy} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}},
\end{aligned}
\tag{10}
$$

where TP is correctly classified positive cases, FP is incorrectly classified negative cases, and FN is incorrectly classified positive cases. For the performance measurement, confusion matrices were created as described in the Appendix and applied to (10). In precision, the index number 1 combination shows the highest performance in the behavioral solution set while the index number 7 combination shows the lowest. In sensitivity, the index number 7 combination records the highest result while index number 4 records the lowest. In F-measure and accuracy, the index number 2 combination shows the highest result while index number 4 shows the lowest. Based on the confusion matrices, most misclassification occurs among smoke, smoke-reflection, and other hot objects because, during small fires, the texture patterns of these classes were diminished and the intensity was too low to distinguish them. The best solution was determined to be the index number 2 combination of MNI, DIS, COR, SKE, and STD, which has the lowest resubstitution and cross-validation errors, 6.68% and 6.70%, respectively. This combination includes all of the top features based on the occurrence probability except ENT and VAR. The four performance results for each feature combination in the behavioral solution set are shown in Figure 6, where the highest results are marked with red circles and the lowest with green dotted circles. Sensitivity is higher than precision for each feature combination because FPs are larger than FNs in the confusion matrix. In particular, index number 7 has the biggest difference between FP and FN, resulting in the highest sensitivity and lowest precision. The sum of FP and FN in index number 4 is the highest in the behavioral solution set, resulting in the lowest accuracy, while index number 2 has the lowest sum of FP and FN, providing the highest accuracy.
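The four performance measures can be computed from confusion-matrix counts as sketched below. Accuracy is taken here as TP / (TP + FP + FN); this form is an assumption, chosen because only TP, FP, and FN are defined in the text and because it reproduces the reported accuracy of 93.45% from the reported precision (95.64%) and sensitivity (97.61%). The function names and the example counts are hypothetical.

```python
def performance(tp, fp, fn):
    """Return (precision, sensitivity, f_measure, accuracy) from confusion counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp / (tp + fp + fn)  # assumed form, see the note above
    return precision, sensitivity, f_measure, accuracy

def accuracy_from(precision, sensitivity):
    """Accuracy implied by precision and sensitivity under the assumed form."""
    return 1.0 / (1.0 / precision + 1.0 / sensitivity - 1.0)

# Made-up counts for illustration.
print(performance(tp=90, fp=10, fn=5))

# Consistency check against the values reported for the best combination.
print(round(accuracy_from(0.9564, 0.9761), 4))  # 0.9345
```

When FP exceeds FN, as in the example counts, sensitivity comes out higher than precision, which matches the pattern described for the behavioral solution set.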

This study investigated a wide range of features from long-wavelength infrared camera images, analyzed the normal distributions of fifteen features with respect to the classes of smoke, fire, and their thermal reflections, and discovered the highest performing feature combination by examining single features and multiple feature combinations. As a result, the proposed feature combination of MNI, DIS, COR, SKE, and STD improves performance compared with the previous study [1], which used MNI, VAR, ENT, and IDM. As shown in Figure 7, the resubstitution and cross-validation errors are reduced by 2.86% and 2.68%, respectively, and accuracy, F-measure, sensitivity, and precision are increased by 2.90%, 1.58%, 0.20%, and 2.85%, respectively.

Figure 8 shows original visual and thermal images with the robot at three different locations in the experimental facility: start point, hallway entrance, and room entrance. Each row contains a series of images from the robot at the three locations. The first row contains visible images of the robot view. As seen in the visible image at the start point, information about the hallway is limited due to shadowing of the light. The image at the hallway entrance shows a smoke layer in the upper portion of the hallway due to a fire inside the room. The image at the room entrance displays a wood crib fire with sparks. Because of soot and the relative difference in brightness, the background appears darker, which limits the information available about the background around the fire.

Thermal infrared images are displayed in the second row to show information that an RGB camera cannot provide in fire environments. Unlike the visual image at the start point, which is obscured by shadowing, the thermal image clearly shows the presence of smoke and its thermal reflections on the ventilation hood. The red boxes on the thermal images indicate objects extracted through the adaptive object extraction, together with optical flows and identification numbers. Even in dense, smoke-filled, low-visibility environments, thermal images can capture smoke and fire, as well as background information that is otherwise not visible through visual imaging.

On the third row, class labels and posterior probabilities of each candidate are displayed at the center of candidate ROI as a result of Bayesian classification. Using enhanced image processing techniques, the thermal images can be more refined and clearer than the thermal images on the second row. Smoke, fire, and their thermal reflections are identified and marked in red or orange ellipses.

6. Conclusion

The appropriate combination of features was investigated to accurately classify fire, smoke, and their thermal reflections using thermal images. Gray-scale 14-bit images from a single infrared camera were used to extract motion and texture features by applying a clustering-based, autothresholding technique. Bayesian classification is performed to probabilistically identify multiple classes during real-time implementation. To find the best combination of features, a multiobjective genetic algorithm optimization was implemented using resubstitution and cross-validation errors as objective functions. Large-scale fire tests with different fire sources were conducted to create a range of temperature and smoke conditions to evaluate the feature combinations.

Fifteen motion and texture features were analyzed and the probability density functions of the features were computed by the maximum likelihood estimation. The combination of multiple features was determined to more accurately classify fire, smoke, and thermal reflections compared with a single feature. In the behavioral solution set where feature combinations produce less than 7% resubstitution and cross-validation errors, COR, ENT, DIS, SKE, STD, VAR, and MNI had 80.0% or more occurrence while other features had 40.0% or less occurrence. The feature combination of MNI, DIS, COR, SKE, and STD produced the highest performance in the classification resulting in 6.68% and 6.70%, resubstitution and cross-validation errors, and 95.64%, 97.61%, 96.62%, and 93.45%, precision, sensitivity, F-measure, and accuracy, respectively.

In the near future, the classification of fire, smoke, and their thermal reflections will be evaluated with additional classifiers and features to increase performance. Convolutional neural networks, a deep learning approach that has recently shown high performance, could be explored as a classifier; model-based image features such as the discrete wavelet transform will also be studied further.


Appendix

See Figure 9.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this manuscript.


Acknowledgments

This work was sponsored by the Office of Naval Research Grant no. N00014-11-1-0074 (scientific officer Dr. Thomas McKenna) in the USA, the Hwarang-dae Research Institute in Seoul, and the Agency for Defense Development in Daejeon, South Korea. The authors would like to thank Mr. Joseph Starr and Mr. Josh McNeil for assisting in performing the fire tests. The authors would also like to thank Rosana K. Lee for helping and supporting this research.