As one of the most common serious disasters, fire can cause heavy damage. Detecting and handling fires early is of great significance for ensuring public safety and reducing fire losses. However, traditional fire detectors suffer from low sensitivity and a limited range of detection scenes. To overcome these problems, a hybrid video fire detection method based on random forest (RF) feature selection and a back propagation (BP) neural network is proposed. An improved flame color model in RGB and HSI space and the visual background extractor (ViBe) moving target detection algorithm are used to segment the suspected flame regions. Then, multidimensional features of flames are extracted from the suspected regions, and these features are combined and selected according to the RF feature importance analysis. Finally, a BP neural network model is constructed for multifeature fusion and fire recognition. Test results on several experimental video sets show that the proposed method can effectively avoid feature interference and recognizes fires well in a variety of scenarios. The proposed method is applicable to fire recognition in video surveillance and detection robots.

1. Introduction

As society develops, the risk of fires that cause large losses is also increasing. Traditional fire recognition methods mainly use detectors to sample the temperature, spectrum, and smoke of a specific region to determine whether a fire has occurred [1]. However, because the detection range and sensitivity of such detectors are limited, it is difficult to achieve timely detection in open spaces and harsh environments.

In recent years, fire recognition based on image processing technology has become the focus of researchers’ attention. By analyzing the static and dynamic features of flame, researchers can distinguish flames from other objects. The static features of flame mainly include color and texture features. Chen et al. [2] proposed a flame recognition rule in RGB and HSI color space by analyzing the chromaticity and disorder of flames. Celik and Demirel [3] proposed a flame classification model based on YCbCr color space, and this flame model can better adapt to lighting changes compared with the RGB color model. Wang and Ren [4] improved the image brightness by performing histogram equalization on the V channel in HSV color space and then combined the flame pixel distribution rules in RGB and YCbCr color space to perform image segmentation. Prema et al. [5] extracted texture features based on wavelet decomposition and then used an extreme learning classifier to classify images. Their method can effectively eliminate red interference. Sheng et al. [6] studied the color, texture, and gray-scale statistical features of flame images and constructed a deep belief network (DBN) for fire detection. Jamali et al. [7] introduced texture features based on color features to detect fires. They combined different features to improve the accuracy of the fire detection system.

The dynamic features of flame mainly include motion features and geometric change features. Barnich and Van Droogenbroeck [8] proposed a visual background extractor (ViBe) algorithm, which provides a new idea for extracting flame motion features. To improve the overall performance of a fire detection system, Foggia et al. [9] attempted to combine the information of color, motion, and shape change to reduce the false alarm rate of the system. Gong et al. [10] proposed a flame centroid stabilization algorithm. They used color and motion features to segment suspected flame regions and further judged whether it is a flame by analyzing the changes of its area, shape, and center of mass.

Fire recognition models mainly include machine learning classifiers and deep learning networks. Yang et al. [11] analyzed the shape features of flames and used a support vector machine (SVM) to recognize fire images. Huang and Du [12] proposed a fire recognition method combining rough set (RS) theory and SVM. Their method can reduce overfitting and improve the accuracy of fire prediction. Qu et al. [13] extracted the red and blue color components and brightness component of flames in YCbCr color space and used the back propagation (BP) neural network model to detect fires. Li and Zhao [14] performed fire detection based on convolutional neural network (CNN) detectors such as Faster-RCNN, R-FCN, SSD, and YOLO v3. Sheng et al. [15] improved the detection of flame and smoke by combining simple linear iterative clustering (SLIC), density-based spatial clustering of applications with noise (DBSCAN), and CNN. Muhammad et al. [16] proposed a CNN framework for fire detection. The framework can be applied to video surveillance systems.

The above studies have enriched and developed fire recognition methods, but some important problems remain. Firstly, region segmentation based on color features inevitably overlaps with the color space of other objects, which may result in inaccurate segmentation of suspected flame regions. Secondly, the extracted flame features are not comprehensive or have little correlation with the classification and recognition model, which makes it difficult to achieve accurate fire recognition in complex scenes. Thirdly, deep learning methods require high-performance equipment, and the related model training and testing are complex.

In view of the shortcomings of current fire recognition methods, this paper combines the flame color model and the ViBe algorithm to segment suspected flame regions. Multidimensional features are then extracted from the suspected flame regions, the random forest (RF) method is used to analyze the importance of each feature, and the features are combined and selected accordingly. Finally, a BP neural network model is built for fire detection. The proposed fire detection method is verified on several experimental video sets.

This paper is organized as follows: Section 2 describes the hybrid approach adopted in this study. Section 3 presents the experiments and discusses the experimental results. Section 4 concludes this paper.

2. Hybrid Method

2.1. Region Segmentation Based on Color Model and ViBe Algorithm

Region segmentation is a process to divide an image (being processed) into multiple regions with different characteristics and then to separate the target region from its background. The flame presents a special color distribution pattern and has the dynamic characteristics of spreading and diffusion. Therefore, the suspected flame region in an image can be segmented by using the color model and moving target detection method.

2.1.1. Color Model for Segmentation

In order to describe the color distribution of flame pixels, a statistical flame color model can be established. Yan et al. [17] established a color model of flames in RGB and HSI color space. This model largely overcomes the cavities inside segmented flames, but high-saturation interference is still undersegmented. In a study of flame image saturation, Horng et al. [18] proposed an HSI color model of flames that restricts the saturation component. Combining the characteristics of flames in RGB and HSI color space, we adopt the flame color segmentation rules given in equation (1). The rules constrain the red, green, and blue color components of an image together with its saturation. They use a threshold on the red component and a threshold on the saturation, and the threshold setting differs between darker and brighter environments. The model also uses improved thresholds for the red and green components and thresholds for three further quantities.

We use the improved color model, equation (1), to separate the region in the image that meets the constraints of the color model from the background. The images collected by Chino et al. [19] were used to test the segmentation effect. Figure 1 shows a segmentation example of a flame image using different flame color models.

From Figure 1, it can be found that compared with the method in [17], the improved color model can effectively eliminate the brown-yellow wood and the illuminated ground and achieve the accurate segmentation of the flame regions.
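As a rough illustration of how a per-pixel color rule of this kind can be applied, the following Python sketch tests one pixel against a red-component threshold, a channel ordering, and a saturation constraint, in the style of earlier RGB/HSI flame rules. It is not the improved model of equation (1): the function names, the exact form of the saturation constraint, and the threshold values `r_t` and `s_t` are assumptions chosen for illustration only.

```python
def hsi_saturation(r, g, b):
    """HSI saturation in [0, 1] for 8-bit RGB values."""
    total = r + g + b
    if total == 0:
        return 0.0
    return 1.0 - 3.0 * min(r, g, b) / total


def is_flame_pixel(r, g, b, r_t=125, s_t=0.2):
    """Return True if the pixel passes the assumed color rules.

    r_t and s_t are hypothetical thresholds, not values from the paper.
    """
    # Rule 1 and 2: red must be strong and dominate green, which dominates blue.
    if not (r > r_t and r >= g > b):
        return False
    # Rule 3: the required saturation falls as the red component rises.
    return hsi_saturation(r, g, b) >= (255 - r) * s_t / r_t


print(is_flame_pixel(255, 160, 40))   # bright orange pixel
print(is_flame_pixel(200, 200, 200))  # gray pixel fails the channel ordering
```

Applying such a test to every pixel yields a binary mask of color candidates. Because the thresholds depend on the scene brightness, they would need to be tuned on sample images.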

2.1.2. ViBe Moving Target Segmentation

It is difficult to exclude objects whose colors are highly similar to flames by using the color model alone, so it is necessary to segment moving objects based on the dynamic features of flames in a video. In fire recognition, the interframe difference method [20], the optical flow method [21], and the background subtraction method [22] are commonly used to segment moving targets. The interframe difference method has low computational complexity, but it depends too much on the moving speed of the target. The optical flow method is computationally complex and sensitive to light. The background subtraction method is simple, and the extracted target is complete. Background subtraction can build its background statistical model with Gaussian mixture modeling (GMM) [23] or ViBe modeling [24]. Compared with the GMM method, the ViBe algorithm randomly selects neighborhood pixel values to model or update the background according to the similarity between neighboring pixels and does not need extensive estimation or computation, so the time complexity of building its background model is lower. The calculation process of the ViBe algorithm is as follows:

(1) Background model initialization. Background model initialization fills the pixel sample set used to build the background model. For each pixel $x$ in an image, let $v(x)$ represent its pixel value. Taking $x$ as the center, $N$ samples are randomly selected within a certain radius to build the sample set of the background model. The sample set can be defined as follows:

$$M(x) = \{v_1, v_2, \ldots, v_N\}$$

(2) Moving target detection. For the point $x$ in an image to be tested, its pixel value is compared with each sample value in the background model. If the absolute value of the difference with a certain sample point is greater than a given threshold $T$, the sample point is considered not similar to the point to be detected. If more than the given number $N_{\min}$ of sample points are not similar to the point to be detected, the point is considered to be the foreground (moving target). Otherwise, the detected point is the background, as shown in

$$f(x) = \begin{cases} 1\ (\text{foreground}), & \#\{\, i : d_i(x) > T \,\} > N_{\min}, \\ 0\ (\text{background}), & \text{otherwise}, \end{cases}$$

where $d_i(x) = |v(x) - v_i|$ represents the distance between the point to be detected and the $i$th sample point, and $f(x)$ represents the judgment result of background or foreground.

(3) Background model update. The ViBe algorithm uses a random updating strategy to update the background model from time to time. After a certain period of time $dt$, the probability that a certain sample in the sample set is still retained can be defined as follows:

$$P(t, t + dt) = \left(\frac{N - 1}{N}\right)^{(t + dt) - t} = e^{-\ln\left(\frac{N}{N - 1}\right) dt}$$
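The three ViBe steps above can be sketched for a single grayscale pixel as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, and the defaults (20 samples, distance threshold 20, at least 2 similar samples for background, subsampling factor 16) are commonly cited ViBe settings that the text does not confirm.

```python
import random


def init_model(neighborhood_values, n_samples=20, rng=random):
    """Step 1: fill the sample set by drawing randomly from the neighborhood."""
    return [rng.choice(neighborhood_values) for _ in range(n_samples)]


def is_foreground(value, samples, dist_threshold=20, max_dissimilar=None):
    """Step 2: foreground if too many samples are dissimilar to the pixel."""
    if max_dissimilar is None:
        # Assumed default: background needs at least 2 similar samples.
        max_dissimilar = len(samples) - 2
    dissimilar = sum(1 for s in samples if abs(value - s) > dist_threshold)
    return dissimilar > max_dissimilar


def update_model(value, samples, subsampling=16, rng=random):
    """Step 3: with probability 1/subsampling, overwrite one random sample."""
    if rng.randrange(subsampling) == 0:
        samples[rng.randrange(len(samples))] = value


def retention_probability(n_samples, dt):
    """Chance that a given sample survives dt random single-sample replacements."""
    return ((n_samples - 1) / n_samples) ** dt


rng = random.Random(0)
model = init_model([98, 99, 100, 101, 102], rng=rng)
print(is_foreground(100, model))  # close to every sample -> False
print(is_foreground(200, model))  # far from every sample -> True
```

Because samples are replaced at random rather than oldest first, the expected lifetime of a sample decays smoothly, which is what the retention probability expresses.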

The accuracy of region segmentation affects feature extraction: the more accurate the region segmentation is, the better the extracted flame features reflect the actual state of the flames. In the proposed hybrid method of video fire detection, color and motion features are combined for region segmentation. Based on the color distribution of flames in RGB and HSI color space, the improved color model is adopted to obtain one set of segmented images. Meanwhile, the ViBe algorithm is adopted to obtain segmented images based on the motion feature. The segmentation images from the improved color model and from the ViBe algorithm are intersected, and the final suspected flame regions are obtained after erosion, dilation, and region filling. Figure 2 shows some segmentation examples based on the improved color model and ViBe algorithm. As shown in Figure 2, the region segmentation method combining the improved color model and ViBe algorithm can effectively exclude the ground whose color is similar to flame as well as the moving pedestrians and obtain an accurate flame region.
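To make the mask combination concrete, the sketch below intersects a color mask with a motion mask and then applies a morphological erosion followed by a dilation. It is an illustration under assumptions: masks are plain lists of 0/1 values, the structuring element is a 3x3 square clipped at the image border, and the region filling step is omitted. The function names are hypothetical.

```python
def intersect(mask_a, mask_b):
    """Keep only the pixels that are set in both binary masks."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def _morph(mask, combine):
    """Apply min (erosion) or max (dilation) over each 3x3 neighborhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = combine(window)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


color_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],   # the lone 1 on the right is noise
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
motion_mask = [[1] * 6 for _ in range(5)]

cleaned = dilate(erode(intersect(color_mask, motion_mask)))
for row in cleaned:
    print(row)
```

Erosion removes the isolated pixel, and the following dilation restores the 3x3 block to its original extent, which is why the two operations are applied in that order.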

2.2. Multidimensional Feature Extraction and RF Feature Selection

With the improvement of the ability of sensors to obtain information, multifeature fusion technology is widely used in the field of image recognition [25]. After region segmentation, there may still be regions that are mistakenly segmented as flames. Thus, it is necessary to extract multidimensional features such as geometry, texture, and dynamics of images and use multifeature fusion technology to achieve fire detection. However, some features are irrelevant or redundant to the recognition model, so it is necessary to select the feature subset with high discriminative ability from the multidimensional features.

2.2.1. Multidimensional Feature Extraction

The multidimensional features extracted from an image mainly include its circularity, aspect ratio, texture features, area change rate, flicker feature, and edge jitter feature. A brief introduction and the related formula for each are given below:

(1) Circularity. Circularity reflects the degree to which the shape of the target region is close to a theoretical circle. The circularity is calculated as

$$C = \frac{4\pi A}{P^2}$$

where $A$ represents the area of the flame region and $P$ represents the perimeter of the flame region.

(2) Aspect ratio. Aspect ratio reflects the stretching degree of a flame. The aspect ratio is calculated as

$$R_a = \frac{w}{l}$$

where $w$ represents the width of the smallest rectangle enclosing the flame region and $l$ represents the length of that rectangle.

(3) Texture features. The spatial relationship of image textures is commonly described with the statistical gray-level cooccurrence matrix (GLCM) [26], which is used to calculate the representative contrast, correlation, energy, and homogeneity of texture features. Specifically, the pixel offset is set to 1, the texture feature statistics are calculated from the GLCM at four different angles of 0°, 45°, 90°, and 135°, and the average value over the four directions is taken as the texture feature criterion for fire recognition.

The gray values of two points at a certain distance in an image are represented by $(i, j)$. Suppose $p(i, j)$ is the probability of the gray value pair $(i, j)$. The contrast $\mathrm{CON}$, correlation $\mathrm{COR}$, energy $\mathrm{ENE}$, and homogeneity $\mathrm{HOM}$ of the image are calculated as follows:

$$\begin{aligned}
\mathrm{CON} &= \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i - j)^2 \, p(i, j), \\
\mathrm{COR} &= \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \frac{(i - \mu_x)(j - \mu_y) \, p(i, j)}{\sigma_x \sigma_y}, \\
\mathrm{ENE} &= \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p(i, j)^2, \\
\mathrm{HOM} &= \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \frac{p(i, j)}{1 + |i - j|},
\end{aligned}$$

where $L$ represents the gray level of an image, $\mu_x$ and $\mu_y$ represent the mean value of $p(i, j)$ in row and column, respectively, and $\sigma_x$ and $\sigma_y$ represent the standard deviation of $p(i, j)$ in row and column, respectively.

(4) Area change rate. Area change rate represents the area change of a flame region. It is calculated from $A_t$ and $A_{t-1}$, where $A_t$ represents the area of the flame region in the current frame and $A_{t-1}$ represents the area of the flame region in the previous frame.

(5) Flicker feature. Flame flicker causes image pixels to change from nonflame to flame. In order to reflect the flicker characteristics of a flame effectively without increasing the complexity of the algorithm, the change amplitude of the flame foreground is used to characterize the flicker feature of the flame.

(6) Edge jitter feature. Edge jitter measures the degree of the edge change of an object in the process of deformation. It is calculated from $P_t$ and $P_{t-1}$, where $P_t$ represents the perimeter of the flame region of the current frame and $P_{t-1}$ represents the perimeter of the flame region of the previous frame.
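The geometric and texture features listed above can be sketched in plain Python as follows. This is an illustration under assumptions rather than the feature extractor used in the paper: circularity is taken as 4πA/P², the frame-to-frame change is assumed to be normalized by the previous-frame value, homogeneity uses the 1 + |i - j| denominator, and the GLCM is built for a single offset (other angles would use other `dy`, `dx` values and the results would be averaged). All function names are hypothetical.

```python
import math


def circularity(area, perimeter):
    """Equals 1.0 for an ideal circle and decreases for irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def aspect_ratio(width, length):
    """Width over length of the bounding rectangle."""
    return width / length


def relative_change(current, previous):
    """Assumed form for area change or edge jitter between two frames."""
    return abs(current - previous) / previous


def glcm(image, levels, dy=0, dx=1):
    """Normalized cooccurrence matrix for one pixel offset (default: 0 degrees)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                counts[image[y][x]][image[yy][xx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, correlation, energy, homogeneity of a normalized GLCM.

    Correlation is undefined (division by zero) for a constant image.
    """
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_x = sum(i * v for i, _, v in cells)
    mu_y = sum(j * v for _, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, _, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for _, j, v in cells))
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": sum((i - mu_x) * (j - mu_y) * v
                           for i, j, v in cells) / (sd_x * sd_y),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = texture_features(glcm(image, levels=4))
print(features)
print(circularity(math.pi, 2 * math.pi))  # unit circle
```

For the 45°, 90°, and 135° directions, a common convention is the offsets `(-1, 1)`, `(-1, 0)`, and `(-1, -1)` for `(dy, dx)`.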

2.2.2. Random Forest Feature Selection

Random forest (RF) [27] is an ensemble machine learning method that uses decision trees as base learners and makes decisions through a voting mechanism. For feature selection, the RF method calculates the importance of each feature variable and thereby identifies the feature variables that are highly related to the dependent variable. Thus, we can select a small number of feature variables that still preserve the accuracy of the prediction results.

The purpose of feature selection is to select the relevant feature subset from the existing feature set. After feature extraction on the suspected flame region, we obtain a 9-dimensional feature vector consisting of circularity, aspect ratio, contrast, correlation, energy, homogeneity, area change rate, flicker feature, and edge jitter feature. We use the RF method to select the features that correlate strongly with the actual results as the input of the BP neural network model.

Since there is currently no published standard dataset in the field of fire prevention and detection, we mainly use a series of typical databases proposed in [28-30] to build our video set. Our video set includes 30 fire videos and 10 interference videos. Fire videos mainly include flames of different shapes in indoor, highway, and forest scenes. Interference videos mainly include street lights, car lights, and red objects. The videos are resized to a uniform resolution, and 20 consecutive frames are selected from each video to form a sample dataset. The sample dataset has a total of 800 images, some of which are shown in Figure 3.

In this paper, the suspected flame regions are segmented on the sample dataset and the multidimensional features of the flame are extracted. The RF feature importance is calculated with the scikit-learn machine learning library for Python. Specifically, the number of decision trees in the RF is set to 100, and the maximum depth of the decision trees is not limited. The minimum number of samples required to split an internal node is set to 2, and the minimum number of samples required at a leaf node is set to 1. The split quality criterion is set to mean square error (MSE). The ranking of the feature importance is given in Figure 4. From Figure 4, it can be found that the flicker feature ranks highest and the edge jitter feature ranks lowest.

According to the ranking in Figure 4, the flame features are grouped into the combined features F1-F9, where each combined feature adds the next-ranked feature to the previous one, so that F1 contains only the top-ranked feature and F9 contains all nine features. The BP neural network method is used for training and classification, and the correct classification rate of each combined feature is given in Figure 5. Each correct classification rate is the average of 10 prediction results.

According to the classification results of the combined features as shown in Figure 5, the correct classification rate of samples reaches a high level starting from the 5th combined feature, F5, and with the increase of features, the correct classification rate of samples tends to be stable. To avoid interference caused by excessive features and ensure the accuracy of classification results, the 5th combined feature, F5, is selected for fire recognition, which includes circularity, aspect ratio, contrast, energy, and flicker feature.
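The selection procedure described above, ranking features by importance, forming cumulative subsets, and keeping the smallest subset whose accuracy is already close to the best, can be sketched as follows. The function names, the tolerance rule, and all numbers are assumptions for illustration: the importance scores and accuracies are placeholders, not the values behind Figures 4 and 5.

```python
def nested_subsets(importance):
    """Return cumulative feature subsets ordered by descending importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def smallest_good_subset(subsets, accuracies, tolerance=0.01):
    """Pick the first subset whose accuracy is within tolerance of the best."""
    best = max(accuracies)
    for subset, accuracy in zip(subsets, accuracies):
        if accuracy >= best - tolerance:
            return subset
    return subsets[-1]


# Placeholder importance scores for three hypothetical features.
importance = {"feature_a": 0.4, "feature_b": 0.1, "feature_c": 0.3}
subsets = nested_subsets(importance)

# Placeholder accuracies, one per cumulative subset.
accuracies = [0.80, 0.95, 0.955]
print(smallest_good_subset(subsets, accuracies))
```

Choosing the smallest subset on the accuracy plateau keeps the classifier input compact and leaves out low-ranked features that add little beyond noise.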

2.3. Construction of the BP Neural Network Model

The BP neural network [31] is also called error back propagation neural network. Because the weights of neurons in adjacent layers are interrelated, the network has a nonlinear mapping ability to solve complex problems. Considering the simplicity and practicability, a three-layer BP neural network with one input layer, one hidden layer, and one output layer is constructed for multifeature fusion and fire recognition.

The number of nodes in the input layer of the BP neural network depends on the dimension of the feature vector. In this paper, five features of each fire image are selected to form the feature vector, so there are 5 nodes in the input layer. The output layer outputs the recognition result, so it has 1 node. The normalized output value lies in [0, 1]: an output in [0, 0.5) indicates nonflame, and an output in [0.5, 1] indicates flame. The number of hidden layer nodes $h$ is calculated using the empirical equation (12):

$$h = \sqrt{n + m} + a \tag{12}$$

where $n$ and $m$ represent the number of nodes in the input layer and output layer, respectively, and $a$ is a constant in [1, 10]. Through repeated experiments, it is found that the neural network achieves the best training effect when the number of hidden layer nodes is 10.
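The sizing rule and the decision threshold can be illustrated with a short sketch. It assumes the common empirical form in which the hidden layer size is the square root of the sum of input and output nodes plus a constant between 1 and 10, rounds the result to the nearest integer, and uses sigmoid activations so that the output falls in [0, 1]. The rounding, the activation choice, and the function names are assumptions that the text does not confirm.

```python
import math


def hidden_size_candidates(n_in, n_out, a_values=range(1, 11)):
    """Candidate hidden layer sizes from the empirical sizing rule."""
    base = math.sqrt(n_in + n_out)
    return [round(base + a) for a in a_values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through an input-hidden-output network."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def classify(output):
    """Outputs in [0.5, 1] mean flame, outputs in [0, 0.5) mean nonflame."""
    return "flame" if output >= 0.5 else "nonflame"


print(hidden_size_candidates(5, 1))  # sizes to try for 5 inputs and 1 output

# Untrained 5-10-1 network with all-zero weights, for shape checking only.
x = [0.2, 0.4, 0.1, 0.9, 0.5]
out = forward(x, [[0.0] * 5 for _ in range(10)], [0.0] * 10, [0.0] * 10, 0.0)
print(out, classify(out))
```

With 5 inputs and 1 output, the candidates run from 3 to 12, so the chosen size of 10 lies inside the range that the rule suggests searching.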

Based on the above analysis, the structure of the BP neural network constructed in this paper is given in Figure 6. The BP neural network model will be used for fire recognition in the following section.

3. Experiments and Results Analysis

3.1. Experimental Environment

The experimental environment is the Windows 10 operating system with 8 GB of memory, an Intel(R) Core(TM) i5-10500 CPU @ 3.10 GHz, and the MATLAB 2018a platform.

3.2. Detection Process

The process of video fire detection in this paper is given in Figure 7.

3.3. Training and Experiments

The sample dataset is divided into a training set and a test set in the typical proportions of 70% and 30%. The division is given in Table 1. For training, the maximum number of epochs is set to 2000. The Levenberg-Marquardt (LM) algorithm is selected as the learning algorithm, and the learning rate is set to 0.001. MSE is used as the loss function, and the target minimum of the loss function is set to 0.01. The neural network training performance is shown in Figure 8. From Figure 8, it can be found that the value of the loss function gradually decreases during training and reaches the set minimum at 30 epochs.

After neural network training is completed, the correct classification rate on the test sample set is 96.67%. In order to further verify the performance of the fire detection method, five videos that have not participated in the training are selected to test the trained neural network model, including three fire videos and two interference videos. The experimental video sets are shown in Figure 9. The descriptions of the experimental videos are shown in Table 2.

3.4. Results Analysis

The method combining RF feature selection with the BP neural network (abbreviated as RF-BP) and the method applying the BP neural network directly (abbreviated as Dir-BP) are tested on several experimental videos. Furthermore, the two proposed methods are compared with the fire recognition methods in [11, 13]. The accuracy, precision, recall, and $F_1$-score are used to evaluate the fire recognition effect [32], as shown in

$$\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall} &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},
\end{aligned}$$

where TP represents the number of correctly classified frames of the fire videos, TN represents the number of correctly classified frames of the interference videos, FN represents the number of misclassified frames of the fire videos, and FP represents the number of misclassified frames of the interference videos.
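The four evaluation measures can be computed from the frame counts as in the sketch below, which uses the standard definitions of accuracy, precision, recall, and F1-score. The function name and the counts in the usage example are hypothetical and are not taken from Tables 3 and 4.

```python
def evaluate(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1-score from frame counts.

    Assumes every denominator is nonzero, i.e. there is at least one
    predicted fire frame and at least one actual fire frame.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 100 fire frames and 100 interference frames.
scores = evaluate(tp=90, tn=80, fp=20, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Precision penalizes false alarms on interference frames, while recall penalizes missed fire frames, so reporting both alongside the F1-score shows which kind of error a method tends to make.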

The fire recognition result is determined by the output value of BP neural network. The experimental results are given in Table 3, and the recognition effect evaluation is given in Table 4.

For the scenes in Figure 9, the lights and the ground illuminated by the lights may be segmented as suspected flame areas. The method in [11] focuses on the shape change and centroid displacement of flames and recognizes fire images with a support vector machine. Since the lights follow the movement of the vehicle in the interference videos, the ground illuminated by the lights has dynamic features similar to those of flames, and this method did not perform well on the interference videos (video4 and video5) according to the results in Table 3. The method in [13] recognizes fire images by fusing the Y, Cb, and Cr components in YCbCr color space. It produced too many false detection frames on the fire videos (video1, video2, and video3) according to the results in Table 3. This shows that when only the color features of flames are used as a criterion, it is difficult to reject background objects whose color is similar to that of flames. Compared with the methods in [11, 13], the proposed Dir-BP method performs BP neural network fusion directly on all 9 extracted features without selection. Since the texture and dynamic features of the flames are considered, this method shows some improvement in accuracy, precision, recall, and $F_1$-score according to the results in Table 4. However, because weakly correlated features may interfere with the recognition results, the number of false detection frames is still large. The proposed RF-BP method, which combines RF feature selection with the BP neural network, makes full use of the geometric, texture, and dynamic features of flames, effectively avoids some feature interference, and performs best on the evaluation indexes.

4. Conclusion

In this paper, an effective hybrid video fire detection method based on RF feature selection and a BP neural network is proposed. The improved color model and ViBe algorithm are used to segment the suspected flame regions. Multidimensional features of flames are extracted from the suspected regions and are then combined and selected according to the RF feature importance analysis. In addition, a BP neural network model is built for multifeature fusion to determine the fire recognition result. Experimental results show that the features selected by this RF-BP hybrid approach effectively avoid the interference caused by weakly correlated features and allow fires to be detected well in experimental videos of different scenes.

The proposed method will be investigated further and has possible applications in fire recognition for video surveillance and detection robots. Considering that the performance of the BP neural network model is easily affected by the sample dataset, the next step is to expand the existing dataset so that the training samples cover the flame features more comprehensively.

Data Availability

The data supporting this study are from previously reported studies and datasets, which have been cited in the article.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the National Key R&D Program of China (Grant No. 2019YFB1312102), the National Natural Science Foundation of China (Grant No. U20A20201), the Key R&D Program of Hebei Province (Grant No. 20311803D), and the Natural Science Foundation of Hebei Province (Grant No. E2019202338).